
Python Docker images: choosing the right base for production applications

A comprehensive analysis of Python Docker image options for production environments, with practical guidance on image selection, plugin configuration, and performance optimisation.

Introduction

The following quote from FastAPI creator Sebastián Ramírez challenges a common Docker practice. His warning about Alpine images for Python projects initially surprised me, so I decided to test these claims in production environments. Here's what I discovered about optimising Python container deployments.

The quote in full:

In short: You probably shouldn't use Alpine for Python projects, instead use the slim Docker image versions.

Do you want more details? Continue reading 👇

Alpine is more useful for other languages where you build a static binary in one Docker image stage (using multi-stage Docker building) and then copy it to a simple Alpine image, and then just execute that binary. For example, using Go.

But for Python, as Alpine doesn't use the standard tooling used for building Python extensions, when installing packages, in many cases Python (pip) won't find a precompiled installable package (a "wheel") for Alpine. And after debugging lots of strange errors you will realize that you have to install a lot of extra tooling and build a lot of dependencies just to use some of these common Python packages. 😩

This means that, although the original Alpine image might have been small, you end up with an image with a size comparable to the size you would have gotten if you had just used a standard Python image (based on Debian), or in some cases even larger. 🤯

And in all those cases, it will take much longer to build, consuming much more resources, building dependencies for longer, and also increasing its carbon footprint, as you are using more CPU time and energy for each build. 🌳

If you want slim Python images, you should instead try and use the slim versions that are still based on Debian, but are smaller. 🤓

Source

Selecting the optimal Python Docker image

When containerising Python applications, selecting the appropriate base image is a critical decision that affects build times, deployment efficiency, security posture, and runtime performance. While Alpine Linux has become popular for containerisation across many languages, its usage with Python deserves careful consideration.

In short: You probably shouldn't use Alpine for Python projects, instead use the slim Docker image versions.

This seemingly controversial statement from Sebastián Ramírez (creator of FastAPI) challenges conventional wisdom around Docker image optimisation. This article explores why this recommendation holds true for most Python applications and offers advanced guidance on selecting and configuring the optimal Python container environment.

The Alpine misconception

Alpine Linux's minimalist design and small footprint make it an attractive option for containerisation. Many developers instinctively reach for Alpine-based images assuming they'll achieve optimal efficiency. However, this approach often proves counterproductive for Python applications.

Alpine is more useful for other languages where you build a static binary in one Docker image stage (using multi-stage Docker building) and then copy it to a simple Alpine image, and then just execute that binary. For example, using Go.

The core issue stems from Alpine's use of musl libc instead of the more common glibc. This fundamental difference affects how Python packages with C extensions are compiled and installed.
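You can observe this difference directly: pip's debug subcommand lists the wheel tags an interpreter will accept. On a Debian-based image the list includes manylinux tags, while on Alpine it includes musllinux tags instead (a quick sketch; the image tags are illustrative and the output is abbreviated):

# Compare the wheel tags each image's interpreter accepts
docker run --rm python:3.12-slim python -m pip debug --verbose | grep -m 3 manylinux
docker run --rm python:3.12-alpine python -m pip debug --verbose | grep -m 3 musllinux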

Understanding the Python packaging ecosystem

To comprehend why Alpine presents challenges for Python applications, we must first understand how Python packages are distributed and installed.

The wheel mechanism

Python's packaging ecosystem relies heavily on wheels (.whl files) – pre-built binary distributions that allow for rapid installation without compilation. When a compatible wheel exists for your platform, pip can install it directly, avoiding the compilation process entirely.

The Python Package Index (PyPI) hosts wheels for popular platforms, primarily:

  • Windows (various versions)
  • macOS (various versions)
  • Linux using glibc (as used by Debian, Ubuntu, CentOS, etc.)

Notably absent from this list is Linux using musl libc (Alpine). When pip runs on Alpine, it frequently fails to find compatible wheels and must fall back to building from source.
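As a quick illustration, the packaging library (which pip itself builds on) can enumerate the tags the running interpreter accepts; a minimal sketch, assuming packaging is installed:

# pip install packaging
from packaging.tags import sys_tags

# Print the first few platform tags this interpreter accepts.
# On Debian-based images these include manylinux tags;
# on Alpine they are musllinux tags instead.
for tag in list(sys_tags())[:5]:
    print(tag)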

But for Python, as Alpine doesn't use the standard tooling used for building Python extensions, when installing packages, in many cases Python (pip) won't find a precompiled installable package (a "wheel") for Alpine. And after debugging lots of strange errors you will realize that you have to install a lot of extra tooling and build a lot of dependencies just to use some of these common Python packages.

This compilation process presents several challenges:

  1. Dependency hell – Building packages from source requires development tools and libraries that aren't included in the base Alpine image (see the sketch after this list)
  2. Build failures – Packages may have assumptions about the build environment that don't hold true on Alpine
  3. Extended build times – Compilation significantly increases image build duration
  4. Larger final images – The tools required for compilation often remain in the final image unless carefully removed
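To illustrate the first point, here is the kind of Dockerfile Alpine often pushes you towards. The exact apk packages depend entirely on what you install; uWSGI (which ships no prebuilt wheels) and psycopg2 are used as hypothetical examples, and note that many projects now do publish musllinux wheels:

FROM python:3.12-alpine

# Build tooling and headers needed only because there are no usable wheels
RUN apk add --no-cache gcc musl-dev linux-headers postgresql-dev

RUN pip install --no-cache-dir uwsgi psycopg2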

Comparative analysis of Python Docker images

Let's examine the primary options for Python Docker images:

1. Standard Python images (python:3.x)

The default Python images use Debian as their base. These images include:

  • A complete Python installation
  • Common development tools
  • Libraries required for building extensions

  • Size: ~900MB-1GB
  • Build speed: Fast (most packages have compatible wheels)
  • Compatibility: Excellent
  • Security: Good, with regular updates

2. Slim variants (python:3.x-slim)

These images also use Debian but strip out documentation, localisations, and non-essential packages.

  • Size: ~150-200MB
  • Build speed: Generally fast (compatible wheels available)
  • Compatibility: Excellent
  • Security: Good, with regular updates

3. Alpine variants (python:3.x-alpine)

Based on Alpine Linux with a minimal footprint.

  • Size: ~45-60MB (base image)
  • Build speed: Often slow (requires building from source)
  • Compatibility: Problematic with many packages
  • Security: Good, with regular updates
  • Final size after dependencies: Often comparable to slim variants

This means that, although the original Alpine image might have been small, you end up with an image with a size comparable to the size you would have gotten if you had just used a standard Python image (based on Debian), or in some cases even larger.
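You can check the size figures yourself; a quick sketch (exact sizes vary by Python version and CPU architecture):

docker pull python:3.12
docker pull python:3.12-slim
docker pull python:3.12-alpine
docker image ls python --format '{{.Repository}}:{{.Tag}}  {{.Size}}'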

Performance and environmental impact

The build performance differences between these images translate to practical implications beyond mere convenience:

And in all those cases, it will take much longer to build, consuming much more resources, building dependencies for longer, and also increasing its carbon footprint, as you are using more CPU time and energy for each build.

These considerations are especially relevant for:

  • CI/CD pipelines where builds occur frequently
  • Development environments with iterative container rebuilds
  • Organisations with sustainability commitments

Optimising Python Docker images for production

Having established that slim variants typically offer the best balance for Python applications, let's explore advanced techniques for optimising these images for production use.

Multi-stage builds

Multi-stage builds allow you to use one image for building and another for running your application. This approach enables:

  • Installing build-time dependencies only in the build stage
  • Copying only the necessary files to the runtime image
  • Reducing the attack surface of the final image
A representative multi-stage Dockerfile for a slim-based image (a sketch; adjust the stages to your application):

# Build stage
FROM python:3.11-slim AS builder

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /app/wheels -r requirements.txt

# Runtime stage
FROM python:3.11-slim

WORKDIR /app

# Create a non-root user
RUN useradd -m appuser && \
    chown -R appuser:appuser /app

# Copy only the built wheels and install
COPY --from=builder /app/wheels /app/wheels
COPY --from=builder /app/requirements.txt .
RUN pip install --no-cache-dir /app/wheels/*

# Copy application code
COPY --chown=appuser:appuser . .

USER appuser

CMD ["python", "main.py"]

Managing package versions with pip-tools

The pip-tools package provides reliable dependency pinning and management. This approach ensures reproducible builds and prevents unexpected changes.

  1. Create a requirements.in file with your direct dependencies:

flask==3.0.1
sqlalchemy
psycopg2-binary

  2. Generate a fully pinned requirements.txt:

pip-compile requirements.in

  3. Use the pinned requirements in your Dockerfile:

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
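pip-tools also helps keep pins and environments in sync over time; for example, to bump a single dependency without disturbing the rest, and to make an environment match the lock file exactly (a sketch of the workflow):

# Upgrade one package and regenerate the pinned requirements.txt
pip-compile --upgrade-package flask requirements.in

# Make the current environment match requirements.txt exactly
pip-sync requirements.txt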

Setting up useful plugins and extensions

Advanced Python Docker setups often benefit from additional tools that enhance debugging, monitoring, and performance. Here are some valuable additions for production-ready containers:

1. Configuring Python's GC monitoring

Python's garbage collection can be monitored and tuned using the gc module. Creating a simple plugin to expose GC statistics can provide valuable insights:

# gc_monitor.py
import gc
import json
import threading
import time
from pathlib import Path

class GCMonitor:
    """Periodically write garbage-collector statistics to a JSON file."""

    def __init__(self, interval=60, output_dir="/app/metrics"):
        self.interval = interval
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.running = False

    def start(self):
        self.running = True
        threading.Thread(target=self._monitor_loop, daemon=True).start()

    def stop(self):
        self.running = False

    def _monitor_loop(self):
        while self.running:
            stats = {
                # Allocations per generation since the last collection
                "counts": gc.get_count(),
                # Collection thresholds for the three generations
                "thresholds": gc.get_threshold(),
                # Total objects currently tracked by the collector
                "objects": len(gc.get_objects()),
                "timestamp": time.time(),
            }

            with open(self.output_dir / "gc_stats.json", "w") as f:
                json.dump(stats, f)

            time.sleep(self.interval)

To use this monitor, add it to your application's startup:

from gc_monitor import GCMonitor

# Start GC monitoring
monitor = GCMonitor()
monitor.start()

2. Configuring APM with Python agent

For production monitoring, Application Performance Monitoring (APM) tools provide invaluable insights. The Elastic APM Python agent offers a lightweight solution:

# Add to your Dockerfile
RUN pip install --no-cache-dir "elastic-apm[flask]"

Then configure in your Flask application:

import os

from flask import Flask
from elasticapm.contrib.flask import ElasticAPM

def create_app():
    app = Flask(__name__)

    app.config['ELASTIC_APM'] = {
        'SERVICE_NAME': 'your-service-name',
        'SERVER_URL': os.environ.get('APM_SERVER_URL', 'http://apm-server:8200'),
        'ENVIRONMENT': os.environ.get('FLASK_ENV', 'production'),
    }

    apm = ElasticAPM(app)

    # Rest of your app configuration
    return app
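The agent then reads its settings from the environment at run time; for example (the image name and server URL are placeholders):

docker run --rm \
    -e APM_SERVER_URL=http://apm-server:8200 \
    -e FLASK_ENV=production \
    myapp:latest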

3. Setting up Python profiling with py-spy

For on-demand profiling without modifying your application code, py-spy provides a powerful solution that can be included in your container:

# Install py-spy in your Dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends \
    procps \
    && rm -rf /var/lib/apt/lists/* \
    && pip install py-spy

With this setup, you can run profiling commands when needed:

# From inside the container or via docker exec
py-spy record -o profile.svg --pid 1
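Note that py-spy attaches to the target process via ptrace, which Docker's default security profile blocks; grant the capability when starting the container (a sketch; the image name is a placeholder):

docker run --rm --cap-add SYS_PTRACE myapp:latest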

Security considerations

When deploying Python containers to production, security must be a priority:

  1. Run as non-root user: Always configure your container to run as a non-privileged user
  2. Pin package versions: Use exact versions for all dependencies to prevent supply chain attacks
  3. Regular updates: Establish a process for updating base images and dependencies
  4. Image scanning: Implement automated vulnerability scanning in your CI/CD pipeline (see the sketch after this list)
  5. Minimal images: Include only what's necessary for your application to run
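For the image-scanning point, any mainstream scanner slots into CI; Trivy is one option among several (a sketch; the image name is a placeholder):

# Fail the pipeline if high or critical vulnerabilities are found
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:latest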

Conclusion

If you want slim Python images, you should instead try and use the slim versions that are still based on Debian, but are smaller.

This recommendation from the source material aligns with our comprehensive analysis. While Alpine images appear attractive initially, the practical challenges they present for Python applications typically outweigh their benefits.

For production Python applications:

  1. Start with python:3.x-slim as your base image
  2. Use multi-stage builds to separate build and runtime concerns
  3. Implement proper dependency management with pip-tools or similar
  4. Configure monitoring and profiling tools appropriate for your environment
  5. Follow security best practices for container deployment

By following these guidelines, you'll achieve a balance of performance, security, and maintainability that serves your Python applications well in production environments.