Container Debug Skill: Usage Guide
2026-03-28
Source: 网淘吧
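Container Debugging
Debug running Docker containers and Compose services. Covers logs, exec, networking, resource inspection, multi-stage builds, health checks, and common failure modes.
When to Use
- A container exits or crashes immediately after starting
- An application behaves differently inside the container than on the host
- Containers cannot communicate with each other
- A container uses too much memory or CPU
- A multi-stage Docker build produces unexpected results
- Health checks keep failing
- Compose services start in the wrong order or cannot connect
Container Logs
Viewing and Filtering Logs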
# Last 100 lines
docker logs --tail 100 my-container
# Follow (stream) logs
docker logs -f my-container
# Follow with timestamps
docker logs -f -t my-container
# Logs since a time
docker logs --since 30m my-container
docker logs --since "2026-02-03T10:00:00" my-container
# Logs between times
docker logs --since 1h --until 30m my-container
# Compose: logs for all services
docker compose logs -f
# Compose: logs for specific services
docker compose logs -f api db
# Redirect logs to file for analysis
docker logs my-container > container.log 2>&1
# Separate stdout and stderr
docker logs my-container > stdout.log 2> stderr.log
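Once the logs are saved to a file, a quick count of suspicious lines can help decide where to look first. The sketch below is one way to do that; the function name count_log_errors and the keyword list are illustrative assumptions, not something Docker provides.

```shell
# Hypothetical helper: count lines in a saved log file that match common
# failure keywords (case-insensitive). Adjust the pattern to your application.
count_log_errors() {
  logfile=$1
  # grep -c prints the number of matching lines. It exits non-zero when the
  # count is 0, so "|| true" keeps the function's exit status clean.
  grep -ciE 'error|fatal|panic|exception' "$logfile" || true
}

# Usage, assuming container.log was written by the redirect shown earlier:
#   count_log_errors container.log
```

A high count is only a hint; the matching lines still need to be read in context.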
Checking the Log Driver Configuration
# Check what log driver a container uses
docker inspect --format='{{.HostConfig.LogConfig.Type}}' my-container
# If json-file driver, find the actual log file
docker inspect --format='{{.LogPath}}' my-container
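# Check log file size (the file lives under Docker's data directory, so root is usually required)
sudo ls -lh "$(docker inspect --format='{{.LogPath}}' my-container)"
Executing Commands Inside Containers
Interactive Shell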
# Bash (most common)
docker exec -it my-container bash
# If bash isn't available (Alpine and other BusyBox-based images)
docker exec -it my-container sh
# As root (even if container runs as non-root user)
docker exec -u root -it my-container bash
# With specific environment variables
docker exec -e DEBUG=1 -it my-container bash
# Run a single command (no interactive shell)
docker exec my-container cat /etc/os-release
docker exec my-container ls -la /app/
docker exec my-container env
Debugging a Crashed Container
# Container exited? Check exit code
docker inspect --format='{{.State.ExitCode}}' my-container
docker inspect --format='{{.State.Error}}' my-container
# Common exit codes:
# 0 = clean exit
# 1 = application error
# 137 = killed (OOM or docker kill) — 128 + signal 9
# 139 = segfault — 128 + signal 11
# 143 = terminated (SIGTERM) — 128 + signal 15
# Start a stopped container to debug it
docker start -ai my-container
# Or override the entrypoint to get a shell
docker run -it --entrypoint sh my-image
# Copy files out of a stopped container
docker cp my-container:/app/error.log ./error.log
docker cp my-container:/etc/nginx/nginx.conf ./nginx.conf
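The exit-code table above follows the usual shell convention that codes above 128 mean "terminated by signal (code minus 128)". A minimal sketch of that rule as a helper follows; the function name explain_exit_code is hypothetical and the messages are only suggestions.

```shell
# Hypothetical helper: describe a container exit code.
# Codes from 129 to 192 conventionally mean "terminated by signal (code - 128)".
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "clean exit"
  elif [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
    sig=$((code - 128))
    case $sig in
      9)  echo "killed by signal 9 (SIGKILL): OOM killer or docker kill" ;;
      11) echo "killed by signal 11 (SIGSEGV): segmentation fault" ;;
      15) echo "killed by signal 15 (SIGTERM): stopped on request" ;;
      *)  echo "killed by signal $sig" ;;
    esac
  else
    echo "application error (exit code $code)"
  fi
}

# Usage with a real container:
#   explain_exit_code "$(docker inspect --format='{{.State.ExitCode}}' my-container)"
```

For code 137, checking .State.OOMKilled as well tells an out-of-memory kill apart from a manual docker kill.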
Debugging Without a Shell (Distroless/Scratch Images)
# Use docker cp to extract files
docker cp my-container:/app/config.json ./
# Use nsenter to enter the container's namespaces (Linux, requires root).
# Omit -m so the host's /bin/sh is used; the image itself has no shell.
PID=$(docker inspect --format='{{.State.Pid}}' my-container)
sudo nsenter -t "$PID" -u -i -n -p -- /bin/sh
# Attach a debug container to the same namespace
docker run -it --pid=container:my-container --net=container:my-container busybox sh
# Docker Desktop: use debug extension
docker debug my-container
Network Debugging
Inspecting Container Network Configuration
# Show container IP address
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' my-container
# Show all network details
docker inspect -f '{{json .NetworkSettings.Networks}}' my-container | jq
# List all networks
docker network ls
# Inspect a network (see all connected containers)
docker network inspect bridge
docker network inspect my-compose-network
# Show port mappings
docker port my-container
Testing Connectivity Between Containers
# From inside container A, reach container B
docker exec container-a ping -c 3 container-b
docker exec container-a curl http://container-b:8080/health
# DNS resolution inside container
docker exec my-container nslookup db
docker exec my-container cat /etc/resolv.conf
docker exec my-container cat /etc/hosts
# Test if port is reachable
docker exec my-container nc -zv db 5432
docker exec my-container wget -qO- http://api:3000/health
# If curl/ping not available in container, install or use a debug container:
docker run --rm --network container:my-container curlimages/curl curl -s http://localhost:8080
Common Networking Issues
# "Connection refused" between containers
# 1. Check the app binds to 0.0.0.0, not 127.0.0.1
docker exec my-container netstat -tlnp
# If listening on 127.0.0.1 — fix the app config
# 2. Check containers are on the same network
docker inspect -f '{{json .NetworkSettings.Networks}}' container-a | jq 'keys'
docker inspect -f '{{json .NetworkSettings.Networks}}' container-b | jq 'keys'
# 3. Check published ports vs exposed ports
# EXPOSE only documents, it doesn't publish
# Use -p host:container to publish
# "Name not found" — DNS not resolving container names
# Container names resolve only on user-defined networks, NOT the default bridge
docker network create my-net
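# -d runs in the background; the POSTGRES_PASSWORD value is a placeholder
docker run -d --network my-net --name api my-api-image
docker run -d --network my-net --name db -e POSTGRES_PASSWORD=example postgres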
# Now "api" and "db" resolve to each other
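Step 2 above compares the network lists of two containers by eye. A small sketch that automates the comparison is shown here; shared_networks is a hypothetical helper, and it assumes network names contain no spaces.

```shell
# Hypothetical helper: print the network names that appear in both lists.
# Each argument is a space-separated list of network names.
shared_networks() {
  for a in $1; do
    for b in $2; do
      if [ "$a" = "$b" ]; then
        echo "$a"
      fi
    done
  done
}

# Usage with real containers (the template prints each network name
# followed by a space):
#   fmt='{{range $name, $net := .NetworkSettings.Networks}}{{$name}} {{end}}'
#   shared_networks "$(docker inspect -f "$fmt" container-a)" \
#                   "$(docker inspect -f "$fmt" container-b)"
```

No output means the two containers have no network in common, which would explain a failed name lookup or connection.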
Capturing Network Traffic
# tcpdump inside a container
docker exec my-container tcpdump -i eth0 -n port 8080
# If tcpdump not available, use a sidecar
docker run --rm --net=container:my-container nicolaka/netshoot tcpdump -i eth0 -n
# netshoot has: tcpdump, curl, nslookup, netstat, iperf, etc.
docker run --rm -it --net=container:my-container nicolaka/netshoot bash
Resource Usage
Live Statistics
# All containers
docker stats
# Specific containers
docker stats api db redis
# One-shot (no streaming)
docker stats --no-stream
# Formatted output
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"
Troubleshooting Memory Issues
# Check memory limit
docker inspect --format='{{.HostConfig.Memory}}' my-container
# 0 means unlimited
# Check if container was OOM-killed
docker inspect --format='{{.State.OOMKilled}}' my-container
# Memory usage breakdown (Linux cgroups)
docker exec my-container cat /sys/fs/cgroup/memory.current 2>/dev/null || \
docker exec my-container cat /sys/fs/cgroup/memory/memory.usage_in_bytes
# Process memory inside container
docker exec my-container ps aux --sort=-%mem | head -10
docker exec my-container top -bn1
Disk Usage
# Overall Docker disk usage
docker system df
docker system df -v
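# Container filesystem size (SizeRw is only populated when --size is passed)
docker inspect --size --format='{{.SizeRw}}' my-container
# Find large files inside container (sh -c makes /* expand inside the container, not on the host)
docker exec my-container sh -c 'du -sh /* 2>/dev/null' | sort -rh | head -10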
docker exec my-container find /tmp -size +10M -type f
# Check for log file bloat
docker exec my-container ls -lh /var/log/
Dockerfile Debugging
Debugging Multi-Stage Builds
# Build up to a specific stage
docker build --target builder -t my-app:builder .
# Inspect what's in the builder stage
docker run --rm -it my-app:builder sh
docker run --rm my-app:builder ls -la /app/
docker run --rm my-app:builder cat /app/package.json
# Check which files made it to the final image
docker run --rm my-image ls -laR /app/
# Build with no cache (fresh build)
docker build --no-cache -t my-app .
# Build with progress output
docker build --progress=plain -t my-app .
Inspecting Images
# Show image layers (size of each)
docker history my-image
docker history --no-trunc my-image
# Inspect image config (entrypoint, cmd, env, ports)
docker inspect my-image | jq '.[0].Config | {Cmd, Entrypoint, Env, ExposedPorts, WorkingDir}'
# Compare two images
docker history image-a --format "{{.Size}}\t{{.CreatedBy}}" > layers-a.txt
docker history image-b --format "{{.Size}}\t{{.CreatedBy}}" > layers-b.txt
diff layers-a.txt layers-b.txt
# Show what a container changed in its filesystem relative to its image
docker diff my-container
# A = added, C = changed, D = deleted
Health Checks
Defining and Debugging Health Checks
# In Dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1
# Check health status
docker inspect --format='{{.State.Health.Status}}' my-container
# "healthy", "unhealthy", or "starting"
# See health check log (last 5 results)
docker inspect --format='{{json .State.Health}}' my-container | jq
# Run health check manually
docker exec my-container curl -f http://localhost:8080/health
# Override health check at run time
docker run --health-cmd "curl -f http://localhost:8080/health || exit 1" \
--health-interval 10s my-image
# Disable health check
docker run --no-healthcheck my-image
Docker Compose Debugging
Service Startup Issues
# Check service status
docker compose ps
# See why a service failed
docker compose logs failed-service
# Rebuild and start, saving all output to a file for later inspection
docker compose up --build 2>&1 | tee compose.log
# Start a single service (with dependencies)
docker compose up db
# Start without dependencies
docker compose up --no-deps api
# Recreate containers from scratch
docker compose up --force-recreate --build
# Check effective config (after variable substitution)
docker compose config
Service Dependencies and Startup Order
# docker-compose.yml
services:
  api:
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5
# Wait for a service to be ready before running dependent commands.
# pg_isready checks once and exits, so retry it in a loop.
docker compose up -d db
until docker compose exec -T db pg_isready -U postgres; do sleep 1; done
docker compose up -d api
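Because pg_isready performs a single check, waiting for readiness needs a retry loop, and a bounded loop is safer in scripts than one that can spin forever. The helper below is a minimal sketch of that idea; wait_for is a hypothetical name and the one-second polling interval is an arbitrary choice.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) expires. Returns 0 on success, 1 on timeout.
wait_for() {
  timeout=$1
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Usage: give the database up to 60 seconds, then start the API.
#   wait_for 60 docker compose exec -T db pg_isready -U postgres \
#     && docker compose up -d api
```

When the service already defines a healthcheck, depends_on with condition: service_healthy achieves the same ordering without any script.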
Cleanup
# Remove stopped containers
docker container prune
# Remove unused images
docker image prune
# Remove everything unused except volumes (stopped containers, unused images, networks, build cache)
docker system prune -a
# Remove volumes too (WARNING: deletes data)
docker system prune -a --volumes
# Remove dangling build cache
docker builder prune
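Tips
- docker logs -f is the first thing to check. Most container failures show up in the logs.
- Exit code 137 means the process received SIGKILL, most often from the OOM killer. Confirm with .State.OOMKilled, then raise the memory limit or fix the memory leak.
- Applications inside a container must bind to 0.0.0.0, not 127.0.0.1. Localhost inside a container is isolated from other containers and from the host.
- Container names resolve through DNS only on user-defined networks, not on the default bridge network. For multi-container setups, always create a custom network.
- docker exec works only on running containers. For a crashed container, use docker cp to extract logs, or docker run --entrypoint sh to override the entrypoint.
- nicolaka/netshoot is the Swiss Army knife for container networking problems. It ships with the common networking tools preinstalled.
- --progress=plain shows the full command output during a build, which is essential for debugging build failures.
- A health check with start-period prevents false unhealthy states while a slow application is still starting.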