Loading Large Images in Minikube

Posted on January 4, 2026 by John Humphreys

One thing that can be a little surprising when you use minikube with the docker driver is that it doesn’t actually use the images from your local Docker when it runs pods. It actually runs them with containerd.

Your Kubernetes node is “running in your Docker” (KIC, Kubernetes-in-Container), but that node uses containerd and its own image store when it runs pods.
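If you want to check whether an image is already in minikube’s store before loading or building it, minikube image ls lists what the node’s runtime holds. Here is a small Python sketch for checking that output. It assumes the default output format (one image reference per line) and that references are fully qualified the way containerd stores them (docker.io/library/...). The helper names are just for illustration, and digest references (@sha256:...) aren’t handled.

```python
import subprocess


def normalize_ref(image: str) -> str:
    """Expand a short Docker-style reference to the fully qualified form.

    'appname:tagname' -> 'docker.io/library/appname:tagname'
    'user/app'        -> 'docker.io/user/app:latest'
    """
    name, tag = image, "latest"
    # The tag is after the last ':' only if that ':' is in the final path segment
    # (otherwise it belongs to a registry port, e.g. localhost:5000/app).
    if ":" in image.rsplit("/", 1)[-1]:
        name, tag = image.rsplit(":", 1)
    parts = name.split("/")
    first = parts[0]
    if len(parts) == 1:
        # Bare names like 'nginx' live under docker.io/library/
        name = f"docker.io/library/{name}"
    elif not ("." in first or ":" in first or first == "localhost"):
        # 'user/app' has no registry host, so it defaults to docker.io
        name = f"docker.io/{name}"
    return f"{name}:{tag}"


def image_in_minikube(image: str, image_ls_output: str) -> bool:
    """True if `image` appears in the text printed by `minikube image ls`."""
    wanted = normalize_ref(image)
    listed = {
        normalize_ref(line.strip())
        for line in image_ls_output.splitlines()
        if line.strip()
    }
    return wanted in listed


def minikube_image_ls() -> str:
    """Run `minikube image ls` (minikube must be on PATH) and return its output."""
    return subprocess.run(
        ["minikube", "image", "ls"], capture_output=True, text=True, check=True
    ).stdout
```

For example, image_in_minikube("appname:tagname", minikube_image_ls()) tells you whether the node already has that tag, independent of what your host Docker has.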

Why are large images a problem?

To use an image from your local Docker, you first have to copy it from Docker’s image store into minikube’s:

minikube image load appname:tagname 2>&1

Under the hood, this basically does a docker image save to a large tar file, copies that file into the container running your Kubernetes node, and unpacks it into containerd. That is very expensive on memory; on my high-end laptop, loading an 8GB image can freeze WSL2 (Ubuntu). If you find the image load command hangs WSL, open two terminals and run htop in one to watch memory usage while the command runs in the other. You’ll see it soak up far more memory than you would expect; it then likely starts swapping and slows everything down until the machine is unusable.

So… How do we use big images?

Building big images isn’t much of a problem, because the build runs inside the minikube node and the finished image never has to be exported and re-imported. So the solution is to build directly into minikube instead of building into Docker and importing into minikube.

minikube image build -t appname:tagname -f Dockerfile .

With this change, the image builds into minikube easily and is ready for your pods to use.
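If you still want image load for small images and only rebuild the big ones inside minikube, a small Python wrapper can pick the command based on the image’s size in your local Docker (read with docker image inspect --format '{{.Size}}'). This is a sketch: the 2 GiB cut-off is an arbitrary example rather than a measured limit, and the function names are hypothetical.

```python
import subprocess

# Hypothetical cut-off: above this size, rebuild inside minikube instead of loading.
# 2 GiB is only an example; pick a value that suits your machine's memory.
LOAD_THRESHOLD_BYTES = 2 * 1024 ** 3


def docker_image_size(image: str) -> int:
    """Size in bytes of an image in the host Docker, via `docker image inspect`."""
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def choose_command(image: str, size_bytes: int, dockerfile: str = "Dockerfile",
                   context: str = ".", threshold: int = LOAD_THRESHOLD_BYTES) -> list:
    """Return the minikube argv: load small images, build large ones in the node."""
    if size_bytes <= threshold:
        return ["minikube", "image", "load", image]
    return ["minikube", "image", "build", "-t", image, "-f", dockerfile, context]
```

For example: subprocess.run(choose_command("appname:tagname", docker_image_size("appname:tagname")), check=True). docker_image_size needs the image to exist in your local Docker already; for an image you have never built there, just go straight to minikube image build.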

Migrate DockerHub Images to GitLab : Script

Posted on May 10, 2023 by John Humphreys

#!/bin/bash
# Stop on the first failed command, unset variable, or failed pipe stage.
set -euo pipefail

CRED="<your-rw-deploy-token-for-the-gitlab-project>"

# Change these for the target image / group.
DOCKERHUB_IMAGE="openjdk:11-jre-slim"
GITLAB_REGISTRY="registry.gitlab.com/your/project/path"

# Calculate the GitLab image name.
GITLAB_IMAGE="$GITLAB_REGISTRY/$DOCKERHUB_IMAGE"

# Pull the image from Docker Hub.
docker pull "$DOCKERHUB_IMAGE"

# Tag the image with the GitLab Container Registry path.
docker tag "$DOCKERHUB_IMAGE" "$GITLAB_IMAGE"

# Log in to the GitLab Container Registry. Passing the token on stdin keeps it
# out of the command line (docker warns that -p on the CLI is insecure).
echo "$CRED" | docker login registry.gitlab.com -u unused --password-stdin

# Push the image to the GitLab Container Registry.
docker push "$GITLAB_IMAGE"

You can generate deploy tokens with read/write registry access (the read_registry and write_registry scopes) in your project settings. Group-level deploy tokens let this work across multiple projects in a group.
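If you have several images to move, the same pull/tag/push steps can be generated per image. Below is a Python sketch that builds the command list using the script’s naming (GitLab registry path followed by the Docker Hub reference). It leaves out docker login, so run the login step from the script once first. The function names are hypothetical.

```python
import subprocess


def gitlab_image(dockerhub_image: str, gitlab_registry: str) -> str:
    """Same naming as the script: registry path + '/' + Docker Hub reference."""
    return f"{gitlab_registry.rstrip('/')}/{dockerhub_image}"


def migration_commands(images, gitlab_registry):
    """Return the docker pull/tag/push argv lists for each image (no login)."""
    commands = []
    for image in images:
        target = gitlab_image(image, gitlab_registry)
        commands.append(["docker", "pull", image])
        commands.append(["docker", "tag", image, target])
        commands.append(["docker", "push", target])
    return commands


def run_migration(images, gitlab_registry):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in migration_commands(images, gitlab_registry):
        subprocess.run(cmd, check=True)
```

For example, run_migration(["openjdk:11-jre-slim", "redis:7"], "registry.gitlab.com/your/project/path") mirrors running the script once per image.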

Sorting S3 Buckets by Size

Posted on May 5, 2023 by John Humphreys

It can be fairly hard to rank your S3 buckets by size, especially with Intelligent-Tiering enabled. Here is a concise script that finds the size of every bucket in your account using CloudWatch metrics and prints the top 10, largest first.

import logging
from datetime import datetime, timedelta, timezone

import boto3
import pandas as pd

# Configure logging
logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', level=logging.INFO)

# Connect to S3 (the resource lists buckets; its client looks up bucket regions)
s3 = boto3.resource('s3')

# Storage types to sum for each bucket. BucketSizeBytes is reported separately per
# storage type, so add any other classes you use (e.g. 'StandardIAStorage',
# 'GlacierStorage') or their bytes will be missed.
STORAGE_TYPES = [
    'StandardStorage',
    'IntelligentTieringFAStorage',
    'IntelligentTieringIAStorage',
    'IntelligentTieringAIAStorage',
]

# S3 storage metrics are published in each bucket's own region, so keep one
# CloudWatch client per region.
cloudwatch_clients = {}

def get_cloudwatch(bucket):
    location = s3.meta.client.get_bucket_location(Bucket=bucket)['LocationConstraint']
    # None means us-east-1; 'EU' is the legacy name for eu-west-1
    region = {None: 'us-east-1', 'EU': 'eu-west-1'}.get(location, location)
    if region not in cloudwatch_clients:
        cloudwatch_clients[region] = boto3.client('cloudwatch', region_name=region)
    return cloudwatch_clients[region]

# Return the largest BucketSizeBytes datapoint from the last 3 days for a bucket and storage type (0 if none)
def get_metric_data(cloudwatch, bucket, storage_type):
    now = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/S3',
        MetricName='BucketSizeBytes',
        Dimensions=[
            {'Name': 'BucketName', 'Value': bucket},
            {'Name': 'StorageType', 'Value': storage_type}
        ],
        StartTime=now - timedelta(days=3),
        EndTime=now,
        Period=86400,
        Statistics=['Maximum']
    )
    return max((dp['Maximum'] for dp in response['Datapoints']), default=0)

# Log before pulling the list of bucket names
logging.info("Getting list of bucket names...")

# Get all buckets in the account
buckets = [bucket.name for bucket in s3.buckets.all()]

# Sum the size of every storage type for each bucket
bucket_sizes = {}
for bucket in buckets:
    logging.info(f"Working on bucket: {bucket}...")
    cloudwatch = get_cloudwatch(bucket)
    bucket_sizes[bucket] = sum(get_metric_data(cloudwatch, bucket, st) for st in STORAGE_TYPES)

# Convert the results to a Pandas DataFrame, keep the 10 largest, and display without truncation
df = pd.DataFrame.from_dict(bucket_sizes, orient='index', columns=['Size (Bytes)'])
df['Size (TiB)'] = df['Size (Bytes)'] / (1024 ** 4)
df = df[['Size (TiB)']].sort_values(by='Size (TiB)', ascending=False).head(10)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.width', None)
pd.set_option('display.float_format', '{:.2f}'.format)
print(df)
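To sanity-check the summing and ranking step without touching AWS, here it is pulled out into a pure function. It takes a dict keyed by (bucket, storage type) holding the bytes that get_metric_data would return for each pair, and uses plain Python sorting in place of pandas. The function name and input shape are assumptions for illustration.

```python
TIB = 1024 ** 4  # bytes per TiB, matching the 1024 ** 4 divisor in the script


def top_buckets(sizes, n=10):
    """Sum bytes across storage types per bucket; return the n largest as (bucket, TiB)."""
    totals = {}
    for (bucket, _storage_type), size in sizes.items():
        totals[bucket] = totals.get(bucket, 0) + size
    # Largest first, like sort_values(ascending=False).head(n)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(bucket, size / TIB) for bucket, size in ranked[:n]]
```

With made-up numbers, a bucket holding 2 TiB in StandardStorage plus 1 TiB in IntelligentTieringIAStorage ranks as 3.0 TiB, ahead of a bucket with 1.5 TiB in a single storage type.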

Kubectl – View Pods Per Node in Kubernetes

Posted on December 19, 2022 by John Humphreys

You can use this command to see how many pods are on each node in Kubernetes using just kubectl.

kubectl get pods -A --no-headers -o=custom-columns=NODE:.spec.nodeName | sort | uniq -c | sort -n

The --no-headers flag keeps the NODE column header from being counted as if it were a node; pods that haven’t been scheduled yet show up as <none>.

In our case, we have a limit of 25 pods per node, so DaemonSets can fail to roll out if some nodes already have 25 pods. This command makes those nodes easy to spot.

It can also be helpful when decommissioning nodes, as you can track pods being removed from them.
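If you want the same per-node counts in a script, for example to flag nodes that have hit the 25-pod limit, here is a Python sketch that works on the JSON from kubectl get pods -A -o json. The function names and the skip_finished switch are hypothetical; skip_finished is off by default so the counts match the one-liner above.

```python
import json
import subprocess
from collections import Counter

POD_LIMIT = 25  # the per-node limit from our cluster; change to match yours


def pods_per_node(pod_list: dict, skip_finished: bool = False) -> Counter:
    """Count pods per node from `kubectl get pods -A -o json` output."""
    counts = Counter()
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase")
        if skip_finished and phase in ("Succeeded", "Failed"):
            continue
        # Unscheduled pods have no spec.nodeName; label them like kubectl does.
        counts[pod.get("spec", {}).get("nodeName") or "<none>"] += 1
    return counts


def nodes_at_limit(counts: Counter, limit: int = POD_LIMIT) -> list:
    """Nodes (excluding unscheduled pods) whose pod count has reached `limit`."""
    return sorted(node for node, n in counts.items() if node != "<none>" and n >= limit)


def load_pods() -> dict:
    """Fetch every pod in the cluster as parsed JSON (kubectl must be on PATH)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-A", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

For example, nodes_at_limit(pods_per_node(load_pods())) lists the nodes where a DaemonSet pod would have no room.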

Coding Stream of Consciousness

by John Humphreys – Random code from my life.
