Handling Black Friday Traffic with Kubernetes, Prometheus, and HPA

Introduction

Black Friday is one of the busiest shopping days of the year, and for online platforms it means a massive surge in traffic. Handling that load efficiently requires a well-architected system that can scale dynamically while maintaining high availability and low latency. In this post, I’ll share how I managed Black Friday traffic using Amazon EKS, Prometheus, Grafana, and the Horizontal Pod Autoscaler (HPA).

The Challenge

As Black Friday approached, our biggest concerns were:

  • Handling traffic spikes without performance degradation.

  • Ensuring auto-scaling worked efficiently to prevent overprovisioning or underprovisioning resources.

  • Monitoring system health in real time to quickly identify bottlenecks.

  • Optimizing cost while maintaining a seamless user experience.

To tackle these challenges, I set up Prometheus for monitoring, Grafana for visualization, HPA for automatic scaling based on custom metrics, and Karpenter for node autoscaling.
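Karpenter handles the node-level side of this: when HPA adds pods that no longer fit, Karpenter provisions new EC2 capacity. The exact provisioner setup depends on the cluster, but a minimal NodePool along these lines is enough to get started (a sketch against Karpenter's v1 API; the requirements, limits, and the referenced EC2NodeClass name are illustrative, not our production values):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default   # assumes an EC2NodeClass named "default" already exists
  limits:
    cpu: "1000"         # cap total provisioned CPU to keep costs bounded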

The Solution

Step 1: Deploying Prometheus & Grafana on EKS
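The service names used below suggest the community kube-prometheus-stack Helm chart installed with a release named prometheus in the monitoring namespace; that chart also brings in the Prometheus Operator used later for the ServiceMonitor. A typical install of that chart looks like this (a sketch, assuming that setup):

% helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
% helm repo update
% helm install prometheus prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace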

Example (port-forwarding the UIs locally):

  • Grafana: kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring

  • Prometheus: kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoring

Step 2: Exposing Custom Metrics for Auto-Scaling

# app.py
from flask import Flask
import random
import time
import os
import subprocess
import prometheus_client
from prometheus_client import Counter, Gauge, generate_latest

app = Flask(__name__)

# Define Prometheus metrics
REQUEST_COUNT = Counter("http_requests_total", "Total HTTP requests", ["method", "endpoint"])
QUEUE_DEPTH = Gauge("queue_depth", "Simulated queue depth")
MEMORY_USAGE = Gauge("process_memory_bytes", "Memory usage of the process")

@app.route("/")
def hello():
    REQUEST_COUNT.labels(method="GET", endpoint="/").inc()

    # Simulate queue depth fluctuation
    queue_size = random.randint(0, 10)
    QUEUE_DEPTH.set(queue_size)

    # Safe memory usage retrieval (`ps` reports RSS in KB)
    try:
        mem_usage_output = subprocess.check_output(["ps", "-o", "rss=", "-p", str(os.getpid())])
        memory_usage = int(mem_usage_output.decode().strip()) * 1024  # Convert KB to bytes
    except Exception as e:
        print(f"Warning: Unable to retrieve memory usage: {e}")
        memory_usage = 0  # Fall back to a default value
    MEMORY_USAGE.set(memory_usage)

    return "Hello from Flask App", 200

@app.route("/metrics")
def metrics():
    # Counters and gauges are exported with default values even before the first request,
    # so there is no need to reset MEMORY_USAGE here (doing so would zero the real reading).
    return generate_latest(), 200, {"Content-Type": "text/plain"}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
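Before containerizing the app, it is worth a quick local check that the /metrics endpoint serves the counters and gauges defined above (a throwaway sanity check, not part of the cluster setup):

% pip install -r requirements.txt
% python app.py
# in a second terminal
% curl http://localhost:5000/          # increments http_requests_total
% curl http://localhost:5000/metrics   # prints the Prometheus exposition format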

# Dockerfile
FROM python:3.9-slim

# Install procps (includes the `ps` command used by app.py)
RUN apt-get update && apt-get install -y procps

WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY app.py app.py

CMD ["python", "app.py"]

# requirements.txt
flask
prometheus_client
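Build the image and push it to a registry your cluster can pull from (the repository name below matches the image referenced in the Deployment; replace it with your own):

% docker build -t unixcorn/eks-metrics-app:latest .
% docker push unixcorn/eks-metrics-app:latest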

# Deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3  # Start with 3 replicas; HPA will scale from here
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: unixcorn/eks-metrics-app:latest  # Replace with your actual image
          ports:
            - containerPort: 5000
          imagePullPolicy: Always
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
          livenessProbe:
            httpGet:
              path: /
              port: 5000
            initialDelaySeconds: 3
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /
              port: 5000
            initialDelaySeconds: 5
            periodSeconds: 5

# svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  selector:
    app: myapp
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 5000
  type: ClusterIP

# service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-servicemonitor
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: myapp
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: http
      path: /metrics
      interval: 10s
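Apply the three manifests (the app lands in the default namespace, while the ServiceMonitor goes into monitoring as declared in its metadata):

% kubectl apply -f Deployment.yaml -f svc.yaml -f service-monitor.yaml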

% k get po -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
myapp-67db6459f8-2k92g   1/1     Running   0          55m   10.0.137.27   ip-10-0-143-51.ec2.internal    <none>           <none>
myapp-67db6459f8-b9pd9   1/1     Running   0          55m   10.0.129.82   ip-10-0-130-142.ec2.internal   <none>           <none>
myapp-67db6459f8-cmdxl   1/1     Running   0          55m   10.0.156.32   ip-10-0-144-66.ec2.internal    <none>           <none>

Now check whether the custom metrics show up in a Grafana dashboard using the following PromQL queries:

  • sum(rate(http_requests_total{job="myapp"}[1m])) by (pod)

    • Set the visualization type to "Graph" or "Time series"

  • avg(queue_depth{job="myapp"})

    • A "Gauge" visualization can effectively represent this metric.

  • sum(container_memory_usage_bytes{container="myapp"}) by (pod)

    • Visualize this using a "Graph" or "Time series" panel, setting the unit to "bytes" for clarity.

Step 3: Installing and Configuring HPA

  • Install the Kubernetes Metrics Server (needed for the resource metrics API).

  • Install the Prometheus Adapter, which exposes Prometheus series through the custom metrics API:

    • helm install prometheus-adapter prometheus-community/prometheus-adapter --namespace monitoring

  • Check what the custom metrics API is serving so far:

    % kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "custom.metrics.k8s.io/v1beta1",
      "resources": []
    }

  • The custom metrics aren’t there yet, so update the prometheus-adapter ConfigMap so that its rules expose them:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters: []
      resources:
        template: "<<.Resource>>"
      name:
        matches: ^container_(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[5m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters:
      - isNot: ^container_.*_seconds_total$
      resources:
        template: "<<.Resource>>"
      name:
        matches: ^container_(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[5m])) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_total$
      resources:
        template: "<<.Resource>>"
      name:
        matches: ""
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        template: "<<.Resource>>"
      name:
        matches: "http_requests_total"
        as: "http_requests_per_second"
      metricsQuery: 'sum(rate(http_requests_total{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)'
    - seriesQuery: 'queue_depth{namespace!="",pod!=""}'
      resources:
        template: "<<.Resource>>"
      name:
        matches: "queue_depth"
        as: "queue_depth_value"
      metricsQuery: 'avg(queue_depth{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
    - seriesQuery: 'process_memory_bytes{namespace!="",pod!=""}'
      resources:
        template: "<<.Resource>>"
      name:
        matches: "process_memory_bytes"
        as: "memory_usage_bytes"
      metricsQuery: 'sum(process_memory_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
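After updating the ConfigMap (kubectl apply -f on your saved manifest, or kubectl edit on the live object), restart the adapter so it reloads the rules. With the Helm release above, the Deployment should be named prometheus-adapter:

% kubectl -n monitoring rollout restart deployment prometheus-adapter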

% kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq | grep -i pods | egrep -i 'http_requests_per_second|queue_depth_value|memory_usage_bytes'
"name": "pods/http_requests_per_second",
"name": "pods/queue_depth_value",
"name": "pods/memory_usage_bytes",

% kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq .

  • Now define the HPA configuration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 300m
  - type: Pods
    pods:
      metric:
        name: queue_depth_value
      target:
        type: AverageValue
        averageValue: 5
  - type: Pods
    pods:
      metric:
        name: memory_usage_bytes
      target:
        type: AverageValue
        averageValue: 200Mi
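Apply the HPA (saved here as hpa.yaml, a file name of my choosing) and, if you want to watch it react before the real traffic arrives, drive some synthetic load at the Service from a throwaway pod:

% kubectl apply -f hpa.yaml
% kubectl run load-generator --image=busybox:1.36 --restart=Never -- \
    /bin/sh -c "while true; do wget -q -O- http://myapp.default.svc.cluster.local > /dev/null; done"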

% k get hpa
NAME        REFERENCE          TARGETS                          MINPODS   MAXPODS   REPLICAS   AGE
myapp-hpa   Deployment/myapp   286m/300m, 3500m/5 + 1 more...   3         50        8          8m26s

The output above confirms that the HPA is reading the custom metrics and has already scaled the myapp Deployment from its initial 3 replicas up to 8.