About Me
I'm Babulal Shaik, a seasoned Cloud & Kubernetes Engineer with deep expertise in AWS, Kubernetes (EKS), and DevOps. With over a decade of experience in cloud infrastructure, containerization, and site reliability engineering (SRE), I specialize in building and managing highly available, scalable, and secure cloud solutions.
Currently, at Amazon Web Services (AWS), I work as a Cloud Support Engineer II, helping customers design and optimize cloud architectures, troubleshoot complex issues, and enhance system reliability. My focus areas include Kubernetes (EKS), Terraform, AWS automation, and CI/CD pipelines. I have also led root cause analysis (RCA) for critical incidents, improved observability with Prometheus & Grafana, and implemented cost optimization strategies using Python-based solutions.
Beyond my AWS role, I have a strong Linux background, having worked as a Sr. Linux/Unix Systems Administrator at IBM, JP Morgan Chase, and Bank of America. My experience ranges from OS-level troubleshooting and network optimization to building highly available Linux environments.
I'm passionate about mentoring engineers, conducting AWS Kubernetes bootcamps, and sharing my expertise through technical training sessions. I hold multiple Kubernetes certifications (CKA, CKAD, CKS), AWS certifications, and the Prometheus Certified Associate (PCA) credential.
Whether it's designing fault-tolerant architectures, automating cloud infrastructure, or solving complex SRE challenges, I thrive on building robust, scalable, and cost-efficient solutions that empower businesses.

My recent Medium Blog Posts...
Handling Black Friday Traffic: Setup with Kubernetes, Prometheus, HPA and Karpenter
Overview of designing a highly available system using Amazon EKS and complementary AWS Services
Ensuring 99.99% Uptime with EKS: A Comprehensive Failover Strategy Overview
Investigating the Side-Effects of AWS Load Balancer Controller Timeouts Due to API Server Throttling
Investigating AWS Load Balancer Controller Timeouts Due to API Server Throttling
Overview of handling traffic surge scenario in containers world