Virtual Machine Setup (Pre-Migration)
- 96 cores per node
- 4 cores for KVM + VPC + host OS + H/W failure detection agents
- 2 cores for guest OS + kubelet + containerd
- 2.6 cores for daemon set (Cilium), node exporter, logging agent,
node problem detector
- 87 cores available for user applications
This setup shows the overhead of the virtualized stack: 8.6 of 96
cores (about 9%) go to infrastructure, and 4 of those cores (about 4%)
go to the virtualization layer and its host-side agents.
Bare Metal Setup (Post-Migration)
- 2 cores for host OS
- 2.6 cores for platform-owned workloads (Cilium, node exporter,
logging agent, node problem detector)
- Removed redundant components: VPC, H/W failure detector, VM
orchestrator (saving 4 cores per node)
With the virtualization layer gone, per-node overhead drops from 8.6
cores to 4.6 cores, so about 91 of 96 cores (assuming the same 96-core
node) are left for user applications.
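The per-node arithmetic behind the two lists above can be sketched as follows. This is a minimal illustration, not Flipkart's tooling: the dictionary keys and the `core_budget` helper are hypothetical names, and the bare metal case assumes the same 96-core node as the VM setup.

```python
NODE_CORES = 96  # per-node core count from the VM setup (assumed unchanged on bare metal)

# Per-node overhead in cores, taken from the two lists above.
# The key names are hypothetical labels for illustration only.
VM_OVERHEAD = {
    "kvm_vpc_host_os_hw_agents": 4.0,
    "guest_os_kubelet_containerd": 2.0,
    "platform_workloads": 2.6,
}
BARE_METAL_OVERHEAD = {
    "host_os": 2.0,
    "platform_workloads": 2.6,
}


def core_budget(total_cores, overhead):
    """Return overhead cores, usable cores, and overhead share for one node."""
    used = sum(overhead.values())
    return {
        "overhead_cores": used,
        "usable_cores": total_cores - used,
        "overhead_pct": 100.0 * used / total_cores,
    }


vm = core_budget(NODE_CORES, VM_OVERHEAD)
bm = core_budget(NODE_CORES, BARE_METAL_OVERHEAD)

for name, budget in (("VM", vm), ("bare metal", bm)):
    print(
        f"{name}: {budget['usable_cores']:.1f} usable cores, "
        f"{budget['overhead_pct']:.1f}% overhead"
    )
print(f"cores reclaimed per node: {bm['usable_cores'] - vm['usable_cores']:.1f}")
```

Under these figures the VM node keeps roughly 87.4 cores for user applications and the bare metal node roughly 91.4, which matches the 4 cores per node saved by removing the virtualization components.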
Network Optimization
Flipkart implemented a VPC stack to improve network quality, using
Cilium for bandwidth optimization.
Cilium is an eBPF-based networking layer for Kubernetes that also
provides network policy enforcement and observability, both of which
matter for large-scale deployments like Flipkart's.
Performance Benchmarking
- Used an Aerospike YCSB pod for benchmarking (16 GB memory, 16 cores)
- Compared vhost-net vs an SR-IOV VF configured as a PCI pass-through
device
PCI pass-through can significantly reduce network latency because the
guest drives the NIC's virtual function directly, bypassing the host
kernel's network stack.
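Before a virtual function can be passed through, the physical NIC has to expose SR-IOV VFs on the host. The sketch below reads the standard Linux sysfs attributes `sriov_numvfs` and `sriov_totalvfs` for a network interface. It is an illustration only: the helper name `sriov_vf_counts` and the interface name `eth0` are assumptions, and nothing here reflects how Flipkart configured its hosts.

```python
from pathlib import Path


def sriov_vf_counts(iface, sysfs_root="/sys"):
    """Return (enabled_vfs, total_vfs) for a NIC, or None without SR-IOV.

    Reads the PCI sysfs attributes that Linux exposes for SR-IOV capable
    physical functions. `sysfs_root` is a parameter so the function can be
    exercised against a fake directory tree.
    """
    dev = Path(sysfs_root) / "class" / "net" / iface / "device"
    numvfs = dev / "sriov_numvfs"
    totalvfs = dev / "sriov_totalvfs"
    if not (numvfs.is_file() and totalvfs.is_file()):
        return None  # interface missing or device has no SR-IOV capability
    return int(numvfs.read_text().strip()), int(totalvfs.read_text().strip())


if __name__ == "__main__":
    # "eth0" is a hypothetical interface name; substitute the host's NIC.
    print(sriov_vf_counts("eth0"))
```

A result such as `(0, N)` would mean the device supports up to `N` VFs but none are currently enabled, so there is nothing to pass through yet.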
Key Flipkart Services
- Atina service (product display service): 250 instances, 16 cores and
35 GB each
- Mappy service
These services likely handle critical e-commerce functions and stand
to benefit from the lower overhead of bare metal deployments.
Performance Insights
- Virtualization overhead is more significant for I/O-intensive apps
than for CPU-bound apps
- aRFS (Accelerated Receive Flow Steering) breaks down when the number
of threads is high
These observations show why infrastructure choices should be tailored
to the workload, especially on high-traffic e-commerce platforms.
Tools and Projects
- Tinkerbell project: open-source bare metal provisioning engine
- Cluster API project: Kubernetes-native API for managing clusters
across various environments, including bare metal
- https://github.com/tinkerbell
- https://github.com/kubernetes-sigs/cluster-api