Chaos Engineering for Cloud Native Kafka

Kafka is one of the most common middleware components in cloud native deployments. It provides reliable communication among service components and applications, so the reliability of cloud native services built on Kafka needs to be validated continuously. Chaos testing of these services has to cover common messaging faults such as broker outages and ZooKeeper outages. Network latency is also a common cause of missed SLOs.

Use Cases

  • Continuous validation of SLOs when message brokers fail

  • Continuous validation of cloud native applications on QA test beds against Kafka failures

  • Pre or post CD verification of cloud native applications against Kafka failures

  • Test service reliability under load against Kafka failures

  • Hardening of Kubernetes conformance tests against Kafka faults
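
As a rough illustration of the first use case, the sketch below checks a latency SLO against samples collected while a broker fault is active. The function names, the nearest-rank percentile method, and the choice of the 99th percentile are assumptions made for this example and are not part of any particular chaos tool.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def slo_met(latencies_ms, target_ms, pct=99):
    """True if the pct-th percentile latency stays within target_ms (hypothetical SLO form)."""
    return percentile(latencies_ms, pct) <= target_ms
```

A chaos run would call `slo_met` on the latencies recorded during the fault window and treat a `False` result as a failed experiment.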


Some of the chaos experiments available for Kafka, each with its description and tunable parameters:

Kafka broker kill

Description

This chaos experiment kills Kafka broker pods, either chosen at random or specified explicitly.

Tunables

Necessary Inputs:

  • Kafka Namespace

  • Kafka Workload Label

  • Kafka Service

  • Zookeeper Namespace

  • Zookeeper Workload Label


Tunable Inputs:

  • Kafka Port

  • Kafka Instance Name

  • Pod Delete Method (Force / Graceful)

  • Kafka Topic Replication Factor

  • Kafka Test Load (Enabled / Disabled)
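
The inputs listed above could be modelled along the following lines. This is only a sketch: the class name, field names, and default values are hypothetical (the one exception is 9092, Kafka's conventional client port), and a real chaos tool will define its own schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrokerKillInputs:
    # Necessary inputs: no defaults, so they must always be supplied.
    kafka_namespace: str
    kafka_label: str
    kafka_service: str
    zookeeper_namespace: str
    zookeeper_label: str
    # Tunable inputs: defaults below are illustrative assumptions.
    kafka_port: int = 9092
    kafka_instance_name: Optional[str] = None
    force_delete: bool = False          # False means graceful pod deletion
    topic_replication_factor: int = 3
    test_load_enabled: bool = True


NECESSARY_FIELDS = (
    "kafka_namespace",
    "kafka_label",
    "kafka_service",
    "zookeeper_namespace",
    "zookeeper_label",
)


def missing_inputs(cfg):
    """Return the names of necessary inputs that are empty or whitespace only."""
    return [name for name in NECESSARY_FIELDS if not getattr(cfg, name).strip()]
```

Checking `missing_inputs` before an experiment starts gives an early failure on incomplete configuration instead of a fault injected against the wrong target.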


Kafka broker disk failure

Description

Detaches a persistent disk from a node or instance used by Kafka.

Tunables

Necessary Inputs:

  • Kafka Namespace

  • Kafka Workload Label

  • Kafka Service

  • Zookeeper Namespace

  • Zookeeper Workload Label


Tunable Inputs:

  • Kafka Port

  • Kafka Instance Name

  • Pod Delete Method (Force / Graceful)

  • Kafka Topic Replication Factor

  • Kafka Test Load (Enabled / Disabled)

  • Kafka Consumer Timeout

  • Cloud Platform


Kubernetes node kill

Description

Drains the Kubernetes node on which Kafka is deployed.

Tunables

Necessary Inputs:

  • Target Node

  • Node Label


Tunable Inputs:

  • Application Under Test Namespace

  • Application Under Test Label

  • Application Under Test Kind (Deployment / StatefulSet / ReplicaSet etc.)


Network slowness

Description

Introduces network latency for the Kafka deployment pods for a certain duration.

Tunables

Necessary Inputs:

  • Target Pods

  • Pods Affected Percentage

  • Destination IPs

  • Destination Hosts

  • Network Latency


Tunable Inputs:

  • Application Under Test Namespace

  • Application Under Test Label

  • Application Under Test Kind

  • Network Packet Duplication Percentage

  • Network Packet Loss Percentage

  • Network Packet Corruption Percentage

  • Jitter

  • Container Runtime
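
To make the Pods Affected Percentage input concrete, the sketch below turns a percentage into a random subset of target pods. The function name, the round-up policy, and the rule that any non-zero percentage affects at least one pod are assumptions for illustration; an actual chaos tool may round differently.

```python
import math
import random


def select_target_pods(pods, affected_percentage, rng=None):
    """Pick a random subset of pods covering roughly affected_percentage of them."""
    if not 0 <= affected_percentage <= 100:
        raise ValueError("affected_percentage must be between 0 and 100")
    pods = list(pods)
    if not pods or affected_percentage == 0:
        return []
    # Assumed policy: round up, so a non-zero percentage always hits at least one pod.
    count = max(1, math.ceil(len(pods) * affected_percentage / 100))
    rng = rng or random.Random()
    return rng.sample(pods, count)
```

Passing a seeded `random.Random` as `rng` makes the selection repeatable, which helps when a failed run has to be reproduced.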

