How to write records to Kafka partitions
There are three ways to determine the Kafka partition a record is written to:
Utilizing a partition key to designate the target partition
Permitting Kafka to determine the appropriate partition
Creating a custom partitioner implementation
Utilizing a partition key to designate the target partition
A producer can use a partition key to route messages to a particular partition. The partition key can be any value derivable from the application's context; a unique device ID or a user ID makes a suitable partition key.
By default, the partition key is run through a hashing function, and the resulting hash determines the partition assignment. This mechanism guarantees that all records produced with the same key are directed to the same partition. A partition key therefore lets you group related events in a single partition and preserve the order in which they were sent.
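To make the hashing step concrete, the sketch below mirrors what Kafka's built-in DefaultPartitioner does for a non-null key: it hashes the serialized key bytes with murmur2 and maps the result onto the topic's partition count. The key "device-42" and the partition count of 6 are illustrative values only.

import org.apache.kafka.common.utils.Utils;

public class KeyHashingSketch {

    // Same key in, same partition out: murmur2 over the serialized key,
    // folded into the range [0, numPartitions).
    public static int partitionFor(byte[] serializedKey, int numPartitions) {
        return Utils.toPositive(Utils.murmur2(serializedKey)) % numPartitions;
    }

    public static void main(String[] args) {
        // Prints the same partition number on every run for this key.
        System.out.println(partitionFor("device-42".getBytes(), 6));
    }
}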
For instance, if you want to write student records and employee records to distinct partitions, you can add a function to your application that derives a partition key from each record's attributes; that key then determines the partition assignment.
Let's consider a simplified example in Java. A RecordsProducer class uses Spring's KafkaTemplate to send student and employee records to their respective partitions; its sendMessage method builds a message with the given payload, topic, and partition key.
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.Message;
import org.springframework.messaging.support.MessageBuilder;

import java.util.Map;

public class RecordsProducer {

    private final KafkaTemplate<String, Map<String, String>> kafkaTemplate;
    private static final String STUDENT_PARTITION_KEY = "xyz";
    private static final String EMPLOYEE_PARTITION_KEY = "abc";

    public RecordsProducer(KafkaTemplate<String, Map<String, String>> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void sendStudentRecord(Map<String, String> event) {
        // Derive the partition key from the record's attributes:
        // ids starting with "std" identify student records.
        String partitionKey = event.get("id").startsWith("std")
                ? STUDENT_PARTITION_KEY
                : EMPLOYEE_PARTITION_KEY;
        sendMessage("topic", partitionKey, event);
    }

    private void sendMessage(String topic, String partitionKey, Map<String, String> event) {
        Message<Map<String, String>> message = MessageBuilder
                .withPayload(event)
                .setHeader(KafkaHeaders.TOPIC, topic)
                .setHeader(KafkaHeaders.MESSAGE_KEY, partitionKey)
                .build();
        kafkaTemplate.send(message);
    }
}
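To exercise this class, the KafkaTemplate needs to be wired up first. The following is a minimal sketch, assuming a broker at localhost:9092 and Spring Kafka's DefaultKafkaProducerFactory with JSON serialization for the map payload; the broker address and record contents are placeholders, not part of the original example.

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.serializer.JsonSerializer;

import java.util.HashMap;
import java.util.Map;

public class RecordsProducerExample {

    public static void main(String[] args) {
        // Producer configuration; localhost:9092 is an assumed broker address.
        Map<String, Object> config = new HashMap<>();
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);

        KafkaTemplate<String, Map<String, String>> template =
                new KafkaTemplate<>(new DefaultKafkaProducerFactory<>(config));

        RecordsProducer producer = new RecordsProducer(template);

        // A sample student record; the "std" id prefix selects the student key.
        Map<String, String> studentEvent = new HashMap<>();
        studentEvent.put("id", "std-1001");
        studentEvent.put("name", "Jane Doe");
        producer.sendStudentRecord(studentEvent);
    }
}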
Permitting Kafka to determine the appropriate partition
When a producer sends a record without a partition key, Kafka falls back to its default assignment strategy: classically round-robin, and since Kafka 2.4 a "sticky" variant that fills a batch for one partition before moving on to the next. Either way, records are spread across all partitions of the topic. Note that without a partition key there is no guarantee that related records stay in order within a specific partition. The crucial lesson is the value of a partition key: it groups related events in a single partition and preserves the order in which they were sent.
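For contrast, here is a minimal sketch of a keyless send, reusing the KafkaTemplate type from the example above (the topic name "topic" is a placeholder). No key is set, so partition selection is left entirely to Kafka.

import org.springframework.kafka.core.KafkaTemplate;

import java.util.Map;

public class UnkeyedProducer {

    private final KafkaTemplate<String, Map<String, String>> kafkaTemplate;

    public UnkeyedProducer(KafkaTemplate<String, Map<String, String>> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Without a key, Kafka's default partitioner spreads records across
    // partitions; per-key ordering guarantees no longer apply.
    public void sendUnkeyedRecord(Map<String, String> event) {
        kafkaTemplate.send("topic", event);
    }
}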
Creating a custom partitioner implementation
In certain scenarios, a producer can supply a custom partitioner implementation that applies specific business rules to partition assignment.
A custom partitioner lets the producer factor its own business logic and record attributes into how records are distributed across partitions. By writing one, the producer can tailor partition assignments to the application's distinct requirements, keeping data well distributed and manageable within the Kafka cluster.
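As an illustration, here is a sketch of a custom partitioner implementing Kafka's Partitioner interface. The business rule it encodes, pinning keys that start with "std" to partition 0 and hashing every other key across the remaining partitions, is a hypothetical example chosen to match the student/employee scenario above.

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;
import java.util.Objects;

public class RecordTypePartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Hypothetical business rule: student keys always land on partition 0.
        if (key != null && key.toString().startsWith("std")) {
            return 0;
        }
        if (numPartitions <= 1) {
            return 0;
        }
        // Every other key is hashed across the remaining partitions.
        return 1 + (Objects.hashCode(key) & Integer.MAX_VALUE) % (numPartitions - 1);
    }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}

The producer picks it up through configuration, for example config.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RecordTypePartitioner.class);.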