Connecting Apache Kafka to Azure Event Hubs

April 16, 2022

Recently, I worked on an integration with Azure Event Hubs. A colleague of mine faced challenges while trying to export messages from an existing Kafka topic and import them into Event Hubs. To assist, I've documented the steps below, which you may find useful.

Step 1: Download and Extract Apache Kafka

Apache Kafka is an open-source, distributed event streaming platform built for high-throughput data pipelines. Download a release from the Apache Kafka downloads page, then extract it. The commands below assume Kafka 3.1.0 built for Scala 2.13:

$ tar -xzf kafka_2.13-3.1.0.tgz
$ cd kafka_2.13-3.1.0

Step 2: Start the Kafka Environment

Ensure that Java 8 or higher is already installed in your local environment. If not, download and install it from Oracle's website.
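If you are unsure which Java version is on your path, a small check like the one below can help. This is only a sketch: java_major is a hypothetical helper name, and it assumes the usual version formats ("1.8.0_321" for Java 8, "17.0.2" for newer releases).

```shell
# Extract the major version from a Java version string.
# Legacy releases report "1.8.0_321" (major 8); newer ones report "17.0.2".
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;      # drop the legacy "1." prefix
  esac
  echo "${v%%[._-]*}"        # keep everything before the first ., _ or -
}

# Only query the JVM when one is actually installed.
if command -v java >/dev/null 2>&1; then
  raw="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  echo "Detected Java major version: $(java_major "$raw")"
else
  echo "java not found on PATH"
fi
```

Anything that reports 8 or above satisfies the requirement stated above.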

To start all services, execute the following commands in order. Each one runs in the foreground, so use a separate terminal session for each:

Start the ZooKeeper service:

$ bin/zookeeper-server-start.sh config/zookeeper.properties

Start the Kafka broker:

$ bin/kafka-server-start.sh config/server.properties

Step 3: Create and Set Up Configuration Files

Create a new file named connector.properties with the values below:

... (The content is mostly fine and technical, no changes)

Replace the placeholder values with those from your Azure endpoint. If you haven't already, create a new namespace and deploy Event Hubs resources from the Azure portal. Select the Standard pricing tier or higher, because the Basic tier does not support the Kafka endpoint that the next steps rely on.

The required password is the connection string of the Event Hubs namespace. You can find it in the Shared access policies settings of the namespace, under the SAS policy named RootManageSharedAccessKey.
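To make the credential part concrete, here is a rough sketch of the security settings a Kafka client usually needs for the Event Hubs Kafka endpoint. It is not the full file from this step: {NAMESPACE} and {CONNECTION_STRING} are stand-in placeholders, and the snippet writes to a scratch file rather than to your real configuration.

```shell
# Hypothetical scratch path; the configuration file from this step is not touched.
sketch_file="$(mktemp)"

# The quoted 'EOF' keeps $ConnectionString literal. Event Hubs expects that
# exact string as the SASL username, with the connection string as the password.
cat > "$sketch_file" <<'EOF'
bootstrap.servers={NAMESPACE}.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{CONNECTION_STRING}";
EOF

echo "Wrote sketch to $sketch_file"
```

A Connect worker typically needs the same three security settings repeated with the producer. and consumer. prefixes, so that the connectors' own clients can authenticate as well.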

Step 4: Create Three Kafka Topics

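Kafka Connect in distributed mode keeps its connector configurations, offsets, and status in three internal topics. To create them manually, use the kafka-topics commands: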

Create the configs topic:

... (Commands are mostly fine and technical, no changes)

Create the offsets topic:

... (Commands are mostly fine and technical, no changes)

Create the status topic:

... (Commands are mostly fine and technical, no changes)

Step 5: Run Kafka Connect

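Kafka Connect is a tool for reliably and scalably streaming data between Apache Kafka and other systems. In this setup, it talks to Azure Event Hubs through the Kafka-compatible endpoint. To continuously import and export your data, start the worker locally in distributed mode, pointing it at the configuration file you created in Step 3: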

$ bin/connect-distributed.sh path/to/connect-distributed.properties

With everything set up, you can proceed to test import and export functions.

Step 6: Create Input and Output Files

Create a directory and two files: one for seed data to be read by the FileStreamSource connector and another to be written to by the FileStreamSink connector.

$ mkdir ~/connect-demo
$ seq 1000 > ~/connect-demo/input.txt
$ touch ~/connect-demo/output.txt
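Before moving on, it can be worth confirming that the seed file looks the way you expect. The sketch below uses a hypothetical helper, summarize_seed, and runs it on a scratch copy so that ~/connect-demo is left alone.

```shell
# Print the line count, first line, and last line of a seed file.
summarize_seed() {
  lines="$(wc -l < "$1" | tr -d ' ')"   # tr strips the padding some wc builds add
  echo "lines=$lines first=$(head -n 1 "$1") last=$(tail -n 1 "$1")"
}

# Demonstrated on a scratch copy generated the same way as in this step.
scratch="$(mktemp -d)"
seq 1000 > "$scratch/input.txt"
summary="$(summarize_seed "$scratch/input.txt")"
echo "$summary"
```

Running summarize_seed ~/connect-demo/input.txt should report the same 1000 lines, starting at 1 and ending at 1000.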

Step 7: Create FileStreamSource Connector

Next, let me guide you through launching the FileStreamSource connector:

... (Commands are mostly fine and technical, no changes)

Step 8: Create FileStreamSink Connector

Similarly, let's proceed to launch the FileStreamSink connector:

... (Commands are mostly fine and technical, no changes)
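For orientation, connectors in distributed mode are created by posting a JSON definition to the worker's REST API. The sketch below only writes two such definitions to a scratch directory. The connector names (file-source, file-sink) and the topic name (connect-demo-topic) are hypothetical, so substitute your own.

```shell
# Hypothetical names; adjust the topic and paths to your setup.
topic="connect-demo-topic"
demo_dir="$HOME/connect-demo"          # absolute path: ~ is not expanded inside JSON
payload_dir="$(mktemp -d)"

# Source: reads lines from input.txt and publishes them to the topic.
cat > "$payload_dir/source.json" <<EOF
{
  "name": "file-source",
  "config": {
    "connector.class": "FileStreamSource",
    "tasks.max": "1",
    "topic": "$topic",
    "file": "$demo_dir/input.txt"
  }
}
EOF

# Sink: consumes the same topic and appends each record to output.txt.
cat > "$payload_dir/sink.json" <<EOF
{
  "name": "file-sink",
  "config": {
    "connector.class": "FileStreamSink",
    "tasks.max": "1",
    "topics": "$topic",
    "file": "$demo_dir/output.txt"
  }
}
EOF

echo "Payloads written to $payload_dir"
```

Assuming the worker's REST listener is on its default port 8083, each file could then be submitted with curl -s -X POST -H "Content-Type: application/json" --data @"$payload_dir/source.json" http://localhost:8083/connectors. Note that the source connector takes a single topic, while the sink takes topics.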

Finally, confirm that the data has been replicated between files and is identical.

$ cat ~/connect-demo/output.txt

You should see that the output.txt file contains the numbers from 1 to 1000, just like the input.txt file. That's it! If you append new lines to input.txt, they will show up in output.txt as well.
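Rather than eyeballing a thousand lines, you can let the shell compare the two files. This is a minimal sketch with a hypothetical helper, files_match. It assumes a single task and a single partition, so that line order is preserved, and that the sink has finished writing.

```shell
# Succeeds only when both files exist and are byte-for-byte identical.
files_match() {
  [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# Usage against the demo files from Step 6.
if files_match "$HOME/connect-demo/input.txt" "$HOME/connect-demo/output.txt"; then
  echo "output.txt matches input.txt"
else
  echo "files differ or are missing"
fi
```

If the files differ right after you start the connectors, wait a few seconds and run the check again, since the sink may still be catching up.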

Please note that Azure Event Hubs' support for the Kafka Connect API is still in public preview. The FileStreamSource and FileStreamSink connectors deployed are not intended for production use and should only be used for demonstration purposes.