Consuming logs from a Kafka topic using syslog-ng

19 Jan 2021

There is no official Kafka source in syslog-ng, but the question comes up often enough that I created one. It is just a temporary workaround using the program() source, but it works. It involves Java and installing Kafka manually, but it was fast and reliable in my tests: it ingested 50,000–100,000 messages a second on my laptop in a resource-constrained virtual machine.

Of course, I also tried a more resource-friendly solution, using kafkacat to consume log messages from a Kafka topic. While it worked perfectly on the command line, I could not get it to work with the program() source in syslog-ng.

If you read my blog last week about using templates in the topic() parameter of the Kafka destination, the test environment will look familiar. The only notable difference is that the tool used to consume logs from Kafka is now called within syslog-ng from a program() source.

Before you begin

You do not need the most recent syslog-ng version to use the program() source. Still, I recommend you use recent packages, because they contain many useful bug fixes. You can find out where third-party syslog-ng packages are available for major Linux distributions at https://www.syslog-ng.com/3rd-party-binaries

Kafka might be available for your Linux distribution of choice, but it was not available in the distributions I checked. For simplicity’s sake, I use the binary distribution from the Kafka website. At the time of writing, the latest available version is kafka_2.13-2.6.0.tgz and it should work equally well on any Linux host with a recent enough (that is, 1.8+) Java. If you use a local Kafka installation, you might need to modify some of the example command lines.
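
Whether the installed Java is recent enough can be checked before downloading anything. The following is only a minimal sketch, assuming bash and assuming that java -version prints a quoted version string such as "1.8.0_275" or "11.0.9". The function names are my own and are not part of Kafka or syslog-ng.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: not part of Kafka or syslog-ng.

# Print the major Java version for a version string,
# for example "1.8.0_275" gives 8 and "11.0.9" gives 11.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy scheme: 1.8 means Java 8
  esac
  echo "${v%%[._-]*}"
}

# Extract the quoted version string from `java -version` (printed on stderr).
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

if command -v java >/dev/null 2>&1; then
  ver="$(installed_java_version)"
  major="$(java_major "$ver")"
  if [ "${major:-0}" -ge 8 ] 2>/dev/null; then
    echo "Java $ver looks recent enough"
  else
    echo "Java $ver is older than 1.8 or could not be parsed"
  fi
else
  echo "java was not found in PATH"
fi
```

If the version string on your system has a different shape, the sed expression needs adjusting.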

Downloading and starting Kafka

A proper Kafka installation is outside of the scope of my blog. Here, you will follow relevant parts of the Kafka Quickstart documentation. You will download the archive containing Kafka, extract it, and start its components. You will need network access and four terminal windows.

First, download the latest Kafka release and extract it. The exact version might be different:

wget https://downloads.apache.org/kafka/2.6.0/kafka_2.13-2.6.0.tgz
tar xvf kafka_2.13-2.6.0.tgz

At the end of this process, you will see a new directory named kafka_2.13-2.6.0.
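
Because the exact version changes over time, the download step can be parameterized. This is a sketch under my own assumptions: the variable and function names are hypothetical, and the URL simply follows the pattern of the wget command above. As far as I know, older releases are eventually moved to archive.apache.org, so the host part might need adjusting for them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; adjust the versions to the release you actually use.
SCALA_VERSION="${SCALA_VERSION:-2.13}"
KAFKA_VERSION="${KAFKA_VERSION:-2.6.0}"

# Name of the directory created by extracting the archive.
kafka_dir() {
  echo "kafka_${SCALA_VERSION}-${KAFKA_VERSION}"
}

# Download URL, following the pattern of the wget command in this post.
kafka_url() {
  echo "https://downloads.apache.org/kafka/${KAFKA_VERSION}/$(kafka_dir).tgz"
}

# Download and extract Kafka; the download is skipped if the archive is already there.
fetch_kafka() {
  [ -f "$(kafka_dir).tgz" ] || wget "$(kafka_url)" || return 1
  tar xf "$(kafka_dir).tgz"
}

echo "Archive URL: $(kafka_url)"
# Run `fetch_kafka` to download and extract the archive into the current directory.
```

Nothing is downloaded until fetch_kafka is called, so the URL can be checked first.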

From now on, you will need all four terminal windows (the current one plus three extra): two to run the separate daemons in the foreground so that you can see their messages, and two more to send messages to Kafka and to receive them.

First, start zookeeper in one of the terminal windows. Change to the new Kafka directory and start the application:

cd kafka_2.13-2.6.0/
bin/zookeeper-server-start.sh config/zookeeper.properties

Now you can start the Kafka server in a different terminal window:

cd kafka_2.13-2.6.0/
bin/kafka-server-start.sh config/server.properties

Both applications print lots of data on screen. Normally, the flood of debug information stops after a few seconds and the applications are ready to be used. If there is a problem, you will get back the command line. In this case, you will have to browse through the debug messages and resolve the problem.
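
Instead of watching the output, you can also check whether the two daemons accept TCP connections. This is a minimal sketch, assuming bash (it relies on the /dev/tcp feature of bash) and the default ports: 9092 is the Kafka port used throughout this post, while 2181 for ZooKeeper is my assumption based on the stock config/zookeeper.properties. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; /dev/tcp is a bash feature, not a real device file.

# Succeed only if a TCP connection to host:port can be opened.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Poll host:port once a second, giving up after the given number of tries.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    port_open "$host" "$port" && return 0
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Usage:
# wait_for_port localhost 2181 && echo "ZooKeeper is listening"
# wait_for_port localhost 9092 && echo "Kafka is listening"
```

An open port only shows that the process is listening. It does not prove that Kafka is fully functional, so the console test that follows is still worth doing.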

Now you can do some minimal functional testing, without syslog-ng involved yet. This way you can make sure that access to Kafka is not blocked by a firewall or other software.

Open yet another terminal window, change to the Kafka directory and start a script to collect messages from a Kafka topic. You can safely ignore the warning message: it appears because the topic does not exist yet.

cd kafka_2.13-2.6.0/
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytest
[2020-12-15 14:41:09,172] WARN [Consumer clientId=consumer-console-consumer-31493-1, groupId=console-consumer-31493] Error while fetching metadata with correlation id 2 : {mytest=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
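
The warning can also be avoided by creating the topic before starting the consumer, using the kafka-topics.sh script from the same bin/ directory. This is a sketch only: KAFKA_DIR and the function name are my own, and the single partition without replication is an assumption that fits this one-node test setup.

```shell
#!/usr/bin/env bash
# KAFKA_DIR is an assumption: point it at the directory where you extracted Kafka.
KAFKA_DIR="${KAFKA_DIR:-$HOME/kafka_2.13-2.6.0}"

# Create a single-partition, unreplicated topic.
create_topic() {
  "$KAFKA_DIR/bin/kafka-topics.sh" --create \
    --topic "$1" \
    --bootstrap-server "${2:-localhost:9092}" \
    --partitions 1 --replication-factor 1
}

if [ -x "$KAFKA_DIR/bin/kafka-topics.sh" ]; then
  create_topic mytest
else
  echo "kafka-topics.sh not found under $KAFKA_DIR, skipping"
fi
```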

Now you can start a fourth terminal window to send some test messages. Just enter something after the “>” character and press Enter. Moments later, you should see what you have just entered in the third terminal window:

bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic mytest
>blah
>blahblah
>blahblahblah
>

You can exit with ^D.

Configuring syslog-ng

Now that you have checked that you can send messages to Kafka and pull those messages with another application, it is time to configure syslog-ng. If syslog-ng on your system is configured to include .conf files from the /etc/syslog-ng/conf.d/ directory, create a new configuration file there. Otherwise, append the configuration below to syslog-ng.conf:

source s_kafka {
  program("/root/kafka_2.13-2.6.0/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytest");
};
destination d_fromkafka {
  file("/var/log/fromkafka");
};
log {
  source(s_kafka);
  destination(d_fromkafka);
};

The above configuration snippet consumes log messages from Kafka and writes them to a file under the /var/log/ directory. Make sure that settings in the Kafka source match your environment. Here the Kafka archive is extracted under the /root/ directory and the topic name is the same as in the initial tests: “mytest”.
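
Before reloading, it is worth checking that the new configuration parses. A minimal sketch, assuming syslog-ng and syslog-ng-ctl are in the PATH and that you run it with sufficient privileges (normally as root). The function name is hypothetical, and on your system the reload might go through the service manager instead.

```shell
#!/usr/bin/env bash
# Check the configuration syntax first, then ask the running daemon to reload it.
check_and_reload() {
  syslog-ng --syntax-only || return 1
  syslog-ng-ctl reload
}

if command -v syslog-ng >/dev/null 2>&1; then
  check_and_reload && echo "configuration reloaded"
else
  echo "syslog-ng was not found in PATH"
fi
```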

Testing

Once you have reloaded syslog-ng, you are ready for testing.

Staying in the Kafka directory, you can start the producer to send messages to Kafka:

leap152:~/kafka_2.13-2.6.0 # bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic mytest
>blah
>blahblah
>

You can now check whether the messages successfully arrived in syslog-ng by tailing the log file:

leap152:/etc/syslog-ng/conf.d # tail -f /var/log/fromkafka

Jan 15 13:03:25 leap152 blah

Jan 15 13:03:29 leap152 blahblah

As usual, you can exit from the producer using ^D.
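
The same test can be run without typing into the producer, because the console producer reads its messages from standard input. The following is a sketch under my own assumptions: the helper names are hypothetical, KAFKA_DIR has to point to your Kafka directory, and the log file is the one from the configuration above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; paths and names follow the examples in this post.
KAFKA_DIR="${KAFKA_DIR:-$HOME/kafka_2.13-2.6.0}"
LOG_FILE="${LOG_FILE:-/var/log/fromkafka}"

# Send one line to the topic; the console producer reads stdin and exits at end of input.
send_message() {
  printf '%s\n' "$1" |
    "$KAFKA_DIR/bin/kafka-console-producer.sh" --bootstrap-server localhost:9092 --topic mytest
}

# Wait until a fixed string shows up in a file, checking once a second.
wait_for_line() {
  local file="$1" text="$2" tries="${3:-10}" i=0
  while [ "$i" -lt "$tries" ]; do
    grep -qF -- "$text" "$file" 2>/dev/null && return 0
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Usage (needs the running Kafka and syslog-ng from this post):
# marker="kafka-source-test-$$"
# send_message "$marker" && wait_for_line "$LOG_FILE" "$marker" && echo "message arrived"
```

Using a unique marker makes it easy to tell the new message apart from earlier test messages in the log file.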

What is next?

This blog is enough to get you started and learn the basic concepts of Kafka. On the other hand, this environment is far from anything production-ready. For that, you will need a proper Kafka installation and most likely the Kafka consumer in the syslog-ng configuration also requires additional settings. This setup was fast and reliable in my test environment, but that is not a guarantee that it also works well in a production environment. Let me know your experiences!

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/syslog-ng/syslog-ng. On Twitter, I am available as @Pczanik.