Testing the performance of log streaming to Kafka with syslog-ng

In our previous post, we discussed the performance of syslog-ng when streaming logs to HDFS destinations. Now we’ll pick up where we left off and continue our performance evaluation with Kafka.

In our Kafka tests, we used syslog-ng Premium Edition 6.0.3 running on a server with two Intel Xeon E5-2620 v3 2.40 GHz CPUs, 16 GB of RAM, a 10 Gbps Ethernet interface, a 500 GB SSD and Ubuntu Trusty Tahr. The Kafka server ran on VMware ESX with a two-core CPU at 2.6 GHz, 3 GB of RAM, a 1 Gbps Ethernet interface and SSD-based storage, running Ubuntu Trusty Tahr and Kafka 0.10.0.1.

Again, in all our tests, syslog-ng processed real logs originating from Windows Event Log (sent by the syslog-ng Premium Edition Windows agent). The average message size was 400 bytes (ranging between 137 and 2133 bytes), and syslog-ng received logs from 10 parallel TCP connections.

Kafka performance

There are two very important syslog-ng options that have a serious impact on performance:

  • The sync_send(true|false) setting
  • The message template you use

sync_send(true|false)

When sync_send is set to true, syslog-ng PE sends each message reliably, meaning it waits for the confirmation reply from the Kafka server. When sync_send is set to false, syslog-ng PE sends messages asynchronously and also receives the responses asynchronously. This means that sync_send(true) is more reliable, but costs a large share of the performance.

With sync_send set to false, we achieved a throughput of 78,000 logs/s (30 MB/s). However, with sync_send set to true, performance dropped dramatically, to 1,500 logs/s (600 kB/s).
In the latter scenario, syslog-ng PE consumed 1.5 GB of memory and roughly 30% CPU, while the Kafka server consumed 100% CPU.
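
As a rough cross-check of the figures above, the byte rates follow from the message rate multiplied by the 400-byte average message size. The sketch below is only arithmetic on the numbers reported in this post; the function name is ours, and the time per message is a derived estimate that assumes messages are effectively confirmed one at a time, not a measured latency.

```python
AVG_MESSAGE_BYTES = 400  # average message size reported for the test logs


def byte_rate(logs_per_second, message_bytes=AVG_MESSAGE_BYTES):
    """Return the payload throughput in bytes per second."""
    return logs_per_second * message_bytes


async_rate = byte_rate(78_000)  # measured rate with sync_send(false)
sync_rate = byte_rate(1_500)    # measured rate with sync_send(true)

print(f"sync_send(false): {async_rate / 1e6:.1f} MB/s")
print(f"sync_send(true):  {sync_rate / 1e3:.0f} kB/s")
print(f"slowdown factor:  {78_000 / 1_500:.0f}x")
# Assumption: one message in flight at a time with sync_send(true).
print(f"time per message with sync_send(true): {1_000 / 1_500:.2f} ms")
```

The computed 31.2 MB/s is in line with the roughly 30 MB/s reported above, and the synchronous case works out to about 52 times fewer messages per second.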

Message template

The second important factor is the template. The more complex the template, the lower the performance. If syslog-ng PE uses the following template (adding only a few values): "$(format-json --scope rfc5424 --exclude DATE --key ISODATE)", then the result is 26,000 logs/s (11 MB/s). But if syslog-ng PE uses a more complex template that adds all syslog-ng PE macros to the JSON: "$(format-json --scope all-macros --key ISODATE)", the performance drops to 4,000 logs/s (14 MB/s).

The first template generates a 430-byte message, while the second one generates a 3500-byte message for the same incoming event. To improve performance, make sure to include only the information you really need in the messages.
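
To see how the two template results relate, the following sketch multiplies each measured message rate by the size of the generated message. It uses only the figures quoted above; the names RESULTS and summarize are ours, chosen for illustration.

```python
# (template scope, measured logs per second, generated message size in bytes)
RESULTS = [
    ("rfc5424", 26_000, 430),
    ("all-macros", 4_000, 3_500),
]


def summarize(results):
    """Map each template scope to its output throughput in MB/s."""
    return {scope: rate * size / 1e6 for scope, rate, size in results}


for scope, mb_per_s in summarize(RESULTS).items():
    print(f"{scope}: {mb_per_s:.1f} MB/s")

# Compare how much the message grew with how much the rate fell.
print(f"message size ratio: {3_500 / 430:.1f}x")
print(f"message rate ratio: {26_000 / 4_000:.1f}x")
```

The message rate falls by a factor of 6.5 while the message grows about eightfold, so the number of bytes produced per second stays in the same range. This is consistent with the advice above: trimming the template is the most direct way to raise the number of logs per second.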

Tips & tricks

Kafka

If you need high performance with Kafka, set sync_send(false) in syslog-ng PE. Also, as the bottleneck can be the complexity of the message, try to stick with a simple template.

Hadoop

If you are using a firewall, note that the syslog-ng PE configuration file only needs the namenode:port pair, but the firewall must also allow traffic to the IP address and port of the data node(s) (typical port numbers are 50010 and 50020). After syslog-ng PE connects to the namenode, the namenode provides the IP:port of the datanode to use, and syslog-ng PE sends the data to that datanode.
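
A quick way to verify the firewall rules from the syslog-ng PE host is to try opening a TCP connection to each required port. The sketch below is a minimal check using only the Python standard library; the function names are ours, and the host names and the namenode port in the commented usage example are placeholders that you need to replace with the values of your own cluster.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def check_targets(targets, timeout=3.0):
    """Return a dict mapping each (host, port) pair to its reachability."""
    return {
        (host, port): port_is_reachable(host, port, timeout)
        for host, port in targets
    }


# Example usage (hypothetical host names; 8020 is an assumed namenode port,
# 50010 and 50020 are the typical datanode ports mentioned above):
#
# targets = [
#     ("namenode.example.com", 8020),
#     ("datanode1.example.com", 50010),
#     ("datanode1.example.com", 50020),
# ]
# for target, ok in check_targets(targets).items():
#     print(target, "reachable" if ok else "blocked or down")
```

A successful connection only shows that the port is open from this host; it does not confirm that HDFS writes will succeed.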

Because Hadoop works with big block sizes (64 or 128 MB by default) and flushes the buffer automatically, logs sent by syslog-ng are not necessarily available immediately on the HDFS server.

syslog-ng PE test configuration

Kafka
 
@version: 6.0
@module "mod-java"

options {
    keep_hostname(yes);
    keep_timestamp(no);
    stats_level(2);
    use_dns(no);
};

source s_network_aa8212871dbe48a4b3df418d5b59ba7b {
    network(
        ip(0.0.0.0)
        log_fetch_limit(1000)
        log_iw_size(10000)
        max_connections(10)
        port(514)
    );
};

destination d_java_c576a7c038c2412986d1b0e634cfbbd4 {
    java(
        class_name(org.syslog_ng.kafka.KafkaDestination)
        class_path('/opt/syslog-ng/lib/syslog-ng/java-modules/*.jar:/var/testdb_working_dir/a3a312e0-5696-4f65-afe7-fbfbeebff43f/build/distributions/kafka-libs/lib/*.jar')
        option("kafka_bootstrap_servers", "10.140.32.90:9093")
        option("topic", "t28928a51b8b2187a78b9_testbot")
        option("sync_send", "false")
        # The two templates discussed above; both are commented out here.
        #option("template", "$(format-json --scope rfc5424 --exclude DATE --key ISODATE)")
        #option("template", "$(format-json --scope all-macros --key ISODATE)")
        log_fifo_size(20000)
    );
};

log {
    source(s_network_aa8212871dbe48a4b3df418d5b59ba7b);
    destination(d_java_c576a7c038c2412986d1b0e634cfbbd4);
    flags(flow-control);
};