Learn how to send log messages in bulk mode to your Elasticsearch server with syslog-ng. Bulk mode offers better performance, because it sends multiple log messages in a single POST request.

A few years back I wrote that any time a new language binding is added to syslog-ng, someone implements an Elasticsearch destination in it. The good news is that, despite some limitations, you can now use the http() destination of syslog-ng to feed Elasticsearch without any language bindings. Right now, sending logs to multiple nodes is not implemented, and I have not yet tested encrypted connections or authentication. On the other hand, it is implemented in efficient C code, and the http() destination can use batching and multiple worker threads if you have lots of logs to store.

When I received this blog idea from Balázs Scheidler, original author of syslog-ng, there was no ready-to-use configuration to send logs to Elasticsearch over HTTP. Instead, I was pointed to the Elasticsearch API documentation to create one. In this blog I describe the methods I used to create the configuration, as they might help you the next time you need to create one.

Here are some rough numbers about sending logs to Elasticsearch using the http() destination, measured in a virtual machine on my laptop:

  • base: 3000 msg/s
  • using bulk insert: 17000 msg/s
  • also enabling multi-threading: 21000 msg/s

(Note: this blogpost describes sending log messages to Elasticsearch using the http() destination of syslog-ng. The syslog-ng application also has an Elasticsearch destination that uses the official Java client libraries of Elasticsearch.)

Before you start

The http() destination features I show you in this blog were introduced in syslog-ng version 3.18. I used Elasticsearch 6.4.2 for my tests, but most likely any version should be OK, as the bulk API was already introduced in the 1.X versions.

I used the following two documents to create the configuration:

First working configuration

The road to the first working configuration was rough. The Elasticsearch bulk API expects logs in the following format:

{ "index":{} }
{ "name":"Peter","age":30 } 
{ "index":{} } 
{ "name":"Paul","age":31 }

Each document takes two lines: an action line with the index information, followed by a line with the actual data to insert. This pair can be repeated many times, and the request is terminated with a newline. Instead of providing the index name and type in every action line, you can put them in the URL. This saves a few bytes, as the information is sent only once in the URL instead of with each line of data. In the syslog-ng configuration the action line is static, while the line with the actual data is generated using a format-json() template function.
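To make the format concrete, here is a minimal Python sketch that builds the same kind of request body outside of syslog-ng. The function name build_bulk_body is my own for illustration. The exact whitespace inside each JSON line does not matter, as long as every action line and every document sits on its own line and the body ends with a newline.

```python
import json


def build_bulk_body(documents):
    """Return an NDJSON bulk body: one action line plus one data line per document."""
    lines = []
    for doc in documents:
        # Action line; the index name and type are taken from the URL.
        lines.append(json.dumps({"index": {}}))
        # The document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # Every line, including the last one, has to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([{"name": "Peter", "age": 30}, {"name": "Paul", "age": 31}])
print(body, end="")
```

A body like this could be sent with a POST request to the _bulk URL using the application/x-ndjson content type, which is what the http() destination does in the configuration that follows.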

My first template did not work, as index was enclosed in the same type of quotation mark as the template itself, which is a syntax error. Next I tried single quotation marks around index, but Elasticsearch did not accept that. Using single quotation marks around the template opened up another can of worms: special characters are treated literally. The \n marking a new line was sent as is, so Elasticsearch still rejected the request. I had to use a real line break in the body template. Finally I ended up with this configuration:

destination d_http {
    http(url("http://localhost:9200/czp/test/_bulk")
        method("POST")
        flush-lines(100)
        workers(4)
        headers("Content-Type: application/x-ndjson")
        body-suffix("\n")
        body('{ "index":{} }
$(format-json --scope rfc5424 --key ISODATE)')
    );
};

Yes, it works. Here are a few notes on how it works:

  • In the URL “czp” is the index name, “test” is the type and “_bulk” gives access to the bulk API of Elasticsearch. This way the index line in the body can be kept short.
  • The method is “POST”.
  • Bulk mode (batching) is turned on by “flush-lines”. Here 100 log messages are sent together.
  • Four workers push messages to Elasticsearch in parallel.
  • Starting with Elasticsearch 6.0, the bulk API requires the application/x-ndjson content type.
  • “body-suffix” in this case is an empty line, which marks the end of the transfer for the bulk API. “body-prefix” is not used in this case.
  • “body” contains the actual message: in this case, name-value pairs from an RFC 5424 compliant log message, plus the date using the ISODATE macro, as required by Elasticsearch.

But it is also ugly with the line break in the template.

Debugging

I did not stop at the above configuration, as I did not like it. I did many things previously with trial and error, even trying to escape the double quotation marks around index, but I was doing it blindly. Two things came to my rescue: trace messages and using a file() destination with a template.

Trace

Trace provides extra information during debugging. You can enable it in two ways. The first is to add “-t” to the syslog-ng command line parameters:

syslog-ng -Fvdet

The other possibility is to use syslog-ng-ctl:

syslog-ng-ctl trace --set=on

Once you enable it, you can see not just the HTTP response code, but the whole HTTP communication, including part of what is sent and the response:

[2018-10-18T18:01:04.655927] cURL debug; worker='0', type='text', data='About to connect() to localhost port 9200 (#0).'
[2018-10-18T18:01:04.655969] cURL debug; worker='0', type='text', data='  Trying ::1....'
[2018-10-18T18:01:04.656131] cURL debug; worker='0', type='text', data='Connected to localhost (::1) port 9200 (#0).'
[2018-10-18T18:01:04.656275] cURL debug; worker='0', type='header_out', data='POST /czp/test/_bulk HTTP/1.1..User-Agent: syslog-ng 3.18.1/libcurl 7.29.0..Host: localhost:9200..Accept: */*..Content-Type: application/x-ndjson..Content-Length: 720....'
[2018-10-18T18:01:04.656304] cURL debug; worker='0', type='data_out', data='{ "index":{} }{"SOURCE":"s_loggen","PROGRAM":"root","PRIORITY":"notice","MESSAGE":"test","LEGACY_MSGHDR":"root: ","ISODATE":"2018-10-18T18:01:02+02:00","HOST_FROM":"127.0.0.1","HOST":"127.0.0.1","FACILITY":"kern","DATE":"Oct 18 18:01:02"}..{ "index":{} }{"SOURCE":"s_loggen","PROGRAM":"root","PRIORITY":"notice","MESSAGE":"test","LEGACY_MSGHDR":"root: ","ISODATE":"2018-10-18T18:01:03+02:00","HOST_FROM":"127.0.0.1","HOST":"127.0.0.1","FACILITY":"kern","DATE":"Oct 18 18:01:03"}..{ "index":{} }{"SOURCE":"s_loggen","PROGRAM":"root","PRIORITY":"notice","MESSAGE":"test","LEGACY_MSGHDR":"root: ","ISODATE":"2018-10-18T18:01:04+02:00","HOST_FROM":"127.0.0.1","HOST":"127.0.0.1","FACILITY":"kern","DATE":"Oct 18 18:01:04"}..'
[2018-10-18T18:01:04.656322] cURL debug; worker='0', type='text', data='upload completely sent off: 720 out of 720 bytes.'
[2018-10-18T18:01:04.667616] cURL debug; worker='0', type='header_in', data='HTTP/1.1 200 OK..'
[2018-10-18T18:01:04.667636] cURL debug; worker='0', type='header_in', data='content-type: application/json; charset=UTF-8..'
[2018-10-18T18:01:04.667642] cURL debug; worker='0', type='header_in', data='content-length: 556..'
[2018-10-18T18:01:04.667646] cURL debug; worker='0', type='header_in', data='..'
[2018-10-18T18:01:04.667652] cURL debug; worker='0', type='data_in', data='{"took":8,"errors":true,"items":[{"index":{"_index":"czp","_type":"test","_id":"-sboh2YBhnXK3rFRrH6S","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse, document is empty"}}},{"index":{"_index":"czp","_type":"test","_id":"-8boh2YBhnXK3rFRrH6S","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse, document is empty"}}},{"index":{"_index":"czp","_type":"test","_id":"_Mboh2YBhnXK3rFRrH6S","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse, document is empty"}}}]}'
[2018-10-18T18:01:04.667673] cURL debug; worker='0', type='text', data='Connection #0 to host localhost left intact.'
[2018-10-18T18:01:04.667706] curl: HTTP response received; url='http://localhost:9200/czp/test/_bulk', status_code='200', body_size='720', batch_size='3', redirected='0', total_time='0.016', location='/etc/syslog-ng/conf.d/http_elastic.conf:12:5'

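While you cannot see the whole message sent to Elasticsearch, and the formatting does not follow the template, you can at least see the response. In the sample above the HTTP status code is 200, yet the response body contains "errors":true and a status of 400 with the reason "document is empty" for each item, so none of the three messages were indexed.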
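Since the HTTP status code alone does not reveal such failures, it can help to pull the per-item errors out of the response body. Below is a minimal Python sketch that does this for a response copied from the trace output. The function name failed_items is hypothetical, and the sample string is a shortened copy of the data_in line above.

```python
import json


def failed_items(response_text):
    """Return (id, error type, reason) for every item the bulk API rejected."""
    response = json.loads(response_text)
    if not response.get("errors"):
        return []
    failures = []
    for item in response["items"]:
        # Each item has a single key naming the action, for example "index".
        for result in item.values():
            if "error" in result:
                error = result["error"]
                failures.append((result["_id"], error["type"], error["reason"]))
    return failures


# Shortened copy of the response body seen in the trace output.
sample = (
    '{"took":8,"errors":true,"items":[{"index":{"_index":"czp","_type":"test",'
    '"_id":"-sboh2YBhnXK3rFRrH6S","status":400,"error":{"type":'
    '"mapper_parsing_exception","reason":"failed to parse, document is empty"}}}]}'
)

for failure in failed_items(sample):
    print(failure)
```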

File destination

While trace output does not follow template formatting, part of the template (the “body”) can be reproduced using a file destination with a template. The hard part is keeping the two templates in sync. This is how I figured out that double quotation marks can be escaped within the template.

Here is a file destination mimicking the body template:

destination d_file {
     file("/var/log/czp_elastic" template("{ \"index\":{} }\n$(format-json --scope rfc5424 --scope nv-pairs --key ISODATE)\n") );
};
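Once the file destination writes the rendered template, you can check its output before pointing the same template at Elasticsearch. The following Python sketch is one possible way to do that, assuming the file contains alternating action and data lines as produced by the template above. The function name check_bulk_lines is my own.

```python
import json


def check_bulk_lines(lines):
    """Return a list of problems found in alternating action/data lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append(f"line {number}: not valid JSON ({error.msg})")
            continue
        # Odd lines should be the static action line, even lines the log message.
        is_action_line = number % 2 == 1
        if is_action_line and (not isinstance(parsed, dict) or list(parsed) != ["index"]):
            problems.append(f"line {number}: expected an index action line")
    return problems


sample = ['{ "index":{} }\n', '{"MESSAGE":"test","HOST":"127.0.0.1"}\n']
print(check_bulk_lines(sample))
```

To check real output, pass an open file object instead of the sample list, for example the /var/log/czp_elastic file written by the destination above.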

Full configuration including debug code

Here is the full configuration, which includes the file destination used for template debugging.

# network source
source s_net {
    tcp(ip("0.0.0.0") port("515"));
};

# Elasticsearch destination
destination d_http {
    http(url("http://localhost:9200/czp/test/_bulk")
        method("POST")
        flush-lines(3)
        workers(4)
        headers("Content-Type: application/x-ndjson")
        body-suffix("\n")
        body("{ \"index\":{} }\n$(format-json --scope rfc5424 --scope nv-pairs --key ISODATE)\n")
    );
};

# debug output to file
destination d_file {
     file("/var/log/czp_elastic" template("{ \"index\":{} }\n$(format-json --scope rfc5424 --scope nv-pairs --key ISODATE)\n") );
};

# combining all of above in a log statement
log {
    source(s_net);
    destination(d_http);
    destination(d_file);
};

Note that before you can use it in your own environment, you most likely have to change many of the http() destination parameters, such as the host name, index name and type in the URL. Experiment with flush-lines and workers. Most likely you also need to change the scope in format-json, depending on your log messages.

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or you can even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/balabit/syslog-ng. On Twitter, I am available as @PCzanik.
