3 Apr 2018

Splunk HEC: Sending logs using the program() destination of syslog-ng

Recently, Splunk started to recommend the use of the HTTP Event Collector (HEC) instead of forwarders. Syslog-ng supports this in multiple ways. Last time I showed you how to use the http() destination of syslog-ng. This time I introduce another possibility: using an external Python script to send logs to HEC.

Why use an external script?

You might wonder why you would use an external script when syslog-ng already has a built-in HTTP destination. The HTTP destination works fine for HEC as long as you do not need encryption (syslog-ng and Splunk on the same host) and your message rate is low (fewer than 10,000 events per second). The problem is that the HTTP destination does not implement batching: each message is sent to Splunk in a separate connection. This works up to a point, but at higher message rates, and especially with encryption enabled, it becomes very inefficient and wastes resources.

As a workaround, you can use the program() destination with a Python script to send messages to HEC. You lose some of the advantages of a native implementation: syslog-ng only knows that the script received the message, and gets no acknowledgment of whether it actually reached Splunk. On the other hand, forwarding messages to HEC becomes faster for both plain and encrypted connections, thanks to batching.
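To make the batching idea concrete, here is a minimal sketch of how a script behind program() could collect lines and send them in groups. It is not the script recommended by Splunk, only an illustration. The names build_payload, batches and send_batch, the batch size, and the placeholder token are my own assumptions. It also assumes HEC listens on its default port 8088 with SSL disabled.

```python
import json
import urllib.request

# Assumptions for illustration only: HEC on its default port without SSL,
# a placeholder token, and an arbitrary batch size.
HEC_URL = "http://127.0.0.1:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"
BATCH_SIZE = 100


def build_payload(lines):
    """Join several log lines into one HEC request body.

    HEC accepts multiple JSON event objects concatenated in a single body.
    """
    return "".join(json.dumps({"event": line}) for line in lines).encode("utf-8")


def batches(stream, size=BATCH_SIZE):
    """Yield lists of up to `size` lines read from `stream`."""
    batch = []
    for line in stream:
        batch.append(line.rstrip("\n"))
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:  # hand over the remainder when the stream ends
        yield batch


def send_batch(lines):
    """POST one batch of lines to HEC in a single HTTP request."""
    request = urllib.request.Request(
        HEC_URL,
        data=build_payload(lines),
        headers={"Authorization": "Splunk " + HEC_TOKEN},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

Since syslog-ng writes one formatted message per line to the standard input of the program, the main loop of such a script would call send_batch() for every batch returned by batches(sys.stdin). A production script would probably also flush on a timer, so that messages do not wait for a full batch during quiet periods.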

Before you begin

For my tests, I used the latest available releases of Splunk and syslog-ng running on CentOS 7. I used Splunk 7.0.2 and syslog-ng 3.14.1. You do not have to run the latest and greatest, as HEC was already available in later Splunk 6.x releases and I cannot even recall a syslog-ng version without the program() destination.

The Python script recommended by Splunk is available at https://bitbucket.org/rfaircloth-splunk/rsyslog-omsplunk/src/cca949cd5896d5a34be1e7358b3f3467977a4e1f/omsplunkhec.py?at=master&fileviewer=file-view-default. Download the raw version, save it to a location where syslog-ng can access it, and make the script executable.

For simplicity, I use unencrypted connections, but unless syslog-ng and Splunk are running on the same machine, you should enable SSL.

Configure Splunk

In order to receive messages over HTTP, you need to enable the HTTP Event Collector in Splunk:

a. Log in as administrator, and choose Settings > Data inputs > HTTP Event Collector.

b. In the upper right corner, click Global Settings. Here, click Enable and remove the check mark from Enable SSL.

c. If you modify an already existing Splunk installation, make any further site-specific changes as necessary.

The easiest way to use HEC is to send syslog messages as is, without any modifications. This might also be a requirement of some Splunk applications, which expect unmodified syslog messages.

Splunk needs a token in log messages to figure out their format and intended destination. To create a token:

a. Go to the HEC page in Splunk, and click the New Token button in the upper right corner.

b. First, give the token a name.

c. On the next screen, select the “syslog” source type.

d. At the end of the process, you will see a token similar to this: 94476318-fc2c-410b-a9a8-5796585ffc9e. Make a note of it as you will need it in the syslog-ng configuration.

e. Keep this tab open, as you will need it later on.

Configure syslog-ng

You can append the configuration snippet below to the end of your current syslog-ng configuration, or create a new .conf file under /etc/syslog-ng/conf.d/ if this possibility is enabled in your distribution.

source s_network {
    network(
        transport("tcp")
        port(514)
    );
};
destination d_hecpy {
    program("/usr/local/bin/omsplunkhec.py fcddc233-a7c4-43fa-903a-0654622c5093 127.0.0.1"
        template("original_host=${HOST} <${PRI}>${DATE} ${HOST} ${MSG}\n")
    );
};
log { source(s_network); destination(d_hecpy); flags(flow-control); };

Let’s take a look at the details:

The network source is not strictly necessary; I included it only to make the example self-contained.
The most important part is the destination called “d_hecpy”, as this is what sends log messages to Splunk.
The above configuration assumes that the script is saved to the /usr/local/bin directory and Splunk is running on the local host.
Of course, the token above will not work: you need to replace it with the one you generated in the previous step. If you still have the tab open in your browser, it is just a copy and paste exercise.
The last line, the log statement, connects the source with the destination. Here flags(flow-control) is the interesting part. It is a mechanism that makes sure that syslog-ng does not collect more logs on the source side than it can push out on the destination side. It does not work in all situations (for example, a UDP source cannot apply back-pressure), but it works fine for TCP or file sources. You can read more about it at https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.16/administration-guide.
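To see what the script receives on its standard input, the sketch below imitates the template from the destination in Python. The function names priority and render are hypothetical, and the sample host and timestamp are made up. I also assume that ${DATE} expands to the classic BSD-style timestamp, which depends on your timestamp settings. The ${PRI} value is the facility code multiplied by 8 plus the severity code, as defined by the syslog protocol.

```python
# A small subset of the facility and severity codes of the syslog protocol.
FACILITY = {"kern": 0, "user": 1, "daemon": 3, "auth": 4, "local0": 16}
SEVERITY = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
            "warning": 4, "notice": 5, "info": 6, "debug": 7}


def priority(facility, severity):
    """Compute the PRI value: facility * 8 + severity."""
    return FACILITY[facility] * 8 + SEVERITY[severity]


def render(host, pri, date, msg):
    """Imitate the template: original_host=${HOST} <${PRI}>${DATE} ${HOST} ${MSG}."""
    return "original_host={h} <{p}>{d} {h} {m}\n".format(h=host, p=pri, d=date, m=msg)


line = render("web01", priority("user", "notice"), "Apr  3 12:00:00", "This is a test")
print(line, end="")
# prints: original_host=web01 <13>Apr  3 12:00:00 web01 This is a test
```

Each message becomes a single line like this, with the original host name in front of a syslog-style message.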

Testing

Once you have restarted syslog-ng, test whether the above configuration works. The example uses a network source, so testing should work from remote machines as well. To test from another machine, replace the IP address in the sample command below with the address of the syslog-ng host:

logger -T -n 127.0.0.1 -P 514 "This is a test"

In a few seconds, you should be able to see your test message in the Splunk search interface.
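If logger is not available on the test machine, a few lines of Python can send a similar message. This is only a sketch: format_message and send_message are hypothetical names, the tag "test" is arbitrary, and I assume that the network source accepts a BSD-syslog style line terminated by a newline over TCP.

```python
import socket
import time


def format_message(text, hostname, tag="test", pri=13, now=None):
    """Build one BSD-syslog style line; pri 13 means user.notice."""
    now = now or time.localtime()
    # The day of the month is space-padded in this timestamp format.
    timestamp = "{} {:2d} {}".format(
        time.strftime("%b", now), now.tm_mday, time.strftime("%H:%M:%S", now)
    )
    return "<{}>{} {} {}: {}\n".format(pri, timestamp, hostname, tag, text)


def send_message(server, text, port=514):
    """Send a single test message to the syslog-ng network source over TCP."""
    line = format_message(text, socket.gethostname())
    with socket.create_connection((server, port), timeout=5) as conn:
        conn.sendall(line.encode("utf-8"))
```

Calling send_message("127.0.0.1", "This is a test") should have roughly the same effect as the logger command above.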
