Optimize your Splunk infrastructure using new syslog-ng features

Learn how to use fewer resources for better performance in Splunk! Many people have been using syslog-ng for decades without knowing that it receives new features as well as bug fixes. While many Linux utilities are practically in maintenance mode, syslog-ng keeps evolving. A strong focus in recent years has been on message parsing and destination drivers.

After my talk at Suricon, Splunk users explained how they will change their syslog-ng configurations to optimize their Splunk infrastructure.

Before you begin

New features land in syslog-ng with every release, that is, every two months. In this blog post, I describe features from the latest open source release, version 3.18. If you need commercial-level support, the latest version of syslog-ng Premium Edition (PE) also supports all features listed in this blog post. Some of the features arrived earlier, of course, but the high-performance http() destination for feeding Splunk HEC (HTTP Event Collector) became available only in the releases mentioned above. If your Linux distribution comes with an earlier version of syslog-ng, check https://syslog-ng.com/3rd-party-binaries for up-to-date packages.

Message parsing

Splunk parses incoming log messages automatically when it identifies some structure in them (for example, the “=” sign for name-value pairs). The problem is that automatic parsing does not always produce the expected results. Message parsing is one of the fastest-evolving parts of syslog-ng. Application Adapters parse some data automatically, and when you use the right parsers for the rest of the messages, syslog-ng provides more accurate results than Splunk (while using fewer resources, as an additional perk).

For example, you can parse iptables log messages by using the key=value parser. You can also parse JSON formatted messages, like those coming from Suricata. For a complete list of parsers, check http://support.oneidentity.com/technical-documents/syslog-ng-open-source-edition/3.18/administration-guide/parser-parse-and-segment-structured-messages

If none of them fits your needs, you can develop your own parser in Python.
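To give an idea of what a custom parser could look like, here is a minimal sketch that extracts key=value pairs from an iptables-style message. The class name, the "iptables." prefix and the sample message are made up for illustration. The init() and parse() method names follow the shape I understand syslog-ng to expect from a Python parser, and the message object is assumed to behave like a dictionary of name-value pairs, so check the documentation of your version before using anything like this. For iptables logs in particular, the built-in key=value parser is the better choice; the sketch only shows the idea.

```python
import re

# Matches key=value tokens such as SRC=192.0.2.10.
# Flags without an "=" sign (for example DF or SYN) are skipped.
KV_PATTERN = re.compile(r"(\w+)=(\S*)")


class IptablesKVParser(object):
    """Hypothetical parser class, written only for illustration."""

    def init(self, options):
        # The "prefix" option is an assumption of this sketch.
        self.prefix = options.get("prefix", "iptables.")
        return True

    def parse(self, log_message):
        # log_message is assumed to act like a dict of name-value pairs.
        text = log_message["MESSAGE"]
        for key, value in KV_PATTERN.findall(text):
            log_message[self.prefix + key] = value
        # Returning True signals that parsing succeeded.
        return True


# Stand-alone usage with a plain dict in place of a real log message.
message = {"MESSAGE": "IN=eth0 OUT= SRC=192.0.2.10 DST=198.51.100.7 PROTO=TCP DPT=22 DF SYN"}
parser = IptablesKVParser()
parser.init({})
parser.parse(message)
```

After the call, the dictionary contains new entries such as "iptables.SRC" and "iptables.DPT" next to the original MESSAGE.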

Enriching messages

You can enrich messages in Splunk using lookups, but this has considerable overhead, because it is done at search time. There are several ways in syslog-ng to enrich messages in real time (as the message arrives) with minimal overhead.

  • The PatternDB message parser is designed for unstructured log messages (for example, ssh login messages). You can use it for extracting important information from messages (for example, user names or IP addresses) in addition to creating new name-value pairs based on message content (for example, authentication for the action of the message and failure for the result of the message).

  • You can add geographical information to IP addresses using the GeoIP parser. In addition to longitude / latitude, continent, country, county and city names are also available now as name-value pairs.

  • You can also extract contextual data to name-value pairs from CSV files. For example, you can add machine functions, operator e-mail and other information to local IP addresses, which can help you fine-tune graphs and send alerts to the right person in real time.
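The CSV-based enrichment from the last item can be pictured as a simple lookup. The sketch below is plain Python and not syslog-ng configuration; the file content, the field names ("host.function", "host.operator") and the function names are all invented for the example.

```python
import csv
import io

# Hypothetical CSV content: IP address, machine function, operator e-mail.
CSV_DATA = """192.168.1.10,webserver,alice@example.com
192.168.1.20,database,bob@example.com
"""


def load_context(csv_text):
    """Build a lookup table keyed by IP address."""
    table = {}
    for ip, function, email in csv.reader(io.StringIO(csv_text)):
        table[ip] = {"host.function": function, "host.operator": email}
    return table


def enrich(message, table, key="SRC"):
    """Return a copy of the message with contextual pairs added on a match."""
    enriched = dict(message)
    enriched.update(table.get(message.get(key), {}))
    return enriched


context = load_context(CSV_DATA)
event = enrich({"SRC": "192.168.1.20", "DPT": "5432"}, context)
```

Here event gains the function and operator of the database machine, while a message from an unknown address passes through unchanged. The lookup happens once, when the message arrives, instead of at every search.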

JSON formatting

Traditionally, syslog-ng receives and saves log messages in a “date + hostname + application name + message” format, where the message part can often look nearly as complex as a fully functional English sentence. As the message part is often in free form or barely structured, automatic parsing often leads to unexpected results. Luckily, with a bit of extra work, you can configure syslog-ng to parse messages and create name-value pairs from them.

You can use the JSON template function of syslog-ng to save log messages in JSON format. This allows you to keep the name-value pairs that you created earlier through parsing or enrichment.
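To show what this buys you, here is a small Python sketch (not the template function itself) that serializes a set of name-value pairs as one JSON line. The field names and values are made up; in practice they would come from your parsers and enrichment steps.

```python
import json


def format_json(pairs):
    """Serialize name-value pairs as one compact JSON line with sorted keys."""
    return json.dumps(pairs, sort_keys=True, separators=(",", ":"))


# Hypothetical name-value pairs created by parsing and enrichment.
pairs = {
    "HOST": "fw01",
    "PROGRAM": "kernel",
    "SRC": "192.0.2.10",
    "DPT": "22",
    "host.function": "firewall",
}
line = format_json(pairs)
```

Every field arrives with its name attached, so the receiving side does not have to guess the structure of the message.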

Splunk HEC: simplify

For many years, Splunk recommended saving log messages to a directory structure where Splunk forwarders can read and forward them to Splunk. With the recent introduction of the HTTP Event Collector (HEC), this recommendation has changed and you do not need forwarders anymore. Using syslog-ng, you can now send log messages directly to Splunk HEC. Skipping the file writing and Splunk forwarder step greatly simplifies your log collection layer. You can use JSON formatting here as well.

Starting with syslog-ng PE 7.0.12, a dedicated Splunk destination became available, building on the http() destination but masking some of its complexity.

The syslog-ng application supports encrypted connections as well as sending messages in batches. Load-balancing between multiple Splunk indexer nodes is also coming soon.
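For readers who have not worked with HEC yet, the following Python sketch shows roughly what a batched request looks like on the wire. It only builds the request and does not send it. The host name, the token and the helper name are placeholders. The "Authorization: Splunk <token>" header and the /services/collector/event endpoint follow the HEC documentation as I know it, so verify them against your Splunk version.

```python
import json
import urllib.request


def build_hec_request(url, token, events, host="example-host"):
    """Build one HTTPS POST that carries several events in a single batch."""
    # HEC accepts several JSON objects one after another in the same body.
    body = "".join(json.dumps({"host": host, "event": e}) for e in events)
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder URL and token, for illustration only.
request = build_hec_request(
    "https://splunk.example.com:8088/services/collector/event",
    "00000000-0000-0000-0000-000000000000",
    [{"SRC": "192.0.2.10", "DPT": "22"}, {"SRC": "192.0.2.11", "DPT": "443"}],
)
```

Sending many events per request is where most of the performance gain comes from, since the connection and TLS overhead is paid once per batch instead of once per message.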

Send only data required

The syslog-ng application has been one of the best tools for optimizing Splunk workloads for the past several years. Originally, simple filtering was used. Message parsing enhanced filtering: instead of focusing on message parameters (for example, priority or facility), filtering could be based on the extracted information (for example, user names or IP addresses). Enrichment and blacklist filtering help fine-tune filtering rules even further.

You have yet another option to send less data to Splunk. The example I heard at Suricon was about iptables. It generates many different fields for each event, but, fortunately, in most cases you only need a small subset of this data. You can use the key=value parser to extract information from the log messages and forward only a fraction of those name-value pairs to Splunk.
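The idea of forwarding only a fraction of the name-value pairs is easy to sketch in Python. The field names below are typical for iptables logs, but the selection of which ones to keep is just an example; which fields matter depends on your dashboards and alerts.

```python
def select_fields(pairs, keep):
    """Return only the name-value pairs whose names are listed in keep."""
    return {name: value for name, value in pairs.items() if name in keep}


# Hypothetical pairs extracted from one iptables log message.
parsed = {
    "IN": "eth0", "OUT": "", "MAC": "00:11:22:33:44:55",
    "SRC": "192.0.2.10", "DST": "198.51.100.7", "LEN": "60",
    "TOS": "0x00", "TTL": "52", "ID": "40309", "PROTO": "TCP",
    "SPT": "51234", "DPT": "22", "WINDOW": "29200",
}

# Only these fields would be forwarded in this example.
forwarded = select_fields(parsed, {"SRC", "DST", "PROTO", "DPT"})
```

In this example four of the thirteen extracted fields are forwarded. Fewer fields per event mean less data to transfer and index on the Splunk side.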

With all the above features introduced, syslog-ng just got an order of magnitude better in helping you achieve more optimal operations and cost efficiency with your Splunk deployment.

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or you can even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/balabit/syslog-ng. On Twitter, I am available as @PCzanik.

Anonymous