Type support: getting started with syslog-ng 4.0

Version 4.0 of syslog-ng is right around the corner. It hasn’t yet been released; however, you can already try some of its features. The largest and most interesting change is type support. Right now, name-value pairs within syslog-ng are represented as text, even if the PatternDB or JSON parsers could see the actual type of the incoming data. This does not change, but starting with 4.0, syslog-ng will keep the type information and use it correctly on the destination side. This makes your life easier, for example, when you store numbers in Elasticsearch or in other type-aware storage.

From this blog, you can learn how type support makes your life easier, and how to give it a test drive on your own hosts.

Before you begin

Some of the 4.0 features are already available in syslog-ng 3.37, but the latest git snapshots have even more to give you. The PatternDB feature I’ll show was only merged to syslog-ng git master in the first week of August. You can build syslog-ng yourself or use one of the pre-built packages. The syslog-ng team makes nightly builds of git snapshots available for Debian and Ubuntu: https://www.syslog-ng.com/community/b/blog/posts/nightly-syslog-ng-builds-for-debian-and-ubuntu

There are regularly published git snapshot builds for openSUSE / SLES and Fedora / RHEL as well at: https://www.syslog-ng.com/community/b/blog/posts/rpm-packages-from-syslog-ng-git-head/

Enabling syslog-ng 4.0 features

You can try the 4.0 features by changing the version string in syslog-ng.conf. Open the file in a text editor, and you will see that it starts with a line like this:

@version:3.37

As long as the version number here is 3.x, it will work as any other 3.x release. But as soon as you change the number to 4.0, syslog-ng will enable the new, currently still experimental features.

@version:4.0

You can check if rewriting the configuration file was successful by doing a quick configuration check. Even with a syntactically correct configuration, it will print a warning message on the terminal:

tumbleweed:~ # syslog-ng -s
[2022-08-04T12:57:46.547841] WARNING: experimental behaviors of the future syslog-ng 4.0 are now enabled. This mode of operation is meant to solicit feedback and allow the evaluation of the new features. USE THIS MODE AT YOUR OWN RISK, please share feedback via GitHub, Gitter.im or email to the authors; config-version='4.0'

In this blog, I will change the configuration version number quite often to be able to demonstrate the difference between the 3.x and 4.0 behavior.
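Switching back and forth by hand gets tedious, so here is a minimal sketch of a shell helper that rewrites the version line. The function name set_syslog_ng_version is made up for this example, and the configuration path in the usage line is an assumption: check where syslog-ng.conf lives on your system.

```shell
# Hypothetical helper: rewrite the @version line of a syslog-ng configuration.
# Usage: set_syslog_ng_version FILE VERSION
set_syslog_ng_version() {
  conf="$1"
  version="$2"
  # Keep a backup copy next to the original before touching it.
  cp "$conf" "$conf.bak" &&
  # Replace only the line starting with @version, leave everything else as is.
  sed "s/^@version:.*/@version:$version/" "$conf.bak" > "$conf"
}

# Example (the path is an assumption, adjust it to your system):
# set_syslog_ng_version /etc/syslog-ng/syslog-ng.conf 4.0
```

Run syslog-ng -s afterwards, as shown above, to verify the result before reloading the configuration.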

Initial testing

My first example is taken from a blog written by Bazsi, the original author of syslog-ng. It is probably the best example to demonstrate the difference between the 3.x and 4.0 behavior. The input and the syslog-ng configuration were the same in both cases, except for the version string at the beginning of the configuration.

He wrote a completely standalone configuration; however I prefer to extend the existing configuration. So, I created a new .conf file under the /etc/syslog-ng/conf.d/ directory, with the following content:

# basic JSON parsing
log {
  source { tcp(port(2000) flags(no-parse)); };
  parser { json-parser(prefix('.json.')); };
  destination { file("/var/log/json.out" template("$(format-json .json.* --shift-levels 2)\n")); };
};

This log path listens on TCP port 2000 (make sure that your firewall does not block it), parses incoming messages with the JSON parser, and stores the results in /var/log/json.out using the JSON template function.

Using netcat, first, I sent a JSON formatted log message to port 2000. Then I changed the version string from 3.37 to 4.0, reloaded the configuration and repeated the same log message. Here is the command line to send a JSON formatted log message:

echo '{"text": "string", "number": 5, "bool": true, "thisisnull": null, "list": [5,6,7,8]}' | nc -q0 localhost 2000

As you can see, it has many different values: text, number, boolean, list and even a null. However, when you take a careful look at how syslog-ng 3.x handles it, you will see that no matter what the original data type was, everything is written as text:

tumbleweed:/etc/syslog-ng/conf.d # cat /var/log/json.out
{"thisisnull":"","text":"string","number":"5","list":"5,6,7,8","bool":"true"}

Of course, type hinting was available in syslog-ng for a while, but using it is pretty inconvenient, especially if you know that the type information was readily available with the incoming data.

Now, rewrite the version string from 3.x to 4.0 and send the same log message again to syslog-ng. You should see a marked difference:

{"thisisnull":null,"text":"string","number":5,"list":["5","6","7","8"],"bool":true}

Almost everything is formatted as expected. The only exception is the list, where all values are still treated as text.

When you read the JSON directly, it does not make a real difference. The human brain interprets a number as a number, even if it is between quotation marks. However, if you forward these logs to Elasticsearch or any kind of log analysis software, you will appreciate that data is formatted properly. You can create graphs from numbers without any further type hinting in syslog-ng or mapping on the destination side.
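If you do not want to trust your eyes, a few lines of shell can tell quoted and unquoted values apart. This is only a rough sketch: the helper name value_kind is made up for this example, and it does a plain string comparison on flat, one-line records like the ones above, not real JSON parsing.

```shell
# Output lines copied from the 3.x and 4.0 runs above.
old='{"thisisnull":"","text":"string","number":"5","list":"5,6,7,8","bool":"true"}'
new='{"thisisnull":null,"text":"string","number":5,"list":["5","6","7","8"],"bool":true}'

# Hypothetical helper: report whether the value of KEY in a one-line JSON
# record starts with a quotation mark ("text") or not ("typed").
# Usage: value_kind JSON KEY
value_kind() {
  case "$1" in
    *"\"$2\":\""*) echo "text" ;;
    *"\"$2\":"*)   echo "typed" ;;
    *)             echo "missing" ;;
  esac
}

# Compare each field of the two records side by side.
for key in thisisnull text number list bool; do
  printf '%-10s 3.x: %-5s 4.0: %s\n' "$key" \
    "$(value_kind "$old" "$key")" "$(value_kind "$new" "$key")"
done
```

Only the text field should be reported as text for the 4.0 record. The list shows up as typed because it became a JSON array, even though its elements are still strings.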

Sudo

If you have sudo version 1.9.4 or later installed, then you can also play with sudo logs. Just enable JSON formatted logging. See: https://www.sudo.ws/posts/2022/05/sudo-for-blue-teams-how-to-control-and-log-better/ for details. Configure it, and first collect the logs with syslog-ng still using a 3.x version string.

If you then use 4.0 as the version string, syslog-ng writes integers as numbers in the JSON formatted logs, instead of writing text everywhere.

PatternDB

The PatternDB parser was type-aware right from the beginning: syslog-ng was looking not just for text in log messages, but also numbers and other types. However, the type information collected was never really used by syslog-ng.

For this blog, I prepared a simplified PatternDB XML file to parse successful SSHD login messages and a syslog-ng configuration that uses it. You can find the content of the XML file below. I saved it under /etc/syslog-ng/sshd.xml:

<?xml version='1.0' encoding='UTF-8'?>
<patterndb version='3' pub_date='2010-07-13'>
  <ruleset name='opensshd' id='2448293e-6d1c-412c-a418-a80025639511'>
    <description>
      This ruleset covers the OpenSSH server.
    </description>
    <url>www.openssh.com</url>
    <pattern>sshd</pattern>
    <rules>
      <rule provider="patterndb" id="4dd5a329-da83-4876-a431-ddcb59c2858c" class="system">
        <patterns>
          <pattern>Accepted @ESTRING:usracct.authmethod: @for @ESTRING:usracct.username: @from @ESTRING:usracct.device: @port @NUMBER:usracct.port@ @ANYSTRING:usracct.service@</pattern>
        </patterns>
        <examples>
          <example>
           <test_message program="sshd">Accepted password for bazsi from 127.0.0.1 port 48650 ssh2</test_message>
           <test_values>
            <test_value name="usracct.username">bazsi</test_value>
            <test_value name="usracct.authmethod">password</test_value>
            <test_value name="usracct.device">127.0.0.1</test_value>
            <test_value name="usracct.port">48650</test_value>
            <test_value name="usracct.service">ssh2</test_value>
           </test_values>
          </example>
        </examples>
        <values>
          <value name="usracct.type">login</value>
          <value name="usracct.sessionid">$PID</value>
          <value name="usracct.application">$PROGRAM</value>
          <value name="secevt.verdict">ACCEPT</value>
        </values>
        <tags>
          <tag>usracct</tag>
          <tag>secevt</tag>
        </tags>
      </rule>
    </rules>
  </ruleset>
</patterndb>

For the curious: it is taken from the syslog-ng PatternDB collection available at https://github.com/balabit/syslog-ng-patterndb/. The collection is not maintained anymore, but it still serves as a good starting point if you want to create your own patterns. This file is based on the SSHD pattern, with most rules removed. In the remaining rule, I added an extra name-value pair for the port number. It is not very useful, as you rarely want to graph these values, but as an easy-to-generate log message, it is good for testing purposes.
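The XML file carries its own test case in the <example> element, so you can check the rule before wiring it into syslog-ng. The sketch below uses pdbtool, the PatternDB utility shipped with syslog-ng. The wrapper name check_sshd_patterns is made up for this example, and I assume that pdbtool is installed and in your PATH.

```shell
# Hypothetical wrapper around pdbtool, which ships with syslog-ng.
# Usage: check_sshd_patterns [PATTERNDB_FILE]
check_sshd_patterns() {
  pdb="${1:-/etc/syslog-ng/sshd.xml}"
  # Run the <example> messages embedded in the file against its rules.
  pdbtool test "$pdb" &&
  # Match a single message by hand and print the resulting name-value pairs.
  pdbtool match -p "$pdb" -P sshd \
    -M "Accepted password for bazsi from 127.0.0.1 port 48650 ssh2"
}

# Example:
# check_sshd_patterns /etc/syslog-ng/sshd.xml
```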

Just as previously, I saved the configuration in a .conf file under /etc/syslog-ng/conf.d/. I will remind you about this later, but for now: the source in the log path is system-specific. The source for local system logs is called src on SUSE, s_sys on Fedora / RHEL and other names on other distributions. Check syslog-ng.conf for the source collecting local log messages.

# sshd patterndb
parser p_patterndb {
    db-parser(
        file(
            '/etc/syslog-ng/sshd.xml'
        ),
        drop-unmatched(
            yes
        )
    );
};

destination d_json {
  file("/var/log/json" template("$(format-json --scope rfc5424 --scope dot-nv-pairs
       --rekey .* --shift 1 --scope nv-pairs --exclude MESSAGE --exclude .journal*)\n\n"));
};

log {
  source(src);
  parser(p_patterndb);
  destination(d_json);
};

The first block is a PatternDB parser. It loads a pattern database and drops any messages that do not match any of the rules.

Next comes a file destination. It uses the JSON template function, including all syslog fields and name-value pairs parsed from messages. It excludes fields from the Journal to make the logs shorter and easier to read.

Finally comes the log path. As mentioned earlier: make sure that the name of the source matches that in syslog-ng.conf on your system. Logs are then parsed by PatternDB and saved to a JSON formatted file.

Now that your configuration is ready, reload syslog-ng. Log in using SSH and check the logs:

{"usracct":{"username":"root","type":"login","sessionid":"4799","service":"ssh2","port":"41658","device":"172.16.167.1","authmethod":"keyboard-interactive/pam","application":"sshd"},"secevt":{"verdict":"ACCEPT"},"classifier":{"rule_id":"4dd5a329-da83-4876-a431-ddcb59c2858c","class":"system"},"SOURCE":"src","PROGRAM":"sshd","PRIORITY":"info","PID":"4799","HOST_FROM":"tumbleweed","HOST":"tumbleweed","FACILITY":"auth","DATE":"Aug  5 16:14:36"}

As you can see, syslog-ng was running in 3.x mode: all data is saved as text. Now rewrite the version string to 4.0 and try again. You should see a minor difference:

{"usracct":{"username":"root","type":"login","sessionid":"5130","service":"ssh2","port":43032,"device":"172.16.167.1","authmethod":"keyboard-interactive/pam","application":"sshd"},"secevt":{"verdict":"ACCEPT"},"classifier":{"rule_id":"4dd5a329-da83-4876-a431-ddcb59c2858c","class":"system"},"SOURCE":"src","PROGRAM":"sshd","PRIORITY":"info","PID":"5130","HOST_FROM":"tumbleweed","HOST":"tumbleweed","FACILITY":"auth","DATE":"Aug  5 16:17:29"}

Nothing stands out visually at first, but if you look carefully enough, you will see that the port number is not quoted. It is treated as a number.
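These records are long, so it can help to print only the fields of interest. Here is a minimal sketch, assuming a grep that supports the -o and -E options (GNU and BSD grep both do). The function name show_typed_fields is made up for this example.

```shell
# Hypothetical helper: print only the selected fields of JSON records read
# from standard input, one "key":value pair per line.
show_typed_fields() {
  grep -oE '"(port|sessionid|PID)":[^,}]*'
}

# Example (file name taken from the d_json destination above):
# tail -n 1 /var/log/json | show_typed_fields
```

A quoted value in the output means that the field was stored as text, and an unquoted one means that it was stored as a number.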

The session ID, derived from the PID, is still text. The reason is simple. Name-value pairs created in the <values> part of the rule inherit the type of the original data. Even if we know that it is a number, syslog-ng treats it as text.

This is where a rewrite operation can come in handy, as it can also set types. Insert the following line before the PatternDB parser in the log path:

rewrite { set(int("${PID}") value("PID")); };

Now check the logs:

{"usracct":{"username":"root","type":"login","sessionid":5309,"service":"ssh2","port":41404,"device":"172.16.167.1","authmethod":"keyboard-interactive/pam","application":"sshd"},"secevt":{"verdict":"ACCEPT"},"classifier":{"rule_id":"4dd5a329-da83-4876-a431-ddcb59c2858c","class":"system"},"SOURCE":"src","PROGRAM":"sshd","PRIORITY":"info","PID":5309,"HOST_FROM":"tumbleweed","HOST":"tumbleweed","FACILITY":"auth","DATE":"Aug  5 16:23:58"}

As you can see, both PID and sessionid are now treated as numbers. Similar rewrite rules also work if you have name-value pairs that do not come from the JSON or PatternDB parsers. In that case, you have to set the type yourself.

You might already have a huge database and related tools that expect PID to be a text field. Starting just last week (on 4th August), you can set the type in PatternDB <value> as well. Remove the rewrite part from the configuration, and instead, edit sshd.xml to set the type of sessionid:

          <value name="usracct.sessionid">int($PID)</value>

After reloading syslog-ng, log in again using SSH. You should see something similar:

{"usracct":{"username":"root","type":"login","sessionid":5495,"service":"ssh2","port":48236,"device":"172.16.167.1","authmethod":"keyboard-interactive/pam","application":"sshd"},"secevt":{"verdict":"ACCEPT"},"classifier":{"rule_id":"4dd5a329-da83-4876-a431-ddcb59c2858c","class":"system"},"SOURCE":"src","PROGRAM":"sshd","PRIORITY":"info","PID":"5495","HOST_FROM":"tumbleweed","HOST":"tumbleweed","FACILITY":"auth","DATE":"Aug  5 16:29:05"}

PID is still text, but sessionid is properly set as a number.

What is next?

Now that you have seen syslog-ng writing logs to JSON formatted files with proper typing, it is time to test another destination. It could be Elasticsearch, MongoDB, or an SQL database. I did my tests with AWS’s OpenSearch, using the elasticsearch-http() destination of syslog-ng, and everything worked as expected. Getting started with OpenSearch was a bit rough, but then I remembered that I already documented how to get started with it: Opensearch and syslog-ng
