Syslog-ng 101, part 10: Parsing

7 Mar 2023

This is the tenth part of my syslog-ng tutorial. Last time, we learned about syslog-ng filters. Today, we learn about message parsing using syslog-ng.

You can watch the video or read the text below.

Parsing

Parsing createsname-value pairs from log messages using parsers. It is probably the most interesting but also the most complex part of syslog-ng.

By default, syslog-ng tries to parse all incoming log messages as if they were formatted according to the RFC 3164 or old/BSD syslog specification. This creates a number of macros, including MESSAGE, which contains the actual log message. You can then use other parsers to further parse the content of the MESSAGE macro. It does not stop here: you can parse the content of the resulting macros as well. This way, you can create complex parser chains that extract useful information from log messages.

When we learned about sources, I mentioned the no-parse flag. This way, RFC 3164 parsing is disabled, and you can parse the whole message. This is useful for a JSON or CSV formatted log message.

Why is message parsing important? There are two main use cases. Having log messages available as name-value pairs allows a lot more precise filtering. For example, you can create alerts within syslog-ng for a specific username in login messages. It is also possible to save/forward only relevant data from a longer log message, saving significant amount of storage and/or network traffic.

PatternDB parser

The PatternDB message parser can extract information from unstructured messages into name-value pairs. Not just that, as it can also add status fields to log messages based on message text and do message classification, like LogCheck.

Of course, syslog-ng does not know what is inside the log messages by itself. It needs an XML database describing log messages. There are some sample XML databases available online, but mostly you are on your own creating these databases for your logs. For example, in case of an SSH login failure can be described as:

Parsed: app=sshd, user=root, source_ip=192.168.123.45
Added: action=login, status=failure
Classified as “violation”

JSON parser

The JSON parser turns JSON-based log messages into name-value pairs. Yes, JSON is a structured log format. However, all incoming log messages are treated by syslog-ng as plain text. You have to instruct syslog-ng to use a parser and turn the message into name-value pairs.

CSV parser

The CSV parser can parse columnar data into name-value pairs. A typical example is the Apache access log file, even if the fields are not separated by commas. In this example, you can see that each column has a name. Later, one of the resulting name-value pairs, the name of the authenticated user, is used in a file name.

parser p_apache {
    csv-parser(columns("APACHE.CLIENT_IP", "APACHE.IDENT_NAME", "APACHE.USER_NAME",
        "APACHE.TIMESTAMP", "APACHE.REQUEST_URL", "APACHE.REQUEST_STATUS",
        "APACHE.CONTENT_LENGTH", "APACHE.REFERER", "APACHE.USER_AGENT",
        "APACHE.PROCESS_TIME", "APACHE.SERVER_NAME")
         flags(escape-double-char,strip-whitespace) delimiters(" ") quote-pairs('""[]')
         );
};
destination d_file { file("/var/log/messages-${APACHE.USER_NAME:-nouser}"); };
log { source(s_local); parser(p_apache); destination(d_file);};

Key=value parser

The key=value parser can find key=value pairs in log messages. It was introduced in syslog-ng 3.7. This format is typical for firewall logs, but also used by many other applications. Here are some example log messages:

Aug 4 13:22:40 centos kernel: IPTables-Dropped: IN= OUT=em1 SRC=192.168.1.23 DST=192.168.1.20 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=59228 SEQ=2

Aug 4 13:23:00 centos kernel: IPTables-Dropped: IN=em1 OUT= MAC=a2:be:d2:ab:11:af:e2:f2:00:00 SRC=192.168.2.115 DST=192.168.1.23 LEN=52 TOS=0x00 PREC=0x00 TTL=127 ID=9434 DF PROTO=TCP SPT=58428 DPT=443 WINDOW=8192 RES=0x00 SYN URGP=0

Further parsers

There are several other parsers in syslog-ng. The XML parser can parse XML formatted log messages, typically used by Windows applications. There is a dedicated parser for Linux Audit logs. There are many non-standard date formats. The date parser can help in this case, which can be configured using templates. It saves the date to the sender date.

SCL: syslog-ng configuration library

As mentioned earlier, the syslog-ng configuration library has many parsers. These are implemented in configuration, combining several of the existing parsers.

The Apache parser can parse Apache access logs. It builds on the CSV parser, but also combines it with the date parser and rewrites part of the results to be more human readable.

The sudo parser can extract information from sudo log messages, so it is easy to alert on log messages if something nasty happens.

Log messages from Cisco devices are similar to syslog messages, however, quite often they cannot be parsed by syslog parsers, as they do not comply with specifications. The Cisco parser of syslog-ng can parse many kinds of Cisco log messages. Of course, not all Cisco log messages, only those for which we received log samples.

Python parser

The Python parser was first released in syslog-ng 3.10. It can parse complex data formats, where simply combining various built-in parsers is not enough. It can also be used to enrich log messages from external data sources, like SQL, DNS or whois.

The main drawback of the Python parser is speed and resource usage. C is a lot more efficient. However, for the vast majority of users, this is not a bottleneck. Python also has the advantage that it does not need compilation or a dedicated development environment. For these reasons, the Python scripts are also easier to spread among users than native C.

Application adapters, Enterprise wide message model

As mentioned earlier, the syslog-ng configuration library contains a number of parsers. These are also called Application Adapters. There is a growing list of parsers. Using these you can easily parse log messages automatically, without any additional configuration. This is possible, because Application Adapters are enabled for the system() source since syslog-ng version 3.13.

The Enterprise wide message model (EWMM) allows forwarding name-value pairs between syslog-ng instances. It is made possible by using JSON formatting. It can also forward the original raw message. It is important, as by default, syslog-ng does not send the original message, but what it can reconstruct form it using templates. The original, often broken, formatting is lost. However, some log analytics software expects to receive the broken message format instead of the standards compliant one.

Example

You might have seen this example configuration a few times before if you followed my tutorial series. This is a good example for Application Adapters. You do not see any parser declarations in the configuration, but you can see working with the results of message parsing. In this case, name-value pairs are parsed from sudo log messages:

@version:3.21
@include "scl.conf"
source s_sys { system(); internal();};
destination d_mesg { file("/var/log/messages"); };
log { source(s_sys); destination(d_mesg); };
filter f_sudo {program(sudo)};
destination d_test {
  file("/var/log/sudo.json"
  template("$(format-json --scope nv_pairs --scope dot_nv_pairs --scope rfc5424)\n\n"));
};
log {
    source(s_sys);
    filter(f_sudo);
    if (match("czanik" value(".sudo.SUBJECT"))) {
        destination { file("/var/log/sudo_filtered"); };
    };
    destination(d_test);
};

If you have any questions or comments, leave a comment on YouTube or reach out to me on Twitter / Mastodon.

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/syslog-ng/syslog-ng. On Twitter, I am available as @PCzanik, on Mastodon as @Pczanik@fosstodon.org.