Syslog-ng 101, part 11: Enriching log messages

This is the eleventh part of my syslog-ng tutorial. Last time, we learned about message parsing using syslog-ng. Today, we learn about enriching log messages.

You can watch the video or read the text below.

Enriching log messages

You can also enrich log messages using syslog-ng. Enriching in this case means that you can create additional name-value pairs based on message content. There are several ways to enrich log messages using syslog-ng.

The PatternDB parser can not only parse out interesting information from log messages, but can also create additional name-value pairs based on message content. You can add fields in the XML database that describe the content of the message. For example, you can mark any login-related events with “action=login”, and if the message is about an unsuccessful login, then also with “status=failure”.

The GeoIP parser can find the geolocation of an IP address. The software itself is freely available, but the database it uses requires registration and is no longer distributed as part of Linux distributions. The original implementation just returned the country and longitude / latitude information. The current implementation returns much more information, and in multiple languages.

Geographical information can help to find anomalies, like a user logging in from two distant locations at once. A probably less useful, but a lot more popular use of GeoIP is displaying the location of IP addresses on a map. It is mostly eye-candy for C-levels, but a spectacular map can help you to get extra funding for security :-)

The add-contextual-data() parser can add metadata to log messages from CSV files. For example, you can add a host role or a contact person to a log message. This way, you can see the extra information while browsing your log messages, without needing an additional lookup. The additional information can also enable more accurate alerts and dashboards.
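To make the idea of CSV-based enrichment more concrete, here is a minimal Python sketch of the lookup. It is only an illustration of the concept, not how syslog-ng implements it. The three-column layout (selector, name, value), the host names, the contact address and the "meta." prefix are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical lookup table: selector (here a host name), name, value.
CONTEXT_CSV = """\
web01,role,webserver
web01,contact,alice@example.com
db01,role,database
"""


def load_context(text):
    """Build {selector: {name: value}} from CSV text."""
    table = {}
    for selector, name, value in csv.reader(io.StringIO(text)):
        table.setdefault(selector, {})[name] = value
    return table


def enrich(message, table, selector_field="HOST", prefix="meta."):
    """Return a copy of the message with the matching metadata added."""
    enriched = dict(message)
    metadata = table.get(message.get(selector_field), {})
    for name, value in metadata.items():
        enriched[prefix + name] = value
    return enriched


context = load_context(CONTEXT_CSV)
msg = {"HOST": "web01", "MESSAGE": "disk almost full"}
print(enrich(msg, context))
```

A message from a host that is not listed in the table is returned unchanged in this sketch.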

Using loggen to read a file

We have already seen earlier how to use loggen to send synthetic log messages to a network source. Here we extend the command line we used previously by adding file reading to the mix:

loggen -i -S -d -R /root/iptables_nohead_short localhost 514

The options used here are:

  • -i: use an Internet (IP) socket

  • -S: use a stream socket (TCP or unix-stream)

  • -d: do not parse the lines read from the file

  • -R /path/to/file: read log messages from a file

  • localhost 514: the target host and port

Why do we use the “do not parse” option here? For the sample configuration, we want to send logs without the original date, that is, just the message part. The original date is not a real problem here, but it will be a problem when we send the logs to Elasticsearch.

Iptables sample logs

Working with iptables logs is a nice way to get started with message parsing. They follow the key=value format and can easily be parsed by syslog-ng. We can then use the results of the parsing to find the geographical location of the source IP address. Here are a few sample log messages:

Feb 27 14:31:01 bridge kernel: INBOUND UDP: IN=br0 PHYSIN=eth0 OUT=br0 PHYSOUT=eth1 SRC=212.123.153.188 DST=11.11.11.82 LEN=404 TOS=0x00 PREC=0x00 TTL=114 ID=19973 PROTO=UDP SPT=4429 DPT=1434 LEN=384  
Feb 27 14:34:41 bridge kernel: INBOUND TCP: IN=br0 PHYSIN=eth0 OUT=br0 PHYSOUT=eth1 SRC=206.130.246.2 DST=11.11.11.100 LEN=40 TOS=0x00 PREC=0x00 TTL=51 ID=9492 DF PROTO=TCP SPT=2577 DPT=80 WINDOW=17520 RES=0x00 ACK FIN URGP=0  
Feb 27 14:34:55 bridge kernel: INBOUND TCP: IN=br0 PHYSIN=eth0 OUT=br0 PHYSOUT=eth1 SRC=4.60.2.210 DST=11.11.11.83 LEN=48 TOS=0x00 PREC=0x00 TTL=113 ID=3024 DF PROTO=TCP SPT=3124 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0 

To avoid duplicating the date part in the logs, we remove it before sending the logs to syslog-ng using loggen. Loggen generates proper message headers based on the current date:

kernel: INBOUND UDP: IN=br0 PHYSIN=eth0 OUT=br0 PHYSOUT=eth1 SRC=212.123.153.188 DST=11.11.11.82 LEN=404 TOS=0x00 PREC=0x00 TTL=114 ID=19973 PROTO=UDP SPT=4429 DPT=1434 LEN=384  
kernel: INBOUND TCP: IN=br0 PHYSIN=eth0 OUT=br0 PHYSOUT=eth1 SRC=206.130.246.2 DST=11.11.11.100 LEN=40 TOS=0x00 PREC=0x00 TTL=51 ID=9492 DF PROTO=TCP SPT=2577 DPT=80 WINDOW=17520 RES=0x00 ACK FIN URGP=0  
kernel: INBOUND TCP: IN=br0 PHYSIN=eth0 OUT=br0 PHYSOUT=eth1 SRC=4.60.2.210 DST=11.11.11.83 LEN=48 TOS=0x00 PREC=0x00 TTL=113 ID=3024 DF PROTO=TCP SPT=3124 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0 
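The text does not say how the date was removed from the original file, so the following Python sketch is just one possible way to do it. It assumes that every line starts with a BSD-style timestamp followed by a host name, as in the samples above. The sample line in the code is a shortened version of the first log message.

```python
import re

# BSD syslog timestamp (for example "Feb 27 14:31:01") followed by the host name.
HEADER = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ ")


def strip_header(line):
    """Remove the leading date and host name, if present."""
    return HEADER.sub("", line, count=1)


# Shortened sample line, for illustration only.
sample = "Feb 27 14:31:01 bridge kernel: INBOUND UDP: IN=br0 SRC=212.123.153.188 DPT=1434"
print(strip_header(sample))
```

Lines that do not start with such a header pass through unchanged, so running the function on an already cleaned file does no harm.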

Example

This example configuration collects iptables logs over the network, parses them, and adds geographical information to source IP addresses using the GeoIP parser. Finally, it writes the resulting name-value pairs into a JSON-formatted file.

@version:3.19
source s_sys { system(); internal();};
destination d_mesg { file("/var/log/messages"); };
log { source(s_sys); destination(d_mesg); };

parser p_kv {kv-parser(prefix("kv.")); };
parser p_geoip2 { geoip2( "${kv.SRC}", prefix( "geoip2." ) database( "/usr/share/GeoIP/GeoLite2-City.mmdb" ) ); };

source s_tcp { tcp(port(514)); };
destination d_file {
  file("/var/log/fromnet" template("$(format-json --scope rfc5424
        --scope dot-nv-pairs --rekey .* --shift 1 --scope nv-pairs 
        --exclude DATE @timestamp=${ISODATE})\n\n")  ); 
};
log {
  source(s_tcp); 
  parser(p_kv);
  parser(p_geoip2);
  destination(d_file);
};

Support for GeoIP is usually a separate subpackage on Linux systems. On FreeBSD, it is not part of the default package configuration, which means that you cannot use the package, but have to compile syslog-ng from ports yourself.

Note that downloading the database for the GeoIP parser is outside the scope of this tutorial.

As usual, the first few lines of the configuration deal with local log messages. The interesting part comes afterwards. Let’s follow the log statement at the end, as this is what connects all the building blocks together.

The first line opens a TCP source on port 514. The next line is where things start to get interesting. It calls a key=value parser on incoming log messages. Here, prefix("kv.") means that the names of all resulting name-value pairs will start with “kv.”.
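To illustrate what the key=value parsing step produces, here is a rough Python sketch. It is a simplified stand-in for demonstration purposes: it only splits on whitespace and on the first “=” sign, and does not try to reproduce everything the real kv-parser() does. The sample message is a shortened version of one of the iptables logs above.

```python
def parse_kv(message, prefix="kv."):
    """Collect KEY=VALUE tokens from a message into prefixed name-value pairs."""
    pairs = {}
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep and key:  # skip tokens without "=", like "DF" or "SYN"
            pairs[prefix + key] = value
    return pairs


# Shortened sample message, for illustration only.
sample = "INBOUND TCP: IN=br0 SRC=4.60.2.210 DST=11.11.11.83 DF PROTO=TCP SPT=3124 DPT=80 SYN"
for name, value in parse_kv(sample).items():
    print(name, "=", value)
```

The source address ends up under the name kv.SRC, which is the name the GeoIP parser refers to in the configuration.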

Next, the GeoIP parser is called. It looks for the IP address in the kv.SRC name-value pair and stores the resulting information in name-value pairs under the “geoip2.” prefix.

Finally, log messages are written to a file using JSON formatting. You can find more information about template functions in the documentation. Here I want to point you to the --rekey option, which, together with --shift 1, removes the leading dot from the names of name-value pairs. The leading dot is normally replaced by an underscore by syslog-ng, but it has a special meaning in Elasticsearch. We also remove the DATE macro and include ISODATE with the name expected by Elasticsearch. Finally, we add two line breaks for better readability. Of course, we will remove those when we use the same template to send logs to Elasticsearch.
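The renaming steps described above can be sketched in a few lines of Python. This is only a model of the effect on a flat set of name-value pairs, not the actual format-json implementation. The dot-prefixed name and the timestamp value are made up for the example.

```python
def shape_for_json(pairs, isodate):
    """Mimic the effect of --rekey .* --shift 1, --exclude DATE and @timestamp=..."""
    shaped = {}
    for name, value in pairs.items():
        if name == "DATE":
            continue  # dropped, like --exclude DATE
        if name.startswith("."):
            name = name[1:]  # cut one leading character, like --shift 1
        shaped[name] = value
    shaped["@timestamp"] = isodate  # added under the name Elasticsearch expects
    return shaped


pairs = {
    ".SDATA.meta.sequenceId": "1",  # illustrative dot-prefixed name
    "DATE": "Feb 27 14:34:55",
    "kv.SRC": "4.60.2.210",
}
# Hypothetical ISO 8601 timestamp, standing in for ${ISODATE}.
print(shape_for_json(pairs, "2024-02-27T14:34:55+00:00"))
```

Names that do not start with a dot, such as kv.SRC, are left as they are.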

If you have any questions or comments, leave a comment on YouTube or reach out to me on Twitter / Mastodon.

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/syslog-ng/syslog-ng. On Twitter, I am available as @PCzanik, on Mastodon as @Pczanik@fosstodon.org.