Sometimes you have many log messages from an app, but none of them have the exact content you need. This is where the grouping-by() parser of syslog-ng can help. It allows you to aggregate information from multiple log messages into a single message.

In this blog post, I show you how to parse sshd logs using the patterndb parser of syslog-ng, and then use grouping-by() to create an aggregate message from the session opening and closing log messages.

Before you begin

The grouping-by() parser was introduced in syslog-ng version 3.8, so all currently available syslog-ng versions support it, except for the official SLES 12 and EPEL 7 packages, which ship older versions.

You also need patterns (XML files describing the content of log messages) to parse sshd log messages. These are available in the syslog-ng example patterns project. The project dates back to 2010, so it is over a decade old, but it still works as expected. You can find the sshd patterns in the access directory: https://github.com/balabit/syslog-ng-patterndb/tree/master/access

The logs

Here are some sample log messages from sshd. The first one is generated when a user logs in; the second one is generated when the session is closed.

Mar  4 09:02:43 172.16.167.182 sshd[1295]: Accepted password for root from 172.16.167.1 port 37766 ssh2
Mar  4 09:02:56 172.16.167.182 sshd[1295]: pam_unix(sshd:session): session closed for user root

We want a single log message, created when the user logs out, that contains both the user name and the source host. It should look something like this:

Mar  4 09:02:56 172.16.167.182 sshd[1295]: CzP ssh user root from 172.16.167.1 logged out

Of course, this is probably not the most useful log message, but it is good enough to demonstrate the possibilities in syslog-ng.

Configuring patterndb

As a first step, we build a configuration that collects logs from a network source, parses them using patterndb, and saves them both to a plain text file and to a JSON-formatted file, so we can see the parsed name-value pairs.

# source for Linux clients, RFC3164
source s_lin {
  tcp(port(514));
};

# patterndb parser for sshd connection logs
parser p_sshd {
  db-parser(file("/opt/syslog-ng/etc/sshd.pdb"));
};

# destination for Linux logs
destination d_fromlin {
  file("/var/log/fromlin");
  file("/var/log/fromlin.json" template("$(format-json --scope rfc5424 --scope dot-nv-pairs
        --rekey .* --shift 1 --scope nv-pairs)\n") );
};

# log path for Linux logs
log {
  source(s_lin);
  parser(p_sshd);
  destination(d_fromlin);
};

Here are the JSON-formatted sshd logs:

{"usracct":{"username":"root","type":"login","sessionid":"1295","service":"ssh2","device":"172.16.167.1","authmethod":"password","application":"sshd"},"secevt":{"verdict":"ACCEPT"},"classifier":{"rule_id":"4dd5a329-da83-4876-a431-ddcb59c2858c","class":"system"},"SOURCE":"s_lin","PROGRAM":"sshd","PRIORITY":"info","PID":"1295","MESSAGE":"Accepted password for root from 172.16.167.1 port 37766 ssh2","LEGACY_MSGHDR":"sshd[1295]:","HOST_FROM":"172.16.167.182","HOST":"172.16.167.182","FACILITY":"authpriv","DATE":"Mar  4 09:02:43"}
{"usracct":{"username":"root","type":"logout","sessionid":"1295","application":"sshd"},"classifier":{"rule_id":"9febec68-13ef-4ed2-97f3-689df4d49a8a","class":"system"},"SOURCE":"s_lin","PROGRAM":"sshd","PRIORITY":"info","PID":"1295","MESSAGE":"pam_unix(sshd:session): session closed for user root","LEGACY_MSGHDR":"sshd[1295]: ","HOST_FROM":"172.16.167.182","HOST":"172.16.167.182","FACILITY":"authpriv","DATE":"Mar  4 09:02:56"}

Obviously, JSON-formatted log messages are a pain to read from text files. However, this is the easiest way to store name-value pairs from message parsing. Another option is to store parsed logs in Elasticsearch or MongoDB. Either way, you can now check whether message parsing works as expected: for example, usracct.type is set to "login" in the first message and to "logout" in the second one.
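If you mainly want a quick visual check of the parsed values, a plain-text template that prints only the interesting name-value pairs can be easier to read than the full JSON output. Here is a minimal sketch; the destination name d_sshd_kv and the /var/log/sshd-parsed path are hypothetical:

# hypothetical destination: one line per message with the parsed sshd fields
destination d_sshd_kv {
  file("/var/log/sshd-parsed"
    template("${DATE} ${HOST} type=${usracct.type} user=${usracct.username} from=${usracct.device} session=${usracct.sessionid}\n")
  );
};

To try it, add destination(d_sshd_kv); to the log path next to destination(d_fromlin);. Fields that a given message does not have (for example, usracct.device in the logout message) are printed as empty values.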

Configuring grouping-by()

Message correlation was originally available only as part of the PatternDB parser. As syslog-ng gained many more parsers, it became necessary to make correlation independent of PatternDB. With grouping-by(), you can correlate messages based on name-value pairs created by any parser.
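To illustrate correlating values created by a parser other than patterndb, here is a sketch that uses kv-parser(). It assumes hypothetical key=value formatted messages, such as "session=42 event=start" and "session=42 event=end"; the parser names, the kv. prefix, the field names, and the timeout are all made up for this example:

# hypothetical: split the key=value pairs of the message into kv.* name-value pairs
parser p_kv {
  kv-parser(prefix("kv."));
};

# hypothetical: group messages by session and emit one message when the session ends
parser p_groupkv {
  grouping-by(
    key("${kv.session}")
    scope("host")
    timeout(600)
    trigger("${kv.event}" eq "end")
    aggregate(
      value("MESSAGE" "session ${kv.session} ended")
      inherit-mode("context")
    )
    inject-mode("pass-through")
  );
};

In a log path, p_kv has to come before p_groupkv, so that the kv.* name-value pairs already exist when grouping-by() evaluates its key and trigger.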

You can find the grouping-by() rule for aggregating sshd logs below:

# aggregating sshd logs
parser p_groupsshd {
   grouping-by(
     key("${PID}")
     scope("process")
     timeout(3600)
     trigger("${usracct.type}" eq "logout")
     aggregate(
       value("MESSAGE" "CzP ssh user ${usracct.username} from ${usracct.device} logged out")
       inherit-mode("context")
     )
     inject-mode("pass-through")
     where("${PROGRAM}" eq "sshd")
   );
};

You also need to change the log path to include the grouping-by() parser:

# log path for Linux logs
log {
  source(s_lin);
  parser(p_sshd);
  parser(p_groupsshd);
  destination(d_fromlin);
};

You can find a lot more information about the grouping-by() parser in the documentation: https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.38/administration-guide/100#TOPIC-2026512

In this blog post, I focus on the key configuration options:

  • All messages of an sshd session have the same process ID (PID). Therefore, key() is set to PID and scope() to process.

  • We only want to check sshd logs, so the where() filter only lets through messages where PROGRAM is sshd.

  • The user is logged out when the name-value pair “usracct.type” is set to “logout”. We use this as the trigger: when it matches, syslog-ng closes the context and creates the aggregate message.

  • The timeout() option is mandatory. Here it is set to 3600 seconds (one hour), which should be fine in most cases.

  • The aggregate log message contains both the user name and the source host. And if you wonder what “CzP” stands for: all my log messages contain my initials to make them easier to find :-)
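With inject-mode("pass-through"), the aggregate message continues along the same log path as the original messages, so it ends up in the same files. If you prefer to collect the aggregate messages separately, one option is to split the log path using embedded log statements and a filter on the message text. In the following sketch, f_czp_aggregate, d_sshsessions, and the /var/log/sshsessions path are hypothetical, and the filter relies on the “CzP ssh user” prefix of the aggregate message. It replaces the log path shown earlier:

# hypothetical filter: matches only the aggregate messages
filter f_czp_aggregate {
  message("CzP ssh user");
};

# hypothetical destination for the aggregate messages
destination d_sshsessions {
  file("/var/log/sshsessions");
};

log {
  source(s_lin);
  parser(p_sshd);
  parser(p_groupsshd);
  # every message, including the aggregate ones, goes to the original files
  log { destination(d_fromlin); };
  # only the aggregate messages go to the separate file
  log { filter(f_czp_aggregate); destination(d_sshsessions); };
};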

Testing

The configuration above receives logs over the network. On a RHEL machine, a configuration snippet like the following forwards all local logs to the syslog-ng server:

destination d_net {
  tcp("172.16.167.170" port(514));
};
log {
  source(s_sys);
  destination(d_net);
};

On other operating systems, the source name for local logs might be different. Of course, the IP address of the syslog-ng server on your network will also be different, and on a production system you might want to filter what to forward, use a disk-buffer, enable encryption, and so on. For testing, the above configuration is good enough.
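As an example of filtering what to forward, the following sketch sends only sshd messages and buffers them on disk while the syslog-ng server is not reachable. It assumes the s_sys source of the RHEL configuration used above; the f_sshd and d_net_sshd names are hypothetical, and the disk-buffer() size is only a placeholder to adjust for your environment:

filter f_sshd {
  program("sshd");
};

destination d_net_sshd {
  tcp("172.16.167.170" port(514)
    # placeholder size: 10 MiB of on-disk buffer, non-reliable mode
    disk-buffer(
      reliable(no)
      disk-buf-size(10485760)
    )
  );
};

log {
  source(s_sys);
  filter(f_sshd);
  destination(d_net_sshd);
};

Use this log path instead of the forward-everything one, not in addition to it; otherwise, sshd messages are sent to the server twice.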

Once you have checked that log forwarding works as expected, connect to the host using ssh. After logging out, you should see a similar message in your logs:

Mar  4 09:02:56 172.16.167.182 sshd[1295]: CzP ssh user root from 172.16.167.1 logged out
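If you do not have a second machine at hand, you can also replay the two sample sshd messages from a file. This is a minimal sketch, assuming a hypothetical /tmp/sshd-test.log test file and a hypothetical s_test source name. By default, syslog-ng parses each new line of a file source as a syslog message, so PROGRAM and PID are set just like for messages received over the network:

# hypothetical test source: append the sample sshd lines to this file
source s_test {
  file("/tmp/sshd-test.log");
};

log {
  source(s_test);
  parser(p_sshd);
  parser(p_groupsshd);
  destination(d_fromlin);
};

After reloading syslog-ng, append the login line and then the logout line from “The logs” section to the file, and the aggregate message should appear in /var/log/fromlin.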

What is next?

Using the sshd patterns from GitHub and the configuration from above is an easy way to get started with grouping-by(). The next step is getting it to work with your own logs. If your logs are unstructured, you might need to write patterns for them yourself. The documentation and the example patterns can help with that. If you work with structured logs, you can focus on the grouping-by() rules as soon as you understand your log messages.

-

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/syslog-ng/syslog-ng. On Twitter, I am available as @PCzanik, on Mastodon as @Pczanik@fosstodon.org.
