The grouping-by() parser in syslog-ng 3.8

Until recently, correlating and aggregating information from multiple messages was the domain of the PatternDB parser. The limitation of that implementation was that it only worked with data extracted by PatternDB. There are now many more parsers: the CSV parser for columnar data, the JSON parser for logs in JSON format, and the recently introduced key=value parser.

In my previous blog post about parsing Zorp logs, I demonstrated that in some cases you need to use multiple parsers. If you take another look at the log messages, you will see that while every message contains information that uniquely identifies a session (here svc/http#0/http/intraPLUGinter:267346), the rest of the information is spread across multiple messages:

2016-03-04T07:10:19-05:00 zorp/http[3486]: core.session(3): (svc/http#0/http/intraPLUGinter:267346): Starting proxy instance; client_fd='32', client_address='AF_INET(', client_zone='Zone(office)', client_local='AF_INET(', client_protocol='TCP'
2016-03-04T07:10:19-05:00 zorp/http[3486]: core.session(3): (svc/http#0/http/intraPLUGinter:267346/http#0/plug): Server connection established; server_fd='35', server_address='AF_INET(', server_zone='Zone(internet)', server_local='AF_INET(', server_protocol='TCP'
2016-03-04T07:10:19-05:00 zorp/http[3486]: core.summary(4): (svc/http#0/http/intraPLUGinter:267346): Connection summary; rule_id='51', session_start='1451980783', session_end='1451980784', client_proto='TCP', client_address='', client_port='56084', client_zone='office', server_proto='TCP', server_address='', server_port='443', server_zone='internet', client_local='', client_local_port='443', server_local='', server_local_port='46472', verdict='ACCEPTED', info=''
2016-03-04T07:10:19-05:00 zorp/http[3486]: core.accounting(4): (svc/http#0/http/intraPLUGinter:267346/plug/client): accounting info; type='ZStreamFD', duration='1', sent='14643', received='984'
2016-03-04T07:10:19-05:00 zorp/http[3486]: core.accounting(4): (svc/http#0/http/intraPLUGinter:267346/http#0/plug/server): accounting info; type='ZStreamFD', duration='1', sent='984', received='14643'
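
To see why two parsers are needed, here is a small Python sketch that mimics them on the two accounting messages above. It is illustrative only, not syslog-ng code: one regular expression stands in for the PatternDB rule that produces zorppdb.sessnum (assuming the session number is the digits after the colon in the session identifier), and another stands in for the key=value parser storing values under the zorpkv. prefix. The regexes and the parse_zorp() name are hypothetical.

```python
import re

# Illustrative stand-ins, not syslog-ng code:
# SESSION_RE plays the role of the PatternDB rule that yields zorppdb.sessnum
# (assumption: the session number is the digits after the colon),
# KV_RE plays the role of the key=value parser with the "zorpkv." prefix.
SESSION_RE = re.compile(r"\(svc/[^:()]+:(\d+)")
KV_RE = re.compile(r"(\w+)='([^']*)'")


def parse_zorp(line: str) -> dict[str, str]:
    """Return the fields of one Zorp message, named like the syslog-ng macros."""
    fields: dict[str, str] = {}
    session = SESSION_RE.search(line)
    if session:
        fields["zorppdb.sessnum"] = session.group(1)
    for key, value in KV_RE.findall(line):
        fields["zorpkv." + key] = value
    return fields


# The two accounting messages from the sample above, verbatim.
client_side = ("2016-03-04T07:10:19-05:00 zorp/http[3486]: core.accounting(4): "
               "(svc/http#0/http/intraPLUGinter:267346/plug/client): accounting info; "
               "type='ZStreamFD', duration='1', sent='14643', received='984'")
server_side = ("2016-03-04T07:10:19-05:00 zorp/http[3486]: core.accounting(4): "
               "(svc/http#0/http/intraPLUGinter:267346/http#0/plug/server): accounting info; "
               "type='ZStreamFD', duration='1', sent='984', received='14643'")

for line in (client_side, server_side):
    print(parse_zorp(line))
```

Both messages yield the same zorppdb.sessnum but different zorpkv.received values; a shared key like this is what lets a correlating parser collect them into one context.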

Now I want to introduce you to a new parser called grouping-by(). It was added during the development of syslog-ng 3.8, which has not been released yet, so there is no documentation for grouping-by() yet. Fortunately, the commit message contains a great deal of information on how it works. In short, grouping-by() can correlate and aggregate information independently of PatternDB, but since much of its code is shared with PatternDB, the PatternDB documentation is also a useful source on grouping-by() functionality:

In our case, the session identifier comes from PatternDB, while the values to be merged come from the key=value parser. Below you can see the configuration of grouping-by(), which aggregates the client’s IP address (zorpkv.client_address) and the number of downloaded bytes (zorpkv.received) into a single message, using the session identifier (zorppdb.sessnum) as the key. The scope is the process (for what this means, see the PatternDB documentation). The context times out after five seconds, which allows us to use this key even though, in theory, the same zorppdb.sessnum could belong to multiple sessions. An aggregate message is only generated if zorpkv.client_address is present in the third message counting from the end of the context's message list, which is what ${zorpkv.client_address}@3 refers to.

parser p_groupingby {
  grouping-by(key("${zorppdb.sessnum}") scope("process") timeout(5)
    having("${zorpkv.client_address}@3" ne "")
    aggregate(value("MESSAGE" "CzP session: ${zorppdb.sessnum} client: ${zorpkv.client_address}@3 down: ${zorpkv.received}\n\n"))
  );
};
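
The interplay of key(), scope(), timeout(), having() and the @3 reference is easier to follow in a toy model. The Python sketch below is an approximation written for illustration, not syslog-ng's implementation. It assumes that a context closes once no new message has arrived for timeout() seconds, that process scope means HOST, PROGRAM and PID must match in addition to the key, and that a bare ${name}, like ${name}@1, refers to the last message of the context. The class names, the host name fw1 and all IP addresses are hypothetical placeholders.

```python
from dataclasses import dataclass, field

# A toy model of the grouping-by() configuration above; not syslog-ng code.
# Assumptions: a context closes once it has been idle for timeout() seconds,
# ${name}@1 (and plain ${name}) mean the last message, @3 the third from the end.


@dataclass
class Context:
    messages: list[dict[str, str]] = field(default_factory=list)
    last_seen: float = 0.0

    def at(self, name: str, distance: int = 1) -> str:
        """Mimic ${name}@distance; missing messages or fields yield ""."""
        if distance > len(self.messages):
            return ""
        return self.messages[-distance].get(name, "")


class GroupingBy:
    def __init__(self, timeout: float = 5.0) -> None:
        self.timeout = timeout  # timeout(5)
        self.contexts: dict[tuple[str, ...], Context] = {}
        self.emitted: list[str] = []

    @staticmethod
    def context_id(msg: dict[str, str]) -> tuple[str, ...]:
        # key("${zorppdb.sessnum}") combined with scope("process"):
        # only messages from the same HOST, PROGRAM and PID share a context.
        return (msg["HOST"], msg["PROGRAM"], msg["PID"], msg["zorppdb.sessnum"])

    def close(self, ctx: Context) -> None:
        # having("${zorpkv.client_address}@3" ne "") guards the aggregate value.
        if ctx.at("zorpkv.client_address", 3) != "":
            self.emitted.append(
                f"CzP session: {ctx.at('zorppdb.sessnum')} "
                f"client: {ctx.at('zorpkv.client_address', 3)} "
                f"down: {ctx.at('zorpkv.received')}")

    def expire(self, now: float) -> None:
        """Close every context that received no message for `timeout` seconds."""
        idle = [k for k, c in self.contexts.items() if now - c.last_seen >= self.timeout]
        for key in idle:
            self.close(self.contexts.pop(key))

    def feed(self, msg: dict[str, str], now: float) -> None:
        self.expire(now)
        ctx = self.contexts.setdefault(self.context_id(msg), Context())
        ctx.messages.append(msg)
        ctx.last_seen = now  # every new message restarts the timer


# Hypothetical values: the host name and addresses are placeholders.
base = {"HOST": "fw1", "PROGRAM": "zorp/http", "PID": "3486", "zorppdb.sessnum": "267346"}
session = [
    {"zorpkv.client_address": "AF_INET(192.0.2.10:56084)"},                  # Starting proxy instance
    {"zorpkv.server_address": "AF_INET(198.51.100.7:443)"},                  # Server connection established
    {"zorpkv.client_address": "192.0.2.10", "zorpkv.verdict": "ACCEPTED"},   # Connection summary
    {"zorpkv.sent": "14643", "zorpkv.received": "984"},                      # accounting, client side
    {"zorpkv.sent": "984", "zorpkv.received": "14643"},                      # accounting, server side
]

grouper = GroupingBy()
for i, fields in enumerate(session):
    grouper.feed({**base, **fields}, now=i * 0.1)
grouper.expire(now=10.0)  # nothing arrived for 5+ seconds: the context closes
print(grouper.emitted)
```

In this model, @3 picks the Connection summary message rather than the Starting proxy instance message, and the plain ${zorpkv.received} comes from the last (server-side) accounting message. Because a closed context is discarded, a later session that reuses the same zorppdb.sessnum starts from an empty message list, which is the effect the five-second timeout relies on.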

You can try this new feature if you build syslog-ng yourself from its sources, or use one of my unofficial RPM packages built from random Git HEAD snapshots:

If you run into any trouble, contact us live on IRC or Gitter or write to our mailing list. More information about these is available at
