Using the regexp-parser of syslog-ng

For many years, you could use the match() filter of syslog-ng to parse log messages with regular expressions. However, the primary function of match() is filtering. Recent syslog-ng versions now have a dedicated regular expression parser, the regexp-parser(). So, you should use match() only if your primary use case is filtering. Otherwise, use the regexp-parser for parsing, as it is a lot more flexible.

Before you begin

As usual, I recommend using the latest syslog-ng release available. However, it is not strictly necessary. Anything I write about the match() filter is valid for syslog-ng version 3.22 and newer. The regexp-parser() was introduced in syslog-ng 3.34.1, with minor bug fixes applied in syslog-ng 3.35.1. If your operating system of choice features an earlier version, you can find up-to-date third-party packages for many platforms at https://www.syslog-ng.com/products/open-source-log-management/3rd-party-binaries.aspx

The match() filter

The match() filter has gone through quite a few changes over the years. However, one thing never changed: even if using it as a parser is possible, its primary feature is filtering. This means that by default the parsed values are discarded, you can use only a single pattern, and even if you decide to keep the parsing results, you cannot specify a prefix for the resulting name-value pairs.
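To make this concrete, here is a minimal sketch of what keeping the parsing results with match() can look like. It assumes the store-matches flag and named capture groups of the PCRE matcher, and it borrows the sample message used later in this blog. The destination template is only for illustration. Note that the names KVPAIRS and HEXA end up without any prefix:

```
@version: 3.36
@include "scl.conf"

log {
    source {
        example-msg-generator(num(1) template("[MyApp a=b b=c c=d ABC fb679]"));
    };

    # match() is still a filter: messages that do not match are dropped here.
    # flags("store-matches") asks syslog-ng to keep the captured groups
    # (assumption: needed on your version, check the filter documentation).
    filter {
        match('\[MyApp (?<KVPAIRS>.*) ABC (?<HEXA>[a-f0-9]+)]' value("MESSAGE") flags("store-matches"));
    };

    destination {
        # Illustrative template: the captured names are used without a prefix.
        file("/dev/stdout" template("KVPAIRS=${KVPAIRS} HEXA=${HEXA}\n"));
    };
};
```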

The regexp-parser()

Unlike the match() filter, the regexp-parser() was designed from the ground up for parsing messages and storing the results. Name-value pairs parsed from log messages are saved, you can configure a prefix for the name-value pairs, and you can use multiple patterns in the same parser definition. Best of all: even though the regexp-parser() can only parse log messages, you can still build more flexible filters using the resulting name-value pairs in if statements.
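As a rough sketch of the multiple patterns feature, the parser below lists two patterns. The parser name p_myapp and the second pattern (a variant of the message without the trailing ABC part) are made up for illustration. As far as I understand the documentation, patterns are listed one after the other inside patterns(), and processing stops at the first pattern that matches, so verify this on your version:

```
parser p_myapp {
    regexp-parser(
        patterns(
            # Tried first: key-value pairs followed by ABC and a hexadecimal value.
            '\[MyApp (?<KVPAIRS>.*)\ ABC (?<HEXA>[a-f0-9]+)]'
            # Hypothetical fallback: only key-value pairs between the brackets.
            '\[MyApp (?<KVPAIRS>.*)]'
        )
        template("$MESSAGE")
        prefix(".regexp.")
    );
};
```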

But why use the regexp-parser()? Doesn’t syslog-ng have more than enough parsers already? Don’t we always say that the PatternDB message parser is faster than any regular expression parser? Yes, we have many parsers. However, we cannot use them in all situations. We have a key-value parser, but it does not work if the key-value pairs are embedded deeply in a longer log message. PatternDB is amazingly fast, but creating an XML database for a single log message can be overkill.

Testing

Consider this message format from an application:

[MyApp a=b b=c c=d ABC fb679]

You need the name-value pairs and the hexadecimal value from the end stored in JSON format. With some pain, you could most likely solve this problem with the match() filter or PatternDB. However, using the combination of the regexp-parser() and the key-value parser is easier and more straightforward in this case.

As a first step, we create a configuration that extracts the block of key-value pairs and the hexadecimal value into two name-value pairs. We use the example-msg-generator() as the log source, so we do not have to enter a new log message each time while testing. The results are sent to stdout (to the terminal where syslog-ng was started).

@version: 3.36
@include "scl.conf"

log {
    source {
        example-msg-generator(num(1) template("[MyApp a=b b=c c=d ABC fb679]"));
    };

    parser {
        regexp-parser(patterns('\[MyApp (?<KVPAIRS>.*)\ ABC (?<HEXA>[a-f0-9]+)]') template("$MESSAGE") prefix(".regexp."));
    };

    destination {
        file("/dev/stdout"
            template("!!!!!!!!!!!!!!!!! $(format-json --scope dot-nv-pairs)\n")
        );
    };
};

Of course, the most important part of this configuration is the regexp-parser(). It uses three parameters here:

  • patterns() contains the regular expressions, in this case just one.

  • template() contains the message template on which the parser operates.

  • prefix() contains the prefix used in front of the names extracted from the message.

A few words about the patterns. You should use single quotes instead of double quotes around the regular expressions, otherwise you need to escape a lot more characters. The [ is a special character in regular expressions, so you need to escape it even here. You can store part of the message in name-value pairs with expressions similar to (?<KVPAIRS>.*), where the name is set between <> and the matching expression comes right after that. Explaining regular expressions in more depth is beyond the scope of this blog: if you need more information about them, you should read the documentation of PCRE (Perl Compatible Regular Expressions), the variant of regular expressions syslog-ng uses by default.
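To illustrate the quoting advice, here is the same pattern written both ways. The parser names p_single and p_double are made up for this comparison. The sketch relies on the backslash being an escape character inside double-quoted strings in the syslog-ng configuration, so each backslash meant for the regular expression has to be doubled there:

```
# Single quotes: the pattern reaches the regular expression engine as typed.
parser p_single {
    regexp-parser(patterns('\[MyApp (?<KVPAIRS>.*)\ ABC (?<HEXA>[a-f0-9]+)]') prefix(".regexp."));
};

# Double quotes: every backslash of the pattern is written twice.
parser p_double {
    regexp-parser(patterns("\\[MyApp (?<KVPAIRS>.*)\\ ABC (?<HEXA>[a-f0-9]+)]") prefix(".regexp."));
};
```

With longer patterns, the doubled backslashes quickly make the expression hard to read, which is why the single-quoted form is used throughout this blog.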

You are now ready for some testing. Instead of starting the syslog-ng service in the background, we start it in the foreground, so we see debug and log messages on screen. The following capture is shortened: most of the startup messages are omitted.

czanik@czplaptop:~> /usr/sbin/syslog-ng -Fvdte -f regexp.conf --persist-file=/tmp/persist
syslog-ng: Error setting capabilities, capability management disabled; error='Operation not permitted'
[2022-04-01T18:06:21.413937] Unable to detect fully qualified hostname for localhost, use_fqdn() will use the short hostname;
[2022-04-01T18:06:21.414946] Systemd is detected as the running init system;
[…]
[2022-04-01T18:06:21.439038] syslog-ng starting up; version='3.36.1'
[2022-04-01T18:06:21.439079] Running application hooks; hook='2'
[2022-04-01T18:06:21.439106] Setting value; name='MESSAGE', value='-- Generated message. --', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439115] Setting value; name='MESSAGE', value='[MyApp a=b b=c c=d ABC fb679]', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439120] Incoming generated message; msg='[MyApp a=b b=c c=d ABC fb679]'
[2022-04-01T18:06:21.439128] >>>>>> Source side message processing begin; instance='internal', location='regexp.conf:10:9', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439133] Setting value; name='HOST_FROM', value='czplaptop', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439136] Setting value; name='HOST', value='czplaptop', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439145] Setting value; name='SOURCE', value='#anon-source0', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439149] >>>>>> parser rule evaluation begin; rule='#anon-parser0', location='regexp.conf:14:9', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439156] regexp-parser message processing started; input='[MyApp a=b b=c c=d ABC fb679]', prefix='.regexp.', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439160] regexp-parser message processing for; input='[MyApp a=b b=c c=d ABC fb679]', pattern='\[MyApp (?<KVPAIRS>.*)\ ABC (?<HEXA>[a-f0-9]+)]'
[2022-04-01T18:06:21.439172] Setting value; name='0', value='[MyApp a=b b=c c=d ABC fb679]', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439176] Setting value; name='1', value='a=b b=c c=d', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439179] Setting value; name='2', value='fb679', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439186] Setting value; name='.regexp.HEXA', value='fb679', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439190] Setting value; name='.regexp.KVPAIRS', value='a=b b=c c=d', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439214] Initializing destination file writer; template='/dev/stdout', filename='/dev/stdout', symlink_as='(null)'
[2022-04-01T18:06:21.439261] affile_open_file; path='/dev/stdout', fd='11'
[2022-04-01T18:06:21.439290] <<<<<< parser rule evaluation result; result='Forwarding message to the next LogPipe', rule='#anon-parser0', location='regexp.conf:14:9', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439295] <<<<<< Source side message processing finish; instance='internal', location='regexp.conf:10:9', msg='0x55cfbde01050'
[2022-04-01T18:06:21.439483] Outgoing message; message='!!!!!!!!!!!!!!!!! {"_regexp":{"KVPAIRS":"a=b b=c c=d","HEXA":"fb679"}}\x0a'
!!!!!!!!!!!!!!!!! {"_regexp":{"KVPAIRS":"a=b b=c c=d","HEXA":"fb679"}}
[2022-04-01T18:06:21.439504] Window size adjustment; old_window_size='99', window_size_increment='1', suspended_before_increment='FALSE', last_ack_type_is_suspended='FALSE'
^C[2022-04-01T18:06:24.455648] Running application hooks; hook='3'
[2022-04-01T18:06:24.455690] syslog-ng shutting down; version='3.36.1'

From this output, you can see the example message arriving, then the parser creating the various name-value pairs, and finally the output message, which starts with a long row of exclamation marks followed by a JSON-formatted message containing the two name-value pairs.

The next step is a configuration that adds a key-value parser and extracts key-value pairs from the .regexp.KVPAIRS name-value pair:

@version: 3.36
@include "scl.conf"

log {
    source {
        example-msg-generator(num(1) template("[MyApp a=b b=c c=d ABC fb679]"));
    };

    parser {
        regexp-parser(patterns('\[MyApp (?<KVPAIRS>.*)\ ABC (?<HEXA>[a-f0-9]+)]') template("$MESSAGE") prefix(".regexp."));
    };

    parser {
        kv-parser(template("${.regexp.KVPAIRS}") prefix(".kv."));
    };

    destination {
        file("/dev/stdout"
            template("!!!!!!!!!!!!!!!!! $(format-json --scope dot-nv-pairs)\n")
        );
    };
};

The log output in this case will also contain the key-value pairs extracted by the second parser:

!!!!!!!!!!!!!!!!! {"_regexp":{"KVPAIRS":"a=b b=c c=d","HEXA":"fb679"},"_kv":{"c":"d","b":"c","a":"b"}}

What is next?

As I mentioned earlier, match() is a filter function, but you can create more flexible filters using the regexp-parser(). Why? Because you do not always want to discard the rest of the log messages, and parsing the logs multiple times is not efficient. With the regexp-parser(), you parse the log messages only once, and then you can use the resulting name-value pairs in many if statements later in the configuration. You could try this as a next step and then try to use the regexp-parser() with some real logs.
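A minimal sketch of this idea, building on the test configuration from above: the message is parsed once, and an if statement then branches on one of the resulting name-value pairs. The comparison value and the two output templates are made up for illustration:

```
@version: 3.36
@include "scl.conf"

log {
    source {
        example-msg-generator(num(1) template("[MyApp a=b b=c c=d ABC fb679]"));
    };

    # Parse once; the name-value pairs stay available for the rest of the log path.
    parser {
        regexp-parser(patterns('\[MyApp (?<KVPAIRS>.*)\ ABC (?<HEXA>[a-f0-9]+)]') template("$MESSAGE") prefix(".regexp."));
    };

    # Branch on a parsed value instead of running another regular expression.
    if ("${.regexp.HEXA}" eq "fb679") {
        destination {
            file("/dev/stdout" template("known id: ${.regexp.KVPAIRS}\n"));
        };
    } else {
        destination {
            file("/dev/stdout" template("other id: ${.regexp.HEXA}\n"));
        };
    };
};
```

You can add further if statements on the same name-value pairs without parsing the message again.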

I need your feedback!

Finally, I would like to ask for some feedback from you! Normally, I use logger to send log messages and store the results in files. While developing the regular expression with a colleague last week, I learned about the example-msg-generator(), using trace messages, and logging to stdout. I would like to share how a configuration is developed, as it would help you learn how to do it on your own next time. But looking back, the trace messages and exclamation marks can also be a bit confusing. What is your experience?

-

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/syslog-ng/syslog-ng. On Twitter, I am available as @PCzanik.
