Parsed web server logs to the cloud: syslog-ng SCL

The syslog-ng configuration library (SCL) makes configuring syslog-ng a lot easier. Its configuration snippets hide away the complexity of collecting, parsing or storing log messages. In this blog post you can learn how to parse web server logs and store the results in a structured form at a Logging as a Service (LaaS) provider. You will use SCL both for message parsing and for the LaaS destination, and also use the wildcard-file() source introduced in syslog-ng 3.10.

This is the second, advanced part of the Wildcard posts. Read the first, beginner part here.

Before you begin

Before configuring syslog-ng, you should have a web server already up and running. In my example I use access logs from Apache HTTPD running on openSUSE, but you should be able to adapt the configuration easily to any software that logs in the Apache Combined Log Format, just by changing file names.

If you want to test the cloud part, you will also need an account at one of the providers that has a ready-to-use SCL. In my examples I use Loggly, but the configuration should work with minimal modifications with other providers too.

Configuring syslog-ng

The following configuration reads any file that has a filename ending with “access_log” from the /var/log/apache2 directory. It parses the messages and then sends them in JSON format to Loggly. You should append these configuration snippets to your syslog-ng.conf, or place them in a separate .conf file under /etc/syslog-ng/conf.d/ if that is supported by your Linux distribution.
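
The SCL-based snippets used below, apache-accesslog-parser() and loggly(), are only available if the SCL is included in the main configuration. Most distribution packages already do this, but it is worth checking. A minimal sketch of the top of syslog-ng.conf, assuming version 3.16 (adjust the version line to the release you have installed):

```
# Must match the installed syslog-ng version (3.16 is an assumption here)
@version: 3.16
# Loads the syslog-ng configuration library (SCL)
@include "scl.conf"
```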

First, define a wildcard-file() source. There are two mandatory parameters:

  • base-dir() configures the directory where syslog-ng looks for log files to read. In this case, it is the /var/log/apache2 directory.
  • filename-pattern() accepts a simple glob pattern which defines the files to search for. A “*” represents zero or more characters, while a “?” represents a single character. In this case, it is any file name that ends with “access_log”. If you have another naming structure for file names, make sure that the glob does not include error logs, because the apache-accesslog-parser(), which is used later in this example, cannot parse those.

The no-parse flag is necessary in this example, because by default syslog-ng parses messages using the syslog parser, but Apache HTTPD uses its own format for logging. For a complete list of wildcard-file() options check the documentation at https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.16/administration-guide

source s_apache2 {
  wildcard-file(
    base-dir("/var/log/apache2/")
    recursive(no)
    filename-pattern("*access_log")
    flags(no-parse)
  );
};

Next, parse the log messages. Fortunately, web server logs are created in an easy-to-parse format to facilitate log analysis. The Apache Combined Log Format has many fields which can be parsed using a CSV parser. If you take a closer look, you can see that some of the fields can be parsed further. Instead of configuring all of this by hand, you can use the apache-accesslog-parser() parser. It is not another parser implemented in C, but an SCL snippet that calls the csv-parser() with all the necessary parameters to create name-value pairs. It even does extra parsing on the request and date to provide more information.

Note: if you do not configure prefix(), it is set to “.apache.” by default. Names starting with a dot (such as the default “.apache.”) might cause unforeseen problems, for example, in Elasticsearch.

parser p_apache2 {
  apache-accesslog-parser(
    prefix("apache.")
  );
};
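
To give an idea of what the SCL hides, here is a simplified sketch of a hand-written csv-parser() that splits an Apache Combined Log line into the same kind of name-value pairs. It is not the actual SCL source: the parser name p_apache2_manual is made up, the column names and options are assumptions, and the extra request and date parsing is left out. Check the SCL files shipped with your syslog-ng version for the real definition.

```
parser p_apache2_manual {
  csv-parser(
    # Every column name below gets this prefix, for example apache.clientip
    prefix("apache.")
    # One name per field of the Combined Log Format (names are assumptions)
    columns("clientip", "ident", "auth", "timestamp", "rawrequest",
            "response", "bytes", "referrer", "agent")
    # Fields are separated by spaces...
    delimiters(" ")
    # ...unless they are enclosed in double quotes or square brackets
    quote-pairs('""[]')
    flags(strip-whitespace)
    dialect(escape-double-char)
  );
};
```

Using apache-accesslog-parser() saves you from maintaining such a block yourself.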

Next, define a destination. Here I am using the loggly() destination of syslog-ng. It is another snippet from the SCL. Loggly requires messages in a specific format and also a token, a unique identifier from Loggly. You can pass it on using the token() parameter.

By default, the message is sent as-is to Loggly. As you already have the logs parsed and available as name-value pairs, it is better to use the template() parameter and send logs in JSON format. Loggly parses JSON messages automatically, so you can query your web server logs based on name-value pairs without further configuration on the Loggly side.

destination d_loggly {
  loggly(
    token("TOKEN_FROM_LOGGLY")
    template("$(format-json --scope nv-pairs)")
  );
};

Finally, define a log statement that connects the source, parser and destination together:

log {
  source(s_apache2);
  parser(p_apache2);
  destination(d_loggly);
};

Save the configuration and reload it using “syslog-ng-ctl reload”.

Verifying your configuration

Log on to Loggly (or your LaaS provider of choice) and check if messages arrive. You should see the basic syslog fields followed by JSON data. Fields extracted by the apache-accesslog-parser() are listed under “apache:” because this is the prefix we configured.
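
If nothing shows up, it can help to check the parsing locally before looking at the cloud side. The following sketch writes the same JSON to a local file by reusing the s_apache2 source and p_apache2 parser defined earlier. The destination name and the file path are hypothetical, so pick whatever suits your system:

```
destination d_apache_json {
  # Hypothetical path; one JSON document per line
  file("/var/log/apache_parsed.json"
    template("$(format-json --scope nv-pairs)\n")
  );
};

log {
  source(s_apache2);
  parser(p_apache2);
  destination(d_apache_json);
};
```

After a reload, generate some traffic on your web server and look at the file. If the fields from the parser appear there, the source and the parser work, and any remaining problem is on the destination side.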

How to write your own SCL

You can also create an SCL, a reusable configuration snippet, yourself. If you have a data format that can be analyzed by combining a couple of existing parsers, or a new destination that needs a somewhat more complex configuration, writing an SCL can help you in the long run. Getting started does not require programming knowledge, just an understanding of syslog-ng configuration basics.
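
As a minimal sketch of what this can look like, the block below wraps a csv-parser() for an imaginary application log with three space-separated fields. Everything specific here is hypothetical: the block name my-app-parser, the default prefix and the column names are made up for illustration. Block arguments are referenced between backticks inside the block.

```
# Reusable block: prefix() is an argument with a default value
block parser my-app-parser(prefix("myapp.")) {
  csv-parser(
    # `prefix` is replaced by the value passed to the block
    columns("`prefix`user", "`prefix`action", "`prefix`status")
    delimiters(" ")
  );
};

# Using the block works just like using a built-in parser
parser p_myapp {
  my-app-parser(prefix("app."));
};
```

Once a block like this is defined, the rest of the configuration only needs to know its name and parameters, not the details of the parsing.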

There are many resources to get you started:
