9 Mar 2016

Figuring out how to parse your firewall logs is not always easy. This blog post shows you some useful log-parsing techniques. For the examples, I use the logs of the Zorp proxy firewall, now developed by Balasys, the Hungarian distributor of Balabit products.

Zorp is a next-generation proxy firewall with deep protocol analysis. It allows you to inspect, control, and modify traffic on the application layer of the ISO/OSI model (Layer 7). You can make decisions about the traffic based on application-level data. For example, you can replace the value of a specific HTTP header, or you can allow full access to an FTP server for one group of users but grant only read-only access to others, without having to modify the FTP server. Because it can inspect SSL/TLS encrypted channels, Zorp allows you to process the encrypted traffic with external virus or spam filtering engines, to avoid downloading malware over secure connections. The core of Zorp is implemented in C/C++11, making it very fast, while the Python-based configuration language makes it endlessly flexible. Just like syslog-ng, Zorp has two editions. The open source version, called Zorp GPL, can be configured from the command line, and supports the HTTP, FTP, SMTP and POP3 protocols and their encrypted versions. The commercially supported version (called the Zorp Gateway) has more protocols, centralized antivirus and spam filtering, central management for several firewalls, and a GUI.

Some log samples

The next few lines show some of the log messages related to an HTTPS connection in Zorp. Logging in Zorp is very flexible: you can configure the log level for each of its components separately. This ensures that only the data necessary for daily operation is preserved in the logs, and it also greatly helps when debugging firewall rules.

These are some raw log messages from Zorp covering a single session:

2016-03-04T07:10:19-05:00 127.0.0.1 zorp/http[3486]: core.session(3): (svc/http#0/http/intraPLUGinter:267346): Starting proxy instance; client_fd='32', client_address='AF_INET(172.168.65.4:56084)', client_zone='Zone(office)', client_local='AF_INET(173.252.120.68:443)', client_protocol='TCP'
2016-03-04T07:10:19-05:00 127.0.0.1 zorp/http[3486]: core.session(3): (svc/http#0/http/intraPLUGinter:267346/http#0/plug): Server connection established; server_fd='35', server_address='AF_INET(173.252.120.68:443)', server_zone='Zone(internet)', server_local='AF_INET(91.120.23.97:46472)', server_protocol='TCP'
2016-03-04T07:10:19-05:00 127.0.0.1 zorp/http[3486]: core.summary(4): (svc/http#0/http/intraPLUGinter:267346): Connection summary; rule_id='51', session_start='1451980783', session_end='1451980784', client_proto='TCP', client_address='172.168.65.4', client_port='56084', client_zone='office', server_proto='TCP', server_address='173.252.120.68', server_port='443', server_zone='internet', client_local='173.252.120.68', client_local_port='443', server_local='91.120.23.97', server_local_port='46472', verdict='ACCEPTED', info=''
2016-03-04T07:10:19-05:00 127.0.0.1 zorp/http[3486]: core.accounting(4): (svc/http#0/http/intraPLUGinter:267346/plug/client): accounting info; type='ZStreamFD', duration='1', sent='14643', received='984'
2016-03-04T07:10:19-05:00 127.0.0.1 zorp/http[3486]: core.accounting(4): (svc/http#0/http/intraPLUGinter:267346/http#0/plug/server): accounting info; type='ZStreamFD', duration='1', sent='984', received='14643'

If you look closer, you will see an ISO time stamp, the host the message was received from (localhost), and that the message was sent by a process identifying itself as “zorp/http”. The actual firewall log starts after the colon. It starts with something that looks like a module and component name, followed by a log level in parentheses. Zorp terminology calls this the log tag. The next section is also enclosed in parentheses. When you browse your logs, you might think that the number after the colon can be used as a unique session identifier. While this is mostly true in practice, you also need the preceding fields to be on the safe side. (These fields describe some of Zorp’s configuration internals: Zorp rules are organized into groups called “instances”. Instances not only allow the grouping of rules, but also separate the processing of traversing traffic into different system processes to enhance confidentiality and availability of these services. The instance name is included in both the process name (with the zorp/ prefix) and in the session identifier.) After a short text description, the rest of the log consists of name-value pairs. You can read more about the format here: https://www.balabit.com/documents/zorp-6.0-guides/en/zorp-gateway-guide-log/html/preface.html
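To make the layout described above concrete, here is a minimal Python sketch that splits a raw line into its main parts. The regular expression and the group names are my own illustration based on the sample messages, not an official Zorp grammar, and the sample line is shortened to two name-value pairs.

```python
import re

# Illustrative regex for the layout seen in the samples; not an official grammar.
ZORP_LINE = re.compile(
    r"(?P<timestamp>\S+) (?P<host>\S+) (?P<program>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"(?P<tag>[\w.]+)\((?P<level>\d+)\): "   # log tag and log level
    r"\((?P<session>[^)]*)\): "              # session identifier section
    r"(?P<description>[^;]*); "              # short text description
    r"(?P<pairs>.*)"                         # the name-value pairs, unparsed
)

def split_zorp_line(line):
    """Return the main parts of a raw Zorp log line as a dict, or None."""
    match = ZORP_LINE.match(line)
    return match.groupdict() if match else None

# Shortened version of the first sample message.
sample = (
    "2016-03-04T07:10:19-05:00 127.0.0.1 zorp/http[3486]: core.session(3): "
    "(svc/http#0/http/intraPLUGinter:267346): Starting proxy instance; "
    "client_fd='32', client_protocol='TCP'"
)
parts = split_zorp_line(sample)
print(parts["tag"], parts["session"])
```

Note that the session section is kept as a whole here, including the fields before the colon, for the reason explained above.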

Parsers of syslog-ng

Before going further with Zorp logs, I would like to introduce you to the parsers of syslog-ng. When a log message reaches syslog-ng, it is usually parsed as a syslog message. This creates a basic set of name-value pairs: date, host name, program name and the actual message as a single value. However, some logs arrive in other formats, like Apache access logs, and the message part often contains important information like user names, IP addresses, and so on. Parsers of syslog-ng interpret the content of messages and create name-value pairs from them. The resulting name-value pairs can be used in filtering, in naming files, or they can be stored in a database (usually NoSQL, like MongoDB or Elasticsearch), where they can be searched more efficiently than in flat files.

CSV parser

While CSV stands for comma separated values, this parser can handle any kinds of logs which have a fixed columnar structure. A typical example is Apache access logs. For more details, check the documentation at https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.16/administration-guide

JSON parser

The JSON parser can separate parts of JSON-encoded log messages to name-value pairs. You can refer to the separated parts of the JSON message using the key of the JSON object. For more details, check the documentation at https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.16/administration-guide

Key-value parser

The key-value parser is the most recent parser in syslog-ng OSE. It was introduced in syslog-ng 3.7.2. It can find key=value pairs separated by commas or white space in log messages. For more details, check the documentation at https://syslog-ng.com/documents/html/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html-single/index.html#key-value-parser

PatternDB parser

While all of the previously mentioned parsers can locate information in structured log messages, the PatternDB parser can use unstructured log messages as input by comparing them to predefined message patterns. As it uses previously known messages as a basis, it can also classify log messages or create additional name-value pairs based on the message content. For more details, check the documentation at https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.16/administration-guide

Parsing Zorp log messages

Now that we have seen the parsers available in syslog-ng, take another look at the sample log messages. The first part looks ugly, but the rest of each line contains key-value pairs. Let’s start with the easy part. Add a parser that finds all of the key-value pairs in the logs. The configuration could not be simpler:

parser p_kv {
    kv-parser (prefix("zorpkv."));
};

This is a key-value parser, which prefixes all found keys with “zorpkv.” to make sure that none of the found keys clash with any of the existing names. If you look at the sample parsed message in the next section, you will find the parsed values under the “zorpkv” token in the JSON formatted message.
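To illustrate what the prefix does to the resulting names, here is a small Python sketch that mimics the behavior on one of the sample messages. It is only an approximation for the quoted name='value' form seen in Zorp logs; the real kv-parser is implemented inside syslog-ng and handles more input variations.

```python
import re

# Matches name='value' pairs as they appear in the Zorp samples.
PAIR = re.compile(r"(\w+)='([^']*)'")

def parse_pairs(message, prefix="zorpkv."):
    """Collect name='value' pairs and put the prefix in front of each name."""
    return {prefix + name: value for name, value in PAIR.findall(message)}

# Message part of one of the accounting samples shown earlier.
message = (
    "core.accounting(4): (svc/http#0/http/intraPLUGinter:267346/plug/client): "
    "accounting info; type='ZStreamFD', duration='1', sent='14643', received='984'"
)
pairs = parse_pairs(message)
print(pairs["zorpkv.sent"])  # prints 14643
```

Without the prefix, a key such as type could easily collide with a name-value pair created elsewhere in the configuration.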

Half of our job is done: key-value pairs describing the network connections are now found by syslog-ng. What is missing? A session identifier linking all of this information together. This is where the PatternDB parser can help us. The initial configuration is simple:

parser p_patterndb {
    db-parser(file("/etc/syslog-ng/conf.d/zorp.xml"));
};

The more difficult part is writing an XML file describing the log messages. The sample XML below works for HTTP connections and is a bit simplified, knowing that some of the fields never changed in the logs I checked:

<?xml version='1.0' encoding='UTF-8'?>
<patterndb version='3' pub_date='2016-03-03'>
  <ruleset name='zorp' id='0d0264f7-e25e-4d60-b1f6-4c23c8aeff8b'>
    <description>
      A very basic Zorp pattern...
    </description>
    <url>https://www.balabit.com/network-security/zorp-gateway</url>
    <pattern>zorp/http</pattern>
    <rules>
      <rule provider="CzP" id="dc947a99-7526-40dc-aec3-ab105c6c241d" class="system">
        <patterns>
          <pattern>@STRING:zorppdb.module@.@STRING:zorppdb.component@(@NUMBER:zorppdb.loglevel@): (svc/http#0/http/@STRING:zorppdb.instance@:@NUMBER:zorppdb.sessnum@@ANYSTRING@</pattern>
        </patterns>
        <examples>
          <example>
            <test_message program="zorp/http">core.accounting(4): (svc/http#0/http/intraPLUGinter:268899/http#0/plug/server): accounting info; type='ZStreamFD', duration='61', sent='1097', received='4889'</test_message>
            <test_values>
              <test_value name="zorppdb.module">core</test_value>
              <test_value name="zorppdb.component">accounting</test_value>
              <test_value name="zorppdb.loglevel">4</test_value>
              <test_value name="zorppdb.instance">intraPLUGinter</test_value>
              <test_value name="zorppdb.sessnum">268899</test_value>
            </test_values>
          </example>
        </examples>
      </rule>
    </rules>
  </ruleset>
</patterndb>

In the first <pattern> field you can configure the program name. In our case it is “zorp/http”. Next come the rules; in this case there is just one, containing a single pattern. The inner <pattern> field is the most important part of this database, as this line describes the fixed and variable parts of the log message. I used the “zorppdb.” prefix in field names to make it visible that these values originate from PatternDB. Providing a test message and test values is optional, but it is a very useful feature of the pattern database. For more details on how to create a pattern database read the documentation at https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.16/administration-guide
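If you are more used to regular expressions than to PatternDB syntax, the following Python sketch shows a rough equivalent of the inner pattern, checked against the test message from the XML. It assumes that @STRING@ matches letters and digits and that @NUMBER@ matches decimal digits; the real PatternDB parsers accept more than this, so treat it as an approximation rather than a replacement.

```python
import re

# Rough regex approximation of the PatternDB pattern above.
# Literal parts are copied as-is; each @PARSER:name@ becomes a named group.
ZORP_PATTERN = re.compile(
    r"(?P<module>[A-Za-z0-9]+)\.(?P<component>[A-Za-z0-9]+)"
    r"\((?P<loglevel>\d+)\): "
    r"\(svc/http#0/http/(?P<instance>[A-Za-z0-9]+):(?P<sessnum>\d+)"
    r".*"  # stands in for @ANYSTRING@
)

def match_zorp(message, prefix="zorppdb."):
    """Return the extracted fields with the prefix used in the XML, or None."""
    match = ZORP_PATTERN.match(message)
    if match is None:
        return None
    return {prefix + name: value for name, value in match.groupdict().items()}

# The test message from the pattern database.
test_message = (
    "core.accounting(4): (svc/http#0/http/intraPLUGinter:268899/http#0/plug/server): "
    "accounting info; type='ZStreamFD', duration='61', sent='1097', received='4889'"
)
fields = match_zorp(test_message)
print(fields)
```

The extracted values should be the same as the test values listed in the XML, which is a quick way to convince yourself that you read the pattern correctly.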

Storing the logs in Elasticsearch

Parsing only makes sense if you can make use of the created name-value pairs. Elasticsearch has recently become one of the most popular log destinations in syslog-ng. One of the reasons is that, unlike traditional SQL databases, it can store any number of name-value pairs. The other one is Kibana, a web-based search and visualization interface for data stored in Elasticsearch.

Before sending logs to Elasticsearch, you might want to check if parsing works properly. An easy way is to use a template that combines the original log message with the JSON template function, and see if all name-value pairs appear in the JSON-formatted logs as expected. Here is the configuration:

destination d_json {
    file("/tmp/test.json"
        template("$ISODATE $HOST $MSGHDR$MSG\n$(format-json --scope rfc5424 --scope nv-pairs --exclude DATE --key ISODATE @timestamp=${ISODATE})\n\n"));
};

And this is how it looks in the log file:

2016-03-04T07:10:19-05:00 127.0.0.1 zorp/http[3486]: core.session(3): (svc/http#0/http/intraPLUGinter:267346): Starting proxy instance; client_fd='32', client_address='AF_INET(172.168.65.4:56084)', client_zone='Zone(office)', client_local='AF_INET(173.252.120.68:443)', client_protocol='TCP'
{"zorppdb":{"sessnum":"267346","module":"core","loglevel":"3","instance":"intraPLUGinter","component":"session"},"zorpkv":{"client_zone":"Zone(office)","client_protocol":"TCP","client_local":"AF_INET(173.252.120.68:443)","client_fd":"32","client_address":"AF_INET(172.168.65.4:56084)"},"SOURCE":"s_kv","PROGRAM":"zorp/http","PRIORITY":"info","PID":"3486","MESSAGE":"core.session(3): (svc/http#0/http/intraPLUGinter:267346): Starting proxy instance; client_fd='32', client_address='AF_INET(172.168.65.4:56084)', client_zone='Zone(office)', client_local='AF_INET(173.252.120.68:443)', client_protocol='TCP'","LEGACY_MSGHDR":"zorp/http[3486]: ","ISODATE":"2016-03-04T07:10:19-05:00","HOST_FROM":"127.0.0.1","HOST":"127.0.0.1","FACILITY":"daemon","@timestamp":"2016-03-04T07:10:19-05:00"}
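Checking a few lines by eye is fine, but with a larger test file a small script can help. The Python sketch below reports the JSON lines in which one of the expected top-level keys is missing, which would indicate that a parser did not match. The function name is my own, the sample text is a shortened version of the output above, and the second JSON record is a made-up example of an unparsed message.

```python
import json

def missing_keys(text, required):
    """Return (line_number, missing_keys) for JSON lines lacking required keys."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.startswith("{"):
            continue  # raw log line or blank separator line
        record = json.loads(line)
        missing = [key for key in required if key not in record]
        if missing:
            problems.append((number, missing))
    return problems

# Shortened sample in the same layout as the test file: raw line, JSON line, blank line.
sample = (
    "2016-03-04T07:10:19-05:00 127.0.0.1 zorp/http[3486]: core.session(3): ...\n"
    '{"zorppdb":{"sessnum":"267346","module":"core"},'
    '"zorpkv":{"client_fd":"32"},"PROGRAM":"zorp/http"}\n'
    "\n"
    '{"PROGRAM":"zorp/http","MESSAGE":"unparsed"}\n'  # hypothetical unparsed record
)
print(missing_keys(sample, ["zorppdb", "zorpkv"]))
# prints [(4, ['zorppdb', 'zorpkv'])]
```

To check the real output, pass the content of the test file to the function, for example with open("/tmp/test.json").read().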

As everything looks as expected, we can configure an Elasticsearch destination based on the above JSON template:

destination d_elasticsearch {
    elasticsearch(
        client_lib_dir(/usr/share/elasticsearch/lib/)
        index("syslog-ng_${YEAR}.${MONTH}.${DAY}")
        type("test")
        cluster("zorpes")
        flush_limit("1000")
        template("$(format-json --scope rfc5424 --scope nv-pairs --exclude DATE --key ISODATE @timestamp=${ISODATE})")
    );
};

Finally, here is the log statement gluing all of these parts together:

log {
    source(s_zorp);
    parser(p_kv);
    parser(p_patterndb);
#    destination(d_json);
    destination(d_elasticsearch);
};

Now you should be able to search your logs in Kibana. As all of the important data is stored as name-value pairs, the next step is to create diagrams and dashboards from your firewall logs.

Read our Elasticsearch white paper for practical details about how to get started storing logs in Elasticsearch using syslog-ng.

Bonus: grouping_by

For a long time, correlation of log messages was limited to PatternDB. With the upcoming syslog-ng version 3.8 there will be a new tool for correlation called the grouping_by() parser. It is not bound to PatternDB, and can use any name-value pairs. This is important, as parsing Zorp messages involves both PatternDB and the key-value parser. I am still experimenting with this new feature, so it will be the topic of another blog post. For now, see the commit message for details: https://github.com/balabit/syslog-ng/commit/662aab3df504a03bdb8d6908bf8117154d41599e
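Until then, the idea behind correlating on name-value pairs can be illustrated outside of syslog-ng. The Python sketch below is not grouping_by() and says nothing about its syntax; it only shows how records that were already parsed by both PatternDB and the key-value parser could be collected per session. The flat record layout is an assumption of mine, and the values are taken from the sample messages earlier in this post.

```python
from collections import defaultdict

def group_by_session(records):
    """Group parsed records by instance name and session number."""
    sessions = defaultdict(list)
    for record in records:
        # Use the instance together with the number, to be on the safe side.
        key = (record["zorppdb.instance"], record["zorppdb.sessnum"])
        sessions[key].append(record)
    return dict(sessions)

# A few parsed messages of one session, reduced to a handful of fields.
records = [
    {"zorppdb.instance": "intraPLUGinter", "zorppdb.sessnum": "267346",
     "zorppdb.component": "session", "zorpkv.client_zone": "Zone(office)"},
    {"zorppdb.instance": "intraPLUGinter", "zorppdb.sessnum": "267346",
     "zorppdb.component": "summary", "zorpkv.verdict": "ACCEPTED"},
    {"zorppdb.instance": "intraPLUGinter", "zorppdb.sessnum": "267346",
     "zorppdb.component": "accounting", "zorpkv.sent": "14643"},
]
sessions = group_by_session(records)
for (instance, sessnum), messages in sessions.items():
    print(instance, sessnum, len(messages))
```

The session key comes from PatternDB, while the interesting details come from the key-value parser, which is why a correlation tool that works on any name-value pair is useful here.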

Making sense of Zorp firewall logs using syslog-ng