Analyze your Suricata logs in real-time using syslog-ng

Last week I presented syslog-ng at SuriCon 2018 in Vancouver. In this blog post you can read a slightly modified version of that talk: a bit less emphasis on the introduction and a bit more on the explanation of the syslog-ng configuration part. The configuration uses a number of lesser-known or quite new features, so even seasoned syslog-ng admins can learn new tricks from it.

virag

Inspiration

The inspiration for this talk came from my Turris Omnia router at home. It runs Linux on an ARM platform and works as a router, firewall and container platform. At the same time, it also uses the two pieces of software mentioned in the title: Suricata (for network monitoring) and syslog-ng (to collect and process log messages). As it turned out, people use Suricata and syslog-ng together not only on Turris Omnia, but also on larger installations. Suricata creates JSON formatted log messages that syslog-ng can parse and then do all kinds of magic with.

Before you begin

Logs in my setup were coming from Suricata running on my Turris Omnia home router. Then again, Suricata saves logs in JSON format on any device. I showcased some features that are only available in the latest syslog-ng version (3.18). This means that I only used syslog-ng on the Turris Omnia to forward logs to another machine running 3.18. The configuration included in this blog was running on a CentOS machine. There is a good chance that your Linux distribution of choice does not have this version yet. Check https://www.syslog-ng.com/products/open-source-log-management/3rd-party-binaries.aspx for a list of 3rd party package sources for up-to-date packages.

Introduction

If you are new to syslog-ng, let me give a quick introduction to it. The syslog-ng application is an enhanced logging daemon with a focus on portability and high-performance central log collection. It has four major roles:

  • Collecting messages
    The syslog-ng application can collect log messages from a wide variety of platform-specific log sources, like /dev/log, systemd-journal(), sun-streams(). As a central syslog server, it can handle both legacy and new syslog protocols over UDP, TCP and encrypted connections. It can also read files, sockets and pipes, and if none of these fits your needs, you can create your own source driver in Python.

  • Processing messages
    This is probably the most important role.
    Using built-in parsers, syslog-ng can classify, normalize and structure log messages. The JSON parser is the most important one from the Suricata point of view, but there are parsers for columnar data (CSV files or Apache Access Logs), free-form text (like an SSH login message), key=value parser (like iptables logs) and more. You can also develop your own parser in Python.
    You can also rewrite log messages. This does not mean falsifying them, but, for example, anonymizing them as required by compliance regulations (PCI-DSS, GDPR, and so on).
    You can also enrich log messages with GeoIP or create additional fields based on message content.
    Finally, you can reformat log messages using templates as required by the destination you use. For example, use a specific date format or use JSON formatting.

  • Data filtering
    It has two main uses: discarding surplus log messages (for example, debug level messages) and message routing (making sure that messages reach the right destinations).
    Filtering can be based on message content and parameters using comparisons, wildcards, regular expressions and many different filtering functions. Best of all, any of these functions can be combined using Boolean operators.

  • Log storage
    This is the last step. Traditionally logs were saved locally into text files or sent over the network using the syslog protocol and saved there. Big Data destinations were added to syslog-ng a few years ago, including HDFS, Elasticsearch and Kafka. Talking about message queuing: I learned at SuriCon that many sites utilize the AMQP (RabbitMQ) destination of syslog-ng.
    The HTTP destination improved significantly in recent releases, making it both faster and more flexible. You can now use it to send logs to Elasticsearch in bulk mode and feed Splunk HTTP Event Collector at blazing speeds.
    If none of these fits your needs, you can develop your own destination driver in Python or Java.

Configuring syslog-ng

My initial advice when it comes to configuring syslog-ng comes from The Hitchhiker's Guide to the Galaxy: “Don’t Panic”. Configuring syslog-ng is simple and logical, even if it looks difficult at first sight. It uses a pipeline model. There are several different building blocks, for example, sources, destinations, filters, parsers, and so on. These building blocks are connected into a pipeline using “log” statements.

A (very) basic syslog-ng.conf

By default, when syslog-ng starts, it reads the syslog-ng.conf file. The syslog-ng package in most Linux distributions contains a complex configuration that sorts logs to many different files under the /var/log/ directory. The next configuration snippet is a simplified version of syslog-ng.conf that demonstrates the major features. In addition, it keeps the Suricata-specific configuration in a separate file.

@version:3.18
@include "scl.conf"

# this is a comment

options { chain_hostnames(off); use_dns(no); use_fqdn(no);};

source s_sys { system(); internal();};
destination d_mesg { file("/var/log/messages"); };
filter f_default { level(info..emerg) and not (facility(mail)); };

log { source(s_sys); filter(f_default); destination(d_mesg); };

@include "/etc/syslog-ng/conf.d/*.conf"

In the previous example, the configuration starts with a version number declaration:

@version:3.18

Based on this version number, syslog-ng can suggest changes in your configuration or new default values.

You can include other configuration files from the main configuration, too. The scl.conf file includes the syslog-ng configuration library, providing you with useful configuration snippets.

You can use comments in your configuration:

# this is a comment

While the syslog-ng configuration is pretty self-explanatory, sometimes it is still better to write a short reminder about the purpose of a given line in the configuration.

Global configuration options come next. You can override most of these later while declaring the individual building blocks.

options { chain_hostnames(off); use_dns(no); use_fqdn(no);};

The following three lines show a few typical building blocks of syslog-ng:

source s_sys { system(); internal();};
destination d_mesg { file("/var/log/messages"); };
filter f_default { level(info..emerg) and not (facility(mail)); };

You definitely need a source to collect messages and a destination to save messages. There are several further options, such as filtering and parsing your log messages.

Log statements are the heart of the syslog-ng configuration:

log { source(s_sys); filter(f_default); destination(d_mesg); };

They connect all the building blocks. A typical problem is declaring a source or destination but failing to add it to at least one of the available log paths.
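
If you think better in code than in configuration, the pipeline idea can be sketched in a few lines of Python. This is only an illustration of how the log statement above routes messages, not how syslog-ng is implemented: the dictionary-based messages and the function names are made up for this example.

```python
# severity names ordered from most to least severe, as in syslog
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def f_default(msg):
    """Mimic the filter: level(info..emerg) and not (facility(mail))."""
    severe_enough = SEVERITIES.index(msg["level"]) <= SEVERITIES.index("info")
    return severe_enough and msg["facility"] != "mail"


def log_path(source, message_filter, destination):
    """Connect the building blocks: read, filter, then store."""
    for msg in source:
        if message_filter(msg):
            destination.append(msg)


# hypothetical messages standing in for the s_sys source
s_sys = [
    {"facility": "daemon", "level": "info", "text": "service started"},
    {"facility": "daemon", "level": "debug", "text": "verbose detail"},
    {"facility": "mail", "level": "err", "text": "delivery failed"},
]
d_mesg = []
log_path(s_sys, f_default, d_mesg)
print([m["text"] for m in d_mesg])  # ['service started']
```

A source or destination that is never passed to log_path() simply does nothing, which is the same trap as declaring a building block and forgetting to reference it in a log statement.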

The last line adds another include statement to include any .conf files from the given directory:

@include "/etc/syslog-ng/conf.d/*.conf"

Most people do not modify the syslog-ng.conf that comes with their Linux distribution of choice. Instead, they drop a new .conf file into the @include directory specified in syslog-ng.conf. This is exactly what I did, storing the Suricata-specific part of my configuration in suricata.conf under the /etc/syslog-ng/conf.d/ directory.

Building blocks of suricata.conf

I used the following building blocks in suricata.conf. As you can see, you can refer to more than one of each building block type: for example, you can include two destinations in a single log path. You can use multiple log statements in your configuration. However, combining everything in a single log statement made this configuration easier to read and understand. In fact, there are multiple log statements in the configuration anyway, because syslog-ng merges all the included configuration files on startup.

First, you have to declare a source. Technically, the order of declarations in the config file is not important, because only the order in the log statement makes a difference. For example, I do not keep building block declarations in the same order as in the log statement. Still, I normally start with the sources.

In this example, I use a tcp() source to receive log messages on port 514. By default, syslog-ng parses incoming messages as if they were formatted according to syslog standards. Because I send pure JSON logs over the network, I use flags(no-parse) to disable parsing of messages here.

# receive Suricata logs
source s_suricata {
    tcp(ip("0.0.0.0") port("514") flags(no-parse));
};
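
Before pointing Suricata at this source, it can help to push a single JSON line through it by hand. The following is a minimal Python sketch and was not part of my original setup: the function name send_test_event and the sample field values are made up for illustration, and the host and port have to match your own source declaration.

```python
import json
import socket


def send_test_event(host, port, event):
    """Send one JSON document as a single newline-terminated line over TCP."""
    payload = (json.dumps(event) + "\n").encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
    return payload


# hypothetical sample event; real Suricata events carry many more fields
SAMPLE_EVENT = {"event_type": "alert", "src_ip": "192.168.1.20", "dest_ip": "203.0.113.5"}
```

Calling send_test_event("127.0.0.1", 514, SAMPLE_EVENT) on the machine running syslog-ng should make the event show up in whatever destination is connected to s_suricata.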

If you run syslog-ng on the same machine as Suricata, you can replace this source with a source reading the log files, or, even better, with a socket source.


By default, syslog-ng treats log messages as text. Using the JSON parser of syslog-ng, you can turn the log messages from Suricata into name-value pairs. Once you have name-value pairs, you can work with the individual fields instead of the raw message. For example, instead of searching for an IP address in your logs, you can limit your search to source or destination addresses. Alternatively, you can save only a smaller subset of fields to your long-term storage instead of the whole message.

Adding the “suricata.” prefix to names extracted from the logs ensures that they do not collide with the built-in names of syslog-ng. The prefix can be removed before storing or forwarding messages.

# parse JSON into name-value pairs
parser p_json {
    json-parser (prefix("suricata."));
};
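
To make the idea of prefixed name-value pairs more tangible, here is a small Python sketch of the flattening. It only illustrates the concept and is not how the JSON parser of syslog-ng is implemented; the flatten function and the sample event are my own inventions, and arrays are simply turned into strings to keep the example short.

```python
import json


def flatten(obj, prefix="suricata."):
    """Turn a nested JSON object into flat, dot-separated name-value pairs."""
    pairs = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            # nested objects extend the name with another dot-separated level
            pairs.update(flatten(value, name + "."))
        else:
            pairs[name] = str(value)
    return pairs


# hypothetical, heavily shortened Suricata event
line = '{"event_type": "tls", "dest_ip": "203.0.113.5", "tls": {"sni": "example.org"}}'
pairs = flatten(json.loads(line))
print(pairs["suricata.tls.sni"])  # example.org
```

Names such as suricata.dest_ip and suricata.tls.sni are exactly what the rest of the configuration refers to.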

The new GeoIP parser of syslog-ng can resolve IP addresses to geographical locations. The original GeoIP parser only resolved IP addresses to longitude/latitude. The new GeoIP parser also includes country / county / city names and more. You can use this information for anomaly detection. In addition, you can also display IP addresses found by Suricata on a map. In this case, the parser works on the destination IP address that is recorded by Suricata and stores the result using the prefix “parsed.dest.”. The rewrite rule after the GeoIP parser reformats location data to the format that is expected by Elasticsearch.

# add GeoIP information
parser p_geoip2 {
    geoip2( "${suricata.dest_ip}", prefix( "parsed.dest." ) database( "/usr/share/GeoIP/GeoLite2-City.mmdb" ) );
};
rewrite r_geoip2 {
    set(
        "${parsed.dest.location.latitude},${parsed.dest.location.longitude}",
        value( "parsed.dest.ll" ),
        condition(not "${parsed.dest.location.latitude}" == "")
    );
};

Normally, I perform a test with a file destination before using any other destinations. This ensures that I have all the sources, filters and parsers working properly. With network-based destinations, there are a lot more possibilities for error, such as SELinux and firewalls. It is easier to debug these errors when you are already sure that the rest of your configuration works fine.

By default, syslog-ng stores log messages in syslog format. The template below overrides this and uses JSON formatting. It includes both the name-value pairs parsed directly from JSON and those created by syslog-ng based on these name-value pairs. ISODATE is the time when syslog-ng received the log message. The Elasticsearch destination further below uses almost the same template, with one small difference: to make reading the text file easier, I have added \n for a line break here.

destination d_suricata {
    file("/var/log/suricata.log" template("$(format-json --key suricata.* --key parsed.* --key ISODATE)\n"));
};
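
If the --key options look cryptic, the following Python sketch shows roughly what such a template does: it selects name-value pairs by glob pattern and turns dotted names into nested JSON objects. This is an approximation for illustration only, with a made-up function name and sample values; it does not handle a name that is both a value and a parent of other names.

```python
import fnmatch
import json


def format_json(pairs, patterns):
    """Select pairs whose names match any glob pattern and nest them by dots."""
    result = {}
    for name in sorted(pairs):
        if not any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            continue
        node = result
        *parents, leaf = name.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = pairs[name]
    return json.dumps(result)


# hypothetical name-value pairs of a single message
pairs = {
    "suricata.dest_ip": "203.0.113.5",
    "parsed.dest.ll": "47.5,19.0",
    "ISODATE": "2018-11-20T10:00:00+01:00",
    "HOST": "omnia",
}
print(format_json(pairs, ["suricata.*", "parsed.*", "ISODATE"]))
```

HOST is left out of the result because none of the patterns match it, which is also how unwanted fields stay out of the file written by d_suricata.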

When I prepared this configuration, the latest release of syslog-ng was not yet available. I still used the Java-based driver for Elasticsearch. Starting with syslog-ng version 3.18, you can also use the http() destination to feed Elasticsearch with logs in bulk mode: https://www.syslog-ng.com/community/b/blog/posts/bulk-mode-message-sending-to-elasticsearch-with-syslog-ng-http-destination

destination d_elastic {
    elasticsearch2 (
      cluster("syslog-ng")
      client_mode("http")
      index("syslog")
      time-zone(UTC)
      type("syslog")
      flush-limit(1)
      server("192.168.1.187")
      template("$(format-json --key suricata.* --key parsed.* --key ISODATE)")
      persist-name(elasticsearch-syslog)
    )
};

DNS names are easy to falsify, therefore Suricata only stores IP addresses. Still, it is good to see hostnames, even if you know that you should take them with a grain of salt. In the example below, I first declare a Python parser and class name implementing the resolver code. Next, you can see the simplest possible resolver code in Python, in-line in the syslog-ng configuration. Note that it can be annoyingly slow, but works just fine for no more than a few hundred log messages per second. You can learn more about the Python parser at https://www.syslog-ng.com/community/b/blog/posts/parsing-log-messages-with-the-syslog-ng-python-parser

# resolve non-local destination IP addresses
# using Python parser
parser p_resolver {
    python(
        class("SngResolver")
    );
};
python {
"""
simple syslog-ng Python parser example
resolves IP to hostname
value pair names are hard-coded
"""

import socket

class SngResolver(object):
    def parse(self, log_message):
        """
        Resolves IP to hostname
        """

        ipaddr_b = log_message['suricata.dest_ip']
        ipaddr = ipaddr_b.decode('utf-8')

        # try to resolve the IP address
        try:
            resolved = socket.gethostbyaddr(ipaddr)
            hostname = resolved[0]
            log_message['parsed.dest.hostname'] = hostname
        except socket.error:
            # lookup failed: leave the message without a hostname
            pass

        # return True, otherwise the message is dropped
        return True

};
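
Most of the slowness comes from resolving the same addresses again and again. One possible improvement, which I did not use in my talk, is to cache the results. Below is a minimal sketch for Python 3: make_cached_resolver is a name I made up, the cache size is an arbitrary guess, and the lookup function is passed in as a parameter so that the code can be tried without real DNS queries.

```python
import socket
from functools import lru_cache


def make_cached_resolver(lookup=socket.gethostbyaddr, maxsize=4096):
    """Return a function mapping an IP string to a hostname, or None on failure."""

    @lru_cache(maxsize=maxsize)
    def resolve(ipaddr):
        try:
            return lookup(ipaddr)[0]
        except OSError:
            # socket.herror and socket.gaierror are both subclasses of OSError
            return None

    return resolve
```

Inside parse() you would call the returned function and set parsed.dest.hostname only when the result is not None. Note that failed lookups are cached as well and that entries never expire, so a long-running syslog-ng could serve stale names; treat it as a starting point rather than a finished resolver.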

The last building block in my syslog-ng configuration for Suricata uses add-contextual-data() to add machine function and owner data to local IP addresses. This way, you can query your logs based on machine owner name or create alerts if your print servers start to communicate with the Internet.

# add-contextual-data based on local IP address
parser p_localsrc_info {
    add-contextual-data(selector("${suricata.src_ip}"), default-selector("unknown"), database("/etc/syslog-ng/conf.d/context-info-db.csv"), prefix("parsed.src."));
};
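
The database is a plain CSV file with three columns: the selector, the name of the name-value pair and its value. The Python sketch below mimics the lookup to show what ends up in the message. The rows are invented for this example (my real file is not shown here), and the functions are an illustration of the idea, not the syslog-ng implementation.

```python
import csv
import io


def load_context_db(fileobj):
    """Read selector,name,value rows into {selector: {name: value}}."""
    db = {}
    for row in csv.reader(fileobj):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        selector, name, value = (field.strip() for field in row)
        db.setdefault(selector, {})[name] = value
    return db


def add_context(pairs, db, selector_name, default_selector, prefix):
    """Copy the entries for the selector (or for the default) into pairs."""
    selector = pairs.get(selector_name, "")
    entries = db.get(selector, db.get(default_selector, {}))
    for name, value in entries.items():
        pairs[prefix + name] = value
    return pairs


# hypothetical database content; the real file lives on disk
sample_csv = io.StringIO(
    "192.168.1.20,function,printserver\n"
    "192.168.1.20,owner,Jane Doe\n"
    "unknown,function,unknown\n"
    "unknown,owner,unknown\n"
)
db = load_context_db(sample_csv)
msg = add_context({"suricata.src_ip": "192.168.1.20"}, db,
                  "suricata.src_ip", "unknown", "parsed.src.")
print(msg["parsed.src.function"])  # printserver
```

An address that is missing from the file falls back to the rows of the "unknown" selector, which is the role of default-selector() in the configuration.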

Log statement of suricata.conf

Until now, I have only listed the building blocks for my logging pipeline, but I have not connected them yet. Only those building blocks that appear in a log statement are used. In most cases, we simply reference the previously declared building blocks in the log statement. The included comments are more than enough in those cases. I will explain the more interesting cases:

log {

    # receive Suricata logs
    source(s_suricata);

    # parse JSON into name-value pairs
    parser(p_json);

The recently introduced if/else syntax can greatly simplify your configuration. The configuration below checks the value of the destination IP address and calls the Python parser to resolve IP addresses to names only for non-local destination IP addresses.

    # resolve non-local destination IP addresses
    # using Python parser
    if (not match("^192.168" value("suricata.dest_ip"))) {
        parser(p_resolver);
    };

The next one does just the opposite: it calls add-contextual-data() only for local source IP addresses.

    # add-contextual-data based on local IP address
    if (match("^192.168" value("suricata.src_ip"))) {
        parser(p_localsrc_info);
    };
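
Matching on the text prefix is good enough for my home network. If you would rather make the same decision inside a Python parser, the standard ipaddress module can test real network membership. A minimal sketch, assuming the same 192.168.0.0/16 range as above; is_local is a name I made up for this example:

```python
import ipaddress

# assumption: the local network is the same range the match() filters test for
LOCAL_NET = ipaddress.ip_network("192.168.0.0/16")


def is_local(ip_text):
    """Return True if the text is an address inside the local network."""
    try:
        return ipaddress.ip_address(ip_text) in LOCAL_NET
    except ValueError:
        # empty or malformed field: treat it as non-local
        return False
```

With this helper, a parser could skip the name lookup when is_local() returns True for the destination address, mirroring the first if block.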

As I have already described: first, you declare the building blocks, then you connect them using log statements. This is the traditional way of writing a syslog-ng configuration, but in more recent syslog-ng versions you can declare the building blocks within the log statements as well. In the next example I have defined a (quite dummy) alert that checks TLS logs and alerts if anybody is reading slashdot.org. Matching lines are saved to a file in this case, but you could easily set up an e-mail or Telegram alert as well.

    # send alert if someone is reading slashdot
    if (match("slashdot.org" value("suricata.tls.sni"))) {
        destination { file("/var/log/slashdot"); };
        # ToDo: change to smtp destination
    };

There are several freely available IP address lists for malware command and control machines, or other points of interest. The configuration below compares the destination IP address with a long list of IP addresses extracted from a file, and creates a new name-value pair based on the result.

    # talking to a malware C&C
    if  {
        filter { in-list("/etc/syslog-ng/conf.d/malwarecc.list", value("suricata.dest_ip")) };
        rewrite { set("Problem", value("parsed.malware")); };
    } else {
        rewrite { set("OK", value("parsed.malware")); };
    };
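
The logic of this if/else block is a simple set membership test. The Python sketch below shows the same decision, assuming a list file with one IP address per line; the function names and the sample addresses are made up, and it is not how in-list() is implemented.

```python
def load_list(lines):
    """Build a set from an iterable of lines, one entry per line."""
    return {line.strip() for line in lines if line.strip()}


def classify(pairs, blocklist):
    """Set parsed.malware depending on whether the destination is listed."""
    dest_ip = pairs.get("suricata.dest_ip", "")
    pairs["parsed.malware"] = "Problem" if dest_ip in blocklist else "OK"
    return pairs


# hypothetical list content, using documentation-only addresses
blocklist = load_list(["203.0.113.5\n", "\n", "198.51.100.7\n"])
print(classify({"suricata.dest_ip": "203.0.113.5"}, blocklist)["parsed.malware"])  # Problem
```

Because every message gets either "Problem" or "OK", you can later search or build dashboards on parsed.malware without worrying about missing fields.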

The rest of the configuration calls the GeoIP-related blocks and saves log messages locally and to Elasticsearch, too.

    # add GeoIP information
    parser(p_geoip2);
    rewrite(r_geoip2);

    # save results locally
    destination(d_suricata);

    # save results to Elasticsearch
    destination(d_elastic);
};

The whole config for a better copy & paste experience

# receive Suricata logs
source s_suricata {
    tcp(ip("0.0.0.0") port("514") flags(no-parse));
};

# parse JSON into name-value pairs
parser p_json {
    json-parser (prefix("suricata."));
};

# add GeoIP information
parser p_geoip2 {
    geoip2( "${suricata.dest_ip}", prefix( "parsed.dest." ) database( "/usr/share/GeoIP/GeoLite2-City.mmdb" ) );
};
rewrite r_geoip2 {
    set(
        "${parsed.dest.location.latitude},${parsed.dest.location.longitude}",
        value( "parsed.dest.ll" ),
        condition(not "${parsed.dest.location.latitude}" == "")
    );
};

destination d_suricata {
    file("/var/log/suricata.log" template("$(format-json --key suricata.* --key parsed.* --key ISODATE)\n"));
};

destination d_elastic {
    elasticsearch2 (
      cluster("syslog-ng")
      client_mode("http")
      index("syslog")
      time-zone(UTC)
      type("syslog")
      flush-limit(1)
      server("192.168.1.187")
      template("$(format-json --key suricata.* --key parsed.* --key ISODATE)")
      persist-name(elasticsearch-syslog)
    )
};


# resolve non-local destination IP addresses
# using Python parser
parser p_resolver {
    python(
        class("SngResolver")
    );
};
python {
"""
simple syslog-ng Python parser example
resolves IP to hostname
value pair names are hard-coded
"""

import socket

class SngResolver(object):
    def parse(self, log_message):
        """
        Resolves IP to hostname
        """

        ipaddr_b = log_message['suricata.dest_ip']
        ipaddr = ipaddr_b.decode('utf-8')

        # try to resolve the IP address
        try:
            resolved = socket.gethostbyaddr(ipaddr)
            hostname = resolved[0]
            log_message['parsed.dest.hostname'] = hostname
        except socket.error:
            # lookup failed: leave the message without a hostname
            pass

        # return True, otherwise the message is dropped
        return True

};

# add-contextual-data based on local IP address
parser p_localsrc_info {
    add-contextual-data(selector("${suricata.src_ip}"), default-selector("unknown"), database("/etc/syslog-ng/conf.d/context-info-db.csv"), prefix("parsed.src."));
};

log {

    # receive Suricata logs
    source(s_suricata);

    # parse JSON into name-value pairs
    parser(p_json);

    # resolve non-local destination IP addresses
    # using Python parser
    if (not match("^192.168" value("suricata.dest_ip"))) {
        parser(p_resolver);
    };

    # add-contextual-data based on local IP address
    if (match("^192.168" value("suricata.src_ip"))) {
        parser(p_localsrc_info);
    };

    # send alert if someone is reading slashdot
    if (match("slashdot.org" value("suricata.tls.sni"))) {
        destination { file("/var/log/slashdot"); };
        # ToDo: change to smtp destination
    };

    # talking to a malware C&C
    if  {
        filter { in-list("/etc/syslog-ng/conf.d/malwarecc.list", value("suricata.dest_ip")) };
        rewrite { set("Problem", value("parsed.malware")); };
    } else {
        rewrite { set("OK", value("parsed.malware")); };
    };

    # add GeoIP information
    parser(p_geoip2);
    rewrite(r_geoip2);

    # save results locally
    destination(d_suricata);

    # save results to Elasticsearch
    destination(d_elastic);
};

What is next?

I hope my blog post made you interested in the latest features of syslog-ng (or syslog-ng in general, if you are new to it). This blog post is quite long but still only enough to scratch the surface of syslog-ng. For details on how you can configure each feature, check the documentation: https://www.syslog-ng.com/technical-documents/list/syslog-ng-open-source-edition/ and look around in the blog section of https://syslog-ng.com/ at https://www.syslog-ng.com/community/.

I have tested my configuration using the latest version of syslog-ng Open Source Edition, but it should work without any modifications with syslog-ng Premium Edition, too. If you need commercial level support, check out syslog-ng Premium Edition: https://www.syslog-ng.com/products/log-management-software/.

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or you can even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/balabit/syslog-ng. On Twitter, I am available as @PCzanik.
