The syslog-ng python-fetcher(): collecting load average data

Using python-fetcher() simplifies developing a source driver for syslog-ng even further. You do not have to implement your own event loop, since syslog-ng does it for you. You only need to focus on what information you need and how you (or your code) can fetch it.

In this blog I will show you two examples. The first one is a dead end: it is a project that looked simple at first but turned out to be problematic later on. The second one is simple but still manages to illustrate most features of the python-fetcher.

Example 1 (Dead end: a much too simple MQTT source)

A few blog posts ago I implemented an MQTT destination for syslog-ng in Python, so creating an MQTT source just as simply seemed like a no-brainer. A quick look at the Paho MQTT documentation revealed a method with a promising name: simple(). It seemed to do just what I needed for the python-fetcher: a one-liner that can collect one or more messages from MQTT.

Code

from syslogng import LogFetcher
from syslogng import LogMessage

import paho.mqtt.subscribe as subscribe

# read from any topics
topics = ['#']

class MQTTfetch(LogFetcher): # refer to this from syslog-ng.conf
    def fetch(self): # mandatory
        print("fetch")
        m = subscribe.simple(topics, hostname="127.0.0.1", retained=False, keepalive=1, msg_count=1)
        msg = LogMessage(m.topic + " " + m.payload.decode('utf-8'))
        return LogFetcher.FETCH_SUCCESS, msg

Configuration

source s_mqtt {
    python-fetcher(
      class("mqttfetch.MQTTfetch")
    );
};

destination d_file {
    file("/var/log/mqtt");
};

log {
  source(s_mqtt);
  destination(d_file);
};

Problem

It looks simple and it IS quite simple, actually. There is one problem, though: you cannot easily stop syslog-ng. subscribe.simple() is a blocking method, and even though syslog-ng provides tooling for interrupting a fetcher (namely, the request_exit() method), you cannot stop the blocked call from another method. What does it mean in practice? It means that a simple ^C cannot stop syslog-ng unless an MQTT message arrives quickly after ^C. You need kill -9 to stop it, which is far from an elegant solution, to say the least. Actually, this is what systemctl does after 90 seconds if syslog-ng does not quit after a gentle first request.

If you really want to try the code above, check the MQTT destination blog (referenced at the beginning of the blog post) for requirements. To learn about the parts of the code that are specific to syslog-ng, read the explanations below.

If you need an MQTT source, you should implement it using the regular Python source and without using the subscribe.simple() method.
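
The blocking problem is not specific to MQTT: any fetch step that can wait forever keeps the caller from shutting down. Here is a minimal sketch of the alternative in plain Python, without syslog-ng or Paho. It waits for a message with a short timeout and reports "no data" when nothing arrived, so control returns regularly. The names fetch_once, SUCCESS and NO_DATA are hypothetical stand-ins for illustration. The queue is assumed to be filled by some other part of the program, for example a callback running in another thread.

```python
import queue

# Hypothetical stand-ins for status codes, for illustration only.
SUCCESS = "success"
NO_DATA = "no_data"


def fetch_once(inbox, timeout=0.1):
    """Wait at most `timeout` seconds for one message from `inbox`."""
    try:
        text = inbox.get(timeout=timeout)
    except queue.Empty:
        # Nothing arrived in time: hand control back instead of blocking.
        return NO_DATA, None
    return SUCCESS, text


if __name__ == "__main__":
    inbox = queue.Queue()
    print(fetch_once(inbox))        # the queue is still empty here
    inbox.put("sensors/temp 21.5")
    print(fetch_once(inbox))        # one message is waiting
```

Because each call returns within the timeout, a shutdown request only has to wait for the current call to finish.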

Example 2 (Collecting load average data)

Collecting load average data using syslog-ng is probably not the most useful idea ever, but it is definitely an example you can easily reproduce on your own system. Moreover, you can use it as a starting point for your own application. All you need is a recent enough syslog-ng (version 3.17 or later) with Python support enabled running on Linux.

The configuration and code below read /proc/loadavg at predefined time intervals, parse it into name-value pairs and save the results in JSON format.

Configuration

As in any other case of configuring the Python binding of syslog-ng, it is mandatory to set the class name containing the code. When the code is in an external file and not included in the syslog-ng configuration, you also need to include the name of the Python file here (this time without the extension, though). You can learn more about how it works from my Python destination blog at https://www.syslog-ng.com/community/b/blog/posts/python-destination-getting-into-details.

Depending on your CPU, the python-fetcher() can execute the fetcher code hundreds of thousands of times per second. When reading /proc/loadavg, that is considerable oversampling, and it also affects the measured results. In the non-mandatory “options” configuration item you can set the interval in seconds. While this is optional from the python-fetcher() point of view (where only the class name is a mandatory setting), it is mandatory from the Python code’s point of view. As you will see later, initialization of the Python code, and thus starting syslog-ng, fails if this option is not configured correctly.

You can forward the parsed values to different monitoring software or store them in Elasticsearch. Here I use a simple file destination with JSON formatting to be able to save and view name-value pairs.

Finally, the source and the destination are connected using a log statement.

source s_loadavg {
    python-fetcher(
      class("loadavg.Loadavg")
      options("interval" "1")
    );
};

destination d_file {
    file("/var/log/loadavg"
      template("$(format-json --scope rfc5424 --scope nv-pairs)\n")
    );
};

log {
  source(s_loadavg);
  destination(d_file);
};

Code

Below you can read the code part of the python-fetcher that implements load average data collection. To make it easier to read, I broke the code into small pieces, briefly explaining what the next few lines of code do.

I know quite a few people who say that code itself is documentation, so you should not clutter your code with comments. Still, my own experience and pylint scoring have a different view on the topic, so I put comments in my code.

"""
syslog-ng python-fetcher example
reads /proc/loadavg in intervals and
turns data into name-value pairs
"""

Importing time is not mandatory from the python-fetcher point of view. We use it here nevertheless, as it lets us wait a configurable interval between two data collections.

import time

The following two imports are mandatory for python-fetcher. Since I have already mentioned pylint, note that pylint will give you error messages about these imports, as they are declared within syslog-ng and not available as “real” Python code.

from syslogng import LogFetcher

from syslogng import LogMessage

The name of the class is important, as this is what you refer to from the configuration part. From the syslog-ng point of view, the name of this class is the only mandatory configuration option. As you will see later, the Python code itself can require further options.

class Loadavg(LogFetcher):
    """
    refer to this class from syslog-ng.conf
    """

On the one hand, the __init__ method is completely optional, as it is not used by syslog-ng directly. On the other hand, if you use pylint, instance attributes are expected to be initialized here.

    def __init__(self): # optional
        """
        initializes variables
        """
        print("constructor")
        self.fname = '/proc/loadavg'
        self.interval = 0
        self.fhandle = None

The init method is optional. It can be used to pass options from the syslog-ng configuration to the Python code. Instead of hard coding everything into your code, you can use these options to make your code more generic and to reuse the Python code without modifications on multiple machines.

Here it is used to set the interval. The init method fails (and thus syslog-ng does not start) if it does not receive an “interval” setting among the options or if the value cannot be turned into an integer.

    def init(self, options): # optional
        """
        sets interval based on syslog-ng
        configuration options
        """
        print("init")
        print(options)
        try:
            self.interval = int(options["interval"])
            return True
        except (KeyError, ValueError, TypeError):
            print("configure 'interval' in syslog-ng.conf as a positive number")
            return False
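
Note that int() also accepts zero and negative numbers, and time.sleep() raises a ValueError for a negative value. A slightly stricter check could look like the sketch below. The helper name parse_interval is hypothetical, and it assumes the options arrive as a plain dictionary of strings, as printed in the debug output later in this post.

```python
def parse_interval(options):
    """Return the 'interval' option as a positive int, or raise ValueError."""
    raw = options.get("interval")
    if raw is None:
        raise ValueError("missing 'interval' option")
    interval = int(raw)  # raises ValueError for text such as "fast"
    if interval <= 0:
        raise ValueError("'interval' must be greater than zero")
    return interval


if __name__ == "__main__":
    print(parse_interval({"interval": "1"}))
```

Inside init() you could call such a helper in a try block and return False when it raises ValueError.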

The open method is optional. It can be used by syslog-ng to open the source (in this case, the virtual file containing load average data).

    def open(self): # optional
        """
        opens the file
        """
        print("open")
        self.fhandle = open(self.fname)
        return True

The close method is optional. It can be used by syslog-ng to close the source.

    def close(self): # optional
        """
        closes the file
        """
        print("close")
        self.fhandle.close()

The only mandatory method from the syslog-ng point of view is the fetch method. This is where the Python code fetches data, creates a syslog message and, optionally, name-value pairs from the syslog message and forwards the results to syslog-ng.

Here we start with a bit of sleep, as defined in the configuration. As the file is kept open while syslog-ng is running, we need to jump to the beginning each time the fetch method is run.

While reading the file, we split the line read right away and store the results in a variable. Then we will split one of the fields even further.

Once we have all the fields from /proc/loadavg available, it is time to create a log message out of them. As a first step, we create an empty log message. This already has the date, time and hostname included, but no message text. The next few lines add name-value pairs from the values parsed from /proc/loadavg, while the message text itself stays empty. This is quite OK, as the fewer free-form log messages we see, the better!

Finally, we return a status message and the log message itself.

    def fetch(self): # mandatory
        """
        reads /proc/loadavg, turns into name-value pairs
        and returns the data
        """
        print("fetch")

        time.sleep(self.interval)

        self.fhandle.seek(0, 0)
        line = self.fhandle.readline()
        loadavgtmp = line.split()
        runtmp = loadavgtmp[3].split("/")

        msg = LogMessage()
        msg["loadavg.load1"] = loadavgtmp[0]
        msg["loadavg.load5"] = loadavgtmp[1]
        msg["loadavg.load15"] = loadavgtmp[2]
        msg["loadavg.runcurr"] = runtmp[0]
        msg["loadavg.runproc"] = runtmp[1]
        msg["loadavg.lastpid"] = loadavgtmp[4]
        return LogFetcher.FETCH_SUCCESS, msg

You can learn more about the python-fetcher and the possible return codes from the documentation: https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.19/administration-guide/24#TOPIC-1094549

Testing

The /proc/loadavg format looks like this:

[root@localhost ~]# cat /proc/loadavg 
0.49 0.18 0.07 1/198 1055
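
If you want to check the parsing logic without starting syslog-ng, the same splitting can be tried in plain Python. This is only a sketch: the function name parse_loadavg is hypothetical, and it returns an ordinary dictionary instead of a LogMessage. The key names match the ones used in the fetch method.

```python
def parse_loadavg(line):
    """Split one /proc/loadavg line into the name-value pairs used above."""
    fields = line.split()
    # The fourth field holds "running/total" scheduling entities.
    runcurr, runproc = fields[3].split("/")
    return {
        "loadavg.load1": fields[0],
        "loadavg.load5": fields[1],
        "loadavg.load15": fields[2],
        "loadavg.runcurr": runcurr,
        "loadavg.runproc": runproc,
        "loadavg.lastpid": fields[4],
    }


if __name__ == "__main__":
    for name, value in parse_loadavg("0.49 0.18 0.07 1/198 1055\n").items():
        print(name, value)
```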

If you saved your Python code under /etc/syslog-ng/py, you can set up your Python environment with the following command (in Bash, your shell might be different):

export PYTHONPATH=$PYTHONPATH:/etc/syslog-ng/py

Now you can start syslog-ng in the foreground. You will see some debug messages coming from the Python code, but nothing more on screen. After a few seconds, stop syslog-ng using ^C.

[root@localhost ~]# syslog-ng -F
constructor
init
{'interval': '1'}
open
fetch
fetch
fetch
fetch
fetch
^Cclose
[root@localhost ~]# 

In /var/log/loadavg you should see JSON-formatted messages similar to these:

[root@localhost ~]# tail -2 /var/log/loadavg 
{"loadavg":{"runproc":"192","runcurr":"1","load5":"0.03","load15":"0.01","load1":"0.01","lastpid":"1122"},"SOURCE":"s_loadavg","PRIORITY":"emerg","HOST_FROM":"localhost","HOST":"localhost","FACILITY":"kern","DATE":"Mar  5 14:12:07"}
{"loadavg":{"runproc":"192","runcurr":"1","load5":"0.03","load15":"0.01","load1":"0.01","lastpid":"1122"},"SOURCE":"s_loadavg","PRIORITY":"emerg","HOST_FROM":"localhost","HOST":"localhost","FACILITY":"kern","DATE":"Mar  5 14:12:08"}
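
Since each line of the file is a separate JSON document, the results are easy to process further. Here is a minimal sketch, assuming the line layout shown above; the function name load1_values is hypothetical. Note that the values are stored as strings, so they need to be converted before doing any arithmetic.

```python
import json


def load1_values(lines):
    """Return (date, load1) pairs from JSON-formatted log lines."""
    values = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        record = json.loads(line)
        # Values were saved as strings, so convert load1 to a number.
        values.append((record["DATE"], float(record["loadavg"]["load1"])))
    return values


if __name__ == "__main__":
    sample = ('{"loadavg":{"load1":"0.01","load5":"0.03"},'
              '"SOURCE":"s_loadavg","DATE":"Mar  5 14:12:07"}')
    print(load1_values([sample]))
```

To read the real file, pass an open file object instead of the sample list, for example open("/var/log/loadavg").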

Code in one block (for your copy & paste convenience)

"""
syslog-ng python-fetcher example
reads /proc/loadavg in intervals and
turns data into name-value pairs
"""

import time

from syslogng import LogFetcher
from syslogng import LogMessage

class Loadavg(LogFetcher):
    """
    refer to this class from syslog-ng.conf
    """
    def __init__(self): # optional
        """
        initializes variables
        """
        print("constructor")
        self.fname = '/proc/loadavg'
        self.interval = 0
        self.fhandle = None

    def init(self, options): # optional
        """
        sets interval based on syslog-ng
        configuration options
        """
        print("init")
        print(options)
        try:
            self.interval = int(options["interval"])
            return True
        except (KeyError, ValueError, TypeError):
            print("configure 'interval' in syslog-ng.conf as a positive number")
            return False

    def open(self): # optional
        """
        opens the file
        """
        print("open")
        self.fhandle = open(self.fname)
        return True

    def close(self): # optional
        """
        closes the file
        """
        print("close")
        self.fhandle.close()

    def fetch(self): # mandatory
        """
        reads /proc/loadavg, turns into name-value pairs
        and returns the data
        """
        print("fetch")

        time.sleep(self.interval)

        self.fhandle.seek(0, 0)
        line = self.fhandle.readline()
        loadavgtmp = line.split()
        runtmp = loadavgtmp[3].split("/")

        msg = LogMessage()
        msg["loadavg.load1"] = loadavgtmp[0]
        msg["loadavg.load5"] = loadavgtmp[1]
        msg["loadavg.load15"] = loadavgtmp[2]
        msg["loadavg.runcurr"] = runtmp[0]
        msg["loadavg.runproc"] = runtmp[1]
        msg["loadavg.lastpid"] = loadavgtmp[4]
        return LogFetcher.FETCH_SUCCESS, msg
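
As the syslogng module only exists inside syslog-ng, the code above cannot be imported by a regular Python interpreter. For quick experiments you could register a small fake module first. The sketch below is only an assumption about what such a stand-in might look like: the classes mimic just the parts used in this post, and the value of FETCH_SUCCESS is a placeholder, not the real constant.

```python
import sys
import types


class LogMessage(dict):
    """Stand-in: keeps name-value pairs like a dict; text goes to MESSAGE."""

    def __init__(self, text=None):
        super().__init__()
        if text is not None:
            self["MESSAGE"] = text


class LogFetcher:
    """Stand-in base class; the constant below is a placeholder value."""

    FETCH_SUCCESS = "FETCH_SUCCESS"


def install_stub():
    """Register a fake 'syslogng' module unless one is already loaded."""
    if "syslogng" not in sys.modules:
        stub = types.ModuleType("syslogng")
        stub.LogFetcher = LogFetcher
        stub.LogMessage = LogMessage
        sys.modules["syslogng"] = stub


if __name__ == "__main__":
    install_stub()
    from syslogng import LogFetcher as Base, LogMessage as Msg

    msg = Msg()
    msg["loadavg.load1"] = "0.49"
    print(Base.FETCH_SUCCESS, dict(msg))
```

After calling install_stub(), importing the loadavg module should work outside syslog-ng as well, provided loadavg.py is on your PYTHONPATH. You can then call init(), open() and fetch() by hand.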

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or you can even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/balabit/syslog-ng. On Twitter, I am available as @PCzanik.
