7 Oct 2016

Enriching log messages with additional information

Log messages already contain a wealth of information about what is happening in your systems. Still, logs can be enriched with additional information – like the geo-location belonging to an IP address – which can improve their usability considerably. Because syslog-ng adds these details to log messages in real time, they can help with message routing or alerting, and can also make dashboards more to the point.

Enrichment is often done based on already available data (like the output of the $HOST macro), but even more often on some information embedded in the message part. For this, you need to parse the message first. There are many possibilities for message parsing in syslog-ng. Unstructured messages – like an SSH login message – can be parsed using the PatternDB parser. Structured log parsers in syslog-ng include:

CSV parser: for columnar data, like Apache access logs
key=value parser: many firewalls log in this format
JSON parser
Linux audit.log parser

These and more are described in detail in the syslog-ng documentation.
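
To illustrate what a structured parser produces, here is a minimal Python sketch of key=value parsing. It is not syslog-ng's implementation: the sample firewall line, the parse_kv function and the "kv." prefix are all assumptions made for the example.

```python
import shlex


def parse_kv(message, prefix="kv."):
    """Split a key=value formatted message into name-value pairs."""
    pairs = {}
    # shlex.split keeps quoted values such as msg="port scan" together
    for token in shlex.split(message):
        key, sep, value = token.partition("=")
        if sep and key:
            pairs[prefix + key] = value
    return pairs


# Hypothetical firewall log line, invented for this example
line = 'src=192.168.1.1 dst=10.0.0.5 action=drop msg="port scan"'
fields = parse_kv(line)
print(fields["kv.action"])
```

Once the fields are available as name-value pairs, any of them (for example the source address) can serve as the input for the enrichment methods described below.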

GeoIP – find the geo-location of an IP address

If you have an IP address in your log message, it is often interesting to know its geo-location. Sometimes it is simple curiosity, for example when you want to see on a map where people read your latest blog. But it can also be important from a security perspective: you have good reason to be suspicious when all of your colleagues are in Europe, yet there is a successful login to your company CRM from the other end of the world.

Linux distributions usually include only a simple database, which provides only country information. This is sufficient in many cases, but keep in mind that more precise data is also available.
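
The suspicious-login check described above can be sketched in a few lines of Python. This is only a toy illustration, not how syslog-ng or a real GeoIP database works: the country table is hypothetical and uses address ranges reserved for documentation, and the function and field names are assumptions.

```python
import ipaddress

# Hypothetical country table built from documentation-only address ranges;
# a real GeoIP database is far larger and is looked up differently.
COUNTRY_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "HU",
    ipaddress.ip_network("198.51.100.0/24"): "AU",
}


def country_of(ip):
    """Return the country code for an IP address, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, country in COUNTRY_BY_NETWORK.items():
        if addr in network:
            return country
    return None


def enrich_login(event, expected_countries):
    """Add a country code and a 'suspicious' flag to a login event."""
    enriched = dict(event)
    country = country_of(event["source_ip"])
    enriched["geoip.country_code"] = country
    # Unknown locations are treated as suspicious as well
    enriched["suspicious"] = country not in expected_countries
    return enriched


event = enrich_login({"user": "alice", "source_ip": "198.51.100.23"}, {"HU", "DE"})
print(event["geoip.country_code"], event["suspicious"])
```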

Read the syslog-ng documentation for details on how you can add geo-locations to your syslog-ng configuration.

Adding metadata from CSV files

Log messages contain a lot of technical data, like user names or IP addresses. On the other hand, they lack most of the contextual information, such as the function of the system, who administers it, or the role of the user who just logged in. This information is often available in spreadsheets, databases or directories. Writing a connector for real-time access to each of these formats would be time-consuming, and querying these sources would slow down event processing considerably. The syslog-ng application resolves this problem by loading the database at startup from a CSV file and keeping it in memory for fast access.

Take the following example:

192.168.1.1,host-role,webserver
192.168.1.1,contact-person,"John Doe"
192.168.1.1,contact-email,johndoe@example.com
192.168.1.2,host-role,mailserver
192.168.1.2,contact-person,"James Bond"
192.168.1.2,contact-email,jamesbond@example.com

Using the above database, you can route logs in real-time to different destinations based on the “host-role” field. Also, storing the contact information along with the log message can save a lot of time when quick action is needed. It can also help to create a dashboard that enables you to track the number of problems related to a given administrator.
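
The idea of loading the CSV once and answering lookups from memory can be sketched in Python as follows. This is a simplified illustration rather than syslog-ng's implementation: the function names, the "context." prefix, the HOST selector field and the destination paths are assumptions made for the example.

```python
import csv
import io

# The same database as in the example above
CONTEXT_CSV = """192.168.1.1,host-role,webserver
192.168.1.1,contact-person,"John Doe"
192.168.1.1,contact-email,johndoe@example.com
192.168.1.2,host-role,mailserver
192.168.1.2,contact-person,"James Bond"
192.168.1.2,contact-email,jamesbond@example.com
"""


def load_context(csv_file):
    """Read selector,name,value rows into a dict of dicts kept in memory."""
    db = {}
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        selector, name, value = row
        db.setdefault(selector, {})[name] = value
    return db


def enrich(message, db, selector_field="HOST", prefix="context."):
    """Return a copy of the message with the matching context fields added."""
    enriched = dict(message)
    for name, value in db.get(message.get(selector_field), {}).items():
        enriched[prefix + name] = value
    return enriched


def route(message):
    """Pick a (hypothetical) destination file based on the host role."""
    role = message.get("context.host-role", "unknown")
    return f"/var/log/{role}.log"


db = load_context(io.StringIO(CONTEXT_CSV))
msg = enrich({"HOST": "192.168.1.2", "MESSAGE": "delivery failed"}, db)
print(route(msg), msg["context.contact-email"])
```

Because the lookup is a plain dictionary access, enriching a message costs almost nothing compared to querying an external database for every event.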

Read the syslog-ng documentation for information on how you can enrich your log messages with contextual data.

Additional name-value pairs based on message content

PatternDB is mostly known for being able to extract interesting fields from log messages. For example, from an SSH login message, it can extract the authentication method, the user name, the source IP address and more. The problem is that extracted fields are just about the same for both a successful and a failed login.

A lesser known feature of PatternDB is that you can also create additional name-value pairs based on message content. As PatternDB knows the context, that is, it knows where these fields were found, you can use it to add contextual information. For example, in the case of a failed SSH login:

action=login status=failure application=sshd

This way, when we search for information, we can easily separate login and logout events, successful and failed logins, as well as SSH and FTP events.
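
The principle can be sketched in Python with regular expressions. Note that PatternDB uses its own pattern syntax and XML database format, not regular expressions, so this is only an analogy. The message patterns are modelled on typical OpenSSH log lines, and the field names (auth_method, username, source_ip) are assumptions.

```python
import re

# Each rule pairs a pattern with the fixed name-value pairs to attach
# when that pattern matches.
RULES = [
    (
        re.compile(
            r"Accepted (?P<auth_method>\S+) for (?P<username>\S+) "
            r"from (?P<source_ip>\S+) port \d+"
        ),
        {"action": "login", "status": "success", "application": "sshd"},
    ),
    (
        re.compile(
            r"Failed (?P<auth_method>\S+) for (?:invalid user )?(?P<username>\S+) "
            r"from (?P<source_ip>\S+) port \d+"
        ),
        {"action": "login", "status": "failure", "application": "sshd"},
    ),
]


def classify(message):
    """Return extracted fields plus the extra pairs of the first matching rule."""
    for pattern, extra in RULES:
        match = pattern.search(message)
        if match:
            return {**match.groupdict(), **extra}
    return {}


result = classify("Failed password for root from 192.168.1.2 port 22 ssh2")
print(result["status"], result["username"], result["source_ip"])
```

The extracted fields look alike for both rules; it is the fixed pairs attached by the matching rule that tell a successful login from a failed one.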

Conclusion

Enriching log messages opens up exciting new possibilities with syslog-ng. It can facilitate working with logs in many ways, be it in real time or during later analysis. Consider combining the methods mentioned in this blog. For example, you could set up e-mail alerts on successful SSH root logins, with the alert containing the IP address of the machine where the connection was initiated, together with the geo-location of the machine and contact details.

