My name is Craig Finn, and I'm a systems engineer with One Identity, specializing in syslog-ng. And in this session, I'm going to describe how you can utilize syslog-ng with the Google Cloud Pub/Sub application.
So syslog-ng, as you probably well know, is an application that allows you to centrally collect, process, and securely store log messages. It's an application that is suitable for being your enterprise log management layer. So you only need one application that gives you high performance, high reliability, and it's very feature rich and easy to use. But that's the only tool that you will, in fact, need for log collection.
And what it does is not only collect, process, and store logs from all of the devices and applications in your enterprise; it can also send those messages, or subsets of those messages, to other applications-- other security tools, SIEM tools, databases-- many, many other applications that you might want to use to analyze and process the logs in other ways. And it's very often used to optimize SIEM ingestion. So in other words, we take care of ingesting logs, doing some processing, categorization, and filtering, then send them on down to your SIEM tool, which can then run in a much more efficient manner, because it doesn't have to do work that it's not really optimized for. It can specialize in what it's designed to do, which is to analyze and provide actionable intelligence on the data that syslog-ng is providing to it.
And when I say syslog-ng, in particular I'm talking about syslog-ng premium edition, which is the edition of syslog-ng that comes with commercial support from One Identity. You may be familiar with the open source version of syslog-ng. Premium edition is very much in that same vein, but it has many enhancements, and, of course, it is supported by One Identity-- with up to 24 by 7 support, and in fact, enhanced support levels even above that. But syslog-ng is, essentially, a Linux application, and you can run it in one of three modes. Server mode is the most common. That's where you would ingest logs, process, and store them. But you can also deploy it in what we call relay mode and client mode.
I'll have more to say about relay mode in a couple of slides. Client mode is a means by which you can install it on a Linux VM or other instance and use it to replace the native syslog daemon that may have come with that distribution that you're using. So it's not required, you don't have to have our code running on an endpoint to collect logs from it, but it gives you some additional capabilities when you do deploy syslog-ng in the client.
And, you know, I promised to talk about relays. Relays have a couple of very important uses. One of them is to enhance-- greatly enhance-- the reliability of log collection from devices or applications that may be constrained, for one reason or another, to utilize UDP as the transport mechanism. And UDP is, of course, very efficient. It was the network protocol that was designed into the syslog protocol to begin with, back in the late 80s. But it does have one problem.
UDP is, by design, an unreliable protocol. It's really fire and forget. When a sender sends a datagram via UDP, it sends it, it gets into the network, and neither the sender nor the receiver is aware of any problems with that datagram. It could get dropped somewhere along the network at a router interface, and no one's any the wiser. And that could be a problem, because you're depending on your log analysis tools and security tools to really be aware of what's happening in your network. These UDP datagrams may be carrying essential information that needs to be attended to immediately. And if they get dropped, that information is obviously not available to you.
So what you can do, in cases where you have UDP sources that maybe have to traverse a complex network with a lot of network hops, is locate a syslog-ng relay close to those sources-- say, maybe at a remote site. Since the relay is close to the sources, in a network topology sense, there's a high chance that the UDP packets will make it intact to the relay. The relay can then resend those messages using a more reliable protocol-- like TCP.
It can also help you if you have an extremely large organization, with perhaps multiple tens of thousands of simultaneous TCP connections. In that case, you can have a relay multiplex those tens of thousands of TCP connections down to one connection to your server at your central or primary data center. So relays are very useful in very large environments.
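One detail a relay has to deal with when it moves messages from UDP onto TCP: a datagram has natural boundaries, but a TCP connection is a byte stream, so each message needs some kind of framing. The following is a minimal Python sketch of one common approach, the octet-counting framing described in RFC 6587. The function names are hypothetical, and nothing in the talk says which framing syslog-ng uses in any given setup.

```python
def frame_for_tcp(datagram: bytes) -> bytes:
    """Prefix one syslog message with its length in bytes (octet counting)."""
    return str(len(datagram)).encode("ascii") + b" " + datagram


def relay_stream(datagrams) -> bytes:
    """Join framed messages into the byte stream a relay would write to TCP."""
    return b"".join(frame_for_tcp(d) for d in datagrams)
```

With the length prefix, the receiver can split the stream back into the original messages even when several of them arrive in one TCP segment.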
And of course, you can integrate syslog-ng with other parts of your network infrastructure. So you can also have syslog-ng instances running behind a network load balancer. And that could provide, obviously, load balancing and scaling, and also reliability. The load balancer could have health checks, and if it determines that one of the servers is down, it could make sure that all the messages get sent to the surviving instance.
Now I'm going to describe syslog-ng in a little more detail here. So this block diagram shows the essential part of syslog-ng, which is that it's a log or message processing pipeline. In fact, it's an oversimplification in some ways, because it's not just one pipeline, but it's many, many parallel pipelines, many log processing paths, and then many parallel network destinations. And all of these paths, sources, and destinations are multi-threaded for performance.
But in essence, what happens is syslog-ng will be collecting messages from the network, and it will do that using any protocol-- any network protocol, any syslog protocol, from any device, from any application. It will also collect log messages that are being created on the instance itself. So in other words, any application log files or platform-specific log sources, say from journald or the kernel. It will harvest those as well, and it will run them through these log paths. Log paths are where you will do the processing. And that processing will consist of filtering and parsing, to extract name-value pairs from possibly otherwise unstructured log messages.
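To make "parsing to extract name-value pairs" a bit more concrete, here is a small Python sketch that pulls fields out of a classic BSD-style syslog line. It is only an illustration of the idea, not syslog-ng's parser; the regular expression and the field names are assumptions chosen for this example.

```python
import re

# Assumed layout: <PRI>Mmm dd hh:mm:ss host program[pid]: message
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of name-value pairs, or None if the line does not match."""
    match = SYSLOG_RE.match(line)
    if match is None:
        return None
    pairs = match.groupdict()
    # The PRI value encodes facility * 8 + severity.
    pairs["facility"], pairs["severity"] = divmod(int(pairs.pop("pri")), 8)
    return pairs
```

Once the fields exist as name-value pairs, a filter is just a condition on them, for example keeping only records whose `severity` is 3 or lower.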
It will also allow you to do transformations. You may have to send a log message from one of your sources to another application that requires a specific format that's different from the format that the sender used. You can very easily transform it to make it match what the destination requires. You can also enrich logs. So you can add metadata to the basic raw log message to add context to the message, to aid in filtering, or to make it more usable to one of those downstream applications.
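As a rough illustration of enrichment followed by a format transformation, the Python sketch below adds a piece of context to a parsed record and then renders it as a JSON line for a downstream consumer. The lookup table, the `site` field, and the function names are all hypothetical; in syslog-ng itself this kind of work is done in the configuration, not in Python.

```python
import json

# Hypothetical context data: which site each host belongs to.
SITE_BY_HOST = {"fw-01": "branch-office"}


def enrich(record, lookup=SITE_BY_HOST):
    """Return a copy of the record with a 'site' field added as metadata."""
    enriched = dict(record)
    enriched["site"] = lookup.get(record.get("host"), "unknown")
    return enriched


def to_json_line(record):
    """Render the record in the format an assumed downstream tool expects."""
    return json.dumps(record, sort_keys=True)
```

The original record is left untouched, so the same message can still be stored or forwarded in its raw form elsewhere.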
And of course, the syslog-ng server can store files-- can do it very securely. It could store it either in text format or binary format. And the binary format that's provided with syslog-ng premium edition gives you the capability to encrypt those log files, also to compress them to a very large degree, and also gives you the ability to provide timestamping-- you can have the log messages timestamped by an external time stamping authority.
And then of course, the resulting messages, having gone through the log paths, can be routed to other applications, as I mentioned before. And that routing can be many to many. So one message can go to many different destinations. And the same is true of the processing. You could have one message coming through your source and it could go through several different log paths. So you can process a single message, or set of messages, in many different ways, and the results can go to different destinations.
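The many-to-many routing can be pictured as a list of log paths, each pairing a filter with one or more destinations, where a message is offered to every path. Here is a minimal Python sketch of that idea; the destination names and the severity threshold are made up for the example.

```python
def route(message, log_paths):
    """Return the names of all destinations whose log path accepts the message."""
    delivered = []
    for accepts, destinations in log_paths:
        if accepts(message):
            delivered.extend(destinations)
    return delivered


# Hypothetical paths: everything is archived, serious events also go elsewhere.
LOG_PATHS = [
    (lambda m: True, ["archive_file"]),
    (lambda m: m["severity"] <= 3, ["siem", "pubsub_topic"]),
]
```

A severity 2 message would land in all three destinations here, while an informational one would only be archived.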
The other feature that syslog-ng provides is what we call flow control. This gives syslog-ng the ability to monitor the ability of downstream destinations to accept log messages, and it can modulate the flow of the messages incoming from the sources to match the ability of those downstream destinations to process them. And this is very important, because you don't want to have a situation in which a downstream receiver is not able to handle messages. Maybe they're too busy, there's a buffer full on the downstream end, maybe there's a network problem and they can't, at this moment, accept more messages. You want to be able to tone down syslog-ng's sending of messages to match that, to make sure that nothing gets overrun and no log messages drop into the bit bucket anywhere.
As well as having that basic flow control, syslog-ng, in all of its implementations, gives you the capability to have an on-disk buffer. So if for some reason it is temporarily impossible for syslog-ng to get a message out to one of its destinations, it can store the messages in an on-disk queue, to make sure they're securely put into a queue where they can't be erased or lost. And then when the downstream application is again able to process-- maybe the network condition is cleared, the machine is no longer busy, or it's back up after a reboot-- syslog-ng will go into that queue, and then send the messages out in the order that they had come in from the sources.
And of course, now what I want to talk about in particular is the topic of this session, which is, how do we talk to Pub/Sub in the Google Cloud platform? And this is something that was introduced just recently, in fact. In the latest version of syslog-ng, 7.0.22, we now have the ability to both receive message streams from Google Cloud Pub/Sub, and also send messages from syslog-ng into Google Cloud Pub/Sub. So we can both be a subscriber-- to use the terminology of the message broker technology-- or we can be a publisher.
So what is Google Pub/Sub? Well, it's an asynchronous messaging service, one of several that are out there. And what these do is they essentially decouple any type of service that's producing events from the services that consume those events. It decouples it, or loosely couples it, so the consumer of the event does not necessarily have to be always online. It could be disaggregated from the sender, obviously from a space standpoint, but also from a time standpoint. So they don't have to be simultaneously up and running in order for that flow of information to pass from one to the other.
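The on-disk buffer behaviour can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how syslog-ng stores its queue: the class name is hypothetical, the queue is a plain text file, and `send` is any callable that returns `True` on success. It covers only the store-and-replay part, not flow control toward the sources.

```python
import os
import tempfile


class DiskBufferedSender:
    """Toy model: queue lines on disk while the destination is unavailable."""

    def __init__(self, send, path=None):
        self.send = send  # callable(line) -> True if delivered
        self.path = path or os.path.join(tempfile.mkdtemp(), "disk-buffer.q")

    def _has_backlog(self):
        return os.path.exists(self.path) and os.path.getsize(self.path) > 0

    def submit(self, line):
        # To preserve ordering, new lines queue behind any existing backlog.
        if self._has_backlog() or not self.send(line):
            with open(self.path, "a", encoding="utf-8") as queue:
                queue.write(line + "\n")

    def flush(self):
        """Replay queued lines in arrival order; keep whatever still fails."""
        if not self._has_backlog():
            return
        with open(self.path, encoding="utf-8") as queue:
            pending = queue.read().splitlines()
        sent = 0
        for line in pending:
            if not self.send(line):
                break
            sent += 1
        with open(self.path, "w", encoding="utf-8") as queue:
            queue.writelines(item + "\n" for item in pending[sent:])
```

Calling `flush()` once the destination is reachable again drains the file in the same order the messages arrived, which is the property described above.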
It's topic-based. So the sender would associate messages with a particular topic, or set of topics, and then the receiving application will subscribe to subscriptions which are connected to various topics. And of course, being a Cloud platform, it offers high availability and scalability. So in typical Cloud fashion, there will be resources in many different geographic locations to handle this. There'll be a lot of redundancy, and they could dynamically scale, depending on the particular load at a point in time.
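The topic and subscription model is easier to see in a tiny in-memory sketch. The Python below is purely illustrative and has nothing to do with Google's implementation; the `Broker` class and its method names are invented for this example. It shows the two properties just described: every subscription attached to a topic gets its own copy of a message, and the consumer can pull later than the publisher published.

```python
from collections import defaultdict, deque


class Broker:
    """Toy topic-based broker with one independent queue per subscription."""

    def __init__(self):
        # topic name -> {subscription name: queue of undelivered messages}
        self.subscriptions = defaultdict(dict)

    def create_subscription(self, topic, name):
        self.subscriptions[topic][name] = deque()

    def publish(self, topic, message):
        # Each subscription on the topic receives its own copy.
        for queue in self.subscriptions[topic].values():
            queue.append(message)

    def pull(self, topic, name):
        """Return the next message for this subscription, or None if empty."""
        queue = self.subscriptions[topic][name]
        return queue.popleft() if queue else None
```

Because messages wait in the subscription's queue, the publisher and subscriber never need to be running at the same moment.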
So let's look at how syslog-ng would fit into this picture. And as I mentioned, syslog-ng can act as a publisher. So it can send information into Google Cloud via Pub/Sub. So in this kind of architecture, you would have your typical log sources in a syslog-ng log processing world. They could either be on-prem devices or applications or they could be cloud resources, as well, sending messages to syslog-ng, and one of syslog-ng's destinations can then be Pub/Sub. So syslog-ng can send messages, different streams, to multiple different topics, if desired. Those topics can be matched with subscriptions to which other applications can then subscribe. And again, this also works in a many to many type of configuration. So you could have multiple subscribers subscribing and receiving information from multiple topics.
So this is how syslog-ng would work in a publishing mode-- in other words, sending logs from syslog-ng into Pub/Sub-- but we can also work as a subscriber. So in this case, you would have other applications somewhere-- and they could be cloud applications, either within Google Cloud, or they could be from another cloud, as well. Or they could be on-prem applications sending their messages via topics into Pub/Sub. And then on the other end, syslog-ng becomes a subscriber. So it would subscribe to whatever subscriptions are appropriate. These would come into syslog-ng as syslog-ng sources. So we would have however many sets of sources are required to satisfy all the subscriptions that you need to match.
And it's very easy to set it up. Obviously, what you need is a Google Cloud account, right? That's obvious. You need to have a project in Google Cloud, and that can be an existing one or you can create a new one, if you prefer. You'll need a service account associated with that project, and that service account will provide you-- you'll be provided with the credentials key. That'll be a JSON file with your credentials that you'll securely store on your syslog-ng implementation. And then you'll need, at the very least, syslog-ng premium edition 7.0.22-- which is, in fact, the current and latest syslog-ng.
And then, of course, you'll have to set up the definitions within syslog-ng, within its configuration file. And of course, these will look like typical source and destination definitions, very much following the templates of the other types of sources and destinations that you would have included in your syslog-ng.conf file.
To give you an example, here is a very simple source statement for a syslog-ng implementation that's going to be a subscriber to a particular subscription in GCP. And it's very, very simple. Again, all you need to do is use the appropriate source driver-- which, naturally enough, is called google-pubsub-- and at a minimum, all you need to really provide are three different parameters. You need to give the source-- the location-- of your credentials file, you'll have to know what project name you're associated with, and then the only other thing you need to do is identify the name of the subscription that you're going to be receiving messages from.
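To show why the project and subscription names are the key pieces of information on the subscriber side, here is a hedged Python sketch based on the public Pub/Sub REST interface, where a subscription is addressed as `projects/<project>/subscriptions/<subscription>` and pulled messages carry base64-encoded data plus an acknowledgement ID. The function names are hypothetical, no request is actually sent, and the talk does not say how syslog-ng implements its source internally.

```python
import base64
import json


def pull_url(project, subscription):
    """Build the REST endpoint for pulling from one subscription."""
    return (
        "https://pubsub.googleapis.com/v1/"
        f"projects/{project}/subscriptions/{subscription}:pull"
    )


def decode_pull_response(body):
    """Return (ack_id, text) pairs from the JSON body of a pull response."""
    received = json.loads(body).get("receivedMessages", [])
    return [
        (
            item["ackId"],
            base64.b64decode(item["message"].get("data", "")).decode("utf-8"),
        )
        for item in received
    ]
```

A real client would also authenticate with the service account credentials and acknowledge each `ack_id` after the message has been stored; both steps are left out of this sketch.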
And conversely, if you're talking to-- or sending messages to-- Google Pub/Sub, you have to create a destination. And of course, this is very similar. Again, very simple syntax. It's not a lot of information that you've got to provide. What syslog-ng premium edition is doing for you is it's abstracting away all of the details of talking to the REST API in the Google Cloud. And of course, that's what we're doing here. We're talking via HTTP to a REST API. But we're under the covers, taking care of all those implementation details, so that all you need to provide is, really, some high level information that's very simple to specify.
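For a sense of what is being abstracted away on the publishing side, the Python sketch below builds the kind of HTTP request the public Pub/Sub REST interface expects for a publish call: a POST to the topic's `:publish` endpoint with base64-encoded message data. The function name and the example project and topic names are placeholders, authentication is omitted, and this is not a description of syslog-ng's internals.

```python
import base64
import json


def build_publish_request(project, topic, log_lines):
    """Return (url, json_body) for publishing the given log lines to a topic."""
    url = (
        "https://pubsub.googleapis.com/v1/"
        f"projects/{project}/topics/{topic}:publish"
    )
    body = {
        "messages": [
            # Message payloads must be base64-encoded in the JSON body.
            {"data": base64.b64encode(line.encode("utf-8")).decode("ascii")}
            for line in log_lines
        ]
    }
    return url, json.dumps(body)
```

For example, `build_publish_request("my-project", "my-topic", ["hi"])` yields a body whose single message has the data field `aGk=`.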
There are other options, as well. What I showed you were the required options, but in fact, you have other parameters that you can apply to either your source or destination. So these are some of the destination options that you can set. And these are what you would use to provide more worker threads, or to set up a disk buffer, to make sure that if anything happens to your syslog-ng instance, or if there are network issues, you'll have incoming messages stored securely somewhere for later transmission to GCP. There's also some ability to monitor and modify how you're going to use the bandwidth-- how much batching you're going to do to get data from syslog-ng to GCP. But again, in general, it's very, very simple. You just have to provide a minimal amount of information to be able to use this feature.
So those are the basics, now, of what syslog-ng does, and how we talk to GCP Pub/Sub. I do have a bit of a demo here to show you all these pieces working together, and I'm just going to show a couple of slides that lay that out.
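Batching trades a little latency for fewer, larger requests. As a minimal Python sketch of the idea, the class below collects lines and hands them to a flush callback once a size threshold is reached. The class and the `batch_lines` parameter are names chosen for this example only; the actual option names and defaults should be taken from the syslog-ng documentation.

```python
class Batcher:
    """Collect lines and flush them as one batch when the threshold is hit."""

    def __init__(self, flush_fn, batch_lines=3):
        self.flush_fn = flush_fn  # callable that receives a list of lines
        self.batch_lines = batch_lines
        self.pending = []

    def add(self, line):
        self.pending.append(line)
        if len(self.pending) >= self.batch_lines:
            self.flush()

    def flush(self):
        """Send whatever is pending, for example on a timer or at shutdown."""
        if self.pending:
            self.flush_fn(list(self.pending))
            self.pending.clear()
```

A larger threshold means fewer round trips to the cloud service, at the cost of messages waiting slightly longer before they leave.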
So I'm going to have, essentially, three instances of syslog-ng. The two important ones are here. I'm going to be receiving logs into this particular Linux machine from another VM. So that's going to generate some logs-- log messages-- send it to this machine, whose name is wecrelay. And this is going to be the publisher machine. So this is going to take, essentially, logs coming in from my network infrastructure, and it's going to send them in to Pub/Sub in GCP, into, you know, there's a topic associated with the messages going into Pub/Sub.
And on the other end, I've got another instance of syslog-ng, and this is my subscriber. So this is going to be a syslog-ng implementation with a GCP source. This is looking to ingest logs from Google Pub/Sub. And so it's very simple. We'll see the logs being transferred, and being ingested by this first machine, and then we'll be able to see them coming into the second machine. And on this receiving machine, the subscriber, what will happen is those log messages that are coming from Pub/Sub will be deposited into a file on the local machine.
Incidentally, if you want to learn any more about syslog-ng in general, or some of the other features, just go to syslog-ng.com. There you can be directed to a trial download page, all the administration guides and other documentation, and some blogs. There's a great deal of information on syslog-ng. So it's a very, very useful site.
And that's that. I'm going to now transition into the demo. So let me just hang on for a second, and we'll be right there with-- OK. I've got some terminals open here. And in fact, four different ones that I'm going to show you. These are on the syslog-ng instances that I showed you before.
So the first one here is the machine that's going to act as my publisher. So this machine is going to be accepting messages from my network, and it's going to process them and send them to Google Pub/Sub, to a particular topic that I showed before. And I'm also, then, going to be receiving them. So I have another instance of syslog-ng that's already running here, and it's waiting for messages to be coming from Google Pub/Sub.
So this is my subscriber machine, and what's going to happen is I'm going to generate a bunch of messages, send them to my publisher machine. It'll send them to the cloud, to Pub/Sub. Then they'll come back down to my subscriber machine that's running the syslog-ng Pub/Sub source-- in other words, ready to accept messages from Pub/Sub. And it'll be writing them into a file. So what I'll do is, I'm going to tail that file so you'll be able to see them as they come in real time.
Now what I'm going to be doing is, I'm going to be generating messages synthetically, from yet another instance of syslog-ng and another Linux VM. But this is going to continually create log messages, and you'll see there, the message body will just consist of the uppercase characters P-A-D repeated over and over again. So again, they're purely synthetic, but they're actual syslog messages, being generated at a rate that I'm specifying, and for an interval I'm specifying, as well. So again, these are going to go to my machine that will be my publisher, the publisher will send it to GCP Pub/Sub, they'll come back through via a subscription, and come down to my other Linux machine.
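For readers who want to picture the synthetic traffic, here is a small Python sketch that builds syslog-style lines whose body is the characters P-A-D repeated up to a fixed size. It is not the generator used in the demo; the host name, program name, timestamp, and sizes are placeholders, and the sketch only builds the messages without sending or pacing them.

```python
import itertools


def synthetic_message(seq, size=64, host="loadgen", facility=1, severity=6):
    """Build one padded test message of exactly `size` characters if possible."""
    pri = facility * 8 + severity
    # Fixed placeholder timestamp; a real generator would use the current time.
    header = f"<{pri}>Oct 14 17:10:00 {host} synthetic[1]: seq={seq} "
    pad_len = max(0, size - len(header))
    padding = "".join(itertools.islice(itertools.cycle("PAD"), pad_len))
    return header + padding


def burst(rate_per_sec, seconds):
    """Return the messages a run at the given rate and duration would contain."""
    return [synthetic_message(i) for i in range(rate_per_sec * seconds)]
```

Sending these at a steady rate to the publisher machine would reproduce the shape of the demo traffic: real syslog framing around a meaningless, fixed-size body.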
So let me go ahead, and I'm going to generate my messages here. And there they go. So it's pumping them into the network. If I go over to my machine, my receiving machine, this is the guy that's receiving them from the network, and it's going to be pushing them off to Pub/Sub, as well. So now the only other thing I need to do is look at-- here is my receiving machine. This is running syslog-ng, and I'm running it in the foreground so you can see all of the activity that's happening. And you can see, sure enough, syslog-ng is seeing these messages coming in from Pub/Sub. This is how syslog-ng itself, the application, sees it, and if it's doing its job correctly, it should be sending them to the destination, the local destination, on that machine that I've specified.
And sure enough, here is that destination file on my receiver, or my subscriber, pulling in the messages. They're coming from Pub/Sub and they're being deposited in that file. So it was in var logs. If I do an ls -l, and it was Pub/Sub messages, you can see that here are the files-- 5:10 Eastern time on October 14. Pretty big file already. But those are all the messages that I've been able to ingest via this pathway.
So from my network infrastructure to syslog-ng, from syslog-ng those messages then get transferred up to Google Cloud, into Pub/Sub, into their asynchronous messaging system. And I also pull that back down, because I'm subscribing to the topic that that first syslog-ng instance has been publishing to, and I'm receiving them at the back end.
So in other words, Google Pub/Sub can now be used either as a destination or a source, or both, for syslog-ng. And again, this comes with syslog-ng premium edition. This is a feature that's exclusive to the premium edition at this point, along with many of the other features that premium edition does bring to the party, in addition to, again, commercial support.
So I hope you enjoyed that brief demo and the presentation. And hopefully you'll visit our website, and you'll get a lot more information about this feature and other features that are available in syslog-ng premium edition.