Hello, and thanks for joining me here today at Virtual Unite. My name is James Bonamico, and I'm a Senior Sales Engineer here at One Identity. I'll be talking about a new feature that's soon to make its way into Syslog-ng: Windows Event Collector Clustering. We'll go through how to collect events from your entire Windows infrastructure at enterprise scale, agentlessly, and with multi-node redundancy. For this presentation, I'll assume you already have some familiarity with Syslog-ng itself as well as with Windows events.
First, let me back up a bit and go over the current ways you can use Syslog-ng to capture logs from Windows servers and desktops. To start with, we have the Syslog-ng Windows Agent, which is included with Syslog-ng Premium Edition as well as Syslog-ng Store Box. This is a lightweight agent that can be installed in standalone mode or pushed out through Group Policy. It's easily configured through an MMC snap-in, either locally or centrally. It lets you capture Windows event container logs and stream them to a Syslog-ng server. It can also monitor files and folders, so if you have an application that logs to flat files, those can be easily read and ingested as well.
There are robust built-in filtering and formatting options, so you can customize what data gets sent over the wire, and it can all be encrypted in transit. What you're looking at on this slide is a basic diagram of how you would use the Agent to collect logs from a Windows environment. On the left, you have your various data centers and offices where the agent has been deployed, and they're all streaming logs to Syslog-ng Premium Edition. We're using a load balancer in the middle so we can have multiple Syslog-ng nodes for redundancy. Then, on the Syslog-ng PE servers themselves, we can parse, filter, reformat, and route those messages to any number of final destinations. So whether you're sending to Splunk, QRadar, Elastic, Kafka, or cloud destinations like Google Pub/Sub, Stackdriver, or Azure Sentinel, we've got you covered.
Next up is the Syslog-ng Windows Event Collector, or WEC. This approach, on the other hand, is agentless: if you can't install anything in your Windows environment, that's fine, because nothing needs to be installed there for it to work. It's based on Windows Event Forwarding, so it's a proven Microsoft technology. It lets you define source-initiated push subscriptions and have the events forwarded over HTTPS using mutually authenticated TLS.
So here is how that works. On the Syslog-ng server, running alongside Syslog-ng itself, is the Syslog-ng WEC process. Subscriptions are configured here in a YAML file. Once Windows authenticates the Syslog-ng server, the WEC service lets Windows know which event containers it's interested in, and Windows can then begin sending those events. The WEC service receives these events and passes them to a Syslog-ng Windows Events source, which runs them through an XML parser to break out all the key-value pairs, like application, event ID, et cetera. From there, Syslog-ng can filter, process, and reformat the logs however you want before storing them or sending them on to a SIEM or other log consumer.
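To illustrate what breaking out key-value pairs from the event XML involves, here is a minimal Python sketch. It is not Syslog-ng's XML parser; it simply walks the standard Windows event XML schema with `xml.etree.ElementTree`. The output key names (including the `data.` prefix for `EventData` fields) and the sample event are illustrative choices of ours, not the names Syslog-ng actually produces.

```python
import xml.etree.ElementTree as ET

# Namespace of the standard Windows event XML schema.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def flatten_event(xml_text: str) -> dict:
    """Flatten one rendered Windows event into key-value pairs."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    fields = {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.findtext("e:EventID", namespaces=NS)),
        "level": int(system.findtext("e:Level", namespaces=NS)),
        "record_id": int(system.findtext("e:EventRecordID", namespaces=NS)),
        "channel": system.findtext("e:Channel", namespaces=NS),
        "computer": system.findtext("e:Computer", namespaces=NS),
    }
    # Named <Data> elements under <EventData> become additional pairs.
    event_data = root.find("e:EventData", NS)
    if event_data is not None:
        for data in event_data.findall("e:Data", NS):
            name = data.get("Name")
            if name:
                fields["data." + name] = data.text or ""
    return fields


# Trimmed-down, made-up sample event (real events carry more System fields).
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing"/>
    <EventID>4624</EventID>
    <Level>0</Level>
    <EventRecordID>48213</EventRecordID>
    <Channel>Security</Channel>
    <Computer>ws01.example.com</Computer>
  </System>
  <EventData>
    <Data Name="TargetUserName">alice</Data>
    <Data Name="LogonType">3</Data>
  </EventData>
</Event>"""

print(flatten_event(SAMPLE))
```

Once the event is a flat set of fields like this, filtering on something like `event_id` or `channel` becomes a simple comparison, which is what makes the downstream filter and routing steps straightforward.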
So just to be clear, the Syslog-ng Windows Event Collector and the Syslog-ng server are two processes running side by side on the Syslog-ng server itself. They aren't two separate servers.
To scale either of these methods up, whether agent-based or agentless, you can use Syslog-ng relays. Relays are full Syslog-ng instances whose purpose is to aggregate logs, pre-filter and pre-process them if necessary, and then pass them on to a Syslog-ng server. It's the same software; the difference is that a relay is intended as a pass-through, not a destination in itself, and it always works together with a server instance. So you could have relays acting as local ingest points for a few thousand endpoints each, and then ensure secure and reliable communication from those relays to a central server.
Let me pause the slides for a minute and switch to my terminal. What I'm going to show you is some of the internals of the Windows Event Collector, essentially what it looks like. This here is the YAML file that defines the subscription. Windows will have a list of one or more collectors that it will contact and provide logs to through WinRM, according to the subscriptions defined in this YAML file.
You can see here the level of events that we want: informational and above. We're subscribed to the Application, Security, Setup, System, and ForwardedEvents containers. The rest of the settings here are just connection parameters. We're not doing any event ID filtering in this particular case, but that is possible in this YAML file.
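Windows Event Forwarding subscriptions ultimately describe what to collect as an XPath-based QueryList. As a hedged sketch of what "informational and above" plus an optional event ID filter amount to, the following Python builds such a query for a list of channels. The function `build_query` is hypothetical, and we are not claiming this is how the Syslog-ng WEC YAML file is translated internally.

```python
from xml.sax.saxutils import escape, quoteattr

# Windows event levels: 0 = LogAlways, 1 = Critical, 2 = Error,
# 3 = Warning, 4 = Information, 5 = Verbose.
INFORMATIONAL_AND_ABOVE = [0, 1, 2, 3, 4]


def build_query(channels, levels, event_ids=None):
    """Build a QueryList selecting the given levels (and optional event IDs)."""
    conditions = ["(" + " or ".join(f"Level={lvl}" for lvl in levels) + ")"]
    if event_ids:
        conditions.append(
            "(" + " or ".join(f"EventID={eid}" for eid in event_ids) + ")"
        )
    xpath = "*[System[" + " and ".join(conditions) + "]]"
    # One <Select> per subscribed channel, all sharing the same filter.
    selects = "\n".join(
        f"    <Select Path={quoteattr(ch)}>{escape(xpath)}</Select>"
        for ch in channels
    )
    return '<QueryList>\n  <Query Id="0">\n' + selects + "\n  </Query>\n</QueryList>"


channels = ["Application", "Security", "Setup", "System", "ForwardedEvents"]
print(build_query(channels, INFORMATIONAL_AND_ABOVE))
```

Passing `event_ids=[4624, 4625]`, for example, would AND an event ID condition onto the level filter for every channel.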
Down here, what we're looking at are the state files that WEC generates. An entry is created in this folder for every client, one separate file per client.
And down at the bottom, if we look at the actual contents of one of these files, we can see that for this particular client there is a list of bookmarks. These bookmarks record how far we've read in each of the event channels we're subscribed to: we last left off in the Security channel at this record number and in the Setup container at this record number, and at the end it shows which channel we last received from. So, we're currently reading from the System event channel.
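Assuming the per-client state holds a standard Windows `BookmarkList` XML document (one `Bookmark` element per channel, with `IsCurrent` marking the channel read most recently), reading it back could look like this Python sketch. The sample state and record IDs are made up, and the actual on-disk format in the released feature may differ.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def parse_bookmarks(xml_text: str) -> Tuple[Dict[str, int], Optional[str]]:
    """Return ({channel: last record ID}, channel currently being read)."""
    root = ET.fromstring(xml_text)
    positions: Dict[str, int] = {}
    current: Optional[str] = None
    for bm in root.iter("Bookmark"):
        channel = bm.get("Channel")
        positions[channel] = int(bm.get("RecordId"))
        # IsCurrent marks the channel we last received events from.
        if bm.get("IsCurrent", "").lower() == "true":
            current = channel
    return positions, current


# Hypothetical state for one client; record IDs are illustrative only.
STATE = """<BookmarkList>
  <Bookmark Channel="Security" RecordId="48213"/>
  <Bookmark Channel="Setup" RecordId="97"/>
  <Bookmark Channel="System" RecordId="15530" IsCurrent="true"/>
</BookmarkList>"""

positions, current = parse_bookmarks(STATE)
print(positions, current)
```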
So this all works fantastically, but there's one problem, and I'll ask you if you can spot it. How would you add redundancy to this architecture? Knowing now how Windows Event Forwarding and the collector work, could you simply put multiple collector instances behind a load balancer? Unfortunately, the answer is no. It's not quite that simple.
As we've seen, the WEC service necessarily maintains a state file. If each node stored its state file on its own back end and maintained it separately, the nodes would drift out of sync. So if a client is sending event logs to one server, that server dies, and there's a failover, the bookmarks on the surviving node are all but guaranteed to be stale and out of date. And if the container is rotated on the client side in the meantime, they'll be even further out of sync. The result is that you could get missed messages, or duplicate messages once the new node starts reading from the wrong point in the event log. Either way, it's not a reliable solution.
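The duplicate-message case is easy to see with concrete numbers. This Python sketch is a toy model, not WEC code: node A forwards records 1 through 150 and then fails, while node B still holds an older bookmark of 100 from when it last served this client. The `simulate` function and its parameters are hypothetical.

```python
from collections import Counter


def simulate(total: int, fail_at: int, stale_bookmark: int, shared: bool) -> Counter:
    """Count how many times each record ID is forwarded across a failover.

    When shared is True, stale_bookmark is ignored: node B reads the same
    bookmark entry that node A has been writing.
    """
    node_a = {"bookmark": 0}
    node_b = node_a if shared else {"bookmark": stale_bookmark}
    forwarded = Counter()
    # Node A forwards and checkpoints records until it fails.
    for rec in range(node_a["bookmark"] + 1, fail_at + 1):
        forwarded[rec] += 1
        node_a["bookmark"] = rec
    # The client reconnects to node B, which resumes after its own bookmark.
    for rec in range(node_b["bookmark"] + 1, total + 1):
        forwarded[rec] += 1
        node_b["bookmark"] = rec
    return forwarded


separate = simulate(total=200, fail_at=150, stale_bookmark=100, shared=False)
shared = simulate(total=200, fail_at=150, stale_bookmark=100, shared=True)
print(sum(1 for n in separate.values() if n > 1), "duplicated records with separate state")
print(sum(1 for n in shared.values() if n > 1), "duplicated records with shared state")
```

With separate state, records 101 to 150 are forwarded twice; with a single shared bookmark, every record is forwarded exactly once.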
What we ultimately need is something that can share state between multiple WEC servers and also write it atomically; we can't afford to let the back ends get out of sync. This is where our new Windows Event Collector Clustering enhancement comes in. To make this happen, we've come up with a Redis-based solution. So what is Redis? Essentially, Redis is a networked key-value database. It gives us a single location that any number of WEC instances can record bookmarks to, and it supports high-availability clustering, so that shared location doesn't become a single point of failure itself.
This way, the state information is shared and updated by all nodes, and it stays in sync. If one node goes down, the others keep running and keep updating the shared state. When that node comes back up, it reads the state, which is current and accurate, and it's able to pick up at the correct record.
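Sharing state only helps if updates are atomic. Here is a minimal Python sketch of the idea, with an in-process lock standing in for Redis. The class and method names are hypothetical, and the rule that a bookmark only ever moves forward is our assumption, not documented behavior of the Syslog-ng feature. With a real Redis deployment, the read-compare-write step would have to run atomically on the server side, for example as a Lua script or inside a WATCH/MULTI/EXEC transaction.

```python
import threading
from typing import Dict, Tuple


class SharedBookmarkStore:
    """In-memory stand-in for a shared key-value store such as Redis.

    Keys are (client, channel) pairs; values are the last forwarded record ID.
    """

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], int] = {}
        self._lock = threading.Lock()

    def get(self, client: str, channel: str) -> int:
        with self._lock:
            return self._data.get((client, channel), 0)

    def advance(self, client: str, channel: str, record_id: int) -> bool:
        """Atomically move a bookmark forward; refuse to move it backward."""
        with self._lock:
            if record_id <= self._data.get((client, channel), 0):
                return False
            self._data[(client, channel)] = record_id
            return True


def run_node(store: SharedBookmarkStore, records: range) -> None:
    # Each simulated WEC node checkpoints every record it forwards.
    for rec in records:
        store.advance("ws01", "Security", rec)


store = SharedBookmarkStore()
# Two simulated nodes race to update the same (hypothetical) client's key.
nodes = [
    threading.Thread(target=run_node, args=(store, range(2, 1001, 2))),
    threading.Thread(target=run_node, args=(store, range(1, 1000, 2))),
]
for node in nodes:
    node.start()
for node in nodes:
    node.join()
print(store.get("ws01", "Security"))
```

However the two threads interleave, the stored bookmark ends at the highest record either node forwarded, because a late write with a lower record ID is rejected rather than overwriting newer state.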
Before I go on, I just want to note that this feature is currently in the late stages of development and, as of the time of this video, will be released soon. So, as a disclaimer, what you'll see today may not be quite the final product. For example, the exact file structure and entries may be somewhat different in production, but what I'll show you is the basic functionality.
First of all, on this screen I'm showing that we have two IP addresses behind a load balancer, so the Windows servers send logs to a single IP, 133.46, but the connections are distributed round-robin across my two WEC nodes on 47 and 48. Just for navigation purposes, you can see the titles of the screens I'll be on all the way at the bottom of the window, so you can tell what we're looking at. In this case, I'm using ipvsadm as the load balancer, and we're looking at its status.
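The load balancer's job in this demo is simple: spread new connections round-robin across whichever WEC nodes are currently active. ipvsadm configures the kernel's IPVS to do this; the toy Python model below only illustrates the scheduling behavior, including what happens when a node is removed, and is not how IPVS is implemented. The backend names `wec1` and `wec2` are placeholders.

```python
from typing import List


class RoundRobinBalancer:
    """Toy model of a virtual IP rotating new connections across backends."""

    def __init__(self, backends: List[str]) -> None:
        self.backends = list(backends)
        self._next = 0

    def add(self, backend: str) -> None:
        if backend not in self.backends:
            self.backends.append(backend)

    def remove(self, backend: str) -> None:
        # A removed node simply stops receiving new connections.
        self.backends.remove(backend)
        self._next = 0

    def pick(self) -> str:
        if not self.backends:
            raise RuntimeError("no active backends")
        backend = self.backends[self._next % len(self.backends)]
        self._next = (self._next + 1) % len(self.backends)
        return backend


lb = RoundRobinBalancer(["wec1", "wec2"])
print([lb.pick() for _ in range(4)])
lb.remove("wec1")  # simulate the first WEC node going down
print([lb.pick() for _ in range(3)])
```

The clients only ever see the single virtual IP; which node actually serves a reconnecting client is the balancer's decision, which is exactly why the bookmarks have to live somewhere every node can reach.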
Switching over here, this is the first WEC service, on IP 47. Normally WEC runs in the background, but I'm running it in the foreground with debug messages on so that we can see its operations and the messages flowing in. You can see the messages coming in here, and you can see what WEC is actually doing.
Switching over to the second screen, here's the second WEC service, running on 48. In this case I only have one Windows server, and its connection is going to WEC 1. This one is idle because I don't have lots of Windows servers running.
Switching over to this screen, what I'm doing here is refreshing, every second, the current state entry written into Redis. This continually updates as messages come in, and it's read from and written to by both WEC nodes. As messages come in, each of these record IDs is incremented.
Switching to yet another screen, just to see what the final raw messages look like: here I'm tailing the messages on the first node. These are messages being written to a file as-is after being run through Syslog-ng. If you want, you can process them further with Syslog-ng: format them however you like, filter them, and so on. But these are just the raw messages, so we can see them coming in.
This screen is tailing messages on WEC 2. It's static because no new messages are currently coming in on this node; they're all going to the first one. So you can see both WEC nodes are running, connections are split across the two IPs via load balancing, and both nodes are able to process messages.
What I'm going to show you next as part of this demo is what happens when one node goes down. I'm going to switch to a blank terminal here and take out the first WEC node, the one messages are currently coming in on. This is just a script that takes down its IP, which kills the network connection to the first node and leaves only one remaining. If I switch over to the load balancer, you can see there's only one active IP in the group: where we used to have 47, it's no longer there. So the first WEC node is down.
Now we switch to [INAUDIBLE]. Now I'm tailing WEC 2, and you can see the messages are coming in on the second node. Windows has realized the connection was severed, and it's reconnecting, this time ending up on the second node. From Windows' point of view, it's still the same IP, because it's still going through the load balancer. And if I go into Redis, here's the state information again, and it's still being updated, this time by node 2. We can see these record IDs incrementing already. Whereas originally the entry was being written to by WEC 1, now it's being written to by WEC 2.
Next, I'm going to add WEC 1 back in, remove WEC 2, and then see if I can generate some more Windows messages. Going back to the load balancer, here are the IPs: now 48 is missing, so WEC 2 is gone, and things should switch back to WEC 1. So now let's go to WEC 1's messages and tail WEC 1.
Sometimes it takes a little while for Windows to retry the connection, and I don't really have control over that; Windows decides when to retry. So you might have to wait a bit before it tries again, but then we should start seeing... there we go. These are now event logs streaming into the original WEC 1.
So that is reliable load balancing for the Windows Event Collector, soon to be part of Syslog-ng. Again, this scales well beyond what you can do with Windows alone. This has been a longstanding problem, and this solution takes care of it. If you're interested or have questions, please reach out to us. Thanks for your time, and thanks for watching. Take care.