One of the most common questions I received in my analyst years of covering SIEM and other security monitoring technologies was “what data sources to integrate into my SIEM first?”
And of course the only honest answer to this question is: it depends on your security monitoring use cases and how you prioritize them.
Naturally, some people then ask “ok, so then what are my use cases?” (and then there are these challenges too). Finally, perhaps in this paper, we made a list of popular log sources aggregated from many organizations. Admittedly, the list may end up being useless for organizations with different security needs and challenges.
Joking aside, big organizations often make the decision to integrate a log source into their SIEM / UEBA based on factors other than the pure security necessity.
Overall, such factors may include:
- Necessity for detection
- Necessity for alert triage and incident response
- Necessity as context data for another log source
- Compliance requirements to collect and retain this log type
- Compliance requirements to monitor this data source and/or system
- Ease of integration of the log source
- Parser availability from the vendor
- Ability to actually transfer the log data to a SIEM
- Other planned log sources that compete for attention
- Data volume of the log source
And of course for users of those sad SIEM products that charge per gigabyte or EPS [oh… wait … this is still almost everybody! :-)], the cost of introducing a new data source into the platform may be one of the BIG deciding factors.
Be honest: will you include a data source that will eat up 10% of your overall SIEM license if you only plan to use it as context (valuable though it may be) for another data source, and you don’t plan to write any detection rules or other logic based on this telemetry? DHCP is my favorite example here: how many detections rely solely on DHCP logs? None, or very few at most.
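To make that question concrete, here is a minimal sketch of the check in Python. Everything in it is a hypothetical illustration: the `LogSource` fields, the 500 GB/day license, the 60 GB/day DHCP volume and the 10% threshold are assumptions made up for the example, not figures from any real deployment or vendor.

```python
from dataclasses import dataclass


@dataclass
class LogSource:
    name: str
    gb_per_day: float          # assumed average daily volume
    used_for_detection: bool   # False means the source is context-only


def license_share(source: LogSource, licensed_gb_per_day: float) -> float:
    """Return the fraction of the daily license this source would consume."""
    if licensed_gb_per_day <= 0:
        raise ValueError("licensed_gb_per_day must be positive")
    return source.gb_per_day / licensed_gb_per_day


def is_hard_sell(source: LogSource, licensed_gb_per_day: float,
                 threshold: float = 0.10) -> bool:
    """True for a context-only source that eats at least `threshold` of the license."""
    return (not source.used_for_detection
            and license_share(source, licensed_gb_per_day) >= threshold)


# Hypothetical numbers: a 500 GB/day license and 60 GB/day of DHCP logs.
dhcp = LogSource("dhcp", gb_per_day=60.0, used_for_detection=False)
print(f"{dhcp.name}: {license_share(dhcp, 500.0):.0%} of license, "
      f"hard sell: {is_hard_sell(dhcp, 500.0)}")
```

The point of the sketch is only that the decision ends up being driven by a volume ratio rather than by security value.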
As a result, my experience with SIEM deployments (going back to 2002, if you are curious) taught me that few people will include DNS or DHCP logs during their initial phases of SIEM roll-out. In fact, some will never include them in their SIEM! When asked why, those people explain that while they are convinced of the general utility of DNS logs, they do not see much value in each individual message that costs money to collect. And there are so many of those messages! Over the years, I’ve usually called them “sparse value logs”, where the value is in getting the bulk rather than in getting some particularly valuable messages like, say, Windows Security Event ID 1102 …
As a result, SIEM operators have doubts about paying for inclusion of this data into their SIEM. The same doubt occasionally appears even for firewall logs, netflow records and many other high volume sources. Thus, web proxy logs, netflow, DNS, DHCP historically ended up in few SIEMs. I recall a client story from a few years back where adding web proxy logs would have 3X’d the volume of log data flowing into a SIEM. That is, web proxy logs were twice the volume of all other logs they collected.
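The arithmetic behind that "3X" is simple. As a minimal sketch (the function name and the GB/day figures are hypothetical; the client's real volumes are not given here), a new source that is twice the size of everything already collected triples the total:

```python
def volume_multiplier(existing_gb_per_day: float, new_source_gb_per_day: float) -> float:
    """Return how many times larger total ingest becomes after adding a new source."""
    if existing_gb_per_day <= 0:
        raise ValueError("existing_gb_per_day must be positive")
    return (existing_gb_per_day + new_source_gb_per_day) / existing_gb_per_day


# Hypothetical volumes: 100 GB/day of existing logs, proxy logs at twice that.
existing = 100.0
proxy = 2 * existing
print(volume_multiplier(existing, proxy))  # (100 + 200) / 100 = 3.0
```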
Even more so, very few people will toss all EDR telemetry into a SIEM, and usually limit themselves to EDR alerts. Admittedly, sysmon records are becoming more popular, but perhaps more so in “free” Elastic vs paid SIEM (and this will still cost you in either hardware or public cloud costs — sometimes eye-watering cloud costs at that).
In fact, this gave rise to an architecture where one product is used for high-value logs while another product augments it by storing more voluminous logs. However, such architectures usually have no technical merit, and they bring complexity and fragmentation and thus fragility. They do work if there are good APIs in the products (such as to query one telemetry repository from another), but it is useful to remember that they do not offer advantages other than cost.
To summarize, in some perfect world I want to make log integration decisions based ONLY on the value of such logs to my security goals and, specifically, use cases. However, today’s “popular” licensing models make this very hard.
Let’s change something!
[cross-post from "Anton on Security"]