Biswajit Banerjee's Posts (29)


One more idea that has been bugging me for years is the idea of “detection as code.” Why is it bugging me and why should anybody else care?

First, is “detection as code” just a glamorous term for what you did when you loaded your Snort rules into CVS in, say, 1999? Well, not exactly.

What I mean by “detection as code” is a more systematic, flexible and comprehensive approach to threat detection that is somewhat inspired by software development (hence the “as code” tag), just as infrastructure as code (IaC) is not merely about treating your little shell scripts as real software, but about machine-readable definition files and descriptive models for infrastructure.

Why do we need this concept? This is a good question! Historically, from the days of the first IDS (1987) to the sad days of “IDS is dead” (2003) and then to today, detection got a bit of a bad reputation. We can debate this, to be sure, but most would probably agree that threat detection never “grew up” to be a systematic discipline, with productive automation and predictable (and predictably good!) results. In fact, some would say that “Your detections aren’t working.” And this is after ~35 years of trying …

Detection engineering is a set of practices and systems to deliver modern and effective threat detection. Done right, it can change security operations just as DevOps changed the stolid world of “IT management.” You basically want to devops (yes, I made it a word) your detection engineering. I think “detection as code” is a cool name for this shift!

As you see, this is not so much about treating detections as code, but about growing detection engineering to be a “real” practice, built on modern principles used elsewhere in IT (agile this, or DevOps whatever).

Now, to hunt for the true top-tier APTs, you probably need to be an artist, not merely a great security engineer (IMHO, the best threat hunting is both art and science, and frankly more art than science….). But even here, to enable “artistic” creativity in solving threat detection problems we need to make sure those solutions function on a predictable layer. Moreover, for many other detection pursuits, such as detecting ransomware early, we mostly need automated, systematic, repeatable, predictable and shareable approaches.


OK, how do we do “detection as code”? How would I describe the characteristics of this approach?

  • Detection content versioning so that you can truly understand what specific rule or model triggered an alert — even if that alert fired last July. This is even more important if you use a mix of real-time and historical detections.

  • Proper “QA” for detection content that covers both testing for broken alerts (such as those that never fire or those that won’t fire when the intended threat materializes, and of course those that fire where there is no threat) and testing for gaps in detection overall. “False positives” handling, naturally, gets thrown into this chute as well (a minimal sketch of one such check appears right after this list).

  • Content (code) reuse and modularity of detection content, as well as community sharing of content, just as it happens for real programming languages (I suspect this is what my esteemed colleague describes here). As a reminder, detection content does not equal just rules: it covers rules, signatures, analytics, algorithms, etc.

  • Cross-vendor content would be nice, after all we don’t really program in “vendor X python” or “big company C” (even though we used to), we just write in C or Python. In the detection realm, we have Sigma and YARA (and YARA-L too). We have ATT&CK too, but this is more about organizing content, not cross-vendor writing of the content today.

  • I also think that getting to cross-tool detection content would be great, wherever possible. For example, you can look for a hash in EDR data and also in NDR, and in logs as well; SIEM alone won’t do.
  • Metrics and improvement are also key; the above items will give you plenty of metrics (from coverage to failure rates), but it is up to you to structure this process so that you get better.

  • While you may not be looking at building a full CI/CD pipeline for detections to continuously build, refine, deploy and run detection logic in whatever product(s), I’ve met people who did just that. To me, these people really practice detection as code.

  • Finally, I don’t really think this means that your detections need to be expressed in a programming language (like Python here and here or Jupyter notebooks). What matters to me is the approach and thinking, not actual code (but we can have this debate later, if somebody insists)
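To make the “QA for detection content” bullet above a bit more concrete, here is a minimal sketch of what a versioned, CI-tested detection might look like: a Sigma-style rule kept in a git repo, with a couple of pytest-style sanity checks run on every change. The rule itself and the specific checks are illustrative assumptions on my part, not a standard.

```python
# Minimal sketch: treat a Sigma-style rule as versioned content and run basic
# "QA" checks in CI (e.g., via pytest). Field names follow the public Sigma
# convention, but the rule and the checks are illustrative, not a standard.
import yaml  # pip install pyyaml

RULE_YAML = """
title: Suspicious Encoded PowerShell Command
id: 3b1c6d2e-0000-0000-0000-000000000000   # placeholder id for illustration
status: experimental
logsource:
  product: windows
  category: process_creation
detection:
  selection:
    Image|endswith: '\\powershell.exe'
    CommandLine|contains: '-EncodedCommand'
  condition: selection
falsepositives:
  - Legitimate admin scripts using encoded commands
level: medium
"""

REQUIRED_FIELDS = ["title", "id", "status", "logsource", "detection",
                   "falsepositives", "level"]

def test_rule_has_required_fields():
    rule = yaml.safe_load(RULE_YAML)
    missing = [f for f in REQUIRED_FIELDS if f not in rule]
    assert not missing, f"rule is missing fields: {missing}"

def test_condition_references_defined_selections():
    detection = yaml.safe_load(RULE_YAML)["detection"]
    condition = detection["condition"]
    selections = [k for k in detection if k != "condition"]
    assert any(name in condition for name in selections), "condition references nothing"

if __name__ == "__main__":
    test_rule_has_required_fields()
    test_condition_references_defined_selections()
    print("basic detection-content QA checks passed")
```

The point is not these particular checks; it is that the rule lives in version control and nothing ships until some automated gate has looked at it.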

Anything else I missed?


For our recent SANS paper / webcast that mentioned this topic, we crafted this example visual:

 

 Source: recent SANS paper.



Finally, let’s cattle-prod the elephant in the room: what about the crowd that just does not want anything “as code”? They also don’t like to create their own detections at all. In fact, they like their detections as easy as pushing an ON button or downloading a detection pack from a vendor. This is fine.

Personally, I’ve met enough security people who run away screaming from any technology that is “too flexible”, “very configurable” and even “programmable” (or: “… as code”) because their past experience indicates that this just means failure (at their organization). However, to detect, you need both a tool and content. Hence, both will have to come from somewhere: you can build, buy, rent, but you must pick.

Now, upon reading this, some of you may say “duh … what is not painfully obvious about it?” but I can assure you most people in the security industry do NOT think like that. In fact, such thinking is alien to most, in my experience. Maybe they think detection is a product feature. Or perhaps they think that detection is some magical “threat” content that comes from “the cloud.”

Hence, “detection as code” is not really an approach change for them, but a more philosophical upheaval. Still, I foresee that threat detection will always be a healthy mix of both an engineering and a creative pursuit….

Thoughts?


P.S. Thanks to Brandon Levene for hugely useful contributions to this thinking!

 

- By Anton Chuvakin (Ex-Gartner VP Research; Head Security Google Cloud)

Original link of post is here


We all know David Bianco’s Pyramid of Pain, a classic from 2013. The focus of this famous visual is on indicators that you “latch onto” in your detection activities. This post will reveal a related mystery connected to SIEM detection evolution and its current state. So, yeah, this is another way of saying that a very small number of people are perhaps very passionate about it …

But who am I kidding? I plan to present a dangerously long rant about the state of detection content today. So, yes, of course there will be jokes, but ultimately this is a serious thing that has been profoundly bothering me lately.

First, let’s travel to 1999 for a brief minute. Host IDS is very much a thing (but the phrase “something is a thing” has not yet been born), and the term “SIEM” is barely a twinkle in a Gartner analyst’s eye. However, some vendors are starting to develop and sell “SIM” and “SEM” appliances (It is 1999! Appliances are HOT!).

Some of the first soon-to-be-called-SIEM tools have very basic “correlation” rules (really, just aggregation and counting of a single attribute like username or source IP), with logic like “many connections to the same port across many destinations”, “Cisco PIX log message containing SYNflood, repeated 50 times” and “SSH login failure.” Most of these rules are very fragile, i.e. a tiny deviation in attacker activity will cause them to not trigger. They are also very device-dependent (i.e. you need to write such rules for every firewall device, for example). So the SIM / SEM vendors had to load up many hundreds of these rules. And customers had to suffer through enabling/disabling and tuning them. Yuck!

While we are still in 1999, a host IDS like, say, Dragon Squire, a true wonder of 1990s security technology, scoured logs for things like “FTP:NESSUS-PROBE” and “FTP:USER-NULL-REFUSED.” For this post, I reached deep into my log archives and actually reviewed some ancient (2002) Dragon HIDS logs to refresh my memory, and got into the vibe of that period (no, I didn’t do it on a Blackberry or using Crystal Reports — I am not that dedicated).

Now fast forward to about 2003–2004 — and the revolution happened! SIEM products unleashed normalized events and event taxonomies. I spent some of that time categorizing device event IDs (where does Windows Event ID 1102 go?) into SIEM taxonomy event types, and then writing detection rules on them. SIEM detection content writing became substantially more fun!

This huge advance in SIEM gave us the famous correlation rules like “Several Events of The Exploit Category Followed By an Event of Remote Access Category to Same Destination” that delivered generic detection logic across devices. Life was becoming great! These rules were supposed to be a lot more resilient (such as “any Exploit” and “any Remote Access” vs a specific attack and, say, VNC access). They also worked across devices — write it once, was the promise, and then even if you change the type of the firewall you use, your correlation still detects badness.
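To make the contrast concrete, here is a small sketch (my own illustration, not any vendor’s actual engine) of what taxonomy-based correlation looks like: device-specific event IDs are first normalized into generic categories, and the correlation rule is written once against those categories. The mapping table and the threshold are invented for the example.

```python
# Illustrative sketch of taxonomy-based correlation: raw, device-specific
# events are normalized into generic categories, and the rule "Exploit events
# followed by a Remote Access event to the same destination" is written once
# against those categories. The mapping and threshold are made up.
from collections import defaultdict

TAXONOMY = {
    ("snort", "1:2001219"): "Exploit",          # hypothetical mapping entries;
    ("cisco_pix", "106023"): "Firewall Deny",   # real taxonomies held thousands
    ("windows", "4624"): "Remote Access",       # successful logon
    ("windows", "1102"): "Audit Log Cleared",
}

def categorize(event):
    return TAXONOMY.get((event["device"], event["event_id"]), "Other")

def correlate(events, exploit_threshold=3):
    """Fire when >= exploit_threshold Exploit events are followed by a
    Remote Access event to the same destination (events assumed time-ordered)."""
    exploits_by_dest = defaultdict(int)
    alerts = []
    for event in events:
        category = categorize(event)
        dest = event["dest_ip"]
        if category == "Exploit":
            exploits_by_dest[dest] += 1
        elif category == "Remote Access" and exploits_by_dest[dest] >= exploit_threshold:
            alerts.append(f"Possible compromise of {dest}: "
                          f"{exploits_by_dest[dest]} exploit events, then remote access")
    return alerts

events = [
    {"device": "snort", "event_id": "1:2001219", "dest_ip": "10.0.0.5"},
    {"device": "snort", "event_id": "1:2001219", "dest_ip": "10.0.0.5"},
    {"device": "snort", "event_id": "1:2001219", "dest_ip": "10.0.0.5"},
    {"device": "windows", "event_id": "4624", "dest_ip": "10.0.0.5"},
]
print(correlate(events))
```

Swap the firewall or the IDS and, as long as the mapping table keeps up, the rule itself never changes — that was the promise.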

Wow, magic! Now you can live (presumably) with dozens of good rules without digging deep into regexes and substrings and device event IDs across 70 system and OS version types deployed. This was (then) perceived as essential progress of security products, like perhaps a horse-and-buggy to a car evolution.

Further, some of us became very hopeful that our Common Event Expression (CEE) initiative would take off. So, we worked hard to make a global log taxonomy and schema real and useful (circa 2005).

But you won’t believe what happened next!

Now, let’s fast forward to today — 2020 is almost here. Most of the detection content I see today is in fact written in the 1990s style of exact and narrow matching to raw logs. Look at all the sexy Sigma content, will you? A fellow Network Intelligence enVision SIM user from 1998 would recognize many of the detections! Sure, we have ATT&CK today, but it is about solving a different problem.

An extra bizarre angle here is that as machine learning and analytics rise, the need for clean, structured data rises too, if we are to crack more security use cases, not just detection. Instead, we just get more data overall, but less data that you can feed your pet ML unicorn with. We need more clean, enriched data, not merely more data!

To me, this feels like the evolution got us from a horse and buggy to a car, then a better car, then a modern car — and then again a horse and buggy ...

So, my question is WHY? What happened?

I’ve been polling a lot of my industry peers about it, ranging from old ArcSight hands that did correlation magic 15 years ago (and who can take a good joke about kurtosis) to people who run detection teams today on modern tools. [I am happy to provide shout-outs, please ping me if I missed somebody, because I very likely did due to some of you saying that you want to NOT be mentioned]

But first, before we get to the answer I finally arrived at, after much agonizing, let’s review some of the things I’ve heard during my data gathering efforts:

  • Products that either lack event normalization or do it poorly (or lazily rely on clients to do this work) won the market battle for unrelated reasons (such as overall volume of data collected), and a new generation of SOC analysts have never seen anything else. So they get by with what they have. Let’s call this line of reasoning “the raw search won.”

  • Threat hunters beat up the traditional detection guys because “hunting is cool” and chucked them out of the window. Now, they try to detect the same way they hunt — by searching for random bits of knowledge of the attack they’ve heard of. Let’s call this line of thinking “the hunters won.”

  • Another thought was that tolerance for “false positives” (FP) has decreased (due to growing talent shortages) and so writing more narrow detections with lower FP rates became more popular (‘“false negatives” be damned — we can just write more rules to cover them’). These narrow rules are also easier to test. Let’s call this “false positives won.”

  • Another hypothesis was related to the greater diversity of modern threats and also a greater variety of data being collected. This supposedly left the normalized and taxonomized events behind since we needed to detect more things of more types. Let’s call this one “the data/threat diversity won.”

So, what do you think? Are you seeing the same in your detection work?

Now, to me all the above explanations left something to be desired — so I kept digging and agonizing. Frankly, they sort of make some sense, but my experience and intuition suggested that the magic was still missing…

What do I think really happened? I did arrive at a very sad thought, the one I was definitely in denial about, but the one that ultimately “clicked” and many puzzle pieces slid into place!

The normalized and taxonomized approach in SIEM never actually worked! It didn’t work back in 2003 when it was invented, and it didn’t work in any year since then. And it still does not work now. It probably cannot work in today’s world unless some things change in a big way.

When I realized this, I cried a bit. Given how much I invested in building, evangelizing and improving it, then actually trying to globally standardize it (via CEE), it feels kinda sad…


Now, is this really true? Sadly, I think so! SIEM event taxonomization is …

  • always behind the times, and more behind now than ever
  • inconsistent across events and log sources — for every vendor today
  • seriously different between vendors — and hence cannot be learned once
  • riddled with an ever-increasing number of errors and omissions that accumulate over time
  • impossible to test effectively vs the real threats people face today.


So, I cannot even say “SIEM event taxonomy is dead”, because it seems like it was never really alive. For example, “Authentication Failure” event category from a SIEM vendor may miss events from a new version of software (such as a new event type introduced in a Windows update), miss events from an uncommon log source (SAP login failed), or miss events erroneously mapped to something else (say to “Other Authentication” category).
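A toy illustration of that failure mode, with made-up mappings: any event the table does not know about silently falls into a catch-all bucket that the “Authentication Failure” rule never looks at.

```python
# Toy illustration (made-up mappings): an incomplete taxonomy silently drops
# unmapped events into a catch-all category, so category-based rules miss them.
CATEGORY_MAP = {
    ("windows", "4625"): "Authentication Failure",
    ("linux_pam", "auth_fail"): "Authentication Failure",
    # SAP login failures, event IDs added in new Windows updates, etc.
    # were never added to the table...
}

def category_of(device, event_id):
    # anything not in the table falls through to a generic bucket that the
    # "Authentication Failure" rule never inspects
    return CATEGORY_MAP.get((device, event_id), "Other")

print(category_of("windows", "4625"))      # Authentication Failure -> rule fires
print(category_of("sap", "login_failed"))  # Other -> rule never fires
```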

In essence, people write stupid string-matching and regex-based content because they trust it. They do not — en masse — trust the event taxonomies when their lives and breach detections depend on it. And they do.

What can we do? Well, I am organizing my thinking about it, so wait for another post, will you?

 

- By Anton Chuvakin (Ex-Gartner VP Research; Head Security Google Cloud)

Original link of post is here


We had a community session on Evaluating AI Solutions in Cybersecurity: Understanding the "Real" vs. the "Hype" featuring Hilal Ahmad Lone, CISO at Razorpay & Manoj Kuruvanthody, CISO & DPO at Tredence Inc.

In this discussion, we covered key aspects of evaluating AI solutions beyond vendor claims and assessing an organization’s readiness for AI, considering data quality, infrastructure maturity, and how well AI can meet real-world cybersecurity demands. 

Key Highlights:

  • Distinguishing marketing hype from practical value: Focus on ways to assess AI solutions beyond vendor claims, including real-world impact, measurable results, and the AI’s role in solving specific cybersecurity challenges.

  • Evaluating AI maturity and readiness levels: Assessing whether an organization is ready for AI in its cybersecurity framework, especially regarding data quality, infrastructure readiness, and overall maturity to manage and scale AI tools effectively. This also includes gauging the AI model’s maturity level in handling complex, evolving threats.

  • AI Maturity and Readiness - Proven Tools vs. Experimental Models: Evaluate the readiness level of AI models themselves, where real maturity is marked by robust performance in varied cyber environments, while hype often surrounds models that are still experimental or reliant on ideal conditions. Organizational readiness, such as infrastructure and data integration, also plays a critical role in realizing real-world results versus theoretical benefits.


About Speaker

  • Hilal Ahmad Lone, CISO at Razorpay 
  • Manoj Kuruvanthody, CISO & DPO at Tredence Inc.

 

Executive Summary (Session Highlights):

  • Navigating AI Risk Management: Standards and Frameworks:
    This session explored the significance of adopting industry standards and frameworks like Google's SAIF (Secure AI Framework), ISO/IEC 42001:2023, and the NIST Cybersecurity Framework in ensuring responsible AI adoption. Experts emphasized the need for organizations to fine-tune these frameworks based on their unique risks and objectives.

  • Risk Assessments and Maturity Models for AI Systems:
    The conversation highlighted the necessity of performing thorough risk assessments tailored to AI environments. Maturity models, including red teaming and vulnerability assessments, were discussed as pivotal methods for evaluating the robustness of AI implementations. Emerging techniques such as jailbreaking LLMs and prompt injections were also examined for their role in testing AI vulnerabilities.

  • The Case for Chaos Engineering:
    Chaos engineering was underscored as a critical approach to stress-testing AI systems in real-world conditions. Experts advocated for implementing chaos testing in production environments to uncover hidden vulnerabilities and ensure resilience under unpredictable scenarios.

  • Quantum Computing and AI: A Transformational Combination:
    Participants discussed the profound security implications of quantum computing, particularly when paired with AI. While quantum technology poses immediate threats to existing cryptographic systems, its integration with AI accelerates both opportunities and risks. The session stressed the importance of preparing for the quantum era by adopting quantum-resistant cryptography and evolving defense strategies.

  • AI and Data Loss Prevention (DLP): Harmonizing Technologies:
    The discussion explored the coexistence of AI and DLP technologies, emphasizing the challenges of aligning AI-driven systems with non-AI DLP solutions. Fine-tuning and adaptability were identified as key enablers for integrating these technologies effectively without compromising data security.

  • Preparing for the Future of AI and Quantum Security:
    Concluding the session, experts advised organizations to focus on defense-in-depth strategies while preparing for quantum-resistant solutions. They stressed the importance of proactive learning, collaboration, and incremental adoption of advanced security measures to fortify defenses in an era shaped by AI and quantum innovations.

Many organizations are looking for trusted advisors, and this applies to our beloved domain of cyber/information security. If you look at LinkedIn, many consultants present themselves as trusted advisors to CISOs or their teams.


Untrusted Advisor by Dall-E via Copilot


This perhaps implies that nobody wants to hire an untrusted advisor. But if you think about it, modern LLM-powered chatbots and other GenAI applications are essentially untrusted advisors (RAG and fine-tuning notwithstanding).


Let’s think about the use cases where using an untrusted security advisor is quite effective and the risks are minimized.

To start, naturally intelligent humans remind us that any output of an LLM-powered application needs to be reviewed by a human with domain knowledge. While this advice has been spouted many times — with good reasons — unfortunately there are signs of people not paying attention. Here I will try to identify patterns and anti-patterns and some dependencies for success with untrusted advisors, in security and SOC specifically.

First, tasks involving ideation, creating ideas and refining them are very much a fit to the pattern. One of the inspirations for this blog was my eternal favorite read from years ago about LLMs “ChatGPT as muse, not oracle”. If you need a TLDR, you will see that an untrusted cybersecurity advisor can be used for the majority of muse use cases (give me ideas and inspiration! test my ideas!) and only for a limited number of oracle use cases (give me precise answers! tell me what to do!).

So let’s create new ideas. How would you approach securing something? What are some ideas for doing architecture in cases of X and Y constraints? What are some ideas for implementing controls given the infrastructure constraints? What are some of the ways to detect Z? All of these produce useful ideas that can be turned by experts into something great. Ultimately, they shorten time to value and they also create value.

A slightly more interesting use case is the Devil’s Advocate use case (this has been suggested by Gemini Brainstormer Gem during my ideation of this very post!). This implies testing ideas that humans come up with to identify limitations, problems, contradictions or other cases where these things may matter. I plan to do X with Y and this affects security, is this a good idea? What security will actually be reduced if I implement this new control? In what way is this new technology actually even more risky?

Making “what if” scenarios is another good one. After all, if the scenarios are incorrect, ill-fitting or risky, a human expert can reject them. No harm done! And if they’re useful, we again see shorter time to value (epic example of tabletops via GenAI)

Now think about all the testing use cases. Given the controls we have, how would you test X? This makes me think that perhaps GenAI will end up being more useful for the red team (or: red side of the purple team). The risks are low and the value is there.

Report drafting and data story-telling. By automating elements of data-centric story telling, GenAI can produce readable reports, freeing humans for more fun tasks. Furthermore, GenAI excels at identifying patterns. This enables the creation of compelling narratives that effectively communicate insights and risks. And, back to the untrusted advisor: it’s still essential to remember that experts should always review GenAI-generated content for accuracy and relevance (thanks for the reminder, Gemini!)
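If you want to operationalize the “muse, not oracle” idea, a small sketch like the one below may help: a few prompt templates for the low-risk use cases, deliberately free of environment-specific details. The template wording and the function name are mine, and whatever the LLM returns still goes to a human domain expert for review.

```python
# Sketch of "muse, not oracle" prompt templates for an untrusted security advisor.
# Templates and names are illustrative; LLM output remains untrusted and is
# reviewed by a human expert before anything is acted on.
MUSE_TEMPLATES = {
    "ideation": (
        "Suggest 10 distinct approaches for detecting {threat} in an environment "
        "with these constraints: {constraints}. Rank them by implementation effort."
    ),
    "devils_advocate": (
        "I plan to {plan}. Argue against it: list security weaknesses, hidden "
        "assumptions, and failure modes I may have missed."
    ),
    "what_if": (
        "Draft a tabletop scenario: {scenario}. Include attacker steps, expected "
        "telemetry, and decision points for the defenders."
    ),
}

def build_prompt(use_case: str, **kwargs) -> str:
    """Return a prompt for one of the low-risk 'muse' use cases.
    Deliberately contains no environment-specific or sensitive details."""
    return MUSE_TEMPLATES[use_case].format(**kwargs)

print(build_prompt("ideation",
                   threat="ransomware staging",
                   constraints="no EDR on legacy servers, SIEM only"))
```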


Summary — The Good:

  • Ideation and Brainstorming: LLMs excel at generating ideas for security architectures, controls, and approaches. They can help overcome mental blocks and accelerate the brainstorming process.

  • Devil’s Advocate: LLMs can challenge existing ideas, identify weaknesses, and highlight potential risks. This helps refine strategies and improve overall security posture.

  • “What-if” Scenarios: LLMs can create various scenarios to test the effectiveness of security controls and identify vulnerabilities.

  • Security Testing: LLMs can be valuable tools for testing, proposing simulated attacks and identifying weaknesses in defenses.

  • Report drafting: LLMs can help you write reports that make sense and flow well.


On the other hand, let’s talk about the anti-patterns. It goes without saying that if it leads to deployment of controls, automated reconfiguration of things, or remediation that is not reviewed by a human expert, that’s a “hard no”.

Admittedly, any task that requires sharing detailed knowledge of my environment is also on that “hard no” list (some bots leak, and leak a lot). I just don’t trust the untrusted advisor with my sensitive data. I also assume that some results will be inaccurate, but only a human domain expert will recognize when this is the case…

Summary — The Bad:

  • Direct Control: Allowing LLMs to directly deploy controls, reconfigure systems, or automate remediation without human review is a major risk.

  • Access to Sensitive Information: Avoid sharing detailed knowledge of your environment with an untrusted LLM (which is another way of saying “an LLM”).



Bridging the Trust Gap

The key to safely using LLM-powered “untrusted security advisor” for more use cases is to maintain a clear separation between their (untrusted) outputs and your (trusted) critical systems.


Forrester via Allie Mellen webinar https://www.forrester.com/technology/generative_ai_security_tools_webinar/


A human domain expert should always review and validate LLM-generated suggestions before implementation.
This choice is obvious, but it is also a choice that promises to be unpopular in some environments. What are the alternatives, if any?


Alternatives and Considerations

While relying on non-expert human review or smaller, grounded LLMs might seem appealing, they ultimately don’t solve the trust issue. Clueless human review does not fix AI mistakes. Another AI may fix AI mistakes, or it may not…

Perhaps a promising approach involves using a series of progressively smaller and more grounded LLMs to filter and refine the initial untrusted output. Who knows … we live in fun times!

Agent-style evaluation is another route (if an LLM wrote remediation code, I can run it in a test or simulated environment, and then decide what to do with it, perhaps automatically prompting the LLM to refine it until it works well).

But still: will you automatically act on it? No! So think real hard about the trust boundary between your “untrusted security advisor” and your environment! Perhaps we will eventually invent a semantic firewall for it?
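Here is one way to picture that trust boundary in code, as a sketch of my own rather than anyone’s product feature: whatever the advisor proposes gets parked, optionally exercised in a sandbox, and nothing touches production without an explicit human approval flag.

```python
# Sketch of a trust boundary between an untrusted advisor and production.
# All names are illustrative; run_in_sandbox() is a stub standing in for
# whatever test/simulation environment you actually have.
from dataclasses import dataclass, field

@dataclass
class Proposal:
    description: str
    remediation_script: str
    sandbox_passed: bool = False
    human_approved: bool = False
    notes: list = field(default_factory=list)

def run_in_sandbox(proposal: Proposal) -> None:
    # Stub: in reality, execute in an isolated test environment and record results.
    proposal.sandbox_passed = True
    proposal.notes.append("sandbox run ok (stubbed)")

def apply_to_production(proposal: Proposal) -> None:
    # The only gate that matters: no human approval, no action.
    if not (proposal.sandbox_passed and proposal.human_approved):
        raise PermissionError("untrusted advisor output blocked at trust boundary")
    print(f"Applying (after human review): {proposal.description}")

p = Proposal(description="Tighten egress rule on test VPC",
             remediation_script="# ...LLM-generated change, details omitted...")
run_in_sandbox(p)
# apply_to_production(p)   # would raise: still untrusted, no human sign-off
p.human_approved = True    # set only by a human reviewer in a real workflow
apply_to_production(p)
```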

Conclusion

LLMs can be powerful tools for security teams, but they must be used responsibly given lack of trust. By focusing on appropriate use cases and maintaining human oversight, organizations can leverage the benefits of LLMs while mitigating the risks.

Specifically, LLMs can be valuable “untrusted advisors” for cybersecurity, but only when used responsibly. Ideation, testing, and red teaming are excellent applications. However, direct control, access to sensitive data, and unsupervised deployment are off-limits. Human expertise remains essential for validating LLM outputs and ensuring safe integration with critical systems.

  • LLMs can be valuable “untrusted advisors” for ideation and testing in cybersecurity.

  • Human experts should always review and validate LLM output before implementation.

  • LLMs should not (yet?) be used for tasks requiring high trust or detailed environmental knowledge.

  • Striking the right balance between human expertise and AI assistance is crucial.


Thanks Gemini, Editor Gem, Brainstormer Gem and NotebookLM! :-)



- By Anton Chuvakin (Ex-Gartner VP Research; Head Security Google Cloud)

Original link of post is here


Mention “alert fatigue” to a SOC analyst. They would immediately recognize what you are talking about. Now, take your time machine to 2002. Find a SOC analyst (far fewer of those around, to be sure, but there are some!) and ask them about alert fatigue — they would definitely understand what the concern is.

Now, crank up your time machine all the way to 11 and fly to the 1970s where you can talk to some of the original NOC analysts. Say the words “alert fatigue” and it is very likely you will see nods and agreement about this topic.

So the most interesting part is that this problem has immense staying power, while the cyber security industry changes quickly. Yet it would be familiar to people doing a similar job 25 years apart. Are we doomed to suffer from this forever?

I think it is a bit mysterious and worth investigating. Join me as we uncover the dark secrets behind this enduring pain.


Why Do We Still Suffer?

First, let’s talk about people’s least favorite question: WHY.

An easy answer I get from many industry colleagues is that we could have easily solved the problem at 2002 levels of data volumes, environment complexity and threat activity. We had all the tools, we just needed to apply them diligently. Unfortunately, more threats, more data, more environments came in. So we have alert fatigue in 2024.

Personally, I think this is a part of why this has been tricky, but I don’t think that’s the entire answer. Frankly, I don’t recall any year during which this problem was considered close to being solved, pay no heed to shrill vendor marketing. The early SIM/SEM vendors in the late 1990s (!) promised to solve the alert fatigue problem. At the time, these were alerts from firewalls and IDS systems. The problem was not solved with the tools at the time, and then again not solved with better tools, better scoring. I suspect that throwing the best 2024 tools at the 2002 levels of alerts will in fact solve it, but this is just a theoretical exercise…

False positive (FP) rates increased? I frankly don’t know and don’t have a gut feel here. In theory they should have decreased over the last 25 years, if we believe that security technology is improving. Let me know if anybody has any data on this, but any such data set would include a lot of apples/oranges (1998 NIDS vs 2014 EDR vs 2024 ADR anybody?)

Some human (Or was it a bot? Definitely a bot!) suggested that our fear of missing attacks is driving false positives (FP) up. Perhaps this is also a factor adding to high FP rates. If you have a choice of killing 90% of FPs by increasing FNs by 10%, would you take it? After all, merely 1 new FN (aka real intrusion not detected) may mean that you are owned…

Manual processes persisting at many SOCs mean that even a 2002 volume of alerts would have overrun them, but they hired and covered the gap. Then alert volumes increased with IT environment (and threat) growth, and they were not able to hire (or transform to a better, automation-centric model).

More tools that are poorly integrated probably contributed to the problem not being solved. IDS was the sole problem child of the late 1990s. Later, this expanded and evolved to EDR, NDR, CDR, and other *DR as well as lots of diverse data types flowing into the SIEMs.

All in all, I am not sure there is one factor that explains why “alert fatigue” has been a thing for 25+ years. We are where we are.

Where are we exactly?


Some [Bad] Data

With the help of a new Gem-based agent, I collected a lot of data on alert fatigue, and let me tell you…. based on the data, it is easy to see why we struggle. A lot of “data” is befuddling, conflicting and useless. Examples (mix of good and bad, your goal is to separate the two):

“70% of SOC teams are emotionally overwhelmed by the volume of security alerts.” (source)

“43% of SOC teams occasionally or frequently turn off alerts, 43% walk away from their computer, 50% hope another team member will step in, and 40% ignore alerts entirely.” (source)

“55% of security teams report missing critical alerts, often on a daily or weekly basis, due to the overwhelming volume.” (source)

“A survey found that SOC teams deal with an average of 3,832 alerts per day, with 62% of those alerts being ignored.” (source)

“56% of large companies (with over 10,000 employees) receive 1,000 or more alerts per day.” (source)

“78% of cybersecurity professionals state that, on average, it takes 10+ minutes to investigate each alert.” (source)

“Security analysts are unable to deal with 67% of the daily alerts received, with 83% reporting that alerts are false positives and not worth their time.” (source)

In brief, the teams are flooded with alerts, leading to burnout and pain. While the exact figures vary across studies (like, REALLY, vary!), a pattern emerges: teams are overwhelmed by the volume of alerts, and often a majority of them are false. The data barely teaches us anything else…


What Have We Tried?

The problem persists, but the awareness of this problem is as old as the problem (see the hypothetical 2002 SOC analyst conversation above). In this section, let’s go quickly through all the methods we’ve tried, largely unsuccessfully.

First, we tried aggregation. Five (50? 500? 5000? 5 gazillion?) alerts of type such-and-such get duct-taped together and shipped off to pester a human. That clearly did not solve the problem. Don’t get me wrong, aggregation helps. But clearly this 1980s trick has not fixed alert fatigue.

Then we tried correlation, where we try to logically relate and group alerts and assign priority to the “correlated event” (ah, so 2002!) and then give them to an analyst. Nah, didn’t do it.

We also tried filtering, both on what goes into the system that produces alerts (input filtering: just collect less telemetry) and on the alerts themselves (output filtering: just suppress these alerts).

We obviously tried tuning, i.e. carefully turning off alerts for cases where such an alert is false or unnecessary. This has evolved to be one of the least popular pieces of advice in security ops (“just tune the detections” is up there with “just patch faster” and “just zero trust it”).

We tried — and are trying — many types of enrichment where the alerts are deemed to be extra fatigue-inducing because context was missing. So various automation was used to add things to alerts. IP became system name, became asset role/owner, past history was added and a lot of other things (hi 2006 SIEM Vendor A). Naturally, enrichment on its own does not solve anything, but it reduces fatigue by letting machines do more of the work.
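For illustration, a tiny sketch of that enrichment step; the lookup tables below are stand-ins for the CMDB, asset inventory and case history that a SIEM or SOAR platform would actually query.

```python
# Sketch of alert enrichment: lookups are stand-ins for a CMDB / asset
# inventory / case history that a SIEM or SOAR platform would query.
ASSET_DB = {
    "10.0.4.17": {"hostname": "fin-db-01", "owner": "finance-it",
                  "role": "database", "crown_jewel": True},
}
CASE_HISTORY = {
    "fin-db-01": ["2024-08: similar alert, confirmed false positive after tuning"],
}

def enrich(alert: dict) -> dict:
    asset = ASSET_DB.get(alert["src_ip"], {})
    enriched = dict(alert)
    enriched.update({
        "hostname": asset.get("hostname", "unknown"),
        "asset_owner": asset.get("owner", "unknown"),
        "asset_role": asset.get("role", "unknown"),
        "crown_jewel": asset.get("crown_jewel", False),
        "past_cases": CASE_HISTORY.get(asset.get("hostname"), []),
    })
    return enriched

raw_alert = {"rule": "After-hours admin login", "src_ip": "10.0.4.17", "severity": "medium"}
print(enrich(raw_alert))
```

The enriched alert is also far easier for downstream automation to act on, which is the real payoff.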

We tried many, many types of so-called risk prioritization of alerts. Using ever-more-creative algorithms from the naively idiotic threat x vulnerability x asset value to more elaborate ML-based scoring systems. It sort of helped, but also hurt when people focused on top 5 alerts from the 500 they needed to handle. Ooops! Alert #112 was “you are so owned!” Prioritization alone is not a solution to alert fatigue.

Then there was a period of time when beautiful, hand-crafted, artisanal SOAR playbooks were the promised way to solve alert fatigue.

Meanwhile, some organizations thought that the SIEM system itself was the problem and that they needed to focus on narrow detection systems such as EDR, where alerts are supposedly easier to triage. Initially, there was some promise … and now you can see more and more people who complain about EDR alert fatigue. So narrow-focus tools also weren’t the answer. BTW, as EDR evolved to XDR (whatever that is) this solution “unsolved” itself (hi again, SIEM).

Today, as I’m writing this in 2024, many organizations naively assume that AI would fix it any day now. I bet some 2014 UEBA vendor already promised this 10 years ago…

So:

  1. Aggregation

  2. Correlation

  3. Filtering (input [logs] and output [alerts])

  4. Alert source tuning

  5. Context enrichment

  6. “Risk-based” and other prioritization

  7. SOAR playbooks

  8. Narrow detection tools (SIEM -> EDR)

  9. AI…

Good try, buddy! How do we really solve it?


Slicing the Problem

Now, enough with the whining and on to something useful. I want to start by suggesting that alert fatigue is not one problem. Over the years, I’ve seen several distinct cases of alert fatigue.

To drastically oversimplify:

You may have alert fatigue because a high ratio of your alerts are either false positives (or: other false alerts), or they indicate activities that you simply don’t care to see. In other words, bad alerts type A1 (false) and bad alerts type A2 (benign / informational / compliance).


A1. FALSE ALERTS

A2. BENIGN ALERTS

You also have alert fatigue when your alerts are not false, but a high ratio of them are particularly fatigue-inducing and hard to triage (it’s not the volume, but the poor information quality of the alert that kills; also bad UX, or, as Allie says, AX). In other words, bad alerts, type B (high fatigue).

NEW: this also applies to malicious (i.e. not benign and not false) alerts where the risk is accepted by the organization (“yes, this student machine always gets malware, no action” kinda thing)


B. HARD TO TRIAGE ALERTS

Finally, there’s the scenario where you have perfectly crafted alerts indicating malicious activities, but your team just isn’t sufficient for the environment you have. In other words, good alerts, but just too many.


C. SIMPLY TOO MANY ALERTS

Naturally, in real life we will have all problems blended together: high ratio of bad alerts AND high overall volume of alerts AND false alerts being hard to triage, leading to (duh!) more high fatigue.

Frankly, “false alerts x hard to triage alerts x lots of them = endless pain.” If you are taking 2 hours to tell that the alert is a false positive, I have some bad news for you: this whole SOC thing will never work…

 


Alert fatigue dimensions (Anton, 2024)

Anything I missed in my coarse-grained diagnosis?


What Can We Do

Now, I don’t promise to solve the alert fatigue problem with one blog, even a long one. But I do propose a framework for diagnosing the problem that we face and for trying to sort the solutions into more promising and less promising for your situation.

For example, if you are specifically flooded with false positive alerts (e.g. high severity alert that triggers on an unrelated benign activity), unfortunately the answer is the one you won’t like: you do need to tune. Aggregation, correlation, etc are not the answer; “fix the bug in your detection code” is. If some alerts are false in bulk, these just should not be produced. If you rely on vendor alerts and your vendor alerts suck, change your vendor. Perhaps in the future some AI will tune your detection content based on the alerts for you, but today, sorry buddy, you are doing it…

So the answer here is not to use excessively more complicated SOAR playbooks. It is about actually making sure that alerts with high false positive ratios are not produced.

Huh? You think, Anton? Yup, in the case of proper false positives, “fix the detection code” really is the answer (or otherwise tune by limiting which systems are covered by the detection, this of course has tradeoffs…). I cringe a bit since I feel that I am dispensing 2001-style advice here (“tune your NIDS!”) but it does not change the fact that it is the right thing to do. BTW, most clients are just not brutal enough with their vendors in this regard…

What about the alerts that are just not useful, but also not false? In this case, the main solution avenue is enrichment. That is, after you take a good look at those that serve no purpose whatsoever (not even informational) and turn those off, you add enrichment dimensions so that the remaining alerts become more useful and easier to triage.

For example, logging in after hours may not be bad over the entire environment (a classic useless alert 1996–2024), but may be great for a subset (or perhaps one system, like a canary or a honeypot, actually). Enriched alerts are dramatically easier to process via automation (so a SIEM/SOAR tool may do both for you).

Another scenario involves alerts that, while valid, are exceptionally difficult, painful to triage. This is where again enrichment combined with SOAR is the right answer. I remember a story where a SOC analyst had to open tickets with 3 different IT teams to get the missing context and then conclude (after 2 days — DAYS!) that the alert was indeed an FP.

Another situation is that alerts are hard to triage and cause fatigue because they just go to the wrong people. All the modern federated alerting frameworks, where alerts flow down the pipelines to the correct people, seek to fix this, but somehow few SOC teams have discovered the approach (we make heavy use of it in ASO, of course). For (a very basic) example, routing DLP alerts to data owners instead of the SOC can be more efficient, but this requires careful consideration and planning (not diving into this flooded rathole at this time…)
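A rough sketch of the routing idea follows; the mapping of alert types to owning teams is invented for illustration.

```python
# Rough sketch of federated alert routing: alert types map to the teams best
# placed to triage them instead of everything landing on the central SOC.
# The routing table is invented for illustration.
ROUTES = {
    "dlp": "data-owner",            # e.g. the business owner of the dataset
    "identity": "iam-team",
    "endpoint_malware": "soc",
    "cloud_misconfig": "platform-engineering",
}

def route(alert: dict) -> str:
    return ROUTES.get(alert.get("type"), "soc")   # default: central SOC

alerts = [
    {"type": "dlp", "detail": "bulk download of finance share"},
    {"type": "endpoint_malware", "detail": "commodity trojan on student laptop"},
    {"type": "something_new", "detail": "unmapped alert type"},
]
for a in alerts:
    print(f"{a['detail']!r:45} -> {route(a)}")
```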

Naturally, some lessons from other fields where the alerting problem is “more solved” help. In this case, I am thinking of SREs. In our ASO/MSO approach, we have spent lots of time on the relentless drive to automation. “Study what SREs did and implement it in SOC/D&R” is essentially the essence of ASO (here is our class on it). Relating to the problems of alert fatigue we covered, automation (here including enrichment) and a rapid feedback loop to fix bad detection content are basically the whole of it. No magic! No heroes!

I do want to present the final case for more alert triage decisions to be given to machines. “Human-less”, fully automated, “AI SOC” is of course utter BS (despite these arguments). However, the near future where AI will help by handling much of the cognitive work of alert triage is coming. This may not always reduce the alert volume, but it likely reduces human fatigue.

Despite all these efforts, alert fatigue may persist. In some cases, the issue might simply be a lack of adequate staffing and that’s that…


Summary

So, a summary of what to do:

Diagnose the fatigue: Begin by identifying the root cause of your specific alert fatigue. Is it due to false positives, benign alerts, hard-to-triage alerts, or simply an overwhelming volume of alerts? Or wrong people getting the alerts perhaps?

Targeted treatment: Once diagnosed, apply the appropriate solutions based on the symptoms identified:

  • False positives: Focus on tuning detection rules, improving alert richness/quality, and potentially changing vendors if necessary.
  • Benign alerts: Implement enrichment to add context and make alerts more actionable. Then SOAR playbooks to route.

  • Hard-to-triage alerts: Utilize enrichment and SOAR playbooks to streamline the triage process. This item has a lot more “it depends”, however, to be fair…

  • Hard-to-triage alerts for specific analysts: Start adopting federated alerting for some alert types (e.g. DLP alerts that go to data owners)

If in doubt, focus on developing more automation for signal triage.

Expect some fun AI and UX advances for reducing alert fatigue in the near future.

Wish for some luck, because this won’t solve the problem but it will make it easier.

Share your experience with security alert fatigue and — ideally — how you solved it or made it manageable…

Final thought: Let’s collectively aim for Security Alert Fatigue (1992–202x)

v1.1 11–2024 (more updates likely in the future)

v1.0 11–2024 (updates likely in the future)



- By Anton Chuvakin (Ex-Gartner VP Research; Head Security Google Cloud)

Original link of post is here


The present application was filed for quashing proceedings in a case pending for offences punishable under Sections 66-C and 67 of the Information Technology Act, 2000 (‘the IT Act, 2000’). The Hon’ble High Court stated that, in the absence of any evidence, it could not be concluded that the applicant was the only person behind the creation of the fake Facebook accounts from which the alleged defamatory posts were made in respect of Respondent 2 and his family members, including the applicant’s wife.

The Court was of the opinion that printouts of Facebook screenshots would not prove that the said post was created from the alleged fake account.

Case Law : Mahesh Shivling Tilkari v. State of Maharashtra, Criminal Application No. 2850 of 2019, decided on 22-10-2024


Read more: Link to the Criminal Application


-By Adv (Dr.) Prashant Mali

Original link of post is here


We are hosting an exclusive CISO Platform Talks session on Evaluating AI Solutions in Cybersecurity: Understanding the "Real" vs. the "Hype" featuring Hilal Ahmad Lone, CISO, Razorpay and Manoj Kuruvanthody, CISO & DPO, Tredence Inc.

In the evolving world of cybersecurity, distinguishing real AI innovation from marketing hype is crucial. This discussion explores key aspects of evaluating AI solutions beyond vendor claims and assessing an organization’s readiness for AI, considering data quality, infrastructure maturity, and how well AI can meet real-world cybersecurity demands. 


 

Key Discussion Points: 

  • Distinguishing marketing hype from practical value: Focus on ways to assess AI solutions beyond vendor claims, including real-world impact, measurable results, and the AI’s role in solving specific cybersecurity challenges.

  • Evaluating AI maturity and readiness levels: Assessing whether an organization is ready for AI in its cybersecurity framework, especially regarding data quality, infrastructure readiness, and overall maturity to manage and scale AI tools effectively. This also includes gauging the AI model’s maturity level in handling complex, evolving threats.

  • AI Maturity and Readiness - Proven Tools vs. Experimental Models: Evaluate the readiness level of AI models themselves, where real maturity is marked by robust performance in varied cyber environments, while hype often surrounds models that are still experimental or reliant on ideal conditions. Organizational readiness, such as infrastructure and data integration, also plays a critical role in realizing real-world results versus theoretical benefits.

Join us live or register to receive the session recording if the timing doesn’t suit your timezone.

>> Register here


We had a community session on "Offensive Security: Breach Stories to Defense Using Offense" with Saravanakumar Ramaiah (Director - Technology Risk Management, Sutherland) & Rajiv Nandwani (Global Information Security Director, BCG).

In this discussion, we explore the importance of penetration testing and red team exercises in identifying security gaps within organizations, the tactics attackers employ in phishing campaigns to gain initial access, and the simulation of advanced persistent threats (APTs) to uncover risks from zero-day vulnerabilities and social engineering attacks. We also examine the critical role of social engineering in physical penetration testing and strategies to bolster defenses against these threats.

 

Key Highlights

  • Leveraging penetration testing and red team exercises to identify security gaps within organizations.

  • Techniques attackers use in phishing campaigns to gain initial access and navigate networks to access sensitive data.

  • Simulating advanced persistent threats (APTs) to understand risks from zero-day vulnerabilities and social engineering attacks.

  • Examining the role of social engineering in physical penetration testing and methods to strengthen defenses against it.

 

About Speaker

  • Saravanakumar Ramaiah, Director - Technology Risk Management, Sutherland 
  • Rajiv Nandwani, Global Information Security Director, BCG

 

CISO Platform Talks (Recorded Version)

 

Executive Summary (Session Highlights) : 

  1. Identifying Security Gaps with Penetration Testing
    In this session, experts discuss the critical role of penetration testing and red team exercises in identifying vulnerabilities within organizations. These proactive measures simulate real-world attacks, enabling companies to uncover weaknesses before they can be exploited by malicious actors.

  2. Understanding Phishing Campaigns
    The conversation highlights the techniques employed in phishing campaigns that attackers use to gain initial access to networks. Recognizing these tactics is essential for developing effective security protocols and training programs to defend against such threats.

  3. Simulating Advanced Persistent Threats (APTs)
    The chat delves into the simulation of APTs to understand the risks associated with zero-day vulnerabilities and social engineering attacks. By mirroring advanced tactics used by threat actors, organizations can better prepare their defenses.

  4. The Role of Social Engineering in Physical Penetration Testing
    Experts analyze the impact of social engineering in physical penetration tests, emphasizing the need for comprehensive training and awareness to strengthen defenses. Participants discuss methods for mitigating risks associated with these covert tactics.

  5. Strengthening Organizational Defenses
    Finally, the discussion underscores the importance of integrating findings from penetration tests and simulations into broader security strategies. By doing so, organizations can enhance their resiliency against evolving cyber threats and improve their overall security posture.

Fireside Chat

We are hosting an exclusive CISO Platform Talks session on "Offensive Security: Breach Stories to Defense Using Offense" featuring Saravanakumar Ramaiah, Director - Technology Risk Management, Sutherland and Rajiv Nandwani, Global Information Security Director, BCG.

In today's constantly evolving threat environment, it is essential for security leaders to adopt an offensive approach to stay ahead of emerging threats. As boards become more aware of the consequences of security incidents, these leaders need to guide their colleagues on effective mitigation strategies.


 

Key Discussion Points: 

  • Leveraging penetration testing and red team exercises to identify security gaps within organizations.

  • Techniques attackers use in phishing campaigns to gain initial access and navigate networks to access sensitive data.

  • Simulating advanced persistent threats (APTs) to understand risks from zero-day vulnerabilities and social engineering attacks.

  • Examining the role of social engineering in physical penetration testing and methods to strengthen defenses against it.

 

Join us live or register to receive the session recording if the timing doesn’t suit your timezone.

 

>> Register here 
