A Fair Weather SOC: 5 Signs It’s Time to Panic (and Fix It!) | Anton Chuvakin - All Articles

A fair-weather SOC by Meta AI

Do you have a fair-weather friend? Or two?

Fair weather friend (via Google)

OK, do you also have a fair-weather SOC?

This train of thought was inspired by reading pilot forums about how some training approaches lead to “fair weather pilots” who perform well in all cases except real emergencies. Anyhow, let me stop with this because this is not my area; it only triggered the ideation process for me.

So, what does a fair-weather SOC look like? First, this reminded me of a “compliance SOC” or a “better than nothing SOC.” The latter is, in practice, sometimes worse than nothing: if you have nothing, you also have no illusions; if you have a dramatically sub-par capability, you have illusions, and you may act as if you are covered for some risks while in reality you are not. Naturally, a “compliance SOC” serves a single purpose: to prove to an auditor that you have a SOC (and nothing else). For an MSSP, it may be a SOC built to make money by fleecing low-maturity clients who don’t know better and just want “better than nothing security” done “by somebody else” (whom they can then sue for fun…).

Anyhow, I digress. What makes a SOC a fair-weather SOC? Here is my top 5:

#1 Lack of experience with major incidents:

The team may have limited experience dealing with significant security incidents, resulting in a lack of preparedness and inadequate response capabilities when a crisis occurs. This can lead to slower response times, failure to detect threats, long triage times, and an inability to respond quickly yet correctly. And, yes, your SOC team may be “proven” (on minor incidents) and “metrics look good” (in the absence of attackers), but it won’t hold up in inclement weather.

What to do? Given that staging a real major incident is probably not the way to go, tabletop exercises are likely the main means of addressing this. These days, gen AI helps a lot here. And so does testing SOC automation under stress.

#2 Inability to operate under pressure:

When faced with a high-pressure situation, such as a major incident, the team may struggle to make decisions, communicate with annoying bosses ;-), or coordinate a cohesive response (Gemini suggested something else for a chance to use the word “delve” but no thank you…). So the team may have a plan, but they never did any planning.

What to do? Not to make the work more stressful, for sure! Planning activities, drills, practice checklists and — again — tabletops are the way to go. And you know who else works well under pressure? Robots! Automation is indirectly helping you operate better under pressure. If you automate yourself out of a stressful job, your job will be to (calmly) make the robots, while they (stressfully) operate … (this is kinda the point!)

#3 SOC metrics (if any) are smooth and measure efficiency:

They teach you nothing about what will happen when things are NOT normal. Efficiency only matters … if your stuff actually works, and works when it really has to.

What to do? Look at your SOC metrics stack and try to see which metrics measure how calm waters flow and which metrics measure how you handle the storm surge. For example, average alert triage time has nothing whatsoever to do with triaging alerts resulting from a top tier APT compromise. Measure efficiency, effectiveness and “effectiveness under fire” that cover cases when the attacker shows up. This blog and webinar cover the topic.
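To make the “calm waters vs storm surge” split concrete, here is a minimal sketch (not a definitive implementation). It assumes alerts are available as (hour, triage minutes) pairs and that an hour counts as a “surge” when its alert count exceeds a threshold; the schema, the function name `triage_by_load`, the threshold, and the sample numbers are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def triage_by_load(alerts, surge_threshold):
    """Split triage times into calm and surge buckets by hourly alert volume.

    alerts: iterable of (hour_bucket, triage_minutes) pairs (hypothetical schema).
    surge_threshold: an hour with more alerts than this counts as a surge hour.
    """
    per_hour = defaultdict(list)
    for hour, minutes in alerts:
        per_hour[hour].append(minutes)

    calm, surge = [], []
    for times in per_hour.values():
        # Route the whole hour's triage times to one bucket based on its volume.
        (surge if len(times) > surge_threshold else calm).extend(times)

    all_times = calm + surge
    return {
        "overall_mean": mean(all_times) if all_times else None,
        "calm_mean": mean(calm) if calm else None,
        "surge_mean": mean(surge) if surge else None,
    }


# Made-up data: three quiet hours and one burst hour (hour 3).
sample = [(0, 10), (0, 12), (1, 11), (2, 9),
          (3, 40), (3, 55), (3, 60), (3, 45), (3, 50)]
report = triage_by_load(sample, surge_threshold=3)
print(report)
```

In this toy data, the overall average lands in between and describes neither situation well: the calm hours are triaged far faster than the burst hour. A real version would need a better definition of “surge” (for example, tied to a confirmed incident rather than raw volume).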

#4 Untested Tooling and Technology Under Stress:

The SOC relies on tools and technologies that have not been rigorously tested under high-stress scenarios or against real-world attack simulations. Their weaknesses are then revealed only when the pressure is on. People who build such SOC tool stacks assume that everything will be fine; they essentially forget that threat detection is hard and that (some) attackers take care not to be detected (duh!).

What to do? Conduct regular performance testing of security tools, simulate high-volume attack scenarios, and validate the effectiveness of automated (and manual!) responses. If you use BAS or similar tools as part of such a testing program, then really hit it.
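Before hitting real tools, a back-of-the-envelope model can show what a high-volume scenario does to a triage queue. The sketch below is a deliberately simple assumption, not a model of any real product: alerts arrive per minute, the SOC (people plus automation) clears a fixed number per minute, and anything left over becomes backlog. The function name `simulate_backlog` and all rates are hypothetical.

```python
def simulate_backlog(arrivals_per_minute, capacity_per_minute):
    """Return (peak_backlog, minutes_to_drain) for a simple triage queue.

    arrivals_per_minute: list of alert counts, one entry per minute.
    capacity_per_minute: alerts the SOC can clear per minute (must be > 0).
    """
    backlog = 0
    peak = 0
    for arriving in arrivals_per_minute:
        # Unprocessed alerts carry over; the backlog cannot go below zero.
        backlog = max(0, backlog + arriving - capacity_per_minute)
        peak = max(peak, backlog)
    # Minutes needed to clear what is left once arrivals stop (ceiling division).
    minutes_to_drain = -(-backlog // capacity_per_minute)
    return peak, minutes_to_drain


# Made-up load profiles: a quiet hour, and the same hour with a 10-minute burst.
normal = [5] * 60
burst = [5] * 20 + [50] * 10 + [5] * 30

print(simulate_backlog(normal, capacity_per_minute=8))
print(simulate_backlog(burst, capacity_per_minute=8))
```

With these numbers, the quiet hour never builds a backlog, while the short burst leaves a queue that is still draining long after the burst ends. The same capacity that looks comfortable in fair weather is what fails in the storm, which is why the real tools and responses need to be tested at burst volume and not only at the average.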

#5 “Mature” and very rigid processes:

This one is going to be weird. I remember a research paper from many years ago that lamented that some SOCs had excessive maturity, a concept that I found illogical at the time. It manifested as excessively polished yet rigid processes, where consistency absolutely won over creativity (the SOC consistency vs creativity conundrum is covered here). Some “compliance SOCs” devolve even further into a “cargo cult SOC,” where the processes are rigid and diligently followed; they are also wrong. This combines a fragile process stack with wrongness, a killer combo: overly rigid, “checkbox” SOC processes crumble under pressure, and if diligently followed, they lead you wrong.

Weirdly, over-reliance on “fair weather,” fragile automation fits here too. While automation is hugely valuable and is a big part of the cure, an over-reliance on fragile automated tools that break when the attacker shows up and tries something funny can become a disease.

Similarly, a lack of threat hunting and adversarial simulation (both essentially “proactive,” i.e., not attacker-led, processes) is a feature of a fair-weather SOC. The SOC primarily reacts to alerts and incidents, with minimal proactive threat hunting or simulated adversarial testing. This leaves it unprepared for sophisticated attacks that bypass standard detection methods.

What to do about it: Hunt! Do security things that do not follow rigid processes and that require and stimulate creativity. Implement regular threat hunting exercises, conduct red team/blue team exercises, and engage external experts for penetration testing.

Call to action

So, “Is your SOC a fair-weather friend?” Review our list, do a solid “red team” run on it and call Mandiant for an assessment, then double down on building things that help “when the attacker shows up.”

Resources:

- By Anton Chuvakin (Ex-Gartner VP Research; Head Security Google Cloud)

Note: this page contains paid content.
