How Human Intelligence Supercharges CrowdStrike’s Artificial Intelligence

The CrowdStrike Security Cloud processes trillions of events per day from endpoint sensors, yet human experts play a key role in providing the structure and ground truth that artificial intelligence (AI) needs to be effective. Without human experts, AI would be useless.

There is a new trope in the security industry, and it goes something like this: to keep yourself safe, you need an AI-powered solution that acts on its own, and to get that, you need to keep those pesky humans away from it. As a practitioner with a track record of bringing AI into cybersecurity (not because marketing demands it these days, but because of its actual utility in solving security problems), I find this characterization puzzling.

If my view sounds controversial to you, note that it only does so in the cybersecurity industry. Among AI and machine learning (ML) researchers, relying on human expertise is entirely uncontroversial. And in other industries, leveraging human expertise is entirely normal. How normal? You can purchase services that have people label your data sets. Some companies even use crowdsourced processes to get labels from regular users. You may have already contributed to such an effort when you proved to a website that you are not a robot.


How did this human-averse security posture become so widespread? There are two fallacies at play. If you are a glass-half-full person, you could call them misconceptions. But if you focus on the top half of the glass, you might call them misrepresentations. First, artificial intelligence is not actually intelligent. Have a conversation with your smart speaker to reassure yourself of that fact. AI is a set of algorithms and techniques that often produce useful results. But sometimes they fail in odd and unintuitive ways. AI also has its own distinctive attack surface that adversaries can take advantage of if it is left unprotected. Treating AI as a panacea for our industry’s woes is dangerous, as I discussed last year in an invited talk at a workshop on the robustness of AI systems against adversarial attacks.

Second, we are all still jaded from the signature days. Back then, signatures were deployed and initially stopped threats, then began missing new threats, prompting humans to write new signatures, and the cycle restarted the next day. Obviously, this approach is a losing proposition: not only is the model purely reactive, but its speed is clearly limited by human response time. However, this is not how AI models are integrated to prevent threats. An AI model on the CrowdStrike Falcon platform does not require human interaction to stop a threat dead in its tracks. CrowdStrike specifically uses AI to detect threats that have not even been conceived yet, without the need for any updates.

Data, Data, Data

What does it take to train an AI model that can reliably perform such a feat? Most importantly, it takes data, and a lot of it. The CrowdStrike Security Cloud processes more than a trillion events from endpoint sensors per day. To put this into perspective, a ream of 500 pages of office printer paper is about 50 millimeters (approximately 2 inches) thick. One trillion pages would stack up about 100,000 kilometers high, or roughly 60,000 miles. Those are enough miles to earn you gold status every day on most airlines, but it would take you more than four days to fly this distance at a regular airliner’s cruising speed. And after four days of events, the stack would reach all the way to the moon.
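
The arithmetic behind this metaphor is easy to check. The following is a minimal sketch: the page count and ream thickness come from the text, while the cruising speed of 900 km/h and the average Earth-Moon distance of 384,400 km are assumptions added for illustration.

```python
# Back-of-the-envelope check of the paper-stack metaphor.
# From the text: 1 trillion pages per day, 500 pages per ream, 50 mm per ream.
# Assumed (not from the text): 900 km/h cruising speed, 384,400 km to the moon.

PAGES_PER_DAY = 1_000_000_000_000
PAGES_PER_REAM = 500
REAM_THICKNESS_MM = 50
CRUISE_SPEED_KMH = 900      # assumption: typical airliner cruising speed
EARTH_MOON_KM = 384_400     # assumption: average Earth-Moon distance
KM_PER_MILE = 1.609344
MM_PER_KM = 1_000_000


def stack_height_km(pages: int) -> float:
    """Height of a stack of `pages` sheets, in kilometers."""
    reams = pages / PAGES_PER_REAM
    return reams * REAM_THICKNESS_MM / MM_PER_KM


daily_km = stack_height_km(PAGES_PER_DAY)
daily_miles = daily_km / KM_PER_MILE
flight_days = daily_km / CRUISE_SPEED_KMH / 24
days_to_moon = EARTH_MOON_KM / daily_km

print(f"Stack height per day: {daily_km:,.0f} km ({daily_miles:,.0f} miles)")
print(f"Flying time for one day's stack: {flight_days:.1f} days")
print(f"Days of events until the stack reaches the moon: {days_to_moon:.1f}")
```

Under these assumptions, one day of events stacks up to 100,000 km, flying that distance takes between four and five days, and a little under four days of events reach the moon.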

However, this metaphorical stack is not only tall. The CrowdStrike Security Cloud also has a large footprint, covering facets such as endpoint security, cloud security, identity protection, threat intelligence and more. For each of these facets, we process complex and nuanced data records. All of this information is contextualized and correlated in our proprietary CrowdStrike Threat Graph, a large distributed graph database that we developed.

The Falcon platform was built from the ground up as a cloud-native system to process this volume of data effectively and in meaningful ways. None of this is possible on an appliance. And none of it is possible with hybrid cloud solutions, that is, clouds that are merely stacks of vendor-managed rack-mounted appliances. Those make as much sense as streaming video over the Internet from a VCR.

More data allows us to spot fainter signals. Imagine plotting the latitude and longitude of U.S. cities on graph paper. In the beginning, you will see some randomly scattered points. After doing this for a large number of cities, a familiar shape slowly emerges from the cloud of points: the shape of the United States. However, if everyone used their own “local” graph paper to plot only a handful of cities near them, that shape would never become apparent.

Structure and Ground Truth

So how do humans fit into the picture? If so much information piles up on our metaphorical stack of printer paper that not even an airliner could keep up with it, how do humans stand a chance of making an impact?

There are two ways. First, stacking the sheets is not the smartest way to organize them. Laying them flat next to each other results in a paper square of approximately 250 by 250 kilometers (about 150 miles per side). That is a lot more manageable: an area like this can be mapped. If we instead organize the reams of paper into a cube, it would measure approximately 180 × 180 × 180 meters (about 600 feet per edge). Note that this is now meters, no longer kilometers, which makes it far more compact and ready to be charted. The takeaway is that the problem becomes more tractable when data is organized in more dimensions and adjacencies are taken into account. That is the mission of our cloud and Threat Graph.
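
The square and cube figures can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes A4 sheets (210 × 297 mm), which the text does not specify; the sheet thickness follows from the 50 mm ream of 500 pages mentioned earlier.

```python
import math

# Rearranging one trillion sheets as a flat square or as a solid cube.
# Assumption (not from the text): A4 sheets of 0.210 m x 0.297 m.
# From the text: a 500-page ream is 50 mm thick, so one sheet is 0.1 mm.

SHEETS = 1_000_000_000_000
SHEET_WIDTH_M = 0.210
SHEET_HEIGHT_M = 0.297
SHEET_THICKNESS_M = 0.050 / 500

# Two dimensions: every sheet laid flat, side by side.
total_area_m2 = SHEETS * SHEET_WIDTH_M * SHEET_HEIGHT_M
square_side_km = math.sqrt(total_area_m2) / 1000

# Three dimensions: the same paper volume packed into a cube.
total_volume_m3 = total_area_m2 * SHEET_THICKNESS_M
cube_edge_m = total_volume_m3 ** (1 / 3)

print(f"Square side: {square_side_km:.0f} km")
print(f"Cube edge:   {cube_edge_m:.0f} m")
```

With A4 paper, the square comes out at roughly 250 km per side and the cube at a bit over 180 m per edge, in line with the rounded figures above. Each added dimension shrinks the longest extent by orders of magnitude, which is the point of the metaphor.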

Second, not all data is created equal. There is another type of data to which humans can contribute. We call this type of data ground truth, and it has a significant impact on the training of AI models. Ground truth is the type of data that describes how we want an AI model to behave for a particular input. For our metaphorical paper stack, an instance of ground truth would be whether a sheet of paper corresponds to a threat (for example, a red sheet) or to benign activity (a green sheet). If you organize your data in meaningful ways, as described earlier, you only need a few colored sheets to deduce information for entire reams of paper. Imagine you pull a sheet out of a ream somewhere in our paper cube and it turns out to be red. The other sheets in that ream are likely red too. And some of the adjacent reams will also contain mostly red paper. That is how certain types of AI learn: they figure out how to respond to similar (adjacent) inputs based on ground truth. This is called supervised learning.
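
The idea of inferring a sheet’s color from labeled sheets nearby can be illustrated with a k-nearest-neighbors classifier, one of the simplest supervised learning techniques. This is only an illustrative sketch: the coordinates, the labels and the choice of k-nearest neighbors are assumptions made for the example, not a description of the models used on the Falcon platform.

```python
import math
from collections import Counter

# Hypothetical ground truth: a few sheets whose color is known,
# given as (position in the paper cube, label).
labeled_sheets = [
    ((1.0, 1.0, 1.0), "red"),
    ((1.2, 0.9, 1.1), "red"),
    ((0.8, 1.1, 0.9), "red"),
    ((5.0, 5.0, 5.0), "green"),
    ((5.1, 4.8, 5.2), "green"),
    ((4.9, 5.2, 4.7), "green"),
]


def predict(point, labeled, k=3):
    """Label a point by majority vote among its k nearest labeled neighbors."""
    nearest = sorted(labeled, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Unlabeled sheets take on the color of the ground truth around them.
near_red = predict((1.1, 1.0, 0.9), labeled_sheets)
near_green = predict((4.8, 5.1, 5.0), labeled_sheets)
print(near_red, near_green)
```

Six labeled sheets are enough here to color any sheet in their vicinity, which mirrors how a few pieces of ground truth can inform decisions about whole reams when the data is organized so that similar items sit next to each other.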

Supervised learning is a powerful way to build highly accurate classification systems, that is, systems with high true positive rates (detecting threats reliably) and low false positive rates (rarely raising alarms on benign behavior). Not all learning needs to be performed using ground truth (the domain of unsupervised learning is concerned with other techniques, for example). But as soon as it is time to evaluate whether such an AI system works as intended, you need ground truth as well.
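
Both rates are straightforward to compute once ground truth is available. Here is a minimal sketch with made-up labels and verdicts, where True stands for a threat and False for benign activity.

```python
def detection_rates(ground_truth, verdicts):
    """Return (true positive rate, false positive rate).

    Both arguments are sequences of booleans where True means "threat".
    """
    tp = sum(1 for t, v in zip(ground_truth, verdicts) if t and v)
    fn = sum(1 for t, v in zip(ground_truth, verdicts) if t and not v)
    fp = sum(1 for t, v in zip(ground_truth, verdicts) if not t and v)
    tn = sum(1 for t, v in zip(ground_truth, verdicts) if not t and not v)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical evaluation set: 4 real threats and 6 benign events.
ground_truth = [True, True, True, True, False, False, False, False, False, False]
# Hypothetical model verdicts: one threat missed, one false alarm.
verdicts = [True, True, True, False, False, False, False, False, False, True]

tpr, fpr = detection_rates(ground_truth, verdicts)
print(f"True positive rate:  {tpr:.2f}")
print(f"False positive rate: {fpr:.2f}")
```

Without the ground_truth list there is nothing to compare the verdicts against, which is why evaluation requires ground truth even when training does not.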

Finally, since ground truth is often a rare commodity, rarer than other data, further techniques blend these two approaches. In semi-supervised learning, an AI is trained on a large amount of data in an unsupervised way and is then tuned with supervised training using a smaller amount of ground truth. In self-supervised learning, the AI takes its clues from the structure in the data itself.
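
One simple member of the semi-supervised family is "cluster-then-label": an unsupervised step groups the unlabeled data, and a handful of labeled examples then give each group a name. The sketch below uses this technique purely for illustration; the one-dimensional scores, the two-cluster k-means and the two labeled examples are all assumptions made for the example.

```python
def cluster_of(value, centers):
    """Index of the cluster center closest to `value`."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


def two_means(values, iterations=20):
    """Unsupervised step: split 1-D values into two clusters (k-means, k=2)."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            groups[cluster_of(v, centers)].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Hypothetical unlabeled data: one score per event, with no labels attached.
unlabeled = [0.1, 0.2, 0.15, 0.3, 0.25, 0.9, 0.85, 0.95, 0.8, 1.0]
# Hypothetical ground truth: only two events were labeled by an expert.
ground_truth = [(0.2, "benign"), (0.9, "threat")]

centers = two_means(unlabeled)

# Supervised step: name each cluster after the labeled example that falls into it.
cluster_label = {cluster_of(v, centers): label for v, label in ground_truth}


def classify(value):
    """Label of the cluster that `value` falls into (KeyError if unlabeled)."""
    return cluster_label[cluster_of(value, centers)]


print(classify(0.28), classify(0.7))
```

Ten unlabeled points provide the structure, and two labels are enough to turn that structure into a classifier. Real systems use far richer models, but the division of labor between plentiful unlabeled data and scarce ground truth is the same.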

Humans, Humans, Humans

At CrowdStrike, we designed our systems to maximize ground truth generation. For example, whenever CrowdStrike Falcon OverWatch threat hunters find an adversary on a network, those findings become new ground truth. Similarly, when OverWatch experts assess suspicious activity as benign, that also becomes ground truth. Those data points help train or evaluate AI systems. We generate this type of data every day using our vantage point in the cloud. It allows us to train better models and build better systems with better-understood performance characteristics.

AI systems can also flag events where ground truth is sparse and the level of uncertainty is high. Although the AI can still prevent threats without delay in those situations, the flagged data can be reviewed later by humans to boost the amount of ground truth available where it matters most. Alternatively, other means can provide additional data, for example a detonation in the CrowdStrike Falcon X™ malware analysis sandbox to observe threat behaviors in a controlled environment. Such solutions are based on a paradigm called active learning.

Active learning is a useful way to spend the limited resource of human attention where it matters most. AI decisions do not stall: the AI continues to analyze and stop threats. We call this the “fast loop”. The Falcon OverWatch team, among others, analyzes what our AI systems surface and provides expert dispositions that we feed back into our AI algorithms. In this way, our AI models receive a constant stream of feedback about where they were successful and where we detected and stopped novel attacks by other means. The AI learns from this feedback and incorporates it into future detections. We call this part the “long loop”. As a result, our AI constantly improves as new data enters the system.
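
A common way to implement this kind of flagging is uncertainty sampling: the model acts on every event immediately, and events whose scores fall close to the decision threshold are queued for human review. The following is a simplified sketch of the fast loop and the long loop. The scores, threshold, uncertainty band and expert dispositions are hypothetical values chosen for illustration, not details of the Falcon platform.

```python
BLOCK_THRESHOLD = 0.5    # assumption: scores at or above this are blocked
UNCERTAINTY_BAND = 0.2   # assumption: scores this close to the threshold get flagged


def fast_loop(score):
    """Decide immediately; also report whether the decision was uncertain."""
    blocked = score >= BLOCK_THRESHOLD
    flagged = abs(score - BLOCK_THRESHOLD) < UNCERTAINTY_BAND
    return blocked, flagged


# Hypothetical model scores in [0, 1] for incoming events.
events = {"evt-1": 0.97, "evt-2": 0.55, "evt-3": 0.03, "evt-4": 0.41, "evt-5": 0.72}

# Fast loop: every event gets a decision right away, with no human involved.
decisions = {}
review_queue = []
for event_id, score in events.items():
    blocked, flagged = fast_loop(score)
    decisions[event_id] = blocked
    if flagged:
        review_queue.append(event_id)

# Long loop: experts review the most uncertain events first.
review_queue.sort(key=lambda e: abs(events[e] - BLOCK_THRESHOLD))

# Hypothetical expert dispositions that come back from the review.
expert_dispositions = {"evt-2": "benign", "evt-4": "threat"}
new_ground_truth = [
    (e, expert_dispositions[e]) for e in review_queue if e in expert_dispositions
]

print("Decisions:", decisions)
print("Review queue:", review_queue)
print("New ground truth:", new_ground_truth)
```

In this sketch only two of the five events reach a human, and both reviews correct the model’s immediate decision. Those corrections are the kind of ground truth that would be fed into the next round of training.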

Proof Points

Every day, we prove in the field that this approach is superior by repelling adversaries from our customers’ networks, preventing data theft and ensuring that the lifeblood of the companies we serve (their information and intellectual property) is protected.

In addition, we have a rigorous testing record of numerous independent third-party evaluations by leading testing organizations such as AV-Comparatives, SE Labs and MITRE. AI-centric vendors tend to avoid tests that penalize false positives, but not CrowdStrike. Public reports from independent testing organizations attest to CrowdStrike’s commitment to transparency, especially as AI has become a pervasive technology for working with data.

Outside of testing, CrowdStrike was also the first NGAV vendor to make its technology readily available on VirusTotal for public scrutiny, and we readily provide our technology to the research community for use on Hybrid Analysis. Transparency is a core tenet of our privacy-by-design approach: CrowdStrike designs its offerings with transparency as a core value so that customers can see exactly what is being processed, make decisions about how it is processed and select retention periods.

Final Thoughts

AI is becoming a commonplace tool for stopping cyber threats, but it is important to look beyond the mere presence of an AI algorithm somewhere in the data flow. It is crucial to gauge the efficacy of an AI system by understanding where its data comes from, including the necessary ground truth. Artificial intelligence can only learn if new facts constantly enter the system at scale, and humans in the loop are a hallmark of a well-designed AI system.

