3a. Use case analysis: turning on safe-mode vs post-mortem analysis
The H1ST.AI approach to this problem begins by thinking about the end-users of the decision system and their use cases.
What are the use cases for such an automotive cybersecurity system? We can envision two distinct use cases:
- The onboard intrusion detection system (IDS) can detect an attack event in real time and put the car into a safe mode, so that the driver can safely reach a safe location instead of being stuck on the highway with a malfunctioning car.
- A security expert could review the attack in post-mortem mode, in which the IDS provides a message-by-message attack vs normal classification.
For use case #1, “safe-mode triggering by attack event detection”, the ML requirement is a near-zero FPR.
To give an example, each car might produce about 100 CAN messages per second. If we have a fleet of just 1000 cars, each driven 1 hour per day, then an FPR of 0.001% at message level still means 0.00001 x 100 msg x 3600 s x 1000 cars = 3600 false-positive events per day that a security operations center will need to handle!
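The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `false_positives_per_day` and its parameters are illustrative and not part of H1st.

```python
def false_positives_per_day(fpr, msgs_per_second, seconds_per_day, num_cars):
    """Expected number of false-positive messages per day across the fleet."""
    return fpr * msgs_per_second * seconds_per_day * num_cars


# Numbers from the example: FPR of 0.001% (1e-5), 100 msg/s, 1 h/day, 1000 cars.
daily_fp = false_positives_per_day(1e-5, 100, 3600, 1000)
print(round(daily_fp))  # 3600

# Message-level FPR that would give about one false positive per day
# for the same fleet (total messages per day = 100 * 3600 * 1000).
fpr_for_one_per_day = 1 / (100 * 3600 * 1000)
print(f"{fpr_for_one_per_day:.1e}")  # 2.8e-09
```

For this fleet, the message-level FPR would have to be roughly four orders of magnitude lower than 0.001% to bring the daily count down to about one, which is why the event-level requirement is phrased as "near-zero".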
Additionally, for deployment and anticipated regulatory purposes, the system should behave robustly and explainably. Explainability is a complex subject; here we mean that one can anticipate the system’s behavior reasonably well, including for legal and regulatory purposes. As we saw, ML models such as iForest or GBM don’t quite meet this requirement: it is hard to explain precisely how they classify attacks, even when they achieve good accuracy.
For use case #2, “post-mortem analysis”, the requirement turns out to be very different. Some FPR can be traded off for a higher TPR in post-mortem analysis. And the system might not need to be highly explainable, since it is after all the job of the security experts to analyze the attacks in depth and make the final decisions.
3b. Problem (re)formulation into H1st.AI Graph
We reformulate the problem as a decision graph, where the outermost flow detects attack events and the corresponding yes branch handles message classification. For this tutorial we focus on injection attacks, which are the most common in the wild (we will revisit this later).
The graph looks like this.
3c. Encoding human insights for event detection as an H1st.Model
Recall that when we started analyzing the CAN dataset, we remarked that the normal data is highly regular, especially in terms of the message frequency for each CAN ID.
It turns out that using message frequency statistics for injection event detection is highly accurate for safe-mode use cases (high TPR, low FPR). This surprising fact was first pointed out by the original CAN bus hackers Chris Valasek and Charlie Miller in the seminal white paper Adventures in Automotive Networks and Control Units:
“It is pretty straightforward to detect the attacks discussed in this paper. They always involve either sending new, unusual CAN packets or flooding the CAN bus with common packets… Additionally, the frequency of normal CAN packets is very predictable… Therefore we propose that a system can detect CAN anomalies based on the known frequency of certain traffic and can alert a system or user if frequency levels vary drastically from what is well known.”
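The detection rule described in the quote can be sketched in plain Python as follows. This is not the tutorial's MsgFreqEventDetectorModel: the CAN IDs, the bounds in `NORMAL_BOUNDS` and the function `detect_event` are all hypothetical and only illustrate the idea of flagging unknown IDs and out-of-range message counts.

```python
from collections import Counter

# Hypothetical per-window bounds (min, max) on message counts per CAN ID,
# as they might be learned from normal driving data. Values are made up.
NORMAL_BOUNDS = {
    "0x0C1": (45, 55),
    "0x1A0": (9, 11),
}


def detect_event(window_can_ids, bounds=NORMAL_BOUNDS):
    """Return True if the message counts in one time window look anomalous."""
    counts = Counter(window_can_ids)
    # New, unusual CAN IDs that never appear in normal traffic.
    if any(can_id not in bounds for can_id in counts):
        return True
    # Known IDs whose frequency deviates from the normal range
    # (flooding, or messages that are unexpectedly rare or missing).
    return any(
        not (low <= counts[can_id] <= high)
        for can_id, (low, high) in bounds.items()
    )


normal_window = ["0x0C1"] * 50 + ["0x1A0"] * 10
flooded_window = normal_window + ["0x0C1"] * 200
unknown_id_window = normal_window + ["0x7FF"]

print(detect_event(normal_window))      # False
print(detect_event(flooded_window))     # True
print(detect_event(unknown_id_window))  # True
```

Because the rule is just a set of per-ID ranges, it is easy to state in advance which windows it will flag, which fits the explainability requirement of the safe-mode use case.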
Using H1ST, we can encode such “human” insights as models and use them just like ML models. An h1.Model is essentially anything that can predict. H1ST also provides tools to help automate saving and loading these models, which makes it easier to use them in an integrated decision system.
A data-science project in H1ST.AI is designed to be a Python-importable package. You can create such a project using the h1 command-line tool.
Organizing model code this way makes it easy to use. The Model API is designed so that models can be used interactively in notebooks as well as in more complex projects such as this one.
The H1st package of the full tutorial is available from the H1st Github project at https://github.com/h1st-ai/h1st/tree/master/examples/AutoCyber.
Simply go ahead and clone it, then follow along.
Training the message frequency statistics is quite simple: we loop through a number of files to compute window statistics, such as how many messages per CAN ID are found in each window and what the min, max and percentile values of those counts are.
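The window statistics described above can be sketched roughly as follows. This is a simplified, standard-library-only illustration and not the content of models/msg_freq_event_detector.py: it works on an in-memory list of (timestamp, CAN ID) pairs instead of files, and the function names, the 1-second window and the nearest-rank percentile are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def percentile(sorted_values, q):
    """Nearest-rank percentile (q in 0..100) of a sorted, non-empty list."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def train_msg_freq_stats(messages, window_size=1.0):
    """Compute per-CAN-ID statistics of message counts per time window.

    messages: iterable of (timestamp_in_seconds, can_id) pairs.
    Windows that contain no messages at all are not represented.
    """
    per_window = defaultdict(Counter)
    for timestamp, can_id in messages:
        per_window[int(timestamp // window_size)][can_id] += 1

    all_ids = {can_id for counts in per_window.values() for can_id in counts}
    stats = {}
    for can_id in sorted(all_ids):
        # A Counter yields 0 for IDs that are absent from a window.
        counts = sorted(window[can_id] for window in per_window.values())
        stats[can_id] = {
            "min": counts[0],
            "max": counts[-1],
            "p01": percentile(counts, 1),
            "p99": percentile(counts, 99),
        }
    return stats


# Synthetic "normal" traffic: 5 seconds, two CAN IDs with regular rates.
messages = []
for sec in range(5):
    messages += [(sec + i / 10, "0x0C1") for i in range(10)]
    messages += [(sec + i / 2, "0x1A0") for i in range(2)]

stats = train_msg_freq_stats(messages)
print(stats["0x0C1"])
print(stats["0x1A0"])
```

On perfectly regular synthetic traffic like this, min, max and the percentiles coincide for each ID; on real data the spread between them indicates how much tolerance a detection threshold needs.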
The content of models/msg_freq_event_detector.py should look like the following.
Now let’s import and train this MsgFreqEventDetectorModel.
Using h1st.Model makes models easy to save and load. By default, the “model”, “stats” and “metrics” properties are persisted, and they support a variety of flavors and data structures.
We call h1.init() to set up the model repository with the storage location specified in MODEL_REPO_PATH. You can also put MODEL_REPO_PATH in config.py and call h1.init() without any parameter.
Computing the regular frequencies, a.k.a. “training” this model, should take several minutes.
Persisting returns a model version ID that you can use to load the model back later (you can also give it a name).
3d. Working with H1st Graph
Let’s now make some event-level predictions.
Note that since the model was persisted using the H1st model repo, we can easily come back in a notebook or script and load the trained model or the computed statistics.
Importantly, H1st allows much speedier integration into a Graph (and later deployment, too).
We should see that we can start detecting attack events. We’ll evaluate this later; for now, let’s finish building our detection graph by adding the message classifier.
Note that the graph returns separate output keys, collected from all the nodes’ outputs. Typically each node is expected to return a dict.
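To illustrate how per-node dicts are collected into one output, here is a toy decision flow in plain Python. It deliberately does not use the H1st Graph API: `event_detector`, `msg_classifier`, `run_graph` and the input keys are hypothetical names, and the node logic is a trivial stand-in for the real models.

```python
def event_detector(inputs):
    """Toy event-level node: flags a window that contains too many messages."""
    is_event = len(inputs["messages"]) > inputs["max_msgs_per_window"]
    return {"event_result": is_event}


def msg_classifier(inputs):
    """Toy message-level node: marks messages with an unknown CAN ID as attacks."""
    flags = [can_id not in inputs["known_ids"] for can_id in inputs["messages"]]
    return {"msg_results": flags}


def run_graph(inputs):
    """Run the nodes in order and merge their dict outputs under separate keys."""
    outputs = {}
    outputs.update(event_detector(inputs))
    if outputs["event_result"]:  # "yes" branch: classify individual messages
        outputs.update(msg_classifier(inputs))
    return outputs


normal = run_graph({
    "messages": ["0x0C1"] * 3,
    "known_ids": {"0x0C1"},
    "max_msgs_per_window": 5,
})
attack = run_graph({
    "messages": ["0x0C1"] * 4 + ["0x7FF"] * 4,
    "known_ids": {"0x0C1"},
    "max_msgs_per_window": 5,
})
print(normal)          # {'event_result': False}
print(sorted(attack))  # ['event_result', 'msg_results']
```

In this sketch the message-level key only appears when the event-level node takes the yes branch, which mirrors the decision-graph structure described earlier.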
3e. Adding a message classifier, harmonizing human + ML models in the graph
For message-level classification we can simply bring back our gradient-boosted trees, which did a decent job of recognizing injection messages. (Integrating a sequence model such as a bidirectional LSTM is left as an exercise for the reader.)
As before, we’ve reorganized it as an H1st.Model in the tutorial folder, ready for use.
The content of models/gradient_boosting_msg_classifier.py looks like this.
Putting everything together in an h1.Graph and running graph.predict() on a single file looks like this.
The confusion matrix for message-level classification looks like this.
Now let’s evaluate the whole graph against the test set, especially focusing on the event-level TPR & FPR since they are crucial in the safe-mode deployment use case.
Now that’s something! Event-level FPR=0.0% with zero false positives!
(Note that the provided attack samples were created from a subset of the driving trips. You should be able to do a more thorough evaluation by running against synthetic attacks created from the full driving-trips dataset, and the results should be the same: zero false positives at event level.)
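For reference, event-level TPR and FPR can be computed from per-window labels as in the sketch below. The function `tpr_fpr` is an illustrative helper, not part of H1st, and the labels are made up for the example; they are not results from the tutorial dataset.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for two equal-length sequences of boolean labels."""
    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(t and p for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)
    fp = sum(not t and p for t, p in pairs)
    tn = sum(not t and not p for t, p in pairs)
    # Guard against division by zero when a class is absent.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


# Made-up event-level labels for five windows (True = attack event).
y_true = [True, True, False, False, False]
y_pred = [True, False, False, False, False]

tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr, fpr)  # 0.5 0.0
```

An FPR of 0.0 here means that none of the normal windows was flagged, which is the property the safe-mode use case cares about most.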
The message-level accuracy should be nearly the same because we used the same classifier. However, the decomposition separates the concerns and requirements of the two use cases. We’re now much more comfortable with the solution in terms of accuracy as well as robustness and explainability.
Another point worth noting is that H1st.Graph gives us multiple output streams, event-level outputs and message-level outputs, which is exactly what we need for the two use cases we highlighted: safe-mode triggering and post-mortem analysis.