3a. Use case analysis: turning on safe-mode vs post-mortem analysis
The H1ST.AI approach to this problem begins by thinking about the end-users of the decision system and their use cases.
What are the use cases for such an automotive cybersecurity system? We can envision two distinct use cases:
1. The onboard intrusion detection system (IDS) detects an attack event in real time and puts the car into a safe mode, so that drivers can reach a safe location rather than being stranded on the highway with a malfunctioning car.
2. A security expert reviews the attack in post-mortem mode, in which the IDS provides a message-by-message classification of attack vs. normal traffic.
For use case #1, “safe-mode triggering by attack event detection”, the ML requirement is a near-zero false-positive rate (FPR).
To give an example, each car might see 100 CAN messages per second. If we have a fleet of just 1,000 cars, each driven 1 hour per day, then an FPR of 0.001% at the message level still means 0.00001 × 100 msg/s × 3,600 s × 1,000 cars = 3,600 false-positive events per day that a security operations center would need to handle!
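The arithmetic above can be checked in a few lines of Python. The variable names are illustrative, and the inputs are the same round-number assumptions used in the paragraph, not measurements.

```python
# Back-of-the-envelope estimate of daily false alarms for a fleet.
# All inputs are the round-number assumptions from the text above.
fpr = 0.001 / 100            # 0.001% message-level false-positive rate
msgs_per_second = 100        # CAN messages per second per car
seconds_per_day = 1 * 3600   # each car is driven 1 hour per day
fleet_size = 1000            # cars in the fleet

messages_per_day = msgs_per_second * seconds_per_day * fleet_size
false_positives_per_day = fpr * messages_per_day

print(f"messages per day: {messages_per_day:,}")
print(f"false positives per day: {false_positives_per_day:,.0f}")
```

Because the estimate is linear in every input, a tenfold larger fleet or a tenfold higher FPR each produces ten times as many false alarms.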
Additionally, for deployment and anticipated regulatory purposes, the system should behave robustly and explainably. Explainability is a complex subject; here we mean that one can anticipate the system’s behavior reasonably well and account for it in legal or regulatory settings. As we saw, ML models such as iForest or GBM don’t quite meet this requirement: it is hard to explain precisely how they classify attacks, even when they achieve good accuracy.
For use case #2, “post-mortem analysis”, the requirements turn out to be very different. Some FPR can be traded off for a higher TPR. The system also might not need to be highly explainable; it is, after all, the job of the security experts to analyze the attacks in depth and make the final decisions.
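To make the FPR/TPR trade-off concrete, here is a minimal sketch that sweeps a decision threshold over classifier scores. The scores and labels are toy values made up for illustration (they are not from the CAN dataset), and `rates_at_threshold` is a hypothetical helper, not part of H1st.

```python
import numpy as np

def rates_at_threshold(scores, labels, threshold):
    """Return (TPR, FPR) when every score >= threshold is flagged as an attack."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    flagged = scores >= threshold
    tpr = (flagged & labels).sum() / labels.sum()
    fpr = (flagged & ~labels).sum() / (~labels).sum()
    return float(tpr), float(fpr)

# Toy attack scores, made up for illustration only (1 = attack, 0 = normal).
toy_scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.40, 0.55, 0.80, 0.90, 0.95]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for threshold in (0.7, 0.5, 0.3):
    tpr, fpr = rates_at_threshold(toy_scores, toy_labels, threshold)
    print(f"threshold={threshold:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

On this toy data, lowering the threshold catches more attacks but also flags more normal messages. A strict threshold suits safe-mode triggering, while a looser one may be acceptable for post-mortem review.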
3b. Problem (re)formulation into H1st.AI Graph
We reformulate the problem as a decision graph, in which the outermost flow detects attack events and the corresponding yes branch handles message classification. For this tutorial we focus on injection attacks, which are the most common in the wild (we will revisit this later).
The graph looks like this.
3c. Encoding human insights for event detection as a H1st.Model
Recall that when we started analyzing the CAN dataset, we remarked that the normal data is highly regular, especially in terms of the message frequency for each CAN ID.
It turns out that using message frequency statistics for injection event detection is highly accurate for safe-mode use cases (high TPR, low FPR). This surprising fact was first pointed out by the original CAN bus hackers Chris Valasek and Charlie Miller in the seminal white paper Adventures in Automotive Networks and Control Units.
It is pretty straightforward to detect the attacks discussed in this paper. They always involve either sending new, unusual CAN packets or flooding the CAN bus with common packets… Additionally, the frequency of normal CAN packets is very predictable… Therefore we propose that a system can detect CAN anomalies based on the known frequency of certain traffic and can alert a system or user if frequency levels vary drastically from what is well known.
Using H1ST, we can encode such “human” insights as models and use them just like ML models. An h1.Model is essentially anything that can predict. H1ST also provides tools that help automate saving and loading them, which makes it easier to use them in an integrated decision system.
A data-science project in H1ST.AI is designed to be a Python-importable package. You can create such a project using the h1 command-line tool.
Organizing model code this way makes it easy to use. The Model API is uniquely designed so that models can be used interactively in notebooks as well as in more complex projects such as this one.
Training the message frequency statistics is simple: loop through a number of files and compute window statistics, such as how many messages per CAN ID are found in each window and their min, max, and percentile values.
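As a standalone illustration of the idea, the sketch below counts messages per sensor in fixed windows of a toy log and flags windows that exceed the normal maximum by more than 10%. It does not use the tutorial's `util` or `config` modules: `make_log`, `window_counts`, the two-sensor list, the 1-second window, and the message periods are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

WINDOW_SIZE = 1.0                          # seconds; stand-in for config.WINDOW_SIZE
SENSORS = ["SteeringAngle", "CarSpeed"]    # stand-in for config.SENSORS

def make_log(periods_ms, duration_ms):
    """Build a toy CAN log: one row per message, NaN where a sensor is absent."""
    frames = []
    for sensor, period in periods_ms.items():
        t = np.arange(0, duration_ms, period) / 1000.0
        frames.append(pd.DataFrame({"Timestamp": t, sensor: 1.0}))
    return pd.concat(frames, ignore_index=True).sort_values("Timestamp")

def window_counts(df, window_size=WINDOW_SIZE):
    """Count non-null readings per sensor in consecutive, non-overlapping windows."""
    window_id = (df["Timestamp"] // window_size).astype(int)
    return df.groupby(window_id)[SENSORS].count()

# "Training": summary statistics of per-window counts on normal traffic.
normal = make_log({"SteeringAngle": 25, "CarSpeed": 50}, duration_ms=10_000)
stats = window_counts(normal).describe()

# Simulated injection: 30 extra SteeringAngle messages in the window [3 s, 4 s).
injected = pd.DataFrame({"Timestamp": 3.0 + np.arange(30) / 30.0, "SteeringAngle": 0.0})
attacked = pd.concat([normal, injected], ignore_index=True).sort_values("Timestamp")

# "Prediction": flag any window whose count exceeds the normal max by more than 10%.
threshold = stats.loc["max", SENSORS] * 1.1
flagged = (window_counts(attacked) > threshold).any(axis=1)
print(flagged[flagged].index.tolist())
```

The rule has a single tunable number (the 10% margin), which is what makes its behavior easy to anticipate and explain.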
The content of models/msg_freq_event_detector.py should look like the following.
from collections import defaultdict

import pandas as pd
import h1st as h1

import config
import util


class MsgFreqEventDetectorModel(h1.Model):
    def load_data(self, num_files=None):
        return util.load_data(num_files, shuffle=False)

    def train(self, prepared_data):
        files = prepared_data["normal_files"]

        def count_messages(f):
            # Count messages per sensor in each non-overlapping window of one file.
            df = pd.read_parquet(f)
            counts = defaultdict(list)
            for window_start in util.gen_windows(df, window_size=config.WINDOW_SIZE, step_size=config.WINDOW_SIZE):
                w_df = df[(df.Timestamp >= window_start) & (df.Timestamp < window_start + config.WINDOW_SIZE)]
                for sensor in config.SENSORS:
                    counts[sensor].append(len(w_df.dropna(subset=[sensor])))
            return pd.DataFrame(counts)

        ret = [count_messages(f) for f in files]
        df = pd.concat(ret)
        # Per-sensor summary statistics (count, mean, std, min, percentiles, max).
        self.stats = df.describe()

    def predict(self, data):
        df = data['df']
        window_starts = data["window_starts"]
        window_results = []
        for window_start in window_starts:
            w_df = df[(df.Timestamp >= window_start) & (df.Timestamp < window_start + config.WINDOW_SIZE)]
            results = {}
            for sensor in config.SENSORS:
                w_df_sensor = w_df.dropna(subset=[sensor])
                max_normal_message_freq = self.stats.at['max', sensor]
                msg_freq = len(w_df_sensor)
                # Flag the sensor if it exceeds the normal maximum by more than 10%.
                if msg_freq > (max_normal_message_freq * 1.1):
                    results[sensor] = 1
                else:
                    results[sensor] = 0
                # print("%s => %s" % ((window_start, sensor, msg_freq, max_normal_message_freq), results[sensor]))
            # The window is in attack if any sensor was flagged.
            results["WindowInAttack"] = any(results.values())
            results["window_start"] = window_start  # information for down-stream
            window_results.append(results)
        return {"event_detection_results": window_results}
Now let’s import and train this MsgFreqEventDetectorModel.
Using h1st.Model makes it easy to save and load models. By default, the “model”, “stats” and “metrics” properties are persisted, and they support a variety of flavors and data structures.
Note
We call h1.init() to set up the model repository, with the storage location specified in MODEL_REPO_PATH. You can also put MODEL_REPO_PATH in config.py and call h1.init() without any parameters.
It should take several minutes to compute the regular frequencies, a.k.a. to “train” this model.
m.train(data)
m.stats
       SteeringAngle      CarSpeed       YawRate            Gx            Gy
count   11084.000000  11084.000000  11084.000000  11084.000000  11084.000000
mean       34.316763     17.158607     34.314778     34.314778     34.314778
std         1.311491      2.121101      1.359257      1.359257      1.359257
min         0.000000      0.000000      0.000000      0.000000      0.000000
25%        33.000000     17.000000     33.000000     33.000000     33.000000
50%        34.000000     17.000000     34.000000     34.000000     34.000000
75%        35.000000     18.000000     35.000000     35.000000     35.000000
max        40.000000     22.000000     41.000000     41.000000     41.000000
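To see what these statistics imply for detection, the sketch below applies the detector's rule (count greater than 1.1 times the normal maximum) to the `max` row above. The `typical` window uses the median counts from the `50%` row. The `flooded` window and the helper `window_in_attack` are hypothetical and introduced only for illustration.

```python
# Per-window maximum message counts, copied from the `max` row of m.stats.
max_normal = {"SteeringAngle": 40, "CarSpeed": 22, "YawRate": 41, "Gx": 41, "Gy": 41}
MARGIN = 1.1  # the 10% headroom used in MsgFreqEventDetectorModel.predict

thresholds = {sensor: count * MARGIN for sensor, count in max_normal.items()}

def window_in_attack(counts):
    """Mirror the detector's rule: any sensor above its threshold flags the window."""
    return any(counts[s] > thresholds[s] for s in thresholds)

# Median per-window counts, copied from the `50%` row of m.stats.
typical = {"SteeringAngle": 34, "CarSpeed": 17, "YawRate": 34, "Gx": 34, "Gy": 34}
# Hypothetical injection that floods the bus with CarSpeed messages.
flooded = dict(typical, CarSpeed=60)

print(window_in_attack(typical))  # False
print(window_in_attack(flooded))  # True
```

Note that the rule only looks for excess messages. Windows with unusually few messages, such as the zero counts in the `min` row, are never flagged by it.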
Persisting returns a model version ID that you can use to load the model back later (you can also give it a name).
m.persist()
2020-09-17 19:59:38,891 INFO h1st.model_repository.model_repository: Saving stats property...
'01EJFJEB89MNS65B4CK714TT4B'
3d. Working with H1st Graph
Let’s now make some event-level predictions.
Note that since the model was persisted using the H1st model repo, we can easily come back to a notebook or script later and load the trained model or the computed statistics.
Importantly, H1st allows much speedier integration into a Graph (and later deployment, too).
We should now see that we can start detecting attack events. We’ll evaluate this later; for now, let’s finish our detection graph by adding the message classifier.
Note that the graph returns separate output keys, collected from the outputs of all nodes. Typically, each node is expected to return a dict.
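The idea of nodes returning dicts that are merged into one output can be illustrated in plain Python. This is not the H1st Graph API: `detect_events`, `classify_messages`, and `run_graph` are hypothetical stand-ins. Only the output keys `event_detection_results` and `injection_window_results` and the `WindowInAttack` field come from the tutorial's models.

```python
def detect_events(data):
    """Stand-in event detector: flags windows whose message count exceeds a limit."""
    results = [
        {"window_start": start, "WindowInAttack": count > data["max_count"]}
        for start, count in data["window_counts"]
    ]
    return {"event_detection_results": results}

def classify_messages(data):
    """Stand-in message classifier: only looks at windows flagged upstream."""
    flagged = [r["window_start"] for r in data["event_detection_results"]
               if r["WindowInAttack"]]
    return {"injection_window_results": flagged}

def run_graph(data):
    """Run the detector, then the 'yes' branch; merge every node's dict output."""
    outputs = dict(data)
    outputs.update(detect_events(outputs))
    if any(r["WindowInAttack"] for r in outputs["event_detection_results"]):
        outputs.update(classify_messages(outputs))
    return outputs

out = run_graph({"window_counts": [(0.0, 34), (1.0, 70)], "max_count": 44})
print(sorted(k for k in out if k.endswith("_results")))
```

Because each node writes under its own key, downstream consumers can read the event-level and message-level results independently.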
3e. Adding a message classifier, harmonizing human + ML models in the graph
For message-level classification we can simply bring back our gradient-boosted trees, which did a decent job of recognizing injection messages. (Integrating a sequence model such as a bidirectional LSTM is left as an exercise for the reader.)
As before, we’ve reorganized it as an H1st.Model in the tutorial folder, ready for use.
The content of models/gradient_boosting_msg_classifier.py looks like this.
import h1st as h1
import pandas as pd

import config
import util

# Sensor readings plus one time-difference feature per sensor.
FEATURES = config.SENSORS + ["%s_TimeDiff" % s for s in config.SENSORS]


class GradientBoostingMsgClassifierModel(h1.Model):
    def load_data(self, num_files=None):
        return util.load_data(num_files, shuffle=False)

    def prep(self, data):
        def concat_processed_files(files):
            dfs = []
            for f in files:
                z = pd.read_parquet(f)
                z = util.compute_timediff_fillna(z, dropna_subset=FEATURES)
                dfs.append(z)
            df2 = pd.concat(dfs)
            return df2

        # Use the first half of the attack files for training, the rest for testing.
        split = int(len(data["attack_files"])*0.5)
        train_files = data["attack_files"][:split]
        test_files = data["attack_files"][split:]
        result = {
            "train_files": train_files,
            "test_files": test_files,
            "train_attack_df": concat_processed_files(train_files),
            "test_attack_df": concat_processed_files(test_files)
        }
        print("len train_attack_df = %s" % len(result["train_attack_df"]))
        print("len test_attack_df = %s" % len(result["test_attack_df"]))
        return result

    def train(self, prepared_data):
        df = prepared_data["train_attack_df"]
        # The experimental import is only required by older scikit-learn releases.
        from sklearn.experimental import enable_hist_gradient_boosting
        from sklearn.ensemble import HistGradientBoostingClassifier
        X = df[FEATURES]
        y = df.Label == config.ATTACK_LABEL
        self.model = HistGradientBoostingClassifier(max_iter=500).fit(X, y)

    def evaluate(self, prepared_data):
        df = prepared_data["test_attack_df"]
        ypred = self.model.predict(df[FEATURES])
        import sklearn.metrics
        cf = sklearn.metrics.confusion_matrix(df.Label == config.ATTACK_LABEL, ypred)
        acc = sklearn.metrics.accuracy_score(df.Label == config.ATTACK_LABEL, ypred)
        print(cf)
        print("Accuracy = %.4f" % acc)
        self.metrics = {"confusion_matrix": cf, "accuracy": acc}

    def predict(self, data):
        df = data["df"].copy()
        df = util.compute_timediff_fillna(df)
        df['MsgIsAttack'] = 0
        df['WindowInAttack'] = 0
        # Classify messages only inside windows that the event detector flagged.
        for event_result in data["event_detection_results"]:
            if event_result['WindowInAttack']:
                # print("window %s in attack: event_result = %s" % (event_result['window_start'], event_result))
                in_window = (df.Timestamp >= event_result['window_start']) & (df.Timestamp < event_result['window_start'] + config.WINDOW_SIZE)
                w_df = df[in_window]
                if len(w_df) > 0:
                    ypred = self.model.predict(w_df[FEATURES])
                    df.loc[in_window, "WindowInAttack"] = 1
                    df.loc[in_window, "MsgIsAttack"] = ypred.astype(int)
        return {"injection_window_results": df}
from gradient_boosting_msg_classifier import GradientBoostingMsgClassifierModel
m2 = GradientBoostingMsgClassifierModel()
data = m2.load_data(num_files=6)
prepared_data = m2.prep(data)
len train_attack_df = 1030994
len test_attack_df = 868436
prepared_data["train_attack_df"]
          Timestamp  SteeringAngle  CarSpeed   YawRate        Gx         Gy   Label  AttackSensor  AttackMethod  AttackParams  AttackEventIndex
2          0.024343      67.604385  0.000000  0.189777  0.002458  -0.002173  Normal            NA            NA           0.0              <NA>
3          0.027083      67.608772  0.000000  0.189777  0.002458  -0.002173  Normal            NA            NA           0.0              <NA>
4          0.037508      67.608772  0.000000  0.189665  0.002375  -0.002151  Normal            NA            NA           0.0              <NA>
5          0.038148      67.613159  0.000000  0.189665  0.002375  -0.002151  Normal            NA            NA           0.0              <NA>
6          0.043605      67.617538  0.000000  0.189665  0.002375  -0.002151  Normal            NA            NA           0.0              <NA>
...             ...            ...       ...       ...       ...        ...     ...           ...           ...           ...               ...
369439  1649.991996       7.202400  6.485162  0.998420  0.433515   0.138374  Normal            NA            NA           0.0              <NA>
369440  1649.994320       7.168000  6.485162  0.998420  0.433515   0.138374  Normal            NA            NA           0.0              <NA>
369441  1650.003266       7.168000  6.485162  0.994170  0.437117   0.137005  Normal            NA            NA           0.0              <NA>
369442  1650.007432       7.133600  6.485162  0.994170  0.437117   0.137005  Normal            NA            NA           0.0              <NA>
369443  1650.009595       7.133600  6.422659  0.994170  0.437117   0.137005  Normal            NA            NA           0.0              <NA>

        SteeringAngle_TimeDiff  CarSpeed_TimeDiff  YawRate_TimeDiff  Gx_TimeDiff  Gy_TimeDiff
2                    -1.000000          -1.000000         -1.000000    -1.000000    -1.000000
3                     0.013509          -1.000000         -1.000000    -1.000000    -1.000000
4                    -1.000000          -1.000000          0.013230     0.013230     0.013230
5                     0.011065          -1.000000         -1.000000    -1.000000    -1.000000
6                     0.005457          -1.000000         -1.000000    -1.000000    -1.000000
...                        ...                ...               ...          ...          ...
369439               -1.000000          -1.000000          0.011372     0.011372     0.011372
369440                0.012805          -1.000000         -1.000000    -1.000000    -1.000000
369441               -1.000000          -1.000000          0.011269     0.011269     0.011269
369442                0.013111          -1.000000         -1.000000    -1.000000    -1.000000
369443               -1.000000           0.024471         -1.000000    -1.000000    -1.000000

1030994 rows × 16 columns
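The implementation of util.compute_timediff_fillna is not shown in this section. The table above is consistent with each `*_TimeDiff` column holding the time since the previous message for that sensor, with -1 as a filler on other rows and sensor readings carried forward. The sketch below is one way such features could be computed under that assumption; `timediff_features` is a hypothetical function, not the tutorial's utility.

```python
import numpy as np
import pandas as pd

def timediff_features(df, sensors, fill_value=-1.0):
    """Hypothetical re-creation of per-sensor time-difference features."""
    out = df.sort_values("Timestamp").reset_index(drop=True)
    for sensor in sensors:
        present = out[sensor].notna()
        # Time since the previous message that carried this sensor.
        diffs = out.loc[present, "Timestamp"].diff()
        # Rows without this sensor (and its first message) get the filler value.
        out[f"{sensor}_TimeDiff"] = diffs.reindex(out.index).fillna(fill_value)
        # Carry the last known reading forward so every row has a value.
        out[sensor] = out[sensor].ffill()
    # Drop leading rows where some sensor has not been observed yet.
    return out.dropna(subset=sensors)

# Toy log: SteeringAngle and CarSpeed messages alternate every 10 ms.
toy = pd.DataFrame({
    "Timestamp":     [0.00, 0.01, 0.02, 0.03],
    "SteeringAngle": [10.0, np.nan, 11.0, np.nan],
    "CarSpeed":      [np.nan, 5.0, np.nan, 6.0],
})
features = timediff_features(toy, ["SteeringAngle", "CarSpeed"])
print(features)
```

Using a fixed filler such as -1 keeps the feature matrix free of missing values while still letting a tree-based model separate rows where the sensor was present from rows where it was not.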
m2.train(prepared_data)
m2.evaluate(prepared_data)
[[831878 603]
[ 15049 20906]]
Accuracy = 0.9820
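Accuracy alone hides the quantities that matter for the two use cases. As a quick check, the sketch below derives the message-level TPR, FPR, and precision from the confusion matrix printed above, using scikit-learn's [[TN, FP], [FN, TP]] layout for binary labels. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix printed above; scikit-learn orders it [[TN, FP], [FN, TP]].
cf = np.array([[831878, 603],
               [15049, 20906]])
tn, fp, fn, tp = cf.ravel()

accuracy = (tp + tn) / cf.sum()
tpr = tp / (tp + fn)        # share of attack messages that were caught
fpr = fp / (fp + tn)        # share of normal messages flagged as attacks
precision = tp / (tp + fp)  # share of flagged messages that were real attacks

print(f"accuracy={accuracy:.4f} TPR={tpr:.4f} FPR={fpr:.6f} precision={precision:.4f}")
```

The message-level TPR comes out at roughly 58% and the FPR at roughly 0.07%. That FPR is far above the 0.001% used in the fleet example of section 3a, so this classifier on its own would not be suitable for triggering safe mode.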
m2.persist()
2020-09-17 17:09:34,127 INFO h1st.model_repository.model_repository: Saving metrics property...
2020-09-17 17:09:34,129 INFO h1st.model_repository.model_repository: Saving model property...
'01EJF8PXNE6VT0SGJM0RHF12Y6'
Putting everything together in an h1.Graph and running graph.predict() on a single file looks like this.
2020-09-17 20:38:30,155 INFO h1st.model_repository.model_repository: Loading version 01EJFJEB89MNS65B4CK714TT4B ....
2020-09-17 20:38:30,160 INFO h1st.model_repository.model_repository: Loading version 01EJF8PXNE6VT0SGJM0RHF12Y6 ....
Now let’s evaluate the whole graph against the test set, especially focusing on the event-level TPR & FPR since they are crucial in the safe-mode deployment use case.
from util import evaluate_event_graph
evaluate_event_graph(graph, prepared_data['test_files'])
Now that’s something! Event-level FPR=0.0% with zero false positives!
(Note that the provided attack samples were created on a subset of the driving trips, but you should be able to do a more thorough evaluation by running against synthetic attacks created from the full driving-trips dataset, and the results should be the same: zero false positives at the event level.)
The message-level accuracy should be nearly the same, because we used the same classifier. However, the decomposition separates the concerns and requirements of the two use cases. We’re much more comfortable with the solution now, in terms of accuracy as well as robustness and explainability.
Another point worth noting is that we get multiple output streams from H1st.Graph: event-level outputs and message-level outputs, which is exactly what we need for the two use cases we highlighted, safe-mode triggering and post-mortem analysis.
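The implementation of util.evaluate_event_graph is not shown here. As a rough illustration of what event-level TPR and FPR mean, the sketch below compares per-window detections against a set of known attack windows. `event_level_rates` and the toy inputs are hypothetical; only the `window_start` and `WindowInAttack` fields follow the detector's output format.

```python
def event_level_rates(window_results, attack_windows):
    """Compare per-window detections against the set of true attack window starts."""
    tp = fp = fn = tn = 0
    for result in window_results:
        predicted = bool(result["WindowInAttack"])
        actual = result["window_start"] in attack_windows
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "tpr": tpr, "fpr": fpr}

# Toy detector output in the same shape as "event_detection_results".
toy_results = [
    {"window_start": 0.0, "WindowInAttack": False},
    {"window_start": 1.0, "WindowInAttack": True},
    {"window_start": 2.0, "WindowInAttack": False},
    {"window_start": 3.0, "WindowInAttack": True},
]
rates = event_level_rates(toy_results, attack_windows={1.0, 3.0})
print(rates)
```

Counting at the window level rather than the message level is what makes a near-zero FPR attainable: a window is only a false positive if the frequency rule fires on purely normal traffic.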