Anomaly Detection

These are my notes and code from the course Unsupervised Learning, Recommenders, Reinforcement Learning, offered by DeepLearning.AI on Coursera. In this notebook I explain anomaly detection with NumPy. You may find the original notebook here.

Data includes: X_train, X_val and y_val.

Let's import the libraries.

We will use the training dataset to fit the Gaussian distribution. Then we will use the validation dataset and its known true y values to select the threshold.

Dataset
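
The course data files are not reproduced in these notes. As a stand-in, here is a minimal sketch that builds synthetic arrays with the same names (X_train, X_val, y_val). The shapes, means, spreads and the number of anomalies are arbitrary assumptions chosen for illustration, not the course's values.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in data: 2 features, drawn from a Gaussian "normal" regime
X_train = rng.normal(loc=[14.0, 15.0], scale=[1.3, 1.2], size=(300, 2))

# Validation set: points from the same regime plus a few uniformly
# scattered points that we label as anomalies (y = 1)
X_val_normal = rng.normal(loc=[14.0, 15.0], scale=[1.3, 1.2], size=(290, 2))
X_val_anom = rng.uniform(low=0.0, high=30.0, size=(10, 2))
X_val = np.vstack([X_val_normal, X_val_anom])
y_val = np.concatenate([np.zeros(290, dtype=int), np.ones(10, dtype=int)])

print(X_train.shape, X_val.shape, y_val.shape)
```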

Calculate mean and variance using NumPy

If you have only 2 features, you can estimate the mean of each feature as follows:
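
The original code cell is not shown here, so the following is only a sketch of the manual approach: sum each feature column and divide by the number of examples. The small matrix X is made up for illustration.

```python
import numpy as np

# Tiny made-up matrix: 4 examples (rows), 2 features (columns)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

m = X.shape[0]                 # number of examples

# Mean of each feature, computed one column at a time
mu_0 = X[:, 0].sum() / m
mu_1 = X[:, 1].sum() / m
mu_manual = np.array([mu_0, mu_1])

print(mu_manual)
```

This works, but it needs one line per feature, which is why the vectorized version is preferable as soon as there are more columns.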

Or you can take advantage of the np.mean function:
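
A minimal sketch of the vectorized version, assuming examples are stored as rows. The helper name estimate_gaussian is my own choice here, and the variance uses NumPy's default ddof=0, i.e. it divides by m rather than m - 1.

```python
import numpy as np

def estimate_gaussian(X):
    """Per-feature mean and variance of X (rows = examples)."""
    mu = np.mean(X, axis=0)    # axis=0 averages over examples, one value per feature
    var = np.var(X, axis=0)    # default ddof=0: divides by m, not m - 1
    return mu, var

# Same made-up matrix as a quick check
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

mu, var = estimate_gaussian(X)
print(mu, var)
```

Passing axis=0 is the important part: without it, np.mean and np.var collapse the whole array to a single number.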

Calculate p_val
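
p_val holds the density $p(x)$ of every validation example under the Gaussian fitted on the training set. The sketch below assumes the features are modeled as independent Gaussians, so that $p(x) = \prod_j \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$. The function name gaussian_density and the small arrays are hypothetical, not taken from the original notebook.

```python
import numpy as np

def gaussian_density(X, mu, var):
    """Density of each row of X under independent per-feature Gaussians."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)          # normalizing constant per feature
    expo = np.exp(-((X - mu) ** 2) / (2.0 * var))    # exponential term per feature
    return np.prod(coef * expo, axis=1)              # multiply across features

# Made-up data for illustration
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_val = np.array([[2.5, 25.0],    # sits exactly at the training mean
                  [9.0, 90.0]])   # far away from the training data

# Fit on the training set, evaluate on the validation set
mu = X_train.mean(axis=0)
var = X_train.var(axis=0)
p_val = gaussian_density(X_val, mu, var)

print(p_val)
```

The second validation point gets a far smaller density than the first, which is exactly the signal the threshold will act on.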

Select the threshold $\epsilon$

We will use the F1 score on the cross-validation set.

If an example $x$ has a probability less than $\epsilon$, that is $p(x) < \epsilon$, then it is classified as an anomaly.

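$\text{prec} = \frac{TP}{TP + FP}$

$\text{recall} = \frac{TP}{TP + FN}$

$F_1 = \frac{2 \cdot \text{prec} \cdot \text{recall}}{\text{prec} + \text{recall}}$

Here TP, FP and FN are the numbers of true positives, false positives and false negatives on the validation set.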
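
The formulas above can be turned into a simple search: try many candidate values of $\epsilon$ between the smallest and largest p_val, compute F1 for each, and keep the best one. This is a sketch under my own assumptions (the function name select_threshold, the n_steps parameter and the toy arrays are illustrative), not necessarily how the original notebook does it.

```python
import numpy as np

def select_threshold(y_val, p_val, n_steps=1000):
    """Return the epsilon with the highest F1 on the validation set."""
    best_epsilon, best_f1 = 0.0, 0.0
    for epsilon in np.linspace(p_val.min(), p_val.max(), n_steps):
        preds = p_val < epsilon                 # True means "flagged as anomaly"
        tp = np.sum(preds & (y_val == 1))       # flagged and truly anomalous
        fp = np.sum(preds & (y_val == 0))       # flagged but actually normal
        fn = np.sum(~preds & (y_val == 1))      # missed anomalies
        if tp == 0:
            continue                            # F1 would be 0 or undefined
        prec = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * prec * recall / (prec + recall)
        if f1 > best_f1:
            best_f1, best_epsilon = f1, epsilon
    return best_epsilon, best_f1

# Toy validation data: the two anomalies have clearly lower densities
y_val = np.array([0, 0, 0, 1, 1])
p_val = np.array([0.9, 0.8, 0.7, 0.1, 0.05])

epsilon, f1 = select_threshold(y_val, p_val)
print(epsilon, f1)
```

F1 is used instead of accuracy because anomalies are rare: a classifier that never flags anything would already be highly accurate while being useless.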

Find outliers on training set
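
Once $\epsilon$ is chosen, the same rule $p(x) < \epsilon$ is applied to the training examples. The sketch below is self-contained and uses a made-up training set with one obviously distant point. The value of epsilon is a hypothetical placeholder; in the notebook it would come from the F1 search on the validation set.

```python
import numpy as np

# Made-up training set: seven points near the origin and one far away
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0],
                    [0.0, -1.0], [1.0, 1.0], [-1.0, -1.0], [8.0, 8.0]])

# Fit per-feature Gaussians on the training set
mu = X_train.mean(axis=0)
var = X_train.var(axis=0)

# Density of each training example (features treated as independent)
p_train = np.prod(
    np.exp(-((X_train - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var),
    axis=1,
)

epsilon = 1e-3                       # hypothetical threshold, assumed already selected
outliers = p_train < epsilon         # boolean mask over training examples

print(np.flatnonzero(outliers))      # indices of flagged examples
print(X_train[outliers])             # the flagged points themselves
```

Note that the distant point also inflates the fitted variance, so in practice a few extreme outliers can make the remaining ones harder to detect.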