Intrusion detection school project for network security subject using scikit-learn LogisticRegression, RandomForestClassifier, MLPClassifier, DecisionTreeClassifier models on KDD Cup 1999 and NSL-KDD dataset.

TiieuTiien/intrusion-detection

Intrusion detection on KDD Cup 1999 and NSL-KDD dataset

A school project on intrusion detection for a network security course. It trains scikit-learn's LogisticRegression, RandomForestClassifier, MLPClassifier, and DecisionTreeClassifier models on the KDD Cup 1999 and NSL-KDD datasets.

Set up

Download or clone this repository. You can download the KDD Cup 1999 and NSL-KDD datasets from the dataset branch, or use your own dataset.

Create a virtual environment

python -m venv venv

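Activate the virtual environment (the path below applies to Git Bash on Windows; on Linux or macOS, use source venv/bin/activate)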

source venv/Scripts/activate

Install the requirements to reproduce the same environment with the required libraries

pip install -r requirements.txt

Extract the model and dataset archives inside their respective folders

Run main.py to start the app

python main.py

Data Format and Training Data

The datasets used here have been modified from the originals. In particular, one header line was added to each CSV file so that pandas can read the column names. If you want to use the original dataset, you can assign the column names in code instead; here is an example. Each dataset consists of two files (train and test).

The KDD Cup 1999 files have a header row attached, and the value 'normal.' in the 'label' column was changed to 'normal' so that the label is encoded the same way for both the KDD Cup 1999 and NSL-KDD datasets. The 'attack' column of the original NSL-KDD dataset was renamed to 'label' so that the same code works for both datasets, since the two columns refer to the same feature.
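The header and label changes described above can also be applied in code when starting from an original, headerless KDD Cup 1999 file. The following is a minimal sketch under stated assumptions, not the project's own loader: the helper name `read_original_kdd` is hypothetical, the column list uses the standard KDD Cup 1999 feature names, and the single row fed to it is only illustrative input.

```python
import io

import pandas as pd

# Standard KDD Cup 1999 feature names, followed by the label column.
KDD_COLUMNS = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins",
    "logged_in", "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files",
    "num_outbound_cmds", "is_host_login", "is_guest_login", "count",
    "srv_count", "serror_rate", "srv_serror_rate", "rerror_rate",
    "srv_rerror_rate", "same_srv_rate", "diff_srv_rate",
    "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate",
    "dst_host_rerror_rate", "dst_host_srv_rerror_rate", "label",
]


def read_original_kdd(source):
    """Read a headerless KDD Cup 1999 CSV and normalise the label column.

    Hypothetical helper: `source` is a path or file-like object.
    """
    df = pd.read_csv(source, header=None, names=KDD_COLUMNS)
    # The original files end every label with a period ('normal.').
    # Stripping it from all labels also yields 'normal' for normal traffic.
    df["label"] = df["label"].str.rstrip(".")
    return df


# Illustrative single-row input standing in for an original data file.
sample = io.StringIO(
    "0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,"
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,"
    "0.00,0.00,0.00,0.00,normal.\n"
)
frame = read_original_kdd(sample)
print(frame[["protocol_type", "service", "flag", "label"]])
```

The original NSL-KDD files carry a different column layout, so this list would need adjusting before reusing the helper there.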

Click Data train to import the train file and Data test to import the test file, then choose a model and click Train to train it.

The results will be saved in the result/ folder.

Load data

In the load_data function, the non-numeric features are encoded as integers, and the label is converted to a binary target

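from sklearn.preprocessing import LabelEncoder

# Encode each categorical column as integers (the encoder is refit per column)
label_encoder = LabelEncoder()
categorical_columns = ['protocol_type', 'service', 'flag']
for column in categorical_columns:
    csv_file[column] = label_encoder.fit_transform(csv_file[column])
# Binary target: 1 = normal traffic, 0 = any attack
csv_file['label'] = (csv_file['label'] == 'normal').astype(int)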
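One thing to watch with the snippet above: if load_data is called separately for the train and test files (an assumption, since the call sites are not shown here), the encoder is refit on each file, so the same category can end up with different integers in the two files. A possible alternative, sketched below with scikit-learn's `OrdinalEncoder`, fits the mapping once on the training frame and reuses it for the test frame. The helper `encode_categoricals` and the toy frames are hypothetical and not part of the project.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

CATEGORICAL = ["protocol_type", "service", "flag"]


def encode_categoricals(train_df, test_df):
    """Fit the category-to-integer mapping on train, then apply it to both."""
    # Categories that appear only in the test file are mapped to -1.
    encoder = OrdinalEncoder(
        handle_unknown="use_encoded_value", unknown_value=-1
    )
    train_df = train_df.copy()
    test_df = test_df.copy()
    train_df[CATEGORICAL] = encoder.fit_transform(train_df[CATEGORICAL])
    test_df[CATEGORICAL] = encoder.transform(test_df[CATEGORICAL])
    return train_df, test_df, encoder


# Toy frames for illustration only.
train = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "icmp"],
    "service": ["http", "domain_u", "ecr_i"],
    "flag": ["SF", "SF", "S0"],
})
test = pd.DataFrame({
    "protocol_type": ["udp", "tcp"],
    "service": ["http", "smtp"],   # 'smtp' never appears in train
    "flag": ["S0", "REJ"],         # 'REJ' never appears in train
})

train_enc, test_enc, _ = encode_categoricals(train, test)
print(test_enc)
```

With this approach 'tcp' gets the same code in both frames, and values unseen during fitting are flagged with -1 instead of raising an error.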

Split the data frame into a features data frame and a labels series

features_df = csv_file.drop(['label'], axis=1)
labels_df = csv_file['label']

Evaluation

Training

This 'detector' uses four models from scikit-learn: LogisticRegression, RandomForestClassifier, MLPClassifier, and DecisionTreeClassifier. The model chosen in the app is created by train_model and then fitted on the training data

model = train_model(selected_model)
model.fit(X_train, labels_train)
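The body of train_model is not shown here. One plausible shape for it, given that it receives the selected model's name and returns something with a fit method, is a lookup from name to a fresh estimator. The sketch below is an assumption about that design, not the project's code: the `MODEL_FACTORIES` table, the hyperparameters, and the toy training data are all hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical name-to-constructor table; hyperparameters are illustrative.
MODEL_FACTORIES = {
    "LogisticRegression": lambda: LogisticRegression(max_iter=1000),
    "RandomForestClassifier": lambda: RandomForestClassifier(random_state=0),
    "MLPClassifier": lambda: MLPClassifier(max_iter=500, random_state=0),
    "DecisionTreeClassifier": lambda: DecisionTreeClassifier(random_state=0),
}


def train_model(selected_model):
    """Return a fresh, unfitted estimator for the given model name."""
    try:
        return MODEL_FACTORIES[selected_model]()
    except KeyError:
        raise ValueError(f"Unknown model: {selected_model!r}") from None


# Toy data in which the label simply equals the first feature.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]] * 5
labels_train = [0, 0, 1, 1] * 5

model = train_model("DecisionTreeClassifier")
model.fit(X_train, labels_train)
print(model.predict([[1, 0], [0, 1]]))
```

Building a new estimator on every call keeps repeated training runs from sharing fitted state.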

Evaluate

Generate predictions to evaluate the model

predictions = model.predict(X_test)

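The model is evaluated from the TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative) counts. This project reports TPR (true positive rate) and TNR (true negative rate). Because the label is encoded as 1 for normal and 0 for attack, the positive class is normal traffic: TPR is the share of normal records recognized as normal, and TNR is the share of attacks detected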

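from sklearn.metrics import confusion_matrix

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
cm = confusion_matrix(labels, predictions)
print(cm)
tn, fp, fn, tp = cm.ravel()
TPR = tp / (tp + fn)  # true positive rate (sensitivity)
TNR = tn / (tn + fp)  # true negative rate (specificity)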
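As a worked example of the two rates, the sketch below computes them on a small hand-made label vector. It is illustrative only: the helper `tpr_tnr` and the toy vectors are hypothetical. Passing `labels=[0, 1]` to `confusion_matrix` is an addition that keeps the matrix 2x2 even when one class is missing from the data, which the four-way unpacking relies on.

```python
from sklearn.metrics import confusion_matrix


def tpr_tnr(labels, predictions):
    """Return (TPR, TNR) for binary labels where 1 is the positive class."""
    tn, fp, fn, tp = confusion_matrix(
        labels, predictions, labels=[0, 1]
    ).ravel()
    # Guard against division by zero when a class has no samples.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Toy data: 4 positive records and 6 negative records.
labels      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tpr, tnr = tpr_tnr(labels, predictions)
# tp=3, fn=1 -> TPR = 3/4; tn=4, fp=2 -> TNR = 4/6
print(f"TPR={tpr:.2f} TNR={tnr:.2f}")  # prints "TPR=0.75 TNR=0.67"
```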

Result with KDD Cup 1999 dataset

[Figures: kddcup_LogisticRegression, kddcup_DecisionTreeClassifier, kddcup_RandomForestClassifier, kddcup_MLPClassifier]

Result with NSL-KDD dataset

[Figures: nslkdd_LogisticRegression, nslkdd_DecisionTreeClassifier, nslkdd_RandomForestClassifier, nslkdd_MLPClassifier]

Meta

P. Tien: [email protected]

Distributed under the MIT license. See LICENSE for more information.

https://github.com/TiieuTiien/

Reference

NgocDung211/KDD_CUP1999
Network Intrusion Detection using Python
Intrusion Detection System [NSL-KDD].
