Skip to content

This repository have project on data manipulation , model building (Classification Model) and its visualization.

Notifications You must be signed in to change notification settings

ArfaKhalid/Prediction-of-Flight-Delays-using-Classification-Model

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

Enhancing Air Travel Efficiency with Machine Learning-Powered Flight Delay Prediction

Classification Models in Machine Learning

Classification models in machine learning are formidable tools designed to categorize and assign labels to data based on patterns and features. At their core, these models serve the crucial role of decision-makers, sorting input into distinct classes or groups. Whether it's classifying emails as spam or non-spam, identifying types of flowers in images, or predicting customer churn, classification models bring order to diverse datasets. Statistically speaking, the significance of classification models lies in their ability to learn from historical data, discern underlying patterns, and generalize this knowledge to make predictions on new, unseen data. Leveraging statistical algorithms, these models make informed decisions by evaluating features and assigning probabilities to various outcomes. Precision, recall, and F1 score are among the statistical metrics employed to gauge the effectiveness of a classification model, providing a quantitative measure of its accuracy and reliability. In essence, classification models stand as pillars of predictive analytics, enabling us to make sense of complex datasets and facilitating informed decision-making across various domains. Their statistical importance is evident in their capacity to transform raw data into actionable insights, ultimately enhancing our understanding of the world through the lens of machine learning.

Project Overview:

This project aims to predict whether a flight will arrive on time using a classification model. The dataset contains on-time arrival information for a major U.S. airline.

Step 1: Data Manipulation and Cleaning:

  1. Imported Data:

    • Imported the dataset using Pandas.
    • Displayed the first five rows to understand the structure.
  2. Data Cleaning:

    • Checked the shape of the dataset (11231 rows, 26 columns).
    • Identified and handled missing values:
      • Checked for missing values (True).
      • Found columns with missing values and their counts.
    • Removed the irrelevant column "Unnamed: 25."
  3. Data Preparation:

    • Filled missing values in the "ARR_DEL15" column with 1s, indicating late arrivals.
    • Quantized the "CRS_DEP_TIME" column by dividing each value by 100 and rounding down.
    • Converted categorical variables ("ORIGIN" and "DEST") into dummy variables.

Step 2: Model Building:

  1. Dataset Splitting:

    • Divided the dataset into training and testing sets (80-20 split).
  2. Feature and Label Columns:

    • Created feature columns (input variables) and label column ("ARR_DEL15").
  3. Classification Model (RandomForestClassifier):

    • Used the RandomForestClassifier from scikit-learn.
    • Trained the model on the training set.
  4. Model Testing:

    • Tested the model on the testing set.
    • Calculated the mean accuracy of the model.

Step 3: Model Evaluation:

  1. Metrics Used:
    • Used metrics to evaluate the model:
      • Mean accuracy (86.43%).
      • Area Under Receiver Operating Characteristic Curve (ROC AUC) score (0.70).
      • Confusion matrix to understand prediction proficiency.

Step 4: Model Improvement Suggestions:

  1. Potential Steps for Improvement:
    • Algorithm exploration.
    • Parameter tuning.
    • Dataset expansion for more robust training.
    • Addressing imbalance between late and on-time arrivals.

Step 5: Visualization:

  1. ROC Curve:
    • Plotted the ROC curve to visualize the model's performance.

Step 6: Prediction and Probability Visualization:

  1. Prediction Function:

    • Provided a function to predict the probability of on-time arrival for specific flights.
  2. Probability Plots:

    • Demonstrated probability plots for different flight scenarios.

Conclusion:

  • The RandomForestClassifier achieved a mean accuracy of 86.43% in predicting on-time arrivals.
  • The ROC AUC score (0.70) provides a more robust evaluation considering class imbalances.
  • Suggestions for model improvement and visualizations were discussed.

Overall Outcome:

The project successfully applied a classification model to predict flight arrival delays, providing insights into model performance and areas for improvement.

About

This repository have project on data manipulation , model building (Classification Model) and its visualization.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published