ML/AI Projects

This repository showcases a collection of applied Machine Learning and Artificial Intelligence projects I have completed.

Each project explores a different aspect of ML/AI, demonstrating the implementation and analysis of various models and algorithms in real-world scenarios. The projects span multiple domains, including natural language processing, computer vision, regression analysis, and unsupervised learning, providing a broad overview of practical applications in data science.

Featured Projects

This project builds a spam classification model using Support Vector Machines (SVMs). It explores both linear and non-linear SVMs, with detailed experimentation on C-value selection and the application of a Gaussian (RBF) kernel to non-linearly separable data. The final classifier achieves 97.8% accuracy in identifying spam emails. The project includes cross-validation for hyperparameter selection and provides insights into the model's performance across different data representations.
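A minimal sketch of this kind of workflow, assuming scikit-learn; the stand-in data and parameter grid below are illustrative, not the project's actual email features or search values:

```python
# Hypothetical sketch: linear vs. RBF SVM with cross-validated C / gamma selection.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data; the project uses an email feature matrix instead.
X, y = make_classification(n_samples=1000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = [
    {"kernel": ["linear"], "C": [0.01, 0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)  # 5-fold cross-validation
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```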

This project utilizes the YOLO (You Only Look Once) object detection algorithm to train a custom model for detecting sleepers and nails in railroad track images. The project includes building the model from scratch using the YOLOv8 architecture, with extensive training and validation. The model demonstrates high accuracy in detecting these track components, showcasing the power of YOLO in real-time object detection tasks.
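A minimal sketch of YOLOv8 training with the ultralytics package; the dataset config name, starting weights, and hyperparameters are assumptions, not the project's actual setup:

```python
# Hypothetical sketch: training a YOLOv8 detector on a custom dataset.
from ultralytics import YOLO

# Start from YOLOv8 weights; "railroad.yaml" is a placeholder for a dataset
# config listing image paths and the sleeper / nail class names.
model = YOLO("yolov8n.pt")
model.train(data="railroad.yaml", epochs=50, imgsz=640)

metrics = model.val()                        # validation metrics on the held-out split
results = model.predict("track_image.jpg")   # run detection on a new image
```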

This project applies tree-based machine learning models to predict the probability of payment collection within 90 days. The models used include Random Forest, Gradient Boosting, XGBoost, LightGBM, and CatBoost. Extensive feature engineering, including handling missing data, outlier treatment, and creating interaction terms, was performed. The final model evaluation shows that LightGBM and CatBoost provide the best performance, balancing accuracy and recall effectively.
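A minimal sketch of how such a model comparison might look, assuming the scikit-learn-style interfaces of LightGBM and CatBoost; the synthetic data and settings are placeholders for the project's engineered payment features:

```python
# Hypothetical sketch: comparing gradient-boosted tree models with cross-validation.
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Stand-in imbalanced data; the project uses engineered payment features instead.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8], random_state=0)

models = {
    "LightGBM": LGBMClassifier(n_estimators=300, learning_rate=0.05),
    "CatBoost": CatBoostClassifier(iterations=300, learning_rate=0.05, verbose=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    recall = cross_val_score(model, X, y, cv=5, scoring="recall").mean()
    print(f"{name}: AUC={auc:.3f}, recall={recall:.3f}")
```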

This project focuses on predicting property prices using a combination of regression and classification models. Techniques such as linear regression, decision trees, and ensemble methods are explored, with a strong emphasis on feature engineering, including log transformations and handling categorical variables. The project highlights the challenges of price prediction and the effectiveness of different models in capturing property value trends.
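A minimal sketch of the log-transform and categorical-handling idea, assuming a scikit-learn pipeline; the toy columns below are placeholders for the project's actual property features:

```python
# Hypothetical sketch: log-transformed target with one-hot-encoded categoricals.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Placeholder frame; the real project loads a property dataset instead.
df = pd.DataFrame({
    "area": [50, 80, 120, 200],
    "neighbourhood": ["A", "B", "A", "C"],
    "price": [100_000, 180_000, 260_000, 500_000],
})

X = df[["area", "neighbourhood"]]
y = np.log1p(df["price"])  # log transform compresses the skewed price range

model = Pipeline([
    ("prep", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["neighbourhood"])],
        remainder="passthrough",
    )),
    ("reg", GradientBoostingRegressor()),
])
model.fit(X, y)
predicted_price = np.expm1(model.predict(X))  # back to the original price scale
```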

In this project, advanced natural language processing (NLP) and deep learning techniques are applied to classify medical texts. The project explores vanilla feed-forward neural networks, recurrent neural networks (RNNs), convolutional neural networks (CNNs), and LSTMs, comparing their classification accuracy. The models' performance is evaluated using various metrics.
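A minimal sketch of one of the compared architectures (an LSTM classifier) in Keras; the vocabulary size, layer widths, and class count are assumptions, and the CNN/RNN variants would swap out the recurrent layer:

```python
# Hypothetical sketch: an LSTM text classifier on tokenized medical texts.
import tensorflow as tf

vocab_size, num_classes = 20_000, 5  # assumed dataset properties

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 128),   # token IDs -> dense vectors
    tf.keras.layers.LSTM(64),                     # sequence encoder
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=5)
```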

This unique project combines image processing with natural language understanding using Large Language Models (LLMs). The project involves extracting information from images, such as accident reports, using OpenAI's GPT models. The integration of visual data with text processing is demonstrated, highlighting the capabilities of LLMs in interpreting and extracting useful information from complex documents.
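A minimal sketch of this pattern using the OpenAI Python client; the model name, prompt, and file path are assumptions rather than the project's exact configuration:

```python
# Hypothetical sketch: asking a GPT vision model to extract fields from an image.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

with open("accident_report.jpg", "rb") as f:   # placeholder document image
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract the date, location, and damage summary as JSON."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```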

This project applies unsupervised learning techniques for fraud detection. Clustering methods, such as K-means and DBSCAN, along with anomaly detection techniques, are used to identify fraudulent activities without labeled data. The project emphasizes the importance of feature selection and the challenges of working with imbalanced datasets in unsupervised settings.
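A minimal sketch of the clustering-based anomaly flagging described above, assuming scikit-learn; the blob data and eps/min_samples values are illustrative stand-ins for real transaction features:

```python
# Hypothetical sketch: clustering plus density-based outlier flagging on scaled features.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Stand-in data; the project uses unlabeled transaction features instead.
X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
dbscan_labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X_scaled)

# DBSCAN marks low-density points with label -1; these are candidate anomalies.
suspected_fraud = X_scaled[dbscan_labels == -1]
print(f"{len(suspected_fraud)} points flagged as potential outliers")
```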

This project demonstrates the application of the Perceptron Learning Algorithm on a two-dimensional (2D) dataset, as well as in higher dimensions (10D). The project involves creating a random line in the plane to serve as a target function, then generating a dataset classified by this line. The perceptron algorithm is applied to this dataset to measure how quickly it converges and how well the final hypothesis matches the target function. The analysis is then extended to a higher-dimensional space, showcasing how the algorithm performs on more complex datasets. Linear separability and misclassification correction are key components of this analysis.
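A minimal sketch of the 2D experiment described above, using NumPy; the sample size and random seed are arbitrary, and the 10D case would only change the dimensions of the target weights and data:

```python
# Hypothetical sketch: the Perceptron Learning Algorithm on a 2D linearly separable set.
import numpy as np

rng = np.random.default_rng(0)

# Random target line through two points in [-1, 1]^2 defines the true labels.
p1, p2 = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
w_target = np.array([p2[0] * p1[1] - p1[0] * p2[1], p2[1] - p1[1], p1[0] - p2[0]])

X = np.hstack([np.ones((100, 1)), rng.uniform(-1, 1, (100, 2))])  # prepend bias term
y = np.sign(X @ w_target)

w, iterations = np.zeros(3), 0
while True:
    misclassified = np.where(np.sign(X @ w) != y)[0]
    if len(misclassified) == 0:
        break                      # converged: all points correctly classified
    i = rng.choice(misclassified)  # PLA update on one misclassified point
    w += y[i] * X[i]
    iterations += 1

print(f"Converged after {iterations} updates")
```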