Project Title: Feature Selection and SVM Model for SNP Genotype Data

Description/Problem Statement: Performed feature selection on a simulated single nucleotide polymorphism (SNP) genotype dataset comprising 29,623 SNPs, using the f-score method to identify the most informative features. The feature selection was implemented in Python without relying on external libraries. The training dataset consisted of 4,000 cases and 4,000 controls.
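
The source does not state which f-score formula was used. As a minimal sketch, the code below assumes the common two-class definition: the squared distances of each class mean from the overall mean, divided by the sum of the within-class sample variances. The function names, the 0/1/2 genotype coding, and the label encoding (1 = case, 0 = control) are assumptions for illustration, not details taken from the project.

```python
def f_score(column, labels):
    """F-score of one feature column for a two-class problem (assumed definition)."""
    pos = [x for x, y in zip(column, labels) if y == 1]  # cases (assumed label 1)
    neg = [x for x, y in zip(column, labels) if y != 1]  # controls
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("each class needs at least two samples")

    mean_all = sum(column) / len(column)
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)

    # Between-class separation of the means.
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2

    # Sum of the within-class sample variances.
    var_pos = sum((x - mean_pos) ** 2 for x in pos) / (len(pos) - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in neg) / (len(neg) - 1)
    denominator = var_pos + var_neg

    if denominator == 0:
        # No within-class spread: perfectly separating or constant feature.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def f_scores(rows, labels):
    """F-score for every column of a row-major genotype matrix."""
    n_columns = len(rows[0])
    return [f_score([row[j] for row in rows], labels) for j in range(n_columns)]
```

A higher score suggests that a SNP separates cases from controls more clearly. Each SNP is scored independently, so this measure does not capture interactions between SNPs.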

A linear Support Vector Machine (SVM) model was then trained on the features selected by the f-score method. The objective was to predict the outcomes of 2,000 test individuals, and the model was tuned toward a target accuracy above 63%.
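
The source does not say how the linear SVM was implemented or whether a library was used. Purely as an illustration of the idea, the sketch below trains a linear SVM with subgradient descent on the L2-regularised hinge loss, using only the standard library. The function names, the hyperparameters (`lam`, `lr`, `epochs`), and the label encoding (+1 = case, -1 = control) are all assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=500):
    """Fit weights w and bias b by subgradient descent on the hinge loss.

    X is a list of feature rows, y a list of labels in {-1, +1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Gradient step on the L2 penalty: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1:
                # Sample is inside the margin or misclassified: push towards it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, X):
    """Return +1 or -1 for each row, according to the sign of w.x + b."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With labelled test data, `accuracy(y_test, predict(w, b, X_test))` could be compared against the 63% target while varying the number of selected features or the hyperparameters. For a dataset of this size, an established SVM library would likely be faster than this pure-Python loop.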

The project output reports the total number of selected features and the column numbers of the features used for the final prediction.
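
One way to produce this kind of output is sketched below, assuming a list holding one score per SNP column is already available. The function name `select_top_k`, the cutoff `k`, and the 0-based column numbering are assumptions; the source does not say how many features were kept or how the columns were indexed.

```python
def select_top_k(scores, k):
    """Return the column indices of the k highest scores, in ascending order.

    Ties are broken in favour of the lower column index.
    """
    ranked = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    return sorted(ranked[:k])


def report_selection(columns):
    """Format the feature count and the column numbers as two lines of text."""
    return "Number of features: {}\nColumns: {}".format(
        len(columns), " ".join(str(c) for c in columns)
    )


# Hypothetical scores for four SNP columns.
example_scores = [0.1, 0.9, 0.3, 0.9]
selected = select_top_k(example_scores, 2)
print(report_selection(selected))
```

Reporting the indices in ascending column order makes it straightforward to slice the same columns out of both the training and test matrices.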

Skills Utilized:

Feature Selection
Machine Learning (SVM)
Python Programming
Data Analysis
Model Optimization

Solution:

Implemented feature selection using the f-score method in Python without relying on external libraries.
Selected relevant features from the SNP genotype dataset to improve model performance and interpretability.
Constructed and trained a linear SVM model using the selected features to predict outcomes for test individuals.
Tuned the SVM model to reach the target accuracy of over 63%.
Provided the total count of selected features and the corresponding column numbers used for the final prediction, facilitating transparency and reproducibility of the results.
