GitHub - connect-midhunr/sms-spam-classifier-using-complement-naive-bayes: Machine learning model that predicts whether a message is spam or not.

In this project, I have attempted to analyze the SMS Spam Collection dataset and build a machine learning model that predicts whether a message is spam or not.

💾 Project Files Description

This project contains an executable IPython notebook, a presentation, and the source data, as follows:

Executable Files:

SMS_Spam_Classifier.ipynb - Google Colab notebook containing data summary, exploration, visualisations, text processing, modelling and performance evaluation.

Source Directory:

SMSSpamCollection - The SMS Spam Collection dataset used for exploration and modelling.
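A minimal sketch of reading this file, assuming it follows the layout of the public UCI SMS Spam Collection: no header row, and each line holds a `ham` or `spam` label, a tab, then the message text. The helper name `load_sms_collection` is hypothetical, and the notebook may load the data differently (for example with pandas).

```python
import io
from typing import Iterable, List, Tuple


def load_sms_collection(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split 'label<TAB>message' lines into parallel label and message lists.

    Hypothetical helper: assumes the UCI SMS Spam Collection layout
    (no header, tab-separated), not taken from the original notebook.
    """
    labels: List[str] = []
    messages: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split only on the first tab so tabs inside a message are preserved.
        label, message = line.split("\t", 1)
        labels.append(label)
        messages.append(message)
    return labels, messages


if __name__ == "__main__":
    # Two invented lines in the assumed format; with the real file, pass an
    # open file handle for "SMSSpamCollection" instead of io.StringIO.
    sample = io.StringIO("ham\tSee you at lunch?\nspam\tWIN a free prize now!\n")
    labels, messages = load_sms_collection(sample)
    print(labels)    # ['ham', 'spam']
    print(messages)  # ['See you at lunch?', 'WIN a free prize now!']
```

Splitting only on the first tab keeps any tab characters that appear inside a message body.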

📖 Problem Statement

Almost everyone today owns a mobile phone with messaging and calling capabilities. Spam calls are notorious for making phones ring constantly as they push promotional or fraudulent offers to unsuspecting customers. However, with the cheaper rates that wireless networks offer for bulk messaging services, much of this spam has quickly shifted to SMS messaging. In this scenario, automatic spam classification becomes essential. The objective of this project is to understand the SMS Spam Collection dataset and build a machine learning model that predicts whether a message is spam or not.

📖 Approach

Understanding the business task.
Reading data from the given files.
Data pre-processing.
Data visualization.
Text processing.
Modelling data.
Conclusion.
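As one illustration of the data pre-processing step, the sketch below removes exact duplicate messages, encodes the labels as integers and reports the class proportions, which is where the dataset's imbalance becomes visible. The duplicate-removal rule, the `ham -> 0, spam -> 1` encoding and the function names are assumptions for illustration, not taken from the notebook.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Assumed label encoding (hypothetical): ham -> 0, spam -> 1.
LABEL_MAP = {"ham": 0, "spam": 1}


def preprocess(labels: Sequence[str], messages: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Drop exact duplicate (label, message) pairs and encode labels as integers."""
    seen = set()
    y: List[int] = []
    X: List[str] = []
    for label, message in zip(labels, messages):
        if (label, message) in seen:
            continue  # repeated messages would otherwise be counted twice
        seen.add((label, message))
        y.append(LABEL_MAP[label])  # raises KeyError on an unexpected label
        X.append(message)
    return y, X


def class_proportions(y: Sequence[int]) -> Dict[int, float]:
    """Return the fraction of samples in each class, to expose imbalance."""
    counts = Counter(y)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}


if __name__ == "__main__":
    labels = ["ham", "ham", "spam", "ham", "ham"]
    messages = ["ok", "ok", "free prize", "call me", "see you"]
    y, X = preprocess(labels, messages)
    print(y)                     # [0, 1, 0, 0]
    print(class_proportions(y))  # {0: 0.75, 1: 0.25}
```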

📖 Text Processing

Stemming was used for text normalization because reducing words to their base forms matters more than preserving their exact meaning when determining whether a message is spam or not.
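To illustrate why stemming helps here, the sketch below maps inflected forms onto a shared base so that "claims", "claimed" and "claiming" all count as the same feature. It uses a deliberately crude, hypothetical suffix stripper (`crude_stem`) rather than a real algorithm such as Porter's; which stemmer the notebook actually uses is not stated here.

```python
import re
from typing import List

# Illustrative suffix list (hypothetical); a real stemmer such as Porter's
# applies many more rules, each with conditions on the remaining stem.
_SUFFIXES = ("ing", "ed", "s")


def crude_stem(word: str) -> str:
    """Strip one common English suffix, keeping a stem of at least 3 letters.

    Conflates "claims", "claimed" and "claiming" with "claim", but leaves
    artefacts such as "winning" -> "winn" that a full stemmer would fix.
    """
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(message: str) -> List[str]:
    """Lower-case, keep alphabetic tokens only, and stem each token."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return [crude_stem(token) for token in tokens]


if __name__ == "__main__":
    print(normalize("Claim your FREE prizes! Claiming is easy."))
    # ['claim', 'your', 'free', 'prize', 'claim', 'is', 'easy']
```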

Bag-of-Words was used for feature extraction because only the frequency of each word needs to be considered, not its relative importance across messages (which is what TF-IDF would add).

📖 Modelling

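A Complement Naive Bayes classifier was used for training because each feature represents the count of a word in a message, and because Complement Naive Bayes corrects the "severe assumptions" made by the standard Multinomial Naive Bayes classifier, which makes it particularly suited to imbalanced datasets such as this one.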
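A minimal sketch of training scikit-learn's `ComplementNB` on word counts, assuming a `CountVectorizer` front end and a tiny invented training set that is deliberately imbalanced (five ham, two spam). Complement Naive Bayes computes each class's word weights from the counts in all the other classes (its complement). The data, the `0 = ham, 1 = spam` labels and `alpha=1.0` are illustrative choices, not the notebook's.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set with a deliberate class imbalance,
# standing in for the real SMS data.
train_messages = [
    "are we still meeting for lunch today",
    "can you call me when you get home",
    "see you at the station tonight",
    "thanks for the birthday wishes",
    "ok i will bring the notes tomorrow",
    "win a free prize call now to claim",
    "free entry to win cash text claim now",
]
train_labels = [0, 0, 0, 0, 0, 1, 1]  # 0 = ham, 1 = spam

# Word counts feed straight into ComplementNB; alpha is the additive
# smoothing parameter (1.0 is scikit-learn's default).
model = make_pipeline(CountVectorizer(), ComplementNB(alpha=1.0))
model.fit(train_messages, train_labels)

print(model.predict(["claim your free cash prize now", "call me when you are home"]))
# [1 0]
```

On the real data you would hold out a stratified test split (for example `train_test_split(X, y, stratify=y)` from `sklearn.model_selection`) and check precision and recall for the spam class, since plain accuracy looks high on an imbalanced dataset even for a model that rarely flags spam.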

📘 Conclusion

📜 Credits

Midhun R | Avid Learner | Data Analyst | Data Scientist | Machine Learning Enthusiast

Contact me for Data Science Project Collaborations

📚 References

Analytics Vidhya, 'Stemming vs Lemmatization in NLP: Must-Know Differences'. [Online]. Available: https://www.analyticsvidhya.com/blog/2022/06/stemming-vs-lemmatization-in-nlp-must-know-differences/

Medium, 'Fundamentals of Bag Of Words and TF-IDF'. [Online]. Available: https://medium.com/analytics-vidhya/fundamentals-of-bag-of-words-and-tf-idf-9846d301ff22/

Scikit-learn, 'sklearn.naive_bayes.ComplementNB'. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.ComplementNB.html/

Image by upklyak on Freepik
