A simple class project that classifies a sample data set of 1200 tweets as either positive or negative, based on the words used in each tweet.
All data are located in the data directory.
PLUS_TRAINING_DATA = data/processed_plus_data.txt
MINUS_TRAINING_DATA = data/processed_minus_data.txt
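As a rough illustration of how the two files could be combined into one labeled data set, the sketch below reads each file and attaches a label to every tweet. It assumes one tweet per line, which this README does not state. The function name load_labeled_tweets and the 1/0 label encoding are hypothetical and are not taken from the project's code.

```python
from pathlib import Path

# Paths as listed above.
PLUS_TRAINING_DATA = "data/processed_plus_data.txt"
MINUS_TRAINING_DATA = "data/processed_minus_data.txt"


def load_labeled_tweets(plus_path, minus_path):
    """Return (tweets, labels): tweets from plus_path get 1, from minus_path get 0.

    Assumption: each file holds one tweet per line; blank lines are skipped.
    """
    tweets, labels = [], []
    for path, label in ((plus_path, 1), (minus_path, 0)):
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:
                tweets.append(line)
                labels.append(label)
    return tweets, labels

# Hypothetical usage, run from the repository root:
# tweets, labels = load_labeled_tweets(PLUS_TRAINING_DATA, MINUS_TRAINING_DATA)
```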
First, run pip install -r requirements.txt to install the required modules.
- To train the models, run run_training.py. First enter the path for PLUS_TRAINING_DATA, then the path for MINUS_TRAINING_DATA. After that, select an algorithm to train. The options are naiveBayes, logisticRegression, and treeClassifier.
- To test the trained models, run run_test.py. For the first inputs, enter PLUS_TRAINING_DATA and MINUS_TRAINING_DATA. Then choose a model from KNN, naiveBayes, logisticRegression, treeClassifier, and finalModel. Next, enter the path to the test data file, such as data/test.txt. Finally, enter the path to the label file for the test data. Sample files are provided in the data directory.
- To evaluate the models, run run_estimation.py. For evaluation, just enter PLUS_TRAINING_DATA and MINUS_TRAINING_DATA as parameters.
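To make the train, test, and evaluate steps more concrete, here is a minimal, self-contained sketch of a word-count naive Bayes classifier with add-one smoothing, followed by an accuracy check. It is not the project's implementation: the function names, the whitespace tokenization, the "plus"/"minus" label strings, and the toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_naive_bayes(tweets, labels):
    """Count words per class and return what prediction needs."""
    word_counts = {}          # label -> Counter of word frequencies
    class_counts = Counter()  # label -> number of training tweets
    vocab = set()
    for text, label in zip(tweets, labels):
        words = text.lower().split()  # assumption: whitespace tokenization
        word_counts.setdefault(label, Counter()).update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for `text`."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(n_docs / total)           # log prior
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, tweets, labels):
    """Fraction of tweets whose predicted label matches the true label."""
    correct = sum(predict(model, t) == y for t, y in zip(tweets, labels))
    return correct / len(labels)


# Toy data, invented for this example only.
train_tweets = ["love this great movie", "great happy day",
                "hate this awful movie", "awful sad day"]
train_labels = ["plus", "plus", "minus", "minus"]

model = train_naive_bayes(train_tweets, train_labels)
print(predict(model, "great day"))    # plus
print(predict(model, "awful movie"))  # minus
```

Measuring accuracy on the same tweets used for training, as accuracy(model, train_tweets, train_labels) would do here, overstates real performance; a held-out test file with a separate label file, as described for run_test.py, gives a fairer estimate.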