This project develops a multimodal sentiment analysis system that integrates visual cues with textual and acoustic data to provide a more comprehensive understanding of human emotions.
In this project, we use the CMU-MOSI and CMU-MOSEI datasets for sentiment analysis.

CMU-MOSI pickle files can be downloaded from here. We use `mosi_raw.pkl` for most of our models and `mosi_data.pkl` for our transformer models.

CMU-MOSEI pickle files can be downloaded from here. We use `mosei_raw.pkl` for most of our models and `mosei_senti_data.pkl` for our transformer models.
The Datasets notebook explains how to download the datasets and explores them.
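Once downloaded, the pickle files can be loaded with Python's standard `pickle` module. The sketch below uses a tiny hand-made dictionary to illustrate the general layout (train/valid/test splits with per-modality arrays and labels); the exact keys in `mosi_raw.pkl` and the other files may differ, so treat the names here as assumptions and inspect the real file as shown in the Datasets notebook.

```python
import pickle

# Hypothetical stand-in for a CMU-MOSI pickle: splits -> modality arrays.
# The real files are produced by the CMU Multimodal SDK and may use
# different keys; this is only to demonstrate the loading pattern.
sample = {
    "train": {
        "text": [[0.1, 0.2]],
        "vision": [[0.3]],
        "audio": [[0.4]],
        "labels": [1.0],
    },
}
with open("mosi_demo.pkl", "wb") as f:
    pickle.dump(sample, f)

# Loading is the same call regardless of which pickle file you use.
with open("mosi_demo.pkl", "rb") as f:
    data = pickle.load(f)

train = data["train"]
print(sorted(train.keys()))  # inspect available modalities
```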
| Model | CMU-MOSI Accuracy | CMU-MOSEI Accuracy |
|---|---|---|
| Early Fusion (GRU) | 65.74% | 49.03% |
| Early Fusion (Transformer) | 76.96% | 69.09% |
| Late Fusion (GRU) | 70.26% | - |
| Late Fusion (Transformer) | 74.34% | - |
| Tensor Fusion | 72.74% | 67.11% |
| Low Rank Tensor Fusion | 68.07% | - |
| MFM | 66.47% | - |
| MCTN | 73.76% | - |
| MulT | 75.07% | 71.91% |
Note: Some models were not evaluated on the CMU-MOSEI dataset because they did not yield the expected results; this is a topic for further exploration.
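The early- and late-fusion rows above differ in *where* the modalities are combined. A minimal NumPy sketch of the two strategies, with illustrative feature sizes and a toy stand-in scorer (neither taken from the actual notebooks):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical pooled per-modality feature vectors (sizes are arbitrary).
text_feat = rng.standard_normal(8)
audio_feat = rng.standard_normal(4)
vision_feat = rng.standard_normal(6)

# Early fusion: concatenate modality features, then feed one joint model
# (a GRU or transformer in the notebooks).
early_input = np.concatenate([text_feat, audio_feat, vision_feat])

def unimodal_score(feat):
    # Toy stand-in for a per-modality model's sentiment score.
    return float(np.tanh(feat).mean())

# Late fusion: score each modality separately, then combine the outputs
# (simple averaging here; the notebooks may use a learned combiner).
late_score = np.mean(
    [unimodal_score(f) for f in (text_feat, audio_feat, vision_feat)]
)
print(early_input.shape, round(late_score, 3))
```

Tensor fusion, by contrast, combines modalities via an outer product of the (bias-augmented) feature vectors, which models cross-modal interactions explicitly but grows the input dimension multiplicatively.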
Each notebook contains the training and validation loss curves, as well as the validation accuracy at each epoch. (These were not included in the report due to the page limit.)
The resulting trained models are stored in the `models` directory, and the losses are stored in the `results` directory as pickle files.
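A saved loss file can be read back the same way it was written. The sketch below writes and reloads a small dummy loss dictionary to show the round trip; the file name and the `{"train": [...], "valid": [...]}` layout are assumptions, so check the actual pickles in `results` for the real structure.

```python
import os
import pickle

# Dummy loss curves standing in for one model's results file.
losses = {"train": [0.9, 0.6, 0.5], "valid": [0.95, 0.70, 0.72]}

os.makedirs("results_demo", exist_ok=True)
path = "results_demo/demo_losses.pkl"  # hypothetical file name
with open(path, "wb") as f:
    pickle.dump(losses, f)

with open(path, "rb") as f:
    curves = pickle.load(f)

# Pick the epoch with the lowest validation loss.
best_epoch = min(range(len(curves["valid"])), key=curves["valid"].__getitem__)
print(f"best validation epoch: {best_epoch}")  # epoch 1 for this dummy data
```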