Merge pull request #1 from abhisheks008/main

update

Cgarg9 committed Dec 22, 2023
2 parents 57ed102 + ccc4849 · commit 4c4e999

Showing 59 changed files with 350 additions and 0 deletions.
@@ -0,0 +1,2 @@
https://www.kaggle.com/competitions/nlp-getting-started/data
Dataset
@@ -0,0 +1,7 @@
EDA was used to compare tweets based on word count, letter count, and the keywords mentioned; bar charts were used for this.

A confusion matrix was used to compare the performance of the standard ML models.

Graphs of training and test accuracy were used to compare the performance of the transformer-based models.

Word clouds are also used to graphically depict the keywords and the highest-frequency words in both kinds of tweets.
71 changes: 71 additions & 0 deletions Disaster Tweets Prediction using Deep Learning/Models/Readme.md
@@ -0,0 +1,71 @@
# Disaster Twitter Sentiment Analysis NLP

## PROJECT TITLE

Disaster Twitter Sentiment Analysis NLP

## GOAL

The main goal of this project is to analyse tweets about disasters and classify them as fake or real using Transformers.

## DATASET

https://www.kaggle.com/competitions/nlp-getting-started/data

## DESCRIPTION

The main goal of this project is to analyse tweets about disasters and classify them as fake or real using Transformers. Standard ML models such as Random Forest, SVC, and Logistic Regression are also used for comparison in this research.

## WHAT I HAD DONE

This neural network architecture is tailored for Natural Language Processing (NLP) tasks. The Positional Embedding incorporates the sequential order of words, crucial for understanding context. The Transformer Encoder captures contextual information, enabling the model to comprehend relationships within input sequences. Global Max Pooling 1D extracts salient features, reducing dimensionality for efficient processing. Dropout mitigates overfitting, enhancing the model's generalization ability. The Dense layer produces the final prediction, with the architecture designed for tasks like text classification or sentiment analysis. Overall, this combination empowers the model to effectively process and interpret textual data, making it suitable for a range of NLP applications.
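
A minimal Keras sketch of this architecture is given below. The vocabulary size, sequence length, and embedding width are illustrative assumptions, and the encoder block is composed from standard `tf.keras` layers rather than reproducing the notebook's exact layers:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000  # assumed tokenizer vocabulary size
SEQ_LEN = 64        # assumed padded tweet length
EMBED_DIM = 128     # assumed embedding width


class TokenAndPositionEmbedding(layers.Layer):
    """Token embedding plus a learned positional embedding."""

    def __init__(self, maxlen, vocab_size, embed_dim):
        super().__init__()
        self.token_emb = layers.Embedding(vocab_size, embed_dim)
        self.pos_emb = layers.Embedding(maxlen, embed_dim)

    def call(self, x):
        positions = tf.range(start=0, limit=tf.shape(x)[-1], delta=1)
        return self.token_emb(x) + self.pos_emb(positions)


def transformer_encoder(x, num_heads=4, ff_dim=256):
    # Self-attention with a residual connection and layer normalization
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=EMBED_DIM)(x, x)
    x = layers.LayerNormalization()(x + attn)
    # Position-wise feed-forward block, again with a residual connection
    ff = layers.Dense(ff_dim, activation="relu")(x)
    ff = layers.Dense(EMBED_DIM)(ff)
    return layers.LayerNormalization()(x + ff)


inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
x = TokenAndPositionEmbedding(SEQ_LEN, VOCAB_SIZE, EMBED_DIM)(inputs)
x = transformer_encoder(x)
x = layers.GlobalMaxPooling1D()(x)  # keep the strongest feature per dimension
x = layers.Dropout(0.2)(x)          # regularization against overfitting
outputs = layers.Dense(1, activation="sigmoid")(x)  # 1 = real, 0 = fake

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```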

## MODELS USED

1. Random Forest Regressor
2. SVC
3. Logistic Regression
4. Decision Tree Regressor
5. Transformer-based Neural Network

## LIBRARIES NEEDED
- numpy
- pandas
- sklearn
- tensorflow
- keras
- scipy

## VISUALIZATION

EDA was used to compare tweets based on word count, letter count, and the keywords mentioned; bar charts were used for this.
A confusion matrix was used to compare the performance of the standard ML models.
Graphs of training and test accuracy were used to compare the performance of the transformer-based models.
Word clouds are also used to graphically depict the keywords and the highest-frequency words in both kinds of tweets.
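
As a sketch of the word-cloud step, assuming the `wordcloud` package and two hypothetical tweet lists standing in for the real and fake groups:

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# hypothetical stand-ins for the two tweet groups
real_tweets = ["earthquake hits the city centre", "flood warning issued for the coast"]
fake_tweets = ["my inbox is a disaster lol", "that concert was fire"]

for title, tweets in [("Real", real_tweets), ("Fake", fake_tweets)]:
    wc = WordCloud(width=800, height=400, background_color="white")
    wc.generate(" ".join(tweets))
    plt.figure()
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.title(f"{title} disaster tweets")
plt.show()
```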

## EVALUATION METRICS

A confusion matrix was created, and recall, F1-score, and precision were used as evaluation metrics.
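
A minimal sketch of computing these metrics with scikit-learn, using hypothetical labels:

```python
from sklearn.metrics import classification_report, confusion_matrix

# hypothetical labels: 1 = real disaster tweet, 0 = fake
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))  # precision, recall, and F1 per class
```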

## RESULTS

The Transformer model reaches 95% accuracy, significantly higher than the Logistic Regression and Decision Tree models, which come in at 81% accuracy.

## CONCLUSION
The provided neural network architecture is well-suited for classifying disaster tweets as fake or real in the context of natural language processing (NLP). Here's how each component contributes to this task:

- **Positional Embedding**: Helps the model understand the order of words in tweets, capturing nuances and context essential for discerning fake or real information during a disaster.
- **Transformer Encoder**: Enables the model to process the entire sequence of words, capturing intricate relationships and contextual information, which is crucial for distinguishing between authentic and misleading content in disaster-related tweets.
- **Global Max Pooling 1D**: Extracts the most significant features from the encoded sequence, focusing on key information that might indicate whether a tweet is reporting a real disaster or spreading misinformation.
- **Dropout**: Mitigates overfitting, enhancing the model's ability to generalize from training data to unseen examples, which is vital for accurately classifying diverse disaster-related tweets.
- **Dense Layer**: Produces the final prediction, indicating whether a given tweet is likely to be real or fake based on the features extracted by the preceding layers.


@@ -0,0 +1,8 @@
tensorflow
keras
nltk
numpy
scikit-learn
pandas
matplotlib
seaborn
@@ -0,0 +1,2 @@
https://www.kaggle.com/datasets/abbymorgan/penguins-vs-turtles
Dataset here
@@ -0,0 +1 @@
The dataset has been visualized using line charts, and confusion matrices show the performance of the ResNet and CNN models.
46 changes: 46 additions & 0 deletions Penguin and Turtle Image Classification using DL/Models/Readme.md
@@ -0,0 +1,46 @@
**PROJECT TITLE**: Penguin and Turtle Image Classification using DL

**GOAL**: To detect and classify penguin and turtle images.

**DATASET**: https://www.kaggle.com/datasets/abbymorgan/penguins-vs-turtles

**DESCRIPTION**:

The task relies on a baseline CNN model and ResNet-50 to detect whether the given images are of penguins or turtles. ResNet (Residual Networks) and EfficientNet are popular deep learning architectures. ResNet introduces skip connections to mitigate vanishing-gradient issues. EfficientNet optimizes model efficiency by balancing depth, width, and resolution. ResNet is well-established, while EfficientNet achieves competitive accuracy with fewer parameters, making it computationally efficient for resource-constrained environments.

**TASKS PERFORMED**:
1. A dataset was created from the set of images, tagged as 1 and 0.
2. Resizing, padding, and augmentation were done to properly extract the features necessary for classification.
3. The ResNet50 model from `keras_cv` is used to classify images, with box_loss and classification_loss as the metrics used to compare efficiency.
4. An EfficientNet model is also prepared, with accuracy and loss as the metrics used to reflect its performance; a transfer-learning sketch follows this list.
5. Lastly, confusion matrices help determine the misclassifications and correct classifications for each type.
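
The notebook's KerasCV detection-style pipeline is not reproduced here; the sketch below shows a generic transfer-learning setup for the two backbones via `tf.keras.applications`, where the image size, frozen backbone, and sigmoid head are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = (224, 224)  # assumed input resolution


def build_classifier(backbone="efficientnet"):
    """Binary penguin-vs-turtle classifier on a pretrained backbone."""
    if backbone == "efficientnet":
        base = tf.keras.applications.EfficientNetB0(
            include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,))
    else:
        base = tf.keras.applications.ResNet50(
            include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,))
    base.trainable = False  # freeze pretrained weights for the first training phase

    inputs = layers.Input(shape=IMG_SIZE + (3,))
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # e.g. 1 = penguin, 0 = turtle
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_classifier("resnet")
```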

**MODELS USED**:
EfficientNet
ResNet


**LIBRARIES NEEDED**:

1. Numpy
2. Pandas
3. Matplotlib
4. Scikit-learn
5. Tensorflow
6. Keras

**VISUALIZATION**

![Alt text](<../Images/Screenshot (338).png>) ![Alt text](<../Images/Screenshot (335).png>)

The graphs compare the loss curves of ResNet and EfficientNet.

**ACCURACIES**:
Loss ranges from about 0.10 to 9.20 for ResNet, while it averages about 1 for the EfficientNet architecture. ResNet exhibits a lower classification loss than EfficientNet on this dataset, which may suggest that ResNet is more effective at capturing the specific features and patterns present in the given images, which is essential when classifying animal images.

**CONCLUSION**:
ResNet and EfficientNet are popular convolutional neural network architectures used for image classification tasks, including the detection of turtle and penguin images. ResNet employs residual learning, addressing the vanishing gradient problem by introducing skip connections, making it deeper and more effective. This helps capture intricate features in the images, crucial for identifying unique characteristics of turtles and penguins.

EfficientNet, on the other hand, optimizes model efficiency by balancing model depth, width, and resolution through a compound scaling method. It achieves high accuracy with fewer parameters, making it computationally efficient for detecting distinct patterns in turtle and penguin images.

Both ResNet and EfficientNet can be fine-tuned on the dataset containing annotated turtle and penguin images to create highly accurate models, with upwards of 95% accuracy.


@@ -0,0 +1,5 @@
numpy
pandas
tensorflow
scikit-learn
keras
2 changes: 2 additions & 0 deletions Sea Animal Detection Using Neural Networks/Datasets/Readme.md
@@ -0,0 +1,2 @@
https://www.kaggle.com/datasets/vencerlanz09/sea-animals-image-dataste/data
Dataset
@@ -0,0 +1 @@
Grouping the images by class shows that a higher number of images is present for the tortoise/turtle class, but the rest of the dataset is balanced overall.
76 changes: 76 additions & 0 deletions Sea Animal Detection Using Neural Networks/Models/Readme.md
@@ -0,0 +1,76 @@
# SEA ANIMAL DETECTION USING DEEP LEARNING
Full name: Aindree Chatterjee

GitHub Profile Link: https://github.com/aindree-2005

Email ID: [email protected]

Program: CodePeak

Approach for this Project:

**Description**
Using CNNs to handle image data and identify sea animals from a diverse set.

**Model Used**

Input Layer:
- Type: Conv2D
- Parameters: 64 filters, kernel size (5, 5), ReLU activation, 'valid' padding, input shape (224, 224, 3)
- This layer is responsible for detecting 64 different features using 5x5 convolutional filters on the input image.

Max Pooling Layer:
- Type: MaxPooling2D
- Parameters: pool size (2, 2)
- This layer performs max pooling, reducing the spatial dimensions of the representation.

Dropout Layer:
- Type: Dropout
- Parameters: dropout rate 0.2
- Dropout layers are used to prevent overfitting by randomly setting a fraction of input units to 0 at each update during training.

Batch Normalization Layer:
- Type: BatchNormalization
- Batch normalization normalizes the activations of the network, which can help improve training stability and speed.

Repeated Convolutional, Pooling, Dropout, and Batch Normalization Blocks:
- Similar blocks are repeated with different filter sizes and configurations:
  - 128 filters, (5, 5) kernel, ReLU activation, MaxPooling, Dropout, and BatchNormalization
  - 256 filters, (3, 3) kernel, ReLU activation, MaxPooling, Dropout, and BatchNormalization
  - 512 filters, (3, 3) kernel, ReLU activation, MaxPooling, Dropout, and BatchNormalization
  - 1024 filters, (3, 3) kernel, ReLU activation, MaxPooling

Flatten Layer:
- Type: Flatten
- This layer flattens the input to a one-dimensional array, preparing it for the fully connected layers.

Dense Layer (Fully Connected):
- Type: Dense
- Parameters: 512 units/neurons, ReLU activation
- This layer connects every neuron in the previous layer to every neuron in this layer.

Dense Output Layer:
- Type: Dense
- Parameters: number of units equal to the number of labels/classes, Softmax activation
- This is the output layer responsible for producing the final classification probabilities for each class.
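
A Keras Sequential sketch of this stack is given below; the class count is a placeholder and the compile settings are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10  # placeholder: set to the number of sea-animal classes in the dataset

model = models.Sequential([
    layers.Conv2D(64, (5, 5), activation="relu", padding="valid",
                  input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.2),
    layers.BatchNormalization(),

    layers.Conv2D(128, (5, 5), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.2),
    layers.BatchNormalization(),

    layers.Conv2D(256, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.2),
    layers.BatchNormalization(),

    layers.Conv2D(512, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.2),
    layers.BatchNormalization(),

    layers.Conv2D(1024, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),

    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```
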
**Visualisation**
Visualisation plots have been added in the notebook to show the class distribution, and the accuracy and loss during training.

**Accuracies**
90.19% using CNN

**My Conclusion**
Convolutional Neural Networks (CNNs) are ideal for sea animal detection due to their ability to automatically learn hierarchical features from image data. The hierarchical nature of CNNs allows them to capture spatial patterns and features crucial for identifying sea animals in underwater images. With their capacity to recognize complex patterns, CNNs excel at image classification tasks, making them well-suited for accurately detecting and classifying diverse sea creatures in marine environments.

**NAME**
Aindree Chatterjee

6 changes: 6 additions & 0 deletions Sea Animal Detection Using Neural Networks/requirements.txt
@@ -0,0 +1,6 @@
utils
keras
tensorflow
numpy
pandas
scikit-learn
2 changes: 2 additions & 0 deletions Store Sales Prediction Using Deep Learning/Dataset/Readme.md
@@ -0,0 +1,2 @@
https://www.kaggle.com/competitions/store-sales-time-series-forecasting/overview
Dataset
@@ -0,0 +1 @@
Line charts and box plots have been used to visualise the data by month.

35 changes: 35 additions & 0 deletions Store Sales Prediction Using Deep Learning/Models/readme.md
@@ -0,0 +1,35 @@
**PROJECT TITLE**
Store Sales Prediction using Deep Learning

**GOAL**
Store Sales Prediction using Deep Learning

**DATASET**
https://www.kaggle.com/competitions/store-sales-time-series-forecasting/overview

**DESCRIPTION**
The project uses an RNN to make predictions for store sales. The dataset is updated daily and is dynamic. The project also compares the performance of Lasso, Ridge, and Decision Tree regression models against the RNN.

**WHAT I HAD DONE**
1. Used EDA and a correlation matrix to identify the required features.
2. Tested basic ML models: Ridge, Lasso, Linear, and Decision Tree Regression.
3. Tested RNNs, using multilayer networks for the time-series data (a minimal sketch follows below).
4. RNNs proved to be far more useful and versatile.
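
A minimal sketch of the windowed RNN setup, assuming next-day prediction from the previous 30 days of sales; the window size and layer sizes are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

WINDOW = 30  # assumption: predict the next day from the previous 30 days


def make_windows(series, window=WINDOW):
    """Turn a 1-D sales series into (samples, window, 1) inputs and next-day targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X)[..., np.newaxis], np.array(y)


model = tf.keras.Sequential([
    layers.Input(shape=(WINDOW, 1)),
    layers.LSTM(64, return_sequences=True),  # stacked (multilayer) recurrent layers
    layers.LSTM(32),
    layers.Dense(1),                         # next-day sales estimate
])
model.compile(optimizer="adam", loss="mae")  # MAE matches the reported metric

# usage: X, y = make_windows(daily_sales_array); model.fit(X, y, epochs=10)
```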

**MODELS USED**
Lasso Regression, Ridge Regression, Decision Tree Regression, RNN

**LIBRARIES NEEDED**
Pandas, NumPy, Keras, TensorFlow, scikit-learn, Seaborn, Matplotlib

**VISUALIZATION**
A correlation matrix is used to visualize the required features.
Line charts are used to visualize day-, month-, and store-wise sales.
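
A sketch of the correlation-matrix step with seaborn, using a hypothetical frame in place of the store-sales features:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# hypothetical frame standing in for the store-sales features
df = pd.DataFrame({
    "sales":       [100, 120, 90, 150, 130],
    "promo":       [0, 1, 0, 1, 1],
    "day_of_week": [1, 2, 3, 4, 5],
})
sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
plt.title("Feature correlations")
plt.show()
```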

**ACCURACIES**
MAE is lowest for the RNNs, at 55 to 70.
Linear Regression gives the highest MAE at 1000+, while Lasso Regression and Decision Tree do considerably better at a little over 100.

**CONCLUSION**
Recurrent Neural Networks (RNNs) are employed for time series data due to their ability to capture temporal dependencies. RNNs maintain a memory of past information, enabling them to process sequential data with contextual awareness. This makes them well-suited for tasks such as stock price prediction or weather forecasting, where understanding patterns over time is crucial. The recurrent nature of RNNs facilitates the modeling of dynamic relationships within time series datasets, enhancing their effectiveness in capturing temporal dependencies.

**NAME**
Aindree Chatterjee

7 changes: 7 additions & 0 deletions Store Sales Prediction Using Deep Learning/Requirements.txt
@@ -0,0 +1,7 @@
tensorflow
keras
numpy
scipy
pandas
matplotlib
seaborn
2 changes: 2 additions & 0 deletions Twitter Sentiment Analysis NLP/Dataset/Readme.md
@@ -0,0 +1,2 @@
https://www.kaggle.com/datasets/kazanova/sentiment140
Dataset
1 change: 1 addition & 0 deletions Twitter Sentiment Analysis NLP/Images/Readme.md
@@ -0,0 +1 @@
EDA was done using line plots, word clouds, and a confusion matrix.
57 changes: 57 additions & 0 deletions Twitter Sentiment Analysis NLP/Models/Readme.Md
@@ -0,0 +1,57 @@
# Twitter Sentiment Analysis NLP

## PROJECT TITLE

Twitter Sentiment Analysis NLP

## GOAL

The main goal of this project is to analyse people's tweets using an LSTM and a Keras Sequential model.

## DATASET

https://www.kaggle.com/datasets/kazanova/sentiment140

## DESCRIPTION

This project aims to perform sentiment analysis on tweets posted by various people and group them into positive and negative tweets.

## WHAT I HAD DONE

1. Used NLTK to preprocess and clean the text (stemming, lemmatizing, removing symbols, etc.)
2. Created a sequential model using Keras, with weight initializers and regularizers
3. Used GloVe embeddings in another notebook
4. Created an LSTM model with Conv1D, SpatialDropout1D, Dense, and other layers (a sketch follows below)
5. Used a confusion matrix
6. Used BERT for classification
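
A minimal sketch of such an LSTM model, with assumed vocabulary size and sequence length; the embedding could be seeded with GloVe weights:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 50000  # assumed tokenizer vocabulary size
MAX_LEN = 60        # assumed padded tweet length

model = tf.keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 100),        # could be initialized with GloVe vectors
    layers.SpatialDropout1D(0.3),             # drops entire embedding channels
    layers.Conv1D(64, 5, activation="relu"),  # local n-gram features
    layers.MaxPooling1D(2),
    layers.LSTM(64),                          # sequence-level context
    layers.Dense(1, activation="sigmoid"),    # positive vs. negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```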

## MODELS USED

1. GloVe embeddings with LSTM
2. Sequential Model
3. BERT

## LIBRARIES NEEDED
- numpy
- pandas
- sklearn
- tensorflow
- keras
- scipy

## VISUALIZATION

![For Sequential Model](<../Images/Screenshot (277).png>) - Keras Sequential model
![For LSTM](<../Images/Screenshot (279).png>) - LSTM model

## EVALUATION METRICS

A confusion matrix was created, and recall, F1-score, and precision were used as evaluation metrics.

## RESULTS

The LSTM has higher accuracy (about 78%) than the Keras Sequential model's 72%. The highest accuracy is offered by BERT, at 87%.

## CONCLUSION

Long Short-Term Memory (LSTM) networks are beneficial in tweet sentiment analysis compared to plain Keras Sequential models due to their ability to capture contextual dependencies and handle sequential data effectively. Tweets often contain short and informal language, making it challenging for traditional models to discern sentiment accurately. LSTMs, with their memory cells, can capture nuances in the temporal structure of tweets, considering dependencies between words and phrases. This enables LSTMs to grasp the sentiment context better than simple sequential models, which may struggle to capture the inherent sequential nature and intricate dependencies present in tweet data, leading to suboptimal performance in sentiment analysis tasks.

Using Transformer encoders enables BERT to understand context better than traditional neural networks such as LSTMs or RNNs: the encoder processes all of the inputs (the whole sentence) simultaneously, so when building the context for a word, BERT takes the surrounding inputs into account.

1 change: 1 addition & 0 deletions Twitter Sentiment Analysis NLP/Models/tweet_lstm.ipynb

1 change: 1 addition & 0 deletions Twitter Sentiment Analysis NLP/Models/tweetbert.ipynb

6 changes: 6 additions & 0 deletions Twitter Sentiment Analysis NLP/requirements.txt
@@ -0,0 +1,6 @@
numpy
pandas
scikit-learn
tensorflow
keras
scipy
