TensorFlow Speech Recognition Challenge

Aim

The aim of this project is to detect and classify simple spoken commands in one-second audio clips by training on a labeled training set and evaluating on an unlabeled test set.

Dataset

The dataset is the Speech Commands Dataset released by TensorFlow. It includes 65,000 one-second utterances of 30 short words, recorded by thousands of different people. In this challenge, however, each clip must be classified into one of 12 classes: yes, no, up, down, left, right, on, off, stop, go, silence, and unknown. The unknown label covers any command that is neither one of the first 10 labels nor silence.
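The mapping from the dataset's 30 words to the 12 challenge classes can be sketched as follows. This is a minimal illustration, not the code from this repository: the function names are hypothetical, and the `_background_noise_` folder name is an assumption about how silence clips are stored on disk.

```python
# The ten target commands named in the challenge description.
TARGET_COMMANDS = ["yes", "no", "up", "down", "left", "right",
                   "on", "off", "stop", "go"]
SILENCE_LABEL = "silence"
UNKNOWN_LABEL = "unknown"

# All 12 classes, in a fixed order so each gets a stable integer index.
CLASSES = TARGET_COMMANDS + [SILENCE_LABEL, UNKNOWN_LABEL]

# Assumption: background-noise recordings live in a folder with this name.
BACKGROUND_FOLDER = "_background_noise_"


def to_challenge_label(folder_name):
    """Map a per-word folder name to one of the 12 challenge classes."""
    word = folder_name.strip().lower()
    if word in TARGET_COMMANDS:
        return word
    if word in (SILENCE_LABEL, BACKGROUND_FOLDER):
        return SILENCE_LABEL
    # Any other spoken word is collapsed into the unknown class.
    return UNKNOWN_LABEL


def to_class_index(folder_name):
    """Return the integer class index (0..11) for a folder name."""
    return CLASSES.index(to_challenge_label(folder_name))
```

For example, `to_challenge_label("stop")` keeps its own label, while a word outside the ten commands (say, a hypothetical `"bed"` folder) is mapped to `unknown`.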

Implementation

I implemented three neural network architectures:

Combination of RNN LSTM nodes and CNN
CNN with residual blocks, similar to ResNet
Deep RNN LSTM network

Using these, I compared their performance at detecting the 12 speech commands. The audio data is preprocessed into spectrogram images, followed by data augmentation and normalization. The three architectures achieved test accuracies of 74%, 76%, and 71%, respectively.
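The preprocessing pipeline described above (spectrogram generation, augmentation, normalization) might look roughly like the NumPy sketch below. It is not the repository's implementation. The 16 kHz sample rate, the 25 ms frame and 10 ms hop, the random time shift as the augmentation, and all function names are assumptions chosen for illustration; the sketch also augments the waveform before computing the spectrogram, which is only one possible ordering.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumption: clips are 16 kHz mono


def pad_or_trim(samples, length=SAMPLE_RATE):
    """Force a clip to exactly `length` samples (one second by default)."""
    samples = np.asarray(samples, dtype=np.float32)
    if len(samples) >= length:
        return samples[:length]
    return np.pad(samples, (0, length - len(samples)))


def time_shift(samples, max_shift, rng):
    """Augmentation: shift the clip by a random offset, filling with zeros."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.roll(samples, shift)  # np.roll returns a new array
    if shift > 0:
        shifted[:shift] = 0.0   # clear samples that wrapped to the front
    elif shift < 0:
        shifted[shift:] = 0.0   # clear samples that wrapped to the back
    return shifted


def log_spectrogram(samples, frame_len=400, hop=160, eps=1e-10):
    """Log-magnitude spectrogram from Hann-windowed, overlapping frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([
        samples[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)  # eps avoids log(0) on silent frames


def normalize(spec):
    """Scale a spectrogram to zero mean and (approximately) unit variance."""
    return (spec - spec.mean()) / (spec.std() + 1e-8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Half a second of a 440 Hz tone stands in for a real recording.
    t = np.arange(SAMPLE_RATE // 2) / SAMPLE_RATE
    clip = pad_or_trim(np.sin(2 * np.pi * 440.0 * t))
    clip = time_shift(clip, max_shift=1600, rng=rng)
    image = normalize(log_spectrogram(clip))
    print(image.shape)
```

With these assumed settings, a one-second clip yields a 98 x 201 array (frames x frequency bins) that can be fed to a network as a single-channel image.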

RNN: Recurrent Neural Network

LSTM: Long Short Term Memory

CNN: Convolutional Neural Network


sdhayalk/TensorFlow_Speech_Recognition_Challenge
