Automated Speech Recognition, Manifold Learning & Agglomerative Clustering

akshayjoshii/Speech-Recognition

Introduction:

The .tsv file (removed from the repo) contains phoneme vectors, also called phoneme embeddings, obtained from a neural model of grapheme-to-phoneme (g2p) conversion. Each line in the file is one phoneme embedding: the first entry is the phoneme symbol in IPA, and the remaining 236 entries are real-valued numbers that form the corresponding 236-dimensional vector.
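Since the .tsv file is no longer in the repo, the layout described above can be illustrated with a small parsing sketch. The function name `read_phoneme_embeddings` and the three-dimensional sample rows are hypothetical and only mimic the described format (symbol first, then tab-separated floats); this is not the loader used in SLR.py.

```python
import csv
import io


def read_phoneme_embeddings(handle):
    """Parse tab-separated lines of the form: symbol, v1, v2, ..., vN.

    Hypothetical helper: returns a dict mapping each phoneme symbol
    to its embedding as a list of floats.
    """
    embeddings = {}
    # QUOTE_NONE keeps IPA symbols that look like quote characters intact.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip empty lines
        symbol, values = row[0], row[1:]
        embeddings[symbol] = [float(v) for v in values]
    return embeddings


# Made-up sample with 3 dimensions instead of 236, for illustration only.
sample = "a\t0.1\t-0.2\t0.3\nb\t0.0\t0.5\t-1.0\n"
embeddings = read_phoneme_embeddings(io.StringIO(sample))
```

With the real file, the same function could be called on `open(path, encoding="utf-8", newline="")`, where `path` is wherever the .tsv file is stored locally.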

Tasks:

  1. Conduct a short literature review on phoneme embeddings (vector space models, VSM).
  2. Read the dataset into a suitable data structure (e.g., Pandas data frame, Python dictionary, NumPy array).
  3. Compute the pairwise cosine similarity between the phonemes represented by the embeddings and obtain a confusion matrix of similarity scores.
  4. Explore the embedding space with different techniques, for example dimensionality reduction and visualization (e.g., PCA, t-SNE) as well as different clustering analyses.
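The pairwise cosine similarity in task 3 can be sketched with NumPy alone: normalise each row to unit length, then take the matrix product with its transpose. The function name `cosine_similarity_matrix` and the random input below are assumptions for illustration; the real input would be the phoneme embedding matrix with one row per phoneme.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Return the (n, n) matrix of cosine similarities between rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = X / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Random stand-in for the embeddings: 5 "phonemes", 236 dimensions each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 236))
S = cosine_similarity_matrix(X)
```

The result is symmetric with ones on the diagonal, so it can be passed directly to a heatmap plotting function.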

Execution Instructions:

  1. Install Python 3
  2. Install pip
  3. Run "pip install -r requirements.txt"
  4. Run "python SLR.py"

Implemented Functions:

  1. Pairwise Cosine Similarity Heatmap/Confusion Matrix
  2. Agglomerative Clustering & Dendrogram Visualization
  3. Principal Component Analysis (PCA)
  4. Independent Component Analysis (ICA)
  5. t-Distributed Stochastic Neighbor Embedding (t-SNE)
  6. Multidimensional Scaling (MDS - Metric)
  7. PCA - DBSCAN Clustering

Visualizations:

Cosine Similarity Heatmap:

Heatmap of pairwise cosine similarity of phoneme vectors

Agglomerative Clustering Dendrograms:

Dendrograms for four linkage methods: ward, complete, average, single
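The four linkage methods can be compared with SciPy's hierarchical clustering routines. This is a minimal sketch on synthetic data (two well-separated blobs), assuming Euclidean distance; it is not taken from SLR.py, and the variable names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in data: two tight, well-separated groups of 10 points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 8)),
    rng.normal(5.0, 0.1, size=(10, 8)),
])

labels_by_method = {}
for method in ("ward", "complete", "average", "single"):
    # linkage uses Euclidean distance by default (required for "ward").
    Z = linkage(X, method=method)
    # Cut the tree into at most two flat clusters.
    labels_by_method[method] = fcluster(Z, t=2, criterion="maxclust")
```

Each linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the corresponding tree with Matplotlib.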

Principal Component Analysis:

Number of PCs vs. Cumulative Variance:

Cumulative Variance
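A cumulative variance curve of this kind can be computed from the singular values of the centred data matrix. The sketch below uses NumPy only and random data; the function name `cumulative_explained_variance` and the 95% threshold are assumptions chosen for illustration.

```python
import numpy as np


def cumulative_explained_variance(X):
    """Cumulative fraction of variance explained by the first k PCs."""
    Xc = X - X.mean(axis=0)                  # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to PC variances
    return np.cumsum(var) / var.sum()


# Random stand-in data: 40 samples with 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))
cum = cumulative_explained_variance(X)

# Smallest number of components reaching 95% of the total variance.
k = int(np.searchsorted(cum, 0.95) + 1)
```

Plotting `cum` against the component index gives the curve; `k` marks where the chosen threshold is first reached.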

50-dimensional data points reduced to 3 dimensions using 3 principal components:

PCA

Independent Component Analysis:

ICA

t-Distributed Stochastic Neighbor Embedding (t-SNE) - Manifold Learning:

tsne
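A two-dimensional t-SNE projection can be sketched with scikit-learn as follows. The data is random and the hyperparameters (`perplexity=5.0`, PCA initialisation) are assumptions for illustration, not the settings used to produce the figure above.

```python
import numpy as np
from sklearn.manifold import TSNE

# Random stand-in data: 30 points in 50 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))

# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=5.0, init="pca", random_state=0)
Y = tsne.fit_transform(X)  # shape (30, 2), one 2-D point per input row
```

Because t-SNE is stochastic and sensitive to perplexity, it is usually worth comparing a few settings before reading structure into the plot.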

Multidimensional Scaling (MDS - Metric) - Manifold Learning:

mds

DBSCAN Clustering:

dbscan
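The PCA followed by DBSCAN pipeline can be sketched with scikit-learn. The synthetic blobs and the values `eps=1.0` and `min_samples=3` are assumptions chosen so the toy example separates cleanly; the real embeddings would need their own tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

# Synthetic stand-in data: two tight blobs of 20 points in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 50)),
    rng.normal(3.0, 0.1, size=(20, 50)),
])

# Reduce to 3 principal components, then cluster in the reduced space.
X3 = PCA(n_components=3, random_state=0).fit_transform(X)
labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(X3)
# DBSCAN marks noise points with the label -1.
```

Clustering in the reduced space keeps distances easier to interpret, which helps when choosing `eps`.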

Author

Akshay Joshi [Universität des Saarlandes]
