Voice Processing and Analysis System

This project is a voice processing and analysis system implemented in Python. It records audio, extracts features from it, compares voice samples, and trains a machine learning model for voice recognition.

Features

Audio Recording: Record audio samples using the sounddevice library.
Noise Addition: Add Gaussian noise to audio samples for robustness.
Feature Extraction: Extract MFCC (Mel-Frequency Cepstral Coefficients) features from audio using librosa.
Voice Comparison: Compare voice features to identify matching samples.
Model Training: Train a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) model for voice recognition using Keras.
Web Server Interface: Provide a web interface using socketio and aiohttp for real-time voice analysis.
Voice Identification in Group Conversations: Identify and extract a person's voice from a group conversation.
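
As an illustration of the noise-addition feature listed above, here is a minimal sketch, assuming the audio is held in a NumPy float array. The function name `add_gaussian_noise` and the `snr_db` parameter are hypothetical and are not taken from the project's code, which may add noise differently.

```python
import numpy as np


def add_gaussian_noise(signal, snr_db=20.0, rng=None):
    """Return a copy of `signal` with white Gaussian noise at roughly `snr_db` dB SNR.

    Hypothetical helper for illustration; the project's own implementation may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    # Average power of the clean signal.
    signal_power = np.mean(signal ** 2)
    # Noise power that yields the requested signal-to-noise ratio.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Example: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = add_gaussian_noise(clean, snr_db=20.0, rng=np.random.default_rng(0))
```

Expressing the noise level as a signal-to-noise ratio rather than a fixed standard deviation keeps the augmentation comparable across quiet and loud recordings.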

Dependencies

Python 3.x
NumPy
sounddevice
Matplotlib
SciPy
librosa
Keras
TensorFlow
scikit-learn
SpeechRecognition
pydub
aiohttp
python-socketio
threading and asyncio (part of the Python standard library; no installation needed)

Installation

Clone the repository.

Install the required dependencies:

pip install numpy sounddevice matplotlib scipy librosa keras tensorflow scikit-learn SpeechRecognition pydub aiohttp python-socketio

Alternatively, build the Docker image:
```
docker build -t speaker-recognizer .
```

Run the script:

python main.py

Or run the container with Docker:

docker run -p 5678:5678 speaker-recognizer

Usage

Upon running the script:

The system will print the available audio devices.
Follow the prompts to record voice samples and group conversations.
The system will automatically process the recordings, extract features, and perform voice comparisons.
To train the model, place .wav files in the audio_training_data/ directory and run the training section of the script.
Access the web server interface on http://localhost:5678/ for real-time analysis.
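
The comparison step mentioned in the list above could be sketched as a cosine similarity between time-averaged MFCC vectors. This is only one possible approach: the function names and the `threshold` value of 0.9 are assumptions for illustration, and the project may use a different similarity measure.

```python
import numpy as np


def mean_feature_vector(mfcc):
    """Average an (n_coefficients, n_frames) MFCC matrix over time."""
    return np.asarray(mfcc, dtype=np.float64).mean(axis=1)


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


def is_same_speaker(mfcc_a, mfcc_b, threshold=0.9):
    """Hypothetical decision rule: similarity at or above `threshold` counts as a match."""
    similarity = cosine_similarity(mean_feature_vector(mfcc_a), mean_feature_vector(mfcc_b))
    return similarity >= threshold


# Synthetic stand-ins for MFCC matrices (13 coefficients, 100 frames).
rng = np.random.default_rng(0)
profile = rng.normal(size=(13, 1))
sample_a = profile + 0.1 * rng.normal(size=(13, 100))
sample_b = profile + 0.1 * rng.normal(size=(13, 100))
sample_c = -profile + 0.1 * rng.normal(size=(13, 100))
```

Here `sample_a` and `sample_b` share the same underlying profile and compare as a match, while `sample_c` does not. With real recordings the threshold would need to be tuned on known speaker pairs.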

Contributing

Contributions to this project are welcome. Please fork the repository and submit pull requests with your enhancements.

Running Tests

To run the tests for the application, navigate to the project directory and execute:

python -m unittest discover -s tests
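
With the command above, unittest looks in the tests directory for files matching its default pattern `test*.py`. As a rough sketch of what such a module might contain, the example below tests a made-up helper, `normalize_peak`, which is defined inline so the example is self-contained; it is not part of the project.

```python
import unittest


def normalize_peak(samples):
    """Hypothetical helper: scale samples so the largest magnitude is 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]


class NormalizePeakTest(unittest.TestCase):
    def test_peak_is_one(self):
        result = normalize_peak([0.1, -0.5, 0.25])
        self.assertAlmostEqual(max(abs(s) for s in result), 1.0)

    def test_silence_is_unchanged(self):
        self.assertEqual(normalize_peak([0.0, 0.0]), [0.0, 0.0])
```

In a real test module the helper would be imported from the project's own code instead of being defined in the test file.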

Screenshots

Below are various screenshots depicting different functionalities of the voice processing system:

MFCC with Deltas Visualization:

This image shows the extracted MFCC features together with their delta coefficients, visualized as a heatmap. Each column represents a time frame and each row a cepstral coefficient. Color intensity reflects the magnitude of the coefficient values.
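
Delta coefficients approximate how each cepstral coefficient changes over time. The sketch below computes them with a common regression formula in plain NumPy. The function name `delta_features` and the `width` parameter are assumptions for illustration; librosa offers `librosa.feature.delta` for this purpose, and the project may rely on that instead.

```python
import numpy as np


def delta_features(features, width=2):
    """Regression-based delta over time for an (n_coefficients, n_frames) matrix.

    d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    Edge frames are handled by repeating the first and last frame.
    """
    features = np.asarray(features, dtype=np.float64)
    n_frames = features.shape[1]
    padded = np.pad(features, ((0, 0), (width, width)), mode="edge")
    denominator = 2.0 * sum(n * n for n in range(1, width + 1))
    delta = np.zeros_like(features)
    for n in range(1, width + 1):
        # Frame t lives at padded index t + width.
        ahead = padded[:, width + n : width + n + n_frames]
        behind = padded[:, width - n : width - n + n_frames]
        delta += n * (ahead - behind)
    return delta / denominator


# Stand-in for an MFCC matrix: each coefficient is a linear ramp over time.
mfcc = np.outer(np.arange(1, 4), np.arange(10.0))   # shape (3, 10)
stacked = np.vstack([mfcc, delta_features(mfcc)])   # MFCCs with deltas, shape (6, 10)
```

Stacking the deltas below the static coefficients gives the kind of matrix a heatmap like the one described above would display, with time along the horizontal axis.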

Available Audio Devices:

Here we see a list of available audio devices detected by the system. The device IDs are used to select appropriate input devices for recording audio.

Recording Prompts:

The system prompts the user to record individual and group voice samples, providing clear instructions for interaction.

Model Training Progress:

This console output captures the progress of the machine learning model training over epochs, displaying loss and accuracy metrics.

Voice Identification Result:

After processing, the system outputs the result of voice identification, indicating to which group the analyzed voice sample belongs.

Server Running:

This screenshot shows the web server running; the interface is served at http://localhost:5678/.

License

This project is licensed under the MIT License.

!!! IMPORTANT !!!

Note: The system is designed for educational and research purposes and may require further modifications for production-level deployment.

Example service frontend implementation in Angular

web-socket.service.ts:

    import { Injectable } from '@angular/core';
    import { Observable } from 'rxjs';
    import io from 'socket.io-client';

    @Injectable({ providedIn: 'root' })
    export class WebSocketService {
      private socket: any;
      private readonly uri: string = 'http://localhost:5678';

      constructor() {
        // Open the Socket.IO connection to the Python server.
        this.socket = io(this.uri);
      }

      // Emits the payload of every 'response' event sent by the server.
      public onGetResult(): Observable<string> {
        return new Observable<string>(observer => {
          this.socket.on('response', (data: string): void => {
            observer.next(data);
          });
        });
      }

      // Sends data to the server with the 'get_result' event.
      public emitGetResult(data: any): void {
        this.socket.emit('get_result', data);
      }
    }
