Text-to-Speech Alignment Project

Project Overview

This project explores and implements text-to-speech (TTS) alignment techniques, with the aim of improving the quality and efficiency of TTS systems. The work spans three approaches, each addressing a different aspect of the alignment problem.

Project Structure

This repository is organized into three main branches, each representing a distinct approach to TTS alignment:

MoBoAligner
- Status: Completed, for reference only
- Description: Unofficial implementation of the "MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search" paper
- Purpose: Learning and baseline comparison
- Limitation: Not suitable for large-scale applications due to maximum duration constraints

RoMoAligner
- Status: Development halted, for reference only
- Description: Experimental improvement attempt combining Rough Alignment with MoBoAligner
- Purpose: Explore self-supervised learning techniques in TTS alignment
- Limitation: Performance improvements were limited and did not meet expectations

OTA 👈 Current Focus
- Status: In active planning and early development
- Description: Adaptation of the "One TTS Alignment To Rule Them All" (OTA) method for implicit pause modeling
- Goal: Develop a solution for handling implicit pauses without relying on explicit silence tokens
- Progress: Conceptual development and planning phase
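
For readers new to the topic, the three branches above share one underlying task: assigning each speech frame to a text token so that the assignment only moves forward through the text. The following is a minimal, generic sketch of a hard monotonic alignment search using dynamic programming. It is not taken from any branch of this repository, the function names `monotonic_alignment_search` and `durations` are hypothetical, and it does not model pauses; it only illustrates the monotonic constraint that these methods build on.

```python
import math


def monotonic_alignment_search(log_probs):
    """Return, for each frame, the index of the token it is aligned to.

    log_probs[i][j] scores frame j under token i. The path starts on the
    first token, ends on the last token, and at every frame either stays
    on the current token or advances to the next one.
    """
    n_tokens = len(log_probs)
    n_frames = len(log_probs[0])
    if n_tokens > n_frames:
        raise ValueError("need at least one frame per token")

    # best[i][j]: best cumulative score of a path on token i at frame j.
    best = [[-math.inf] * n_frames for _ in range(n_tokens)]
    best[0][0] = log_probs[0][0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_tokens)):
            stay = best[i][j - 1]
            advance = best[i - 1][j - 1] if i > 0 else -math.inf
            best[i][j] = max(stay, advance) + log_probs[i][j]

    # Backtrack from the last token at the last frame.
    path = [0] * n_frames
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        path[j] = i
        if j > 0 and i > 0 and best[i - 1][j - 1] >= best[i][j - 1]:
            i -= 1
    return path


def durations(path, n_tokens):
    """Count how many frames each token received."""
    counts = [0] * n_tokens
    for token in path:
        counts[token] += 1
    return counts


# Toy input: 3 tokens, 6 frames, with high probability along a known path.
true_path = [0, 0, 1, 1, 1, 2]
log_probs = [
    [math.log(0.8 if true_path[j] == i else 0.1) for j in range(6)]
    for i in range(3)
]
path = monotonic_alignment_search(log_probs)
print(path, durations(path, 3))
```

The per-token frame counts returned by `durations` are the kind of quantity a non-autoregressive TTS model typically needs from an aligner. The nested loops are written for readability rather than speed.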

Current Focus

Our primary focus is on the OTA branch, where we're exploring ways to adapt the OTA method for improved alignment, especially in handling implicit pauses in speech.

How to Use This Repository

- Check out each branch for specific implementation details and progress.
- Refer to individual branch READMEs for setup and usage instructions.
- For the latest developments, focus on the OTA branch.

Contributing

We welcome contributions to any of our branches. If you're interested in contributing:

- Check the issues in the relevant branch for tasks you can help with.
- Fork the repository and create a pull request with your improvements.
- For major changes, please open an issue first to discuss what you would like to change.

Roadmap

- Implement MoBoAligner (unofficial implementation)
- Develop and test RoMoAligner
- Adapt and implement OTA for implicit pause modeling
- Conduct comparative studies across all methods
- Refine and optimize the most promising approach

Acknowledgments

- Original MoBoAligner paper
- OTA paper

We appreciate the support and interest from the TTS and speech processing community in advancing this research.

Repository: xiaozhah/Aligner