
A server that accepts TTS requests, sends them to ElevenLabs to generate the audio, and forwards that audio to a connected client. It also supports a connection to Pally.gg.


Johnnycyan/AI-Twitch-TTS


AI-TWITCH-TTS

Empower your streams with dynamic voice interactions.


Overview

AI-Twitch-TTS is a real-time Twitch text-to-speech application built for interactive streaming. It manages WebSocket connections for audio streaming, processes chat requests, and calls external APIs for voice synthesis. It offers customizable voice options, real-time chat handling, and automatic WebSocket reconnection. The project's modular design is intended to support dependency integration, automated testing, and CI/CD workflows for development and deployment.


Example Usage from Samifying

[Video: AT-cm_GFGdmTEpggCtiNbY0qTQ0w.mp4]

Features

Feature | Description
⚙️ Architecture | Server-side application using WebSockets for real-time audio streaming, with client-side support for Twitch text-to-speech. Maintains a web server to handle HTTP requests and WebSocket connections.
🔩 Code Quality | Well-structured codebase with clear separation of concerns, detailed inline comments, and consistent naming conventions.
📄 Documentation | Documentation with explanations for modules such as logging, environment setup, WebSocket handling, and HTTP endpoints.
🔌 Integrations | Relies on external libraries such as godotenv, go-randomdata, and WebSocket for Go for environment variable loading, random data generation, WebSocket communication, and real-time audio streaming.
🧩 Modularity | Separate modules for logging, WebSocket handling, text-to-speech requests, alerts retrieval, and Pally WebSocket connections, designed for reusability and maintainability.

Getting Started

System Requirements:

  • Internet

  • ffmpeg

Installation

From releases

  1. Download the latest release from the repository's Releases page.

  2. Create a ./alerts/<channel> folder with alert sound(s) in it for Pally (optional)

  3. Create a ./effects folder with effect sound(s) in it for effect tags

  4. Create a .env file in the same directory

  5. Fill in the required environment variables, described below and in .env.example


From docker

  1. Create a ./effects folder with effect sound(s) in it for effect tags

  2. Create a ./alerts/<channel> folder with alert sound(s) in it for Pally (optional)

  3. Either create a .env file with the required environment variables, described below and in .env.example, or set them directly in the compose file.

docker-compose.yml

version: "3.8"
services:
  ai-twitch-tts:
    image: johnnycyan/ai-twitch-tts:main
    container_name: tts
    ports:
      - 6969:8080
    environment:
      - ELEVENLABS_KEY=${ELEVENLABS_KEY}
      - ELEVENLABS_PRICE=${ELEVENLABS_PRICE}
      - SERVER_URL=${SERVER_URL}
      - SENTRY_URL=${SENTRY_URL}
      - TTS_KEY=${TTS_KEY}
      - PALLY_KEYS=${PALLY_KEYS}
      - PALLY_VOICES=${PALLY_VOICES}
      - VOICES=${VOICES}
      - VOICE_MODELS=${VOICE_MODELS}
      - VOICE_STYLES=${VOICE_STYLES}
      - VOICE_MODIFIERS=${VOICE_MODIFIERS}
      - MONGO_HOST=mongodb
      - MONGO_PORT=27017
      - MONGO_USER=${MONGO_USER}
      - MONGO_PASS=${MONGO_PASS}
      - MONGO_DB=${MONGO_DB}
      - FFMPEG_ENABLED=true
    volumes:
      - ./effects:/app/effects
      - ./alerts:/app/alerts
    depends_on:
      - mongodb
  mongodb:
    image: mongo
    container_name: tts-mongo
    restart: always
    environment:
      - MONGO_INITDB_ROOT_USERNAME=${MONGO_USER}
      - MONGO_INITDB_ROOT_PASSWORD=${MONGO_PASS}
    volumes:
      - mongodb_data:/data/db
volumes:
  mongodb_data:

Variable | Description
ELEVENLABS_KEY | ElevenLabs API key
SERVER_URL | Host name where the server will be reachable, without the protocol (e.g. example.com)
TTS_KEY | Secret key used to authenticate TTS generation
VOICES | JSON string list of name/id pairs for ElevenLabs voices
VOICE_MODELS | JSON string list of name/model pairs for ElevenLabs voices (optional)
VOICE_STYLES | JSON string list of name/style pairs for ElevenLabs voices (optional)
VOICE_MODIFIERS | JSON string list of name/modifier pairs for ElevenLabs voices (optional)
PALLY_KEYS | JSON string list of name/key pairs for Pally (optional)
PALLY_VOICES | JSON string list of channel/voice pairs for Pally (optional)
SENTRY_URL | URL for Sentry logging of the client (optional)
MONGO_HOST | Host of the MongoDB server (optional)
MONGO_PORT | Port for MongoDB (optional)
MONGO_USER | Username for MongoDB (optional)
MONGO_PASS | Password for MongoDB (optional)
MONGO_DB | Database name for MongoDB (optional)
ELEVENLABS_PRICE | Monthly price of the ElevenLabs subscription (optional)
FFMPEG_ENABLED | Boolean indicating whether ffmpeg is installed (ffmpeg is required)

Usage

From releases

⚠️ Might not work without an SSL connection. Has not been tested.

  1. Run AI-Twitch-TTS using one of the commands below. The logging mode is optional. Options: info, debug, fountain
$ ./AI-Twitch-TTS <port> <logging-mode>

or

$ AI-Twitch-TTS.exe <port> <logging-mode>
  2. Add this URL to OBS as a browser source:
http(s)://$SERVER_URL/?channel=<username>
  3. Generate TTS by accessing this URL either through a browser or a Twitch chat bot (voice is optional). See Advanced Usage for how to use multiple voices and effects in one message.
http(s)://$SERVER_URL/tts?channel=<username>&key=$TTS_KEY&voice=<voicename>&text=<text to generate>
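
Because the TTS text travels in a query string, spaces, ampersands, and the square brackets used by voice and effect tags need to be percent-encoded when a script or bot builds the URL. The following is a minimal Python sketch of that step, not part of the project itself. The function names browser_source_url and tts_url are hypothetical helpers, and the example host, channel, and key are placeholders; only the paths and the channel, key, voice, and text parameters come from the URLs above.

```python
from urllib.parse import quote, urlencode


def browser_source_url(server_url, channel, scheme="https"):
    """Build the URL to add to OBS as a browser source (hypothetical helper)."""
    query = urlencode({"channel": channel}, quote_via=quote)
    return f"{scheme}://{server_url}/?{query}"


def tts_url(server_url, channel, key, text, voice=None, scheme="https"):
    """Build a TTS request URL with percent-encoded parameters (hypothetical helper)."""
    params = {"channel": channel, "key": key}
    if voice is not None:
        # voice is optional, so it is only sent when provided.
        params["voice"] = voice
    params["text"] = text
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{scheme}://{server_url}/tts?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    # Placeholder values; substitute your own SERVER_URL, channel, and TTS_KEY.
    print(browser_source_url("example.com", "streamer"))
    print(tts_url("example.com", "streamer", "secret", "hello world", voice="voice"))
```

Avoid pasting the generated TTS URL anywhere public, since it contains TTS_KEY.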

From docker

⚠️ Might not work without an SSL connection. Has not been tested.

  1. Add this URL to OBS as a browser source:
http(s)://$SERVER_URL/?channel=<username>
  2. Generate TTS by accessing this URL either through a browser or a Twitch chat bot (voice is optional). See Advanced Usage for how to use multiple voices and effects in one message.
http(s)://$SERVER_URL/tts?channel=<username>&key=$TTS_KEY&voice=<voicename>&text=<text to generate>

Advanced Usage

[v-voicename] is a voice tag meaning any text written after it will be spoken with that voice.

[e-effectname] is an effect tag which will play an effect.

(reverb) adds reverb to a TTS message.

If you use a tag in a message, you MUST use voice tags for all the text you want to say.

Valid: [v-voice] this is text and then an effect [e-effect]

Invalid: this text has no voice tag [e-effect]

Valid: [v-voice] this is text and then an effect [e-effect] [v-voice] and then more text

Invalid: [v-voice] this is text and then an effect [e-effect] this text has no voice tag

Valid: [e-effect] [v-voice] this is text

Invalid: [e-effect] this text has no voice tag

Example of reverb:

(reverb) this is reverbed text

[v-voice] (reverb) this is reverbed text with a specific voice.
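
A chat bot can check a message against the tag rules above before sending it to the server. The sketch below is one possible way to do that in Python and is not the server's actual parser. The function name parse_message and its return format are hypothetical. It also assumes, based on the examples above, that text following an effect tag needs a new voice tag. The (reverb) marker is not handled and is left as part of the text.

```python
import re

# Matches [v-name] and [e-name] tags in the form described above.
TAG = re.compile(r"\[(v|e)-([^\]\s]+)\]")


def parse_message(message):
    """Split a message into ("text", voice, text) and ("effect", name) parts.

    Raises ValueError when a tagged message contains text with no voice tag.
    """
    if not TAG.search(message):
        # No tags at all: plain text, voice left unspecified.
        return [("text", None, message.strip())]

    parts = []
    voice = None  # voice currently in effect, if any
    pos = 0
    # The trailing None entry lets the loop also process text after the last tag.
    for match in list(TAG.finditer(message)) + [None]:
        end = match.start() if match else len(message)
        text = message[pos:end].strip()
        if text:
            if voice is None:
                raise ValueError(f"text without a voice tag: {text!r}")
            parts.append(("text", voice, text))
        if match is None:
            break
        kind, name = match.groups()
        if kind == "v":
            voice = name
        else:
            parts.append(("effect", name))
            # Assumption: text after an effect tag needs its own voice tag.
            voice = None
        pos = match.end()
    return parts


if __name__ == "__main__":
    for part in parse_message("[v-voice] hello [e-effect] [v-voice] and more"):
        print(part)
```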


License

This project is licensed under the MIT License. For more details, refer to the LICENSE file.

