The central control unit and orchestrator of the ChatTTS application. It manages the entire text-to-speech workflow, coordinating model loading, asset management, text pre-processing, model inference, and audio output generation. It is the primary user-facing interface, integrating the sub-components listed below to produce the final synthesized speech.
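A minimal end-to-end sketch of that workflow, assuming the high-level `ChatTTS.Chat` interface with `load()` and `infer()`; the exact call signatures and output shapes here are illustrative and should be checked against the installed version:

```python
# Minimal sketch of the orchestrated workflow; load()/infer() parameters are
# assumptions based on the high-level Chat interface, not a verified API listing.
import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=False)          # fetch/load GPT, DVAE, tokenizer, and speaker assets

texts = ["Hello, welcome to ChatTTS."]
wavs = chat.infer(texts)          # normalization -> tokenization -> GPT -> DVAE decode

# Each element of `wavs` is a waveform array; a 24 kHz output rate is assumed here.
wav = torch.from_numpy(wavs[0])
if wav.dim() == 1:                # torchaudio expects (channels, samples)
    wav = wav.unsqueeze(0)
torchaudio.save("output.wav", wav, 24000)
```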
Components
ChatTTS.core.Chat
The central orchestrator of the ChatTTS application, managing the overall text-to-speech workflow.
ChatTTS.model.gpt
The generative transformer model that converts processed text and speaker information into latent speech representations, which are later decoded into audio. It is the core AI model for speech synthesis.
ChatTTS.model.dvae
The Discrete Variational Autoencoder (DVAE) model. It encodes raw audio into discrete latent codes and decodes these codes back into audio, crucial for handling audio representations within the synthesis pipeline.
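The DVAE's interface is internal to the pipeline; the toy sketch below only illustrates the underlying idea of mapping continuous audio features to discrete codebook indices and back. All names, shapes, and the quantizer itself are illustrative, not the ChatTTS implementation.

```python
# Toy illustration of the encode -> discrete codes -> decode idea behind a DVAE.
# NOT the ChatTTS implementation; it only shows how a codebook maps continuous
# features to discrete indices and looks them back up on decode.
import torch

class ToyQuantizer:
    def __init__(self, codebook_size=8, dim=4):
        self.codebook = torch.randn(codebook_size, dim)

    def encode(self, features):                       # features: (T, dim)
        dists = torch.cdist(features, self.codebook)  # distance to each code vector
        return dists.argmin(dim=-1)                   # one discrete code index per frame

    def decode(self, codes):                          # codes: (T,)
        return self.codebook[codes]                   # recover (quantized) feature vectors

quantizer = ToyQuantizer()
frames = torch.randn(10, 4)          # 10 frames of continuous audio features
codes = quantizer.encode(frames)     # e.g. tensor([3, 1, 7, ...])
recovered = quantizer.decode(codes)  # quantized approximation of the frames
```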
ChatTTS.model.speaker
Manages speaker embeddings, allowing the system to generate speech in various voices. It can sample random speaker embeddings or encode speaker characteristics from audio.
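A hedged sketch of how a sampled speaker embedding is typically fed back into inference; the `sample_random_speaker()` helper and the `InferCodeParams` container follow commonly documented usage, but both names should be treated as assumptions here:

```python
# Sketch only: sample a random voice and reuse it across inference calls.
# InferCodeParams and its fields are assumptions based on common ChatTTS usage.
import ChatTTS

chat = ChatTTS.Chat()
chat.load(compile=False)

spk_emb = chat.sample_random_speaker()      # string-encoded speaker embedding

params = ChatTTS.Chat.InferCodeParams(
    spk_emb=spk_emb,                        # fix the voice across calls
    temperature=0.3,                        # lower values give more stable prosody
)
wavs = chat.infer(["Same voice every run."], params_infer_code=params)
```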
ChatTTS.model.tokenizer
Converts raw text input into the sequence of numerical tokens that the GPT model consumes; a critical step in text pre-processing.
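Conceptually, the tokenizer maps text to integer IDs and back. The sketch below uses a generic Hugging Face tokenizer as a stand-in; the model name is a placeholder, not the tokenizer vocabulary actually bundled with the ChatTTS assets.

```python
# Stand-in illustration of text -> token IDs -> text; "bert-base-uncased" is a
# placeholder model, not the tokenizer shipped with ChatTTS.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
ids = tokenizer.encode("Hello, ChatTTS!", add_special_tokens=False)
print(ids)                      # numeric tokens a transformer model can consume
print(tokenizer.decode(ids))    # round-trips back to (normalized) text
```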
ChatTTS.model.embed
Generates the input embeddings, such as text-token embeddings, that feed the GPT model.
ChatTTS.config.Config
Defines and manages application-wide configuration settings and parameters, including model paths and synthesis parameters.
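An illustrative sketch of what such a configuration object can look like; every field name below is hypothetical and serves only to show the split between model paths and synthesis parameters.

```python
# Hypothetical shape of a configuration object; all field names are illustrative.
from dataclasses import dataclass

@dataclass
class ExampleConfig:
    gpt_ckpt_path: str = "asset/gpt"      # where model weights are expected (assumed path)
    dvae_ckpt_path: str = "asset/dvae"
    sample_rate: int = 24000              # synthesis output rate (assumed)
    temperature: float = 0.3              # default sampling parameter

cfg = ExampleConfig()
print(cfg.gpt_ckpt_path, cfg.sample_rate)
```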
ChatTTS.norm.Normalizer
Handles text normalization, converting raw text into a standardized format suitable for tokenization and speech synthesis, including handling homophones.
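A small standalone illustration of the kind of rewriting a text normalizer performs before tokenization; the function below is an example of the technique, not the ChatTTS `Normalizer` API.

```python
# Standalone illustration of text normalization (not the ChatTTS Normalizer API):
# spell out digits and strip characters the tokenizer should never see.
import re

_DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
           "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize(text: str) -> str:
    text = re.sub(r"\d", lambda m: " " + _DIGITS[m.group(0)] + " ", text)  # digits -> words
    text = re.sub(r"[^a-zA-Z' ]+", " ", text)                              # drop stray symbols
    return re.sub(r"\s+", " ", text).strip().lower()

print(normalize("Call me at 5pm!"))   # -> "call me at five pm"
```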
ChatTTS.utils
A collection of general-purpose utility functions supporting various aspects of the application, such as downloading models, managing GPU resources, and file I/O.
Referenced Source Code
ChatTTS.model.velocity
A package dedicated to optimized LLM inference, providing components for efficient model loading, execution, sampling, and scheduling; it is used when `use_vllm` is enabled.
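If the flag is exposed on the loading call, opting in would look roughly like the sketch below; whether `Chat.load()` accepts `use_vllm` directly is an assumption inferred from the flag name, not a verified signature.

```python
# Sketch only: opt in to the velocity (vLLM-based) inference path at load time.
# Whether Chat.load() accepts use_vllm directly is an assumption to verify.
import ChatTTS

chat = ChatTTS.Chat()
chat.load(compile=False, use_vllm=True)   # route GPT inference through the velocity package
wavs = chat.infer(["Testing the optimized inference path."])
```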