
CoquiTTS: An Open-Source Text-To-Speech Library

a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Hazem Abbas

Nov 20, 2022 — 1 min read


CoquiTTS is a library for advanced Text-to-Speech generation. Built on the latest research, it is designed to achieve the best trade-off among ease of training, speed, and quality.

It comes with pretrained models and tools for measuring dataset quality, and it is already used in 20+ languages for products and research projects.

CoquiTTS is written in Python, and it can be a handy tool for video game development, post-production, dubbing, and creating educational videos.

CoquiTTS developers are now working on Coqui Studio, which will offer a simple, user-friendly interface for cloning voices and creating text-to-speech audio in MP3 format.

Features

High-performance Deep Learning models for Text2Speech tasks.
Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).
Speaker Encoder to compute speaker embeddings efficiently.
Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN)
Fast and efficient model training.
Detailed training logs on the terminal and Tensorboard.
Support for Multi-speaker TTS.
Efficient, flexible, lightweight but feature-complete Trainer API.
Released and ready-to-use models.
Tools to curate Text2Speech datasets under dataset_analysis.
Utilities to use and test your models.
Modular (but not too much) code base enabling easy implementation of new ideas.
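
To make the "released and ready-to-use models" point concrete, here is a minimal sketch of driving the library's `tts` command-line tool from Python. It assumes the package was installed with `pip install TTS`. The helper names `build_tts_command` and `synthesize` are made up for this illustration, and the model id is only an example; run `tts --list_models` to see what your installed version actually offers.

```python
import shutil
import subprocess


def build_tts_command(text, out_path, model_name=None):
    """Build the argument list for the `tts` command-line tool (hypothetical helper)."""
    command = ["tts", "--text", text, "--out_path", out_path]
    if model_name is not None:
        # Assumption: when --model_name is omitted, the CLI uses its built-in default model.
        command += ["--model_name", model_name]
    return command


def synthesize(text, out_path, model_name=None):
    """Run the command if the `tts` executable is on PATH; return True if it ran."""
    if shutil.which("tts") is None:
        return False  # CoquiTTS is not installed in this environment
    subprocess.run(build_tts_command(text, out_path, model_name), check=True)
    return True


if __name__ == "__main__":
    # Only prints the command; call synthesize(...) to actually generate audio.
    print(build_tts_command("Hello from CoquiTTS.", "hello.wav",
                            "tts_models/en/ljspeech/tacotron2-DDC"))
```

Keeping the command construction separate from the subprocess call makes it easy to inspect or log the exact invocation before any model is downloaded.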

Implemented Models

Spectrogram models

Tacotron: paper
Tacotron2: paper
Glow-TTS: paper
Speedy-Speech: paper
Align-TTS: paper
FastPitch: paper
FastSpeech: paper
SC-GlowTTS: paper
Capacitron: paper

End-to-End Models

VITS: paper
YourTTS: paper

Attention Methods

Guided Attention: paper
Forward Backward Decoding: paper
Graves Attention: paper
Double Decoder Consistency: blog
Dynamic Convolutional Attention: paper
Alignment Network: paper

Speaker Encoder

GE2E: paper
Angular Loss: paper

Vocoders

MelGAN: paper
MultiBandMelGAN: paper
ParallelWaveGAN: paper
GAN-TTS discriminators: paper
WaveRNN: origin
WaveGrad: paper
HiFiGAN: paper
UnivNet: paper
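
Spectrogram models and vocoders are listed separately because they form two stages: the first predicts a spectrogram from text, and the second turns it into a waveform. The sketch below pairs one of each for the `tts` command line. It assumes that pretrained model ids follow a `<type>/<language>/<dataset>/<name>` pattern; the helper names and the specific ids are illustrative, so check `tts --list_models` for the real list.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    kind: str       # assumed to be "tts_models" or "vocoder_models"
    language: str
    dataset: str
    name: str


def parse_model_id(model_id):
    """Split an id such as 'tts_models/en/ljspeech/glow-tts' into its four parts."""
    parts = model_id.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected <type>/<language>/<dataset>/<name>, got {model_id!r}")
    return ModelId(*parts)


def pairing_args(spectrogram_id, vocoder_id):
    """Return CLI arguments that pair a spectrogram model with a vocoder."""
    spec = parse_model_id(spectrogram_id)
    voc = parse_model_id(vocoder_id)
    if spec.kind != "tts_models":
        raise ValueError(f"{spectrogram_id!r} is not a spectrogram (tts_models) id")
    if voc.kind != "vocoder_models":
        raise ValueError(f"{vocoder_id!r} is not a vocoder (vocoder_models) id")
    return ["--model_name", spectrogram_id, "--vocoder_name", vocoder_id]


if __name__ == "__main__":
    print(pairing_args("tts_models/en/ljspeech/glow-tts",
                       "vocoder_models/en/ljspeech/hifigan_v2"))
```

End-to-end models such as VITS produce audio directly, so they would be passed with `--model_name` alone and no vocoder argument.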

License

The project is released under the MPL-2.0 License.

Resources
