text to speech voice clone

A multi-voice TTS system trained with an emphasis on quality

neonbjb/tortoise-tts

Tortoise is a text-to-speech program built with the following priorities:

Strong multi-voice capabilities.
Highly realistic prosody and intonation.

This repo contains all the code needed to run Tortoise TTS in inference mode.

Manuscript: https://arxiv.org/abs/2305.07243

Hugging Face space

A live demo is hosted on Hugging Face Spaces. If you'd like to avoid a queue, please duplicate the Space and add a GPU. Please note that CPU-only spaces do not work for this demo.

https://huggingface.co/spaces/Manmay/tortoise-tts

Install via pip

If you would like to install the latest development version, you can also install it directly from the git repository:

What's in a name?

I'm naming my speech-related repos after Mojave desert flora and fauna. Tortoise is a bit tongue in cheek: this model is insanely slow. It leverages both an autoregressive decoder and a diffusion decoder, both known for their low sampling rates. On a K80, expect to generate a medium-sized sentence every 2 minutes.

Well, not so slow anymore: we can now get a real-time factor (RTF) of 0.25-0.3 on 4 GB of VRAM, and with streaming we can get under 500 ms of latency!
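To make the RTF figure concrete, here is a small worked example. It assumes the usual definition of real-time factor (time spent generating divided by the duration of the generated audio); the function names are illustrative and not part of Tortoise.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Invert the definition: how long a clip of a given length takes to generate."""
    return audio_seconds * rtf


# A 10-second clip at the quoted RTF range of 0.25-0.3.
low = estimated_synthesis_seconds(10.0, 0.25)
high = estimated_synthesis_seconds(10.0, 0.30)
print(f"Roughly {low:.1f} to {high:.1f} seconds of compute for 10 seconds of audio")
```

An RTF below 1 means audio is generated faster than it plays back, which is what makes streaming practical.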

See this page for a large list of example outputs.

A cool application of Tortoise + GPT-3 (not affiliated with this repository): https://twitter.com/lexman_ai. Unfortunately, this project no longer seems to be active.

Usage guide

Local installation.

If you want to use this on your own computer, you must have an NVIDIA GPU.

On Windows, I highly recommend using the Conda installation method. I have been told that if you do not do this, you will spend a lot of time chasing dependency problems.

First, install miniconda: https://docs.conda.io/en/latest/miniconda.html

Then run the following commands, using Anaconda Prompt as the terminal (or any other terminal configured to work with conda):

create conda environment with minimal dependencies specified
activate the environment
install pytorch with the command provided here: https://pytorch.org/get-started/locally/
clone tortoise-tts
change the current directory to tortoise-tts
run tortoise python setup install script

Optionally, PyTorch can be installed in the base environment so that other conda environments can use it too. To do this, simply run the conda install pytorch... line before activating the tortoise environment.

Note: When you want to use tortoise-tts, you will always have to ensure the tortoise conda environment is activated.

If you are on Windows, you may also need to install pysoundfile: conda install -c conda-forge pysoundfile

An easy way to hit the ground running and a good jumping off point depending on your use case.

This gives you an interactive terminal in an environment that's ready to do some tts. Now you can explore the different interfaces that tortoise exposes for tts.

For example:

Apple Silicon

On macOS 13+ with M1/M2 chips, you need to install the nightly version of PyTorch. As stated on the official page, you can do:

Be sure to do that after you activate the environment. If you don't use conda the commands would look like this:

Be aware that DeepSpeed is disabled on Apple Silicon since it does not work. The flag --use_deepspeed is ignored. You may need to prepend PYTORCH_ENABLE_MPS_FALLBACK=1 to the commands below to make them work, since MPS does not support all the operations in PyTorch.
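If you launch the scripts from Python rather than from a shell, the same variable can be set on the child process environment. The following is a minimal sketch: the helper name mps_fallback_env is hypothetical, and the child command here only echoes the variable back, standing in for whichever Tortoise script you actually run.

```python
import os
import subprocess
import sys


def mps_fallback_env(base=None):
    """Return a copy of the environment with the MPS CPU fallback enabled."""
    env = dict(os.environ if base is None else base)
    env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    return env


# Stand-in child process: it only reports the variable it received.
# Replace the command list with the script invocation you need.
result = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['PYTORCH_ENABLE_MPS_FALLBACK'])"],
    env=mps_fallback_env(),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Copying os.environ instead of building a fresh dictionary keeps PATH and the active conda environment visible to the child process.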

The do_tts.py script allows you to speak a single phrase with one or more voices.

Socket streaming

The socket server will listen on port 5000.

Faster inference with read.py

This script provides tools for reading large amounts of text.

This will break up the textfile into sentences, and then convert them to speech one at a time. It will output a series of spoken clips as they are generated. Once all the clips are generated, it will combine them into a single file and output that as well.
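The flow described above (split into sentences, synthesize one clip per sentence, then join the clips) can be sketched as follows. This is an illustration only: the regex splitter and the fake_synthesize stand-in are assumptions made for this example, not the logic read.py actually uses.

```python
import re
from typing import Callable, List, Tuple


def split_into_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def fake_synthesize(sentence: str) -> List[float]:
    """Stand-in for a TTS call: returns silence, 10 samples per character."""
    return [0.0] * (10 * len(sentence))


def read_text(
    text: str, synthesize: Callable[[str], List[float]]
) -> Tuple[List[List[float]], List[float]]:
    """Generate one clip per sentence, then concatenate them in order."""
    clips = [synthesize(s) for s in split_into_sentences(text)]
    combined = [sample for clip in clips for sample in clip]
    return clips, combined


clips, combined = read_text("Hello there. How are you? Fine!", fake_synthesize)
print(len(clips), "clips,", len(combined), "samples in the combined output")
```

Keeping the per-sentence clips alongside the combined output is what makes it possible to redo a single bad clip later without regenerating everything.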

Sometimes Tortoise screws up an output. You can re-generate any bad clips by re-running read.py with the --regenerate argument.
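Selective regeneration can be sketched like this. The sketch assumes the bad clips are identified by a comma-separated list of clip numbers; that format, and the helper names, are assumptions for illustration rather than a description of how read.py parses --regenerate.

```python
from typing import Callable, List, Set


def parse_clip_numbers(arg: str) -> Set[int]:
    """Parse a comma-separated string such as '0,2' into {0, 2}."""
    return {int(token) for token in arg.split(",") if token.strip()}


def regenerate(
    sentences: List[str],
    clips: List[str],
    bad: Set[int],
    synthesize: Callable[[str], str],
) -> List[str]:
    """Re-synthesize only the clips whose index is in `bad`; keep the rest."""
    return [
        synthesize(sentence) if index in bad else clip
        for index, (sentence, clip) in enumerate(zip(sentences, clips))
    ]


# Toy example: "clips" are plain strings and the synthesizer is str.upper.
sentences = ["a", "b", "c"]
old_clips = ["A1", "B1", "C1"]
new_clips = regenerate(sentences, old_clips, parse_clip_numbers("1"), str.upper)
print(new_clips)
```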

Tortoise can be used programmatically, like so:

To use deepspeed:

To use kv cache:

To run model in float16:

For faster runs, use all three:
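A minimal sketch of programmatic use with the three speed options combined. It assumes that tortoise.api.TextToSpeech accepts the use_deepspeed, kv_cache, and half keyword arguments, that it exposes tts_with_preset, and that tortoise.utils.audio.load_audio takes a path and a sample rate; check these against the version you have installed. The helper names speed_options and synthesize are hypothetical.

```python
from typing import Dict, List


def speed_options(fast: bool) -> Dict[str, bool]:
    """Keyword arguments that switch on DeepSpeed, the KV cache, and float16."""
    return {"use_deepspeed": fast, "kv_cache": fast, "half": fast}


def synthesize(text: str, clip_paths: List[str], fast: bool = True):
    """Clone the voice in `clip_paths` and speak `text` with the 'fast' preset."""
    # Imported lazily so this file can be loaded without Tortoise installed.
    from tortoise import api
    from tortoise.utils.audio import load_audio

    # Reference clips are loaded at 22050 Hz (assumed conditioning rate).
    reference_clips = [load_audio(path, 22050) for path in clip_paths]
    tts = api.TextToSpeech(**speed_options(fast))
    return tts.tts_with_preset(text, voice_samples=reference_clips, preset="fast")


print(speed_options(True))
```

On Apple Silicon, DeepSpeed is disabled, so calling synthesize with fast=False, or building the options without use_deepspeed, is the safer choice there.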

Acknowledgements

This project has garnered more praise than I expected. I am standing on the shoulders of giants, though, and I want to credit a few of the amazing folks in the community that have helped make this happen:

Hugging Face, who wrote the GPT model and the generate API used by Tortoise, and who hosts the model weights.
Ramesh et al., who authored the DALLE paper, which is the inspiration behind Tortoise.
Nichol and Dhariwal, who authored the (revision of the) code that drives the diffusion model.
Jang et al., who developed and open-sourced univnet, the vocoder this repo uses.
Kim and Jung, who implemented the univnet PyTorch model.
lucidrains, who writes awesome open source PyTorch models, many of which are used here.
Patrick von Platen, whose guides on setting up wav2vec were invaluable to building my dataset.

Tortoise was built entirely by the author (James Betker) using their own hardware. Their employer was not involved in any facet of Tortoise's development.

Tortoise TTS is licensed under the Apache 2.0 license.

If you use this repo or the ideas therein for your research, please cite it! A BibTeX entry can be found in the right pane on GitHub.

AI Voice Generator: Most Realistic AI Text to Speech

Hyper-realistic AI voice generator that captivates your audience.

Join the over 2,000,000 users who love LOVO AI. Our award-winning voice generator and text to speech software is packed with 500+ voices in 100 languages. Create engaging videos with voice for marketing, training, social media, and more!

Start now for free

Chloe Woods

English Female

Sophia Butler

Santa Clause

English Male

Katelyn Harrison

Bryan Lee Jr.

Thomas Coleman

Create and edit videos effortlessly with Genny’s all-in-one voice and video editing platform.

Trusted by professionals & creatives globally

Introducing Genny: the best way to add voiceover to video

Experience unparalleled voiceover production with our voice generator and online video editor, featuring professional grade human-like voices and powerful editing tools.

The most natural voices in the world

Surprise your audience with the perfect AI voice in 100+ languages for your content.

Genny is the ultimate generative AI tool

For all your voiceover and video needs - scripts, ultra-realistic voices, images, editing and more! Genny has all the features you need to create engaging videos with integrated AI features.

Save $$ and time on voiceovers

With Genny's advanced voice generator, you no longer need to spend time and money on recording sessions or expensive equipment to achieve professional voiceovers.

Text To Speech

Sync audio and video seamlessly

Achieve perfect synchronization without sacrificing speed or accuracy. With Genny’s online video editor, you can edit content effortlessly to create engaging high-quality videos.

Online Video Editor

Boost engagement with subtitles

Globalize your content and boost engagement in 20+ languages with our auto subtitle generator. Customize, animate, and transform your video with just a few clicks.

Auto Subtitle Generator

Write scripts 10x faster

Writer's block is everyone's nightmare. Genny's AI writer can help you get started on your script quickly by generating professionally written content lightning fast.

Create unique voices in minutes

Genny’s voice cloning lets you instantly create custom voices with just one minute of audio. Give your brand a unique voice that sets your content apart from the crowd.

Voice Cloning

Generate royalty-free images

No more spending hours searching the web for the perfect stock image. Generate HD royalty-free images and add them to your videos in seconds with Genny’s AI art generator.

AI Art Generator

Collaborate with your team

Drive efficiency and collaborate creatively with Genny teams and keep your projects safely secured with our cloud storage so you and your team can access them at any time!

Learn About Genny Teams

Versatile API made for developers

With our easy to use API, you now have the power to use the most advanced AI voices in the world in your own app or service! Get started in as little as 5 lines of code.

LOVO Open API

AI Voice Generator for any use case

Unlock your creative potential

Try Genny for free

Create a free voiceover

Start saving 90% of your time and budget today!

See pricing

No Credit Card required

14-day trial of pro

You might find an answer faster here

If you cannot find an answer, email [email protected] for help.

What happens if I hit my credit limit?

What does "Voice Generation Hours" Mean?

How is LOVO different from other TTS?

Can I use LOVO for Youtube videos?

Do I own the rights to content created?

What is an AI voice?

Which languages do you support?

Which emotions can LOVO express?

Do you have an API?

Do you have an enterprise plan?

Can I cancel any time?

What is an AI voice generator?

Check out latest articles on our blog

6 Benefits of Real-Time Voice Cloning

Effective Text To Speech Tools For Instructional Design

Most Popular AI Voiceover Apps For TikTok

Best AI tools for businesses and marketers

Voice generators - perfect for content creation

LOVO is the most advanced AI voice and text-to-speech generator available on the market. With LOVO, you can save thousands of dollars and hours of time in generating realistic and high-quality voiceovers. Our cutting-edge technology produces super realistic voices that are almost impossible to distinguish from real human voices. Our easy-to-use professional UI makes generating voiceovers effortless, even for those with no prior experience in audio production. LOVO is perfect for businesses, content creators, educators, and anyone looking to create engaging content that stands out from the crowd. LOVO is designed to streamline your content creation process so you can focus on what matters most - delivering your message to your audience. With LOVO, you have access to an extensive library of voices, languages, and accents, ensuring that you find the perfect voice to match your brand or project.

Here are just some of the reasons why LOVO is the perfect tool for content creation:

Scale content without scaling costs or resources.

With AI now more accessible than ever, tools like text-to-speech generators are the perfect assistant for content creation. These tools save you time and money by removing the need for expensive equipment or time-consuming tasks such as recording and editing while providing high-quality audio with realistic human voices.

Produce professional-grade content

At LOVO, our team has focused on creating Genny, the most advanced voice generator that produces high-quality voiceovers to elevate your video and audio projects. Complete the final stages of your project with Genny by generating your voiceover and seamlessly syncing it with your video. Then, before exporting your video, add all the finishing touches for a truly professional look, such as subtitles, images, logos, and video clips.

Create with ease and speed

Genny is designed to allow anyone to get started immediately: no software downloads, complicated onboarding, or learning curve required. Simply sign in with your web browser and you are good to go! Our intuitive and easy-to-use UI makes it a breeze for anyone who needs to create content to get up and running in minutes. This means you can focus on what matters most: engaging your audience and delivering your message.

AI Voice generator use cases

Corporate training & education
Marketing & sales
Product demos & explainers

Generate voices in over 100 languages

Genny supports Text to Speech in:

United States 🇺🇸
United Kingdom 🇬🇧
Ethiopia 🇪🇹
Philippines 🇵🇭
United Arab Emirates 🇦🇪
Pakistan 🇵🇰
Portugal 🇵🇹
Bangladesh 🇧🇩
Russian Federation 🇷🇺
Indonesia 🇮🇩
Korea, Republic of 🇰🇷
Afghanistan 🇦🇫
Thailand 🇹🇭

Learn More About AI Voice Generators

Why do you need an AI voice generator for your videos?
Are AI voices ethical?
How can AI voices help your business?
What is the best AI voice generator?
How do you generate an AI voiceover?
Is content generated with AI voices copyrighted?
Can a voice generator produce different accents or languages?
What industries benefit most from AI voice technology?
Is the speech from a voice generator realistic?
How can I customize a voice generator to fit my needs?
What future developments are expected in AI voice technology?
Where can I find a voice generator for free?

COMMENTS

AI Voice Cloning: Clone Your Voice Instantly
Speechify AI Voice Cloning can clone anyone’s voice in seconds. All it takes is for the AI to listen to your voice for around 30 seconds. Once it samples a person’s voice, it can then read …
Real-Time Voice Cloning
Clone your voice in seconds with our real-time voice cloning software. Simple and ready to use from your browser. Record your voice, type your text, and generate the audio.
Free AI Voice Cloning
Clone your voice for free in seconds for use in text to speech for content creation, voiceovers, and entertainment. Get Started → Trusted by iconic companies and artists.
Vocloner: Free Instant AI Voice Cloning
Experience fast and efficient AI voice cloning that takes just seconds. Clone any voice instantly without delays, making the process smooth and hassle-free. Use Vocloner for free, with a daily limit of 1000 characters.
AI Voice Cloning: Free Voice Changer Online
Clone your voice online for free with our AI Voice Cloning tool. Upload your voice samples and generate a realistic clone in seconds!
AI Voice Generator: Realistic Text to Speech & Voice …
Award-winning AI Voice Generator and text to speech software with 500+ voices in 100 languages. Realistic AI Voices with Online Video Editor. Clone your own voice.
Text to Speech
Turn any text or script into natural-sounding speech with Descript's text-to-speech voice generator. Choose from dozens of lifelike AI voices or create your own voice clones in minutes. It’s perfect for podcast intros, voiceovers, …
ElevenLabs: Free Text to Speech & AI Voice Generator
Voices fit for all of your ideas. Generate high quality speech in any voice, style, and language. Our AI voice generator renders human intonation and inflections with exceptional fidelity, adjusting the delivery based on context. Create a …
