An In-Depth Guide to Open Source Text-to-Speech Engines for Linux

This comprehensive guide explores the top open source text-to-speech (TTS) engines available for Linux. Converting text into lifelike speech is useful for accessibility, delivering information via voice interfaces, learning pronunciation, and more. We’ll cover the capabilities of leading Linux TTS tools, their installation, and plenty of usage examples.

Introduction to Text-to-Speech

Text-to-speech (TTS) is the artificial production of human speech from written text. TTS engines ingest text, process it through natural language pipelines, and output synthesized audio speech. The quality of TTS systems is determined by how natural and humanlike the generated voices sound.

TTS has many practical use cases:

Improving accessibility for vision-impaired users
Reading text aloud when you need to keep your eyes free, such as while driving
Delivering information over voice interfaces or phone systems
Assisting with learning languages and proper pronunciation
Converting documents to audiobook format
Adding speech output to applications by leveraging TTS APIs

The most natural-sounding voices come from deep learning models trained on large datasets of recorded human speech, and most modern TTS engines work this way. Older rule-based and concatenative engines sound less natural but are far lighter.

In this guide, we’ll focus on open source command line utilities for performing TTS on Linux. Let’s look at some of the best options.

eSpeak – Lightweight Open Source TTS

eSpeak is an open source text-to-speech engine written by Jonathan Duddington. It began in 1995 as a program called speak for Acorn/RISC OS computers and was later rewritten and released as eSpeak. It supports over 70 languages and accents and is highly configurable for adjusting speech parameters.

eSpeak is lightweight and designed to be portable across many systems. It is packaged by many Linux distributions and is licensed under the GPLv3. The voices tend to sound robotic, but the speech is clear and intelligible.

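To install on Debian/Ubuntu, run sudo apt-get install espeak.

On Arch Linux, the maintained fork eSpeak NG is packaged as espeak-ng: sudo pacman -S espeak-ng.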

Basic usage is simple. To speak a string, pass it as an argument: espeak "Hello, world".

To read a file aloud, use the -f flag: espeak -f notes.txt.

Let’s go through some ways to customize and control eSpeak’s voices.

To list all available voices, run espeak --voices.

This prints out a table summarizing each voice’s language, dialect, and identifier.

For example, to set the voice to US English, pass the voice name with -v: espeak -v en-us "Hello".

Adjust the speech rate, in words per minute, with the -s flag: espeak -s 120 "Hello".

The pitch can be adjusted with -p, on a scale from 0 to 99: espeak -p 70 "Hello".
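The voice, rate, and pitch flags can be combined in a single call. The following is a minimal sketch, assuming espeak is on the PATH; the function name speak_custom and its default values are illustrative choices, not part of eSpeak.

```shell
# speak_custom TEXT [VOICE] [RATE] [PITCH]
# Hypothetical helper: speaks TEXT with the given voice, rate (words per
# minute), and pitch (0-99). The defaults below are illustrative.
speak_custom() {
    text=$1
    voice=${2:-en-us}
    rate=${3:-150}
    pitch=${4:-50}
    espeak -v "$voice" -s "$rate" -p "$pitch" "$text"
}

# Example: a slower, higher-pitched US English voice.
# speak_custom "Hello from eSpeak" en-us 120 70
```

Keeping the options in one function makes it easy to settle on a voice you like and reuse it from other scripts.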

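To save audio output to a file instead of playing it, use -w: espeak -w hello.wav "Hello".

This saves a WAV audio file that can be played in media players. eSpeak itself writes only WAV; to get .mp3 or .ogg, convert the WAV file with an external encoder such as ffmpeg.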
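eSpeak's -w flag produces WAV audio, so a compressed file such as MP3 takes a separate encoding step. Here is a minimal sketch, assuming ffmpeg is installed (it is not part of eSpeak); the helper name text_to_mp3 is hypothetical.

```shell
# text_to_mp3 TEXT OUTPUT.mp3
# Hypothetical helper: synthesizes TEXT to a temporary WAV file with
# espeak -w, then encodes it to MP3 with ffmpeg (assumed to be installed).
text_to_mp3() {
    text=$1
    out=$2
    tmpdir=$(mktemp -d) || return 1
    wav="$tmpdir/speech.wav"
    espeak -w "$wav" "$text" &&
        ffmpeg -loglevel error -y -i "$wav" "$out"
    status=$?
    rm -rf "$tmpdir"   # clean up the intermediate WAV either way
    return $status
}

# Example:
# text_to_mp3 "Hello, world" hello.mp3
```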

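In addition to these common uses, eSpeak provides phoneme support for precise pronunciation: the -x flag prints the phoneme mnemonics eSpeak produces for a text, and text enclosed in [[ ]] is read as phoneme mnemonics rather than ordinary spelling.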
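A small sketch of the phoneme workflow, assuming espeak is on the PATH. The two function names are hypothetical; the flags used are -x (print phoneme mnemonics) and -q (produce no audio), plus the [[ ]] phoneme input syntax.

```shell
# show_phonemes TEXT
# Hypothetical helper: prints the phoneme mnemonics eSpeak would use for
# TEXT. -x writes the mnemonics to stdout and -q suppresses audio output.
show_phonemes() {
    espeak -q -x "$1"
}

# speak_phonemes MNEMONICS
# Speaks a phoneme string directly by wrapping it in [[ ]].
speak_phonemes() {
    espeak "[[$1]]"
}

# Example: inspect the phonemes for a word, adjust them by hand, then pass
# the edited string to speak_phonemes.
# show_phonemes "hello"
```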

It also offers a C library API for integrating TTS directly into applications, which C and C++ programs can call directly and which other languages such as Python can reach through bindings.

Overall, eSpeak provides a capable open source text-to-speech system on Linux. The voices aren’t as human sounding as some commercial options, but it’s free, customizable, lightweight, and easy to use.

Festival – Framework for Building TTS Voices

Festival is another leading open source text-to-speech system originally developed at the University of Edinburgh and released in 1997.

Festival utilizes a modular framework for building synthetic voices. It comes packaged with several English voices and support for Spanish, Welsh, and other languages. Festival is well-suited for research and education purposes.

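Install Festival using your Linux distribution’s package manager. On Debian/Ubuntu, for example: sudo apt-get install festival.

Some example usage: echo "Hello world" | festival --tts speaks a string from stdin, and festival --tts notes.txt reads a file aloud.

Festival includes an interactive shell for experimenting with speech synthesis. Start it by running festival with no arguments, then enter Scheme expressions such as (SayText "Hello world") at the prompt. This allows modifying parameters on the fly.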
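The same Scheme expressions can be run without the interactive prompt. The sketch below assumes festival and its text2wave script are installed; the function names are hypothetical. It relies on festival's batch mode (-b), in which an argument that starts with a left parenthesis is evaluated as a Scheme expression.

```shell
# festival_say TEXT
# Hypothetical helper: speaks TEXT non-interactively via (SayText "...").
festival_say() {
    # Escape backslashes and double quotes so TEXT is a valid Scheme string.
    escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
    festival -b "(SayText \"$escaped\")"
}

# festival_to_wav INPUT.txt OUTPUT.wav
# Uses the text2wave script shipped with Festival to write a WAV file
# instead of playing the audio.
festival_to_wav() {
    text2wave "$1" -o "$2"
}

# Examples:
# festival_say "Hello from Festival"
# festival_to_wav notes.txt notes.wav
```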

Festival is designed as a framework for building TTS voices, and the companion FestVox project provides tools and documentation for creating new synthetic voices and languages.

For basic usage, Festival produces clear speech, but it sounds robotic. The option to build custom voices is useful for research. However, modern TTS technology has surpassed Festival’s voice quality.

Pico TTS – Optimized Small Footprint Engine

Pico TTS is an open source, small footprint text-to-speech engine that is well suited to embedded Linux.

The engine was developed by SVOX and shipped as part of Android. It is written in C and released under the Apache 2.0 license. It is packaged in many Linux distributions and is popular on small boards such as the Raspberry Pi.

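Install on Debian/Ubuntu, where the pico2wave command line tool is provided by the libttspico-utils package: sudo apt-get install libttspico-utils.

Pico TTS supports English, Spanish, French, German, and Italian voices. The quality is surprisingly good given its small resource requirements.

To synthesize text and save it as a WAV file, use pico2wave: pico2wave -l en-US -w hello.wav "Hello world".

Here -l specifies the language code, such as en-US for US English, and -w names the output file.

pico2wave does not play audio or write it to stdout for piping; it only writes a WAV file. But the WAV output works well for offline usage.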
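Since the output is always a file, speaking a sentence directly takes a temporary WAV file and a separate player. A minimal sketch, assuming pico2wave is installed and that aplay (from alsa-utils) is available as the player; the helper name pico_say is hypothetical.

```shell
# pico_say TEXT [LANG]
# Hypothetical helper: synthesizes TEXT to a temporary WAV file with
# pico2wave, plays it with aplay (assumed installed), and then removes
# the temporary file.
pico_say() {
    text=$1
    lang=${2:-en-US}
    tmpdir=$(mktemp -d) || return 1
    wav="$tmpdir/pico.wav"   # pico2wave expects a .wav file name
    pico2wave -l "$lang" -w "$wav" "$text" && aplay -q "$wav"
    status=$?
    rm -rf "$tmpdir"
    return $status
}

# Example:
# pico_say "Good morning" en-GB
```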

In summary, Pico TTS provides a capable text-to-speech engine optimized for embedded Linux applications like the Raspberry Pi. For desktop use, other options might be higher quality. But as a small footprint engine, Pico TTS works quite well.

gTTS – Leveraging Google’s TTS API

gTTS provides a command line interface and Python library for Google Translate’s text-to-speech API. It sends your text to Google’s online service, so it needs an internet connection, and in return it gives easy access to Google’s natural-sounding voices.

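gTTS can be installed with pip: pip install gTTS.

Some Linux distributions also package it; check your package manager.

Basic usage: gtts-cli "Hello world" --output hello.mp3.

This saves the synthesized audio to an MP3 file.

To convert a text file to speech, pass it with -f: gtts-cli -f notes.txt --output notes.mp3.

gTTS supports dozens of languages and natural sounding voices provided by Google. Running gtts-cli --all prints all the available languages and their codes.

For example, to set the language to English, pass its code with -l: gtts-cli -l en "Hello world" --output hello.mp3. Regional accents such as US English are selected with the --tld option.

gTTS is an ideal way to use Google’s text-to-speech service from the Linux command line. The audio is human sounding and highly intelligible.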
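The gtts-cli calls can be wrapped for reuse in scripts. This is a sketch under assumptions: gtts-cli is installed and the machine is online, and mpg123 is used as the MP3 player only as an example. Both function names are hypothetical.

```shell
# gtts_save TEXT OUTPUT.mp3 [LANG]
# Hypothetical helper around gtts-cli: requests speech for TEXT in LANG
# (default "en") and saves the MP3. Requires an internet connection.
gtts_save() {
    gtts-cli --lang "${3:-en}" --output "$2" "$1"
}

# gtts_say TEXT [LANG]
# Without --output, gtts-cli writes the MP3 data to stdout, so it can be
# piped into a player. mpg123 is an assumed choice of player here.
gtts_say() {
    gtts-cli --lang "${2:-en}" "$1" | mpg123 -q -
}

# Examples:
# gtts_save "Bonjour tout le monde" bonjour.mp3 fr
# gtts_say "Hello world"
```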

Comparing Voice Quality Between TTS Engines

There are noticeable differences in audio quality between the text-to-speech solutions we covered. Let’s do a quick comparison.

eSpeak and Festival sound robotic because they use older techniques instead of deep learning: eSpeak relies on formant synthesis, while Festival’s stock voices are built by concatenating recorded diphones. eSpeak voices tend to be clearer than Festival.

Pico TTS delivers good quality given its tiny resource footprint. The voices aren’t perfectly human sounding but are quite intelligible.

gTTS provides the most natural sounding audio by far, since the speech is generated by Google’s online voices rather than on your machine. The quality difference is very noticeable.

For the best sounding voices, gTTS is recommended. But offline engines like eSpeak work well enough for many use cases, especially when you need speech without a network connection.

Additional Tips and Tricks

Here are some additional tips for getting the most out of Linux text-to-speech engines:

Adjust speech rate, pitch, and volume to customize the voice
Use phoneme support for precise pronunciation of texts
Output audio to a file instead of directly to speakers
Pipe audio to media players like mplayer for enhanced controls
Chain multiple engines together for more options
Install alternative voices and languages
Use engines or voices built for other languages such as Chinese or Russian
Integrate speech synthesis directly into your own apps with provided APIs
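As an illustration of the piping tip, eSpeak can write its audio to stdout for another program to consume. A minimal sketch, assuming espeak is installed and using aplay (from alsa-utils) as the player in place of mplayer; the helper name speak_piped is hypothetical.

```shell
# speak_piped TEXT
# Hypothetical helper: --stdout makes espeak write WAV data to stdout
# instead of the sound card. Here the data is piped into aplay, which
# reads from stdin when no file is given.
speak_piped() {
    espeak --stdout "$1" | aplay -q
}

# Example:
# speak_piped "Audio routed through a separate player"
```

Any player or encoder that accepts WAV data on stdin can take the place of aplay in the pipeline.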

And some troubleshooting advice:

If there is no audio, check that the speakers are not muted and the volume is up
Install any audio codec packages your system requires
Try a different TTS engine if one of them gives you trouble
Look at the error output to diagnose problems
Consult the documentation and the project’s GitHub issues page

With a bit of tweaking, the open source text-to-speech engines provide plenty of options for your Linux projects.

Leveraging TTS Engines in Shell Scripts

One useful application of text-to-speech on Linux is scripting batch text file conversions. Here is an example bash script to synthesize all text files in a directory using eSpeak:

This iterates through .txt files, converts each to audio with eSpeak using the -w flag, and saves the output as a .wav file.
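A minimal sketch of such a script is shown below. It is written in portable sh syntax, so it also runs under bash, and it assumes espeak is on the PATH. The function name batch_tts is a hypothetical choice.

```shell
#!/bin/sh
# batch_tts [DIR]
# Converts every .txt file in DIR to a .wav file of the same base name
# using espeak. DIR defaults to the current directory.
batch_tts() {
    dir=${1:-.}
    for txt in "$dir"/*.txt; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$txt" ] || continue
        wav="${txt%.txt}.wav"
        echo "Converting $txt -> $wav"
        # -f reads the text from a file, -w writes the speech to a WAV file.
        espeak -f "$txt" -w "$wav" || echo "Failed: $txt" >&2
    done
}

# Example: convert all text files in ./notes
# batch_tts ./notes
```

A failed conversion is reported on stderr and the loop moves on, so one bad file does not stop the batch.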

Scripts like this provide an easy way to automate batch text-to-speech conversions and workflows.

Appendix: Quick Reference of Engines

Engine	Languages	Synthesis	License	Notes
eSpeak	70+	Formant synthesis	GPLv3	Robotic voices, versatile, lightweight
Festival	Multiple	Diphone concatenation (stock voices)	Custom	Framework for building voices
PicoTTS	5	Compact embedded engine	Apache 2.0	Small footprint, good quality
gTTS	Many	Google online voices	MIT	Most natural sounding voices, needs internet

This guide covered several excellent open source text-to-speech utilities for Linux. eSpeak and Festival are classic options that work reasonably well. Pico TTS is great for embedded devices. gTTS provides the best sounding human voices by leveraging Google’s technology.

The installation process, basic usage, and customization options were explained for each text-to-speech engine. TTS enables many exciting applications on the Linux command line and within scripts or apps.

To learn more about the capabilities of each text-to-speech engine, be sure to consult the official project documentation. Their GitHub repositories also contain useful code samples to get started.

With the power of text-to-speech, Linux can talk back to you! Converting text to natural sounding speech opens many possibilities.

The Linux Portal Site

13 Best Free Linux Speech Recognition Tools

Speech is an increasingly popular method of interacting with electronic devices such as computers, phones, tablets, and televisions. Speech is probabilistic, and speech engines are never 100% accurate. But technological advances have meant speech recognition engines offer better accuracy in understanding speech. The better the accuracy, the more likely customers will engage with this method of control. And, according to a study by Stanford University, the University of Washington and Chinese search giant Baidu, smartphone speech is three times quicker than typing a search query into a screen interface.

Witness the rise of intelligent personal assistants, such as Siri for Apple, Cortana for Microsoft, and Mycroft for Linux. The assistants use voice queries and a natural language user interface to attempt to answer questions, make recommendations, and perform actions without the requirement of keyboard input. The popularity of voice control is also evident in dedicated products that have shipped in large quantities, such as Amazon Echo. Speech recognition is also used in smart watches, household appliances, and in-car assistants. In-car applications have lots of mileage (excuse the pun). Some of the in-car applications include navigation, asking for weather forecasts, finding out the traffic situation ahead, and controlling elements of the car, such as the sunroof, windows, and music player.

The key challenge for developing speech recognition software, whether it’s used in a computer or another device, is that human speech is extremely complex. The software has to cope with varied speech patterns, and individuals’ accents. And speech is a dynamic process without clearly distinguished parts. Fortunately, technical advancements have meant it’s easier to create speech recognition tools. Powerful tools like machine learning and artificial intelligence, coupled with improved speech algorithms, have altered the way these tools are developed. You don’t need phoneme dictionaries. Instead, speech engines can employ deep learning techniques to cope with the complexities of human speech.

There aren’t that many speech recognition toolkits available, and some of them are proprietary software. Fortunately, there are some very exciting open source speech recognition toolkits available. These toolkits are meant to be the foundation to build a speech recognition engine.

This article highlights the best open source speech recognition software for Linux. The rating chart summarizes our verdict.

[Figure: ratings chart for the best free and open source speech recognition tools]

Let’s explore the 13 free speech recognition tools at hand. For each title we have compiled its own portal page with a full description and an in-depth analysis of its features.


	Automatic speech recognition system trained on 680,000 hours of data
	Fast, flexible machine learning library written entirely in C++.
	Deep-learning toolkit for training and deploying speech-to-text models
	C++ toolkit designed for speech recognition researchers.
	All-in-one conversational AI toolkit based on PyTorch
	End-to-End speech processing toolkit
	Implementation of DeepSpeech2 using Baidu Warp-CTC.
	TensorFlow implementation of Baidu's DeepSpeech architecture.
	Two-pass large vocabulary continuous speech recognition engine
	TensorFlow-based toolkit for sequence-to-sequence models
	Speech recognition system for mobile and server applications
	End-to-End Speech Recognition
	Flexible speech recognition software

What is really wrong with the license terms of HTK?

This clause is particularly damning:

2.2 The Licensed Software either in whole or in part can not be distributed or sub-licensed to any third party in any form.

…and nothing else matters…

Sadly my machine doesn’t have sufficient RAM on my graphics card to experiment with DeepSpeech. Any recommendations for a good GPU that works well with DeepSpeech?

Thanks for the comprehensive info regarding the open source tools. From the perspective of a visually impaired person, what I would like to know is which of these would be most suitable (now or in near future) for dictating to get text that could go into documents, e-mail, etc. Is that Simon?

Yes, Simon is very good for what you’re looking for. Most of the other open source speech recognition tools are not really aimed at a desktop user e.g. they are for academic research etc.

Is there any speech to text tool like Dragon Nat in linux? I work as a translator and I have it on windows but I wonder if there is something like that out there.

Baidu is required by Chinese laws to act, as and when demanded, as an arm of the Chinese Communist Party. Not sure I would trust a tool created by them.

I think you are jumping on the Huawei bandwagon with absolutely no justification.

A few of the open source programs here are using speech recognition models based on Baidu DeepSpeech2. But the model is an approach, not a means of capturing data or doing anything else nefarious.

What concerns are you raising? The source code of the programs here (DeepSpeech etc) is open source, so you can see exactly what they are doing.

I think the so-called “Voice of Reason” is actually foolish. Baidu and Huawei are separate and different companies and therefore the Huawei bandwagon is irrelevant. The CPC is a relevant threat.

Open source in itself is not a guarantee of safety. There are and have been plenty of nefarious open source programs. There are and have been plenty of open source bugs persisting for many years before remediation. Who is examining the code? The “Voice of Reason?”

Hard to take someone calling themselves “The Fool” seriously…

Conflating bugs with nasties. You certainly sound foolish, The Fool.

completely agree

This account is solely made for saying yes to other accounts called “john”

LinuxLinks doesn’t have accounts

Could Android speech recognition be ported to Linux desktop packages, since android is open source?

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

eSpeak NG Text-to-Speech

eSpeak NG is a compact open source software text-to-speech synthesizer for Linux, Windows, Android and other operating systems. It supports more than 100 languages and accents. It is based on the eSpeak engine created by Jonathan Duddington.

eSpeak NG uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings. It also supports Klatt formant synthesis, and the ability to use MBROLA as a backend speech synthesizer.

eSpeak NG is available as:

A command line program (Linux and Windows) to speak text from a file or from stdin.
A shared library version for use by other programs. (On Windows this is a DLL).
A SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface.
eSpeak NG has been ported to other platforms, including Solaris and Mac OSX.

Features

Includes different Voices, whose characteristics can be altered.
Can produce speech output as a WAV file.
SSML (Speech Synthesis Markup Language) is supported (not complete), and also HTML.
Compact size. The program and its data, including many languages, total a few megabytes.
Can be used as a front-end to MBROLA diphone voices. eSpeak NG converts text to phonemes with pitch and length information.
Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
Written in C.

See the ChangeLog for a description of the changes in the various releases and with the eSpeak NG project.

The following platforms are supported:

Platform	Minimum Version
Linux
BSD
Android	4.0
Windows	Windows 8
Mac

User guide explains how to set up and use eSpeak NG from command line or as a library.
Building guide provides info how to compile and build eSpeak NG from the source.
Index provides full list of more detailed information for contributors and developers.
Look at contribution guide to start your contribution.
Look at eSpeak NG roadmap to participate in development of eSpeak NG.

The espeak-ng binaries use the same command-line options as espeak, with several additions to provide new functionality from espeak-ng such as specifying the output audio device name to use. The build creates symlinks of espeak to espeak-ng, and speak to speak-ng.

The espeak speak_lib.h include file is located in espeak-ng/speak_lib.h with an optional symlink in espeak/speak_lib.h. This file contains the espeak 1.48.15 API, with a change to the ESPEAK_API macro to fix building on Windows and some minor changes to the documentation comments. This C API is API and ABI compatible with espeak.

The espeak-data data has been moved to espeak-ng-data to avoid conflicts with espeak. There have been various changes to the voice, dictionary and phoneme files that make them incompatible with espeak.

The espeak-ng project does not include the espeakedit program. It has moved the logic to build the dictionary, phoneme and intonation binary files into the libespeak-ng.so file that is accessible from the espeak-ng command line and C API.

The program was originally known as speak and originally written for Acorn/RISC_OS computers starting in 1995 by Jonathan Duddington. This was enhanced and re-written in 2007 as eSpeak, including a relaxation of the original memory and processing power constraints, and with support for additional languages.

In 2010, Reece H. Dunn started maintaining a version of eSpeak on GitHub that was designed to make it easier to build eSpeak on POSIX systems, porting the build system to autotools in 2012. In late 2015, this project was officially forked to a new eSpeak NG project. The new eSpeak NG project is a significant departure from the eSpeak project, with the intention of cleaning up the existing codebase, adding new features, and adding to and improving the supported languages.

The historical branch contains the available older releases of the original eSpeak that are not contained in the subversion repository.

1.24.02 is the first version of eSpeak to appear in the subversion repository, but releases from 1.05 to 1.24 are available at http://sourceforge.net/projects/espeak/files/espeak/.

These early releases have been checked into the historical branch, with the 1.24.02 release as the last entry. This makes it possible to use the replace functionality of git to see the earlier history:

NOTE: The source releases contain the big_endian, espeak-edit, praat-mod, riskos, windows_dll and windows_sapi folders. These do not appear in the source repository until later releases, so have been excluded from the historical commits to align them better with the 1.24.02 source commit.

eSpeak NG Text-to-Speech is released under the GPL version 3 or later license.

The getopt.c compatibility implementation for getopt support on Windows is taken from the NetBSD getopt_long implementation, which is licensed under a 2-clause BSD license.

Android is a trademark of Google LLC.

Acknowledgements

The Catalan extension was funded by the Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya (https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en)) within the framework of Projecte AINA.

Text to Speech for Linux: Unveiling Top Solutions for Voice Synthesis

Text-to-speech (TTS) technology on Linux allows users to convert written text into spoken words. This functionality is not only useful for the visually impaired but also benefits those who prefer auditory learning or require hands-free computing. Several TTS tools are available for Linux, each offering varying features to cater to diverse needs. Popular among them is eSpeak, a compact open-source software that provides a straightforward command-line interface for speech synthesis.

The landscape of text-to-speech for Linux encompasses a range of applications, from simple, lightweight programs to more complex systems with natural-sounding voices. Related projects such as CMUSphinx address the opposite direction, speech recognition, using models trained on different languages. Accessibility and customization are focal points in the development of Linux TTS tools, as many of them are open source and can be modified to meet user-specific requirements.

While TTS technology continues to evolve, Linux users have access to a number of options for integrating speech into their computing experience. Implementations vary from simple command-line interfaces to more sophisticated GUI-based applications, ensuring there is a solution suitable for different skill levels and use cases. Through these applications, Linux upholds its commitment to inclusivity and adaptability in the realm of digital accessibility.

Linux Text to Speech Basics

In the realm of Linux computing, text to speech tools are essential for converting written text into audible speech. These tools are widely used for their accessibility benefits and in various applications where speech output from text is preferable, especially when utilizing high quality voices and natural sounding voices.

Understanding Speech Synthesis

Speech synthesis, commonly referred to as text to speech, involves the artificial production of human speech. The process begins with text analysis, during which the input text is converted into a linguistic structure. Then, during the synthesis phase, this structure is transformed into the audible waveform that we hear as speech. Each TTS system uses its own algorithms and technologies to accomplish this complex task and to make the output as natural-sounding as possible. In practice, however, the quality of the default voice often leaves much to be desired, sounding robotic and unnatural compared to other synthesized voices like Microsoft Sam.

TTS Engines for Linux

Linux users have access to a variety of TTS engines. High-quality speech voices are crucial for different use cases, such as adding voice instructions to videos or seeking natural and comforting voices for reading text. eSpeak is a compact, open-source TTS engine known for its simplicity and support for multiple languages. It operates via command line and can be easily integrated with different applications. Another example is Festival, which offers a framework for building speech synthesis systems and is known for its versatility in producing custom voices. Some text-to-speech tools offer additional features like:

Adjusting pitch and speed
Controlling word gaps
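With eSpeak, for instance, these features map onto command-line flags. The sketch below assumes espeak is installed; the helper name speak_spaced and its defaults are hypothetical. The flags are -g for the gap between words, -s for speed, and -p for pitch.

```shell
# speak_spaced TEXT [GAP] [RATE] [PITCH]
# Hypothetical helper: -g adds a pause between words (in units of 10 ms
# at the default speed), -s sets words per minute, -p sets pitch (0-99).
speak_spaced() {
    espeak -g "${2:-5}" -s "${3:-140}" -p "${4:-50}" "$1"
}

# Example: slow, clearly separated words for dictation practice.
# speak_spaced "one two three four" 20 110
```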

For those seeking more advanced commercial solutions, engines like Cepstral provide a more natural voice quality for professional applications. It’s important to select a TTS engine that balances functionality with system resource requirements, as some engines may be more resource-intensive than others.

Implementation and Usage

Adopting text-to-speech technology on Linux systems can be streamlined by understanding the appropriate tools and their implementation within applications. Users can also convert text to audio files for various purposes, such as creating podcasts or embedding audio. Users have access to various command line and GUI tools, ensuring versatility across different use cases.

Installing TTS Software

To get started, one must install text-to-speech software. On many Linux distributions this involves package managers like apt for Ubuntu or pacman for Arch Linux. For instance, eSpeak, a compact and open-source TTS program, can be installed using the command sudo apt-get install espeak on Ubuntu-based distributions.

Command Line TTS Tools

Using the command line, eSpeak can convert text files to speech or live input from the standard input. It supports English among other languages and is invoked using commands like espeak "Your text goes here". Advanced usage includes adjusting the pitch, speed, and saving the output to an audio file with flags like -p for pitch, -s for speed, and -w for writing to a file.
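Reading from standard input and writing to a file can be combined, which is handy for turning the output of another command into audio. A minimal sketch, assuming espeak is installed; the helper name speak_stdin_to_wav is hypothetical.

```shell
# speak_stdin_to_wav OUTPUT.wav
# Hypothetical helper: reads text from standard input (--stdin) and writes
# the speech to OUTPUT.wav (-w) instead of playing it.
speak_stdin_to_wav() {
    espeak --stdin -w "$1"
}

# Example: turn the output of another command into an audio file.
# date | speak_stdin_to_wav now.wav
```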

For a deep learning approach to text-to-speech, coqui-ai/TTS offers a toolkit suitable for both research and production environments. This toolkit often requires additional steps for installation, such as working with Python virtual environments and installing dependencies.
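The virtual environment steps might look like the following sketch. Several details are assumptions rather than facts taken from this article: that the toolkit is published on PyPI under the name TTS, that it installs a tts command, and that your Python version is one the project supports. Check the project README before relying on any of them. The function names are hypothetical.

```shell
# coqui_setup VENV_DIR
# Hypothetical helper: creates a Python virtual environment and installs
# the toolkit into it. The package name "TTS" is an assumption.
coqui_setup() {
    python3 -m venv "$1" && "$1/bin/pip" install TTS
}

# coqui_say VENV_DIR TEXT OUTPUT.wav
# Runs the tts command from the virtual environment (assumed to be
# installed by the package) with its default model.
coqui_say() {
    "$1/bin/tts" --text "$2" --out_path "$3"
}

# Example:
# coqui_setup "$HOME/tts-venv"
# coqui_say "$HOME/tts-venv" "Hello from a neural voice" hello.wav
```

Installing into a virtual environment keeps the toolkit's many Python dependencies away from the system packages.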

Text-to-speech in Applications

Integrating TTS into applications can enhance the accessibility and functionality of software. For example, gosling serves as a wrapper around Google's Cloud Text-to-Speech API , allowing for natural-sounding speech synthesis through simple terminal commands after installation and setup. It shows how modern TTS technology can be leveraged even within Linux terminal environments.

The Linux Compendium

Ubuntu Text To Voice Conversion Software

Convert text to voice with eSpeak on Ubuntu

eSpeak is a compact open-source software speech synthesizer for English and other languages, for Linux and Windows. In this article, we will explain how you can install the command-line tool eSpeak and its GUI alternative Gespeaker on your Ubuntu system. Here is a brief introduction to the two tools:

eSpeak: This command-line tool takes input in the form of a text string, input file, and also from stdin and plays the input in a computer-generated voice. This speech synthesizer supports 107 languages and accents.

Gespeaker: Gespeaker is a free GTK+ frontend for espeak. It allows you to play a text in many languages with settings for voice, pitch, volume, and speed. The text read can also be recorded to WAV file for future listening.

We have run the commands and procedures mentioned in this article on an Ubuntu 18.04 LTS system.

Install and Use eSpeak on Ubuntu

Installation.

eSpeak is easily available on the official Ubuntu repositories and can easily be installed through the command line using the apt-get command. Please follow these steps to install eSpeak via the command line.

Open your Terminal application either through the system Application Launcher Search or through the Ctrl+Alt+T shortcut.

The next step is to update your system’s repository index through the following command: sudo apt-get update.

This helps you install the latest available version of the software from the Internet. Please note that only an authorized user can add, remove and configure software on Ubuntu.

Now you are ready to install eSpeak; you can do so by running the following command as sudo: sudo apt-get install espeak.

The system might ask you for the sudo password and also provide you with a Y/n option to continue the installation. Enter Y and then hit Enter; the software will be installed on your system. The process may, however, take some time depending on your Internet speed.

You can check the version number of the application, and also verify that it is indeed installed on your system, through the following command: espeak --version.
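In a script, the same check can guard later steps so they fail early with a helpful message. A minimal sketch; the function name have_espeak is hypothetical.

```shell
# have_espeak
# Hypothetical helper: succeeds and prints the version if espeak is on the
# PATH; otherwise prints a hint to stderr and fails.
have_espeak() {
    if command -v espeak >/dev/null 2>&1; then
        espeak --version
    else
        echo "espeak not found; install it with: sudo apt-get install espeak" >&2
        return 1
    fi
}

# Example: stop a script early when espeak is missing.
# have_espeak || exit 1
```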

Use eSpeak for Text to Audio conversion

Through the eSpeak utility, you can easily listen to your specified text aloud. There are two ways through which you can listen to an input string:

1. Use the following command to listen to the text specified in the inverted commas: espeak "Your text goes here".

2. Enter the following command and then hit Enter: espeak.

On the prompt that appears, enter the text you want eSpeak to say and then hit Enter.

You can enter as many lines of text as you want. Whenever you want to quit the utility, simply hit Ctrl+C.

There are many other ways you can use the application; please use the following command to view help on those: espeak --help.

One of the most useful ways to use this application is to listen to text from a text file. Use the following syntax to specify the text file whose text you want espeak to read aloud: espeak -f filename.txt.

Remove eSpeak

If you ever want to remove eSpeak installed through the above mentioned method, please use the following command to do so: sudo apt-get remove espeak.

The following command will help you in removing any additional packages that were installed with eSpeak or any other software, for that matter: sudo apt-get autoremove.

Gespeaker-A GTK frontend for espeak

For a person who does not want to open the Command Line much, installing a software through the Ubuntu UI is very simple. Please follow these steps in order to install the Gespeaker tool; available on the Ubuntu Bionic Universe repository:

On your Ubuntu desktop Activities toolbar/dock, click the Ubuntu Software icon.

Click the search icon and enter ‘gespeaker’ in the search bar. The search results will list the relevant entries as follows:

The Gespeaker entry listed here is the one maintained by Ubuntu bionic Universe. Click on this search entry to open the following view:

Click the Install button to begin the installation process. An authentication dialog will appear, because only an authorized user can add, remove, and configure software on Ubuntu.

Enter your password and click the Authenticate button. After that, the installation process will begin, displaying a progress bar as follows:

Gespeaker will then be installed on your system and you will get the following message after a successful install:

Launch Gespeaker Linux desktop application

Through the above dialog, you can launch the tool directly or remove it again right away.

If you want to use the command line to install the same application, use the following command in your Terminal.

Launch Gespeaker

You can access Gespeaker from the Ubuntu application launcher bar as follows, or directly access it from the applications listing:

Alternatively, you can use the following command in your Terminal to launch Gespeaker through the command line:

Important: The Gespeaker UI will only launch if python-dbus is installed on your system. Use the following command in your Terminal to install it:

This is what the Gespeaker UI looks like:

The Gespeaker UI is straightforward, and you should have no problem figuring out how to convert your text and text files to audio.

Remove Gespeaker

If you want to remove Gespeaker that was installed using the above method, you can remove it from your system as follows:

Open the Ubuntu Software Manager and search for Gespeaker. You will see the “Installed” status in the search entry. Click this entry and then click Remove from the following view:

The system will then prompt you with an Authentication dialog. The software will be removed when you provide the sudo password and click Authenticate in the dialog.

Whether you prefer the UI or the command line, you can easily use Gespeaker and espeak to convert text from various input sources into voice output.

Last Updated on August 30, 2019 by vadmin


eSpeak NG – A Text To Speech Synthesizer For Linux

This guide explains what eSpeak NG is, how to install it on Linux, and how to convert text to speech with it.


What is eSpeak NG?

eSpeak NG is a command line, multi-lingual software speech synthesizer for English and many other languages. We can convert text to speech using eSpeak NG in Linux and Unix-like systems. eSpeak NG is an updated version of eSpeak engine created by Jonathan Duddington.

You can use eSpeak NG to listen to blogs and news sites and also convert text files to voice for visually impaired people. eSpeak includes different voices, and their characteristics can be altered.

eSpeak NG is a cross-platform application that supports Android, Linux, macOS, and Windows. It is a free, open source program written in the C programming language. The source code of the eSpeak NG project is hosted on GitHub.

How does eSpeak NG work?

eSpeak NG reads the given text aloud. It can speak text either from standard input or from a file, so you can pass the phrase to speak directly as input, or save the text in a file and pass that file instead. The speech is played through the default sound device.

Instead of speaking directly, you can also save the output to a WAV file, which can then be converted to other formats such as MP3 with a separate tool. The resulting file can be played in any media player, such as VLC or SMPlayer. eSpeak NG can also translate text into phoneme codes.

Supported languages

eSpeak NG does text to speech synthesis for 100+ languages and accents, including Afrikaans, Albanian, Aragonese, Armenian, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Farsi, Finnish, French, Georgian, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Kannada, Kurdish, Latvian, Lithuanian, Lojban, Macedonian, Malaysian, Malayalam, Mandarin, Nepalese, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Telugu, Turkish, Vietnamese, Welsh and more. Some languages are supported better than others.

Install eSpeak NG in Linux

eSpeak NG is packaged for popular Linux operating systems, so you can install eSpeak using the default package manager.

To install eSpeak NG on Arch Linux, EndeavourOS and Manjaro Linux, run:

Debian, Ubuntu and its derivatives like Linux Mint and Pop OS:

Fedora, CentOS, AlmaLinux, and Rocky Linux:
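The per-distribution choice above can be sketched as a small lookup. This is only an illustration: the helper name suggest_install is hypothetical, and the package name espeak-ng is assumed to be the same in all three package managers (some enterprise distributions may need an extra repository enabled).

```python
import shutil

# Package manager -> install command; the package name is assumed to be "espeak-ng".
INSTALL_COMMANDS = {
    "pacman": ["sudo", "pacman", "-S", "espeak-ng"],  # Arch, EndeavourOS, Manjaro
    "apt": ["sudo", "apt", "install", "espeak-ng"],   # Debian, Ubuntu, Mint, Pop OS
    "dnf": ["sudo", "dnf", "install", "espeak-ng"],   # Fedora, CentOS, AlmaLinux, Rocky
}


def suggest_install(which=shutil.which):
    """Return the install command for the first package manager found, else None."""
    for manager, command in INSTALL_COMMANDS.items():
        if which(manager):
            return command
    return None


if __name__ == "__main__":
    command = suggest_install()
    print(" ".join(command) if command else "no supported package manager found")
```

The `which` parameter exists only so the lookup can be exercised without the real package managers being present.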

Convert text to speech using eSpeak NG

eSpeak NG is fully compatible with its predecessor eSpeak. In fact, eSpeak NG uses the same command line options as eSpeak, with several additional functionalities. Let us see a few examples.

1. Speak a phrase aloud using eSpeak NG:

Alternatively, you can use echo command to pipe the phrase as input to eSpeak NG like below:

eSpeak NG will read aloud the given string through the default sound device.

2. As stated earlier, eSpeak NG can read aloud the contents from a file.

3. Read text input from standard input instead of a file:

Type the text to speak and hit the ENTER key. To exit, press CTRL+C.

4. If you want to save output to a WAV audio file, rather than speaking it directly, use -w flag:
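As a hedged sketch of working with the -w flag from a script, the code below builds the command and, separately, inspects a WAV file with Python's standard wave module. The helper names wav_command and wav_duration_seconds are hypothetical.

```python
import shlex
import wave


def wav_command(text, out_path, binary="espeak-ng"):
    """Build the argv that writes speech for `text` to `out_path` instead of playing it."""
    return [binary, "-w", str(out_path), text]


def wav_duration_seconds(path):
    """Return the playing time of a WAV file, computed from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    # Only the command is printed here; run it yourself, then call
    # wav_duration_seconds("hello.wav") to check that audio was written.
    print(shlex.join(wav_command("Hello, world", "hello.wav")))
```

A duration of zero after running the command would suggest that nothing was synthesized.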

5. eSpeak NG can print the phonemes of a text.

The following command will speak the word "ostechnix", and print the phonemes that were spoken.
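To capture the phoneme output in a script, something like the following sketch could be used. It assumes the -x option (write phoneme mnemonics to standard output) and adds -q so that nothing is played aloud, which differs from the spoken variant described above. The helper names are hypothetical.

```python
import shutil
import subprocess


def phoneme_command(text, binary="espeak-ng"):
    """Build the argv that prints phoneme mnemonics for `text` without audio."""
    return [binary, "-q", "-x", text]


def phonemes(text, binary="espeak-ng"):
    """Return the phoneme string for `text`, or None if the program is missing."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        phoneme_command(text, binary), capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(phonemes("ostechnix"))
```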

Sample output:

6. eSpeak NG supports several different voices. To list all voices supported by eSpeak NG, run:

You can also list all voices that speak a specific language, for example English (en), like below:
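The voice listing is plain text, so it can be parsed if a script needs to pick a voice. The sketch below assumes a whitespace-separated layout with a header row followed by columns for priority, language, age/gender, voice name, and file; that layout is an assumption and may differ between versions. The helper names are hypothetical.

```python
import shutil
import subprocess


def voices_command(language=None, binary="espeak-ng"):
    """Build the argv that lists all voices, or only those for `language`."""
    option = "--voices" if language is None else f"--voices={language}"
    return [binary, option]


def parse_voices(listing):
    """Parse the listing into dicts; assumes the column layout described above."""
    rows = []
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5:
            rows.append({"language": fields[1], "name": fields[3], "file": fields[4]})
    return rows


if __name__ == "__main__":
    if shutil.which("espeak-ng"):
        output = subprocess.run(
            voices_command("en"), capture_output=True, text=True, check=False
        ).stdout
        for voice in parse_voices(output):
            print(voice["language"], voice["name"])
    else:
        print("espeak-ng not found on PATH")
```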

7. eSpeak NG will speak the given text using the default English voice. If you want to use a different voice, run:

8. For more details about eSpeak NG, refer to the man pages:

Gespeaker - A GTK front-end to eSpeak

Gespeaker is a text to speech GTK+ front-end for eSpeak and mbrola. It allows you to play a text in many languages. You can adjust various settings such as voice, pitch, volume and speed.

To install Gespeaker in Debian, Ubuntu and its derivatives, run:

Once installed, launch Gespeaker from menu or application launcher. The default interface of Gespeaker will look like below:

Gespeaker is fairly easy to use: enter the text to speak and click the Play button. It's that simple!

You can choose language and the voice (male or female) to use from Base settings tab and adjust the values for pitch, volume, speed and delay settings as you wish from the Advanced settings section.

eSpeak NG GitHub Repository
Gespeaker GitHub Repository


15 Open-source Text To Speech TTS Apps and Libraries

Hazem Abbas

What is text-to-speech?

Text-to-speech, or speech synthesis, is the artificial generation of human-sounding speech from text: the system recognizes the written words and renders them as spoken language.

The first Text-To-Speech system was introduced to the world in 1968 by Noriko Umeda et al, at the Electrotechnical Laboratory in Japan.

Earlier, in 1961, physicist John Larry Kelly, Jr and his colleague Louis Gerstman used an IBM 704 computer to synthesize speech, an event among the most prominent in the history of Bell Labs.

The benefits of TTS

The primary beneficiaries of this technology are people with visual and reading impairments, who were also its first users.

Nowadays, many YouTube channels use this technology to reduce their editing work and increase their output.

In many modern operating systems, text-to-speech is a built-in accessibility feature that assists people who cannot easily read on-screen text.

About this list

In this article we offer you our collection of free, open-source Text-To-Speech (TTS) and speech synthesis apps. You can also find a newer, updated list of open-source web-based TTS apps and services.

1- MARY TTS

MARY TTS is an open-source, multilingual text-to-speech synthesis system written in pure Java. It is available for Windows, Linux, and macOS.

MARY TTS is released under the LGPL-3.0 License.

2- Kaldi

Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. The source code is available on GitHub. Kaldi runs on Windows, Linux, and macOS. It can also run on Android, PowerPC, and with WebAssembly.

3- OpenTTS

OpenTTS is a free, open-source text-to-speech server written in Python. It is released under the MIT License. It supports several languages and comes with an easy-to-use interface. Furthermore, it comes with numerous alternative libraries.

Supported languages: English (27), German (7), French (3), Spanish (2), Dutch (4), Russian (3), Swedish (1), Italian (2), Swahili (1), Finnish, Korean, Japanese, Chinese, and more.

4- eSpeak

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows. It supports several languages and comes with dozens of useful features, which makes it a good choice for many users.


5- Text To Speech Converter

This open-source project lets you convert any text into speech by copying and pasting the text into its simple interface. It is written in C# and currently runs on Windows only.

6- ONLINE TTS

ONLINE TTS is a simple HTML/JavaScript project that turns your English text into speech. ONLINE TTS features simple shortcuts and a clean user interface.

7- Flite

Flite is a small, fast run-time synthesis library suitable for embedded systems and servers. The core Flite library was developed by Alan W Black (mostly in his so-called spare time) while employed in the Language Technologies Institute at Carnegie Mellon University. Flite supports Windows, Linux, macOS, Android, FreeBSD, and several other systems.

8- Julius

Julius is an open-source large vocabulary continuous speech recognition engine.

It is a high-performance, small-footprint large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers, based on word N-gram and context-dependent HMM.

9- Athena

Athena is an open-source implementation of a sequence-to-sequence based speech processing engine.

Athena features

Hybrid Attention/CTC based end-to-end ASR

Speech-Transformer
Unsupervised pre-training
Multi-GPU training on one machine or across multiple machines with Horovod
End-to-end Tacotron2 based TTS with support for multi-speaker and GST
Transformer based TTS and FastSpeech
WFST creation and WFST-based decoding
Deployment with Tensorflow C++

10- ESPnet: end-to-end speech processing toolkit

ESPnet is an end-to-end speech processing toolkit that mainly focuses on end-to-end speech recognition and end-to-end text-to-speech.

It is a developer-friendly toolkit that can be integrated into web projects. Developers can also install it using Docker.

11- Voice Builder

Voice Builder is an open source text-to-speech (TTS) voice building tool that focuses on simplicity, flexibility, and collaboration. Our tool allows anyone with basic computer skills to run voice training experiments and listen to the resulting synthesized voice.

The Voice Builder project is written using JavaScript and released under the Apache-2.0 License.

12- Coqui TTS

Coqui TTS is a library for advanced Text-to-Speech generation. It is built on the latest research and was designed to achieve the best trade-off among ease of training, speed, and quality.

13- Mozilla TTS

Mozilla TTS is a library for advanced Text-to-Speech generation. It is built on the latest research and was designed to achieve the best trade-off among ease of training, speed, and quality.

14- Mycroft Mimic

Mycroft is an open-source voice assistant system. Mimic is the built-in TTS library created by the Mycroft team.

15- Free TTS

If you know any other open-source TTS application, toolkit, or library that we didn't mention here, let us know.


How to Convert Text to Speech on Linux

Text-to-speech (TTS) is the process of transforming written text into spoken words by means of computer technology. Imagine a computer that reads a book to you: that is, quite literally, what TTS does. In short, a TTS engine is an electronic voice that can read aloud any text you provide to it.

Benefits of text-to-speech on Linux

Accessibility: Text-to-speech (TTS) makes on-screen text available to users who cannot easily read it, and the open-source engines on Linux are a free choice compared to proprietary software.

Common use cases for text-to-speech applications

Accessibility Tools: TTS plays a key role in screen readers, which are used, among others, by people who are blind or visually impaired.
Audiobooks and Podcasts: Turn text articles into audio files and publish audiobooks or podcasts for information or entertainment.
Language Learning: TTS helps learners practice pronunciation and listening skills.
Content Creation: Streamers, YouTubers, and other content creators use TTS for voice-overs and narration in their videos.
Customer Service: TTS delivers spoken responses to customer queries, for example in automated phone systems or chatbots.

Available TTS engines and tools

1. Installing eSpeak

eSpeak is straightforward to install and use:

Open your terminal.
Update your package list:
Install eSpeak:

In this case, eSpeak was already installed on the system.

2. Converting Text to Speech

Open a terminal and type:

This works well: after you press Enter, the computer speaks the text you gave it.

To read text from a file:

3. Installing Festival

Festival offers more natural-sounding voices and supports multiple languages:

Install Festival :

The first run installs Festival on your system. If you run the command again, it reports that the package is already installed:

Converting Text to Speech

The system pronounces the text clearly when you enter that command.
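For completeness, here is a hedged Python sketch of driving Festival. It assumes Festival's --tts mode, which speaks text piped on standard input or read from a file given as an argument; the helper names festival_say and festival_file_command are hypothetical.

```python
import shutil
import subprocess


def festival_say(text, dry_run=False):
    """Pipe `text` into `festival --tts`; return the argv that was, or would be, used."""
    argv = ["festival", "--tts"]
    if not dry_run and shutil.which("festival"):
        subprocess.run(argv, input=text, text=True, check=False)
    return argv


def festival_file_command(path):
    """Build the argv that makes Festival read a text file aloud."""
    return ["festival", "--tts", str(path)]


if __name__ == "__main__":
    # dry_run=True avoids producing audio when the sketch is merely inspected.
    print(festival_say("Hello from Festival", dry_run=True))
    print(festival_file_command("notes.txt"))
```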

With eSpeak and Festival, you can add a voice to your Linux computer! This is a valuable tool for accessibility and a fun way to interact with your machine. Both engines are free and open-source, so why not give them a try?

Convert Text to Speech on Linux – FAQs

Which TTS engine is best for my needs?

eSpeak: Best for lightweight, simple applications. Festival: Good for more natural-sounding voices and language support.

Can I use TTS offline?

eSpeak and Festival can be used offline after installation.

Are there any costs associated with these TTS engines?

eSpeak and Festival are free and open-source.


Let your Linux terminal speak its mind

Jason Baker


Greetings from another day in our 24-day-long Linux command-line toys advent calendar. If this is your first visit to the series, you might be asking yourself what a command-line toy even is. We’re figuring that out as we go, but generally, it could be a game, or any simple diversion that helps you have fun at the terminal.

We hope that even if you've seen some of these before, there will be something new for everybody in our series.

Some of you may be too young to remember, but before there was Alexa, Siri, or the Google Assistant, computers still had voices.

Many of us will never forget HAL 9000 from 2001: A Space Odyssey helpfully conversing with the crew (sorry, Dave). But between 1960s science fiction and today, there was a whole generation of speaking computers. Some of them great, most of them, not so great.

One of my favorites is the open source project eSpeak. It's available in many forms, including a library version you can use to include speech technology in your own project, but it also comes as a command-line program that you can install and use easily. In my distribution, this was as simple as:

eSpeak can then be invoked either interactively, or by piping text to it using the output of another program or a simple echo command. There are a number of voice files available for eSpeak, and if you're especially bored over the holidays, you could even create your own.

A fork of eSpeak called eSpeak NG ("Next Generation") was created in 2015 by some developers who wanted to continue development of the otherwise lightly updated eSpeak. eSpeak is made available as open source under a GPL version 3 license, and you can find out more about the project and download the source code on SourceForge.

I'll also throw in a bonus toy today, cava. Because I've been eager to give each of these articles a unique screenshot as the lead image, and today's toy outputs sound rather than something visual, I needed to find something to fill the space. Short for "console-based audio visualizer for ALSA" (although it supports more than just ALSA now), cava is a nice MIT-licensed terminal audio visualization tool that's fun to watch. Below is a visualization of eSpeak's output of the following:

Do you have a favorite command-line toy that we should have included? Our calendar is basically set for the remainder of the series, but we'd still love to feature some cool command-line toys in the new year. Let me know in the comments below, and I'll check it out. And let me know what you thought of today's amusement.

Be sure to check out yesterday's toy, Solve a puzzle at the Linux command line with nudoku, and come back tomorrow for another!


How to text-to-speech output using command-line?

How to get speech output from entered text by using command-line?

Also, is there a facility to change the speech rate, pitch, volume, etc. using a simple command?

command-line
software-recommendation
text-to-speech

1 Possible duplicate of How can I install and use text-to-speech software? – Organic Addict Commented Dec 5, 2015 at 20:25
2 Update for 2023: these two are very natural sounding: Mimic (from MyCroft) and Coqui-ai TTS. See YouTube comparison of 7 TTS in my answer: askubuntu.com/a/1447599/795299 – alchemy Commented Dec 28, 2022 at 1:34

16 Answers

In order of descending popularity:

say converts text to audible speech using the GNUstep speech engine.

festival General multi-lingual speech synthesis system.

spd-say sends text-to-speech output request to speech-dispatcher

espeak is a multi-lingual software speech synthesizer.

24 spd-say appears to be pre-installed in 14.04 and later: releases.ubuntu.com/trusty/… – Ciro Santilli OurBigBook.com Commented Jul 28, 2016 at 11:52
10 Also sudo pip install gTTS , (Google Text to Speech/ github.com/pndurette/gTTS ) then gtts-cli "hello" -o hello.mp3 you can pipe it to mpg123 - as well. gtts-cli "why, hello there" | mpg123 - . – Elijah Lynn Commented Apr 6, 2017 at 17:31
unfortunately, spd-say does not seem to be able to play tts simultaneously, only one a time – phil294 Commented Jul 7, 2017 at 15:51
@ElijahLynn doesn't work – Dims Commented Jan 19, 2018 at 12:49
1 @Wlad espeak.sourceforge.net/download.html is cross platform (but last release was in 2014) – Sylvain Pineau Commented Dec 13, 2019 at 10:45

espeak is a nice little tool.

I just like playing around with it in a command line. You might find it conflicts with Pulseaudio so I'm using a long-winded version that negates having to set it up properly.

espeak --help will show you the options to calibrate reading speed, pitch, voice, etc.

When you're doing your notes, save them as a text file and then:

You can then play around with ffmpeg et al. to compress this down from PCM to something more manageable like MP3 or OGG. But that's a different story.
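For that compression step, a minimal sketch could look like the following. It assumes the speech was first written to a WAV file (for example with espeak's -w option), that ffmpeg is installed with MP3 and Ogg Vorbis encoders, and that ffmpeg picks the output format from the file extension. The helper name compress_command is hypothetical.

```python
import shlex
from pathlib import Path

SUPPORTED_SUFFIXES = {".mp3", ".ogg"}


def compress_command(wav_path, target_suffix=".ogg"):
    """Build the ffmpeg argv that converts `wav_path` to a compressed file next to it."""
    if target_suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported target format: {target_suffix}")
    source = Path(wav_path)
    target = source.with_suffix(target_suffix)  # notes.wav -> notes.ogg
    return ["ffmpeg", "-i", str(source), str(target)]


if __name__ == "__main__":
    print(shlex.join(compress_command("notes.wav", ".mp3")))
```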

1 very nice, one can also try the Graphic User Interface to espeak, espeak-gui. – Sabacon Commented Jan 16, 2011 at 13:15
Pretty rubbish compared to the Mac's text-to-speech tool. – Snowcrash Commented Nov 28, 2019 at 11:29
@Snowcrash Okay... You're free to use something else like Mary-TTS, but that's considerably more of a PITA to install: askubuntu.com/questions/981273/how-to-install-marytts-5-2/… – Oli ♦ Commented Nov 28, 2019 at 11:55

From man spd-say:

Hence you can get text-to-speech with the following command:

You can also set the speech rate, pitch, volume, etc.; see the man page.
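As a sketch of those settings, the code below builds an spd-say command line using the -r (rate), -p (pitch), and -i (volume) options, each of which the man page describes as ranging from -100 to +100. The helper name spd_say_command is hypothetical.

```python
import shlex


def spd_say_command(text, rate=0, pitch=0, volume=0):
    """Build an spd-say argv with rate, pitch, and volume in the range -100..100."""
    for name, value in (("rate", rate), ("pitch", pitch), ("volume", volume)):
        if not -100 <= value <= 100:
            raise ValueError(f"{name} must be between -100 and 100, got {value}")
    return ["spd-say", "-r", str(rate), "-p", str(pitch), "-i", str(volume), text]


if __name__ == "__main__":
    # Slightly faster and lower-pitched than the defaults.
    print(shlex.join(spd_say_command("Hello there", rate=20, pitch=-10)))
```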

5 spd-say -t female2 "text" makes it bearable – scorpiodawg Commented Jun 5, 2018 at 17:59
@scorpiodawg Barely, that's pretty primitive still... – Olle Härstedt Commented Jan 26 at 11:10

Python Google Speech:

Svox From Android:

Svox Nanotts:

Linked resource: Comparison of speech synthesizers Post source: Linuxhacks.org Disclosure: I am the owner of Linuxhacks.org

2 To install and use google_speech on ubuntu 18.04 I had to install python3-pip and libsox-fmt-mp3 and use pip3 install google_speech . – artm Commented Jul 1, 2018 at 9:14
Any idea why google_speech has to reboot itself for larger chunks of text? Is there a buffer setting somewhere? – Olle Härstedt Commented Jan 26 at 11:11

Mbrola doesn't work since 11.10.

SVOX (pico) tools are easy to install, easy to use, and bring good-quality voices to Ubuntu. Install it:

Even easier, you can use LibreOffice in combination with the SVOX (pico) tools by installing the "Read Text" extension, which gives you a "GUI" for this excellent TTS software:

Set up Read Text Extension's options with Tools - Add-ons - Read selection.... Use /usr/bin/python as the external program. Select a command line option that includes the token (PICO_READ_TEXT_PY).

SVOX pico2wave

That's what I use. And it sounds natural, it's easy to understand and it recognizes units (m, °C,kg, ...).
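A typical pico2wave workflow is two steps: synthesize to a WAV file, then play it. The sketch below only builds the two command lines. It assumes pico2wave's -l (language) and -w (output file) options, the six languages shipped with the SVOX Pico package, and aplay as the player; the helper name pico_commands is hypothetical.

```python
import shlex

# Languages assumed to be shipped with SVOX Pico.
PICO_LANGUAGES = ("en-US", "en-GB", "de-DE", "es-ES", "fr-FR", "it-IT")


def pico_commands(text, wav_path="pico.wav", lang="en-US"):
    """Return [synthesize_argv, play_argv] for speaking `text` with pico2wave."""
    if lang not in PICO_LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    synthesize = ["pico2wave", "-l", lang, "-w", wav_path, text]
    play = ["aplay", wav_path]  # any WAV-capable player would do
    return [synthesize, play]


if __name__ == "__main__":
    for argv in pico_commands("It is 21 °C outside.", lang="en-GB"):
        print(shlex.join(argv))
```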

Here is my first post about pico2wave.

All you have to do is: Go to Ubuntu Software Center and search for "pico". You'll find 4 or 5 entries with "Small Footprint Ling...". Install them.

A possible use of pico2wave is described in my first posting (follow the link above).

i have used your way can you pls tell me how to get a naturl sweet female voice using your way – user49557 Commented Jun 19, 2015 at 13:03

And yet another espeak gui: gespeaker . It uses both espeak and mbrola engines. Also, it has more options than espeak-gui .

The following is not a FLOSS solution, but you may find it worthwhile. (it is a wine solution),

I'm personally very keen on TTS, I use it quite often... eg. listening to a rambling discourse which I would never bother to stick with otherwise (because I need to get another cup of coffee... :)

A few things I've discovered along the way.. or should I say, things I haven't discovered along the way... To put it bluntly: Every piece of FOSS TTS voice software I've tried is under par and therefore unsuitable for any semi-protracted listening...

I currently use ATnT's NaturalVoices. It is only available for Windows (maybe the Mac), but it does run under wine in Ubuntu (it has a minor glitch, where I sometimes need to click on the panel when I move away from the reader). It is a minor issue when compared to the advantage gained by the quality of speech from NaturalVoices.

Some other things I've found to be virtually essential for a half-sensible listening experience are:

1. These TTS programs are not intelligent (well, maybe as intelligent as a young baboon), so they need every bit of help they can get, and there is one (and only one) Reader program I've found which helps greatly in this. The app is called ReadPlease (2003 Pro). It allows you to specially modify words and groups of words to be pronounced as you want them. It is by no means perfect, but for me, it made the difference between the entire process being usable and not usable...

2. The speech in Natural Voices is "okay", but it is a bit boring (there are other good products too, but they are all for Windows, unfortunately). It inflects surprisingly well sometimes, but OMG, initially it is a pain! So #2 is patience, and lots of updating of your "special words" list. By patience, I mean I actually became accustomed to my particular baboon's speech patterns :) And by the way, I currently have about 3000 words that now sound "Human" enough that I no longer cringe when I hear them.

3. "Follow the Bouncing Ball". Again, because the voice is never as good as a real speaker, things sometimes need to be clarified. The Reader program I use has one feature for which I even put up with its clunky-looking interface: it has a "select the word currently being read" option. Many readers have this, but ReadPlease keeps the current line bang on center of the screen. This is invaluable for seeing ahead and behind to quickly re-read what you just missed (so auto-centering the current line is good)...

Well that's my experience.. I'm going to make a coffee now, and while I'm doing it, I'll be listening to this, to see how it "reads".... TTS is surprisingly good for picking up typos (I make lots of typos)...

If something as good as ATnT NaturalVoices turns up on the Ubuntu repository, I'll jump at it.

Here is a link to some samples of Natural Voices: I use "Mike"

For festival (the voice seems more natural to me):

Pitch and speed configuration:

create ~/.festivalrc with the following content:

See also http://www.solomonson.com/content/ubuntu-linux-text-speech

Update: tried on another Ubuntu computer. Had to install English speech engine package to work with festival properly:

Also play is a cli command which comes with the sox package:

Even though you've already accepted an answer, I wanted to mention festival, which I like quite a lot too. This post on the Ubuntu forums has a lot of information on getting very nice voices set up for it.

Meet espeak-ng - A multi-lingual software speech synthesizer:

It uses a default English voice, but there are numerous other voices for other languages and even dialects available and can be listed with espeak-ng --voices (for all) or e.g. espeak-ng --voices=en (for English). They can be set with -v together with either the language abbreviation or the file name, e.g. for Scottish or Swahili:

There are many other options available, e.g. -s for the speed and -w to write the output to a wave file, see the manpage linked below.
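The options mentioned above can be combined. The sketch below builds espeak-ng command lines from optional voice (-v), speed (-s, in words per minute), and WAV output (-w) settings. The helper name espeak_ng_command is hypothetical, and the voice names used in the example (en-gb-scotland, sw) are assumptions to verify against the output of espeak-ng --voices.

```python
import shlex


def espeak_ng_command(text, voice=None, speed=None, wav_path=None):
    """Build an espeak-ng argv from optional voice, speed, and output-file settings."""
    argv = ["espeak-ng"]
    if voice:
        argv += ["-v", voice]
    if speed is not None:
        argv += ["-s", str(speed)]  # words per minute
    if wav_path:
        argv += ["-w", str(wav_path)]  # write a WAV file instead of playing audio
    argv.append(text)
    return argv


if __name__ == "__main__":
    print(shlex.join(espeak_ng_command("Hello, world", voice="en-gb-scotland")))
    print(shlex.join(espeak_ng_command("Jambo dunia", voice="sw", speed=140, wav_path="jambo.wav")))
```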

Comparison table

I think what we need at this point is a big summary table, notably looking out for any tool that sounds remotely natural given our 2024-ongoing "deep learning revolution" (the problem now with this Cambrian Explosion is that the packages break every week and only work on certain systems).

Tool	Sounds remotely natural	Output to file	Multilingual	Tested on
(libttspico-utils 1.0+git20130326-14)	y. Some weird distortions, but reasonable.	y		24.04
idiap/coqui-ai-TTS 0.24.1 + Tacotron2	y. Output is randomly different each time. Most words are awesome. Punctuation timing is off. Sometimes it goes completely crazy and it is hilarious.			24.04
	y. Not amazing, but OK. Slight voice distortion and punctuation off.	n		24.04
(speech-dispatcher 0.12.0)	n	n		24.04
(gnustep-gui-runtime 0.30.0)	n	n	n	24.04
1.48.15	n			24.04
2.5.0	n	n		24.04
	n			24.04
1.51	n			24.04
				24.04
tortoise-tts 3.0.0				24.04

Empty cell means "unknown, untested".

My quick test strings are:

en : "Hello, my name is John Smith. What is your name?"
fr : "Bonjour, je m'appelle Jean Jacques. Tu t'appelles comment?"

"Remotely natural" is of course extremely subjective, and will suffer from the continual moving of AI goalposts as things evolve and we get used to better systems. For now, maybe I'd consider it something along "good enough for an informal video voiceover".

Previously mentioned at: https://askubuntu.com/a/1466489/52975

On Ubuntu 24.04 in a clean virtualenv running:

fails with:

ERROR: Cannot install piper-tts==1.1.0 and piper-tts==1.2.0 because these package versions have conflicting dependencies.

bug report: https://github.com/rhasspy/piper/issues/509

On Ubuntu 24.04:

idiap/coqui-ai-TTS

https://github.com/idiap/coqui-ai-TTS

The first time you call it it installs the necessary model automatically.

Sound takes 5-10 s to start coming out on each invocation, which is unacceptable for frequent short sentences.

The default model seems to be Tacotron2 (https://github.com/NVIDIA/tacotron2), but you can select other models from the CLI.
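As a sketch of what selecting a model could look like, the helper below builds a command line for the tts executable. It assumes the --text, --out_path, --model_name and --list_models options of the Coqui CLI; the helper name is hypothetical and the model name is only an example, so check the output of --list_models on your own install.

```python
import shlex
import shutil
import subprocess


def tts_command(text, out_path, model_name=None):
    """Build a `tts` command line; model_name=None keeps the default model."""
    argv = ["tts", "--text", text, "--out_path", out_path]
    if model_name is not None:
        argv += ["--model_name", model_name]
    return argv


if __name__ == "__main__":
    cmd = tts_command(
        "Hello, my name is John Smith.",
        "hello.wav",
        # Example model name (assumption); verify it with `tts --list_models`.
        model_name="tts_models/en/ljspeech/tacotron2-DDC",
    )
    print(shlex.join(cmd))
    # Only list the available models if the CLI is actually installed.
    if shutil.which("tts") is not None:
        subprocess.run(["tts", "--list_models"], check=True)
```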

coqui-ai/TTS

Previously mentioned at: https://askubuntu.com/a/1447599/52975

Does not support Python 3.12 (Ubuntu 24.04): pip install TTS fails. Report: https://github.com/coqui-ai/TTS/issues/3257 A collaborator at https://github.com/coqui-ai/TTS/issues/3257#issuecomment-2096792618 says to use idiap/coqui-ai-TTS instead.

Based on the README similarity it seems to be a fork of https://github.com/mozilla/TTS

festival + festvox-us-slt-hts

Mentioned at: https://askubuntu.com/a/908889/52975 tested on Ubuntu 24.04:

tortoise-tts

https://github.com/neonbjb/tortoise-tts

No easy CLI instructions:

https://speechbrain.github.io/
https://github.com/suno-ai/bark

Bibliography:

Natural Sounding Text to Speech?
https://www.reddit.com/r/MachineLearning/comments/12kjof5/d_what_is_the_best_open_source_text_to_speech/
https://www.reddit.com/r/software/comments/176asxr/best_open_source_texttospeech_available/
https://www.reddit.com/r/opensource/comments/19cguhx/i_am_looking_for_tts_software/
https://www.reddit.com/r/LocalLLaMA/comments/1dtzfte/best_tts_model_right_now_that_i_can_self_host/

Balabolka under Wine works fine (for me) with SAPI4 voices (SAPI5 voices are not detected on my Linux system). It can open files and start reading.

Here is a link to Wine's AppDB entry for Balabolka.

Best free text-to-speech software of 2024

Find the best free text-to-speech software for free text to voice conversion

The best free text-to-speech software makes it simple and easy to improve accessibility and productivity in your workflows.

In the digital era, the need for effective communication tools has led to a surge in the popularity of text-to-speech (TTS) software, and finding the best free text-to-speech software is essential for a variety of users, regardless of budget constraints.

Text-to-speech software skillfully converts written text into spoken words using advanced technology, though often without grasping the context of the content. The best text-to-speech software not only accomplishes this task but also offers a selection of natural-sounding voices, catering to different preferences and project needs.

This technology is invaluable for creating accessible content, enhancing workplace productivity, adding voice-overs to videos, or simply assisting in proofreading by vocalizing written work. While many of today’s best free word processors, such as Google Docs, include basic TTS features that are accurate and continually improving, they may not meet all needs.

Stand-alone, app-based TTS tools, which should not be confused with the best speech-to-text apps, often have limitations compared to more comprehensive, free text-to-speech software. For instance, some might not allow the downloading of audio files, a feature crucial for creating content for platforms like YouTube and social media.

In our quest to identify the best free text-to-speech software, we have meticulously tested various options, assessing them based on user experience, performance, and output quality. Our guide aims to help you find the right text-to-speech tool, whatever your specific needs might be.

The best free text-to-speech software of 2024 in full:

The best free text-to-speech software overall

1. Natural Reader

Natural Reader offers one of the best free text-to-speech software experiences, thanks to an easy-going interface and stellar results. It even features online and desktop versions.

You'll find plenty of user options and customizations. The first is to load documents into its library and have them read aloud from there. This is a neat way to manage multiple files, and the number of supported file types is impressive, including eBook formats. There's also OCR, which enables you to load up a photo or scan of text, and have it spoken to you.

The second option takes the form of a floating toolbar. In this mode, you can highlight text in any application and use the toolbar controls to start and customize text-to-speech. This means you can very easily use the feature in your web browser, word processor and a range of other programs. There's also a browser extension to convert web content to speech more easily.

The TTS tool is available free, with three additional upgrades with more advanced features for power-users and professionals.

The best free custom-voice text-to-speech software

2. Balabolka

There are a couple of ways to use Balabolka's top free text-to-speech software. You can either copy and paste text into the program, or you can open a number of supported file formats (including DOC, PDF, and HTML) in the program directly.

In terms of output, you can use SAPI 4 complete with eight different voices to choose from, SAPI 5 with two, or the Microsoft Speech Platform. Whichever route you choose, you can adjust the speed, pitch and volume of playback to create a custom voice.

In addition to reading words aloud, this free text-to-speech software can also save narrations as audio files in a range of formats including MP3 and WAV. For lengthy documents, you can create bookmarks to make it easy to jump back to a specific location and there are excellent tools on hand to help you to customize the pronunciation of words to your liking.

With all these features to make life easier when reading text on a screen isn't an option, Balabolka is the best free text-to-speech software around.

For more help using Balabolka, see our guide on how to convert text to speech using this free software.

The best free text-to-speech software for beginners

3. Panopreter Basic

Panopreter Basic is the best free text-to-speech software if you’re looking for something simple, streamlined, no-frills, and hassle-free.

It accepts plain and rich text files, web pages and Microsoft Word documents as input, and exports the resulting sound in both WAV and MP3 format (the two files are saved in the same location, with the same name).

The default settings work well for quick tasks, but spend a little time exploring Panopreter Basic's Settings menu and you'll find options to change the language, destination of saved audio files, and set custom interface colors. The software can even play a piece of music once it's finished reading – a nice touch you won't find in other free text-to-speech software.

If you need something more advanced, a premium version of Panopreter is available. This edition offers several additional features including toolbars for Microsoft Word and Internet Explorer, the ability to highlight the section of text currently being read, and extra voices.

The best free text-to-speech extension of Microsoft Word

4. WordTalk

Developed by the University of Edinburgh, WordTalk is a toolbar add-on for Word that brings customizable text-to-speech to Microsoft Word. It works with all editions of Word and is accessible via the toolbar or ribbon, depending on which version you're using.

The toolbar itself is certainly not the most attractive you'll ever see, appearing to have been designed by a child. Nor are all of the buttons' functions very clear, but thankfully there's a help file on hand to help.

There's no getting away from the fact that WordTalk is fairly basic, but it does support SAPI 4 and SAPI 5 voices, and these can be tweaked to your liking. The ability to just read aloud individual words, sentences or paragraphs is a particularly nice touch. You also have the option of saving narrations, and there are a number of keyboard shortcuts that allow for quick and easy access to frequently used options.

The best free text-to-speech software for websites

5. Zabaware Text-to-Speech Reader

Despite its basic looks, Zabaware Text-to-Speech Reader has more to offer than you might first think. You can open numerous file formats directly in the program, or just copy and paste text.

Alternatively, as long as you have the program running and the relevant option enabled, Zabaware Text-to-Speech Reader can read aloud any text you copy to the clipboard – great if you want to convert words from websites to speech – as well as dialog boxes that pop up. One of the best free text-to-speech programs right now, it can also convert text files to WAV format.

Unfortunately the selection of voices is limited, and the only settings you can customize are volume and speed unless you burrow deep into settings to fiddle with pronunciations. Additional voices are available for an additional fee which seems rather steep, holding it back from a higher place in our list.

The best free text-to-speech software: FAQs

What are the limitations of free TTS software?

As you might expect, some free versions of TTS software do come with certain limitations. In some cases, these include the number of voices you get to choose from. For instance, Zabaware gives you two for free, but you have to pay if you want more.

However, the best free software on this list comes with all the bells and whistles, which will be more than enough for the average user.

What is SAPI?

SAPI stands for Speech Application Programming Interface. It was developed by Microsoft to generate synthetic speech to allow computer programs to read aloud text. First used in its own applications such as Office, it is also employed by third party TTS software such as those featured in this list.

In the context of TTS software, there are more SAPI 4 voices to choose from, whereas SAPI 5 voices are generally of a higher quality.

Should I output files to MP3 or WAV?

Many free TTS programs give you the option to download an audio file of the speech to save and transfer to different devices.

MP3 is the most common audio format, and is compatible with pretty much any modern device capable of playing back audio. The WAV format is also highly compatible.

The main difference between the two is quality. WAV files are uncompressed, meaning fidelity is preserved as best as possible, at the cost of being considerably larger in size than MP3 files, which do compress.

Ultimately, however, MP3 files with a bit rate of 256 kbps and above should more than suffice, and you'll struggle to tell the difference when it comes to speech audio between them and WAV files.
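To make the size difference concrete, here is a back-of-the-envelope calculation. It assumes CD-quality stereo WAV (44.1 kHz, 16-bit, two channels) and a constant-bit-rate MP3, and it ignores the small WAV header and MP3 framing overhead; the function names are made up for illustration.

```python
def wav_bytes(seconds, sample_rate=44_100, bits_per_sample=16, channels=2):
    """Size of the raw PCM payload of an uncompressed WAV file, in bytes."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels


def mp3_bytes(seconds, kbps=256):
    """Approximate size of a constant-bit-rate MP3, in bytes."""
    return seconds * kbps * 1000 // 8


if __name__ == "__main__":
    minute = 60
    print(wav_bytes(minute))  # 10584000 bytes, about 10.6 MB
    print(mp3_bytes(minute))  # 1920000 bytes, about 1.9 MB
```

Under these assumptions, one minute of audio is roughly five and a half times larger as WAV than as a 256 kbps MP3.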

How to choose the best free text-to-speech software

Which free text-to-speech software is best for you depends on a range of factors (not to mention personal preference).

Despite how simple the concept of text-to-speech is, there are many different features and aspects to such apps to take into consideration. These include how many voice options and customizations are present, how and where they operate in your setup, what formats they are able to read aloud from and what formats the audio can be saved as.

With free versions, naturally you'll want to take into account how many advanced features you get without paying, and whether any sacrifices are made to performance or usability.

Always try to keep in mind what is fair and reasonable for free services - and as we've shown with our number one choice, you can get plenty of features for free, so if other options seem bare in comparison, then you'll know you can do better.

How we test the best free text-to-speech software

Our testing process for the best free text-to-speech software is thorough, examining all of their respective features and trying to throw every conceivable syllable at them to see how they perform.

We also want to test the accessibility features of these tools to see how they work for every kind of user out there. We have highlighted, for instance, whether certain software offers dyslexic-friendly fonts, such as the number one on our list, Natural Reader.

We also bear in mind that these are free versions, so where possible we compare and contrast their feature sets with paid-for rivals.

Finally, we look at how well TTS tools meet the needs of their intended users - whether it's designed for personal use or professional deployment.

Daryl had been freelancing for 3 years before joining TechRadar, now reporting on everything software-related. In his spare time, he's written a book, ' The Making of Tomb Raider '. His second book, ' 50 Years of Boss Fights ', came out in 2024, with a third book coming in 2025. He also has a newsletter called ' Springboard '. He's usually found playing games old and new on his Steam Deck, Nintendo Switch, and MacBook Pro. If you have a story about an updated app, one that's about to launch, or just anything Software-related, drop him a line.

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows.

eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.
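To give a feel for the idea, here is a toy formant synthesizer. It is not eSpeak's actual implementation: it simply passes an impulse train (a crude stand-in for the vocal cords) through a chain of two-pole resonators, one per formant. The formant frequencies and bandwidths are rough, assumed values for an "ah"-like vowel, and all names are made up for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 16_000  # samples per second


def resonator(signal, freq_hz, bandwidth_hz):
    """Two-pole resonant filter that emphasises one formant frequency."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    a1 = 2 * r * math.cos(2 * math.pi * freq_hz / SAMPLE_RATE)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(formants, pitch_hz=120, seconds=0.5):
    """Pass an impulse train through one resonator per formant."""
    n = int(SAMPLE_RATE * seconds)
    period = SAMPLE_RATE // pitch_hz
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]  # normalise to the range [-1, 1]


def write_wav(path, samples):
    """Write mono 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(SAMPLE_RATE)
        f.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


if __name__ == "__main__":
    # Rough formant (frequency, bandwidth) pairs in Hz; assumed values.
    ah = [(700, 130), (1220, 70), (2600, 160)]
    write_wav("vowel.wav", synthesize_vowel(ah))
```

Because the sound is described by a handful of numbers per phoneme rather than by recordings, the data stays small, which matches the small footprint described above.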

Features: eSpeak converts text to phonemes with pitch and length information.

I regularly use eSpeak to listen to blogs and news sites. I prefer the sound through a domestic stereo system rather than small computer speakers, which can sound rather harsh.

The eSpeak speech synthesizer supports several languages; however, in many cases these are initial drafts and need more work to improve them. Assistance from native speakers is welcome for these, or other new languages. Please contact me if you want to help.

eSpeak does text to speech synthesis for the following languages, some better than others.

Afrikaans, Albanian, Aragonese, Armenian, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Farsi, Finnish, French, Georgian, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Kannada, Kurdish, Latvian, Lithuanian, Lojban, Macedonian, Malaysian, Malayalam, Mandarin, Nepalese, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Turkish, Vietnamese, Welsh.

A GUI program used to prepare and compile phoneme data is now available for download. Documentation is currently sparse, but if you want to use it to add or improve language support, let me know.

eSpeak was originally written for Acorn/RISC_OS computers starting in 1995. This version is an enhancement and re-write, including a relaxation of the original memory and processing power constraints, and with support for additional languages.

UPDATED 07:00 EDT / OCTOBER 15 2024

French AI startup Gladia raises $16M and launches multilingual real-time transcription engine

by Duncan Riley

French artificial intelligence transcription and audio intelligence startup Gladia SAS announced today that it has raised $16 million in new funding and launched a multilingual real-time audio transcription and analytics engine.

Founded in 2022, Gladia aims to help companies leverage cutting-edge AI and retrieve actionable insights from audio data. The company’s application programming interface supports advanced speech recognition features in more than 100 languages, with exceptional accuracy and asynchronous and real-time transcription.

Gladia’s speech recognition tools seek to tackle the issue wherein most speech recognition models today are trained predominantly on English audio data and, at least according to Gladia, are “inherently biased.” Gladia’s solutions, in contrast, have been built to be truly multilingual, with its new fine-tuned engine debuting today offering advanced real-time transcription in more than 100 languages, along with enhanced support for accents and the ability to adapt to different languages on the fly.

The new engine is able to extract insights from calls, such as the caller’s sentiment, key information and conversation summary, in real time, taking less than a second to generate both transcript and insights from a call or meeting using Gladia.

The new product also overcomes challenges such as language understanding and real-time data handling with continuous optimization and maintenance. The real-time speech-to-text engine has a latency of under 300 milliseconds without compromising accuracy, regardless of the language, geography, or tech stack used.

“Our single API is compatible with all existing tech stacks and protocols, including SIP, VoIP, FreeSwitch and Asterisk,” said co-founder and Chief Technology Officer Jonathan Soto. “This allows us to easily integrate real-time transcription and analysis into our customers’ AI platforms so they can focus on delivering the best services to their end users.”

The company’s first async transcription and audio intelligence API launched in June 2023 and has gained traction in the enterprise market, particularly with meeting recorders and note-taking assistants. The API is now used by more than 600 customers around the world, including Attention Inc., Circleback Inc., Method Financial Inc., Recall AI Inc., Sana Labs AB and VEED.IO Ltd.

The $16 million Series A funding round was led by XAnge SAS, with Illuminate Financial Management LLP, XTX Ventures Ltd., Athletico Ventures, Gaingels, Mana Ventures, Motier Ventures SARL, Roosh Ventures GmbH and Soma Capital also participating. The new funding will be used for research and development, to soon bring to market a one-stop AI toolkit for audio and for Gladia to expand its product offering with additional à-la-carte models — including large language models and retrieval-augmented generation.
