eSpeak text-to-speech sounds very weird when running pyttsx3 code on Ubuntu 20.04 LTS
I am a bit new to Linux and I tried to run this Python code. The first time, it asked me to install libespeak-dev and run the code again. After installing it, I ran the code, but the voice was very weird and robotic, and it was terrible to listen to. Here is the code:
import pyttsx3
engine = pyttsx3.init()
def speak(text):
    engine.say(text)
    engine.runAndWait()
speak("Hello World and this is a test.")
When I ran the same code on Windows 10 with the same Python version, it sounded normal, but on Ubuntu 20.04 LTS it sounded terrible.
For some reason I am unable to attach the mp3 file to show how terrible it sounds. I am using Ubuntu 20.04 LTS, which comes with Python 3.8.5 installed by default. Is there any fix for this? It sounds really bad. Thanks in advance.
Solution 1:
I faced a similar problem when I was writing a JARVIS-like assistant on Ubuntu 20.04 LTS: the voice was terrible. After some research, I found a solution.
Your code works fine on an OS like Windows because pyttsx3 uses a different speech driver on each operating system. According to the pyttsx3 documentation, the drivers are:
- sapi5 - SAPI5 on Windows
- nsss - NSSpeechSynthesizer on Mac OS X
- espeak - eSpeak on every other platform (including Linux)
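As a rough sketch of the table above (not pyttsx3's own source code), the platform-to-driver mapping can be written as a small function. The name `default_driver_name` is hypothetical; it only relies on `sys.platform`, which is `win32` on Windows, `darwin` on macOS and `linux` on Linux.

```python
import sys


def default_driver_name(platform: str = sys.platform) -> str:
    """Hypothetical helper mirroring the pyttsx3 driver table above.

    It does not call pyttsx3; it only returns the driver name that
    corresponds to the given platform string.
    """
    if platform == "win32":
        return "sapi5"   # SAPI5 on Windows
    if platform == "darwin":
        return "nsss"    # NSSpeechSynthesizer on macOS
    return "espeak"      # eSpeak on every other platform, including Linux
```

On an Ubuntu machine, `default_driver_name()` returns `"espeak"`, which is the driver the rest of this answer works with.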
Since you are running the code on Ubuntu 20.04, you first need to install eSpeak:
$ sudo apt-get install espeak
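Before running the script, you can check from Python whether eSpeak is actually installed. This is a minimal sketch under two assumptions: that the `espeak` package puts an `espeak` executable on your `PATH`, and that `ctypes.util.find_library("espeak")` can locate the libespeak shared library. The helper name `espeak_installed` is hypothetical.

```python
import ctypes.util
import shutil


def espeak_installed() -> bool:
    """Hypothetical check: True if an espeak executable or library is found."""
    # Assumption: `apt-get install espeak` provides an `espeak` command on PATH.
    has_cli = shutil.which("espeak") is not None
    # Assumption: the shared library is discoverable under the name "espeak".
    has_lib = ctypes.util.find_library("espeak") is not None
    return has_cli or has_lib


if __name__ == "__main__":
    if not espeak_installed():
        print("eSpeak not found; try: sudo apt-get install espeak")
```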
Then you can use the code as follows:
import pyttsx3
engine = pyttsx3.init("espeak")
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[11].id)  # English on the author's system; the index may differ on yours
def speak(text):
    engine.say(text)
    engine.runAndWait()
speak("Hello World and this is a test.")
In line 4, voices[11].id selects the output voice, which here is English; changing the index changes the language. To see all the voices available in eSpeak, run the following command in the terminal:
espeak --voices
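Because the position of English in the voice list depends on which voices are installed, index 11 may point to a different language on another machine. One alternative is to pick the voice by matching a keyword against each voice's `name` and `id` instead of using a fixed index. This is a sketch: `find_voice_id` and `make_engine` are hypothetical helpers, and the exact voice names depend on your eSpeak version.

```python
def find_voice_id(voices, keyword):
    """Return the id of the first voice whose name or id contains `keyword`.

    `voices` is what engine.getProperty('voices') returns; each item is
    expected to have `.id` and `.name` attributes. Returns None if nothing
    matches.
    """
    keyword = keyword.lower()
    for voice in voices:
        name = (voice.name or "").lower()
        voice_id = (voice.id or "").lower()
        if keyword in name or keyword in voice_id:
            return voice.id
    return None


def make_engine(keyword="english"):
    """Create an eSpeak-backed pyttsx3 engine with a voice chosen by keyword."""
    import pyttsx3  # imported here so find_voice_id() works without pyttsx3

    engine = pyttsx3.init("espeak")
    voice_id = find_voice_id(engine.getProperty("voices"), keyword)
    if voice_id is not None:  # keep the default voice if nothing matched
        engine.setProperty("voice", voice_id)
    return engine
```

Usage: `engine = make_engine("english")`, then `engine.say("Hello World and this is a test.")` and `engine.runAndWait()`. If several voices match, the first one in the list wins, so a more specific keyword narrows the choice.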
You should now hear a voice that is much easier to understand than the original one.