How to convert text string to speech sound

I am looking for a way to convert text(string) in ENG to speech(sound) in c#. do anyone know for a way or some open-source lib that can help me with this task?


Solution 1:

You can use .NET lib(System.Speech.Synthesis).

According to Microsoft:

The System.Speech.Synthesis namespace contains classes that allow you to initialize and configure a speech synthesis engine, create prompts, generate speech, respond to events, and modify voice characteristics. Speech synthesis is often referred to as text-to-speech or TTS.

A speech synthesizer takes text as input and produces an audio stream as output. Speech synthesis is also referred to as text-to-speech (TTS).

A synthesizer must perform substantial analysis and processing to accurately convert a string of characters into an audio stream that sounds just as the words would be spoken. The easiest way to imagine how this works is to picture the front end and back end of a two-part system.

Text Analysis

The front end specializes in the analysis of text using natural language rules. It analyzes a string of characters to determine where the words are (which is easy to do in English, but not as easy in languages such as Chinese and Japanese). This front end also figures out grammatical details like functions and parts of speech. For instance, which words are proper nouns, numbers, and so forth; where sentences begin and end; whether a phrase is a question or a statement; and whether a statement is past, present, or future tense.

All of these elements are critical to the selection of appropriate pronunciations and intonations for words, phrases, and sentences. Consider that in English, a question usually ends with a rising pitch, or that the word "read" is pronounced very differently depending on its tense. Clearly, understanding how a word or phrase is being used is a critical aspect of interpreting text into sound. To further complicate matters, the rules are slightly different for each language. So, as you can imagine, the front end must do some very sophisticated analysis.

Sound Generation

The back end has quite a different task. It takes the analysis done by the front end and, through some non-trivial analysis of its own, generates the appropriate sounds for the input text. Older synthesizers (and today's synthesizers with the smallest footprints) generate the individual sounds algorithmically, resulting in a very robotic sound. Modern synthesizers, such as the one in Windows Vista and Windows 7, use a database of sound segments built from hours and hours of recorded speech. The effectiveness of the back end depends on how good it is at selecting the appropriate sound segments for any given input and smoothly splicing them together.

Ready to Use

The text-to-speech capabilities described above are built into the Windows Vista and Windows 7 operating systems, allowing applications to easily use this technology. This eliminates the need to create your own speech engines. You can invoke all of this processing with a single function call. See Speak the Contents of a String.

try this code:

using System.Speech.Synthesis;

namespace ConsoleApplication5
{
    class Program
    {

        static void Main(string[] args)
        {
            SpeechSynthesizer synthesizer = new SpeechSynthesizer();
            synthesizer.Volume = 100;  // 0...100
            synthesizer.Rate = -2;     // -10...10

            // Synchronous
            synthesizer.Speak("Hello World");

            // Asynchronous
            synthesizer.SpeakAsync("Hello World");



        }

    }
}

Solution 2:

This functionality exists in the main Class Library in the System.Speech namespace. Particularly, look in System.Speech.Synthesis.

Note that you will likely need to add a reference to System.Speech.dll.

The SpeechSynthesizer class provides access to the functionality of a speech synthesis engine that is installed on the host computer. Installed speech synthesis engines are represented by a voice, for example Microsoft Anna. A SpeechSynthesizer instance initializes to the default voice. To configure a SpeechSynthesizer instance to use one of the other installed voices, call the SelectVoice or SelectVoiceByHints methods. To get information about which voices are installed, use the GetInstalledVoices method.

As with all MSDN documentation, there are code samples to use. The following is from the System.Speech.Synthesis.SpeechSynthesizer class.

using System;
using System.Speech.Synthesis;

namespace SampleSynthesis
{
  class Program
  {
    static void Main(string[] args)
    {

      // Initialize a new instance of the SpeechSynthesizer.
      SpeechSynthesizer synth = new SpeechSynthesizer();

      // Configure the audio output. 
      synth.SetOutputToDefaultAudioDevice();

      // Speak a string.
      synth.Speak("This example demonstrates a basic use of Speech Synthesizer");

      Console.WriteLine();
      Console.WriteLine("Press any key to exit...");
      Console.ReadKey();
    }
  }
}

Solution 3:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Speech.Synthesis; // first import this package

    namespace textToSpeech
    {
        public partial class home : Form
        {
            public string s = "pran"; // storing string (pran) to s

            private void home_Load(object sender, EventArgs e)
                {
                    speech(s); // calling the function with a string argument
                }

            private void speech(string args) // defining the function which will accept a string parameter
                {
                    SpeechSynthesizer synthesizer = new SpeechSynthesizer();
                    synthesizer.SelectVoiceByHints(VoiceGender.Male , VoiceAge.Adult); // to change VoiceGender and VoiceAge check out those links below
                    synthesizer.Volume = 100;  // (0 - 100)
                    synthesizer.Rate = 0;     // (-10 - 10)
                    // Synchronous
                    synthesizer.Speak("Now I'm speaking, no other function'll work");
                    // Asynchronous
                    synthesizer.SpeakAsync("Welcome" + args); // here args = pran
                }       
         }
    }
  • It'll be better choice to use "SpeakAsync" because when "Speak" function is executing/running none of other function will work until it finishes it's work (personally recommended)

Change VoiceGender
Change VoiceAge

Solution 4:

Recently Google published Google Cloud Text To Speech.

.NET Client version of Google.Cloud.TextToSpeech can be found here: https://github.com/jhabjan/Google.Cloud.TextToSpeech.V1

Nuget: Install-Package JH.Google.Cloud.TextToSpeech.V1

Here is short example how to use the client:

GoogleCredential credentials =
    GoogleCredential.FromFile(Path.Combine(Program.AppPath, "jhabjan-test-47a56894d458.json"));

TextToSpeechClient client = TextToSpeechClient.Create(credentials);

SynthesizeSpeechResponse response = client.SynthesizeSpeech(
    new SynthesisInput()
    {
        Text = "Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with 32 voices"
    },
    new VoiceSelectionParams()
    {
        LanguageCode = "en-US",
        Name = "en-US-Wavenet-C"
    },
    new AudioConfig()
    {
        AudioEncoding = AudioEncoding.Mp3
    }
);

string speechFile = Path.Combine(Directory.GetCurrentDirectory(), "sample.mp3");

File.WriteAllBytes(speechFile, response.AudioContent);