C# Speech Recognition - Is this what the user said?
Solution 1:
A similar question was asked on Joel on Software a while back. You can use the System.Speech.Recognition namespace to do this...with some limitations. Add System.Speech (should be in the GAC) to your project. Here's some sample code for a WinForms app:
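using System;
using System.Speech.Recognition;
using System.Windows.Forms;

public partial class Form1 : Form
{
    // Shared desktop recognizer provided by Windows.
    SpeechRecognizer rec = new SpeechRecognizer();

    public Form1()
    {
        InitializeComponent();
        rec.SpeechRecognized += rec_SpeechRecognized;
    }

    // Show whatever phrase the engine matched against the grammar.
    void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        lblLetter.Text = e.Result.Text;
    }

    // Hook this up to the form's Load event.
    void Form1_Load(object sender, EventArgs e)
    {
        // Restrict recognition to the strings "0" through "100".
        var c = new Choices();
        for (var i = 0; i <= 100; i++)
            c.Add(i.ToString());

        var gb = new GrammarBuilder(c);
        var g = new Grammar(gb);
        rec.LoadGrammar(g);
        rec.Enabled = true;
    }
}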
This recognizes the numbers from 0 to 100 and displays the recognized number on the form. You'll need a form with a label called lblLetter on it.
System.Speech only works with a pre-defined list of words or phrases; it's not exactly NaturallySpeaking, either in versatility or in recognition quality. But you don't have to train it to the user's voice, and if you only have a few different things the user can say, it works reasonably well. And it's free! (if you have Visual Studio)
It won't work well if you use very short phrases; I made a program for my kid to say letters of the alphabet and see them on-screen, but it doesn't do that well since many of the letters sound alike (especially from the mouth of a four-year-old).
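One way to soften the "letters sound alike" problem is to look at how sure the engine is before trusting a result. Here is a rough console sketch of that idea. It uses SpeechRecognitionEngine rather than the shared SpeechRecognizer, and the 0.7 cut-off, the three-letter grammar and the class and method names are all my own assumptions for illustration, not anything from the code above.

```csharp
using System;
using System.Speech.Recognition;

static class ConfidenceDemo
{
    // Hypothetical cut-off; tune it against real recordings.
    const float MinConfidence = 0.7f;

    // Pure helper so the accept/reject decision can be checked without a microphone.
    public static bool IsConfident(float confidence)
    {
        return confidence >= MinConfidence;
    }

    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            // Tiny illustrative grammar: three letters only.
            engine.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("a", "b", "c"))));
            engine.SetInputToDefaultAudioDevice();

            // Raised when the audio matched the grammar; Confidence is between 0 and 1.
            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine(IsConfident(e.Result.Confidence)
                    ? "Heard: " + e.Result.Text
                    : "Not sure, please repeat.");

            // Raised when the engine heard speech but could not match it.
            engine.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("Did not match the grammar.");

            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine(); // keep listening until Enter is pressed
        }
    }
}
```

Asking the user to repeat a low-confidence result is usually less annoying than showing the wrong letter, but the right threshold depends on the microphone and the speaker, so treat the number as a starting point.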
As for more flexible options...well, there's the aforementioned NaturallySpeaking, which has an SDK. But you have to contact sales to get any sort of access to it, and no pricing is listed, so it comes across as one of those "How much does it cost? Well, how much have you got?" kind of things. There doesn't seem to be a "download and play around with it" option. :(
As for text-to-speech, System.Speech.Synthesis does this. It's even easier than the speech recognition. I wrote a small program to let me type, hit Enter, and read the text aloud. My four-year-old gets mesmerized by it. :) ("Daddy, I wanna tawk to da wobot.")
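For the curious, a "type a line, hear it read back" program can be sketched roughly like this. It is a minimal console version, assuming a reference to System.Speech and a working default audio device; the class name is made up, and it is not the exact program I wrote.

```csharp
using System;
using System.Speech.Synthesis;

static class TalkToTheRobot
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // Send the synthesized audio to the default speakers.
            synth.SetOutputToDefaultAudioDevice();

            string line;
            // Read lines until an empty line (or end of input) is entered.
            while (!string.IsNullOrEmpty(line = Console.ReadLine()))
                synth.Speak(line); // blocks until the text has been spoken
        }
    }
}
```

Speak is synchronous, which keeps the loop simple; SpeakAsync is the alternative if you don't want to block while the voice is talking.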
Solution 2:
[Note: I was the development lead for the managed speech recognition API in .NET 3.0]
System.Speech is part of .NET 3.0, so it is available on both Vista and XP. On Vista you have the added benefit of a speech recognition engine pre-installed by the OS. On XP your choices are to use the SAPI 5.1 SDK, which ships a very old engine (though it might work well enough for your command-and-control scenario), or to install Office 2003, which installs a newer version of the recognizer. There are a few SAPI 5 compliant speech recognition engines available as well.
If you need to switch languages, you will want to use the System.Speech.Recognition.SpeechRecognitionEngine class which allows you to choose the SR engine for the language you need to support. Note that engines are defined by a set of languages they support (they might be using the same binary, only swapping data files to support additional languages).
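As a rough illustration of picking an engine by language, the sketch below lists the installed recognizers and then creates an engine for one culture. The en-US culture, the yes/no grammar and the class name are assumptions chosen for the example; substitute whatever language your installed engines actually support.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

static class EngineByLanguage
{
    static void Main()
    {
        // Assumption: the target language; change it to the one you need.
        var wanted = new CultureInfo("en-US");

        // List every installed SR engine and the culture it serves.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
            Console.WriteLine(info.Name + " -> " + info.Culture.Name);

        // Throws ArgumentException if no installed engine supports the culture.
        using (var engine = new SpeechRecognitionEngine(wanted))
        {
            // The grammar's culture has to match the engine's culture.
            var gb = new GrammarBuilder(new Choices("yes", "no")) { Culture = wanted };
            engine.LoadGrammar(new Grammar(gb));
            engine.SetInputToDefaultAudioDevice();

            // One blocking recognition attempt; returns null if nothing was recognized.
            RecognitionResult result = engine.Recognize();
            Console.WriteLine(result == null ? "(nothing recognized)" : result.Text);
        }
    }
}
```

Unlike SpeechRecognizer, this class runs in-process and does not bring up the shared Windows speech UI, so you have to set the audio input yourself.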
Comment if you need to know more.
Philipp
Solution 3:
First, add a reference to System.Speech to your project.
I found that the code example posted by Kyralessa on Oct 22nd didn't work for me, but a slightly revised version did. When adding strings to the Choices object, use full English words rather than digits. It seems the MS speech recognition engine can't recognize numerals by themselves.
I have marked these modifications with comments in the revised example:
public partial class Form1 : Form
{
    SpeechRecognizer rec = new SpeechRecognizer();

    public Form1()
    {
        InitializeComponent();
        rec.SpeechRecognized += rec_SpeechRecognized;
    }

    void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        lblLetter.Text = e.Result.Text;
    }

    void Form1_Load(object sender, EventArgs e)
    {
        var c = new Choices();

        // Doesn't work: you must add English words to Choices
        // to populate the grammar.
        //
        //for (var i = 0; i <= 100; i++)
        //    c.Add(i.ToString());

        c.Add("one");
        c.Add("two");
        c.Add("three");
        c.Add("four");
        // etc...

        var gb = new GrammarBuilder(c);
        var g = new Grammar(gb);
        rec.LoadGrammar(g);
        rec.Enabled = true;
    }
}
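Typing out a hundred c.Add calls gets old quickly, so here is a small sketch of generating the words instead. The NumberWords class and its ToWords and BuildChoices methods are hypothetical helpers written for this post, and the wording choices (for example "forty two" without a hyphen, "one hundred" for 100) are assumptions you may want to adjust.

```csharp
using System;
using System.Speech.Recognition;

static class NumberWords
{
    static readonly string[] Ones =
    {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };

    static readonly string[] Tens =
    {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };

    // Spell out an integer in the range 0..100 as English words.
    public static string ToWords(int n)
    {
        if (n < 0 || n > 100) throw new ArgumentOutOfRangeException("n");
        if (n == 100) return "one hundred";
        if (n < 20) return Ones[n];
        string tens = Tens[n / 10];
        return n % 10 == 0 ? tens : tens + " " + Ones[n % 10];
    }

    // Build a Choices object containing the spelled-out numbers 0..100.
    public static Choices BuildChoices()
    {
        var c = new Choices();
        for (int i = 0; i <= 100; i++)
            c.Add(ToWords(i));
        return c;
    }
}
```

In Form1_Load you would then write var c = NumberWords.BuildChoices(); and leave the rest as is. Note that e.Result.Text will come back as words, so you need your own lookup if you want the digits on screen.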
Solution 4:
If the engine is what you're asking about, here is what I've found (beware, I'm just listing; I haven't tried any of them for recognition):
LumenVox engine
You also have the SAPI SDK from Microsoft itself. I've only tried it for text-to-speech, but according to its description:
The SDK also includes freely distributable text-to-speech (TTS) engines (in U.S. English and Simplified Chinese) and speech recognition (SR) engines (in U.S. English, Simplified Chinese, and Japanese).
Solution 5:
Be warned that you're not going to get good results if you don't require training first. Speech recognition is a statistical application of phonetics, a field which is pretty frank about the fact that there's so much variation in the signal that it's almost a miracle anyone can understand what anyone else says. An off-the-shelf speech recognition engine will most likely tend towards a more general accent of English, but will fail miserably for anything even slightly different.
That's why training is so important. We can do well by overfitting with ease, especially if we reduce the problem space. But creating an extensible machine learning solution? Therein always lies the rub.
That being said, consider Sphinx-4. It's an off-the-shelf solution written in Java, available at http://cmusphinx.sourceforge.net/sphinx4/