SAPI v5.1 SpeechRecognitionEngine always gives the same wrong result in C#

I was playing around with the SAPI v5.1 library and testing it with a sample WAV file I have. (Download it from here.) The sound in that file is clear and simple: it contains only one word, the number three. When I run the following code, though, I get the number 8 or "eight". If I remove "eight" from the choices, I get 7. If I shuffle the list, I get different results, and so on. I'm getting really confused and starting to think that speech recognition in the SAPI library doesn't work at all...

Here is what I'm doing:

    private void button1_Click(object sender, EventArgs e)
    {
        //Add choices to grammar.
        Choices mychoices = new Choices();
        mychoices.Add("one");
        mychoices.Add("two");
        mychoices.Add("three");
        mychoices.Add("four");
        mychoices.Add("five");
        mychoices.Add("six");
        mychoices.Add("seven");
        mychoices.Add("eight");
        mychoices.Add("nine");
        mychoices.Add("zero");
        mychoices.Add("1");
        mychoices.Add("2");
        mychoices.Add("3");
        mychoices.Add("4");
        mychoices.Add("5");
        mychoices.Add("6");
        mychoices.Add("7");
        mychoices.Add("8");
        mychoices.Add("9");
        mychoices.Add("0");

        Grammar myGrammar = new Grammar(new GrammarBuilder(mychoices));

        //Create the engine.
        SpeechRecognitionEngine reco = new SpeechRecognitionEngine();

        //Read audio stream from wav file.
        reco.SetInputToWaveFile("3.wav");
        reco.LoadGrammar(myGrammar);

        //Get the recognized value.
        reco.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(reco_SpeechRecognized);

        reco.RecognizeAsync(RecognizeMode.Multiple);
    }

    void reco_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        MessageBox.Show(e.Result.Text);
    }

Solution 1:

How did you create your WAV file? It looks like it has a high bitrate, and the recognizer supports only certain formats. Try:

  • 8 bits per sample
  • single channel (mono)
  • 22,050 samples per second
  • PCM encoding

You have about 3 seconds of audio and the file size is 520 KB. That seems too big for the supported formats.
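As a rough sanity check on that size estimate, here is a minimal sketch of the arithmetic for uncompressed PCM data. The class and method names (WavSizeEstimate, PcmDataBytes) are made up for illustration, and the calculation ignores the small WAV header:

```csharp
using System;

static class WavSizeEstimate
{
    // Raw PCM size in bytes: samples/sec * channels * bytes/sample * seconds.
    static long PcmDataBytes(int samplesPerSecond, int channels, int bitsPerSample, double seconds)
    {
        int bytesPerSample = bitsPerSample / 8;
        return (long)(samplesPerSecond * channels * bytesPerSample * seconds);
    }

    static void Main()
    {
        // 3 seconds of 8-bit mono audio at 22,050 samples per second.
        Console.WriteLine(PcmDataBytes(22050, 1, 8, 3.0));  // prints 66150

        // The same clip as 16-bit mono.
        Console.WriteLine(PcmDataBytes(22050, 1, 16, 3.0)); // prints 132300
    }
}
```

Either way, three seconds in one of these formats comes out far below 520 KB.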

You can check the RecognizerInfo.SupportedAudioFormats property to find the audio formats that your recognizer supports.
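A minimal sketch of that query, assuming a console project that references the System.Speech assembly and uses the default installed recognizer (the class name ListSupportedFormats is just for illustration):

```csharp
using System;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

static class ListSupportedFormats
{
    static void Main()
    {
        // The parameterless constructor picks the default recognizer.
        using (SpeechRecognitionEngine reco = new SpeechRecognitionEngine())
        {
            RecognizerInfo info = reco.RecognizerInfo;
            Console.WriteLine("Recognizer: " + info.Name);

            int index = 0;
            foreach (SpeechAudioFormatInfo format in info.SupportedAudioFormats)
            {
                Console.WriteLine("{0}:", index++);
                Console.WriteLine("EncodingFormat = {0}", format.EncodingFormat);
                Console.WriteLine("BitsPerSample = {0}", format.BitsPerSample);
                Console.WriteLine("BlockAlign = {0}", format.BlockAlign);
                Console.WriteLine("ChannelCount = {0}", format.ChannelCount);
                Console.WriteLine("SamplesPerSecond = {0}", format.SamplesPerSecond);
                Console.WriteLine();
            }
        }
    }
}
```

The list depends on the recognizer installed on the machine, so the output may differ from one system to another.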

Update:

Your audio file is kind of a mess. It is very noisy, and it is also in an unsupported format: Audacity reports it as stereo, 44.1 kHz, 32-bit float. I silenced the noise at the beginning and end, resampled to 22.050 kHz, converted the stereo track to mono, and then exported as uncompressed 8-bit unsigned WAV. After that, it works fine.
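If you don't have Audacity handy, you can read the format straight out of the file. The following is only a sketch: it assumes a standard RIFF/WAVE layout, takes the path from the command line (falling back to "3.wav"), and the class name WavHeaderDump is made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class WavHeaderDump
{
    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "3.wav";

        // BinaryReader reads little-endian values, which matches the WAV layout.
        using (BinaryReader reader = new BinaryReader(File.OpenRead(path)))
        {
            string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
            reader.ReadUInt32(); // overall RIFF size, not needed here
            string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
            if (riff != "RIFF" || wave != "WAVE")
            {
                Console.WriteLine("Not a RIFF/WAVE file.");
                return;
            }

            // Walk the chunk list until the "fmt " chunk turns up.
            while (reader.BaseStream.Position + 8 <= reader.BaseStream.Length)
            {
                string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
                uint chunkSize = reader.ReadUInt32();

                if (chunkId == "fmt ")
                {
                    // Common format tags: 1 = PCM, 3 = IEEE float, 6 = A-law, 7 = mu-law.
                    ushort formatTag = reader.ReadUInt16();
                    ushort channels = reader.ReadUInt16();
                    uint samplesPerSecond = reader.ReadUInt32();
                    reader.ReadUInt32(); // average bytes per second
                    ushort blockAlign = reader.ReadUInt16();
                    ushort bitsPerSample = reader.ReadUInt16();

                    Console.WriteLine("FormatTag = {0}", formatTag);
                    Console.WriteLine("ChannelCount = {0}", channels);
                    Console.WriteLine("SamplesPerSecond = {0}", samplesPerSecond);
                    Console.WriteLine("BlockAlign = {0}", blockAlign);
                    Console.WriteLine("BitsPerSample = {0}", bitsPerSample);
                    return;
                }

                // Chunks are word-aligned: an odd size is followed by one pad byte.
                reader.BaseStream.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
            }

            Console.WriteLine("No fmt chunk found.");
        }
    }
}
```

Compare the printed values against the formats your recognizer reports as supported.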

On my Windows 7 machine, my default recognizer supports only the following audio formats:

  0:
  EncodingFormat = Pcm
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond = 16000

  1:
  EncodingFormat = Pcm
  BitsPerSample = 16
  BlockAlign = 2
  ChannelCount = 1
  SamplesPerSecond = 16000

  2:
  EncodingFormat = Pcm
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond = 22050

  3:
  EncodingFormat = Pcm
  BitsPerSample = 16
  BlockAlign = 2
  ChannelCount = 1
  SamplesPerSecond = 22050

  4:
  EncodingFormat = ALaw
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond = 22050

  5:
  EncodingFormat = ULaw
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond = 22050

You should also remove the numeric choices from the grammar. Right now the recognizer returns two alternates: "three" and "3". This probably isn't what you want. You could use a semantic result value in your grammar to return the number 3 for the word "three".
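Here is one way the semantic value idea could look. Treat it as a sketch rather than the definitive approach: the key name "digit" and the class name DigitGrammarSketch are arbitrary, it assumes a reference to the System.Speech assembly, and it assumes "3.wav" has already been converted to a supported format.

```csharp
using System;
using System.Speech.Recognition;

static class DigitGrammarSketch
{
    static Grammar BuildDigitGrammar()
    {
        string[] words =
        {
            "zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine"
        };

        // Each spoken word carries its numeric value as a semantic result.
        Choices digits = new Choices();
        for (int i = 0; i < words.Length; i++)
        {
            digits.Add(new GrammarBuilder(new SemanticResultValue(words[i], i)));
        }

        // Store the value under a named key ("digit" is an arbitrary name).
        SemanticResultKey key = new SemanticResultKey("digit", new GrammarBuilder(digits));
        return new Grammar(new GrammarBuilder(key));
    }

    static void Main()
    {
        using (SpeechRecognitionEngine reco = new SpeechRecognitionEngine())
        {
            reco.LoadGrammar(BuildDigitGrammar());
            reco.SetInputToWaveFile("3.wav");

            // Recognize() blocks and returns null when nothing was recognized.
            RecognitionResult result = reco.Recognize();
            if (result == null)
            {
                Console.WriteLine("Nothing recognized.");
            }
            else
            {
                int digit = (int)result.Semantics["digit"].Value;
                Console.WriteLine("{0} -> {1}", result.Text, digit);
            }
        }
    }
}
```

With only the words in the grammar, each utterance has a single matching phrase, and the number comes from the semantic value instead of a second "3" choice.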