Learn TO Live: Speech Recognition

Speech Recognition

Introduction

In this article, I show you how to program speech recognition (speech to text) and speech synthesis (text to speech) in C# using the System.Speech library.

Speech recognition in C#

Speech recognition

To create a program with speech recognition in C#, you need to add a reference to the System.Speech library. Then, add these using statements at the top of your code file:

using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Threading;

Then, create an instance of the SpeechRecognitionEngine:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();

Then, we need to load grammars into the SpeechRecognitionEngine. If you don't do that, the speech recognizer will not recognize any phrases. For example, load a grammar with the phrase "test" and give the grammar the name "testGrammar":

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar "test"

Or:

Grammar gr = new Grammar(new GrammarBuilder("test"));
gr.Name = "testGrammar";
_recognizer.LoadGrammar(gr);

If you don't want to give a name to the grammar, do this:

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test"))); // load a "test" grammar

Giving the grammar a name is only necessary if you want to find it again by name later, for example to unload it. To load grammars asynchronously, use the method LoadGrammarAsync. If you want to load a grammar while the recognizer is running, call the RequestRecognizerUpdate method before loading the grammar, and load the grammar(s) in a RecognizerUpdateReached event handler.
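
Loading a grammar while the recognizer is running could look roughly like the sketch below. The class GrammarUpdateSketch, the helper AddPhraseWhileRunning and the naming scheme phrase + "Grammar" are hypothetical names chosen for illustration; only RequestRecognizerUpdate, RecognizerUpdateReached and LoadGrammar come from System.Speech. The sketch assumes the recognizer passed in has already been started with RecognizeAsync.

```csharp
using System;
using System.Speech.Recognition;

static class GrammarUpdateSketch
{
    // Hypothetical helper: asks a running recognizer to pause at a safe
    // point and loads one extra phrase once that point is reached.
    public static void AddPhraseWhileRunning(SpeechRecognitionEngine recognizer, string phrase)
    {
        EventHandler<RecognizerUpdateReachedEventArgs> handler = null;
        handler = (sender, e) =>
        {
            // Detach first, so the grammar is loaded only once per request.
            recognizer.RecognizerUpdateReached -= handler;
            // The recognizer is paused here, so loading the grammar is safe.
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder(phrase))
            {
                Name = phrase + "Grammar" // assumed naming scheme
            });
        };
        recognizer.RecognizerUpdateReached += handler;
        recognizer.RequestRecognizerUpdate(); // raises RecognizerUpdateReached later
    }
}
```

With this helper, a call such as GrammarUpdateSketch.AddPhraseWhileRunning(_recognizer, "hello") would add the phrase "hello" without stopping the recognition.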

Then, add this event handler:

_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;

If the speech is recognized, the method _recognizer_SpeechRecognized will be invoked. So, we need to create that method. For example, when the program recognizes the phrase "test", it can write "The test was successful!". To do that, use this:

void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     } 
}

As you can see in the comment line, e.Result.Text contains the recognized text. That's useful if you have more than one grammar. But the speech recognizer hasn't been started yet. To do that, add this code after the _recognizer.SpeechRecognized += _recognizer_SpeechRecognized line:

_recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously

Now, if we put all the pieces together, we get this:

static void Main(string[] args)
{
     SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a "test" grammar
     _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
     _recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
     _recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously
}
// static, because it is attached from the static Main method
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     }
}

If you run that, it will not work: the program ends immediately, because RecognizeAsync returns at once and Main then exits. So, we must ensure that the program does not stop before the speech recognition is completed. We create a ManualResetEvent (System.Threading.ManualResetEvent) with the name _completed, and when the speech recognition is completed, we call its Set method, and then the program ends. I also loaded an "exit" grammar: if the user says "exit", we call the Set method. Because there are two threads, the Main thread and the speech recognition thread, we can block the Main thread until the speech recognition is completed. After that, we dispose the speech recognition engine (this can take about 50 milliseconds at best and 3 seconds at worst):

static ManualResetEvent _completed = null;
static void Main(string[] args)
{
     _completed = new ManualResetEvent(false);
     SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a "test" grammar
     _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")) { Name = "exitGrammar" }); // load an "exit" grammar
     _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
     _recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
     _recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously
     _completed.WaitOne(); // wait until speech recognition is completed
     _recognizer.Dispose(); // dispose the speech recognition engine
}
// static, because it is attached from the static Main method
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
     if (e.Result.Text == "test") // e.Result.Text contains the recognized text
     {
         Console.WriteLine("The test was successful!");
     }
     else if (e.Result.Text == "exit")
     {
         _completed.Set(); // let the Main thread continue, so the program ends
     }
}

If you're programming a Windows application with a form, you don't need to create a ManualResetEvent, because the UI thread keeps running until the user closes the form.
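
The handler above compares e.Result.Text with each phrase. Since every grammar was given a name, a handler could also branch on the name of the grammar that matched. The following is only a sketch of that idea: the class name GrammarNameDispatchSketch and the method name OnSpeechRecognized are hypothetical, and it assumes the grammars were loaded with the names "testGrammar" and "exitGrammar" as shown earlier.

```csharp
using System;
using System.Speech.Recognition;
using System.Threading;

static class GrammarNameDispatchSketch
{
    // Signalled when the user says "exit" (same role as _completed above).
    public static readonly ManualResetEvent Completed = new ManualResetEvent(false);

    // Hypothetical handler; attach it with
    // recognizer.SpeechRecognized += GrammarNameDispatchSketch.OnSpeechRecognized;
    public static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // e.Result.Grammar is the grammar that produced this result.
        switch (e.Result.Grammar.Name)
        {
            case "testGrammar":
                Console.WriteLine("The test was successful!");
                break;
            case "exitGrammar":
                Completed.Set(); // unblock the thread waiting in WaitOne()
                break;
        }
    }
}
```

Branching on the grammar name can be convenient when one grammar contains several phrases that should all trigger the same action.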

To unload a grammar, use the method UnloadGrammar of the speech recognition engine, and to unload all grammars, use the method UnloadAllGrammars. If the recognizer is running, don't forget to invoke the method RequestRecognizerUpdate and to unload the grammars in a RecognizerUpdateReached event handler.
There are two ways to unload a single grammar. The first way is to look it up by name. For example, to unload the "test" grammar:

foreach (Grammar gr in _recognizer.Grammars)
{
       if (gr.Name == "testGrammar")
       {
             _recognizer.UnloadGrammar(gr);
             break;
       }
}

The second way is to keep a reference to the grammar. Create and load the grammar like this:

Grammar testGrammar = new Grammar(new GrammarBuilder("test"));
_recognizer.LoadGrammar(testGrammar);

Then, you can unload the grammar like this:

_recognizer.UnloadGrammar(testGrammar);

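If you unload a grammar the second way, you must make sure that the testGrammar variable is accessible from the code that unloads it (for example, declare it as a field with a suitable access modifier instead of as a local variable). The first way is the easiest, because it finds the grammar by name, so the access modifiers don't matter.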
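
To make the point about accessibility concrete, here is a minimal sketch of the second way with the grammar stored in a field. The class UnloadByReferenceSketch and its two methods are hypothetical names used for illustration, and the sketch assumes the recognizer is not running when they are called (otherwise, use RequestRecognizerUpdate as described above).

```csharp
using System;
using System.Speech.Recognition;

static class UnloadByReferenceSketch
{
    // A field instead of a local variable, so that both methods can reach it.
    static Grammar _testGrammar;

    public static void LoadTestGrammar(SpeechRecognitionEngine recognizer)
    {
        _testGrammar = new Grammar(new GrammarBuilder("test"));
        recognizer.LoadGrammar(_testGrammar);
    }

    public static void UnloadTestGrammar(SpeechRecognitionEngine recognizer)
    {
        if (_testGrammar != null)
        {
            recognizer.UnloadGrammar(_testGrammar); // no lookup by name needed
            _testGrammar = null;                    // forget the unloaded grammar
        }
    }
}
```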

Speech rejected

If you add a SpeechRecognitionRejected event handler to the SpeechRecognitionEngine, you can show candidate phrases found by the speech recognition engine. First, add a SpeechRecognitionRejected event handler:

_recognizer.SpeechRecognitionRejected += _recognizer_SpeechRecognitionRejected;

Then, create the _recognizer_SpeechRecognitionRejected function:

static void _recognizer_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
   if (e.Result.Alternates.Count == 0)
   {
     Console.WriteLine("Speech rejected. No candidate phrases found.");
     return;
   }
   Console.WriteLine("Speech rejected. Did you mean:");
   foreach (RecognizedPhrase r in e.Result.Alternates)
   {
    Console.WriteLine("    " + r.Text);
   }
}

This function shows all candidate phrases found by the speech recognition engine if the speech recognition was rejected.

Friday, August 26, 2016
