Speech Recognition
Introduction
In this article, I show you how to program speech recognition (speech to text) and speech synthesis (text to speech) in C# using the System.Speech library.
Speech recognition in C#
To create a program with speech recognition in C#, add a reference to the System.Speech assembly. Then add these using directives at the top of your code file:
using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Threading;
Then, create an instance of the SpeechRecognitionEngine:
SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
Then, we need to load grammars into the SpeechRecognitionEngine. Without a loaded grammar, the speech recognizer will not recognize any phrases. For example, add a grammar with the phrase "test" and give the grammar the name "testGrammar":
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar "test"
Or:
Grammar gr = new Grammar(new GrammarBuilder("test"));
gr.Name = "testGrammar";
_recognizer.LoadGrammar(gr);
If you don't want to give a name to the grammar, do this:
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test"))); // load a "test" grammar
A name is only necessary if you want to find the grammar by name later, for example to unload it. To load grammars asynchronously, use the LoadGrammarAsync method. If you want to load a grammar while the recognizer is running, call the RequestRecognizerUpdate method before loading the grammar, and load the grammar(s) in a RecognizerUpdateReached event handler.
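Loading a grammar while the recognizer is running might look like the following minimal sketch. It assumes the recognizer was already started with RecognizeAsync. The class name GrammarUpdates, the helper LoadWhileRunning and the way the grammar name is built are made up for this illustration and are not part of System.Speech.

```csharp
using System;
using System.Speech.Recognition;

static class GrammarUpdates
{
    // Hypothetical helper (not part of System.Speech): loads a one-phrase
    // grammar into a recognizer that is already running.
    public static void LoadWhileRunning(SpeechRecognitionEngine recognizer, string phrase)
    {
        EventHandler<RecognizerUpdateReachedEventArgs> handler = null;
        handler = (sender, e) =>
        {
            // Unsubscribe first so this handler runs only for this one request.
            recognizer.RecognizerUpdateReached -= handler;

            // The recognizer has paused for the update, so the grammar can be loaded now.
            recognizer.LoadGrammar(
                new Grammar(new GrammarBuilder(phrase)) { Name = phrase + "Grammar" });
        };

        recognizer.RecognizerUpdateReached += handler;
        recognizer.RequestRecognizerUpdate(); // ask the recognizer to pause for an update
    }
}
```

With a running recognizer, a call such as GrammarUpdates.LoadWhileRunning(_recognizer, "hello") would then add a "hello" grammar without stopping recognition.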
Then, subscribe to the SpeechRecognized event:
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
When speech is recognized, the _recognizer_SpeechRecognized method is invoked, so we need to create that method. For example, when the program recognizes the phrase "test", it can write "The test was successful!". To do that, use this:
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "test") // e.Result.Text contains the recognized text
    {
        Console.WriteLine("The test was successful!");
    }
}
As you can see in the comment, e.Result.Text contains the recognized text. That's useful if you have more than one grammar. However, the speech recognizer has not been started yet. To start it, add this code after the _recognizer.SpeechRecognized += _recognizer_SpeechRecognized; line:
_recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously
Now, if we put all the pieces together, we get this:
static void Main(string[] args)
{
    SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar
    _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
    _recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
    _recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously
}

// static, so that it can be used from the static Main method
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "test") // e.Result.Text contains the recognized text
    {
        Console.WriteLine("The test was successful!");
    }
}
If you run that, it will not work: RecognizeAsync returns at once, Main finishes, and the program ends immediately. So, we must ensure that the program does not stop before the speech recognition is completed. We create a ManualResetEvent (System.Threading.ManualResetEvent) named _completed. When the speech recognition is completed, we call its Set method, and then the program ends. I also loaded an "exit" grammar: if the user says "exit", we call the Set method. Because there are two threads, the Main thread and the speech recognition thread, we can block the Main thread until the speech recognition thread is done. After the speech recognition is completed, we dispose the speech recognition engine (this can take about 3 seconds at worst and 50 milliseconds at best):
static ManualResetEvent _completed = null;

static void Main(string[] args)
{
    _completed = new ManualResetEvent(false);
    SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" }); // load a grammar
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")) { Name = "exitGrammar" }); // load an "exit" grammar
    _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
    _recognizer.SetInputToDefaultAudioDevice(); // set the input of the speech recognizer to the default audio device
    _recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech asynchronously
    _completed.WaitOne(); // wait until speech recognition is completed
    _recognizer.Dispose(); // dispose the speech recognition engine
}

// static, so that it can be used from the static Main method
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "test") // e.Result.Text contains the recognized text
    {
        Console.WriteLine("The test was successful!");
    }
    else if (e.Result.Text == "exit")
    {
        _completed.Set(); // let the Main thread continue and end the program
    }
}
If you're programming a Windows application, you don't need to create a ManualResetEvent, because the UI thread keeps running until the user closes the form.
To unload a grammar, use the UnloadGrammar method of the speech recognition engine, and to unload all grammars, use the UnloadAllGrammars method. If the recognizer is running, don't forget to call the RequestRecognizerUpdate method and to make the grammar changes in a RecognizerUpdateReached event handler. There are two ways to unload the "test" grammar. The first way is to look it up by name:
foreach (Grammar gr in _recognizer.Grammars)
{
    if (gr.Name == "testGrammar")
    {
        _recognizer.UnloadGrammar(gr);
        break;
    }
}
The second way is to keep a reference to the grammar:
- Create a grammar and load the grammar like this:
Grammar testGrammar = new Grammar(new GrammarBuilder("test"));
_recognizer.LoadGrammar(testGrammar);
- Then, you can unload the grammar like this:
_recognizer.UnloadGrammar(testGrammar);
If you unload a grammar the second way, you must ensure that the grammar variable is accessible from the code that unloads it, so its scope and access modifiers must be right. The first way is the easiest, because it only needs the grammar's name, so the access modifiers don't matter.
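If the recognizer is running, the unload has to happen in a RecognizerUpdateReached handler, as noted earlier. One possible sketch is shown here: it passes the grammar to the handler as the user token of RequestRecognizerUpdate. The class GrammarUnloader and its two helper methods are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Speech.Recognition;

static class GrammarUnloader
{
    // Hypothetical helper: call once, right after creating the recognizer.
    public static void Attach(SpeechRecognitionEngine recognizer)
    {
        recognizer.RecognizerUpdateReached += (sender, e) =>
        {
            // The grammar to unload travels in the user token of the request.
            Grammar grammar = e.UserToken as Grammar;
            if (grammar != null && grammar.Loaded)
            {
                recognizer.UnloadGrammar(grammar);
            }
        };
    }

    // Hypothetical helper: asks the running recognizer to pause and unload the grammar.
    public static void UnloadWhileRunning(SpeechRecognitionEngine recognizer, Grammar grammar)
    {
        recognizer.RequestRecognizerUpdate(grammar);
    }
}
```

After calling GrammarUnloader.Attach(_recognizer) once, a call such as GrammarUnloader.UnloadWhileRunning(_recognizer, testGrammar) would unload the grammar at the next update point. Update requests that carry a token that is not a Grammar are ignored by this handler.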
Speech rejected
If you add a SpeechRecognitionRejected event handler to the SpeechRecognitionEngine, you can show candidate phrases found by the speech recognition engine. First, add a SpeechRecognitionRejected event handler:
_recognizer.SpeechRecognitionRejected += _recognizer_SpeechRecognitionRejected;
Then, create the _recognizer_SpeechRecognitionRejected method:
static void _recognizer_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    if (e.Result.Alternates.Count == 0)
    {
        Console.WriteLine("Speech rejected. No candidate phrases found.");
        return;
    }
    Console.WriteLine("Speech rejected. Did you mean:");
    foreach (RecognizedPhrase r in e.Result.Alternates)
    {
        Console.WriteLine(" " + r.Text);
    }
}
This function shows all candidate phrases found by the speech recognition engine if the speech recognition was rejected.