Apart from the Vision API, another interesting part of Cognitive Services is the Speech API. It includes the Bing Speech API for converting text to speech, the Custom Speech API and the Speaker Recognition API. Let’s look at the last one.

Speaker Recognition API

There are two main options here: Speaker Identification and Speaker Verification.

Speaker Identification

If we have a group of people, we can try to determine which of them is speaking in the provided audio sample. The Cognitive Services web page has an interesting example with US presidents.

Example with US presidents

To test that, we can try this out.

Above we have a sample of Bill Clinton’s famous speech. If we submit it to the demo on the page, the speaker is recognised correctly.

Bill Clinton correctly identified on audio sample

We can also try with George Bush, using another famous speech. Here we’re almost correct: the API recognises his father as the speaker in the sample.

George Bush not recognised on audio sample
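To make the identification step a bit more concrete, here is a minimal Python sketch of how such a request could be prepared over REST. Treat it as an illustration rather than the official client: the base URL, the `identify` path, the `identificationProfileIds` query parameter and the helper name `build_identify_request` are my assumptions about how the service was exposed, so please check them against the current documentation before relying on them. The function only builds the request and does not send anything.

```python
import urllib.parse
import urllib.request

# Assumption: base URL of the Speaker Recognition REST API.
# The region and version segment may differ for your subscription.
BASE_URL = "https://westus.api.cognitive.microsoft.com/spid/v1.0"


def build_identify_request(subscription_key, profile_ids, audio_bytes):
    """Prepare (but do not send) a request asking which enrolled profile is speaking.

    profile_ids is the group of candidate speakers; audio_bytes is the recording.
    """
    # Assumption: candidate profiles are passed as one comma-separated query parameter.
    query = urllib.parse.urlencode(
        {"identificationProfileIds": ",".join(profile_ids)}
    )
    return urllib.request.Request(
        url=f"{BASE_URL}/identify?{query}",
        data=audio_bytes,
        method="POST",
        headers={
            # Standard Cognitive Services subscription key header.
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/octet-stream",
        },
    )
```

The request could then be sent with `urllib.request.urlopen(request)`. The list of profile IDs is what makes this identification rather than verification: the service picks one speaker out of a group instead of confirming a single claimed identity.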

Speaker Verification

We can also use our voice for verification purposes. To do that, we need to record ourselves reading one of the specified phrases. Later we can try to verify ourselves with another audio sample to get an indication of whether the person in the recording is who she claims to be. The service works only in English (at least at the time of this writing). I could not get it to verify my voice correctly, although the enrolment, the process of recording the test phrases, went fine. I guess I need to work on my pronunciation.
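Since verification only gives us an indication, an app has to decide how much confidence is enough. Below is a small Python sketch of that decision. The JSON field names (`result`, `confidence`), the values (`Accept`, and `Low` / `Normal` / `High`) and the helper name `is_verified` are assumptions about the response shape, not something confirmed here, so adjust them to whatever the service actually returns.

```python
import json

# Assumption: confidence labels returned by the service, ordered lowest to highest.
CONFIDENCE_LEVELS = ["Low", "Normal", "High"]


def is_verified(response_body, minimum_confidence="Normal"):
    """Return True if the (assumed) response accepts the speaker with enough confidence."""
    payload = json.loads(response_body)
    accepted = payload.get("result") == "Accept"
    confidence = payload.get("confidence", "Low")
    if confidence not in CONFIDENCE_LEVELS:
        # Unknown label: be conservative and reject.
        return False
    return accepted and (
        CONFIDENCE_LEVELS.index(confidence)
        >= CONFIDENCE_LEVELS.index(minimum_confidence)
    )
```

For example, `is_verified('{"result": "Accept", "confidence": "Low"}')` returns `False` with the default threshold, which is how a case like my own failed attempts would surface in an app.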

Have you used these features in your apps already?

Founder of Octal Solutions, a .NET software house.
Passionate dev, blogger, occasional speaker, one of the leaders of the Wroc.NET user group. Microsoft MVP.