Microsoft Cognitive Services

Apart from the Vision API, another interesting part of Cognitive Services is the Speech API. It covers the Bing Speech API for converting text to speech, the Custom Speech API, and the Speaker Recognition API. Let’s look at the last one.

It offers two main options: Speaker Identification and Speaker Verification.

Speaker Identification

If we have a group of people, we can try to determine which of them is speaking in the provided audio sample. The Cognitive Services web page has an interesting example with US presidents.

Let’s try it out.

Take a sample of Bill Clinton’s famous speech: if we submit it to the demo on the page, we get the correct recognition of the speaker.

We can also try George Bush, again with a famous speech. Here the result is almost correct: the API recognises his father as the speaker in the sample.
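The demo goes through the web page, but the same identification call can be made against the REST API directly. Below is a minimal sketch of how such a request might be put together in Python. The base URL, the `identify` path and the `identificationProfileIds` and `shortAudio` parameters are assumptions based on the v1.0 Speaker Recognition REST API and should be checked against the current documentation. The function name `build_identify_request` is made up for this example. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: endpoint layout of the v1.0 Speaker Recognition REST API.
# The region ("westus") and the path may differ for your subscription.
BASE_URL = "https://westus.api.cognitive.microsoft.com/spid/v1.0"


def build_identify_request(subscription_key, profile_ids, wav_bytes, short_audio=True):
    """Build (but do not send) a request asking which enrolled profile is speaking."""
    query = urlencode(
        {
            # Candidate speakers, e.g. one enrolled profile per president.
            "identificationProfileIds": ",".join(profile_ids),
            "shortAudio": str(short_audio).lower(),
        },
        safe=",",  # keep the comma-separated list readable in the URL
    )
    return Request(
        url=f"{BASE_URL}/identify?{query}",
        data=wav_bytes,  # raw audio of the sample to identify
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/octet-stream",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. As far as the v1.0 documentation describes it, the call answers with an operation URL to poll rather than with the identified speaker itself.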

Speaker Verification

We can also use our voice for verification purposes. To do that we need to record ourselves reading one of the specified phrases. Later we can try to verify ourselves with another audio sample to get an indication of whether the person on the recording is who they claim to be. The service works only in English (at least at the time of writing). I could not get it to verify my voice correctly, although the enrolment (the process of recording the test phrases) went fine. I guess I need to work on my pronunciation.
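To make the "indication" a bit more concrete, here is a small sketch of how the answer of a verification call could be interpreted. It assumes the response is a JSON object with `result` (`Accept` or `Reject`), `confidence` and `phrase` fields, which is how the v1.0 API appears to describe it. The `VerificationOutcome` type, the `parse_verification` helper and the sample body are invented for illustration and are not output captured from the service.

```python
import json
from dataclasses import dataclass


@dataclass
class VerificationOutcome:
    accepted: bool    # True only when the service answered "Accept"
    confidence: str   # e.g. "Low", "Normal", "High" (assumed values)
    phrase: str       # the phrase the service recognised in the audio


def parse_verification(body: str) -> VerificationOutcome:
    """Turn the (assumed) JSON response of a verify call into a small object."""
    data = json.loads(body)
    return VerificationOutcome(
        accepted=data.get("result") == "Accept",
        confidence=data.get("confidence", "Unknown"),
        phrase=data.get("phrase", ""),
    )


# Hypothetical response body, written by hand for this example.
sample = '{"result": "Reject", "confidence": "Low", "phrase": "my voice is my passport verify me"}'
outcome = parse_verification(sample)
```

Keeping the confidence next to the accept/reject flag lets an app decide, for example, to ask for another recording instead of rejecting the user outright.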

Have you already used these features in your apps?

Paweł Łukasik

Founder of Octal Solutions, a .NET software house.
Passionate dev, blogger, occasional speaker, one of the leaders of the Wroc.NET user group. Microsoft MVP. Podcaster – Ostrapila.pl
