Last week Russia’s Speech Technology Center (STC) and ITMO University saw their speaker diarisation and speech recognition technology win an international contest, the CHiME Speech Separation and Recognition Challenge (CHiME-6).
The technology demonstrated a superior ability to recognise English speech from multiple microphones in a natural, noisy environment. It took first place in the most difficult track of the challenge – which involved, for the first time in history, non-segmented speech.
STC described this track as follows:
“The recordings for the contest were made at 20 dinners in real houses, at parties where people cooked, ate, washed up, communicated freely and emotionally, joked and laughed. The simultaneous speech of 2–4 people, reverberation and intense noise, such as clinking tableware, water pouring from the tap, whirring A/C, footsteps or laughter, are the biggest difficulties here. The goal of the participants was to create a recognition system that would ‘listen’ to the recordings and return a full transcript with the fewest errors.”
“This success was achieved by developing a unique algorithm for allocating separate speech segments for each of the speakers. The team also created a complex leveraging several neural networks of different architectures to distinguish different speakers, implement the beam-forming (pointing a microphone at a particular speaker) effect and directly recognise speech.”
According to STC, non-segmented speech processing may be used to identify speakers speaking simultaneously in meetings. It may also be used to “automate the work of call centers by recognizing spontaneous speech, classifying voice calls, checking compliance with the script, assessing customer satisfaction and quality of dialogue, etc.”
Founded in St. Petersburg in 1990, STC develops voice and facial biometric solutions for professional data processing and machine learning. Its products are sold in more than 70 countries, including the USA, Canada, Latin America and the Middle East. Its main competitors on the global market are Nuance, NEC and Agnitio.
In Russia, STC’s technology is used by such major corporations as Gazprombank, Rostelecom, Russian Railways, Sberbank, Vimpelcom and VTB.
Last year Sberbank, the Russian financial and tech giant, took control of STC. Digital Horizon, a VC and business incubation firm with offices in Moscow and Tel Aviv, acquired a minority stake, while Gazprombank remained a strategic shareholder.