Case Study

Unique audio segmentation project

Diskusija helped to prepare audio data for voice recognition software. Despite the double challenge of strict time limits and audio complexity, we completed the testing corpus in three languages.

Project summary

Client type:
a language service provider operating as a contractor for the end-client

Duration:
2 months

Languages, technology:
Russian, Polish and English; software similar to transcription software

Goals achieved:
creation of an audio corpus for testing voice recognition software, timely deliveries, approval for a second project phase

Introduction

The client wanted to develop and test tools that could recognise and interpret human speech in various audio formats, regardless of audio quality.

Challenge

The project required teams of linguists who could meticulously identify and label sound events in poor-quality audio. It was important to process as much data as possible, accurately, within the few weeks before project funding ran out.

Solution

Initially, Diskusija was assigned to handle Russian audio recordings, but the client soon faced recurring issues with the Polish and English teams and asked us to take over these languages too.

We tested and evaluated the linguists immediately.

The client’s aim was to be able to distinguish music from speech, as well as from noises that might “seem” musical (machinery, birdsong, etc.). The task also involved capturing vocalised human sounds such as laughing, grunting, humming and sighing. Had the labelling been done poorly, the client would have ended up with unreliable data.

Part of the project was to train the team by giving them feedback on the completed files. We had to closely monitor deliveries and answer questions on a daily basis. Fortunately, we enjoyed the full support and close involvement of the client, who was enthusiastic about the project. This helped us produce high-quality data.

Results

Audio recordings segmented:

20.7 hours of Russian
10 hours of Polish
8.4 hours of English

It takes between 6 and 8 hours of work to complete 1 audio hour (20-30 clips).
All in all, over 250 hours were spent on all three languages.
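
As a rough cross-check of the figures above, the reported effort can be reproduced from the audio hours per language and the 6 to 8 hour rate. This is only an illustrative sketch: the variable names are our own, and it assumes the rate applied uniformly to all three languages, which the project data does not state.

```python
# Audio hours segmented per language, as reported in the results.
audio_hours = {"Russian": 20.7, "Polish": 10.0, "English": 8.4}

# Reported effort range: 6 to 8 working hours per audio hour.
# Assumption: the same rate holds for every language.
RATE_MIN = 6
RATE_MAX = 8

total_audio = sum(audio_hours.values())
low_estimate = total_audio * RATE_MIN
high_estimate = total_audio * RATE_MAX

print(f"Total audio segmented: {total_audio:.1f} h")
print(f"Estimated effort: {low_estimate:.1f} to {high_estimate:.1f} h")
```

Under that assumption, the reported total of over 250 working hours falls inside the estimated range.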

The success of the first phase meant the client could re-apply to receive funding for a second project phase. The progress achieved was a strong argument in favour of its continuation and resulted in approved financing for the further development of voice recognition software.