
Can AI understand our language?

Published on 14 June 2023
Updated on 19 March 2024

Transcription software is now crucial for many businesses, journalists, researchers, and other professionals who need to convert audio or video recordings into text. These programs automate the transcription process, which saves time and effort while improving accuracy. They can support multiple languages and offer features such as editing tools, speaker identification, and timestamps. In general, transcription software is a valuable resource for producing efficient, precise transcripts for research, content development, and documentation. Several kinds of transcription software are available, including automatic, professional, and speech-to-text tools.

More than 60 AI-based transcription applications use automatic speech recognition (ASR). They are often bundled with meeting platforms such as Zoom or offered as standalone apps, and they keep gaining power and new capabilities such as meeting summarisation. At Diplo, we use these applications extensively. Our main challenge is the wide range of English accents and dialects spoken by our professors and students, who come from Asia, Africa, the Balkans, Latin America, and beyond. Because ASR systems are mostly trained on datasets of native speakers, they may struggle to transcribe accented or dialectal speech accurately. Speaking speed and the use of professional or technical vocabulary can also affect transcription accuracy. This is often the case at Diplo, which works on digital diplomacy, internet governance, and related fields.

Our comparative research examined two types of speeches: general diplomatic speeches and specialised internet governance speeches. The speakers used a range of English accents and dialects, including Indian, Chinese, African (Kenyan and Nigerian), Russian, Balkan (Serbian), German, French, Spanish (Spain and Mexico), and Portuguese (Portugal and Brazil), as well as others.

After transcribing video and audio content from multiple speakers with diverse dialects of English, we obtained the following outcomes.

Based on our detailed analysis, we recommend Otter and Grain as the best options for an organisation like Diplo. Both tools can recognise different dialects, which is crucial for our work. Both also transcribe quickly, with an impressive accuracy rate of 99%. The only difference between them is that Grain supports more languages than Otter.
