Transcriptions and subtitling: Speech recognition as a tool for text creation

It's not just because of Covid-19. As more and more people have been working from home and using video conferencing over the past few years, the boundaries between the spoken and written word have become increasingly blurred. Transcriptions and subtitles can now be created by just pushing a button.

But how can these rough transcriptions be integrated into larger content projects in a productive way? In addition to the new possibilities, there are also some pitfalls lurking in transcription and subtitling projects. Here are some tips on how to avoid them.

It’s not so long ago that pens and notepads had pride of place at every meeting and interview. They were indispensable when it came to jotting down quick thoughts. There was no lack of options for recording the spoken word, but analysing audio recordings was laborious. Any number of students or journalists can tell you that transcribing a single hour of audio material can take a whole working day (not counting analysis of the material).

The era of laborious typing up has, however, become a thing of the past. Voice recognition using artificial intelligence (AI) has made great strides in a short time. Estonia, known for its openness to innovative IT solutions, recently announced that a new AI system would be used to make stenographers’ jobs easier. But, even here, machine transcriptions are only around 95% reliable: humans still need to check the work afterwards.

Use cases: deployment of transcriptions and subtitles

Modern voice recognition is not just a useful tool for cutting down on time-consuming transcription work. Rather, it opens up new potential for complex content projects in which the spoken and written word seamlessly interlock. For example, translated subtitles allow international teams to work with the same video content despite language barriers. Short video messages can replace time-consuming emails if the technology makes it easy to convert the video into text at a later date.

The following five examples illustrate how transcriptions and subtitles in several languages could become part of larger projects:

Online conferences: Scientists and experts from around the world are spending more time in virtual meetings due to the pandemic. Recordings of these conferences can then be used as the basis for subsequent publications.
Video interviews: Recorded expert interviews are an excellent instrument for preparing blog posts, expert articles or white papers. Time-consuming briefings and approval meetings that first have to be coordinated between several departments can be avoided if all the stakeholders get together online for short content sessions. Any queries can be dealt with straight away.
Transcription for SEO purposes and for reuse: Perhaps you have already produced hours of video content for your company (e.g. for your YouTube channel). Why not dig up these treasures? Transcribe them and make them accessible to Google and other search engines in written form. With a relatively small amount of effort, yesterday’s video tutorial can become tomorrow’s blog post or white paper.
Market research: Customers around the world can answer surveys in their native language by video conference, and you do not have to cover the travelling costs. These videos can be simply transcribed or provided with translated subtitles in any desired language.
Production of image films: Collaboration among multilingual teams is now the standard when creating marketing videos. Once you have a translated transcript, including time markers, the language barrier no longer plays a significant role.
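To make the idea of a transcript with time markers more concrete, here is a minimal Python sketch that turns timed transcript segments into the widely used SubRip (.srt) subtitle layout. The Segment class and the function names are hypothetical, chosen for illustration only; real transcription tools export their own formats, and their timing data would first have to be mapped onto a structure like this one.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment with time markers (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # spoken (or translated) text


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        cues.append(f"{index}\n{timing}\n{seg.text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Welcome to our product tutorial."),
        Segment(2.5, 6.0, "Today we look at the new dashboard."),
    ]
    print(to_srt(demo))
```

Because the time markers stay attached to each segment, a translator only needs to replace the text field; the timing of the subtitles is left untouched.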

Best practices for content projects with transcriptions and subtitles

This brave new world of voice recognition is not, however, without its pitfalls. Today, video and audio content can be produced quickly, but when it comes to processing and evaluation, there can be some nasty surprises: Even with AI support, the work and costs involved in transcription and subtitling can still be huge. This brings us to the first of our four best practices for content projects with voice recognition:

Don’t underestimate the effort required. One hour of audio content can contain more than one thousand spoken words. The costs of transcription have fallen drastically thanks to modern AI technologies. However, if this content needs to be evaluated for a blog post or translated for subtitling purposes, human rather than artificial intelligence is required. This can cause costs to skyrocket, but it doesn’t have to. One thing that helps is
advance planning. Define the aims of your content project and only produce the content that is really needed. When it comes to video interviews, for example, time is money. Even if the work involved varies from case to case, we advise limiting expert interviews for preparing blog posts or white papers to 20 or 30 minutes. It is not only your expert interviewee who will be grateful if the interview is well prepared and the question catalogue is well structured. When it comes to all further steps, from writing the text to – potentially – translation and analysis of the interview, every additional minute of interview costs time and money. For this reason,
it is best to obtain all these services from a single provider. Transcriptions, translations and subtitles are sometimes offered on the internet at rock-bottom prices. Be careful: if you put each service out to tender separately and accept the lowest bid every time, you run the risk of paying more in the end. This is because an agency dedicated exclusively to transcription or translation usually does not have the editorial expertise to plan sophisticated text projects with integrated speech recognition. Another important factor is
having optimal recording conditions: A low-noise environment and a reliable and fast Internet connection are the be-all and end-all for video and audio recordings that are going to be transcribed. Together with clear enunciation, they support the AI with transcription and significantly reduce the human correction workload. Wherever possible, strong dialects and accents – which can also be cost drivers in the transcription of content – should also be avoided.
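The first tip can be turned into a rough back-of-the-envelope estimate. The Python sketch below is only an illustration under stated assumptions: it reads the figure of around 95% reliability mentioned earlier as word-level accuracy, and it takes the lower bound of one thousand words per hour as a placeholder. Both values, like the function name, are assumptions and should be replaced with figures from your own recordings.

```python
def words_to_review(minutes: float,
                    words_per_hour: float = 1000,
                    accuracy: float = 0.95) -> int:
    """Estimate how many transcribed words a human may need to correct.

    words_per_hour and accuracy are placeholder assumptions, not
    measured values; adjust them to your own material.
    """
    if minutes < 0 or words_per_hour < 0:
        raise ValueError("minutes and words_per_hour must be non-negative")
    if not 0 <= accuracy <= 1:
        raise ValueError("accuracy must be between 0 and 1")
    total_words = minutes / 60 * words_per_hour
    return round(total_words * (1 - accuracy))


if __name__ == "__main__":
    for length in (20, 30, 60):
        print(f"{length} min interview: about {words_to_review(length)} words to check")
```

Under these assumptions the correction workload grows in direct proportion to the length of the recording, which is one reason why short, well-prepared interviews pay off.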

In summary

Modern speech recognition makes complex text projects possible at a comparatively low price, something that only a few years ago would have required much more work. Video conferencing completely eliminates travel costs for interviews, and transcription costs have also fallen significantly. The best way to take full advantage of these benefits is through comprehensive and holistic editorial support, from the planning stages to the execution of content projects.