Audio Media Accessibility: How To Improve Audio Media Transcriptions

Ayu Adiati - Nov 23 '22 - Dev Community

Hello Friends 👋,

In the last post, I shared with you why we must improve audio transcriptions and video subtitles.
Go ahead and read the previous post if you haven't yet 😊.

In this article, I want to share how to improve transcriptions to make them accessible.

Background: How I got involved in improving Virtual Coffee podcast transcriptions

It all started with my contribution to the Software Engineer Unlocked podcast. I then looked at the Virtual Coffee podcast's transcriptions and saw many typos. I also suspected that some words were missing, because parts of the text were out of context and difficult to understand.

So I reached out to Dan, one of Virtual Coffee's maintainers, and asked if I could help improve the transcriptions. He agreed.

Then I started asking around and searching for information on improving the transcriptions.

Types of transcriptions

First, we need to know that there are several types of transcriptions.

Verbatim transcriptions

Verbatim transcription means that we transcribe the audio word for word. We keep the repetition, stuttering, and filler words such as 'I mean', 'like', 'you know', 'um', etc.
We must transcribe everything we hear (yes, even curse words!) and not leave anything out. This also includes non-verbal sounds like laughter, coughs, pauses, etc.

For example:

You do need Java -- you do- you do need to know- to know JavaScript, for example, to- to- to work in React, and things like that. But if you spend too much time on that, you'll never -- y-y-you'll- you'll get a little bit later to these things, and y-y-you might missed- missed the opportunity [silence] to get excited about it [chuckles].

Clean verbatim transcriptions

This type is also known as an edited transcript. As with verbatim, we want to keep everything, but we can omit repetition, stuttering, and filler words. We may also leave out non-verbal communication to make the transcript easier to read.

For example:

You do need Java -- you do need to know JavaScript, for example, to work in React, and things like that. But if you spend too much time on that, you'll never -- you'll get a little bit later to these things, and you might missed the opportunity to get excited about it.
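
Part of this clean-up can be automated before a human does the final pass. Here is a minimal Python sketch, assuming the verbatim text marks a stutter with a trailing hyphen (as in the example above) and that the fillers to drop are only 'um' and 'uh'. The names FILLERS and clean_verbatim are made up for illustration and don't come from any podcast's tooling.

```python
import re

# Hypothetical filler list; adjust it to match your own guidelines.
FILLERS = ("um", "uh")


def clean_verbatim(text: str) -> str:
    """Drop standalone filler words and single-word stutters like 'to- to- to'."""
    # Remove each filler together with an optional trailing comma and spaces.
    filler_pattern = r"\b(?:%s)\b,?\s*" % "|".join(FILLERS)
    text = re.sub(filler_pattern, "", text, flags=re.IGNORECASE)
    # Remove 'word- ' when the very next word is the same word again.
    # Multi-word repetitions ('you do- you do') are left for a human editor.
    text = re.sub(r"\b(\w+)-\s+(?=\1\b)", "", text, flags=re.IGNORECASE)
    return text


print(clean_verbatim("You do need to- to- to work in React, um, and things like that."))
# prints: You do need to work in React, and things like that.
```

A script like this only handles the mechanical cases. False starts and longer repetitions still need someone to listen and decide.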

Intelligent transcriptions

This type allows us to lightly edit the transcripts, for example to fix the grammar. It also allows us to omit unnecessary words, even off-topic talk. The goal is to deliver the meaning of the speech more naturally so readers can get the purpose of the whole conversation.

For example:

You need to know JavaScript, for example, to work in React. But if you spend too much time on that, you'll get a little bit later to these things, and you might miss the opportunity to get excited about it.

How to improve audio media transcriptions

This section will go through how to improve transcriptions from scratch. And it's based on my experience with the Virtual Coffee podcast.

Deciding the type of transcriptions

After discussing it with Dan, we decided to use the verbatim type because we wanted to capture the whole atmosphere of the talk. We keep everything except for the filler words 'uh' and 'um'.
Then Dan provided the repo for the podcast. Some podcasts use Markdown for their transcriptions, but at Virtual Coffee we use the .srt format.

Improving a transcription

I began by improving one of the episodes to test things out.
First, I read the transcription. Then I listened to the episode while fixing the typos and adding missing words and proper punctuation. After my pull request was merged, I read the transcription on the website without listening to the audio.
It was somewhat challenging to read because of the repetition and the false starts. So I did more research on how to make the transcription easier to read and understand with proper punctuation.

Provide the guidelines

After editing the transcription several times, I was finally happy with the result. Based on my notes, I wrote the guidelines for improving the transcriptions. Having guidelines is essential to maintain consistency throughout the transcriptions.

Formatting rules

The guidelines might also include additional rules, such as formatting rules.

Take the Software Engineer Unlocked podcast as an example. They want every line to have about 80 characters for better pull requests.
At Virtual Coffee, we want each section to have three or four lines: an index, the timestamps, and one or two lines of text.
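
To make the "three or four lines" rule concrete, here is a rough Python sketch that checks each section of an .srt text. It assumes sections are separated by blank lines. The function name check_blocks and the sample timestamps are made up for illustration and are not taken from the Virtual Coffee repo.

```python
import re


def check_blocks(srt_text: str) -> list[str]:
    """Return a list of problems for sections that break the 3-or-4-line rule."""
    problems = []
    # Sections in an .srt file are separated by blank lines.
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    for position, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        if len(lines) not in (3, 4):
            problems.append(
                f"block {position}: expected 3 or 4 lines, found {len(lines)}"
            )
            continue
        if not lines[0].strip().isdigit():
            problems.append(f"block {position}: first line should be a numeric index")
        if "-->" not in lines[1]:
            problems.append(f"block {position}: second line should hold the timestamps")
    return problems


# Illustrative sample; the timestamps are invented.
sample = """1
00:00:01,000 --> 00:00:04,000
You need to know JavaScript, for example,
to work in React.

2
00:00:04,000 --> 00:00:06,500
But if you spend too much time on that,
you'll get a little bit later
to these things.
"""

for problem in check_blocks(sample):
    print(problem)
```

The second section has three lines of text, so it is reported as having five lines instead of three or four.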

When I started improving the transcriptions, I did everything manually: I checked and fixed the index, timestamps, and line breaks. Shoutout to Dan for providing a tool, which we can run with yarn check-srt, to help our contributors. This tool checks for invalid timestamp formatting. It also fixes the line breaks and the order of the index. We only need to fix the timestamps that it lists as incorrectly formatted and don't have to check them manually one by one.
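
I can't show how Dan's tool works internally, but the idea of such a check can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it uses the common SubRip timestamp layout (HH:MM:SS,mmm --> HH:MM:SS,mmm), and the names TIMESTAMP_LINE and renumber_and_check are hypothetical.

```python
import re

# Common SubRip layout: 00:00:01,000 --> 00:00:04,000
TIMESTAMP_LINE = re.compile(
    r"^\d{2}:[0-5]\d:[0-5]\d,\d{3} --> \d{2}:[0-5]\d:[0-5]\d,\d{3}$"
)


def renumber_and_check(srt_text: str) -> tuple[str, list[int]]:
    """Renumber sections from 1 and report which ones have a bad timestamp line."""
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    fixed_blocks = []
    bad = []
    for new_index, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        # Assumes line 1 is the index and line 2 holds the timestamps.
        if len(lines) < 2 or not TIMESTAMP_LINE.match(lines[1].strip()):
            bad.append(new_index)
        if len(lines) >= 2:
            lines[0] = str(new_index)
        fixed_blocks.append("\n".join(lines))
    return "\n\n".join(fixed_blocks) + "\n", bad


# Illustrative sample: the second section has a wrong index and a bad timestamp.
sample = """1
00:00:01,000 --> 00:00:04,000
Hello and welcome.

3
00:00:04.000 -> 00:00:06,500
Thanks for having me.
"""

fixed, bad = renumber_and_check(sample)
print(bad)
```

The sketch fixes the index order automatically but only reports the malformed timestamps, because those still need a human to correct.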

Things we need to pay attention to when improving transcriptions

Read the guidelines

Every organization has rules to make its transcriptions consistent, and we must read and follow those guidelines. If we're contributing to an open-source project, we must also read its CODE_OF_CONDUCT.md and CONTRIBUTING.md, if any. Ask the maintainers whenever you're in doubt or have questions.

Do research

Sometimes, we improve a transcription on a topic that we're not too familiar with. Or there are times when we're unsure about the capitalization of a word. We always want to do research so that we write such words correctly and with proper capitalization. For example, iPhone, Ruby on Rails, Log4j, etc.
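
Once we have researched a term, we can keep it in a small glossary so it stays consistent across episodes. Below is a minimal Python sketch of that idea. The GLOSSARY list and the fix_capitalization function are hypothetical, and a real glossary would only contain spellings we have already confirmed.

```python
import re

# Hypothetical glossary of spellings confirmed through research.
GLOSSARY = ["iPhone", "Ruby on Rails", "Log4j", "JavaScript"]


def fix_capitalization(text: str, glossary: list[str] = GLOSSARY) -> str:
    """Rewrite any casing of a glossary term to its canonical spelling."""
    for term in glossary:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
        # A function replacement avoids treating the term as a regex template.
        text = pattern.sub(lambda _match, canonical=term: canonical, text)
    return text


print(fix_capitalization("I wrote ruby on rails apps on my Iphone before the log4J issue."))
# prints: I wrote Ruby on Rails apps on my iPhone before the Log4j issue.
```

This only normalizes terms that are already in the glossary, so the research itself still comes first.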

Do our best to recognize every word

Some speakers are not native English speakers, and they might have an accent that speech-to-text apps can't pick up. Or when two speakers talk simultaneously, it can be hard for us to hear clearly what each speaker is saying.
But we must always try to make our best guess at their words. If we still can't figure it out, transcribe it as unintelligible or inaudible, depending on the guidelines.

Read the transcription without listening to the audio

When we finish improving the transcription, we read the transcript one more time without listening to the audio. It's important to make sure that the transcription is readable and understandable, because that is what makes a transcription accessible to everyone.

Final words

To this day, it's still a challenge for us to make the web 100% accessible to everyone. But every slight accessibility improvement is a small step towards making the web more accessible for everyone.
And now that you know how to improve audio media transcriptions, let's make the web more accessible together! 😀
Note: Check out the Virtual Coffee podcast and the Virtual Coffee podcast repository to get an insight into everything I'm talking about in this article 😊.


Thank you for reading!
Lastly, you can find me on Twitter and Mastodon. Let's connect! 😊

📸 Cover image by Leo Wieling on Unsplash
