Which Is the Best Automatic Captioning Tool for Video Calls?

One good thing came out of COVID-19.

It forced me to try video calls again.

The reason I don’t love video calls is that I miss too much. While I’m a strong lipreader, video calls have a few challenges.

Why I Shied Away from Video Calls

Often, the video jerks or jumps. It’s like ________ of a thought. Exactly. It’s like missing part of a thought.

Lipreading is hard when the sound doesn’t sync with lip movements. Another problem is that some videos look pixelated or blurry. That affects lipread-ability.

These factors together create a frustrating experience. For me, listening is more important than speaking. Listening to video calls requires I work harder to hear than the average person. My brain is multitasking. It has to convert lip movement and sound into sentences. It has to fill in the blanks. It has to absorb what the person says. It has to make meaning of it. It’s a tall order in a short time.

You’ve probably seen the studies on multitasking. That most of us become less efficient at each task. That’s what happens to me on a video call.

The Turning Point

A client sent an invitation to a friendly lunch video conference. No pressure. Not mandatory. I decided to give it a try. At the very least, I could see their faces.

Just before the call, Ann Marie Beebout tells me she’s going to caption the call.

Wait. What?!

She researched it and found a way to do it with Google Slides.

It worked! It wasn’t perfect. I followed the conversation well enough. And even chimed in a couple of times.

Knowing when to speak up in a group video call is hard. I don’t want to interrupt anyone.

One-on-one video calls are best, of course. The more attendees there are, the harder it is to follow. Even with live subtitles from an automatic captioning tool.

Automatic Captioning Guidelines for Video Calls

None of the free automatic captioning tools is perfect. Still, a couple tend to work better than others.

Live transcription while on a video call is a wholly different experience than simply transcribing a call after the fact. The guidelines for automatically captioning a video call are also different from the captioning guidelines for videos.

Based on my experience as someone who depends on captions to hear, I’ve created these automatic captioning guidelines for video calls. These determine the effectiveness of the tool.

1. Readability

Readability is a big factor. This refers to the ability to read the captions, not the actual content of the captions. Readability has three components: size, format (color), and scrolling. This factor quickly knocked some apps out of contention. Nonetheless, I tried the apps because I wanted to give them a fair chance.

Size

A larger text size works better. Where possible, I placed the captions below the video, which brings them closer to the lips. (Covered in the next item: Caption Placement.) I constantly go back and forth between the two.

Format (Color)

You know how most captions contain a black-ish background with white-ish text? Yup, this combination also works well for live video calls. I’ve done many experiments using feedback from people including those with color blindness, dyslexia, and ADHD. I use the following in my captions.

  • Background: #242424 (slightly off-black)
  • Text: #FFFFFD (slightly off-white)
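
One way to sanity-check a color pair like this is its contrast ratio. The following is a minimal Python sketch, assuming the WCAG 2.x relative-luminance formula; the function names are hypothetical and none of the captioning tools in this article are known to expose anything like it.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to its linear-light value (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0 (black on white)."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # The caption colors listed above: off-black background, off-white text.
    print(round(contrast_ratio("#242424", "#FFFFFD"), 2))
```

Run against the two colors above, the ratio should land well beyond the 7:1 that WCAG lists for enhanced contrast, while staying a little softer than pure black on pure white.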

Scrolling

Another factor is the scrolling of the text. Does it do it on its own? Or do I have to play with it to keep it in place? Does it keep disappearing or jumping?

2. Caption Placement

I asked people if they prefer captions on the top or bottom of videos. At 98 percent, almost everyone picked the bottom. One of the biggest reasons is that it puts the captions closest to the lips. This is especially true in video conferencing.

To follow a conversation, I depend on reading lips, my cochlear implant, and the captions. It’s all happening in real-time. Conversations move fast. Remember that part at the beginning about my brain multitasking to listen (in my own way)? That’s why the placement of the captions matters. See the section on eye contact for more information.

Perhaps a compromise is to give users a choice on caption placement. The key is that it needs to be part of the app. In other words, when I select the video app, the captions will be right there. I won’t have to work to bring both the captions and the video app back into view.

Some people suggested using a phone for the captions. This doesn’t work well because of caption placement. It takes my eyes off the video. As such, I miss too much information trying to read the captions on the phone during the call. Remember that automatic caption accuracy is far from perfect.

3. Accuracy

The same goes for accuracy. Every call is different in sound quality, volume, and content. In testing the tools, I couldn’t fairly say whether one is more or less accurate than the others. That’s why I tested all the apps against a single podcast. This provides an apples-to-apples comparison in terms of content.
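
The article does not say how accuracy was scored. One common way to put a number on a comparison like this is word error rate (WER): the word-level edits needed to turn a tool's transcript into a trusted reference transcript, divided by the length of the reference. The sketch below is an assumption about how such a comparison could be run, not the method used here; the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # a reference word the tool dropped
                curr[j - 1] + 1,     # an extra word the tool inserted
                prev[j - 1] + cost,  # a match or a substituted word
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences, not output from any tool reviewed here.
    reference = "the captions keep up with the audio"
    tool_output = "the caption keep up with audio"
    print(f"WER: {word_error_rate(reference, tool_output):.2f}")
```

Lower is better, and a score above 1.0 is possible when a tool inserts many extra words. Scoring every tool against the same reference transcript of the same podcast is what keeps the comparison apples-to-apples.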

4. Synchronized

Timing, of course, matters. You want the text to keep up with the audio. With live captions, you can count on some lag. Some apps have less lag than others. And some have more.

5. Caption Flow and Movement

Live captions tend to scroll along where you read them from left to right and top to bottom. Typically, like any other captions, you see about one or two lines at a time. This helps with tracking. Anything longer makes it easy to lose your place as your eyes dart between the video and the captions.

In some live captions, the captions jerk. They don’t flow smoothly and move unpredictably. The captions don’t keep up with the audio. There should be no more than a few milliseconds of delay between the audio and its associated captions. For more on this and examples, see Flow in the Caption Guide.
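
To make the "one or two lines at a time" idea concrete, here is a minimal Python sketch of a rolling caption window. It assumes the captioning engine hands over the running transcript as plain text; the function name, line width, and sample sentence are hypothetical and not taken from any of the tools reviewed.

```python
import textwrap


def visible_caption(transcript: str, width: int = 40, max_lines: int = 2) -> list[str]:
    """Wrap the running transcript and keep only the most recent lines."""
    lines = textwrap.wrap(transcript, width=width)
    return lines[-max_lines:]


if __name__ == "__main__":
    running = "Live captions tend to scroll along so you only see the latest words"
    for line in visible_caption(running, width=30):
        print(line)
```

Older lines drop off the top as new words arrive, so the reader only ever tracks a short, predictable block of text.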

6. Transcript

Can you save the transcript after the call? For important meetings, I’ll want to review the transcript. Remember, my brain is multitasking during these calls to follow the conversation. It doesn’t remember everything.

Important Note About Microphones

Listening refers to the app’s ability to listen. It can’t caption without the ability to hear the conversation. I want to use my desktop for video calls. But it doesn’t have a built-in microphone like my laptop does. Every app except Google Meet needs this microphone to work. The apps cannot “hear” through the desktop’s internal speaker. They need a microphone.

If I had a webcam on my desktop, it would probably work just like the laptop. What I do is turn my phone into a webcam with an app. That app doesn’t use the phone’s mic. There are apps that do, but they didn’t work for me, which is why I couldn’t use them.

So, these are the guidelines I follow in analyzing the quality of each tool.

Eye Contact on Video Calls

Another person who is deaf like me expressed concern about caption placement on video calls. He believes it’s harmful because it takes away from eye contact. This is not an issue limited to people who read captions. I’ve been on many calls since writing this and eye contact is rarely direct. People cannot always set up their webcams to get the kind of eye contact we get in person.

An informal poll reveals that many people agree eye contact isn’t important on a video call. It’s all about the content.

In fact, I was in a meeting with captions. Even if I had turned off the captions, people would still have seen me looking down instead of at the camera. The video conference window often requires looking at the bottom. There isn’t a way to change this.

In person, I struggle to talk to people who don’t make eye contact. I cannot explain why this happens when I’m doing the talking. A deaf friend expressed the same thing. Obviously, when I’m listening, eye contact improves lip-readability.

The challenge with some of these tools is that you’re managing two apps: the video call app (e.g., Zoom) and the captioning tool. If you switch to another app or share your screen, it can mess up everything. And you’ll have to fiddle with the apps to get them all back.

That’s why captions built into the app are the ideal solution. This way, you’re only dealing with one app instead of two. Only two apps have this capability: Skype and Google Meet. Microsoft Teams will have this feature, but it’s not widely available yet. Users appreciate it when they can resize the captions and select where they want to position the captions.

Portable Captions Hack

A couple of automatic speech recognition (ASR) apps would make good portable captions. You may find yourself in a situation where you need to use the ASR app on your phone for captions. Or you want to improve eye contact on a video call. Here’s a hack from Simon Lau.

Take a large binder clip as shown in the following image. Use the clip to place the phone on the top or bottom of the monitor where you’d want the captions.

Binder clip holds phone on top of monitor.
Use a binder clip to put a phone providing the captions on the monitor.

Important Note About Microphones and Headsets

Microphones and headsets complicate things with automatically captioning video calls. For most automatic captioning tools, I use my laptop with its built-in camera and microphone. It’s the only way the apps could “hear” voices in videoconferencing.

My desktop computer has a speaker, but no mic. When I plug in a mic, the sound disappears. And when there’s no sound coming through the speakers, there’s nothing to caption. Catch-22. After fiddling with the sound settings, I finally got the computer to allow the mic to work and play sound over the speakers. It worked for most of the tools and apps except Skype. It just stood there silent.

Headsets with a mic won’t work with tools that need to hear the audio. The apps can’t hear anything. Thus, they can’t provide captions.

This is where video calls with built-in automatic captioning have an advantage. These include Google Meet, Skype, and Microsoft Teams.

The Best Captioning Tools for Video Calls

Of course, the best way to caption video calls is with a human typing the captions. The reality is that it’s not feasible for many calls. But if accuracy is important, this is your best bet. It’s ideal for large companies, conferences, seminars, and classes.

How to caption video calls with a human depends on the software you’re using for the call. For example, Zoom explains how to add captions to its calls.

For the majority of calls, automatic captions are the next best thing. If I’m in a small meeting or a one-on-one call, it’s not feasible to bring in a human to caption it. But humans provide the highest accuracy rate. Yay, humans!

One thing to note. Most of these apps are not designed for captioning video calls. A negative review doesn’t mean the tool isn’t good. It means the tool’s focus is elsewhere. And I focus on free options, not paid or free trial options.

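Here are the tools covered in this article:

  • Skype
  • Google Meet
  • Google Slides
  • Microsoft PowerPoint Presentation Translator
  • Microsoft Teams for iOS and Android
  • Otter.ai
  • Descript
  • Web Captioner
  • InnoCaption
  • Ava
  • Microsoft Group Transcribe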

All righty. Let’s dig in!

Skype with Automatic Captioning

Thank goodness I didn’t stop video conferencing after my second experience with live automatic captions.

It was bad. Very bad.

I couldn’t believe it. Microsoft’s Skype. Really?

It surprised me because Microsoft is an ardent supporter of accessibility. They have a Chief Accessibility Officer named Jenny Lay-Flurrie. And get this. She happens to be deaf like me.

The Skype caption formatting is good except the text size is too small.

But the real problem is the live captioning. If one person says a few sentences, Skype displays one or two words at a time quickly. No one reads that fast.

A moment or two — much too long compared to other apps — after the person finishes speaking, the whole paragraph shows up as one big block of text. A Microsoft employee reports this is an issue that recently regressed.

Skype interface with automatic captions
Automatically captioning a podcast in Skype.

Supposedly, you can work around it by switching to the transcript view. I tried that, and the flow of the text was the same as in the standard view.

The employee also shares a workaround for the text size. Open Skype in a web browser and use the browser’s zoom tools to enlarge the captions. And this is what happens.

Skype interface with zoomed in transcript that caused the transcript to overlap the buttons.
Enlarging the captions in Skype’s transcript view.

You can’t control the box placement and it covers up other tools.

And finally, Skype’s automatic captions live up to their nickname of autocraptions. While on a call with my family, my husband and I were laughing hard and taking screenshots of the captions. I sent them to my family during the call. We saw them crack up at the humorous autocraptions.

Google Meet with Automatic Captioning

Google Meet with captions for video calls is now available free!

I discovered it by accident while adding a meeting in Google Calendar.

To create a Google Meet videoconferencing meeting, open Google Calendar. Select “Add Google Meet video conferencing” and set up your meeting.

Adding Google Calendar entry to Schedule Google Meet.
Schedule Google Meet meeting in Google Calendar.

As soon as you enter Google Meet, you’ll see a CC button to add captions.

And the GAME-CHANGER is that it tells you who is talking. It’ll be interesting to see how well (or not) this works for a larger group.

Another thing I like is that the captions appear on the bottom and they’re readable. I don’t have to open a second app for the captions and then rearrange the video and captioning apps to line them up. When I do something on the screen that takes the captions and video out of view, I have to fix both to get back to the meeting with captions. That’s not an issue with Google Meet.

However, Google Meet does not save the conversation. This is its biggest weakness. Updated on August 26, 2020: Chrome extension Tactiq will save Google Meet transcripts! Thanks to Susan Edens for the heads up. Check out the video to see how it works.

How to use Tactiq to save Google Meet transcripts

Updated on May 13, 2020: I received the following email from Google saying that my feature request for transcripts is in the correct hands to make it potentially happen in the future!

Email from Google confirming feature request to include transcript is submitted and in the correct hands to make it potentially happen.
Email confirming transcript feature request in the right hands to potentially make it happen.

I tested Google Meet captions on the same podcast that I used to test the different automatic captioning tools for video calls. This way you have an apples-to-apples comparison. The accuracy was about the same as the rest of them — far from perfect.

However, I used the free Google Meet in a personal call. It did a fantastic job captioning both of us. Yes, my deaf accent!

Google Meet works on my desktop when I use a headset with a microphone. The captions always work whether you use a mic, headset, or neither. All the other apps — except Skype and Microsoft Teams — require using a device with its own mic. This could be a laptop with a built-in webcam or any device with a webcam.

Artificial intelligence is improving. I hope it’ll eventually understand my accent that hails from nowhere. I look forward to the day it gets my name right!

Several studies say eye contact affects trust. I watched the captions while creating this video (at the end of this article). You can see I’m looking down a little and reading.

Google Live Transcribe’s team has posted a detailed guide for Live Transcribe Best Practices to make the most out of it. Some of their tips will work with other apps.

Google Slides: Automatic Captioning in Presentation Mode

Next victim … err … tool is Google Slides automatic captioning.

Google Slides allows presenters to add subtitles to their presentation. Using a black background with white text, it’s one of the most readable captions. And the text size is Goldilocks approved: just right.

Scrolling causes no problems. Google Slides’ captions are about as synchronized as they can get for live captioning. As for accuracy, it has one of the better rates out of all the apps reviewed.

The downside is that you can’t download a transcript. And that it requires managing two apps during the call. If you switch apps or share your screen, you have to fiddle with getting them all back together again.

Here are the steps to turn on Google Slides captions:

  1. Make sure your microphone is on. (It won’t give you the CC option without it.)
  2. Open a blank Google Slides presentation.
  3. Select “Present” to go into presentation mode.
  4. Select “CC.” (You can format it here.)

I put the Zoom screen above the captions. When I used this for a group meeting, I followed the conversation pretty well.

The following image is a snapshot of the podcast. During a call, the video shows up instead of the white box.

Blank screen with two lines of captions
Google Slides automatically captioning a podcast.

Microsoft PowerPoint Presentation Translator Add-In

Update August 26, 2020: Microsoft retired this add-in. Now you can only get it using Office 365. Disappointing. Tweet @PowerPoint to ask them to bring it back and explain why you can’t use Office 365.

PowerPoint for the web and Office 365 subscriptions come with subtitles. All other editions will need to download Presentation Translator, a PowerPoint add-in.

It’s like Google Slides. Both automatically caption a live presentation. Both let you post the captions at the bottom or the top. And both have captions with a black background and white text.

Both involve managing two apps during the call and fiddling with them if you switch apps or share your screen.

And that’s where the similarities end.

PowerPoint has more features than Google Slides. You can add a slide with a QR code for others to scan to view the live captions on their devices! So. Cool.

The add-in can save the full transcript. You can choose the language spoken and the language for the subtitles.

Here are the steps to use Presentation Translator:

  1. Open a blank PowerPoint presentation. (I created one called Captions.pptx to use every time.)
  2. Select the “Slide Show” tab.
  3. Select “Start Subtitles.”

The following pop-up box appears:

PowerPoint add-in pop up box with language options
PowerPoint Presentation Translator add-in options

Select “Additional Settings” to see the following options:

PowerPoint Presentation Translator add-in settings box

The “Add instructional slide” option adds a slide with the QR code that lets attendees see the captions on their own devices.

The only negative is the URL that shows up in the captions. At times, it’d cover up some of the captions. I’ve found a solution.

After starting the subtitles in PowerPoint, press “Esc” to give the captions their own box. You can adjust the box’s width, length, and placement. I put it right below the video app.

You can view more than two lines of captions at a time. Adjust the width to your liking. Mine is somewhere in the middle, neither too long nor too short. Longer makes it harder to track while reading the captions.

Beware that it automatically mutes the captions after you press Esc. Just un-mute it and ta-da! The URL stops blocking the captions. Here’s what the transcript looks like.

PowerPoint Presentation Translator add-in caption transcript with timings
PowerPoint Presentation Translator add-in caption transcript.

Microsoft Teams for iOS and Android

I could only test on the Microsoft Teams iOS app because it’s the only free version that comes with captions as explained in The Verge. I don’t have an Android device, but I suspect it’s similar to the iOS version. The captions are built-in like Google Meet. You won’t have to fiddle with two separate applications to caption a video call.

Other setups require opening a second app for the captions and rearranging the video call and captioning apps to line them up. As soon as I do something on the screen that sends the captions and video behind other windows, I have to fix both to get back to the meeting with captions. That’s not an issue with Microsoft Teams.

To turn on the captions, join a meeting in the app. Select the ellipsis … (three dots) and “Turn on live captions.”

The captions appear on the bottom with a mild transparent black background with white text as shown in the next image. It’d be great if Microsoft would allow users to adjust the captions and transparency. Sometimes the movement behind the transparent captions can be distracting.

Caption style on Microsoft Teams

The captions on the iPhone are small. If you have an iPad or Android tablet, try those. They will appear larger. I hope that Microsoft will make a web-based or desktop version available free like with the iOS apps.

Microsoft Teams’ caption accuracy is decent. The captions don’t contain much punctuation and they incorrectly captured a few words. Unlike Google Meet, it does not tell you who is talking. Moreover, Microsoft Teams does not save the conversation.

Otter.ai Voice Meeting Notes

Otter.ai has a few options for using its transcription tool as a captioning tool. The end of the article shows videos of all three Otter.ai methods.

Otter.ai for iOS (Free)

The best free option is with the Otter.ai app for iOS. Sometimes we need a portable captioning option. This is one of the better options for portability.

  1. Select the Record button.
  2. Tap Maximize in the upper-right corner. This turns the screen black with the text in white.
  3. Change the size of the text by tapping the Size icon next to the Minimize icon.

It did a good job of captioning the podcast. The scrolling is decent. It could use some paragraph spacing to break up large blocks of text for improved readability.

Otter.ai Desktop Premium

A similar option is available on the desktop for premium versions of Otter.ai. Here are the steps to use this option:

  1. Open Otter.ai in a web browser.
  2. Select Record.
  3. Select the Present button at the upper right corner.
  4. Resize the window.
  5. Select one of the icons at the upper right to adjust the font size.

And this method works exactly like the iOS version. I could move the Otter transcription below the podcast screen (which is where a video call would be). It worked great. So much better than the following approach. However, this option is only available on the premium version.

Apple podcast episode appears on top of captions
Otter.ai Desktop Premium works as captions below the “video call” window.

Otter.ai Desktop Free

If you don’t have a paid Otter account, Otter requires a different set-up than the others. That’s because the app is inside the browser.

Yes, Google Slides is also in a web browser. But it puts the captions at the bottom, so I just moved the Zoom app in front of it. Then, I resized the Zoom screen to fit right on top.

The free desktop version of Otter requires a different approach. To set it up, I opened Zoom in its own window. Then, I loaded Otter in a web browser. I put the browser side-by-side with Zoom like in the following image.

Video on the left and Otter.ai transcription on the right.
Lining up the video and Otter.ai transcript side-by-side.

Yes, I tried resizing the web browser to put Otter below Zoom. But it didn’t work well. Parts of the text would appear in light grey, which is hard to read. Otter is designed to act like a notetaker rather than a captioner. Like PowerPoint and Google Slides, it involves managing multiple apps. And it gets crazy when you switch apps or share your screen.

Here are the challenges with this method:

  • Words didn’t flow smoothly.
  • Unpredictable content movement.
  • Text is small for captioning and hard to track.
  • Couldn’t put it below the video, which made reading difficult.

Remember, I’m looking at the person for lipreading. So, my eyes dart back and forth. Reading Otter’s captions is like trying to find something in an email with no paragraph breaks.

During the call, I barely looked at Otter’s transcription because it required a lot of effort. With smaller text in blocks of paragraphs, I couldn’t follow it.

How is the accuracy? I don’t know because I couldn’t follow the transcription. However, I had Otter transcribe the podcast. The accuracy is about the same as Google and PowerPoint as the next image shows.

Updated on September 2, 2020: Mark Rejhon has created a Chrome extension called Closed Caption Mode for Otter. What it does is add a “Large Text Mode” + “Dark Theme” to make the Otter text larger. It works with free and personal Otter accounts.

Otter.ai transcript of podcast transcript.

Descript for Creating Transcripts

To use Descript, first download the app. Run the app and hit record whenever you’re ready to transcribe a conversation.

Descript requires a different set-up than the others. That’s because it’s a desktop app. The text is more suited for notetaking and transcriptions than for captioning. The text is smaller and uses a mix of transparent grey and violet for the font color.

Putting the app’s window below the video doesn’t work well. A better option is to put the video and Descript window side-by-side. However, reading lips while glancing at the Descript text proves tiring.

When the text reaches the bottom of the screen, it does not automatically scroll down. I had to continuously use the mouse to scroll down to the next lines. This takes my focus away from the conversation. Remember, I’m relying on lipreading and reading the text all while comprehending the content. It’s quadruple-tasking to add scrolling.

Descript has one advantage over some transcription services: it breaks up the text into multiple paragraphs. Transcription apps tend to create one long, unreadable paragraph. It’s too easy to lose your place. The accuracy is about the same as the other apps.

Descript screen shot of podcast transcript.

Web Captioner: Captioning in the Browser

Update: This app is no longer available.

Web Captioner is a web-based captioning software.

Out of all the tools, it has one major drawback. It only works with Chrome.

Web Captioner’s formatting is great. You can change the font, color, text size, and other traits.

The speed in keeping up with the live conversation is good.

Accuracy-wise. Ehhhh … I think it may be less accurate than Google Slides and the PowerPoint add-in.

One of its automatic caption errors had me LOL. I can’t share it with you as it’s not G-rated. Let’s just say the autocraptions easily beat another app’s “jock itch.”

While I was laughing, my friend on the video call looked puzzled. I could barely get a word in and said, “Captions.”

Web Captioner wrote, “Cat urine.” LOL again.

Like Otter, using Web Captioner means contending with two apps or screens. That can be tricky to manage, especially if you share your screen or switch apps.

Web Captioner has the following unique features:

  • Saves transcripts as a text file.
  • Saves transcripts to Dropbox automatically.
  • Accepts word replacements. For example: Replace “Merrill” and “Marilyn” with “Meryl”.
  • Supports multiple languages.
  • Offers the option to censor profane language.
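
The word-replacement feature in the list above can be pictured as a whole-word find-and-replace applied to each caption line. The Python sketch below is only an illustration of that idea; it is not Web Captioner's code, and the function and variable names are hypothetical.

```python
import re

# Misheard spellings mapped to the correct one, as in the example above.
REPLACEMENTS = {"Merrill": "Meryl", "Marilyn": "Meryl"}


def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Swap whole words found in `replacements`; other text is left untouched."""
    if not replacements:
        return text
    # \b word boundaries keep a longer word such as "Merrills" from being altered.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in replacements) + r")\b"
    )
    return pattern.sub(lambda match: replacements[match.group(1)], text)


if __name__ == "__main__":
    print(apply_replacements("Thanks, Merrill. Hi Marilyn!", REPLACEMENTS))
```

This version is case-sensitive, which suits names. A real tool may well behave differently, for example by ignoring case.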

The transcript appears in the browser tab. To put it below the Zoom video, I had to resize the entire Chrome browser. Even after resizing it, the text would fly off the screen. I had to keep an eye on the scrolling to ensure the captions stayed visible.

If you create an account, Web Captioner will save your settings.

And here’s the transcript of the podcast à la Web Captioner.

Web Captioner transcript of podcast

InnoCaption

Another app / website to try is InnoCaption. The catch is that it requires a call-in number. You can view the captions on a laptop/desktop with DeskView or through the app on your phone. I tested this and it caused a lot of problems. First, there was a beeping noise. Not sure where it was coming from, but the source was definitely InnoCaption because it stopped when I muted my phone.

Second, InnoCaption added an unrecognized phone number to the video call. I didn’t realize it was the service until after we started the meeting.

Third, the captions stopped and I kept getting the message that “You’ve been put on hold by the host.” We could all hear each other, but InnoCaption could not. I’m guessing the host muted the unrecognized phone number. If I try this again, it’ll be with a friend because these issues were unacceptable during a meeting.

InnoCaption uses humans and has great captioning when I make phone calls. But the dial-in number requirement is a big issue. It does save the transcript from the call and you can email the transcript. However, it’s one big blob of text as there weren’t paragraph breaks to improve readability.

Ava Closed Captioning

Ava did not make the cut when I wrote this post. Now, she’s been revamped and is worth a second look. The app is now available for web browsers, macOS, Microsoft Windows, iOS, and Android. The company wants deaf and hard of hearing people to have equal access. So, it’s free for up to 40 minutes. That’s a fair deal for the most part. Unfortunately, many meetings including nonprofit ones can run 45 to 60 minutes.

Anyway, the video shows the web-based version, which is easy to customize and figure out. You can effortlessly switch between light and dark mode. However, the captions on the far right hid behind the right-side icons. I couldn’t find a way to fix this.

I also took a look at the Ava app for Windows. And it’s fantastic because it turns into a caption box that you can easily place over a video call screen. She also identifies who is speaking. And best of all, the box always stays in front as shown in the next image. Yes, even if you select the Zoom or other video app window.

Video call screen with a headshot photo of Meryl. Ava captions appear at the bottom of the video call.
Ava’s Windows app keeps the captions in the forefront.

Ava is available in multiple languages and can save transcripts. Ava’s accuracy is about medium. Not the worst. Not the best. It had misspelled words that other apps didn’t misspell. While it has some punctuation, it’s still missing some and it didn’t capitalize the first word after the end of sentences.

Microsoft Group Transcribe

Available for iOS devices, Microsoft Group Transcribe gives viewers a way to take notes or join a meeting. Each speaker has to have the app to be identified as a speaker. Any tool that requires people to have the app is awkward. But it’s useful for mobile captioning or when you’re on a laptop and can’t overlay captions on a single screen.

It’s very simple to use. Open the app and select “Start.” You can’t resize the font within the app. It uses whatever system setting you have for font size. The app has a black background and white text, so it’s readable. The accuracy was one of the better ones out of the apps. And it saves all transcripts.

The scrolling pace is good for the most part. However, whatever current paragraph it’s on looks greyed out. It lacks contrast. Once it has finished the paragraph, it shows up clearly. But it needs to be clear while live, even if it has to make corrections. Otherwise, it’s hard to see.

Other Apps for Automatically Captioning Video Calls

You may be wondering why I didn’t include [fill in the blank] tool.

First, several people sent me the link to a knowledge base that lists live captioning and automated captioning tools. What do you know? I’ve tried all of the possible ones on the list and then some.

Live Transcribe for Android

Now as to why some aren’t included. Live Transcribe is for Android only. I don’t have an Android device. However, a dear friend of mine who is deaf speaks highly of the app. Here’s a comprehensive guide for using Live Transcribe.

Microsoft Translator

Microsoft Translator is a non-starter. It can’t handle constant talking. It’s made for conversations. I tried using it to transcribe a podcast and it’d stop after about a paragraph. It’s tedious to keep pressing the microphone button to keep it running. And it still doesn’t always start.

Thanks to Tilak for this Microsoft Translator tip. He advised going into conversation mode. Select the icon with two people chatting and tap “Start.” Type your name, select your language, and tap “Enter.” Tap the mic icon to talk. This works better than the previous method. But it ran into a lot of bumps. Sometimes it worked and sometimes it didn’t.

Microsoft Teams

Microsoft Teams automatic captioning works. Microsoft has added noise suppression in Teams, which may help captions. But I find it difficult to use because captions require the app. The browser version does not have them. When you’re not on a Teams network, it’s hard to schedule a meeting and have captions.

Here’s how complicated it can get. I attended a meeting with a company as a guest. Before that meeting, one of the attendees and I did a test to see if captions would work. They did not work from the Teams app on my desktop.

However, it worked with the Teams app on my iPhone. But that wasn’t feasible. I contacted Microsoft Disability Answer Desk online chat and they provided an answer:

The Teams Administrator or the one organizing the meeting has the option to disable and enable it. The one setting the policies for your meetings, and that includes the captioning, is the Teams Administrator. Here’s the Microsoft Reference on managing meeting policies.

Under Enable live captions, there are two values available. One is “Disabled but the user can override,” and the other is “Disabled.” Other links that explain the policies: Office365 IT Pros and Microsoft Tech Community.

Zoom, GoToMeeting, Webex, and Blue Jeans

Zoom free accounts allow you to assign a participant to type the captions or use a third party service (not free). Here’s how to start closed captioning in Zoom. It still does not have free built-in automatic captions like Google Meet.

To have built-in automatic captions requires using a paid service. Otherwise, you can use one of the free tools with Zoom, just not inside the Zoom browser.

Thanks to the efforts of Shari Eberts, Zoom is making ASR (automated) captions available free. However, if the caption viewer with the free basic account isn’t the host, the viewer can’t get captions unless the host offers them. Again, we’re dependent on others for captions inside Zoom or resorting to one of the tools in this article.

As for GoToMeeting and Webex, the only way to caption is with a third-party live captioner. Blue Jeans will have automatic captioning when the host turns it on. Alas, it’s not a free service.

A Comparison of Automatic Caption Tools

As a comparison, I tested all the apps on a podcast with my daughter. This way the content is the same across the board.

The following table has all the scores. The rating is on a scale of 1 to 5, with 1 being poor and 5 being excellent.

Product                             Readability  Size  Format  Scrolling  Sync  Accuracy  Transcript
Skype                               2            2     3       1          1     3         Yes
Google Meet                         4            5     5       4          4     4         No
Google Slides                       4            5     5       4          4     3.5       No
PowerPoint Presentation Translator  5            5     5       4          4     4         Yes
Microsoft Teams                     4            3     5       3          4     4         No
Otter.ai Desktop Free               2            2     1       1          3     3.5       Yes
Otter.ai iOS                        4            5     4       3          4     4         Yes
Descript                            2            2     2       1          3     4         Yes
Web Captioner                       5            5     3.5     2          4     3.5       Yes
Microsoft Group Transcribe          3            3     3.5     3.5        4     4         Yes
Ava                                 5            5     5       4          4     3         Yes
Comparison of tools used to automatically caption video calls

Here is a description of each factor.

  • Readability: Ability to read the captions. This relies on a mix of size, font choice, and colors.
  • Size: The size of the captions.
  • Format: How the captions look including font type, font color, and background color.
  • Scrolling: How well the captions scroll through to the next lines.
  • Synchronized: How well the captions keep up with the audio.
  • Accuracy: How accurate the captions are and punctuation, if any.
  • Transcript: If you can go back and review the text of the conversation.
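
If you want a single number per tool, one option is to average the six numeric factors. The sketch below does that with the scores from the comparison table. The equal weighting is an assumption made for illustration and is not how the ratings above were produced; someone who cares mostly about accuracy or sync would want to weight those factors more heavily. The names `SCORES` and `rank_by_mean` are made up for this example.

```python
from statistics import mean

# Factor order for every row: Readability, Size, Format, Scrolling, Sync, Accuracy.
# Values are copied from the comparison table; the Transcript column (Yes/No)
# is left out because it is not a numeric rating.
SCORES = {
    "Skype": [2, 2, 3, 1, 1, 3],
    "Google Meet": [4, 5, 5, 4, 4, 4],
    "Google Slides": [4, 5, 5, 4, 4, 3.5],
    "PowerPoint Presentation Translator": [5, 5, 5, 4, 4, 4],
    "Microsoft Teams": [4, 3, 5, 3, 4, 4],
    "Otter.ai Desktop Free": [2, 2, 1, 1, 3, 3.5],
    "Otter.ai iOS": [4, 5, 4, 3, 4, 4],
    "Descript": [2, 2, 2, 1, 3, 4],
    "Web Captioner": [5, 5, 3.5, 2, 4, 3.5],
    "Microsoft Group Transcribe": [3, 3, 3.5, 3.5, 4, 4],
    "Ava": [5, 5, 5, 4, 4, 3],
}


def rank_by_mean(scores):
    """Return (tool, mean score) pairs, highest mean first.

    Assumption: every factor counts equally (unweighted mean).
    """
    averaged = [(tool, mean(values)) for tool, values in scores.items()]
    return sorted(averaged, key=lambda pair: pair[1], reverse=True)


ranked = rank_by_mean(SCORES)
for tool, avg in ranked:
    print(f"{tool:<36}{avg:.2f}")
```

An average like this hides trade-offs. A tool with large, readable captions and middling accuracy can outrank one with small captions and better accuracy, so treat the ranking as a starting point only.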

Skype Automatic Captioning Video

Video of Skype automatically captioning a podcast

Google Meet Automatic Captioning Video

Video of Google Meet automatically captioning a podcast

Google Slides Automatic Captioning Video

Video of Google Slides automatically captioning a podcast

PowerPoint Presentation Translator Add-in Automatic Captioning Video

Video of PowerPoint Add-in automatically captioning a podcast

Microsoft Teams iOS App

Video of Microsoft Teams iOS automatically captioning a podcast

Otter.ai iOS Free

Video of Otter iOS automatically captioning a podcast

Otter.ai Desktop Premium

Video of Otter Desktop Premium automatically captioning a podcast

Otter.ai Desktop Free

Video of Otter Desktop Free automatically captioning a podcast

Descript Automatic Captioning Video

Video of Descript automatically captioning a podcast

Web Captioner Automatic Captioning Video

Video of Web Captioner automatically captioning a podcast

Ava Closed Captioning

Video of Ava web browser automatically captioning a podcast

Microsoft Group Transcribe

Video of Microsoft Group Transcribe automatically captioning a podcast

When it’s up to me, I use Google Meet because the captions are built-in. They don’t require a second tool. And now I can save the transcripts with the Tactiq Chrome extension.

Facebook added automatic captions to live video and audio. So far, it’s a disaster. The pacing is off. The font is small. The captions often end up in one corner. It’s wildly inaccurate.

Thanks to Ann Marie and these tools, I’m enjoying video calls instead of trying to duck ‘n’ dodge ’em.

What’s your tip for video calls?

Resources

Accessible Virtual Meetings presented at the U.S. Access Board Meeting covers different platforms. It delves into the pros and cons of each. The presentation looks at screen reader accessibility and other options besides captions.

Captions in video calls: better accessibility, but harmful side effects: Quinn Keast is concerned about the effects of captions on video calls. He references a study that shows when you don’t make eye contact, it affects trust. I share my experience in this post. I’ve had video calls with people who aren’t using captions and they’re not making eye contact. It’s not their fault. I’m interested in the content of the call and their lack of eye contact doesn’t affect trust.

Online Meetings and Google Speech to Text Technology: Hamish Drewry shares his experience with video calls. I love that he points out that what works for him doesn’t necessarily work for another person who is deaf. Absolutely. People who are deaf and hard of hearing are just as diverse as the world.

Originally posted April 22, 2020

Updated April 26, 2021: Added Microsoft Group Transcribe.

Updated February 25, 2021: Added news about Zoom offering ASR free.

Updated January 22, 2021: Added Accessible Virtual Meetings presentation.

Updated December 16, 2020: Added more notes about Zoom and Microsoft Teams. Added Ava as a new entry.

30 thoughts on “Which Is the Best Automatic Captioning Tool for Video Calls?”

  1. Thanks for creating this. It is so important at this time in the world’s huge reliance upon such tech in response to COVID-19. What I played around with using Google Slides was instead of that start presentation and CC button, I opened a blank/new presentation and then hit Tools>>Voice Type Speaker Notes. Then turned on the mic that appears on the screen and it starts capturing everything. All that text can then be copied/pasted at the end. One problem I encountered while testing this while a Zoom session was running was that the focus needed to be upon the google slide tab in my browser. As soon as I clicked mouse focus on the Zoom window, the recording of text stopped.

  2. Thanks so much for writing a detailed Cons & Pros for popular video-conference tools.
    I will go and try each of these and do a self-usability study myself, and run them by with someone.
    Only thing is CC would work if everyone speaks in English. I am from Pakistan and our native language is Urdu. Being deaf by birth and hard of hearing technically, I have to rely on a combination of lip-reading and body gestures to orient towards spoken conversations. Lately I have started using Live Transcribe app after getting an Android phone and finally felt like I was being included among the billions of podcasts and live talks on fb. I will try and think of creative ways to use video-calls and stop being shy. Thank you again!

    • Thanks for sharing your experience, Khaula. I’m experiencing the same things you do — minus the language option. I just checked the PowerPoint add-in and it has Urdu! But the question is … can you get the add-in? Or do you use a premium version of Office 365?

  3. Ms Translator as a separate app – works better than within Powerpoint and gives you control in meetings online. I set up my phone (join as presenter – so free flowing captioning) and join meetings online on laptop. Have small stand that can sit in laptop without covering screen…

  4. I can’t thank you enough for documenting all of your experiences with such detail, and with video examples, too! This will help me so much to use live captioning to teach ESL listening comprehension.

    • Thank you, Skip! Wow — this shows how technology can be used in many ways. Love that you shared another benefit — those learning another language.

  5. I have been using the AVA inside the zoom window…AVA “joins” the conversation as a participant. Caption size is adjustable to your liking. There is a setting to hide or show the captions (subtitles). You can save the transcript within zoom…but I prefer to copy/paste it from the AVA website into a word doc, after the meeting is over, as it looks much better. Using the zoom “speaker” view is better so captions are below the speaker. Just another consideration for electronic transcription.(Of course if you pay more, you get better accuracy with a person involved.)
    There is a free demo if you go here to sign up on this calendar:
    https://calendly.com/pieter-1/30min?month=2020-05

    • Thanks for sharing your experience, Carolyn. I’ve had AVA for awhile and it’s not very useful as it freaks out and has more mistakes than the other apps.

  6. Have you looked at WebEx or GoToMeeting? Do you know if they are similar to Zoom in that they need a third-party integration to provide live captioning?

    • Yes. It’s in the “Other Automatic Captioning Apps” section: “As for GoToMeeting and Webex, the only way to caption is with a third-party live captioner.”

  7. Hi Meryl,

    I came across this and it is the most helpful blog I’ve seen on this topic! Thank you!!! I’m trying to support small businesses who don’t often consider the needs of their audience on video calls, and this is great insight for me to share.

  8. Hi Meryl,
    Thanks for this review! I’ve sent it to our team. I particularly agree with your comments about lip-reading. From our experience with people who have a severe or profound hearing impairment, the ability to read lips and facial expressions adds considerably to a conversation: You can see those little signs of acknowledgement and non-verbal cues (nods, eyebrow raises, head tilts, smiles and frowns) as they occur.

    Full disclosure: Our company Konnekt develops and sells a captioning videophone appliance for those who need a MUCH easier user interface. Although we have users in their 20s to 50s, most of our users are 70 to 95, with the oldest being 103 years old.

    Making eye contact (“camera contact”) during video calls is a learned behaviour. I’ve heard of software that corrects your eye gaze to make it look as if you’re watching the other person, but I haven’t yet seen it available as a “video filter” for PCs or tablets/phones. Have you?

    • John, thank you for the comment and disclosure. I have not seen a filter like that. Camera placement cannot always be in the ideal location. Some use laptops and I learned that some laptops put it on the keyboard instead of up top. Ridiculous. I have a webcam in the middle between my two monitors up top. But I cannot look at it because I need to look at the screen to see people’s faces, my notes, etc. People are forgiving when it comes to camera contact.

  9. Unfortunately, i cant’ use webcaptionier because new version of google chrome is not supported in the moment. So it will be fixed later. But I can not wait until it is fixed

    Now, i finally switch to https://translator.microsoft.com
    It’s not the fastest, but very easy to read and in my opinion has the best result for german language.

    Thx for the article

  10. What captioner is best if i just want to video-chat with my gf on Facebook Messenger? GF has MS which makes lots of work for her mom, so sometimes her phone dies and stays dead; otherwise audio calls woud be best for me. She’s mostly lying down, and my 72yo fingers have turned on me lately, so typing doesnt last long for either of us.
