Video call with automatic captioning by Google Slides

Which Is the Best Automatic Captioning Tool for Video Calls?

One good thing came out of COVID-19.

It forced me to try video calls again.

The reason I don’t love video calls is that I miss too much. While I’m a strong lipreader, video calls have a few challenges.

Why I Shied Away from Video Calls

Often, the video jerks or jumps. It’s like ________ of a thought. Exactly. It’s like missing part of a thought.

Lipreading is hard when the sound doesn’t sync with lip movements. Another problem is that some videos look pixelated or blurry. That affects lipread-ability.

These factors together create a frustrating experience. For me, listening is more important than speaking. Listening to video calls requires that I work harder to hear than the average person does. My brain is multitasking. It has to convert lip movement and sound into sentences. It has to fill in the blanks. It has to absorb what the person says. It has to make meaning of it. It’s a tall order in a short time.

You’ve probably seen the studies on multitasking. That most of us become less efficient at each task. That’s what happens to me on a video call.

The Turning Point

A client sent an invitation to a friendly lunch video conference. No pressure. Not mandatory. I decided to give it a try. At the very least, I could see their faces.

Just before the call, Ann Marie Beebout tells me she’s going to caption the call.

Wait. What?!

She researched it and found a way to do it with Google Slides.

It worked! It wasn’t perfect. I followed the conversation well enough. And even chimed in a couple of times.

Knowing when to speak up in a group video call is hard. I don’t want to interrupt anyone.

One-on-one video calls are best, of course. The more attendees there are, the harder it is to follow. Even with live subtitles from an automatic captioning tool.

Automatic Captioning Guidelines for Video Calls

None of the automatic captioning tools are perfect. Still, a couple tend to work better than others.

Live transcription while on a video call is a wholly different experience from simply transcribing a call after the fact. The guidelines for automatically captioning a video call are also different from the captioning guidelines for videos.

Based on my experience as someone who depends on captions to hear, I’ve created these automatic captioning guidelines for video calls. These determine the effectiveness of the tool.

1. Readability

Readability is a big factor. This refers to the ability to read the captions, not the actual content of the captions. Readability has three components: size, format (color), and scrolling. This factor quickly knocked some apps out of contention. Nonetheless, I tried the apps because I wanted to give them a fair chance.

Size

A larger text size works better. Where possible, I placed the captions below the video, which brings them closer to the lips. (Covered in the next item: Caption Placement.) I constantly go back and forth between the two.

Format (Color)

You know how most captions contain a black-ish background with white-ish text? Yup, this combination also works well for live video calls. I’ve done many experiments using feedback from people including those with color blindness, dyslexia, and ADHD. I use the following in my captions.

  • Background: #242424 (slightly off-black)
  • Text: #FFFFFD (slightly off-white)
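If you want to check how a color pair like this holds up, here is a minimal Python sketch. It assumes the contrast-ratio formula from the WCAG 2.x guidelines, which is not something the experiments above were based on; the function names are made up for illustration.

```python
def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (R, G, B) tuple of 0-255 integers."""
    value = hex_color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of an sRGB color, per the WCAG 2.x definition."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(ch) for ch in hex_to_rgb(hex_color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a: str, color_b: str) -> float:
    """Contrast ratio between two colors; ranges from 1 (same) to 21."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


BACKGROUND = "#242424"  # slightly off-black, from the list above
TEXT = "#FFFFFD"        # slightly off-white, from the list above

ratio = contrast_ratio(BACKGROUND, TEXT)
print(f"Contrast ratio: {ratio:.1f}:1")
```

The pair comes out well above the 7:1 ratio that WCAG lists for enhanced contrast on normal-size text. A high ratio alone does not guarantee comfort for every reader, so treat it as a sanity check rather than a verdict.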

Scrolling

Another factor is the scrolling of the text. Does it do it on its own? Or do I have to play with it to keep it in place? Does it keep disappearing or jumping?

2. Caption Placement

I asked people if they prefer captions on the top or bottom of videos. At 98 percent, almost everyone picked the bottom. One of the biggest reasons is that it puts the captions closest to the lips. This is especially true in video conferencing.

To follow a conversation, I depend on reading lips, my cochlear implant, and the captions. It’s all happening in real-time. Conversations move fast. Remember that part at the beginning about my brain multitasking to listen (in my own way)? That’s why the placement of the captions matters. See the section on eye contact for more information.

Perhaps a compromise is to give users a choice on caption placement. The key is that it needs to be part of the app. In other words, when I select the video app, the captions will be right there. I won’t have to work to bring both the captions and the video app back into view.

Some people suggested using a phone for the captions. This doesn’t work well because of caption placement. It takes my eyes off the video. As such, I miss too much information trying to read the captions on the phone during the call. Remember that automatic caption accuracy is far from perfect.

3. Accuracy

The same goes for accuracy. Every call is different in sound quality, volume, and content. In testing the tools, I couldn’t fairly say whether one is more or less accurate than the others. That’s why I tested all the apps against a single podcast. This provides an apples-to-apples comparison in terms of content.
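For anyone who wants to put a number on that comparison, one common measure is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the tool's output into a correct transcript, divided by the length of the correct transcript. The scores in this article were not calculated this way. The Python sketch below is only an illustration, and the sample sentences in it are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace and compared
    case-insensitively; punctuation is not stripped.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from output
                curr[j - 1] + 1,     # extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: what was said versus what a tool captioned.
said = "turn on the captions please"
captioned = "turn on the cat urine please"
print(f"WER: {word_error_rate(said, captioned):.0%}")
```

A lower WER is better, and 0 percent means a perfect match. Running every tool against the same recording and the same reference transcript keeps the comparison apples-to-apples.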

4. Synchronized

Timing, of course, matters. You want the text to keep up with the audio. With live captions, you can count on some lag. Some apps have less lag than others. And some have more.

5. Caption Flow and Movement

Live captions tend to scroll along where you read them from left to right and top to bottom. Typically, like any other captions, you see about one or two lines at a time. This helps with tracking. Anything longer makes it easy to lose your place as your eyes dart between the video and the captions.

In some live captions, the captions jerk. They don’t flow smoothly, and they move unpredictably. The captions don’t keep up with the audio. There should be no more than a few milliseconds of delay between the audio and its associated captions. For more on this and examples, see Flow in the Caption Guide.

6. Transcript

Can you save the transcript after the call? For important meetings, I’ll want to review the transcript. Remember, my brain is multitasking during these calls to follow the conversation. It doesn’t remember everything.

Important Note About Microphones

Listening refers to the app’s ability to listen. It can’t caption without the ability to hear the conversation. I want to use my desktop for video calls. But it doesn’t have a built-in microphone like my laptop does. Every app except Google Meet needs this microphone to work. The apps cannot “hear” through the desktop’s internal speaker. They need a microphone.

If I had a webcam on my desktop, it probably would work just like the laptop. What I do is turn my phone into a webcam with an app. The app doesn’t use the phone’s mic. But there are apps that do. Those apps didn’t work, which is why I couldn’t use them.

So, these are the guidelines I follow in analyzing the quality of each tool.

Eye Contact on Video Calls

Another person who is deaf like me expressed concern about caption placement on video calls. He believes it’s harmful because it takes away from eye contact. This is not an issue limited to people who read captions. I’ve been on many calls since writing this and eye contact is rarely direct. People cannot always set up their webcams to get the kind of eye contact we get in person.

An informal poll reveals that many people agree eye contact isn’t important on a video call. It’s all about the content.

In person, I struggle to talk to people who don’t make eye contact. I cannot explain why this happens when I’m doing the talking. A deaf friend expressed the same thing. Obviously, when I’m listening, eye contact improves lip-readability.

The challenge with some of these tools is that you’re managing two apps: the video call app (e.g., Zoom) and the captioning tool. If you switch to another app or share your screen, it can mess up everything. And you’ll have to fiddle with the apps to get them all back.

That’s why captions built into the app are the ideal solution. This way, you’re only dealing with one app instead of two. Only two apps have this capability: Skype and Google Meet. Microsoft Teams will have this feature, but it’s not widely available yet. Users appreciate it when they can resize the captions and select where they want to position the captions.

Important Note About Microphones and Headsets

Microphones and headsets complicate things with automatically captioning video calls. For most automatic captioning tools, I use my laptop with its built-in camera and microphone. It’s the only way the apps could “hear” voices in videoconferencing.

My desktop computer has a speaker, but no mic. When I plug in a mic, the sound disappears. And when there’s no sound coming through the speakers, there’s nothing to caption. Catch-22. After fiddling with the sound settings, I finally got the computer to allow the mic to work and play sound over the speakers. It worked for most of the tools and apps except Skype. It just stood there silent.

Headsets with a mic won’t work with tools that need to hear the audio. The apps can’t hear anything. Thus, they can’t provide captions.

This is where video calls with built-in automatic captioning have an advantage. These include Google Meet, Skype, and Microsoft Teams.

The Best Captioning Tool for Video Calls

Of course, the best way to caption video calls is with a human typing the captions. The reality is that it’s not feasible for many calls. But if accuracy is important, this is your best bet. It’s ideal for large companies, conferences, seminars, and classes.

How to caption video calls with a human depends on the software you’re using for the call. For example, Zoom explains how to add captions to its calls.

For the majority of calls, automatic captions are the next best thing. If I’m in a small meeting or a one-on-one call, it’s not feasible to bring in a human to caption it. But humans provide the highest accuracy rate. Yay, humans!

One thing to note. Most of these apps are not designed for captioning video calls. A negative review doesn’t mean the tool isn’t good. It means the tool’s focus is elsewhere.

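Here are the tools covered in this article:

  • Skype
  • Google Meet
  • Google Slides
  • Microsoft PowerPoint Presentation Translator add-in
  • Otter.ai
  • Web Captioner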

All righty. Let’s dig in!

Skype Automatic Captioning

Thank goodness I didn’t stop video conferencing after my second experience with live automatic captions.

It was bad. Very bad.

I couldn’t believe it. Microsoft’s Skype. Really?

It surprised me because Microsoft is an ardent supporter of accessibility. They have a Chief Accessibility Officer named Jenny Lay-Flurrie. And get this. She happens to be deaf like me.

The Skype caption formatting is good except the text size is too small.

But the real problem is the live captioning. If one person says a few sentences, Skype displays one or two words at a time quickly. No one reads that fast.

A moment or two — much too long compared to other apps — after the person finishes speaking, the whole paragraph shows up as one big block of text. A Microsoft employee reports this is an issue that recently regressed.

Skype interface with automatic captions
Automatically captioning a podcast in Skype

You can work around it by switching to the transcript view. I tried that, and the flow of the text was the same as in the standard view.

The employee also shares a workaround for the text size. Open Skype in a web browser and use the browser’s zoom tools to enlarge the captions. And this is what happens.

Skype interface with zoomed in captions in the transcript
Enlarging the captions in Skype’s transcript view

You can’t control the box placement and it covers up other tools.

And finally, Skype’s automatic captions live up to their nickname of autocraptions. While on a call with my family, my husband and I were laughing hard and taking screenshots of the captions. I sent them to my family during the call. We saw them crack up at the humorous autocraptions.

Skype Automatic Captioning Score

Scoring: 1 is poor. 5 is excellent.

Readability: 3 out of 5

Size: 2 out of 5

Format: 3 out of 5

Scrolling: 3 out of 5

Synchronized: 2 out of 5

Accuracy: 2 out of 5

Transcript: Yes

Google Meet

Google Meet with captions for video calls is now available free!

I discovered it by accident while adding a meeting in Google Calendar.

To create a Google Meet videoconferencing meeting, open Google Calendar. Select “Add Google Meet video conferencing” and set up your meeting.

Schedule Google Meet meeting in Google Calendar

As soon as you enter Google Meet, you’ll see a CC button to add captions.

And the GAME-CHANGER is that it tells you who is talking. It’ll be interesting to see how well (or not) this works for a larger group.

Another thing I like is that the captions appear on the bottom and they’re readable. I don’t have to open a second app for the captions and then rearrange the video and captioning apps to line them up. When I do something on the screen that takes the captions and video out of view, I have to fix both to get back to the meeting with captions. That’s not an issue with Google Meet.

However, Google Meet does not save the conversation. This is its biggest weakness. Updated on May 13, 2020: I received the following email from Google saying that my feature request for transcripts is in the correct hands to make it potentially happen in the future!

Email from Google confirming feature request to include transcript is submitted and in the correct hands to make it potentially happen.
Email confirming transcript feature request in the right hands to potentially make it happen.

I tested Google Meet captions on the same podcast that I used to test the different automatic captioning tools for video calls. This way you have an apples-to-apples comparison. The accuracy was about the same as the rest of them — far from perfect.

However, I used the free Google Meet in a personal call. It did a fantastic job captioning both of us. Yes, my deaf accent!

Google Meet works on my desktop when I use a headset with a microphone. The captions always work whether you use a mic, headset, or neither. All the other apps — except Skype and Microsoft Teams — require using a device with its own mic. This could be a laptop with a built-in webcam or any device with a webcam.

Artificial intelligence is improving. I hope it’ll eventually understand my accent that hails from nowhere. I look forward to the day it gets my name right!

Several studies say eye contact affects trust. I watched the captions while creating this video (at the end of this article). You can see I’m looking down a little and reading.

Google Meet Automatic Captioning Score

Scoring: 1 is poor. 5 is excellent.

Readability: 4 out of 5

Size: 5 out of 5

Format: 5 out of 5

Scrolling: 3 out of 5

Synchronized: 5 out of 5

Accuracy: 4 out of 5

Transcript: No

Google Slides: Automatic Captioning in Presentation Mode

Next victim … err … tool is Google Slides automatic captioning.

Google Slides allows presenters to add subtitles to their presentation. Using a black background with white text, it’s one of the most readable captions. And the text size is Goldilocks approved: just right.

Scrolling causes no problems. Google Slides’ captions are about as synchronized as it can get for live captioning. As for accuracy, it has one of the better rates out of all the apps reviewed.

The downside is that you can’t download a transcript. And that it requires managing two apps during the call. If you switch apps or share your screen, you have to fiddle with getting them all back together again.

Here are the steps to turn on Google Slides captions:

  1. Make sure your microphone is on. (It won’t give you the CC option without it.)
  2. Open a blank Google Slides presentation.
  3. Select “Present” to go into presentation mode.
  4. Select “CC.” (You can format it here.)

I put the Zoom screen above the captions. When I used this for a group meeting, I followed the conversation pretty well.

The following image is a snapshot of the podcast. During a call, the video shows up instead of the white box.

Blank screen with two lines of captions
Google Slides automatically captions a podcast

Google Slides Automatic Captioning Score

Scoring: 1 is poor. 5 is excellent.

Readability: 5 out of 5

Size: 5 out of 5

Format: 5 out of 5

Scrolling: 5 out of 5

Synchronized: 4 out of 5

Accuracy: 4 out of 5

Transcript: No

Microsoft PowerPoint Presentation Translator Add-In

PowerPoint for the web and an Office 365 subscription come with subtitles. Users of all other editions will need to download Presentation Translator, a PowerPoint add-in.

It’s like Google Slides. Both automatically caption a live presentation. Both let you post the captions at the bottom or the top. And both have captions with a black background and white text.

Both involve managing two apps during the call and fiddling with them if you switch apps or share your screen.

And that’s where the similarities end.

PowerPoint has more features than Google Slides. You can add a slide with a QR code for others to scan to view the live captions on their devices! So. Cool.

The add-in can save the full transcript. You can choose the language spoken and the language for the subtitles.

Here are the steps to use Presentation Translator:

  1. Open a blank PowerPoint presentation. (I created one called Captions.pptx to use every time.)
  2. Select the “Slide Show” tab.
  3. Select “Start Subtitles.”

The following pop-up box appears:

PowerPoint add-in pop up box with language options
PowerPoint Presentation Translator add-in options

Select “Additional Settings” to see the following options:

PowerPoint Presentation Translator add-in settings box

The “Add instructional slide” option adds a slide with the QR code that lets attendees see the captions on their own devices.

The only negative is the URL that shows up in the captions. At times, it’d cover up some of the captions. I’ve found a solution.

After starting the subtitles in PowerPoint, press “Esc” to give the captions their own box. You can adjust the box’s width, length, and placement. I put it right below the video app.

You can view more than two lines of captions at a time. Adjust the width to your liking. Mine is somewhere in the middle, neither too long nor too short. Longer makes it harder to track while reading the captions.

Beware that it automatically mutes the captions after you press Esc. Just un-mute it and ta-da! The URL stops blocking the captions. Here’s what the transcript looks like.

PowerPoint Presentation Translator add-in caption transcript with timings
PowerPoint Presentation Translator add-in caption transcript

PowerPoint Presentation Translator Add-in Automatic Captioning Score

Scoring: 1 is poor. 5 is excellent.

Readability: 5 out of 5

Size: 5 out of 5

Format: 5 out of 5

Scrolling: 5 out of 5

Synchronized: 4 out of 5

Accuracy: 4 out of 5

Transcript: Yes

Otter.ai Voice Meeting Notes for Automatic Captions

Otter.ai has announced a partnership with Zoom. It automatically transcribes meetings during the call for paid Zoom plans. If the transcription is anything like I’ve experienced, it’ll be hard to follow. Read on for the details.

If you don’t have a paid Zoom account, Otter requires a different set-up than the others. That’s because the app is inside the browser. Yes, Google Slides is also in a web browser. But it puts the captions at the bottom, so I just have to move the Zoom app in front of it. Then, I resized the Zoom screen to fit right on top.

Otter is a different story. To set it up, I opened Zoom in its own window.

Then, I loaded Otter in a web browser. I put the browser side-by-side with Zoom like in the following image.

Video on the left side and Otter.ai note on the right side with the captions
Lining up the video and Otter.ai side-by-side

Yes, I tried resizing the web browser to put Otter below Zoom. But it didn’t work well. Parts of the text would appear in light grey, which is hard to read. Otter is designed to act like a notetaker rather than a captioner. Like PowerPoint and Google Slides, it involves managing multiple apps. And it gets crazy when you switch apps or share your screen.

Otter turned out to be my least favorite method. Here’s why:

  • Words didn’t flow smoothly.
  • Unpredictable content movement.
  • Text is small for captioning and hard to track.
  • Couldn’t put it below the video, which made reading difficult.

Remember, I’m looking at the person for lipreading. So, my eyes dart back and forth. Reading Otter’s captions is like trying to find something in an email with no paragraph breaks.

During the call, I barely looked at Otter’s transcription because it required a lot of effort. With smaller text in blocks of paragraphs, I couldn’t follow it.

How is the accuracy? I don’t know because I couldn’t follow the transcription. However, I had Otter transcribe the podcast. The accuracy is about the same as Google and PowerPoint as the next image shows.

Otter.ai transcript of podcast

Otter.ai Automatic Captioning Score

Scoring: 1 is poor. 5 is excellent.

Readability: 2 out of 5

Size: 2 out of 5

Format: 1 out of 5

Scrolling: 1 out of 5

Synchronized: 3 out of 5

Accuracy: 3 out of 5

Transcript: Yes

Web Captioner Automatic Captioning for Video Calls

Web Captioner is a web-based captioning software.

Out of all the tools, it has one major drawback. It only works with Chrome.

Web Captioner’s formatting is great. You can change the font, color, text size, and other traits.

The speed in keeping up with the live conversation is good.

Accuracy-wise. Ehhhh … I think it may be less accurate than Google Slides and the PowerPoint add-in.

One of its automatic caption errors had me LOL. I can’t share it with you as it’s not G-rated. Let’s just say the autocraptions easily beat another app’s “jock itch.”

While I was laughing, my friend on the video call looked puzzled. I could barely get a word in and said, “Captions.”

Web Captioner wrote, “Cat urine.” LOL again.

Like Otter, using Web Captioner means contending with two apps or screens. That can make it tricky to manage, especially if you share your screen or switch apps.

Web Captioner has the following unique features:

  • Saves transcripts as a text file.
  • Saves transcripts to Dropbox automatically.
  • Accepts word replacements. For example: Replace “Merrill” and “Marilyn” with “Meryl”.
  • Supports multiple languages.
  • Offers option to censor profane language.
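To show what a word-replacement feature like this does to a transcript, here is a small Python sketch. It is not how Web Captioner works internally. The replacement table and function name are hypothetical, and the whole-word, case-insensitive matching is my assumption.

```python
import re

# Hypothetical replacement table, modeled on the example in the list above.
REPLACEMENTS = {
    "Merrill": "Meryl",
    "Marilyn": "Meryl",
}


def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Swap whole words in a transcript, ignoring letter case."""
    if not replacements:
        return text
    # Try longer entries first so a short entry never shadows a longer one.
    words = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in words) + r")\b",
        flags=re.IGNORECASE,
    )
    lookup = {word.lower(): fixed for word, fixed in replacements.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


print(apply_replacements("Hi, this is marilyn. Merrill speaking.", REPLACEMENTS))
```

Matching on whole words keeps the fix from touching longer words that merely contain a misheard name.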

The transcript appears in the browser tab. To put it below the Zoom video, I had to resize the entire Chrome browser. Even after resizing it, the text would fly off the screen. I had to keep an eye on the scrolling to ensure the captions stay visible.

If you create an account, Web Captioner will save your settings.

And here’s the transcript of the podcast à la Web Captioner.

Web Captioner transcript of podcast

WebCaptioner.com Automatic Captioning Score

Scoring: 1 is poor. 5 is excellent.

Readability: 5 out of 5

Size: 5 out of 5

Format: 5 out of 5

Scrolling: 2 out of 5

Synchronized: 4 out of 5

Accuracy: 4 out of 5

Transcript: Yes

Other Automatic Captioning Apps

You may be wondering why I didn’t include [fill in the blank] tool.

First, several people sent me the link to a knowledge base that lists live captioning and automated captioning tools. What do you know? I’ve tried all of the possible ones on the list and then some.

Now as to why some aren’t included. Live Transcribe is for Androids only. I don’t have an Android device. However, a dear friend of mine who is deaf speaks highly of the app.

Microsoft Translator is a non-starter. It can’t handle constant talking. It’s made for conversations. I tried using it to transcribe a podcast and it’d stop after about a paragraph. It’s tedious to keep pressing the microphone button to run. And it still doesn’t always go.

Thanks to Tilak for this Microsoft Translator tip. He advised going into conversation mode. Select the icon with two people chatting and tap “Start.” Type your name, select your language, and tap “Enter.” Tap the mic icon to talk. This works better than the previous method. But it ran into a lot of bumps. Sometimes it worked and sometimes it didn’t.

Microsoft Teams automatic captioning is in preview mode. So, it’s not yet available. Someone says it won’t save the captions. That’s surprising. Let’s hope that changes.

Ava for iOS and Android also doesn’t make the cut. I’ve used Ava for a few calls and conversations. It starts strong and then falls apart after a few minutes. It’s like the more she translates, the more tired she gets.

As for GoToMeeting and Webex, the only way to caption is with a third-party live captioner. BlueJeans will have automatic captioning when the host turns it on. Alas, it’s not a free service.

A Comparison of Automatic Caption Tools

As a comparison, I tested all the apps on a podcast with my daughter. This way the content is the same across the board.

Skype Automatic Captioning Video

Google Meet Automatic Captioning Video

Google Slides Automatic Captioning Video

PowerPoint Presentation Translator Add-in Automatic Captioning Video

Otter.ai Automatic Captioning Video

Web Captioner Automatic Captioning Video

When it’s up to me, I use PowerPoint’s add-in for automatically captioning live video calls. That’s because it scores high on all the factors, saves the transcript, and puts the captions in their own box. The box is adjustable and works well.
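For readers who like to see the scores side by side, the Python sketch below collects the ratings from each tool's score section and sorts them. The unweighted average is my assumption for illustration only; the ratings in this article were never combined into a single number. Readability is also an umbrella for size, format, and scrolling, so a plain average counts those aspects twice.

```python
from statistics import mean

# Scores copied from the score sections above (1 is poor, 5 is excellent).
SCORES = {
    "Skype": {"readability": 3, "size": 2, "format": 3, "scrolling": 3,
              "synchronized": 2, "accuracy": 2, "transcript": True},
    "Google Meet": {"readability": 4, "size": 5, "format": 5, "scrolling": 3,
                    "synchronized": 5, "accuracy": 4, "transcript": False},
    "Google Slides": {"readability": 5, "size": 5, "format": 5, "scrolling": 5,
                      "synchronized": 4, "accuracy": 4, "transcript": False},
    "PowerPoint add-in": {"readability": 5, "size": 5, "format": 5, "scrolling": 5,
                          "synchronized": 4, "accuracy": 4, "transcript": True},
    "Otter.ai": {"readability": 2, "size": 2, "format": 1, "scrolling": 1,
                 "synchronized": 3, "accuracy": 3, "transcript": True},
    "Web Captioner": {"readability": 5, "size": 5, "format": 5, "scrolling": 2,
                      "synchronized": 4, "accuracy": 4, "transcript": True},
}


def average_score(entry: dict) -> float:
    """Unweighted mean of the numeric ratings (an illustrative assumption)."""
    return mean(value for key, value in entry.items() if key != "transcript")


# Highest average first; ties keep the order they have in SCORES.
ranked = sorted(SCORES, key=lambda tool: average_score(SCORES[tool]), reverse=True)
for tool in ranked:
    saves = "yes" if SCORES[tool]["transcript"] else "no"
    print(f"{tool:<18} {average_score(SCORES[tool]):.2f}  transcript: {saves}")

# Best average among the tools that can save a transcript.
best_with_transcript = max(
    (tool for tool in SCORES if SCORES[tool]["transcript"]),
    key=lambda tool: average_score(SCORES[tool]),
)
print("Best with transcript:", best_with_transcript)
```

Google Slides and the PowerPoint add-in tie on the ratings, so the transcript is the tiebreaker in this sketch.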

I just learned that Facebook plans to add automatic captions to live video and audio. It’s not out yet, but I will update this post when it’s available. You may want to bookmark this.

Thanks to Ann Marie and these tools, I’m enjoying video calls instead of trying to duck ‘n’ dodge ’em.

What’s your tip for video calls?

Resources

Captions in video calls: better accessibility, but harmful side effects: Quinn Keast is concerned about the effects of captions on video calls. He references a study that shows when you don’t make eye contact, it affects trust. I share my experience in this post. I’ve had video calls with people who aren’t using captions and they’re not making eye contact. It’s not their fault. I’m interested in the content of the call and their lack of eye contact doesn’t affect trust.

Online Meetings and Google Speech to Text Technology: Hamish Drewry shares his experience with video calls. I love that he points out that what works for him doesn’t necessarily work for another person who is deaf. Absolutely. People who are deaf and hard of hearing are just as diverse as the world.

Originally posted April 22, 2020

Updated May 12, 2020: Added note about microphones and headsets.

Updated May 13, 2020: Added note about Google Meet feature request for transcripts.

Updated May 27, 2020: Added BlueJeans.


6 thoughts on “Which Is the Best Automatic Captioning Tool for Video Calls?”

  1. Thanks for creating this. It is so important at this time in the world’s huge reliance upon such tech in response to COVID-19. What I played around with using Google Slides was instead of that start presentation and CC button, I opened a blank/new presentation and then hit Tools>>Voice Type Speaker Notes. Then turned on the mic that appears on the screen and it starts capturing everything. All that text can then be copied/pasted at the end. One problem I encountered while testing this while a Zoom session was running was that the focus needed to be upon the google slide tab in my browser. As soon as I clicked mouse focus on the Zoom window, the recording of text stopped.

  2. Thanks so much for writing a detailed Cons & Pros for popular video-conference tools.
    I will go and try each of these and do a self-usability study myself, and run them by with someone.
    Only thing is CC would work if everyone speaks in English. I am from Pakistan and our native language is Urdu. Being deaf by birth and hard of hearing technically, I have to rely on a combination of lip-reading and body gestures to orient towards spoken conversations. Lately I have started using Live Transcribe app after getting an Android phone and finally felt like I was being included among the billions of podcasts and live talks on fb. I will try and think of creative ways to use video-calls and stop being shy. Thank you again!

    • Thanks for sharing your experience, Khaula. I’m experiencing the same things you do — minus the language option. I just checked the PowerPoint add-in and it has Urdu! But the question is … can you get the add-in? Or do you use a premium version of Office 365?

