One good thing came out of COVID-19.
It forced me to try video calls again.
The reason I don’t love video calls is that I miss too much. While I’m a strong lipreader, video calls have a few challenges.
- Why I shied away from video calls
- The turning point
- Automatic captioning guidelines for video calls
- Eye contact on video calls
- Portable captions hack
- Important note about microphones and headsets
- The best captioning tool for video calls
- A comparison of automatic captioning tools and scores
Why I Shied Away from Video Calls
Often, the video jerks or jumps. It’s like ________ of a thought. Exactly. It’s like missing part of a thought.
Lipreading is hard when the sound doesn’t sync with lip movements. Another problem is that some videos look pixelated or blurry. That affects lipread-ability.
These factors together create a frustrating experience. For me, listening is more important than speaking. Listening on video calls requires me to work harder to hear than the average person does. My brain is multitasking. It has to convert lip movement and sound into sentences. It has to fill in the blanks. It has to absorb what the person says. It has to make meaning of it. It’s a tall order in a short time.
You’ve probably seen the studies on multitasking. They show that most of us become less efficient at each task. That’s what happens to me on a video call.
The Turning Point
A client sent an invitation to a friendly lunch video conference. No pressure. Not mandatory. I decided to give it a try. At the very least, I could see their faces.
Just before the call, Ann Marie Beebout told me she was going to caption the call.
She researched it and found a way to do it with Google Slides.
It worked! It wasn’t perfect. I followed the conversation well enough. And even chimed in a couple of times.
Knowing when to speak up in a group video call is hard. I don’t want to interrupt anyone.
One-on-one video calls are best, of course. The more attendees there are, the harder it is to follow. Even with live subtitles from an automatic captioning tool.
Automatic Captioning Guidelines for Video Calls
None of the automatic captioning tools are perfect. Still, a couple tend to work better than others.
Live transcription during a video call is a wholly different experience from simply transcribing a call after the fact. The guidelines for automatically captioning a video call are also different from the captioning guidelines for videos.
Based on my experience as someone who depends on captions to hear, I’ve created these automatic captioning guidelines for video calls. These determine the effectiveness of the tool.
1. Readability
Readability is a big factor. This refers to the ability to read the captions, not the actual content of the captions. Readability has three components: size, format (color), and scrolling. This factor quickly knocked some apps out of contention. Nonetheless, I tried the apps because I wanted to give them a fair chance.
A larger text size works better. Where possible, I placed the captions below the video, which brings them closer to the lips. (I cover this in the next item: Caption Placement.) My eyes constantly go back and forth between the two.
You know how most captions have a black-ish background with white-ish text? Yup, this combination also works well for live video calls. I’ve run many experiments using feedback from people, including people with color blindness, dyslexia, and ADHD. I use the following colors in my captions:
- Background: #242424 (slightly off-black)
- Text: #FFFFFD (slightly off-white)
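If you want to check a color pair like this yourself, the WCAG 2.x contrast ratio is a common yardstick: WCAG asks for at least 4.5:1 for normal text at level AA and 7:1 at level AAA. Here is a minimal Python sketch of that formula. The function names are my own for illustration; they don’t come from any captioning app.

```python
def _channel_to_linear(c8: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as #RRGGBB."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * _channel_to_linear(r)
            + 0.7152 * _channel_to_linear(g)
            + 0.0722 * _channel_to_linear(b))


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio: 1 means no contrast, 21 is white on black."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # Off-white text (#FFFFFD) on an off-black background (#242424).
    ratio = contrast_ratio("#FFFFFD", "#242424")
    print(f"{ratio:.1f}:1")
```

For #FFFFFD on #242424 it prints roughly 15.5:1. That’s a little below the 21:1 of pure white on pure black, but still far above the 7:1 AAA threshold.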
Another factor is the scrolling of the text. Does it scroll on its own? Or do I have to play with it to keep it in place? Does it keep disappearing or jumping?
2. Caption Placement
I asked people if they prefer captions on the top or bottom of videos. At 98 percent, almost everyone picked the bottom. One of the biggest reasons is that it puts the captions closest to the lips. This is especially true in video conferencing.
To follow a conversation, I depend on reading lips, my cochlear implant, and the captions. It’s all happening in real time. Conversations move fast. Remember that part at the beginning about my brain multitasking to listen (in my own way)? That’s why the placement of the captions matters. See the section on eye contact for more information.
Perhaps a compromise is to give users a choice of caption placement. The key is that it needs to be part of the app. In other words, when I select the video app, the captions should be right there. I won’t have to work to bring both the captions and the video app back into view.
Some people suggested using a phone for the captions. This doesn’t work well because of caption placement. It takes my eyes off the video. As such, I miss too much information trying to read the captions on the phone during the call. Remember that automatic caption accuracy is far from perfect.
3. Accuracy
As for accuracy, every call is different in sound quality, volume, and content. In testing the tools, I couldn’t fairly say whether one is more or less accurate than the others. That’s why I tested all the apps against a single podcast. This provides an apples-to-apples comparison in terms of content.
4. Timing
Timing, of course, matters. You want the text to keep up with the audio. With live captions, you can count on some lag, but some apps lag less than others.
5. Caption Flow and Movement
Live captions tend to scroll along so that you read them from left to right and top to bottom. Typically, like any other captions, you see about one or two lines at a time. This helps with tracking. Anything longer makes it easy to lose your place as your eyes dart between the video and the captions.
In some live captions, the captions jerk. They don’t flow smoothly and move unpredictably. The captions don’t keep up with the audio. There should be no more than a few milliseconds of delay between the audio and its associated captions. For more on this and examples, see Flow in the Caption Guide.
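To show why one or two lines help with tracking, here is a hypothetical Python sketch of a rolling caption display. New words wrap into lines of a fixed width, and once there are more than two lines, the oldest one scrolls away. The class name, the 32-character default width, and the two-line limit are assumptions for illustration; this is not how any of the reviewed apps works internally.

```python
import textwrap
from collections import deque


class RollingCaptions:
    """Hypothetical live-caption display that keeps only the newest lines."""

    def __init__(self, width: int = 32, max_lines: int = 2) -> None:
        self.width = width
        # Finished lines; the oldest drops off automatically past max_lines.
        self.lines = deque(maxlen=max_lines)
        self.current = ""  # the line still being filled with new words

    def add_words(self, text: str) -> None:
        """Append newly recognized words and re-wrap the open line."""
        combined = f"{self.current} {text}".strip()
        wrapped = textwrap.wrap(combined, width=self.width) or [""]
        # Every wrapped line except the last is complete and scrolls up.
        for finished in wrapped[:-1]:
            self.lines.append(finished)
        self.current = wrapped[-1]

    def visible(self) -> list[str]:
        """Lines currently on screen, oldest first."""
        shown = list(self.lines) + ([self.current] if self.current else [])
        return shown[-self.lines.maxlen:]


captions = RollingCaptions(width=10, max_lines=2)
captions.add_words("one two three four five six")
print(captions.visible())  # ['three four', 'five six']
```

Because the display never holds more than two lines, your eyes return to the same spot under the video every time.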
6. Transcript
Can you save the transcript after the call? For important meetings, I’ll want to review the transcript. Remember, my brain is multitasking during these calls to follow the conversation. It doesn’t remember everything.
Important Note About Microphones
Listening refers to the app’s ability to listen. It can’t caption without the ability to hear the conversation. I want to use my desktop for video calls, but it doesn’t have a built-in microphone like my laptop does. Every app except Google Meet needs this microphone to work. The apps cannot “hear” through the desktop’s internal speaker. They need a microphone.
If I had a webcam on my desktop, it would probably work just like the laptop. Instead, I turn my phone into a webcam with an app. That app doesn’t use the phone’s mic. There are apps that do, but they didn’t work for me, which is why I couldn’t use them.
So, these are the guidelines I follow in analyzing the quality of each tool.
Eye Contact on Video Calls
Another person who is deaf like me expressed concern about caption placement on video calls. He believes it’s harmful because it takes away from eye contact. This is not an issue limited to people who read captions. I’ve been on many calls since writing this and eye contact is rarely direct. People cannot always set up their webcams to get the kind of eye contact we get in person.
An informal poll reveals that many people agree eye contact isn’t important on a video call. It’s all about the content.
In person, I struggle to talk to people who don’t make eye contact. I cannot explain why this happens when I’m doing the talking. A deaf friend expressed the same thing. Obviously, when I’m listening, eye contact improves lip-readability.
The challenge with some of these tools is that you’re managing two apps: the video call app (e.g., Zoom) and the captioning tool. If you switch to another app or share your screen, it can mess up everything. And you’ll have to fiddle with the apps to get them all back.
That’s why captions built into the app are the ideal solution. This way, you’re only dealing with one app instead of two. Only two apps have this capability: Skype and Google Meet. Microsoft Teams will have this feature, but it’s not widely available yet. Users appreciate it when they can resize the captions and choose where to position them.
Portable Captions Hack
A couple of automatic speech recognition (ASR) apps would make good portable captions. You may find yourself in a situation where you need to use the ASR app on your phone for captions. Or you want to improve eye contact on a video call. Here’s a hack from Simon Lau.
Take a large binder clip as shown in the following image. Use the clip to place the phone on the top or bottom of the monitor where you’d want the captions.
Important Note About Microphones and Headsets
Microphones and headsets complicate things with automatically captioning video calls. For most automatic captioning tools, I use my laptop with its built-in camera and microphone. It’s the only way the apps could “hear” voices in videoconferencing.
My desktop computer has a speaker but no mic. When I plug in a mic, the sound disappears. And when there’s no sound coming through the speakers, there’s nothing to caption. Catch-22. After fiddling with the sound settings, I finally got the computer to let the mic work while playing sound over the speakers. It worked for most of the tools and apps except Skype. It just stood there silent.
Headsets with a mic won’t work with tools that need to hear the audio. The apps can’t hear anything. Thus, they can’t provide captions.
This is where video calls with built-in automatic captioning have an advantage. These include Google Meet, Skype, and Microsoft Teams.
The Best Captioning Tool for Video Calls
Of course, the best way to caption video calls is with a human typing the captions. The reality is that it’s not feasible for many calls. But if accuracy is important, this is your best bet. It’s ideal for large companies, conferences, seminars, and classes.
How to caption video calls with a human depends on the software you’re using for the call. For example, Zoom explains how to add captions to its calls.
For the majority of calls, automatic captions are the next best thing. If I’m in a small meeting or a one-on-one call, it’s not feasible to bring in a human to caption it. But humans provide the highest accuracy rate. Yay, humans!
One thing to note. Most of these apps are not designed for captioning video calls. A negative review doesn’t mean the tool isn’t good. It means the tool’s focus is elsewhere.
Here are the tools covered in this article:
- Skype
- Google Meet
- Google Slides
- Microsoft PowerPoint Presentation Translator Add-In
- Microsoft Teams
- Otter.ai
- Descript
- Web Captioner
All righty. Let’s dig in!
Skype with Automatic Captioning
Thank goodness I didn’t stop video conferencing after my second experience with live automatic captions.
It was bad. Very bad.
I couldn’t believe it. Microsoft’s Skype. Really?
It surprised me because Microsoft is an ardent supporter of accessibility. They have a Chief Accessibility Officer named Jenny Lay-Flurrie. And get this. She happens to be deaf like me.
The Skype caption formatting is good except the text size is too small.
But the real problem is the live captioning. If one person says a few sentences, Skype quickly flashes one or two words at a time. No one reads that fast.
A moment or two — much too long compared to other apps — after the person finishes speaking, the whole paragraph shows up as one big block of text. A Microsoft employee reports this is an issue that recently regressed.
The suggested workaround is to switch to the transcript view. I tried it, and the text flows the same way as in the standard view.
The employee also shares a workaround for the text size. Open Skype in a web browser and use the browser’s zoom tools to enlarge the captions. And this is what happens.
You can’t control the box placement and it covers up other tools.
And finally, Skype’s automatic captions live up to their nickname of autocraptions. While on a call with my family, my husband and I were laughing hard and taking screenshots of the captions. I sent the screenshots to my family during the call, and we watched them crack up at the autocraptions.
Google Meet with Automatic Captioning
Google Meet with captions for video calls is now available free!
I discovered it by accident while adding a meeting in Google Calendar.
To create a Google Meet video conferencing meeting, open Google Calendar. Select “Add Google Meet video conferencing” and set up your meeting.
As soon as you enter Google Meet, you’ll see a CC button to add captions.
And the GAME-CHANGER is that it tells you who is talking. It’ll be interesting to see how well (or not) this works for a larger group.
Another thing I like is that the captions appear at the bottom and they’re readable. I don’t have to open a second app for the captions and then rearrange the video and captioning apps to line them up. With a separate captioning app, whenever I do something on the screen that takes the captions and video out of view, I have to fix both to get back to the meeting with captions. That’s not an issue with Google Meet.
However, Google Meet does not save the conversation. This is its biggest weakness. Updated on May 13, 2020: I received the following email from Google saying that my feature request for transcripts is in the correct hands to make it potentially happen in the future!
I tested Google Meet captions on the same podcast that I used to test the different automatic captioning tools for video calls. This way you have an apples-to-apples comparison. The accuracy was about the same as the rest of them — far from perfect.
However, I used the free Google Meet in a personal call. It did a fantastic job captioning both of us. Yes, my deaf accent!
Google Meet works on my desktop when I use a headset with a microphone. The captions always work whether you use a mic, headset, or neither. All the other apps — except Skype and Microsoft Teams — require using a device with its own mic. This could be a laptop with a built-in webcam or any device with a webcam.
Artificial intelligence is improving. I hope it’ll eventually understand my accent that hails from nowhere. I look forward to the day it gets my name right!
Several studies say eye contact affects trust. I watched the captions while creating this video (at the end of this article). You can see I’m looking down a little and reading.
Google Slides: Automatic Captioning in Presentation Mode
Next victim … err … tool is Google Slides automatic captioning.
Google Slides allows presenters to add subtitles to their presentation. Using a black background with white text, it’s one of the most readable captions. And the text size is Goldilocks approved: just right.
Scrolling causes no problems. Google Slides’ captions are about as synchronized as live captioning gets. As for accuracy, it has one of the better rates of all the apps reviewed.
The downside is that you can’t download a transcript. It also requires managing two apps during the call. If you switch apps or share your screen, you have to fiddle with getting them all back together again.
Here are the steps to turn on Google Slides captions:
- Make sure your microphone is on. (It won’t give you the CC option without it.)
- Open a blank Google Slides presentation.
- Select “Present” to go into presentation mode.
- Select “CC.” (You can format the captions here.)
I put the Zoom screen above the captions. When I used this for a group meeting, I followed the conversation pretty well.
The following image is a snapshot of the podcast. During a call, the video shows up instead of the white box.
Microsoft PowerPoint Presentation Translator Add-In
PowerPoint for the web and an Office 365 subscription come with subtitles. All other editions will need to download Presentation Translator, a PowerPoint add-in.
It’s like Google Slides. Both automatically caption a live presentation. Both let you place the captions at the bottom or the top. And both have captions with a black background and white text.
Both involve managing two apps during the call and fiddling with them if you switch apps or share your screen.
And that’s where the similarities end.
PowerPoint has more features than Google Slides. You can add a slide with a QR code for others to scan to view the live captions on their devices! So. Cool.
The add-in can save the full transcript. You can choose the language spoken and the language for the subtitles.
Here are the steps to use Presentation Translator:
- Open a blank PowerPoint presentation. (I created one called Captions.pptx to use every time.)
- Select the “Slide Show” tab.
- Select “Start Subtitles.”
The following pop-up box appears:
Select “Additional Settings” to see the following options:
The “Add instructional slide” contains the QR Code that lets attendees see the captions on their own devices.
The only negative is the URL that shows up in the captions. At times, it’d cover up some of the captions. I’ve found a solution.
After starting the subtitles in PowerPoint, press “Esc” to give the captions their own box. You can adjust the box’s width, length, and placement. I put it right below the video app.
You can view more than two lines of captions at a time. Adjust the width to your liking. Mine is somewhere in the middle, neither too long nor too short. Longer makes it harder to track while reading the captions.
Beware that it automatically mutes the captions after you press Esc. Just un-mute it and ta-da! The URL stops blocking the captions. Here’s what the transcript looks like.
Microsoft Teams for iOS and Android
I could only test the Microsoft Teams iOS app because it’s the only free version that comes with captions, as The Verge explains. I don’t have an Android device, but I suspect the Android app is similar. The captions are built in, like Google Meet’s. You won’t have to fiddle with two separate applications to caption a video call.
With a separate captioning tool, I have to open a second app for the captions and rearrange the video call and captioning app to line them up. As soon as I do something on the screen that sends the captions and video behind other windows, I have to fix both to get back to the meeting with captions. That’s not an issue with Microsoft Teams.
To turn on the captions, join a meeting in the app. Select the ellipsis … (three dots) and “Turn on live captions.”
The captions appear at the bottom on a semi-transparent black background with white text, as shown in the next image. It’d be great if Microsoft would let users adjust the captions and their transparency. Sometimes the movement behind the transparent captions can be distracting.
The captions on the iPhone are small. If you have an iPad or Android tablet, try those. They will appear larger. I hope that Microsoft will make a web-based or desktop version available free like with the iOS apps.
Microsoft Teams’ caption accuracy is decent. The captions contain little punctuation, and they got a few words wrong. Unlike Google Meet, it does not tell you who is talking. Moreover, Microsoft Teams does not save the conversation.
Otter.ai Voice Meeting Notes
Otter.ai for iOS (Free)
The best free option is with the Otter.ai app for iOS. Sometimes we need a portable captioning option. This is one of the better options for portability.
1. Select the Record button.
2. Tap the Maximize icon in the upper-right corner. This turns the screen black with the text in white.
3. Change the size of the text by tapping the Size icon next to the Minimize icon.
It did a good job of captioning the podcast. The scrolling is decent. It could use some paragraph spacing to break up large blocks of text for improved readability.
Otter.ai Desktop Premium
A similar option is available on the desktop for premium versions of Otter.ai. Here are the steps to use this option:
- Open Otter.ai in a web browser.
- Select Record.
- Select the Present button in the upper-right corner.
- Resize the window.
- Select one of the icons at the upper right to adjust the font size.
And this method works exactly like the iOS version. I could move the Otter transcription below the podcast screen (which is where a video call would be). It worked great. So much better than the following approach. However, this option is only available on the premium version.
Otter.ai Desktop Free
FYI: Otter.ai has a partnership with Zoom. It automatically transcribes meetings during the call for paid Zoom plans. If you don’t have a paid Zoom account, Otter requires a different set-up than the others. That’s because the app is inside the browser.
Yes, Google Slides is also in a web browser. But it puts the captions at the bottom, so I just have to move the Zoom app in front of it. Then, I resized the Zoom screen to fit right on top.
The free desktop version of Otter requires a different approach. To set it up, I opened Zoom in its own window. Then, I loaded Otter in a web browser. I put the browser side-by-side with Zoom like in the following image.
Yes, I tried resizing the web browser to put Otter below Zoom. But it didn’t work well. Parts of the text would appear in light grey, which is hard to read. Otter is designed to act like a notetaker rather than a captioner. Like PowerPoint and Google Slides, it involves managing multiple apps. And it gets crazy when you switch apps or share your screen.
Here are the challenges with this method:
- Words didn’t flow smoothly.
- Unpredictable content movement.
- Text is small for captioning and hard to track.
- Couldn’t put it below the video, which made reading difficult.
Remember, I’m looking at the person for lipreading. So, my eyes dart back and forth. Reading Otter’s captions is like trying to find something in an email with no paragraph breaks.
During the call, I barely looked at Otter’s transcription because it required a lot of effort. With smaller text in blocks of paragraphs, I couldn’t follow it.
How is the accuracy? I don’t know because I couldn’t follow the transcription. However, I had Otter transcribe the podcast. The accuracy is about the same as Google and PowerPoint, as the next image shows.
Descript for Creating Transcripts
To use Descript, first download the app. Run the app and hit record whenever you’re ready to transcribe a conversation.
Descript requires a different set-up than the others. That’s because it’s a desktop app. The text is more suited for notetaking and transcriptions than for captioning. The text is smaller and uses a mix of transparent grey and violet for the font color.
Putting the app’s window below the video doesn’t work well. A better option is to put the video and Descript window side-by-side. However, reading lips while glancing at the Descript text proves tiring.
When the text reaches the bottom of the screen, it does not automatically scroll down. I had to continuously use the mouse to scroll down to the next lines. This takes my focus away from the conversation. Remember, I’m relying on lipreading and reading the text, all while comprehending the content. Adding scrolling makes it quadruple-tasking.
Descript has one advantage over some transcription services: it breaks up the text into multiple paragraphs. Transcription apps tend to create one long, unreadable paragraph. It’s too easy to lose your place. The accuracy is about the same as the other apps.
Web Captioner: Captioning in the Browser
Web Captioner is a web-based captioning software.
Compared with the other tools, it has one major drawback: it only works in Chrome.
Web Captioner’s formatting is great. You can change the font, color, text size, and other traits.
The speed in keeping up with the live conversation is good.
Accuracy-wise. Ehhhh … I think it may be less accurate than Google Slides and the PowerPoint add-in.
One of its automatic caption errors had me LOL. I can’t share it with you as it’s not G-rated. Let’s just say the autocraptions easily beat another app’s “jock itch.”
While I was laughing, my friend on the video call looked puzzled. I could barely get a word in and said, “Captions.”
Web Captioner wrote, “Cat urine.” LOL again.
Like Otter, using Web Captioner means contending with two apps or screens. That can make it tricky to manage, especially if you share your screen or switch apps.
Web Captioner has the following unique features:
- Saves transcripts as a text file.
- Saves transcripts to Dropbox automatically.
- Accepts word replacements. For example: Replace “Merrill” and “Marilyn” with “Meryl”.
- Supports multiple languages.
- Offers option to censor profane language.
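Word replacement is essentially a find-and-replace pass on each caption before it’s shown. Here is a minimal Python sketch of the idea, assuming whole-word, case-insensitive matching. It is not Web Captioner’s actual code, and the apply_replacements name is hypothetical.

```python
import re


def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Replace whole-word misrecognitions in a caption line.

    `replacements` maps each wrong word (as the recognizer spells it)
    to the word you want shown instead. Matching is case-insensitive
    and limited to whole words, so "Merrill" is fixed but
    "Merrillville" is left alone.
    """
    if not replacements:
        return text
    # Longest keys first so longer phrases win over shorter overlaps.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        flags=re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in replacements.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


fixes = {"Merrill": "Meryl", "Marilyn": "Meryl"}
print(apply_replacements("Thanks, Merrill!", fixes))  # Thanks, Meryl!
```

Matching whole words only is a deliberate choice: it keeps a fix for a name from mangling longer words that happen to contain it.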
The transcript appears in the browser tab. To put it below the Zoom video, I had to resize the entire Chrome browser. Even after resizing it, the text would fly off the screen. I had to keep an eye on the scrolling to ensure the captions stay visible.
If you create an account, Web Captioner will save your settings.
And here’s the transcript of the podcast à la Web Captioner.
Other Apps for Automatically Captioning Video Calls
You may be wondering why I didn’t include [fill in the blank] tool.
First, several people sent me the link to a knowledge base that lists live captioning and automated captioning tools. What do you know? I’ve tried all of the possible ones on the list and then some.
Now, as to why some aren’t included. Live Transcribe is for Android only, and I don’t have an Android device. However, a dear friend of mine who is deaf speaks highly of the app.
Microsoft Translator is a non-starter. It can’t handle constant talking because it’s made for back-and-forth conversations. I tried using it to transcribe a podcast, and it stopped after about a paragraph. It’s tedious to keep pressing the microphone button to restart it, and even then it doesn’t always work.
Thanks to Tilak for this Microsoft Translator tip. He advised going into conversation mode. Select the icon with two people chatting and tap “Start.” Type your name, select your language, and tap “Enter.” Tap the mic icon to talk. This works better than the previous method. But it ran into a lot of bumps. Sometimes it worked and sometimes it didn’t.
Microsoft Teams automatic captioning is in preview mode. So, it’s not yet available. Someone says it won’t save the captions. That’s surprising. Let’s hope that changes.
Ava for iOS and Android also doesn’t make the cut. I’ve used Ava for a few calls and conversations. It starts strong and then falls apart after a few minutes. It’s like the more she transcribes, the more tired she gets. The same goes for Live Transcribe Voice to Text for iOS (not the same as Google’s Live Transcribe for Android).
A Comparison of Automatic Caption Tools
As a comparison, I tested all the apps on a podcast with my daughter. This way the content is the same across the board.
The following table has all the scores. The rating is on a scale of 1 to 5, with 1 being poor and 5 being excellent.
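| Tool | Readability | Size | Format | Scrolling | Synchronized | Accuracy | Transcript |
|---|---|---|---|---|---|---|---|
| PowerPoint Presentation Translator | 5 | 5 | 5 | 4 | 4 | 4 | Yes |
| Otter.ai Desktop Free | 2 | 2 | 1 | 1 | 3 | 3.5 | Yes |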
Here is a description of each factor.
- Readability: Ability to read the captions. This relies on a mix of size, font choice, and colors.
- Size: The size of the captions.
- Format: How the captions look including font type, font color, and background color.
- Scrolling: How well the captions scroll through to the next lines.
- Synchronized: How well the captions keep up with the audio.
- Accuracy: How accurate the captions are, including punctuation, if any.
- Transcript: Whether you can go back and review the text of the conversation.
Skype Automatic Captioning Video
Google Meet Automatic Captioning Video
Google Slides Automatic Captioning Video
PowerPoint Presentation Translator Add-in Automatic Captioning Video
Microsoft Teams iOS App
Otter.ai iOS Free
Otter.ai Desktop Premium
Otter.ai Desktop Free
Descript Automatic Captioning Video
Web Captioner Automatic Captioning Video
When it’s up to me, I use PowerPoint’s add-in for automatically captioning live video calls. That’s because it scores high on all the factors, saves the transcript, and puts the captions in their own box. The box is adjustable and works well.
I just learned that Facebook plans to add automatic captions to live video and audio. It’s not out yet, but I will update this post when it’s available. You may want to bookmark this.
Thanks to Ann Marie and these tools, I’m enjoying video calls instead of trying to duck ‘n’ dodge ’em.
What’s your tip for video calls?
Captions in video calls: better accessibility, but harmful side effects: Quinn Keast is concerned about the effects of captions on video calls. He references a study that shows when you don’t make eye contact, it affects trust. I share my experience in this post. I’ve had video calls with people who aren’t using captions and they’re not making eye contact. It’s not their fault. I’m interested in the content of the call and their lack of eye contact doesn’t affect trust.
Online Meetings and Google Speech to Text Technology: Hamish Drewry shares his experience with video calls. I love that he points out that what works for him doesn’t necessarily work for another person who is deaf. Absolutely. People who are deaf and hard of hearing are just as diverse as the world.
Originally posted April 22, 2020
Updated May 12, 2020: Added note about microphones and headsets.
Updated May 13, 2020: Added note about Google Meet feature request for transcripts.
Updated May 27, 2020: Added Blue Jean.
Updated June 1, 2020: Added Descript.
Updated June 11, 2020: Added Otter.ai iOS and Desktop Premium.
Updated June 15, 2020: Added portable captions hack.
Updated June 17, 2020: Created comparison table to compare the tools.
Updated July 1, 2020: Added Microsoft Teams.