Closed Captioning in Livestreams

geralt / Pixabay

Full admission: I’m going to tell you how to get closed captioning on your livestreams with what’s commonly referred to in the deaf community as “craptions” (which is still better than nothing), but I’m also telling you right now, right off the bat, that this is going to take several hours of work on your part.  This isn’t something you can just install, drop into OBS, and have magically work with accuracy, not by a long shot.  This closed captioning system also only works on Windows 7 and up, so if you’re using Apple’s macOS or Linux, this will not work for you.

The Required Program

I’ll go ahead and get the software you need to download out of the way.  Download This and unzip it somewhere; your desktop is fine, or an assets directory if you use one.  Once extracted, open the program and close it again to create its settings file.  You can edit that settings file to change the background color to green for chroma keying if you use those filters in OBS.  Now open it back up, do a window capture on the program in OBS, and place it where you want (adding the chroma key filter if you’re going that route).
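The exact settings file format depends on the program you downloaded, and the key names below are hypothetical, but the edit generally looks something like this (#00FF00 is the standard pure green that OBS’s Chroma Key filter expects by default):

```ini
; Hypothetical settings file -- the actual section and key names
; depend on the captioning program you downloaded.
[Display]
BackgroundColor=#00FF00  ; pure green for OBS chroma keying
TextColor=#FFFFFF        ; white text reads well over most scenes
```

Save the file, reopen the program, and the background should be keyable in OBS.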

For best results you need three lines of captions: two lines for text that’s already been generated, plus the very bottom line for the live, in-progress captions.  That gives your hearing-impaired audience enough context to follow along.
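If you ever end up rolling your own caption display (say, writing captions to a text file for an OBS text source), that three-line layout boils down to a small rolling buffer: finalized lines scroll up, and the bottom line keeps changing as the speech engine revises its guess.  This is just a sketch of the idea, not part of the program linked above:

```python
from collections import deque

class CaptionBuffer:
    """Keeps two finalized caption lines plus one live (in-progress) line."""

    def __init__(self, history_lines=2):
        self.history = deque(maxlen=history_lines)  # finalized lines that scroll up
        self.live = ""                              # line still being recognized

    def update_live(self, text):
        """Called as the speech engine revises its current hypothesis."""
        self.live = text

    def finalize(self):
        """Called when the engine commits the live line; it scrolls into history."""
        if self.live:
            self.history.append(self.live)
            self.live = ""

    def render(self):
        """Return the three display lines, oldest first."""
        return list(self.history) + [self.live]

buf = CaptionBuffer()
buf.update_live("hello everyone")
buf.finalize()
buf.update_live("welcome to the")
buf.update_live("welcome to the stream")
print(buf.render())  # ['hello everyone', 'welcome to the stream']
```

The `deque(maxlen=2)` does the scrolling for free: once a third line is finalized, the oldest one falls off automatically.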

Testing The First Time

Now, if you leave the caption software open and try speaking, you’re going to notice that it’s not accurate in the slightest.  Which means your deaf and hard-of-hearing viewers aren’t going to be able to comprehend what you’re trying to say through those inaccurate-as-all-hell captions; the speech recognition has to be trained first.  Remember how I said you couldn’t just download and install something and have it magically work?

kmicican / Pixabay (How you’re going to feel after 30 minutes of training your software)

Training Your Speech Recognition Software

This software is built on the Microsoft Speech Engine that ships with Windows 7 and up.  Hit up search in the taskbar (or Cortana), type “Speech,” and open Speech Recognition.  Set up your microphone accordingly and start training.  The more you train it with your voice, the more accurate it becomes, and yes, it can and will get quite frustrating.  You’ll also find yourself reading through the same training sessions, alternating between two or three of them; this is normal.

Example of Training Speech Recognition Software

I trained my speech recognition software for four hours.  My video on DLive titled “Testing and Training my Live Captions” shows the captions going from inaccurate as all hell, to catching some of what I say, down to being okay-ish within a couple of hours.  I’ve since trained the speech recognition built into my Windows system for an additional two hours off camera.

Other Alternatives?

geralt / Pixabay

You could use Google Docs’ voice typing as a form of dictation software.  Google’s system is already pretrained by countless hours of people all around the world using it, a cloud-based system with AI on its backend.  So technically you could work something out by showing a Google Docs page in your browser and letting its dictation be your closed captioning, which would be easier to set up and use than my method since there’s nothing to train, but as of right now I don’t know of anyone who’s actually set this up.

You could also, if you’re willing to spend money, purchase software like Dragon NaturallySpeaking and create closed captions from a text file embedded in OBS as a text source.  I haven’t personally explored these methods, but they’re available to you should you wish to tinker.  If you’ve developed a better method and would like to help out, tell me your process and I’ll help spread the word about how to do it better and easier.
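If you go the text-file route, the glue is small: something has to trim the ever-growing dictation transcript down to the last few lines OBS should display.  Here’s a rough sketch of that idea in Python; the file paths are placeholders, and I haven’t wired this up to Dragon myself:

```python
from pathlib import Path

# Hypothetical paths: your dictation software appends to TRANSCRIPT,
# and OBS reads CAPTION_FILE via a text source set to "Read from file".
TRANSCRIPT = Path("transcript.txt")
CAPTION_FILE = Path("captions.txt")
LINES_TO_SHOW = 3

def last_lines(text, n):
    """Return the last n non-empty lines of the transcript."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return lines[-n:]

def update_captions():
    """Copy the tail of the transcript into the file OBS displays."""
    if TRANSCRIPT.exists():
        tail = last_lines(TRANSCRIPT.read_text(encoding="utf-8"), LINES_TO_SHOW)
        CAPTION_FILE.write_text("\n".join(tail), encoding="utf-8")

# To run it continuously, poll a few times a second, e.g.:
#   import time
#   while True:
#       update_captions()
#       time.sleep(0.25)
```

OBS re-reads the file on change, so the on-screen captions scroll as the dictation software keeps writing.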

Future Updates?

I will continue to experiment with other closed captioning options to make them easier to implement, so that others wanting to make their livestreams accessible don’t have to train their own software as much and can still get improved dictation accuracy, and to make the whole thing easier to drop into their streaming software.  This is an area of technology that’s rather important to me, since livestreams are hugely inaccessible, and lip reading, while I can do it, is a royal pain in the ass and quite exhausting.
