How to Transcribe .MP3 Audio from Podcasts or .MP4 Movies to Text on Mac OS

June 2017: A key component for these instructions is no longer actively maintained, so these instructions are no longer valid for modern Mac configurations.

I listen to podcasts. I watch videos. I listen to podcasts in different languages. But more than anything I read and write. I practice languages. That’s just how I roll. And sometimes my ramblings take me as far as working out the English meaning of specific Kikuyu translation texts.

Frequently I want to save an audio snippet or video clip for future reference. Sure, I could save the source media file if I had unlimited disk space. But what I usually do is keep a link to the original source and a text synopsis of the snippet. That both saves on storage and makes future searches for that particular item simpler.

If you’re like me, you really want the original text more than a synopsis. It takes a bit of extra effort, but I have a nice solution that uses only a Mac and open source software. Read below for instructions on converting an MP3 audio file to a text document.

The Basics of Configuring Your Mac to Transcribe .MP3 Audio

Here’s what you need:

  • The original media (.mp3 file, for example)
  • Soundflower. Soundflower is an application that creates a virtual audio channel and directs audio input and output to physical or virtual devices.
  • Audacity. Audacity is a free application for recording and editing sounds.
  • TextEdit. TextEdit is the default text editor/word processor included in Mac OS X.

Follow the instructions on the developer websites to get all of the software installed and working on your system. Once you have the software installed, the next step is to configure your Mac to use Soundflower for dictation.

  • Open System Preferences and click on “Dictation & Speech”
  • Select the Dictation tab
  • Select “Soundflower (2ch)” as the dictation input source
  • Set Dictation to “On”
  • Tick the “Use Enhanced Dictation” box

Your Mac is ready for dictation. When dictation is turned on in TextEdit (or another word processing app), your Mac will transcribe sound from the Soundflower input source.

Getting Your Audio and Text Files Ready

Next, you need to queue up the audio file in Audacity and direct output to Soundflower. For those who are new to Audacity, this will be the trickiest step. But relax, you don’t need to learn much about Audacity beyond deciding what section of sound to play and how to switch the audio output from the default speakers to Soundflower.

  • Launch Audacity
  • Import your audio file into Audacity (File –> Import, or simply drag the file into the center of the Audacity screen).
  • Click the play button to give it a listen, then click stop once you’re confident you have the right sound clip/transcription area.
  • Choose Audacity –> Preferences –> Devices. Under playback, choose “Soundflower (2ch)” to switch the output from the onboard speakers to Soundflower. Click “OK”

With Audacity and your sound file queued up, it’s time to turn your attention to TextEdit.

  • Launch TextEdit
  • Create a “New Document”
  • You may want to add some metadata to the document, such as the podcast name, episode #, publish date and URL, to go along with the key transcript.
  • Position the cursor in the file where you want the transcript to appear.
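
If you save a lot of snippets, it can help to generate that metadata block the same way every time. Here is a minimal sketch in Python; the function name, the field labels and the example show are all my own assumptions for illustration, not anything TextEdit or Dictation requires.

```python
from datetime import date


def transcript_header(podcast, episode, published, url):
    """Return a plain-text metadata block to paste above a transcript.

    The field labels here are hypothetical; use whatever labels you
    will actually search for later.
    """
    lines = [
        f"Podcast: {podcast}",
        f"Episode: {episode}",
        f"Published: {published.isoformat()}",
        f"Source: {url}",
        "",
        "Transcript:",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(transcript_header("Example Show", 42, date(2016, 3, 1),
                            "https://example.com/episodes/42"))
```

Paste the output at the top of the new TextEdit document, then put the cursor on the blank line below “Transcript:” before you start dictation.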

And … Action!

It’s time to start audio playback and dictation transcription. Here both sequence and timing are important:

  1. In Audacity, move the scrubber start location 10-15 seconds before the key transcription area.
  2. Press “Play.” The scrubber and meters will start moving, though you won’t hear any sound. The audio signal is going to Soundflower instead of to the speakers.
  3. Put focus on TextEdit and position the cursor where you want the transcription to begin.
  4. Select Edit –> Start Dictation (or use the hot key combination, Fn Fn). A microphone icon with a “Done” button will appear to the left of your document.
  5. Text will start appearing in the document. It will likely lag by about 3-5 seconds.
  6. After approximately 30 seconds, press the “Done” button. Transcription will continue until complete.

This is the fun part: watch as transcription happens in real time right in the document window. Look Ma, no hands!

And now you have the original speech (and most likely a few errors) as text to save. In the future you can easily search and retrieve the information.
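
For the search part, Spotlight will do the job, but a few lines of Python work too. This is only a sketch under my own assumptions: that each transcript is saved as a plain-text .txt file, and that they all live in one folder (the “Transcripts” folder name below is hypothetical).

```python
from pathlib import Path


def search_transcripts(folder, phrase):
    """Yield (file name, line number, line) for each line containing phrase.

    Matching is case-insensitive and only looks at .txt files directly
    inside the given folder.
    """
    needle = phrase.lower()
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                yield path.name, number, line.strip()


if __name__ == "__main__":
    # Hypothetical folder; change it to wherever you keep transcripts.
    folder = Path.home() / "Transcripts"
    if folder.is_dir():
        for name, number, line in search_transcripts(folder, "open source"):
            print(f"{name}:{number}: {line}")
```

Note that TextEdit saves rich text (.rtf) by default, so this assumes you chose Format –> Make Plain Text before saving.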

An Excellent Alternative: Google Docs Voice Typing

While the solution above works great for offline work, one alternative with a lot of promise is Google Docs. The Voice Typing feature works much like the dictation service in Mac OS. It has the crowdsourcing advantages and privacy disadvantages of other Google products. If you’re OK with that, I found Voice Typing to do a very good job with accuracy, and it can go longer than Mac OS dictation.

To use Google Voice Typing, follow all of the steps above with Soundflower, Dictation preferences and configuring Audacity. Instead of using TextEdit, you’ll want to start the Chrome browser and create a Google Doc. Once you are in the document, select Tools –> Voice typing.

The user interface and process of starting and stopping transcription is the same as with TextEdit.

Dictation and Transcription Limitations

This process sets you well on your way to the goal of a high fidelity audio transcription. But it will be short of perfect. Here’s what you can do to go from good to perfect:

  • Understand that Mac OS dictation transcription works for a maximum of 30 seconds at a time. If you need longer, you may want to use an alternate technology such as Dragon.
  • Audio playback needs to start before dictation/transcription begins in TextEdit. TextEdit needs to be in focus for dictation to work. If you set the Audacity scrubber a few seconds ahead of target snippet, you’ll be fine.
  • Transcription cannot intuit punctuation. You’ll need to add that after the fact.
  • If you have multiple speakers or a noisy background, you may need to complete one additional step of creating a pristine audio file to work from. This can be done by listening to the sound through headphones and speaking the text into an audio recorder. Use the recording of your voice to drive the transcription.
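
If the passage you want runs longer than 30 seconds, you can work around the limit by transcribing it in several passes. The sketch below just does the arithmetic: it splits a clip into chunks of at most 30 seconds and backs the scrubber up by a lead-in before each chunk. The function names and the 10-second default lead-in are my own assumptions, and the script does not control Audacity or Dictation; you still move the scrubber and press the buttons by hand.

```python
def plan_segments(clip_start, clip_end, max_len=30, lead_in=10):
    """Split the span from clip_start to clip_end (in seconds) into passes.

    Returns a list of (scrubber_position, segment_start, segment_end)
    tuples. Each segment is at most max_len seconds long, and the
    scrubber position sits lead_in seconds before the segment start
    (never before 0).
    """
    if clip_end <= clip_start:
        raise ValueError("clip_end must be after clip_start")
    segments = []
    start = clip_start
    while start < clip_end:
        end = min(start + max_len, clip_end)
        scrubber = max(0, start - lead_in)
        segments.append((scrubber, start, end))
        start = end
    return segments


def mmss(seconds):
    """Format a number of seconds as minutes:seconds, e.g. 85 becomes 1:25."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


if __name__ == "__main__":
    # Example: a passage running from 1:35 to 2:50 in the audio file.
    for scrubber, start, end in plan_segments(95, 170):
        print(f"Set scrubber to {mmss(scrubber)}, "
              f"transcribe {mmss(start)} to {mmss(end)}")
```

Work through the list one line at a time, repeating the playback and dictation steps for each pass.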

Open Source Products I Use for Fun and Profit

I’ve devoted several years of my career to creating sustainable businesses around open source technology.

I’m not an open source zealot by any means. Rather, I’m an optimistic capitalist who believes there is money to be made from transforming the way enterprise software is conceived, developed, marketed, deployed, supported and enhanced. I have deep personal connections to the large and growing set of stakeholders who see value in the transparency, innovation, longevity and support systems emerging around open source solutions. Understanding that proprietary software vendors can’t match these advantages, I see an opportunity to transform the economics of enterprise software, create happy customers and make a buck for myself and others.

Open source is a great idea, but the fabulous products distributed via open source licenses are the real heroes. Below is a list of open source software that I actively use and directly support.

Product | Description
7-zip | I use this open source Windows utility for manipulating archives every day. Does what it says on the tin.
Filezilla | Multi-platform FTP client that I use virtually every day.
Audacity | An excellent Windows application for recording and editing sounds. I use it to edit podcasts.
LAME | The best MP3 encoder I’ve found is free, compatible with every audio application I’ve used and improves with each new release.
Crimson Editor | Small, fast, usable and feature rich text editor. While this product is no longer in active development, I continue to be a fan.
Java | I’m not a programmer, but the number of Java-based applications I use is a testament to Sun’s powerful technology. Kudos to Sun for releasing Java with an open source license as a way to maximize profits from their investment.
XAMPP | XAMPP saves me countless hours by providing a simple to install and administer web development environment that includes (among other things) Apache HTTP Server, multiple versions of PHP, MySQL and more.
Apache HTTP Server | The first open source product I used way back in 1995 and the application that ushered open source software into enterprise data centers.
MySQL | The world’s most popular open source database may not be the most feature rich, but it has more than enough power for my phpBB forum and WordPress blog. Easy to administer, small footprint, reliable.
PHP | The server side scripting language and core component of XAMPP powers many of my recently deployed Web sites and Web applications. Zend Technologies employs the original developers and remains the catalyst for the language.
WordPress | The power of community organization from the team at Automattic elevated Matt Mullenweg’s interesting code into a beacon of web usability and the promise of a plug-in architecture.
phpBB | Easy to deploy and administer, phpBB defined the open source forum software category. It now has more competition than ever but continues to innovate.
Gallery | My latest open source find, I use Gallery to manage Internet photo albums. Perhaps a tad behind Flickr and other photo sharing sites, but it gives me more control over privacy and intellectual property.
SquirrelMail | I use and like SquirrelMail because it’s reliable and lightweight. Sadly it’s not a leader in innovation.
Postfix | Even with sendmail available as open source and bundled in virtually every Linux distro, Postfix has become my favorite mail transfer agent thanks to rock solid reliability and ease of administration.

As part of the open source tradition of contributing back to the communities that make effective products, I’m sharing my endorsement along with links to the drivers of these products and communities. I wish all of the commercial interests, developers and customers driving these products a long and prosperous run.

Leave a comment with details about other great open source products.