How to transcribe audio to text

With the rise of AI technologies, knowing how to transcribe audio to text can be a very useful skill. Imagine that you have a voice memo from work and need to convert the audio file to text. Listening to the recording and typing up the transcript yourself might take several hours. An AI tool can turn the same audio into a transcript in a few minutes or even seconds.


In this article, we’ll discuss different ways to transcribe audio to text, whether you’re working with existing audio or video files or creating your own recordings first (think meetings, podcasts, or even TikTok videos). We’ll also look at AI tools that summarize audio. Read on to find out which tools best fit your audio-to-text needs.

Transcribe audio to text in seconds

An effective and simple app to use if you don’t want to spend a lot of time typing is MurmurType. It’s somewhat like the native macOS dictation feature, but with better accuracy and an impressive ability to transcribe your words, automatically translating them into 19 different languages (including English, German, Arabic, Chinese, and Spanish). 

MurmurType can record voice and transcribe audio to text for you right away. This makes it a great app to use for a Zoom meeting if you want to get meeting minutes transcribed automatically, or for drafting a social media post while multitasking.

Using MurmurType is very easy:

  1. Install MurmurType and choose the mic you’d like to record with from the list of available microphones
  2. Click Record
  3. Speak or join the meeting you’d like to record
  4. Click Transcribe when you’re done
  5. Press Command + V to paste the text where you need it

    MurmurType recorder and transcriber

For a smoother workflow, MurmurType lets you customize the built-in silence tracker and set your own keyboard shortcuts to start recording and paste the text.

Another handy tool to transcribe audio to text automatically is called superwhisper. The app will help you write emails, send messages and take notes in over 100 languages at superhuman speed. All processing takes place on your device, no Wi-Fi required.


If you’d like to try a more structured UI for working with ChatGPT-4, go for TypingMind. It offers an extensive set of built-in AI personas (from Product Manager or Financial Adviser to Standup Comedian or Life Coach) and a library of prompts to help you bring a better focus to your conversations with AI. 

Use TypingMind to ask questions, request explainers, collect and summarize information on a subject, fix grammar in your writing, and much more.

TypingMind to use ChatGPT-4 on a Mac

Use AI that summarizes audio to text

There’s no doubt that AI tools are great at summarizing text: in a matter of seconds, they can give you a concise version of what was discussed in a meeting. How often, though, do you regret not having recorded a meeting for AI to summarize? An important client call, a productive brainstorm, an insightful user interview: you can’t record them after they have already happened. Or can you?

Meet Backtrack, an app that can both save a recording of your meeting retrospectively and provide an AI transcript or summary of it.

Backtrack makes sure you don’t have to worry about taking notes or even pressing the record button: once installed, the app records all of your meetings automatically and lets you decide after the fact whether you want to save anything. You can go back up to five hours in Backtrack recordings before they are overwritten by new ones.
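The "record everything, keep only the recent past" idea can be pictured as a rolling buffer. The Python sketch below is an illustration of that general technique only and makes no claim about how Backtrack is actually built. The class name `RollingRecorder`, its methods, and the one-second chunk size are all hypothetical.

```python
from collections import deque


class RollingRecorder:
    """Hypothetical rolling buffer: keeps only the most recent audio chunks."""

    def __init__(self, window_seconds: int, chunk_seconds: int = 1):
        self.chunk_seconds = chunk_seconds
        # A deque with maxlen silently drops the oldest chunk once it is full.
        self.chunks = deque(maxlen=window_seconds // chunk_seconds)

    def push(self, chunk: bytes) -> None:
        """Store a newly captured chunk, overwriting the oldest if needed."""
        self.chunks.append(chunk)

    def save_last(self, seconds: int) -> bytes:
        """Return the audio from the last `seconds`, or less if not available."""
        n = max(0, min(len(self.chunks), seconds // self.chunk_seconds))
        return b"".join(list(self.chunks)[len(self.chunks) - n:])


# Toy demo: a 5-second window fed with 8 one-second "chunks".
recorder = RollingRecorder(window_seconds=5)
for i in range(8):
    recorder.push(str(i).encode())

print(recorder.save_last(3))  # the three most recent chunks
```

In this toy run the first three chunks have already been overwritten by the time anything is saved, which mirrors the way older recordings make room for new ones.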

To record a meeting happening on the screen of your Mac and convert audio to text afterward using Backtrack:

  1. Download and launch Backtrack (it will automatically begin recording)
  2. Go to the app’s menu bar icon to open Settings
  3. Choose the period for recording (from 15 minutes to five hours back in time)
  4. Once your meeting is over, drag the Backtrack menu bar icon to your desktop and backtrack by the amount of audio you want to save
  5. Choose where you’d like to save the recording
  6. Ask Backtrack to turn audio into text or use the app’s AI that summarizes audio 

    Backtrack settings

Convert video to text

Learning how to convert audio to text is great and will definitely pay off in productivity. But sometimes the files you’d like to get a transcript of are videos: a TED talk you’re especially impressed by, a webinar you’d like notes for, or a recorded focus group session. Worry not. All you need to do to convert video files with audio to text is add captions.

You can use VidCap to add subtitles to your reels or other video recordings and then choose to export your captions as text. 

VidCap uses advanced speech-to-text AI technology to transcribe video or audio files to text. It takes the app only a few minutes to generate captions, automatically translating into English from more than 60 languages or transcribing the original audio in French, German, Japanese, Mandarin, Polish, Spanish, Ukrainian, and more.

VidCap is very intuitive. You won’t need any additional instructions to use it to get a text transcript from a video:

  1. Launch VidCap ➙ Pick a Video
  2. Upload the file you’d like to transcribe
  3. Specify the original language of the file’s audio and decide if you need to have the text translated ➙ Generate Subtitles
  4. Once the subtitles are ready, click on Export ➙ Export Subtitles ➙ Transcript to get a TXT file (the other available options are SRT and VTT file formats)
  5. Name the transcript file and choose a location where you want to save it ➙ Save

    VidCap welcome screen

What’s great is that VidCap can work with audio files just as successfully (simply upload an audio file instead of video and then follow the same steps as described above to convert audio to text).
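If you end up with an SRT subtitle file rather than a TXT transcript (for example, from the export step above or from another tool), the timestamps can be stripped with a few lines of Python. This is a minimal sketch that assumes a standard SRT layout of cue number, timing line, and text. The function name `srt_to_text` and the sample captions are made up for illustration and are not part of any app mentioned here.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Return only the spoken text of an SRT document, one cue per line."""
    cleaned = srt.lstrip("\ufeff").replace("\r\n", "\n").strip()
    lines = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", cleaned):
        rows = [row.strip() for row in block.split("\n") if row.strip()]
        # Drop the numeric cue index, if present.
        if rows and rows[0].isdigit():
            rows = rows[1:]
        # Drop the timing line.
        if rows and TIMING.match(rows[0]):
            rows = rows[1:]
        if rows:
            # Join multi-line captions into a single line of text.
            lines.append(" ".join(rows))
    return "\n".join(lines)


sample = """1
00:00:01,000 --> 00:00:03,500
Welcome to the webinar.

2
00:00:03,600 --> 00:00:06,000
Today we'll cover
three topics.
"""

print(srt_to_text(sample))
```

For the sample above, the result is two lines of plain text with the cue numbers and timings removed.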

You can also create captions for your videos with YouTube’s native caption-generation feature, which works as a free way to transcribe audio to text. But let’s compare these two options.

There’s no doubt that YouTube’s free captioning helps make content more accessible. But the quality of these automatic subtitles suffers from mispronunciations, unintelligible speech, accents, background noise, or simply a lack of support for the language in the video. It’s usually recommended that you try to add professional subtitles first.

VidCap, on the other hand, is often praised for how accurate its captions are, even when they are translated into another language or transcribed from poor-quality footage.

Additionally, VidCap allows for editing and formatting of the text to match the tone and look of your video or social media account. You can choose from a vast collection of formatting tools (including text color, font, size, backgrounds, and animation styles) and preview your video before sharing it.

VidCap allows for editing and formatting of the transcribed text

Turn audio into text: top apps for different occasions

When looking to transcribe audio to text, your choice of app depends largely on whether you already have the file that needs transcribing or you’re planning to record yourself and turn the audio into text afterward.

superwhisper, MurmurType, and Backtrack make it possible for you to record within the app and get the transcript minutes (or even seconds) after you’re done. You can speak Ukrainian, Chinese, Arabic, and other languages and have the app convert audio to text in English automatically. Backtrack goes even further and lets you decide to save a recording of any meeting after it has already happened, as long as you have the app installed on your Mac.

If you haven’t made a recording with MurmurType or Backtrack and got your audio file from somewhere else, the easiest way to transcribe audio to text is by using VidCap. The app works with both video and audio formats, generating subtitles that you can export as text files with proper punctuation, capitalization, and no timestamps.

Once you have your transcript ready, don’t forget you can easily get it summarized with the help of AI. Try TypingMind for a more focused ChatGPT UI that shapes AI responses according to a built-in persona of your choice.

Be sure to test the audio-to-text options mentioned in this article for free with the seven-day trial of Setapp, a platform of best-in-class iOS and macOS productivity apps. Learn how superwhisper, MurmurType, Backtrack, and VidCap handle audio to text conversion, and check out more than 240 other options to boost your performance at work.
