
Transcribe Speech into Text

There are many tools that can transcribe speech into text. Many of them are commercial software or SaaS products that charge a fee. Whisper, from OpenAI, is free and open source.

Install FFmpeg

Whisper requires FFmpeg, a versatile command-line tool for processing audio and video, which it uses to decode audio files. On macOS, install ffmpeg with the Homebrew package manager (brew):

brew install ffmpeg

Installing ffmpeg can take a long time because Homebrew also installs its many dependencies. It may take several attempts to complete, especially on an unstable network connection.

Install Whisper from OpenAI

On macOS, install Whisper with pip, following the instructions in OpenAI's open-source Whisper repository on GitHub. The package is named openai-whisper:

pip install -U openai-whisper

Check that the installation succeeded:

whisper --help
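Running whisper --help does not exercise ffmpeg, so a quick check that both commands are on the PATH can catch a missing install before the first transcription. This is a minimal sketch using the POSIX command -v builtin; check_tool is a hypothetical helper name, not part of Whisper.

```shell
# Report whether a command is available on the PATH (hypothetical helper).
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found"
    return 1
  fi
}

# Whisper needs both tools; print an install hint for whichever is missing.
check_tool ffmpeg || echo "  install with: brew install ffmpeg"
check_tool whisper || echo "  install with: pip install -U openai-whisper"
```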

Transcribe English

To transcribe English speech into text, just whisper!

whisper your-english-audio.mp3 --model medium

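Experiment with Whisper models of different sizes (tiny, base, small, medium, large, and turbo in newer releases) to trade off accuracy against speed: larger models are generally more accurate but slower. The default is small in older releases of openai-whisper; newer releases default to turbo, so check whisper --help for the default in your version.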
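To see that trade-off on your own machine, one approach is to transcribe the same short clip with several model sizes and time each run. This is a sketch: compare_models and the out-<model> folder names are hypothetical, while --model and --output_dir are standard Whisper options.

```shell
# Transcribe one file with several model sizes and report wall-clock time.
# Each model writes to its own folder (out-<model>) so results don't overwrite.
compare_models() {
  audio="$1"
  shift
  for model in "$@"; do
    start=$(date +%s)
    whisper "$audio" --model "$model" --output_dir "out-$model"
    end=$(date +%s)
    echo "$model: $((end - start)) seconds"
  done
}

# Example (hypothetical file name):
# compare_models your-english-audio.mp3 tiny base small medium
```

The first run with each model also downloads its weights (to ~/.cache/whisper by default), so repeat the comparison once for fairer timings.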


Transcribe Non-English

For non-English speech, add the --language option:

whisper your-chinese-audio.m4a --language Chinese

Whisper prints the transcript on screen line by line, with timestamps, as it works. It's pretty magical.


It is slow, though: a recent 4-hour video took a little over 3 hours to transcribe.
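Since long recordings can run for hours, it may be convenient to queue several files and leave them running. The whisper command accepts more than one audio file and loads the model once for all of them. A sketch, assuming the recordings are .m4a files in the current folder; transcribe_folder and the transcripts folder are hypothetical names. The language code zh is equivalent to --language Chinese.

```shell
# Transcribe every .m4a file in the current folder with one whisper call.
# $1 is the language, e.g. zh (same as Chinese) or ja (same as Japanese).
transcribe_folder() {
  lang="$1"
  set -- *.m4a
  # If nothing matched, the glob stays literal and the file does not exist.
  if [ ! -e "$1" ]; then
    echo "no .m4a files found"
    return 1
  fi
  whisper "$@" --language "$lang" --output_dir transcripts
}

# Example:
# transcribe_folder zh
```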

When it finishes, Whisper writes the transcript in five formats by default: .json, .srt, .tsv, .txt, and .vtt. The most widely used subtitle format is .srt, which can be embedded in or burned into the original video file.
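For subtitles alone, --output_format srt skips the other four files, and Whisper can read the video file directly because it decodes audio through ffmpeg. The sketch below then uses ffmpeg to either embed the .srt as a soft subtitle track or burn it into the frames. The helper and file names are hypothetical; the ffmpeg options are standard, but check them against your ffmpeg version.

```shell
# Generate only the .srt; the output is named after the input (talk.mp4 -> talk.srt).
make_srt() {
  whisper "$1" --output_format srt
}

# Soft subtitles: add the .srt as a selectable track in an MP4.
# -c copy keeps video/audio untouched; mov_text is the MP4 subtitle codec.
embed_srt() {
  ffmpeg -i "$1" -i "$2" -map 0 -map 1 -c copy -c:s mov_text "$3"
}

# Hard subtitles: draw the text onto the frames.
# Re-encodes the video and needs an ffmpeg build with libass.
burn_srt() {
  ffmpeg -i "$1" -vf "subtitles=$2" -c:a copy "$3"
}

# Example (hypothetical file names):
# make_srt talk.mp4
# embed_srt talk.mp4 talk.srt talk-soft.mp4
# burn_srt talk.mp4 talk.srt talk-burned.mp4
```

Soft subtitles can be toggled in the player and keep the original video quality; burned-in subtitles always show but require re-encoding, which takes much longer for a long video.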

Whisper didn't recognize some terms specific to my industry, but overall the transcription was good and the subtitle timing stayed in sync with the speech. The whole process is automatic, which is a huge productivity boost.
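One option worth trying for domain vocabulary is Whisper's --initial_prompt flag, which supplies text that the model treats as preceding context. Listing the terms there may nudge it toward the right spellings, though results vary and the effect tends to be strongest near the start of the recording. transcribe_with_terms and the example terms are hypothetical.

```shell
# Transcribe with a list of domain terms passed as the initial prompt.
# $1 = audio file, $2 = comma-separated terms (your own vocabulary).
transcribe_with_terms() {
  audio="$1"
  terms="$2"
  whisper "$audio" --model medium --initial_prompt "$terms"
}

# Example (hypothetical file name and terms):
# transcribe_with_terms meeting.mp3 "Kubernetes, PostgreSQL, Terraform"
```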


Whisper transcribes by default. To translate instead, set --task translate. Translation only goes one way: Whisper translates non-English speech into English text.

whisper japanese.wav --language Japanese --task translate

Do not run transcription and translation in the same folder. Both tasks write output files with the same names, so the second run can overwrite the first without warning.
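One way to keep both sets of outputs is to give each task its own --output_dir. A minimal sketch; the function name and the transcript and english folder names are hypothetical.

```shell
# Run both tasks on one file, each into its own folder, so identically
# named outputs (e.g. japanese.srt) cannot overwrite each other.
transcribe_and_translate() {
  audio="$1"
  lang="$2"
  whisper "$audio" --language "$lang" --task transcribe --output_dir transcript
  whisper "$audio" --language "$lang" --task translate --output_dir english
}

# Example:
# transcribe_and_translate japanese.wav Japanese
```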
