Transcribe Speech into Text
Many tools can transcribe speech into text. Many of them are commercial software or SaaS products that charge a fee. Whisper from OpenAI is free and open source.
Install FFmpeg
Whisper requires FFmpeg, a versatile command-line tool for multimedia conversion. On macOS, install it with the Homebrew package manager (brew).
brew install ffmpeg
Installing ffmpeg can take a long time because it pulls in many dependency packages. It might take several tries to complete the installation, especially if the network connection is unstable.
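The "several tries" advice can be scripted. The following is a minimal sketch, not something Homebrew or FFmpeg provides: the retry function and the RETRY_DELAY variable are hypothetical names introduced here, and the attempt count and delay are arbitrary choices.

```shell
# retry: run a command up to <max_attempts> times, pausing between failures.
# Hypothetical helper for illustration.
# Usage: retry <max_attempts> <command> [args...]
retry() {
  local max_attempts="$1"
  shift
  local attempt=1
  # RETRY_DELAY (seconds between attempts) is an assumed knob; default 5.
  local delay="${RETRY_DELAY:-5}"
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (not run here): try the install up to 3 times, then confirm
# that ffmpeg is available.
#   retry 3 brew install ffmpeg && ffmpeg -version
```

Rerunning the same brew command by hand works just as well; the loop only saves watching the terminal.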
Install Whisper from OpenAI
Use pip to install the Whisper package, following the instructions in OpenAI's GitHub repo that hosts the open-source Whisper code.
pip install -U openai-whisper
Check if the installation is successful.
whisper --help
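Before the first real transcription, it can help to confirm that both commands are on the PATH. This is a small sketch; missing_tools is a hypothetical helper name, not part of Whisper or FFmpeg.

```shell
# missing_tools: print the name of each given command that is not on PATH.
# Prints nothing when every command is found.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example (not run here):
#   missing_tools ffmpeg whisper
```

An empty result means both tools were found. Any name it prints still needs to be installed or added to the PATH.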
Transcribe English
To transcribe English speech into text, just whisper!
whisper your-english-audio.mp3 --model medium
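Experiment with models of different sizes (tiny, base, small, medium, large) to trade off accuracy against speed. Larger models are generally more accurate but slower. The default has been small, but it can differ between Whisper releases; whisper --help shows the default for the installed version.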
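One way to run that experiment is to time the same file against several model sizes. The sketch below assumes the whisper command is installed. compare_models and WHISPER_CMD are hypothetical names introduced here, and the per-model output folders (out-tiny, out-base, and so on) are an arbitrary choice.

```shell
# compare_models: transcribe one file with several model sizes and print
# how many seconds each run took. Hypothetical helper for illustration.
# WHISPER_CMD may point at a different executable; it defaults to whisper.
# Usage: compare_models <audio-file> <model> [model...]
compare_models() {
  local audio="$1"
  shift
  local model start elapsed
  for model in "$@"; do
    start=$(date +%s)
    # A separate output folder per model keeps the transcripts from
    # overwriting each other. The on-screen transcript is discarded.
    "${WHISPER_CMD:-whisper}" "$audio" --model "$model" \
      --output_dir "out-$model" >/dev/null || return 1
    elapsed=$(( $(date +%s) - start ))
    printf '%s: %ss\n' "$model" "$elapsed"
  done
}

# Example (not run here):
#   compare_models your-english-audio.mp3 tiny base small
```

Timing alone does not show accuracy, so compare the resulting transcripts side by side before settling on a size. A short clip is enough for this test.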
Transcribe Non-English
For non-English speech, add the --language option. Without it, Whisper tries to detect the language from the beginning of the audio.
whisper your-chinese-audio.m4a --language Chinese
It will transcribe the audio file line by line on screen. It's pretty magical.
It takes a long time, though: a recent 4-hour video took a little over 3 hours to transcribe.
Upon successful completion, Whisper produces 5 files for the transcript, in .json, .srt, .tsv, .txt, and .vtt formats. The most commonly used subtitle format is .srt, which can then be embedded or burned into the original video file.
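FFmpeg, already installed for Whisper, can do both the burning and the embedding. The following is a sketch under some assumptions: burn_subtitles, embed_subtitles, and FFMPEG_CMD are hypothetical names introduced here; burning relies on FFmpeg's subtitles filter, which needs a build with libass; and the mov_text codec shown for embedding applies to MP4 output.

```shell
# burn_subtitles: draw the subtitles permanently onto the video frames.
# The video is re-encoded, so this can be slow.
# File names with spaces or special characters need extra escaping
# inside the filter argument.
# Usage: burn_subtitles <video> <srt> <output>
burn_subtitles() {
  "${FFMPEG_CMD:-ffmpeg}" -i "$1" -vf "subtitles=$2" "$3"
}

# embed_subtitles: add the .srt as a selectable subtitle track in an MP4.
# Audio and video streams are copied without re-encoding.
# Usage: embed_subtitles <video> <srt> <output>
embed_subtitles() {
  "${FFMPEG_CMD:-ffmpeg}" -i "$1" -i "$2" -c copy -c:s mov_text "$3"
}

# Examples (not run here):
#   burn_subtitles talk.mp4 talk.srt talk-burned.mp4
#   embed_subtitles talk.mp4 talk.srt talk-subbed.mp4
```

Burned-in subtitles show in every player but cannot be turned off. An embedded track can be toggled, but only in players that support it.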
Whisper didn't know some of the specialized terms used in my industry, but it did a fine job overall, and the timing of the transcript is fully in sync with the speech. It's 100% automatic. That's a huge productivity booster.
Translate
Whisper transcribes by default. To translate instead, set --task translate. Translation goes one way only: from non-English speech into English text.
whisper japanese.wav --language Japanese --task translate
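Do not run translation and transcription on the same audio file in the same folder. Both tasks name their output files after the input audio file and write them to the current directory by default, so the second run can overwrite the first run's output without warning. Use --output_dir to send each run to its own folder.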
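A minimal sketch of keeping the two runs apart with --output_dir. The function name transcribe_and_translate, the WHISPER_CMD variable, and the folder names transcribed and translated are assumptions made for illustration.

```shell
# transcribe_and_translate: run both tasks on one file, writing each
# task's output files to its own folder so neither overwrites the other.
# WHISPER_CMD may point at a different executable; it defaults to whisper.
# Usage: transcribe_and_translate <audio-file> <language>
transcribe_and_translate() {
  local audio="$1" language="$2"
  "${WHISPER_CMD:-whisper}" "$audio" --language "$language" \
    --task transcribe --output_dir transcribed || return 1
  "${WHISPER_CMD:-whisper}" "$audio" --language "$language" \
    --task translate --output_dir translated
}

# Example (not run here):
#   transcribe_and_translate japanese.wav Japanese
```

Both folders end up with files of the same names, for example japanese.srt, but with different contents: the original-language transcript in one and the English translation in the other.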