Transcribing Audio to Subtitles Using OpenAI’s Whisper Model on Google Colab
In this post, I’ll walk you through transcribing an audio file into subtitles using OpenAI’s Whisper model. We’ll use Google Colab as the environment, taking advantage of its GPU to speed up transcription.
Step 1: Setting Up the Environment
First, we need to mount Google Drive to access the audio files and save the generated subtitle files.
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
Next, we’ll install the necessary packages. OpenAI’s Whisper model and PyTorch are required for transcription and GPU acceleration, respectively.
# Install required packages
!pip install git+https://github.com/openai/whisper.git
!pip install torch
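Whisper relies on ffmpeg to decode audio files. Colab runtimes typically ship with it preinstalled, but it’s worth verifying before you start a long transcription:
# Verify that ffmpeg is available (Whisper needs it to decode audio)
!ffmpeg -version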
Step 2: Loading the Model
We’ll import the necessary libraries and check whether a CUDA-enabled GPU is available, so the model can run on the GPU for faster transcription.
import whisper
import torch
# Check if a CUDA-enabled GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
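It’s worth confirming which device you actually got; on a GPU runtime this prints "cuda" along with the GPU’s name:
# Confirm which device will be used
print(f"Using device: {device}")
if device == "cuda":
    print(torch.cuda.get_device_name(0))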
Load the Whisper model and move it to the GPU if available.
# Load the Whisper model and move it to the GPU if available
model = whisper.load_model("large", device=device)
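The "large" checkpoint is the most accurate but also the slowest to download and run. If you just want a quick test, the smaller checkpoints ("tiny", "base", "small", "medium") trade some accuracy for speed; simply swap the name in the same call:
# Faster, less accurate alternative for quick experiments
model = whisper.load_model("base", device=device)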
Step 3: Transcribing the Audio File
Specify the path to the audio file stored on Google Drive. In this example, the file is named 1_2_nerwork_101.wav and is located in the awsbc9_source directory.
# Specify the path to the audio file on Google Drive
audio_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.wav"
# Set the input language to Turkish
input_language = "tr" # Turkish language code
Transcribe the entire audio file with the input language specified. This step converts the audio into text. Note that fp16=False forces full 32-bit precision, which avoids warnings when the model runs on the CPU; a GPU-friendly variant follows below.
# Transcribe the entire audio file with fp16 disabled and the language specified
result = model.transcribe(audio_file, fp16=False, language=input_language)
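If Colab assigned you a GPU, enabling half precision speeds up inference and roughly halves memory use. A small variant that picks the right setting from the device detected earlier:
# Use half precision only when running on a CUDA GPU
result = model.transcribe(audio_file, fp16=(device == "cuda"), language=input_language)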
Step 4: Creating the SRT File
We need a helper function to format timestamps for the SRT (SubRip Subtitle) file, which expects the form HH:MM:SS,mmm with a comma before the milliseconds.
# Helper function to convert seconds to SRT timestamp format
def format_timestamp(seconds):
    milliseconds = int((seconds % 1) * 1000)
    seconds = int(seconds)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02},{milliseconds:03}"
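A quick sanity check: 3661.5 seconds is one hour, one minute, and 1.5 seconds, so it should format like this:
# 3661.5 s = 1 h, 1 min, 1 s, 500 ms
print(format_timestamp(3661.5))  # -> 01:01:01,500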
Create the SRT file content by iterating over the transcription segments and formatting each one as an SRT block: a sequence number, a timestamp line, the text, and a blank separator line. A sample of the resulting file follows the loop.
# Create the SRT file content
srt_content = []
for i, segment in enumerate(result["segments"]):
    start_time = format_timestamp(segment["start"])
    end_time = format_timestamp(segment["end"])
    text = segment["text"].strip()
    srt_content.append(str(i + 1))
    srt_content.append(f"{start_time} --> {end_time}")
    srt_content.append(text)
    srt_content.append("")
Finally, write the SRT file to Google Drive. Opening the file with encoding="utf-8" ensures Turkish characters are written correctly.
# Write the SRT file to Google Drive
output_srt_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.srt"
with open(output_srt_file, "w", encoding="utf-8") as f:
    f.write("\n".join(srt_content))
print(f"Subtitle file saved to {output_srt_file}")
Conclusion
By following these steps, you can transcribe audio files into subtitles using OpenAI’s Whisper model on Google Colab. This process leverages the GPU for faster transcription and allows you to save the output directly to Google Drive.
Feel free to modify the code to suit your needs, such as changing the input language or the location of your files.
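For example, if you omit the language argument, Whisper will detect the spoken language automatically from the first 30 seconds of audio:
# Let Whisper auto-detect the spoken language
result = model.transcribe(audio_file, fp16=False)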
Source
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
# Install required packages
!pip install git+https://github.com/openai/whisper.git
!pip install torch
import whisper
import torch
# Check if a CUDA-enabled GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the Whisper model and move it to the GPU if available
model = whisper.load_model("large", device=device)
# Specify the path to the audio file on Google Drive
audio_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.wav"
# Set the input language to Turkish
input_language = "tr" # Turkish language code
# Transcribe the entire audio file with fp16 disabled and the language specified
result = model.transcribe(audio_file, fp16=False, language=input_language)
# Helper function to convert seconds to SRT timestamp format
def format_timestamp(seconds):
    milliseconds = int((seconds % 1) * 1000)
    seconds = int(seconds)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02},{milliseconds:03}"
# Create the SRT file content
srt_content = []
for i, segment in enumerate(result["segments"]):
    start_time = format_timestamp(segment["start"])
    end_time = format_timestamp(segment["end"])
    text = segment["text"].strip()
    srt_content.append(str(i + 1))
    srt_content.append(f"{start_time} --> {end_time}")
    srt_content.append(text)
    srt_content.append("")
# Write the SRT file to Google Drive
output_srt_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.srt"
with open(output_srt_file, "w", encoding="utf-8") as f:
    f.write("\n".join(srt_content))
print(f"Subtitle file saved to {output_srt_file}")