Transcribing Audio to Subtitles Using OpenAI’s Whisper Model on Google Colab

Sinan Artun
3 min read · May 30, 2024


In this post, I’ll walk you through the process of transcribing an audio file into subtitles using OpenAI’s Whisper model. We’ll use Google Colab for the environment, leveraging its GPU capabilities to speed up the transcription process.

Step 1: Setting Up the Environment

First, we need to mount Google Drive to access the audio files and save the generated subtitle files.

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Next, we’ll install the necessary packages. OpenAI’s Whisper model and PyTorch are required for transcription and GPU acceleration, respectively.

# Install required packages
!pip install git+https://github.com/openai/whisper.git
!pip install torch
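
Whisper decodes audio by calling the ffmpeg command-line tool, which pip does not install. Colab runtimes generally ship with it, but that is an assumption about the current image, so a quick check before transcribing can save a confusing error later. A minimal sketch, using a hypothetical helper name has_ffmpeg:

```python
import shutil


def has_ffmpeg() -> bool:
    """Return True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


if not has_ffmpeg():
    # On a Debian/Ubuntu based runtime such as Colab, one option is:
    #   !apt-get install -y ffmpeg
    print("ffmpeg not found; Whisper will not be able to decode the audio file.")
```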

Step 2: Loading the Model

We’ll import the necessary libraries and check whether a CUDA-enabled GPU is available, so the model runs on the GPU when there is one and falls back to the CPU otherwise.

import whisper
import torch
import os

# Check if a CUDA-enabled GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

Load the Whisper model and move it to the GPU if available.

# Load the Whisper model and move it to the GPU if available
model = whisper.load_model("large", device=device)
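
The "large" checkpoint is the most accurate but also the most memory-hungry. If the runtime you are given has a smaller GPU, a smaller checkpoint may be the only one that fits. The sketch below picks a model name from the memory available; the helper pick_model_name is hypothetical, and the memory figures are the approximate ones listed in the Whisper README, so treat them as rough guidance rather than hard limits.

```python
# Approximate VRAM needs in GB per checkpoint (rough figures, largest first).
APPROX_VRAM_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model_name(available_gb: float) -> str:
    """Return the largest checkpoint whose approximate need fits in available_gb."""
    for name, needed_gb in APPROX_VRAM_GB:
        if available_gb >= needed_gb:
            return name
    return "tiny"  # smallest option as a last resort


print(pick_model_name(15))   # large
print(pick_model_name(4))    # small
print(pick_model_name(0.5))  # tiny
```

On a CUDA device, the total memory in bytes can be read from torch.cuda.get_device_properties(0).total_memory and divided by 1024**3 before being passed in.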

Step 3: Transcribing the Audio File

Specify the path to the audio file stored on Google Drive. In this example, the file is named 1_2_nerwork_101.wav and is located in the awsbc9_source directory.

# Specify the path to the audio file on Google Drive
audio_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.wav"

# Set the input language to Turkish
input_language = "tr" # Turkish language code

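Transcribe the entire audio file in the specified language. Note that fp16=False turns half precision off, so the model computes in 32-bit floats. That is the safe choice on a CPU, where half precision is not supported; on a GPU, fp16=True is typically faster and uses less memory. This step converts the audio into text.

# Transcribe the entire audio file with fp16 disabled (FP32) and the specified language
result = model.transcribe(audio_file, fp16=False, language=input_language)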
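
The value returned by transcribe is a plain dictionary. The rest of this post only needs its "segments" list, where each entry carries a start time, an end time (both in seconds), and the text; the dictionary also holds the full transcript under "text" and the language under "language". The sketch below uses a hand-written stand-in (mock_result, with made-up Turkish lines) so the structure can be explored without running the model. Real segments contain additional fields that are left out here.

```python
# Hand-written stand-in for a Whisper result; values are illustrative only.
mock_result = {
    "text": " Merhaba. Bugün ağ temellerini konuşacağız.",
    "language": "tr",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.2, "text": " Merhaba."},
        {"id": 1, "start": 1.2, "end": 4.8, "text": " Bugün ağ temellerini konuşacağız."},
    ],
}


def total_speech_seconds(result: dict) -> float:
    """Sum the duration of all segments in a transcription result."""
    return sum(seg["end"] - seg["start"] for seg in result["segments"])


# Print one line per segment: start, end, and the stripped text.
for seg in mock_result["segments"]:
    print(f'{seg["start"]:6.2f} -> {seg["end"]:6.2f}  {seg["text"].strip()}')

print(f"Speech covered: {total_speech_seconds(mock_result):.1f} s")
```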

Step 4: Creating the SRT File

We need a helper function to format the timestamps correctly for the SRT (SubRip Subtitle) file.

# Helper function to convert seconds to SRT timestamp format
def format_timestamp(seconds):
    # Work in whole milliseconds so float error cannot drop the last millisecond
    total_ms = int(round(seconds * 1000))
    seconds, milliseconds = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02},{milliseconds:03}"
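
One subtlety when turning fractional seconds into milliseconds is that most decimal values are not exactly representable as binary floats. Taking the fractional part and truncating it can therefore lose a millisecond, while rounding to whole milliseconds first does not. A small illustration on a single value (the variable names are only for this example):

```python
t = 1.001  # seconds; stored as a float slightly below 1.001

# Fractional part first, then truncate: the tiny float error is kept.
truncated_ms = int((t % 1) * 1000)

# Round to whole milliseconds first, then take the remainder.
rounded_ms = int(round(t * 1000)) % 1000

print(truncated_ms, rounded_ms)  # 0 1
```

A one-millisecond difference is rarely visible in a subtitle, but rounding keeps the cue times faithful to what the model reported.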

Create the SRT file content by iterating over the transcription segments and formatting them into the SRT format.

# Create the SRT file content
srt_content = []
for i, segment in enumerate(result["segments"]):
    start_time = format_timestamp(segment["start"])
    end_time = format_timestamp(segment["end"])
    text = segment["text"].strip()
    srt_content.append(f"{i + 1}")
    srt_content.append(f"{start_time} --> {end_time}")
    srt_content.append(text)
    srt_content.append("")
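
An SRT file is a sequence of cues, each made of a running number, a "start --> end" line, the text, and a blank line. To try the formatting without a model run, the same idea can be wrapped in a function and fed hand-written segments. This is a sketch under assumptions: the names srt_timestamp and segments_to_srt are hypothetical, the sample lines are made up, and skipping empty segments is a choice made here, not something the loop above does.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, rounding to whole milliseconds."""
    total_ms = int(round(seconds * 1000))
    secs, ms = divmod(total_ms, 1000)
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return f"{hours:02}:{mins:02}:{secs:02},{ms:03}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts that have 'start', 'end', and 'text' keys."""
    blocks = []
    index = 1
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty cues so the numbering stays consecutive
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
        index += 1
    return "\n".join(blocks)


sample_segments = [
    {"start": 0.0, "end": 1.2, "text": " Merhaba."},
    {"start": 1.2, "end": 1.3, "text": "  "},  # whitespace only, dropped
    {"start": 1.3, "end": 4.8, "text": " Bugün ağ temellerini konuşacağız."},
]

srt_text = segments_to_srt(sample_segments)
print(srt_text)
```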

Finally, write the SRT file to Google Drive.

# Write the SRT file to Google Drive
output_srt_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.srt"
# Write as UTF-8 so Turkish characters (ğ, ş, ı, ...) are stored correctly
with open(output_srt_file, "w", encoding="utf-8") as f:
    f.write("\n".join(srt_content))

print(f"Subtitle file saved to {output_srt_file}")
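
When processing more than one recording, typing each output path by hand gets error-prone. One option is to derive the subtitle path from the audio path by swapping the extension. The helper name srt_path_for and the file name lecture.wav below are hypothetical, and the demonstration writes to a temporary folder so it can run outside Colab as well.

```python
import os
import tempfile


def srt_path_for(audio_path: str) -> str:
    """Return the audio path with its extension replaced by .srt."""
    base, _ext = os.path.splitext(audio_path)
    return base + ".srt"


with tempfile.TemporaryDirectory() as tmp:
    fake_audio = os.path.join(tmp, "lecture.wav")  # stand-in, never opened
    out_path = srt_path_for(fake_audio)

    # Explicit UTF-8 keeps non-ASCII subtitle text intact on any platform.
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("1\n00:00:00,000 --> 00:00:01,200\nBugün ağ\n")

    with open(out_path, encoding="utf-8") as f:
        round_trip = f.read()

print(os.path.basename(out_path))  # lecture.srt
```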

Conclusion

By following these steps, you can transcribe audio files into subtitles using OpenAI’s Whisper model on Google Colab. This process leverages the GPU for faster transcription and allows you to save the output directly to Google Drive.

Feel free to modify the code to suit your needs, such as changing the input language or the location of your files.

Source

github

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Install required packages
!pip install git+https://github.com/openai/whisper.git
!pip install torch

import whisper
import torch
import os

# Check if a CUDA-enabled GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the Whisper model and move it to the GPU if available
model = whisper.load_model("large", device=device)

# Specify the path to the audio file on Google Drive
audio_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.wav"

# Set the input language to Turkish
input_language = "tr" # Turkish language code

# Transcribe the entire audio file with fp16 disabled (FP32) and the specified language
result = model.transcribe(audio_file, fp16=False, language=input_language)

# Helper function to convert seconds to SRT timestamp format
def format_timestamp(seconds):
    # Work in whole milliseconds so float error cannot drop the last millisecond
    total_ms = int(round(seconds * 1000))
    seconds, milliseconds = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02},{milliseconds:03}"

# Create the SRT file content
srt_content = []
for i, segment in enumerate(result["segments"]):
    start_time = format_timestamp(segment["start"])
    end_time = format_timestamp(segment["end"])
    text = segment["text"].strip()
    srt_content.append(f"{i + 1}")
    srt_content.append(f"{start_time} --> {end_time}")
    srt_content.append(text)
    srt_content.append("")

# Write the SRT file to Google Drive (UTF-8 keeps Turkish characters intact)
output_srt_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.srt"
with open(output_srt_file, "w", encoding="utf-8") as f:
    f.write("\n".join(srt_content))

print(f"Subtitle file saved to {output_srt_file}")
