Transcribing Audio to Subtitles Using OpenAI’s Whisper Model on Google Colab
In this post, I’ll walk you through transcribing an audio file into subtitles using OpenAI’s Whisper model. We’ll use Google Colab as the environment, taking advantage of its GPU to speed up transcription.
Step 1: Setting Up the Environment
First, we need to mount Google Drive to access the audio files and save the generated subtitle files.
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
Next, we’ll install the necessary packages. OpenAI’s Whisper model and PyTorch are required for transcription and GPU acceleration, respectively.
# Install required packages
!pip install git+https://github.com/openai/whisper.git
!pip install torch
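Whisper relies on ffmpeg to decode audio files. Colab runtimes typically ship with it preinstalled, but it’s worth verifying before you start a long transcription:
# Verify that ffmpeg is available (Whisper needs it to decode audio)
!ffmpeg -version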
Step 2: Loading the Model
We’ll import the necessary libraries and check whether a CUDA-enabled GPU is available, so the model can run on the GPU for faster transcription.
import whisper
import torch
# Check if a CUDA-enabled GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
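It’s worth confirming which device you actually got; on a GPU runtime this prints "cuda" along with the GPU’s name:
# Confirm which device will be used
print(f"Using device: {device}")
if device == "cuda":
    print(torch.cuda.get_device_name(0))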
Load the Whisper model and move it to the GPU if available.
# Load the Whisper model and move it to the GPU if available
model = whisper.load_model("large", device=device)
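The "large" checkpoint is the most accurate but also the slowest to download and run. If you just want a quick test, the smaller checkpoints ("tiny", "base", "small", "medium") trade some accuracy for speed; simply swap the name in the same call:
# Faster, less accurate alternative for quick experiments
model = whisper.load_model("base", device=device)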
Step 3: Transcribing the Audio File
Specify the path to the audio file stored on Google Drive. In this example, the file is named 1_2_nerwork_101.wav and is located in the awsbc9_source directory.
# Specify the path to the audio file on Google Drive
audio_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.wav"
# Set the input language to Turkish
input_language = "tr" # Turkish language code
Transcribe the entire audio file with the input language specified. This step converts the audio into text. Note that fp16=False forces full 32-bit precision, which avoids warnings when the model runs on the CPU; a GPU-friendly variant follows below.
# Transcribe the entire audio file with fp16 disabled and the language specified
result = model.transcribe(audio_file, fp16=False, language=input_language)
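If Colab assigned you a GPU, enabling half precision speeds up inference and roughly halves memory use. A small variant that picks the right setting from the device detected earlier:
# Use half precision only when running on a CUDA GPU
result = model.transcribe(audio_file, fp16=(device == "cuda"), language=input_language)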
Step 4: Creating the SRT File
We need a helper function to format timestamps for the SRT (SubRip Subtitle) file, which expects the form HH:MM:SS,mmm with a comma before the milliseconds.
# Helper function to convert seconds to SRT timestamp format
def format_timestamp(seconds):
    milliseconds = int((seconds % 1) * 1000)
    seconds = int(seconds)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02},{milliseconds:03}"
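A quick sanity check: 3661.5 seconds is one hour, one minute, and 1.5 seconds, so it should format like this:
# 3661.5 s = 1 h, 1 min, 1 s, 500 ms
print(format_timestamp(3661.5))  # -> 01:01:01,500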
Create the SRT file content by iterating over the transcription segments and formatting each one as an SRT block: a sequence number, a timestamp line, the text, and a blank separator line. A sample of the resulting file follows the loop.
# Create the SRT file content
srt_content = []
for i, segment in enumerate(result["segments"]):
    start_time = format_timestamp(segment["start"])
    end_time = format_timestamp(segment["end"])
    text = segment["text"].strip()
    srt_content.append(str(i + 1))
    srt_content.append(f"{start_time} --> {end_time}")
    srt_content.append(text)
    srt_content.append("")
Finally, write the SRT file to Google Drive. Opening the file with encoding="utf-8" ensures Turkish characters are written correctly.
# Write the SRT file to Google Drive
output_srt_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.srt"
with open(output_srt_file, "w", encoding="utf-8") as f:
    f.write("\n".join(srt_content))
print(f"Subtitle file saved to {output_srt_file}")
Conclusion
By following these steps, you can transcribe audio files into subtitles using OpenAI’s Whisper model on Google Colab. This process leverages the GPU for faster transcription and allows you to save the output directly to Google Drive.
Feel free to modify the code to suit your needs, such as changing the input language or the location of your files.
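For example, if you omit the language argument, Whisper will detect the spoken language automatically from the first 30 seconds of audio:
# Let Whisper auto-detect the spoken language
result = model.transcribe(audio_file, fp16=False)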
Source
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
# Install required packages
!pip install git+https://github.com/openai/whisper.git
!pip install torch
import whisper
import torch
# Check if a CUDA-enabled GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the Whisper model and move it to the GPU if available
model = whisper.load_model("large", device=device)
# Specify the path to the audio file on Google Drive
audio_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.wav"
# Set the input language to Turkish
input_language = "tr" # Turkish language code
# Transcribe the entire audio file with fp16 disabled and the language specified
result = model.transcribe(audio_file, fp16=False, language=input_language)
# Helper function to convert seconds to SRT timestamp format
def format_timestamp(seconds):
    milliseconds = int((seconds % 1) * 1000)
    seconds = int(seconds)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02},{milliseconds:03}"
# Create the SRT file content
srt_content = []
for i, segment in enumerate(result["segments"]):
    start_time = format_timestamp(segment["start"])
    end_time = format_timestamp(segment["end"])
    text = segment["text"].strip()
    srt_content.append(str(i + 1))
    srt_content.append(f"{start_time} --> {end_time}")
    srt_content.append(text)
    srt_content.append("")
# Write the SRT file to Google Drive
output_srt_file = "/content/drive/MyDrive/awsbc9_source/1_2_nerwork_101.srt"
with open(output_srt_file, "w", encoding="utf-8") as f:
    f.write("\n".join(srt_content))
print(f"Subtitle file saved to {output_srt_file}")