Export text from the video with Python

Stokry - Sep 30 '20 - Dev Community

In today's post, I will show you how you can export text from a video. We are going to use SpeechRecognition, a library for performing speech recognition with the Google Speech Recognition API.
We will also be using the moviepy library. MoviePy is a Python library for video editing: cutting, concatenations, title insertions, video compositing (a.k.a. non-linear editing), video processing, and creation of custom effects. MoviePy can read and write all the most common audio and video formats, including GIF, and runs on Windows/Mac/Linux, with Python 2.7+ and 3 (or only Python 3.4+ from v.1.0). The code in this post uses the moviepy.editor module from MoviePy 1.x; MoviePy 2.0 removed that module.
Let's start

import speech_recognition as sr
import moviepy.editor as me

We need to specify the video file, the output audio file, and the output text file:

VIDEO_FILE = "test.mp4"
OUTPUT_AUDIO_FILE = "converted.wav"
OUTPUT_TEXT_FILE = "recognized.txt"
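If you would rather not hard-code all three names, one option is to derive the audio and text file names from the video file name. This is only a sketch: the helper name derive_output_paths is my own and is not part of either library, and it assumes you want the outputs next to the video with the same base name.

```python
from pathlib import Path


def derive_output_paths(video_file):
    """Return (audio_path, text_path) next to the video, sharing its base name.

    Hypothetical helper for illustration; not part of moviepy or SpeechRecognition.
    """
    video = Path(video_file)
    # with_suffix swaps only the final extension and keeps the directory
    return video.with_suffix(".wav"), video.with_suffix(".txt")


audio_path, text_path = derive_output_paths("test.mp4")
print(audio_path, text_path)  # test.wav test.txt
```

The rest of the post keeps the three explicit constants, so you can ignore this if fixed names are fine for you.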

The concept is this: the script converts the mp4 file to a wav file, and from that wav file it produces a text file.
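Let's do that, starting with extracting the audio from the video:

video_clip = me.VideoFileClip(r"{}".format(VIDEO_FILE))
video_clip.audio.write_audiofile(r"{}".format(OUTPUT_AUDIO_FILE))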

The next thing we need to do is define the recognizer.

recognizer = sr.Recognizer()

We need to load the audio file for recognition:

audio_clip = sr.AudioFile("{}".format(OUTPUT_AUDIO_FILE))

Now the magic begins - we will start the conversion to text

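try:
    with audio_clip as source:
        # Read the entire wav file into memory as audio data
        audio_file = recognizer.record(source)
    print("Please wait ...")

    # Send the audio to the Google Speech Recognition API
    result = recognizer.recognize_google(audio_file)

    with open(OUTPUT_TEXT_FILE, 'w') as file:
        # Save the recognized text
        file.write(result)
        print("Speech to text conversion successful.")

except Exception as e:
    print("Attempt failed -- ", e)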

This is the whole code:

import speech_recognition as sr
import moviepy.editor as me

VIDEO_FILE = "video.mp4"
OUTPUT_AUDIO_FILE = "converted.wav"
OUTPUT_TEXT_FILE = "recognized.txt"
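
try:
    # Extract the audio track from the video and save it as a wav file
    video_clip = me.VideoFileClip(r"{}".format(VIDEO_FILE))
    video_clip.audio.write_audiofile(r"{}".format(OUTPUT_AUDIO_FILE))

    # Load the wav file and read it into memory
    recognizer = sr.Recognizer()
    audio_clip = sr.AudioFile("{}".format(OUTPUT_AUDIO_FILE))
    with audio_clip as source:
        audio_file = recognizer.record(source)
    print("Please wait ...")

    # Send the audio to the Google Speech Recognition API and save the text
    result = recognizer.recognize_google(audio_file)
    with open(OUTPUT_TEXT_FILE, 'w') as file:
        file.write(result)
        print("Speech to text conversion successful.")
except Exception as e:
    print("Attempt failed -- ", e)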

For longer videos, you can split audio data into chunks.
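Here is one way the chunking could look, as a rough sketch rather than the only approach. It uses Python's built-in wave module to cut the converted wav file into fixed-length pieces and then recognizes each piece separately. The function names split_wav and transcribe_chunks, the chunk_NNN.wav file names, and the 30-second default are my own assumptions. Fixed-length cuts can also split a word in the middle, so expect some errors at chunk boundaries.

```python
import math
import os
import wave


def split_wav(wav_path, chunk_seconds=30, out_dir="."):
    """Split a wav file into consecutive chunk files and return their paths.

    Hypothetical helper; the 30-second default is an arbitrary choice.
    """
    chunk_paths = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(src.getframerate() * chunk_seconds))
        total_chunks = math.ceil(src.getnframes() / frames_per_chunk)
        for index in range(total_chunks):
            # readframes continues from where the previous read stopped
            frames = src.readframes(frames_per_chunk)
            chunk_path = os.path.join(out_dir, "chunk_{:03d}.wav".format(index))
            with wave.open(chunk_path, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected when the data is written
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
    return chunk_paths


def transcribe_chunks(chunk_paths):
    """Recognize each chunk in order and join the pieces with spaces."""
    # Imported here so split_wav can be used without SpeechRecognition installed
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    pieces = []
    for chunk_path in chunk_paths:
        with sr.AudioFile(chunk_path) as source:
            audio = recognizer.record(source)
        try:
            pieces.append(recognizer.recognize_google(audio))
        except sr.UnknownValueError:
            # No intelligible speech in this chunk (for example silence); skip it
            continue
    return " ".join(pieces)
```

With the constants from this post, usage would be something like result = transcribe_chunks(split_wav(OUTPUT_AUDIO_FILE)), after which result can be written to OUTPUT_TEXT_FILE as before.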

This is the video that I used for testing purposes: video.
The video was originally uploaded to YouTube and you can find it here: Youtube link.

Thank you all.
