A simple way to convert a video into audio and then extract the text using the Google Speech Recognition API.
In this article, we will extract speech from videos using the Google Speech Recognition API and save it to a text file. It is a simple machine learning task built on the SpeechRecognition library. Speech recognition is one of the most widely used applications of machine learning today and appears in many fields.
For example, automatic captions such as those YouTube generates for its videos are a familiar use of speech recognition, and the subtitles on services like Amazon Prime and Netflix serve the same purpose of turning speech into text.
We are going to use two libraries for this conversion task: SpeechRecognition and MoviePy.
Before we start, let's install them if you haven't already. Installing a library is easy in Python: a single pip command installs both.
Run the following command in your terminal window:
pip install SpeechRecognition moviepy
The SpeechRecognition module supports multiple recognition APIs; we will use the Google Speech API here. You can learn more about the module in its documentation.
MoviePy is a library that can read and write most common audio and video formats, including GIFs.
Now for the most important part: the code. Open your editor and start by importing the libraries.
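A minimal sketch of the import step. The import names are speech_recognition (for the SpeechRecognition package) and moviepy. MoviePy 1.x exposes VideoFileClip through moviepy.editor, while MoviePy 2.x removed that module, so the sketch tries both. The missing_packages helper and the REQUIRED table are hypothetical names added here to give a clear message when a package is not installed.

```python
import importlib.util

# Import name -> name to pass to "pip install".
REQUIRED = {"speech_recognition": "SpeechRecognition", "moviepy": "moviepy"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that are not installed."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    import speech_recognition as sr

    try:
        from moviepy import VideoFileClip  # MoviePy 2.x
    except ImportError:
        from moviepy.editor import VideoFileClip  # MoviePy 1.x
```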
Now, we will convert the video into an audio file. Many video formats are available, such as MP4, 3GP, OGG, and WMV. Common audio formats include MP3, AAC, WMA, and AC3 (Dolby Digital). We need to know our video's format to do the conversion without any problems.
Now, start the conversion using the MoviePy library.
I recommend converting it to WAV format. It works well with the speech recognition library covered in the next step, whose AudioFile reader accepts WAV, AIFF, and FLAC files but not MP3.
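The conversion could look roughly like this. It is a sketch, not a definitive implementation: extract_audio and wav_path_for are hypothetical helper names, and the WAV file is simply written next to the video. The MoviePy import sits inside the function so that the 1.x and 2.x import paths are handled in one place.

```python
from pathlib import Path


def wav_path_for(video_path):
    """Return the .wav path that sits next to the given video file."""
    return str(Path(video_path).with_suffix(".wav"))


def extract_audio(video_path, wav_path=None):
    """Write the video's audio track to a WAV file and return its path."""
    try:
        from moviepy import VideoFileClip  # MoviePy 2.x
    except ImportError:
        from moviepy.editor import VideoFileClip  # MoviePy 1.x

    wav_path = wav_path or wav_path_for(video_path)
    clip = VideoFileClip(video_path)
    try:
        if clip.audio is None:
            raise ValueError(f"{video_path} has no audio track")
        # The .wav extension tells MoviePy to write uncompressed PCM audio.
        clip.audio.write_audiofile(wav_path)
    finally:
        clip.close()  # release the ffmpeg reader even if writing fails
    return wav_path
```

For a video named talk.mp4 (a placeholder file name), extract_audio("talk.mp4") would write talk.wav and return that path.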
In this step, we convert the audio into text using the “recognize_google” method. Finally, we save the text to a file, recognized.txt.
If you get an error such as broken pipe, reduce the duration of the audio file, for example by trimming the clip with “.subclip(starttime, endtime)” before writing the audio. If you run into any other error, let me know in the comments section.
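Since shorter audio avoids the broken pipe error, another option is to recognize the WAV file in pieces instead of discarding part of it. The sketch below is one way to do that under a few assumptions: the 60-second chunk length is an arbitrary choice, wav_duration, chunk_spans, and transcribe_wav are hypothetical helper names, and the file is a plain PCM WAV that Python's standard wave module can read. Chunks with no intelligible speech are skipped, while network or API failures (sr.RequestError) are left to propagate.

```python
import wave


def wav_duration(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def chunk_spans(total_seconds, chunk_seconds=60.0):
    """Split a duration into consecutive (offset, length) pairs."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    spans = []
    start = 0.0
    while start < total_seconds:
        spans.append((start, min(chunk_seconds, total_seconds - start)))
        start += chunk_seconds
    return spans


def transcribe_wav(wav_path, chunk_seconds=60.0, language="en-US"):
    """Recognize a WAV file chunk by chunk and return the joined text."""
    import speech_recognition as sr  # needs the SpeechRecognition package

    recognizer = sr.Recognizer()
    pieces = []
    for offset, length in chunk_spans(wav_duration(wav_path), chunk_seconds):
        # Reopen the file for each chunk so offset counts from the start.
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source, offset=offset, duration=length)
        try:
            pieces.append(recognizer.recognize_google(audio, language=language))
        except sr.UnknownValueError:
            continue  # nothing intelligible in this chunk
    return " ".join(pieces)
```

A word that straddles a chunk boundary may be cut in two, so the chunk length trades request size against accuracy at the seams.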
We did it! We finally have our text. We created a program that converts a video into an audio file, extracts the speech from that audio, and exports the recognized speech to a text document.
Here is the complete code:
I hope you enjoyed reading this post and working on the project.
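A sketch of how the steps fit together in one script. Everything except recognize_google and recognized.txt is an assumption made for illustration: the function names video_to_text and save_transcript, the intermediate file name converted.wav, and the command-line usage. The optional start and end arguments trim the clip first; the trimming method is subclip in MoviePy 1.x and subclipped in MoviePy 2.x, so the sketch looks up whichever exists.

```python
import sys
from pathlib import Path


def video_to_text(video_path, wav_path="converted.wav", start=None, end=None):
    """Extract the audio from a video and return the recognized speech."""
    import speech_recognition as sr

    try:
        from moviepy import VideoFileClip  # MoviePy 2.x
    except ImportError:
        from moviepy.editor import VideoFileClip  # MoviePy 1.x

    # Step 1: video -> WAV, optionally trimmed to [start, end] seconds.
    clip = VideoFileClip(video_path)
    try:
        part = clip
        if start is not None and end is not None:
            trim = getattr(clip, "subclipped", None) or clip.subclip
            part = trim(start, end)
        if part.audio is None:
            raise ValueError(f"{video_path} has no audio track")
        part.audio.write_audiofile(wav_path)
    finally:
        clip.close()

    # Step 2: WAV -> text with the Google Speech Recognition API.
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file
    return recognizer.recognize_google(audio)


def save_transcript(text, out_path="recognized.txt"):
    """Step 3: write the recognized text to a file and return its path."""
    Path(out_path).write_text(text + "\n", encoding="utf-8")
    return out_path


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print("Saved", save_transcript(video_to_text(sys.argv[1])))
    else:
        print("usage: python video_to_text.py <video file>")
```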
I hope you have learned something new today. Working on hands-on programming projects like this one is the best way to sharpen your coding skills.
Link to the complete project on GitHub.
Thanks for reading!