Speech recognition and sentiment analysis are important parts of machine learning. In this tutorial, we will create an IBM Bluemix Speech to Text transcription file in Python and copy that file to the Hadoop ecosystem for further analysis. Once the data is in HDFS, you can analyze it to get the desired results.
This post walks you through creating a speech to text transcription file using IBM Bluemix and copying that file to Hadoop HDFS.
IBM Bluemix Speech to Text Transcription in Python – Steps
Below are the steps required to create a speech to text transcription file:
- Create an account with IBM Bluemix or log in with an IBM ID
- Get API credentials
- Create Speech to Text Transcription file
- Copy Transcription file to Hadoop HDFS
Create account with IBM Bluemix
To use IBM Watson on Bluemix, you need to create the services and credentials on IBM Bluemix. Log in with your IBM account, or create one if you don't have one.
Get API credentials
Next, you need to create a Speech to Text service on Bluemix. Choose services and API, choose the Speech to Text service, and create API service credentials. You can then find those credentials under service credentials on the left side of the Bluemix dashboard.
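Rather than pasting the username and password directly into a script, you may prefer to read them from environment variables. The snippet below is a minimal sketch of that idea; the variable names STT_USERNAME and STT_PASSWORD and the helper load_credentials are hypothetical names chosen for illustration, not something Bluemix defines.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from a mapping of environment variables."""
    # STT_USERNAME and STT_PASSWORD are hypothetical names used for this sketch.
    try:
        return env["STT_USERNAME"], env["STT_PASSWORD"]
    except KeyError as missing:
        # Fail early with a readable message instead of sending empty credentials.
        raise RuntimeError(f"Missing environment variable: {missing}") from None
```

With those two variables exported in your shell, `username, password = load_credentials()` returns the values to pass to the API call.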
Create Speech to Text Transcription file
Next, you need to create a Python script that converts an audio wave file to transcription text. To do that, call the IBM Bluemix API with the credentials that you created in the previous step.
Below is a sample script that can be used to convert an audio file to text:
import requests
import subprocess

# IBM Bluemix Speech to Text API URL
url = 'https://stream.watsonplatform.net/speech-to-text/api/v1/recognize'
# Bluemix authentication username and password
username = '<username>'
password = '<password>'
headers = {'Content-Type': 'audio/wav'}

# Open the audio file (.wav) in binary mode and send it to the API
with open('/home/vithal/Documents/speeh-to-text/myfamily.wav', 'rb') as audio:
    r = requests.post(url, data=audio, headers=headers, auth=(username, password))

# Write the JSON response to a file
with open('/home/vithal/Documents/speeh-to-text/sample.json', 'w') as f:
    f.write(r.text)
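The saved response is JSON, so you may want to pull out just the transcript text before further analysis. The sketch below assumes the response has a "results" list whose entries each carry an "alternatives" list with a "transcript" field; that layout and the helper name extract_transcript are assumptions for illustration, so check them against the response you actually receive.

```python
import json


def extract_transcript(response_text):
    """Join the first alternative of every result into one transcript string."""
    payload = json.loads(response_text)
    pieces = []
    for result in payload.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # Assumption: the first alternative is the preferred one.
            pieces.append(alternatives[0].get("transcript", "").strip())
    return " ".join(piece for piece in pieces if piece)


# Hand-written sample in the assumed layout, not real API output.
sample = (
    '{"results": [{"alternatives": [{"transcript": "hello world ", '
    '"confidence": 0.93}], "final": true}], "result_index": 0}'
)
print(extract_transcript(sample))
```

In the script above, the same function could be called as `extract_transcript(r.text)`.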
Copy Transcription file to Hadoop HDFS
The final step is to copy the JSON file created in the steps above to HDFS. To do that, execute the HDFS commands with the help of subprocess in Python. Below are sample scripts that can be used.
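# Copy file to Hadoop HDFS (paths taken from the examples in this post)
local_file = '/home/vithal/Documents/speeh-to-text/sample.json'
tgt_hdfs_dir = '/warehouse/speech-to-text/'
subprocess.call(["hadoop", "fs", "-copyFromLocal", local_file, tgt_hdfs_dir])

# Display content of file
subprocess.call(["hadoop", "fs", "-cat", "/warehouse/speech-to-text/sample.json"])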
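subprocess.call returns the exit status without raising, so a failed copy can go unnoticed. The sketch below shows one way to make failures visible, using subprocess.run with check=True; the helper names build_copy_command and copy_to_hdfs are hypothetical, and the sketch uses the `hadoop fs` form of the command.

```python
import subprocess


def build_copy_command(local_file, hdfs_dir):
    """Return the argument list that copies a local file into an HDFS directory."""
    return ["hadoop", "fs", "-copyFromLocal", local_file, hdfs_dir]


def copy_to_hdfs(local_file, hdfs_dir):
    """Run the copy; raises subprocess.CalledProcessError on a non-zero exit."""
    # check=True turns a failed hadoop command into a Python exception.
    subprocess.run(build_copy_command(local_file, hdfs_dir), check=True)
```

Calling `copy_to_hdfs(local_file, tgt_hdfs_dir)` on a machine with the Hadoop client installed either completes the copy or raises an error that your script can catch and log.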
Great tutorial, thank you for your effort:)
I followed the tutorial step by step, but I am facing an SSL error with your code:
requests.exceptions.SSLError: ("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",)
Any idea where the problem is?
Hi,
Thank you.
Not sure about the error, but please check the API details that you got from Bluemix. If you are trying it from your office network, try it from outside your office's private network.
Let me know how it goes.
Thanks