IBM Bluemix Speech to Text Transcription in Python – Tutorial

  • Post last modified:February 28, 2018

Speech recognition and sentiment analysis are important applications of machine learning. In this tutorial, we will create a speech-to-text transcription file with IBM Bluemix in Python and copy that file to the Hadoop ecosystem for further analysis. Once the data is in HDFS, you can work on it with Hadoop tools to get the desired results.

This post walks you through creating a speech-to-text transcription file using IBM Bluemix and copying that file to Hadoop HDFS.

IBM Bluemix Speech to Text Transcription in Python – Steps

Below are the steps required to create a speech-to-text transcription file.

  • Create account with IBM Bluemix or log in with IBM ID
  • Get API credentials
  • Create Speech to Text Transcription file
  • Copy Transcription file to Hadoop HDFS

Create account with IBM Bluemix

To use IBM Watson on Bluemix, you need to create the service and its credentials on IBM Bluemix. Log in with your IBM account, or create one if you do not have one.


Get API credentials

Next, create a Speech to Text service on Bluemix. Under services and APIs, choose the Speech to Text service and create service credentials for it. You can then copy the credentials from the Service credentials section on the left side of the Bluemix dashboard.
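
Rather than pasting the credentials directly into a script, one option is to save them as a JSON file and load them at run time. The sketch below assumes the credentials JSON has top-level url, username and password keys; the function name load_credentials and the file name in the usage note are hypothetical, so adjust them to match what your dashboard actually shows.

```python
import json


def load_credentials(path):
    """Read a service-credentials JSON file and return (url, username, password).

    Assumes the file has top-level "url", "username" and "password" keys;
    adjust the key names if your credentials look different.
    """
    with open(path) as f:
        creds = json.load(f)
    # Report every missing field at once instead of failing on the first one.
    missing = [k for k in ("url", "username", "password") if k not in creds]
    if missing:
        raise KeyError("missing credential fields: " + ", ".join(missing))
    return creds["url"], creds["username"], creds["password"]
```

A call such as url, username, password = load_credentials('credentials.json') would then replace the hard-coded values. The url field is typically the service base URL, so the /v1/recognize path may still need to be appended to it.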


Create Speech to Text Transcription file

Next, write a Python script that sends the audio (.wav) file to the IBM Bluemix Speech to Text API and saves the transcription. The script calls the API with the credentials created in the previous step.

Below is a sample script that converts an audio file to text and writes the API's JSON response to a file:

import subprocess

import requests

# IBM Bluemix Speech to Text API URL
url = 'https://stream.watsonplatform.net/speech-to-text/api/v1/recognize'

# Bluemix authentication username
username = '<username>'

# Bluemix authentication password
password = '<password>'

headers = {'Content-Type': 'audio/wav'}

# Input audio file (.wav) and output JSON file
audio_path = '/home/vithal/Documents/speeh-to-text/myfamily.wav'
json_path = '/home/vithal/Documents/speeh-to-text/sample.json'

# Open the audio file in binary mode and send it to the API
with open(audio_path, 'rb') as audio:
    r = requests.post(url, data=audio, headers=headers, auth=(username, password))

# Stop here if the API returned an HTTP error
r.raise_for_status()

# Write the JSON response to a file (no need to redirect sys.stdout)
with open(json_path, 'w') as f:
    f.write(r.text)
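
The saved JSON holds more than the plain text, so it can help to pull out just the transcript before further analysis. The following is a minimal sketch, assuming the response has a results list whose entries each carry an alternatives list with a transcript field; the function name extract_transcript and the sample string are illustrative and not real API output.

```python
import json


def extract_transcript(response_text):
    """Join the first transcript of every result in a recognize response."""
    data = json.loads(response_text)
    pieces = []
    for result in data.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is assumed to be the most likely one.
            pieces.append(alternatives[0].get("transcript", "").strip())
    return " ".join(p for p in pieces if p)


# Hand-written sample in the assumed response shape (not real API output).
sample = (
    '{"results": [{"alternatives": [{"transcript": "hello world ", '
    '"confidence": 0.9}], "final": true}], "result_index": 0}'
)
print(extract_transcript(sample))  # prints: hello world
```

In the script above, the same function could be called as extract_transcript(r.text) after the request succeeds.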

Copy Transcription file to Hadoop HDFS

The final step is to copy the JSON file created in the previous step to HDFS. To do that, run the HDFS commands from Python with the help of the subprocess module. Below is a sample script that can be used.

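# Local JSON file and target HDFS directory
local_file = '/home/vithal/Documents/speeh-to-text/sample.json'
tgt_hdfs_dir = '/warehouse/speech-to-text/'

# Copy file to Hadoop HDFS ("hadoop dfs" is deprecated in favor of "hdfs dfs")
subprocess.call(["hdfs", "dfs", "-copyFromLocal", local_file, tgt_hdfs_dir])
# Display content of file
subprocess.call(["hdfs", "dfs", "-cat", "/warehouse/speech-to-text/sample.json"])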
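
subprocess.call only returns the exit code, so a failed copy can go unnoticed. As a hedged sketch, the copy step could be wrapped in a small function that raises on failure. The name copy_to_hdfs and its runner parameter are hypothetical; the function uses hdfs dfs, the current form of the file system shell command, and assumes the hdfs executable is on the PATH.

```python
import subprocess


def copy_to_hdfs(local_file, tgt_hdfs_dir, runner=subprocess.run):
    """Copy a local file into an HDFS directory and fail loudly on error.

    `runner` is injectable only so the function can be exercised without
    a Hadoop installation; by default it is subprocess.run.
    """
    cmd = ["hdfs", "dfs", "-copyFromLocal", local_file, tgt_hdfs_dir]
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    runner(cmd, check=True)
    return cmd
```

Calling copy_to_hdfs(local_file, tgt_hdfs_dir) with the paths from this tutorial would then stop the script with an exception if the copy fails, for example when the target file already exists in HDFS.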

This Post Has 2 Comments

  1. Hawary

    Great tutorial, thank you for your effort:)

    I followed the tutorial step by step, but I am facing an SSL error with your code:

    requests.exceptions.SSLError: ("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",)

    Any idea where the problem is?

    1. Vithal S

      Hi,

      Thank you.

      I am not sure about the error, but please check the API credentials that you got from Bluemix. If you are trying it from your office network, try it from outside your office's private network.

      Let me know how it goes.

      Thanks
