Speech Recognition in Python Using Google Speech API

Speech recognition is an important feature in several applications, such as home automation and artificial intelligence. This article aims to provide an introduction to using the SpeechRecognition library in Python. This is useful because the library can be used on single-board computers such as the Raspberry Pi with the help of an external microphone.

Required settings

The following must be installed:

  1. Python Speech Recognition Module:
     sudo pip install SpeechRecognition 
  2. PyAudio: Linux users can install it with the following command
     sudo apt-get install python-pyaudio python3-pyaudio 

    If the versions in the repositories are too old, install pyaudio with the following command

     sudo apt-get install portaudio19-dev python-all-dev python3-all-dev && sudo pip install pyaudio

    Use pip3 instead of pip for python3. 
    Windows users can install pyaudio by executing the following command in the terminal

     pip install pyaudio 
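
Once the packages are installed, a quick way to confirm that Python can see them is to try importing each one. The following is a minimal sketch: the helper name check_modules is hypothetical and not part of either library, and speech_recognition and pyaudio are the import names that these packages provide.

```python
import importlib


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # Report which of the two required modules are available.
    for name, ok in check_modules(["speech_recognition", "pyaudio"]).items():
        print("{0}: {1}".format(name, "installed" if ok else "missing"))
```

If either module is reported as missing, repeat the matching installation step above, using pip3 if the program runs under python3.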

Speech input using microphone and speech-to-text translation

  1. Configure the microphone (for external microphones): It is advisable to specify the microphone in the program to avoid glitches.
    Type lsusb in the terminal. A list of connected devices appears. The microphone name will look like this
     USB Device 0x46d:0x825: Audio (hw:1, 0)

    Make a note of this name, as it will be used in the program.

  2. Set the chunk size: this specifies how many bytes of data we want to read at a time. Typically, this value is a power of 2, such as 1024 or 2048.
  3. Set the sampling rate: the sampling rate determines how often values are recorded for processing.
  4. Set the device ID of the selected microphone: in this step we specify the device ID of the microphone we want to use, to avoid ambiguity when several microphones are connected. This also helps with debugging, because when the program runs we learn whether the specified microphone was recognized. In the program, we pass device_id as the device_index parameter. If the microphone is not recognized, the program reports that device_id is not defined.
  5. Adjust for ambient noise: because the surrounding noise varies, we must give the program a second or two to adjust the energy threshold of the recording to the external noise level.
  6. Speech-to-text translation: this is done using Google Speech Recognition, which requires an active internet connection. There are offline recognition systems such as PocketSphinx, but they have a rigorous installation process that requires several dependencies. Google Speech Recognition is one of the easiest to use.
  7. The steps above are implemented below:

    # Python program for speech recognition (works on Python 2.x and 3.x)

    import speech_recognition as sr

    # Enter the name of the USB microphone that you found using lsusb.
    # The following name is used as an example only.
    mic_name = "USB Device 0x46d:0x825: Audio (hw:1, 0)"

    # Sample rate is how often values are recorded, in samples per second.
    sample_rate = 48000

    # Chunk is like a buffer. It stores 2048 samples (bytes of data) here.
    # It is advisable to use powers of 2 such as 1024 or 2048.
    chunk_size = 2048

    # Initialize the recognizer
    r = sr.Recognizer()

    # Generate a list of all audio cards/microphones
    mic_list = sr.Microphone.list_microphone_names()

    # The following loop sets the device ID of the microphone that
    # we specifically want to use, to avoid ambiguity.
    for i, microphone_name in enumerate(mic_list):
        if microphone_name == mic_name:
            device_id = i

    # Use the microphone as the input source. We also specify the device ID
    # to look for; if the microphone was not found, the program stops with
    # an error saying that device_id is not defined.
    with sr.Microphone(device_index=device_id, sample_rate=sample_rate,
                       chunk_size=chunk_size) as source:
        # Wait a second to let the recognizer adjust the energy
        # threshold based on the surrounding noise level
        r.adjust_for_ambient_noise(source)
        print("Say Something")
        # Listen for the user's input
        audio = r.listen(source)

        try:
            text = r.recognize_google(audio)
            print("you said: " + text)

        # This error occurs when Google could not understand what was said
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")

        except sr.RequestError as e:
            print("Could not request results from Google "
                  "Speech Recognition service; {0}".format(e))
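
The device lookup in the program above needs an exact name match and leaves device_id undefined when the microphone is missing. As one possible alternative, the sketch below uses a hypothetical helper, find_device_index, that matches on part of the name and returns None when nothing matches. The device names in the example list are made up for illustration.

```python
def find_device_index(mic_names, wanted):
    """Return the index of the first name containing `wanted`, or None."""
    wanted = wanted.strip().lower()
    for index, name in enumerate(mic_names):
        if wanted in name.lower():
            return index
    return None


# Example with a made-up device list; in the real program the list would
# come from sr.Microphone.list_microphone_names().
names = ["HDA Intel PCH: ALC3227 Analog (hw:0, 0)",
         "USB Device 0x46d:0x825: Audio (hw:1, 0)"]
print(find_device_index(names, "USB Device"))  # prints 1
print(find_device_index(names, "Bluetooth"))   # prints None
```

The result could then be passed as device_index to sr.Microphone. This relies on the library using the default microphone when device_index is None, so a missing device falls back to the default instead of stopping the program.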

Transcribing an audio file to text

If we have an audio file that we want to convert to text, we simply replace the microphone with the audio file as the source.
For convenience, place the audio file and the program in the same folder. This works for WAV, AIFF, and FLAC files.
The implementation is shown below:

    # Python program to transcribe an audio file (works on Python 2.x and 3.x)

    import speech_recognition as sr

    AUDIO_FILE = "example.wav"

    # Use the audio file as the audio source
    r = sr.Recognizer()

    with sr.AudioFile(AUDIO_FILE) as source:
        # Read the entire audio file. Here we use record() instead of listen()
        audio = r.record(source)

    try:
        print("The audio file contains: " + r.recognize_google(audio))

    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")

    except sr.RequestError as e:
        print("Could not request results from Google Speech "
              "Recognition service; {0}".format(e))
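
Before sending a file to the recognizer, it can help to confirm that it is a readable WAV file and to check its length. The sketch below uses only Python's standard wave module. The helper names wav_info and write_silence and the file name silence.wav are hypothetical, and the one-second silent file is generated only so that the example is self-contained.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate, duration_seconds) for a WAV file."""
    wav = wave.open(path, "rb")
    try:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / float(rate)
    finally:
        wav.close()


def write_silence(path, seconds=1, rate=16000):
    """Write a mono, 16-bit WAV file containing only silence."""
    wav = wave.open(path, "wb")
    try:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * seconds))
    finally:
        wav.close()


write_silence("silence.wav")
print(wav_info("silence.wav"))  # prints (1, 16000, 1.0)
```

The same wav_info call could be pointed at example.wav from the program above. It applies to WAV files only, since the wave module does not read AIFF or FLAC.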

Troubleshooting

The following problems are commonly encountered

  1. Muted microphone: This leads to input not being received. To check for this, you can use alsamixer.
    It can be installed with
     sudo apt-get install libasound2 alsa-utils alsa-oss

    Type amixer. The output will look something like this

     Simple mixer control 'Master',0
       Capabilities: pvolume pswitch pswitch-joined
       Playback channels: Front Left - Front Right
       Limits: Playback 0 - 65536
       Mono:
       Front Left: Playback 41855 [64%] [on]
       Front Right: Playback 65536 [100%] [on]
     Simple mixer control 'Capture',0
       Capabilities: cvolume cswitch cswitch-joined
       Capture channels: Front Left - Front Right
       Limits: Capture 0 - 65536
       Front Left: Capture 0 [0%] [off] #switched off
       Front Right: Capture 0 [0%] [off]

    As you can see, the capture device is currently switched off. To switch it on, type alsamixer
    As the first image shows, it displays our playback devices. Press F4 to toggle to the capture devices.

    In the second image, the highlighted portion shows that the capture device is muted. Press the spacebar to unmute it.

    As seen in the last image, the highlighted portion confirms that the capture device is no longer muted.

  2. The current microphone is not selected as the capture device:
    In this case, the microphone can be set by typing alsamixer and selecting the sound card. Here you can choose the default microphone device.
    As shown in the figure, the highlighted portion is where you select the sound card.

    The second image shows the sound card selection screen.

  3. No internet connection: An active internet connection is required to convert speech to text.
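
The muted-microphone check from the first problem can also be scripted. The sketch below scans text in the format of the amixer output shown earlier and reports whether every channel of the Capture control is switched off. The function name capture_is_muted is hypothetical, and the parsing assumes the "Simple mixer control" layout. Other sound setups may print different control names.

```python
import re


def capture_is_muted(amixer_output, control="Capture"):
    """Return True if all channels of `control` are [off], False if any is [on].

    Returns None when the control does not appear in the output.
    """
    in_control = False
    states = []
    for line in amixer_output.splitlines():
        # A header line starts a new control section.
        header = re.match(r"Simple mixer control ['`]([^'`]+)['`]", line.strip())
        if header:
            in_control = header.group(1) == control
            continue
        if in_control:
            states.extend(re.findall(r"\[(on|off)\]", line))
    if not states:
        return None
    return all(state == "off" for state in states)


sample = """Simple mixer control 'Master',0
  Front Left: Playback 41855 [64%] [on]
  Front Right: Playback 65536 [100%] [on]
Simple mixer control 'Capture',0
  Front Left: Capture 0 [0%] [off]
  Front Right: Capture 0 [0%] [off]
"""
print(capture_is_muted(sample))  # prints True
```

In practice the text would come from running amixer, for example through the subprocess module, rather than from a hard-coded string.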

This article is provided by Deepak Srivatsav.