Personal voice assistant in Python



A voice assistant can open an application (if it is installed on the system), search for a query on Google, Wikipedia, or YouTube, evaluate a mathematical expression, and so on, simply in response to a spoken command. We can process the input as needed and add functionality depending on how we write the code.

We use the Google Speech Recognition API and Google Text to Speech for voice input and output, respectively.
The WolframAlpha API is used to evaluate mathematical expressions.
The playsound package is used to play the saved mp3 file from the system.

External Python package requirements:

-> gTTS - Google Text To Speech, for converting the given text to speech
-> speech_recognition - for recognizing the voice command and converting it to text
-> selenium - for web-based work in the browser
-> wolframalpha - for the calculations requested by the user
-> playsound - for playing the saved audio file
-> pyaudio - for microphone (audio) input in Python
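
Before running the assistant, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of the original code: the helper name missing_modules is hypothetical, and the module names are the assumed import names (the pip name can differ, for example speech_recognition is installed as SpeechRecognition).

```python
import importlib.util

# Assumed import names for the packages listed above.
REQUIRED_MODULES = [
    "gtts",
    "speech_recognition",
    "selenium",
    "wolframalpha",
    "playsound",
    "pyaudio",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found on this system."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

The check only looks the modules up; it does not import them, so it runs quickly and has no side effects.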

Well, let's start with the code. We'll show each function separately for easier understanding.

Here are the main functions, get_audio() and assistant_speaks(). get_audio() receives audio from the user through a microphone; the phrase time limit is set to 5 seconds (you can change it). assistant_speaks() speaks the output produced from the processed data.

# import the speech recognition package (it calls the Google API)
import speech_recognition as sr
import playsound  # play the saved mp3 file
from gtts import gTTS  # Google Text to Speech
import os  # save / remove files
import wolframalpha  # evaluate mathematical expressions
from selenium import webdriver  # control browser operations

num = 1


def assistant_speaks(output):
    global num

    # num gives each audio file a different name
    # so that the files cannot be confused
    num += 1
    print("PerSon:", output)

    toSpeak = gTTS(text=output, lang='en', slow=False)

    # save the audio file produced by Google Text to Speech
    file = str(num) + ".mp3"
    toSpeak.save(file)

    # the playsound package plays the saved file (blocking call)
    playsound.playsound(file, True)
    os.remove(file)


def get_audio():
    rObject = sr.Recognizer()
    audio = ''

    with sr.Microphone() as source:
        print("Speak...")

        # record audio using speech recognition
        audio = rObject.listen(source, phrase_time_limit=5)
    print("Stop.")  # 5 second limit

    try:
        text = rObject.recognize_google(audio, language='en-US')
        print("You:", text)
        return text

    except (sr.UnknownValueError, sr.RequestError):
        # speech was unintelligible or the API could not be reached
        assistant_speaks("Could not understand your audio, please try again!")
        return 0
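
The counter num above exists only to give every audio file a distinct name. As an alternative, here is a minimal sketch that lets the standard library pick the unique name; the helper unique_mp3_path is hypothetical and is not part of the original code.

```python
import os
import tempfile


def unique_mp3_path(directory=None):
    """Create an empty, uniquely named .mp3 file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".mp3", dir=directory)
    os.close(fd)  # only the name is needed; the caller writes the audio later
    return path


if __name__ == "__main__":
    first = unique_mp3_path()
    second = unique_mp3_path()
    print(first != second)  # the two paths never collide
    os.remove(first)
    os.remove(second)
```

The returned path could then be passed to toSpeak.save() and removed after playback, in the same way the numbered file is handled above.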

  

 
Driver code

if __name__ == "__main__":
    assistant_speaks("What's your name, Human?")

    # fall back to 'Human' if the name could not be recognized
    name = get_audio() or 'Human'
    assistant_speaks("Hello, " + name + '.')

    while True:
        assistant_speaks("What can I do for you?")
        text = get_audio()

        # get_audio() returns 0 when recognition fails
        if text == 0:
            continue
        text = text.lower()

        if "exit" in text or "bye" in text or "sleep" in text:
            assistant_speaks("Ok bye, " + name + '.')
            break

        # call process_text to process the request
        process_text(text)
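
The loop above has to deal with two special cases before any processing happens: a failed recognition (get_audio() returns 0) and an exit phrase. A minimal sketch of isolating both checks into small functions follows; the names normalize_command, is_exit_command and EXIT_WORDS are hypothetical and are used only for illustration.

```python
EXIT_WORDS = ("exit", "bye", "sleep")


def normalize_command(raw):
    """Return the lowercased command, or None if nothing usable was heard.

    get_audio() in the article returns 0 on failure, so any value that is
    not a non-empty string is treated as "nothing heard".
    """
    if not isinstance(raw, str) or not raw.strip():
        return None
    return raw.strip().lower()


def is_exit_command(command):
    """True if any exit word occurs in the command (substring match)."""
    return any(word in command for word in EXIT_WORDS)


if __name__ == "__main__":
    for raw in (0, "  Open Firefox ", "Ok bye now"):
        command = normalize_command(raw)
        if command is None:
            print("nothing heard")
        elif is_exit_command(command):
            print("exit requested")
        else:
            print("process:", command)
```

Keeping these checks as plain functions makes them easy to try out without a microphone.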

So, we now have an idea of how to make the device speak and how to take input from the user. The next and main step is deciding how to process that input. This is just basic code; many other techniques (such as NLP) can be used to process the text more appropriately. Here we keep it static, with fixed keyword matching.

The WolframAlpha API is also used for the computational part.

def process_text(input):
    try:
        if 'search' in input or 'play' in input:
            # basic web automation using selenium
            search_web(input)
            return

        elif "who are you" in input or "define yourself" in input:
            speak = '''Hi, I'm Human, your personal assistant.
            I'm here to make your life easier. You can order me to perform
            various tasks like calculating sums or opening apps, etc.'''
            assistant_speaks(speak)
            return

        elif "who made you" in input or "created you" in input:
            speak = "I have been created by Sheetansh Kumar."
            assistant_speaks(speak)
            return

        elif "pythonengineering" in input:  # matches the site name
            speak = """Geeks for Geeks is the best online learning platform"""
            assistant_speaks(speak)
            return

        elif "calculate" in input.lower():
            # write your WolframAlpha app_id here
            app_id = "WOLFRAMALPHA_APP_ID"
            client = wolframalpha.Client(app_id)

            # everything after the word "calculate" is the query
            indx = input.lower().split().index('calculate')
            query = input.split()[indx + 1:]
            res = client.query(' '.join(query))
            answer = next(res.results).text
            assistant_speaks("The answer is " + answer)
            return

        elif 'open' in input:
            # a separate function opens
            # an application available on the system
            open_application(input.lower())
            return

        else:
            assistant_speaks("I can search the web for you. Do you want to continue?")
            ans = get_audio()
            if 'yes' in str(ans) or 'yeah' in str(ans):
                search_web(input)
            else:
                return

    except Exception:
        assistant_speaks("I don't understand. I can search the web for you. Do you want to continue?")
        ans = get_audio()
        if 'yes' in str(ans) or 'yeah' in str(ans):
            search_web(input)
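
Several branches above use the same idiom: find a keyword in the command and treat the words after it as the query. A minimal sketch of that idiom as a standalone function is shown below; the name words_after is hypothetical, and unlike list.index() it returns an empty list when the keyword is absent instead of raising ValueError.

```python
def words_after(keyword, text):
    """Return the words that follow the first occurrence of keyword.

    The match is case-insensitive and on whole words. The original
    casing of the returned words is preserved.
    """
    words = text.split()
    lowered = [w.lower() for w in words]
    key = keyword.lower()
    if key not in lowered:
        return []
    return words[lowered.index(key) + 1:]


if __name__ == "__main__":
    print(words_after("calculate", "Please calculate 2 plus 2"))
    print(words_after("youtube", "open firefox"))
```

Because the keyword is matched against whole words, a command such as "recalculate this" would not count as containing "calculate"; whether that is desirable depends on the commands you expect.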

Now that we have processed the input, it's time to act!

This takes two functions: search_web and open_application.

search_web is just a simple browser automation function built on the selenium package. It can search Google and Wikipedia and can open YouTube. You just need to include the site's name in the command and the result will open in the Firefox browser. For other browsers, you need to install the corresponding browser driver for Selenium. Here we are using the web driver for Firefox.

open_application is just a function that uses the os package to open an application present on the system.

def search_web(input):
    driver = webdriver.Firefox()
    driver.implicitly_wait(1)
    driver.maximize_window()

    if 'youtube' in input.lower():
        assistant_speaks("Opening in YouTube")
        indx = input.lower().split().index('youtube')
        query = input.split()[indx + 1:]
        driver.get("http://www.youtube.com/results?search_query=" + '+'.join(query))
        return

    elif 'wikipedia' in input.lower():
        assistant_speaks("Opening Wikipedia")
        indx = input.lower().split().index('wikipedia')
        query = input.split()[indx + 1:]
        driver.get("https://en.wikipedia.org/wiki/" + '_'.join(query))
        return

    else:
        if 'google' in input:
            indx = input.lower().split().index('google')
            query = input.split()[indx + 1:]
            driver.get("https://www.google.com/search?q=" + '+'.join(query))

        elif 'search' in input:
            # use the words that follow "search" as the query
            indx = input.lower().split().index('search')
            query = input.split()[indx + 1:]
            driver.get("https://www.google.com/search?q=" + '+'.join(query))

        else:
            # no keyword found: search Google for the whole command
            driver.get("https://www.google.com/search?q=" + '+'.join(input.split()))
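
Joining the words with '+' works for simple queries, but characters such as '+', '&' or '#' inside the query would change the meaning of the URL. A minimal sketch of building the same URLs with proper escaping from the standard library follows; the function build_url is hypothetical and is not part of the original code.

```python
from urllib.parse import quote, quote_plus


def build_url(site, query_words):
    """Build a search or lookup URL for the given site from a list of words."""
    if site == "youtube":
        return ("https://www.youtube.com/results?search_query="
                + quote_plus(" ".join(query_words)))
    if site == "wikipedia":
        # Wikipedia article paths use underscores between words
        return "https://en.wikipedia.org/wiki/" + quote("_".join(query_words))
    # default: a Google search
    return "https://www.google.com/search?q=" + quote_plus(" ".join(query_words))


if __name__ == "__main__":
    print(build_url("google", ["C++", "tutorial"]))
    print(build_url("wikipedia", ["Alan", "Turing"]))
```

The resulting string could be passed to driver.get() in place of the hand-joined URL.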
