
Our First Google Home Development Project, Part Two - Let's Get Technical

 

Welcome back!  If you missed the first post on our initial Google Home Development Project, you can find it here.  In Part Two, I will be diving deeper into the features of API.AI and the gotchas we ran into while developing the app. 

With any voice service, the first place to start is a flowchart of your dialog.  You can use any of the flowchart services out there to design this.  Your flowchart can get very large very quickly, so aim for the minimum viable product.  Below is an example flowchart we came up with for our quick Welcome Action.

Fresh Lines Welcome Greeter.png

 

Your First Google Home Development Project: API.AI or Actions SDK?

 
When you first start your Google Home Action, you need to decide whether to use the Actions SDK or API.AI.  What we found is that API.AI already has the language processing (variable parsing) built in, which made it a little easier for us.  If you already have a voice service API or want to integrate with a third-party app for language processing, then by all means use the Actions SDK.  Just keep in mind that with the Actions SDK you get a single endpoint and every API call must go through that endpoint.  For complex apps, you’ll need to turn that single endpoint into a router to keep the app well laid out. 
 
The rest of this post assumes you are using API.AI.
 

Intents

 
Using your flowchart, you will need to make an Intent for every step.  An Intent is an object that handles a statement spoken by the user.  It contains the listening text, the context in which the listening text is valid, the parameters, and the response.  For phrases that are unique, just add the phrase, replace any variables with their correct variable names, build the response, and you’re done.  For phrases that require yes/no answers and other single-word answers, you will need to use contexts. 
 

Context

 
Context gives you the ability to handle similar phrases and wildcard phrases.  As an example, suppose we have two questions, “What is your name?” and “What is the name of your brother?”.  Both require a single-word answer.  Here is where context comes in.  For the question “What is your name?”, you add the output context “your-name”.  For the question “What is the name of your brother?”, you add the output context “brothers-name”.  Now you create two separate Intents, both handling a one-word response, but you give one the input context “your-name” and the other the input context “brothers-name”.  A gotcha here is that each output context has a persistence value (its lifespan), which defaults to 5.  This means the context will exist for the next 5 phrases spoken by the user.  This can cause trouble if you ask two questions with similar answers within those 5 phrases.  Change it to 1 to make sure your next Intent is the right one.  
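
The lifespan gotcha is easier to see in a toy model.  The sketch below is a simplified simulation written for this post, not API.AI's actual matching logic: createSession, setContext, and matchIntent are hypothetical names, and it assumes a context's lifespan drops by one with every user phrase.

```javascript
// Toy model of context lifespans. All names here are hypothetical and the
// matching rule is simplified; API.AI's real engine is more involved.
function createSession() {
  return { contexts: {} }; // context name -> remaining lifespan
}

// Set an output context with a given lifespan.
function setContext(session, name, lifespan) {
  session.contexts[name] = lifespan;
}

// Return every intent whose input context is still active, then age all
// contexts by one user phrase.
function matchIntent(session, intents) {
  const matches = intents
    .filter((intent) => session.contexts[intent.inputContext] > 0)
    .map((intent) => intent.name);
  for (const name of Object.keys(session.contexts)) {
    session.contexts[name] -= 1;
    if (session.contexts[name] <= 0) delete session.contexts[name];
  }
  return matches;
}

const intents = [
  { name: 'save-your-name', inputContext: 'your-name' },
  { name: 'save-brothers-name', inputContext: 'brothers-name' },
];

// With a lifespan of 5, both contexts are alive for the second answer.
const sloppy = createSession();
setContext(sloppy, 'your-name', 5);
matchIntent(sloppy, intents); // user answers the first question
setContext(sloppy, 'brothers-name', 5);
const ambiguous = matchIntent(sloppy, intents);

// With a lifespan of 1, only the most recent question's context survives.
const strict = createSession();
setContext(strict, 'your-name', 1);
matchIntent(strict, intents);
setContext(strict, 'brothers-name', 1);
const unambiguous = matchIntent(strict, intents);

console.log(ambiguous);
console.log(unambiguous);
```

In this model the second answer is a candidate for both Intents when the lifespan is 5, and for only the brother Intent when the lifespan is 1.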
 

Parameters

 
Parameters allow you to grab data from the user’s phrases.  This data can be used in WebHooks to return answers from your API.  It can also be used to hold the user's attention by echoing it back, letting them know you are listening.  The best way to obtain a variable from the user’s phrase is to use Template Mode.  In the “User Says” section of API.AI, click on the quote symbol to the left of your phrase.  It will turn into an @ symbol.  Now you are in Template Mode.  From here you can add variables into your phrases.  For example: “My name is @sys.given-name:user_name”.  Here sys.given-name is one of the many entity types that API.AI gives you.  It contains a list of common first names.  You can extend an existing entity type or create your own.  Once you have your phrase, you need to add the variable to the parameters list below it.  For the phrase above, the parameter line would look like the following (in JSON format):

{"required": true, "parameter_name": "user_name", "entity": "@sys.given-name", "value": "$user_name", "is_list": false, "prompts": ["Can you give us your name", "give us your name please", "what was that name again"]}

You can also add parameters that don’t show up in text if you want that variable to be passed to a WebHook or another Intent down the road.
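
To make the Template Mode syntax concrete, here is a small sketch that pulls the @entity:parameter annotations out of a template phrase and builds parameter rows shaped like the one above.  The extractParameters helper and its regular expression are hypothetical and written only for illustration; API.AI does this parsing for you in the console.

```javascript
// Hypothetical helper: find "@entity:parameter_name" annotations in a
// Template Mode phrase. The regex is an assumption based on the syntax
// shown in this post, not API.AI's own grammar.
function extractParameters(template) {
  const annotation = /@([\w.-]+):([\w-]+)/g;
  const rows = [];
  let match;
  while ((match = annotation.exec(template)) !== null) {
    rows.push({
      required: false,
      parameter_name: match[2],
      entity: '@' + match[1],
      value: '$' + match[2],
      is_list: false,
      prompts: [],
    });
  }
  return rows;
}

const rows = extractParameters('My name is @sys.given-name:user_name');
console.log(JSON.stringify(rows[0]));
```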

Example

 
Below is an example of an Intent.  Notice the 4 sections.  At the top are the contexts, and below the contexts are the “User Says” phrases, where you use Template Mode to accept parameters.  Below the phrases are the parameters that you collect and/or create.  Finally, we have the response.  Notice how the variable is used in the response.

api_ai_intent.jpg

WebHooks

 
WebHooks allow you to respond with data from your API.  You enable your WebHook in the Fulfillment tab of API.AI.  There is a single API endpoint.  Following good coding practice, we do not want to crowd a single file with a ton of code.  Instead, we pass a parameter called “webhook_route” and have our endpoint issue a 307 redirect to that route.  Here is a bit of sample code to show what we mean:
var webhook_route = req.body.result.parameters.webhook_route;
var status = 307; // temporary redirect that keeps the POST method and body
var url = 'https://234bcfe3.ngrok.io' + '/' + webhook_route;
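
A slightly fuller sketch of the same routing idea follows.  It assumes the request body has the shape used in the snippet above; buildRedirect, BASE_URL, and the route names are hypothetical.  Because the route name arrives in the request, the sketch only redirects to routes on an allow-list.

```javascript
// Hypothetical values for illustration; replace them with your own tunnel
// URL and route names.
const BASE_URL = 'https://234bcfe3.ngrok.io';
const ALLOWED_ROUTES = ['weather', 'tides'];

// Build the redirect for a webhook request body, or return null when the
// route is missing or not on the allow-list.
function buildRedirect(body) {
  const params = (body && body.result && body.result.parameters) || {};
  const route = params.webhook_route;
  if (typeof route !== 'string' || !ALLOWED_ROUTES.includes(route)) {
    return null;
  }
  // 307 keeps the original method and body, so the POST is replayed as-is.
  return { status: 307, url: BASE_URL + '/' + route };
}

const redirect = buildRedirect({
  result: { parameters: { webhook_route: 'weather' } },
});
console.log(redirect);
```

If your server uses Express, the result could be passed to res.redirect(redirect.status, redirect.url), with an error response when buildRedirect returns null.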

 

Notice that we use ngrok.  API.AI will not work unless you have SSL set up, even for test accounts.  Ngrok allows you to point a public SSL domain to your local web server.  Information can be found at https://ngrok.com/docs#expose.

Below is a sample JSON query sent from API.AI to your endpoint.  To get the parameters from the JSON, look in the contexts array.  The parameters for each context are stored as an array of key/value objects under the parameters key.

{
    "query": [
        "and for tomorrow"
    ],
    "contexts": [{
        "name": "weather",
        "parameters": [{"wind": "15"}, {"waves": "2-3 ft"}],
        "lifespan": 4
    }],
    "location": {
        "latitude": 37.459157,
        "longitude": -122.17926
    },
    "timezone": "America/New_York",
    "lang": "en",
    "sessionId": "1234567890"
}
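
Reading those parameters back out can be sketched as follows.  The getContextParameters helper is hypothetical, and it assumes each entry in a context's parameters array is a small key/value object as described above.  It also accepts a plain object, in case your payload delivers parameters that way.

```javascript
// Hypothetical helper: collect the parameters of one named context.
// Assumes `parameters` is an array of small key/value objects as described
// in this post; a plain object is accepted too.
function getContextParameters(query, contextName) {
  const context = (query.contexts || []).find((c) => c.name === contextName);
  if (!context || !context.parameters) return {};
  const entries = Array.isArray(context.parameters)
    ? context.parameters
    : [context.parameters];
  // Merge every key/value object into a single lookup object.
  return Object.assign({}, ...entries);
}

const sampleQuery = {
  query: ['and for tomorrow'],
  contexts: [
    {
      name: 'weather',
      parameters: [{ wind: '15' }, { waves: '2-3 ft' }],
      lifespan: 4,
    },
  ],
  sessionId: '1234567890',
};

const weather = getContextParameters(sampleQuery, 'weather');
console.log(weather.wind, weather.waves);
```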

 

We would also recommend using the authentication settings for production purposes.  Otherwise, anyone can get your data.  A response from your WebHook API should have the following format:

JSON.stringify({
      "speech": ssml,
      "displayText": display,
      "contexts": [],
      "source": "Weighmastery"
  })
 

The speech variable holds the text you want spoken in reply to the user.  The displayText is the text that gets displayed.  This is useful for testing the app on the command line without using voice.  Contexts is the outbound context set, and the source in our case is our App Name.  You can find all the info about the query and response at https://docs.api.ai/docs/query.
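
As a minimal sketch, the response can be assembled by a small helper.  The buildWebhookResponse name and the sample strings are hypothetical, and the field names simply follow the example in this post, so check them against the API.AI documentation for your version.

```javascript
// Hypothetical helper that assembles the response body shown in this post.
// Field names follow the post's example and are not verified against any
// particular API.AI version.
function buildWebhookResponse(ssml, display, contexts) {
  return JSON.stringify({
    speech: ssml,
    displayText: display,
    contexts: contexts || [],
    source: 'Weighmastery',
  });
}

const body = buildWebhookResponse(
  '<speak>Welcome back.</speak>',
  'Welcome back.'
);
console.log(body);
```

Send the resulting string as the response body with a Content-Type of application/json.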

Speech Synthesis Markup Language (SSML)

 
In the speech variable above, we want to pass our text using the Speech Synthesis Markup Language (SSML).  This allows you to add pauses, read strings of digits as numbers, or even spell out words.  Here is a link to help you learn more:  https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/speech-synthesis-markup-language-ssml-reference.
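
Here is a minimal sketch of building an SSML string with a pause and a digit-by-digit number.  The helper names and sample values are hypothetical, and the set of supported tags varies by platform, so treat the break and say-as usage as an assumption to verify against the reference above.

```javascript
// Escape characters that are special in XML so user data cannot break
// the SSML markup.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: greet a user, pause, then read a code one digit
// at a time.
function buildGreetingSsml(userName, code) {
  return (
    '<speak>' +
    'Welcome, ' + escapeXml(userName) + '.' +
    '<break time="500ms"/>' +
    'Your code is <say-as interpret-as="digits">' + escapeXml(code) + '</say-as>.' +
    '</speak>'
  );
}

const ssml = buildGreetingSsml('Sam & Co', '4821');
console.log(ssml);
```

Escaping matters because a stray ampersand or angle bracket in a user-supplied value would otherwise produce invalid markup.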
 

API.AI and Amazon Echo

 
API.AI can help with Amazon Echo integration.  It provides a way to export your Intents, entities, and parameters.  Click on Integrations --> Amazon Alexa --> Export and you will be given the files you need to get started with your Amazon integration.
 

Homophones and your Invocation Name

 
One of the gotchas we had was with our invocation name.  The invocation name is used to start your app.  You say, “Talk to Weigh Master” and the Weighmastery Application launches.  We had a homophone in our name with the word “Weigh”.  It was being parsed as “Way”, which did not launch our app “Weigh Master”.  At this point you can change your invocation text to “Way Master”, change the invocation name to something without a homophone, or contact Google to see if they can change the priority of the homophone.
 

Training and Machine Learning in API.AI

 
When your app doesn’t recognize a phrase, API.AI will record the phrase and give you an opportunity to tie it to an existing Intent or create a new one.  This is how you teach your app to handle many phrases, and it is part of the machine learning.  Just click on the Training tab in API.AI and go through the list of phrases.  
 

The Simulator

 
To launch the simulator so you can test your app, click on Integrations in API.AI and then click “Actions on Google”.  Fill out the form, including the invocation name and the Google Developer Console project ID, which you need to create before continuing.  Click Authorize.  When successful, it will give you the option to preview and then deploy.  Click on Preview and then quickly click on the “Google Home Web Simulator” button in the bottom right.  This will launch your simulator, at which point you must either type or speak your invocation name.  Every time you make a change to your app, you must launch the simulator again.
 

Local Deployment to your Device

 
When you click Preview as part of testing your app, if your physical Google Home device is tied to the Google Home app on your mobile device, your app will deploy to the physical device as well.  This is a temporary deploy and, from what I understand, it deletes itself after 24 hours.  We found a great link on making your app persist for local deployments.  Check out http://stackoverflow.com/questions/41088596/make-google-actions-development-project-preview-persist-longer.  They basically copy the deployment code and set up a cron job to run the preview every day you are in the office.

We have really enjoyed hearing our voice service app, Welcome, take shape.  It’s been a great experience, and we hope this article helps you get over some of the gotchas involved with your first app.  We can’t wait to see what comes of these new Actions on Google and want to be a part of building awesome new interactive functionality!  

Good luck with your first Voice Service interaction!

Fresh Lines is a Web and Mobile Development company that can help with Web Applications, Mobile Applications, Marketing Sites and Voice Services.
