Alexa is Amazon's cloud-based voice service, available on tens of millions of Amazon and third-party devices. It lets us build voice experiences that give users a more intuitive way to interact with the technology they use in their day-to-day lives.
After Alexa Dev Days Mexico (a very good experience, by the way), we had the opportunity to start building skills for Amazon Alexa. (You can think of a skill as the equivalent of an app on a mobile device.)
This post gives an overview of our experience developing these skills.
If you are ready to create your own skill, here are some recommended steps:
- Create an account on AWS and on developer.amazon.com.
- Skills are developed against specific AWS regions, for example N. Virginia or Ireland, so it is important to verify that our AWS account is set to one of those regions. (The example skill is built with a Lambda function.)
- Next, we can follow the Amazon tutorial on GitHub.
The experience with Alexa
We started developing skills for Alexa with the web-based development environments, which turned out to be simple to use and well focused on the task. If you are working with Lambda functions, you will use two of them:
- The IDE for the voice model
- The IDE for the Lambda function
One pleasant surprise was how fluid the overall experience of developing skills with Lambda functions as the backend is. We will look at it in more detail in the next sections.
Here are the aspects we will cover while developing an Alexa skill:
- The Front-End
- The Back-End
- The testing of the result
Now, let’s get into it.
The Front-End (or VoiceEnd :D)
The Front-End part of the development focuses on building a voice model that Alexa uses to understand what the user intends to do.
When we start a skill project, we begin by defining the phrase the user will say so that Alexa runs the actions we define. This is known as the invocation phrase.
In our example, the skill we developed could be invoked by telling Alexa:
Alexa, start flights in Mexico City.
A brief note: the skill we developed is in Spanish, which is why the descriptions shown on the screen are in that language.
Once this is done, we can define the interactions (or sample utterances) that our skill will support, that is, the instructions that Alexa will listen for and the code that will run in response. These utterances are grouped into Intents.
Within the utterances we can also define placeholders (known as slots) when we want to capture particular “variables” from the user.
With this information, the Alexa engine uses Deep Learning to generate a model that lets the service respond to the different phrasings a user would commonly use. This means that even when the user does not say the exact phrases we gave as examples, Alexa is able to recognize them. It is very interesting how this layer of abstraction lets us build something that includes artificial intelligence so easily.
Once we have our Intents and our invocation we can build our model with the Build Model action.
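To make the pieces of the voice model more concrete, here is a minimal sketch in Python of what a model with an invocation name, one Intent, sample utterances, and a slot could look like when written out as JSON. The invocation name, the intent name FlightStatusIntent, the slot name flightNumber, and the English sample utterances are all assumptions made for illustration; they are not taken from our Spanish skill.

```python
import json

# Hypothetical voice model: every name and utterance below is illustrative.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            # What the user says to open the skill.
            "invocationName": "flights in mexico city",
            "intents": [
                {
                    "name": "FlightStatusIntent",
                    # A slot is the placeholder that captures a "variable".
                    "slots": [{"name": "flightNumber", "type": "AMAZON.NUMBER"}],
                    # Sample utterances reference the slot in curly braces.
                    "samples": [
                        "what is the status of flight {flightNumber}",
                        "is flight {flightNumber} on time",
                    ],
                },
                {"name": "AMAZON.HelpIntent", "samples": []},
            ],
        }
    }
}


def slot_names(model, intent_name):
    """Return the slot names declared for one intent of the model."""
    intents = model["interactionModel"]["languageModel"]["intents"]
    for intent in intents:
        if intent["name"] == intent_name:
            return [slot["name"] for slot in intent.get("slots", [])]
    raise KeyError(intent_name)


if __name__ == "__main__":
    print(json.dumps(interaction_model, indent=2))
```

Keeping the model as data like this also makes it easy to check that every placeholder used in an utterance has a matching slot before building the model.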
The Front-End and the Back-End are connected by defining an Endpoint.
At runtime, once Alexa recognizes the intent (the action) that the user wants to execute, an HTTP request is sent to the backend so that the defined process can be executed.
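As a rough illustration of what the backend receives, the sketch below pulls the intent name and slot values out of a request body shaped like the JSON Alexa sends. The sample body is trimmed down and hand-written for this example, and the intent and slot names (FlightStatusIntent, flightNumber) are hypothetical.

```python
def parse_intent_request(envelope):
    """Return (intent_name, slot_values) for an IntentRequest, else (None, {})."""
    request = envelope["request"]
    if request["type"] != "IntentRequest":
        # For example a LaunchRequest, which carries no intent.
        return None, {}
    intent = request["intent"]
    slots = {
        name: slot.get("value")
        for name, slot in (intent.get("slots") or {}).items()
    }
    return intent["name"], slots


# Hand-written, abbreviated sample; a real request carries more fields.
sample_envelope = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "locale": "es-MX",
        "intent": {
            "name": "FlightStatusIntent",
            "slots": {"flightNumber": {"name": "flightNumber", "value": "123"}},
        },
    },
}

if __name__ == "__main__":
    print(parse_intent_request(sample_envelope))
```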
The Back-End
When we choose Lambda functions as our backend, the integration is really simple. We just need to specify the ARN (Amazon Resource Name) of our Lambda function in the model builder (the front end) so that the link between the two can be established.
Next, in the Lambda IDE, we can edit the code and run tests entirely from the browser.
If we want a more robust development experience, we also have the option to download the skill code with the Alexa command line. We basically clone our skill, make our changes locally, and then redeploy the skill to the Lambda function environment.
A good thing about developing backend functionality for Alexa is the choice of programming languages. You can develop the skill with:
- Node.js
- Python
- Java
The core of the code can be summarized as a model of request handlers. Each handler has the following structure:
- A canHandle method: at runtime it determines whether this handler is responsible for a particular Intent (it signals this by returning true).
- A handle method: it executes the actions that satisfy this Intent.
Once the request is received from the Front-End, the backend goes through the handlers until it finds the right one and then executes its handle method. The responseBuilder is used to generate the response that the user will hear from Alexa. With the responseBuilder we can specify what Alexa will say (.speak), show information on devices with a screen (.withSimpleCard), and, if we expect the user to interact again after Alexa responds, set a reprompt (.reprompt).
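The handler flow described above can be sketched in plain Python without the SDK. Note that the ResponseBuilder class here is a tiny stand-in written for this example (it only mirrors the method names mentioned in the text), and the handler classes, intent name, and spoken texts are hypothetical. The real SDK classes offer much more.

```python
class ResponseBuilder:
    """Minimal stand-in for the SDK's responseBuilder (not the real class)."""

    def __init__(self):
        self._response = {"shouldEndSession": True}

    def speak(self, text):
        self._response["outputSpeech"] = {
            "type": "SSML", "ssml": f"<speak>{text}</speak>"}
        return self

    def with_simple_card(self, title, content):
        self._response["card"] = {
            "type": "Simple", "title": title, "content": content}
        return self

    def reprompt(self, text):
        self._response["reprompt"] = {"outputSpeech": {
            "type": "SSML", "ssml": f"<speak>{text}</speak>"}}
        # A reprompt means we expect the user to answer, so keep the session open.
        self._response["shouldEndSession"] = False
        return self

    def get_response(self):
        return {"version": "1.0", "response": self._response}


class FlightStatusHandler:
    """Hypothetical handler for a hypothetical FlightStatusIntent."""

    def can_handle(self, request):
        return (request.get("type") == "IntentRequest"
                and request["intent"]["name"] == "FlightStatusIntent")

    def handle(self, request, builder):
        number = request["intent"]["slots"]["flightNumber"]["value"]
        text = f"Flight {number} is on time."  # hard-coded placeholder answer
        return (builder.speak(text)
                .with_simple_card("Flight status", text)
                .reprompt("Would you like to check another flight?")
                .get_response())


class FallbackHandler:
    """Accepts anything, so it must be registered last."""

    def can_handle(self, request):
        return True

    def handle(self, request, builder):
        return builder.speak("Sorry, I did not understand that.").get_response()


HANDLERS = [FlightStatusHandler(), FallbackHandler()]


def dispatch(request):
    """Run the first handler whose can_handle returns True."""
    for handler in HANDLERS:
        if handler.can_handle(request):
            return handler.handle(request, ResponseBuilder())
    raise LookupError("no handler accepted the request")
```

The order of the handler list matters in this sketch: the first handler that answers true wins, which is why the catch-all handler goes at the end.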
Another cool thing about Alexa is that we can control the tone or the way certain words are pronounced. In other words, we can “format” the response using SSML (Speech Synthesis Markup Language).
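As a small illustration of SSML, the sketch below assembles a response string from a few common tags (emphasis, break, and prosody). The helper function names and the sample sentence are made up for this example; only the SSML tags themselves come from the markup language.

```python
from xml.sax.saxutils import escape


def emphasize(text, level="strong"):
    """Wrap text so it is spoken with emphasis."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def pause(milliseconds):
    """Insert a silence of the given length."""
    return f'<break time="{milliseconds}ms"/>'


def slow(text):
    """Wrap text so it is spoken at a slower rate."""
    return f'<prosody rate="slow">{escape(text)}</prosody>'


def speak(*parts):
    """Join already-formatted parts inside the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


# Hypothetical sentence, used only to show the tags working together.
ssml = speak(
    escape("Your flight departs from "),
    emphasize("gate 12"),
    pause(500),
    slow("Have a good trip."),
)

if __name__ == "__main__":
    print(ssml)
```

Escaping the plain text first matters because characters such as & or < would otherwise produce invalid markup.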
At runtime, once the execution is finished, the Front-End receives the response to the initial request and the user hears Alexa's voice response.
The result
Once our model is built and the backend logic is complete, we can start chatting with Alexa 😀
Some development tips
In general, the development environment is quite solid. However, we found a couple of things to keep in mind:
- When testing in the Front-End (Testing tab), the skills occasionally seem not to follow the normal execution flow. Enabling and disabling the tests and then refreshing the page can solve it.
- If the update of a Lambda function is interrupted for some reason during a deploy from the command line, you might have trouble redeploying your function. The way to solve this problem is to delete the temporary file that is generated in the local folder of the skill.
Conclusion
Creating a voice skill for Alexa is definitely an experience worth trying. In particular, I like how simple it is to train the language models and the flexibility to use multiple programming languages in the backend and exploit them to the fullest.
I hope you have enjoyed this brief overview of how to develop Alexa skills. Please let us know about your own experience developing for Alexa, and tell us which skills you have published. It would be very interesting to try them!
Thank you!
Best Regards!