Inclusion and trust in Smart Home Speaker environments
Design team: Jana Thompson, Isabella Su, and Ali Hoss
Originally published on Medium on 06.29.2021
(Image credit: photo by Davide Boscolo on Unsplash)
With the 2014 release of Amazon Alexa and the 2016 release of Google Home, it felt as though the sci-fi future of Ubiquitous Computing was becoming reality. [1] For many users, the promise of this future has fallen short: they make little use of their devices because both the Automatic Speech Recognition and the taxonomies for culturally specific artifacts such as music are designed around a presumed universality. [2]
Our team’s original objective was twofold: to identify the interaction styles people have with their smart home speakers, what they use them for, and why they choose to use them; and to examine how one would approach a design system for a Voice User Interface (VUI) within the framework of Atomic Design. [3]
User Research
Methodology
We conducted a five-day diary study with six participants, using Google Surveys to capture the highlights of each user’s daily usage of their smart home speaker. We recruited by posting on social media and chose the six participants from the resulting pool of applicants; to qualify, an applicant had to own at least one smart home speaker and use it at least once per day.
The diary study included three weekdays and one full weekend to get a picture of typical usage on both weekdays and weekends. We limited our time scale because we self-financed our research and could not pay the amount that a participant in a longer diary study would deserve for the work and attention to detail required. We did compensate participants with $20 gift cards upon request.
The information in the diary study was supplemented by usage logs from the mobile apps for their smart home speakers as voluntarily supplied by participants.
At the end of the five-day study, we interviewed each participant about why they acquired their device, what they had hoped to use it for, and their frustrations with usage.
Quotes from diary study participants (visual created by Ali Hoss)
Results
At the end of the five-day study, we had a spreadsheet of survey results for each day, plus recordings and notes from each follow-up interview. The team then held a virtual workshop over Zoom to synthesize the research data and find actionable insights in it. We began the workshop by importing the qualitative and quantitative results into Miro.
Diary study information imported into Miro
We began with affinity mapping to find common themes in the data. Very quickly, we found that the majority of users had issues with their device understanding them. The main causes were accents, gaps in the system’s cultural knowledge, misinterpretation due to other factors (“India news” versus “Indy 500”), background noise, and the complexity of commands.
Affinity diagram by Isabella Su, Ali Hoss, and Jana Thompson
After completing affinity diagramming, the team moved on to user needs statements. All members of the team agreed that “an experienced user of a smart home speaker needs to have a more personalized experience with their device.” The challenge came with defining how to bring that about. We attempted a How Might We…? statement activity, but we quickly abandoned that to simply brainstorm and discuss what the goal of this experienced user was and how to bring it about. After discussing for about an hour, we came up with the following insights:
a personalized setup that asks about music and where the user is from (to identify any accent), run in a specific setup mode
this setup should be a conversation, like getting to know a person, in which people learn facts about one another
if the assistant makes a mistake, it should have a better fallback than current ones, drawing on the ways humans repair misunderstandings in conversation
Finally, we storyboarded a potential idea for a new feature for a smart home assistant to “get to know” a user using Storyboard This, expressing how we believe this new feature could lead to greater user delight.
Storyboard and conversation by Isabella Su
Additionally, we recognized that the solution we proposed would possibly require better ASR than currently exists [4] or is deployed in smart home assistant systems. It would also require richer taxonomies of music and cultural practices; where these already exist for specific countries and societies, they should be adapted for global use to recognize the diversity and mobility of modern societies beyond the statistical numbers game. Such work has been investigated by researchers in UX and AI at Spotify, one of the services that Google Home can connect with to provide music content for its users. [5]
Design System
As part of our exploration into smart home assistants, we began work on an atomic design system framework for Google Home’s VUI. Based on input from experienced VUI designers, secondary research, and primary investigation with devices, we developed a partial preliminary system that incorporated not only the voice interactions themselves, but also the personality of the available voices for English and the physical aspects of the Google Home Mini device. To keep our project within a reasonable scope, we did not look at any system with a screen, the associated mobile app, or languages beyond English, but we acknowledge that these are part of a larger design system that should be addressed when examining smart home speakers and assistants.
Linguistic Choice
As part of the meta-design system, the user has some freedom of choice with regards to language and voice for use in their interactions with their smart home speakers. They must choose one or more languages for interacting with their smart home assistant. In the full system, we listed all that was available as of late May 2021 for Google Home. Within each language, there are voices available. English had a large number compared to German, for instance, with English having ten voices available, and German having two. We discuss the specifics of voice below.
Languages and voice diagram in derived design system by Jana Thompson
Personality
While an app or product persona is considered a part of developing a design for a brand, personality for a VUI is more than just a nice-to-have. Humans have biases and judgments about other people and things that we perceive to have agency. VUIs are especially susceptible to these biases.
One aspect that has been a focus of study for perception of a VUI is the perceived gender of its voice. [6], [7] Gender perception is based on measurements of voice ranges in hertz (Hz), and a full investigation of the voices used for English in Google Home is beyond the scope of the project. The gender perceptions in the system rely upon the judgment of the researcher. Accents are likewise a matter of measurement, but again, in this research, we relied upon the judgment of the researcher.
Elements of Voice in derived design system by Jana Thompson
Additionally, based upon research [8] on how designers measure personality when creating VUIs, we created sliders to describe voice personality using Tone. Following [8], we measured Tone on four continuums for each voice:
funny to serious
casual to formal
irreverent to respectful
enthusiastic to matter-of-fact
Again, for Tone we also relied on researcher judgment rather than on more quantitative measurements. In future work, this should be based on a broad set of user perceptions rather than solely on researcher judgment, as such judgments are highly subject to individual and social biases.
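As a rough illustration, the four Tone continuums could be recorded as slider positions per voice. This is a minimal sketch, not part of the original design system: the names ToneProfile, TONE_AXES, and average_ratings and the 0.0 to 1.0 scale are all assumptions made for the example. Averaging several raters' profiles is one simple way to move from a single researcher's judgment toward a broader set of perceptions.

```python
from dataclasses import dataclass

# Hypothetical axis names; the pole labels come from the four continuums above.
TONE_AXES = {
    "funny_serious": ("funny", "serious"),
    "casual_formal": ("casual", "formal"),
    "irreverent_respectful": ("irreverent", "respectful"),
    "enthusiastic_matter_of_fact": ("enthusiastic", "matter-of-fact"),
}


@dataclass(frozen=True)
class ToneProfile:
    """Slider positions in [0.0, 1.0]: 0.0 is the first pole, 1.0 the second."""
    funny_serious: float
    casual_formal: float
    irreverent_respectful: float
    enthusiastic_matter_of_fact: float

    def __post_init__(self):
        # Reject slider values outside the assumed 0.0 to 1.0 scale.
        for axis in TONE_AXES:
            value = getattr(self, axis)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{axis} must be between 0.0 and 1.0, got {value}")

    def describe(self):
        """Return the nearer pole for each axis (0.5 counts as the second pole)."""
        return {
            axis: poles[0] if getattr(self, axis) < 0.5 else poles[1]
            for axis, poles in TONE_AXES.items()
        }


def average_ratings(ratings):
    """Average several raters' profiles for one voice into a single profile."""
    if not ratings:
        raise ValueError("at least one rating is required")
    count = len(ratings)
    # TONE_AXES is declared in the same order as the dataclass fields.
    return ToneProfile(
        *(sum(getattr(r, axis) for r in ratings) / count for axis in TONE_AXES)
    )


# Invented ratings for one voice from two hypothetical raters.
combined = average_ratings([
    ToneProfile(0.2, 0.3, 0.8, 0.6),
    ToneProfile(0.4, 0.1, 0.6, 0.8),
])
summary = combined.describe()
```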
Usage and Context
The rest of the voice-specific design system can be broken into two systems, one nested inside the other. The language-independent aspects of usage and context are Features, Templates, and Intents. A Feature, within the context of Google Home, is equivalent to an Action or, within the Alexa/Echo ecosystem, a Skill. Features are composed of Templates, which are groups of related conversations of a specific type. Templates are composed of Intents, which are smaller, more directed pieces of conversation. For example, finding a user’s location and finding where they are from are Intents, while the Template LOCATION comprises all such potential Intents within a larger Feature or Action.
Template for Daily Routine conversations in design system by Jana Thompson
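The nesting of Features, Templates, and Intents described above can be sketched as a small data structure. This is an illustration only: the class and field names are assumptions, the intent names and sample utterances are invented, and nothing here reflects how Google's Actions Console stores an Action.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Intent:
    """A small, directed piece of conversation."""
    name: str
    # Utterances are the language-dependent part; these examples are invented.
    sample_utterances: List[str] = field(default_factory=list)


@dataclass
class Template:
    """A group of related Intents, such as LOCATION."""
    name: str
    intents: List[Intent] = field(default_factory=list)


@dataclass
class Feature:
    """Roughly an Action on Google Home or a Skill on Alexa/Echo."""
    name: str
    templates: List[Template] = field(default_factory=list)

    def intent_paths(self):
        """List every Intent as 'TEMPLATE/intent' for review or hand-off."""
        return [
            f"{template.name}/{intent.name}"
            for template in self.templates
            for intent in template.intents
        ]


location = Template(
    name="LOCATION",
    intents=[
        Intent("current_location", ["Where do you live now?"]),
        Intent("where_from", ["Where did you grow up?"]),
    ],
)
get_to_know_me = Feature(name="Get to Know Me", templates=[location])
paths = get_to_know_me.intent_paths()
```

Keeping the utterances on the Intent and everything else above it mirrors the split between language-dependent and language-independent layers.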
The smaller elements within the Organisms of Intents are the language-dependent elements of the design system: they are specific to the interactions between the smart home assistant and the user in a given situation, and those interactions are communicated in a specific language.
Atoms and Molecules in Google Home design system by Jana Thompson and Isabella Su
The final aspect of both the language-independent and language-dependent design is error handling. Google Home builds Success, Fallback, and Failure into the error handling for the voice assistant.
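One way to picture the Success, Fallback, and Failure outcomes is as a classification of each conversational turn. The sketch below is hypothetical: the limit of two fallback prompts before failure is an assumption made for the example, not documented Google Home behavior, and the function names are invented.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FALLBACK = "fallback"
    FAILURE = "failure"


def classify_turn(matched_intent, failed_attempts, max_fallbacks=2):
    """Classify one turn. matched_intent is an intent name, or None on no match."""
    if matched_intent is not None:
        return Outcome.SUCCESS
    if failed_attempts < max_fallbacks:
        # Still within the assumed reprompt budget: ask again.
        return Outcome.FALLBACK
    return Outcome.FAILURE


def run_dialog(recognition_results, max_fallbacks=2):
    """Walk a sequence of recognition results until success or failure."""
    outcomes = []
    failed_attempts = 0
    for result in recognition_results:
        outcome = classify_turn(result, failed_attempts, max_fallbacks)
        outcomes.append(outcome)
        if outcome is Outcome.FALLBACK:
            failed_attempts += 1
        else:
            break  # SUCCESS and FAILURE both end the exchange
    return outcomes


recovered = run_dialog([None, "where_from"])
gave_up = run_dialog([None, None, None])
```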
Physical Aspects
The physical aspects of the Google Home fall into two categories with regards to use: those that help the user interpret what the smart home assistant is doing, and I/O controls.
Visual Status Cues
Google Home has a set of visual cues as illustrated below to indicate to the user what its current status is.
Visual cues of a Google Home Mini by Ali Hoss
I/O Controls
Google Home Mini has touch points on the speaker itself to control output and a physical toggle on the bottom of the device to control input.
Diagram of physical features in a design system of a Google Home Mini by Ali Hoss
The design system in its current iteration still requires some refinement so that designers using it can define and translate aspects of the design for conversation developers working in Google’s Actions Console. Our initial system was improved by input from both Damien Dubrowski from Dee VUI and Diana Mundo Spataro.
Proposed features
Based on our user research, in which participants were frustrated by devices that did not comprehend what they said or understand cultural context, as well as secondary research such as [1] and [11], the team decided to design two new features. Get to Know Me is a set-up conversation in which a user has an introductory conversation with their device so that the system can develop a personalized profile of each user to aid with music choices, news, and other potential uses of their smart home assistant. Add to Favorites is a feature that allows the smart home assistant to “get to know” the user better at any point.
Additionally, to try to mitigate user frustration, we included additional fallback strategies as discussed below. Also, as noted in [12] and by several of our participants, trust and security are issues with smart home assistants, and we tried to account for that in our design system as well.
Finally, many users simply didn’t seem that enthused about their devices. As many had received them as gifts, they used them, but didn’t express a great deal of enthusiasm for them. We hoped to address this by developing better habits for the assistant: being helpful, trustworthy, and a good listener.
Get to Know Me
Close up of flow diagram intro for Get to Know Me by Jana Thompson
Get to Know Me is the primary new feature we propose. Upon first boot-up, the Get to Know Me conversation would begin so that the system can get to know the user and personalize the assistant for them. As this particular feature is complex, we will only touch upon key points of this conversational feature below. Please contact the author(s) for full details of the design flow and feature system.
Setup with personalized settings with common categories
The setup conversation is designed to ask a user specifically about the uses most common for smart home assistants: music, daily routines, and home automation.
Multimodal options
For things that would be more difficult to set up or listen to via voice, the user is either prompted to go to the app or given the choice between the app and a voice interface.
Location
Location template in design of Get to Know Me by Jana Thompson
Going beyond mere localization, this setup conversation is meant to let a user give a history of the places they have lived (right now, the system records only the current locale and where a user grew up). That history would allow the system to optimize for accents (dependent upon improved ASR) and to use an understanding of cultural background to optimize the backend search space for user searches.
Skip a question
This returns some control to the user (discussed further in the section Trust below). If a user doesn’t want to go into details at that time or divulge information, they can skip any and all questions.
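The skip behavior can be sketched as a setup loop that stores only what the user chooses to answer. Everything here is illustrative: the function name run_setup, the skip phrases, and the question keys are assumptions, and the replies stand in for what a speech recognizer would return.

```python
# Hypothetical phrases that count as skipping a question.
SKIP_WORDS = {"skip", "pass", "next"}


def run_setup(questions, replies):
    """Pair each (key, prompt) question with a reply and build a profile.

    Returns (profile, skipped): profile holds only answered questions, and
    skipped lists the keys the user declined, so nothing is stored for them.
    """
    profile = {}
    skipped = []
    for (key, _prompt), reply in zip(questions, replies):
        text = reply.strip()
        if not text or text.lower() in SKIP_WORDS:
            skipped.append(key)
            continue
        profile[key] = text
    return profile, skipped


# Invented prompts covering the common categories named in the setup section.
questions = [
    ("music", "What kind of music do you like?"),
    ("where_from", "Where did you grow up?"),
    ("daily_routine", "What does a typical morning look like for you?"),
]
profile, skipped = run_setup(questions, ["Bollywood soundtracks", "Skip", "   "])
```

Tracking skipped keys separately would let a later conversation, such as Add to Favorites, offer those questions again without assuming an answer.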
Add to Favorites
Flow diagram of Add to Favorites by Jana Thompson and Isabella Su
A much smaller feature, this is designed for a user to add any information that can be discussed in the initial Get to Know Me conversation. It is invoked with “Hey Google, get to know me.”
Animated gif conversation by Ali Hoss
This feature also allows us to add helpfulness to the assistant’s personality. Being helpful means the assistant should make suggestions where possible, while always being open to learning more from the user about how to be a better assistant.
Error Handling
Improved error handling for new feature by Jana Thompson
Inspired by conversations on how humans handle clarification in conversations, we added new fallback features for error handling to mitigate user frustration. While technology should continue to improve, technology alone is not a panacea to user frustrations. By clarifying to the user that the system is really trying to understand them, we can add to the personality of the assistant by making it a good listener. A good listener doesn’t always understand, but they make every attempt to understand.
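One way a "good listener" fallback might work is to ask a clarifying question when two interpretations are close, much as a person would. This is a hypothetical sketch and not the flow in our diagram: the candidate list with confidence scores, the thresholds, and the prompt wording are all assumptions for illustration.

```python
def choose_response(candidates, accept_at=0.8, margin=0.15):
    """Pick a reply from (text, confidence) pairs produced by a recognizer.

    The scores and both thresholds are invented for this example.
    """
    if not candidates:
        return "Sorry, I didn't catch that. Could you say it again?"
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    best_text, best_score = ranked[0]
    if len(ranked) > 1 and best_score - ranked[1][1] < margin:
        # Two readings are too close to call, so ask rather than guess.
        return (
            "I want to get this right. "
            f"Did you mean {best_text} or {ranked[1][0]}?"
        )
    if best_score >= accept_at:
        return f"OK, {best_text}."
    # A single weak reading: confirm before acting on it.
    return f"I think you said {best_text}. Is that right?"


ambiguous = choose_response([("India news", 0.55), ("Indy 500", 0.50)])
confident = choose_response([("India news", 0.90), ("Indy 500", 0.30)])
```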
Trust
One of the most persistent observations about both mobile technology and smart home assistants is that they can feel creepy and that users lack trust in the companies behind them. Our own users did not identify this as a primary concern, but they brought it up almost as a given. Trustworthy AI is an increasing priority for companies, and in their smart home assistant technology it can be pursued by increasing transparency in Terms and Conditions and in the use and storage of personal data, and by improving trust as noted in [12].
Closeup of flow diagram by Jana Thompson
This work is beyond the scope of the present system in many ways, but we have left in placeholders as reminders that these aspects of the design of AI systems should be attended to as a first-order priority and not as an afterthought.
Implementing Demo/Prototype in Actions Console
The team began an implementation of the Get to Know Me feature as an Action in Google’s Actions Console, for demo purposes only at the moment. The challenges in working on this feature have strong implications for the need for good translations between a design system and the developer console.
Future Directions
As a time-boxed course-based study, this project has many more dimensions that require exploration. These are discussed below.
Completing the Design System
The design system should at the very least be explored with several more languages with differing typologies and language families and could be used to explore how voices and user perceptions and expectations vary according to culture, sub-culture, and other aspects of sociocultural values. As VUI design originally grew out of IVR research, exploring the formality of systems in cultural contexts of trust and familiarity would be interesting avenues of further research.
Additionally, the design system should include all aspects of the Google Home ecosystem, including the regular Google Home, the Ring, etc. The mobile app should also be included, as it is a key part of the system that is only briefly touched upon here with voice settings.
Further User Research
Further and more in-depth longitudinal studies of usage and user expectations would no doubt yield greater insights than the ones we touched upon here. Recruiting a greater number of participants with greater variation in usage would also shed light on expectations for devices, while greater diversity would hopefully allow for better understanding of varying cultural understandings and of difficulties with the assistant systems’ processing of multiple cultural data points.
Develop the Features
Developing the features as a fuller prototype is another direction. This would be a substantial investment of work in learning how to use Google’s Actions Console and how to build more flexible functions within the system. Nonetheless, having a system to user test would be an invaluable step in understanding how users would react to the features in real-life scenarios.
Conclusion
The study and development discussed here took place over an eight-week period for a course in UX Tools as part of the fulfillment for a graduate degree in UX Design at the Maryland Institute College of Art. For further information on this work, please contact the author of this work. Feel free to leave comments and participate in any discussion.
References
Bentley, Frank, Chris Luvogt, Max Silverman, Rushani Wirasinghe, Brooke White, and Danielle Lottridge. Understanding the Long-Term Use of Smart Speaker Assistants. Proceedings of the ACM on Interactive, Mobile, Wearable, and Ubiquitous Technologies, Volume 2, Issue 3. September 2018. Article № 91, pp. 1–24. DOI: https://doi.org/10.1145/3264901
Dourish, Paul and Scott D. Mainwaring. Ubicomp’s Colonialist Impulse. UbiComp ’12: Proceedings of the 2012 ACM Conference on Ubiquitous Computing. September 2012. Pages 133–142. DOI: https://doi.org/10.1145/2370216.2370238
Frost, Brad. Atomic Design. https://atomicdesign.bradfrost.com/table-of-contents/
S. Yoo, I. Song and Y. Bengio, “A Highly Adaptive Acoustic Model for Accurate Multi-dialect Speech Recognition,” ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 5716–5720, doi: 10.1109/ICASSP.2019.8683705.
Mennicken, S.; Brillman, R.; Thom, J.; Cramer, H. Challenges and Methods in Design of Domain-specific Voice Assistants. AAAI Spring Symposium Series, North America, March 2018. Available at: https://aaai.org/ocs/index.php/SSS/SSS18/paper/view/17575/15465
Schnoebelen, Tyler. The gender of artificial intelligence. Crowdflower Medium Publication. July 11, 2016. https://medium.com/@CrowdFlower/the-gender-of-artificial-intelligence-3d494c8fe7ac
Sutton, Selina Jeanne. Gender Ambiguous, not Genderless. Designing Gender in Voice User Interfaces (VUIs) with Sensitivity. CUI ’20: Proceedings of the 2nd Conference on Conversational User Interfaces. July 2020. Article № 11, pp. 1–8. DOI: https://doi.org/10.1145/3405755.3406123
Babich, Nick. Designing for the Future with Voice Prototypes. Smashing Magazine. May 2, 2019. https://www.smashingmagazine.com/2019/05/future-design-voice-prototypes/
Filice, Simone, Giuseppe Castellucci, Marcus Collins, Eugene Agichtein, and Oleg Rokhlenko. VoiSeR: A New Benchmark for Voice-Based Search Refinement. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. https://www.aclweb.org/anthology/2021.eacl-main.197
Braun, Daniel, Adrian Hernandez Mendez, Florian Matthes, and Manfred Langen. Evaluating Natural Language Understanding Services for Conversational Question Answering Systems. Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pp. 174–185. August 2017. https://www.aclweb.org/anthology/W17-5522.pdf. DOI: 10.18653/v1/W17-5522
Ammari, Tawfiq, Jofish Kaye, Janice Y. Tsai, and Frank Bentley. Music, Search, and IoT: How People (Really) Use Voice Assistants. ACM Transactions on Computer-Human Interaction, Volume 26, Issue 3. June 2019. Article № 17, pp. 1–28. DOI: https://doi.org/10.1145/3311956
Abdi, Noura, Xian Zhan, Kopo M. Ramokapane, and Jose Such. Privacy Norms for Smart Home Personal Assistants. CHI ’21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. May 2021. Article № 558, pp. 1–14. DOI: https://doi.org/10.1145/3411764.3445122
Wiggers, Kyle. Amazon Alexa head scientist on developing trustworthy AI systems. Venture Beat. June 16, 2021. https://venturebeat.com/2021/06/16/amazon-alexa-head-scientist-on-developing-trustworthy-ai-systems/
Baker, Justin. Voice User Interfaces (VUI) — The Ultimate Designer’s Guide: The Fundamentals That Empower Us to Converse with Our Devices. Muzli Medium Publication. Nov 25, 2018. https://medium.muz.li/voice-user-interfaces-vui-the-ultimate-designers-guide-8756cb2578a1
Chakraborty, Poulami. Creating Voice Interaction Flows: UX Design for Voice Interfaces. UX Collective. Dec 30, 2018. https://uxdesign.cc/ux-design-for-voice-interfaces-part-ii-3b0056020cd3
Giangola, James. Conversation Design: Speaking the Same Language. https://design.google/library/conversation-design-speaking-same-language/
Kukulska-Hulme, Agnes. Language and Communication: Essential Concepts for User Interface and Documentation Design. 1999. Oxford University Press.
Hall, Erika. Conversation Design. 2018. A Book Apart.
Mortensen, Ditte Hvas. How to Design Voice User Interfaces. Interaction Design Foundation. https://www.interaction-design.org/literature/article/how-to-design-voice-user-interfaces
Privat, Guillame. Fundamental Elements of VUI Design. Prototypr. Dec 10, 2018. https://blog.prototypr.io/fundamental-elements-of-vui-design-8630077a7009
Reddy, Anuradha. Researching IoT Through Design: An Inquiry into Being at Home. Doctoral Dissertation, Malmö University, 2020.
Pearl, Cathy. Designing Voice User Interfaces. 2016. O’Reilly Publications.