Since Star Trek: TOS, I have imagined living in the future. Talking with a computer conversationally and using our words to do anything sounds like a dream come true. Amazon Alexa’s introduction gave us that possibility with the first viable form of voice control. Even before Alexa, I was a die-hard speech command lover, using tools like Dragon for dictation and voice macros on macOS to talk with Insteon devices and run other random commands. Eject CD-ROM! Turn the light on!
Now we seem even closer to that promised land. Google Assistant, Apple’s Siri, Microsoft Cortana, Amazon Alexa, and Samsung Bixby have begun a relay race to create an AI we can talk with. It feels like we keep inching closer and closer, and you know what? Voice is a pretty horrible way to communicate.
I often find myself tongue-tied and struggling to figure out how best to ask for the simplest or most complicated of things. I would wager that I sometimes spend more time thinking about how to state a request than the request itself takes. But this is an issue that chatbots and AI are already starting to understand, and it will become easier. The conversation should become more natural and feel less pained.
My thoughts have evolved on where a voice request starts and what the natural reaction should be. For example, what time does a restaurant close? Oh, great, it’s open, and I can make it. But was that my real question? If it’s closed, where else could I go? Would I prefer to walk or drive? Will I need the menu? Does it meet my dietary needs? It’s not really a voice command I need but a conversation.
Even if voice offered the info you need conversationally, would it really scratch the bigger itch? Yes, I want directions, but can I see the map first? I can judge the neighborhood much better by looking than by hearing an address. What is on the menu? Can I view it? We all know that a one- or even two-minute readout of the menu is not what we want. Sometimes when debating music choices, I ask to hear jazz, but are there five playlists I might like? Present my options on an interface near me, or start playing if I don’t respond. If I put down my phone or tablet, voice should reengage and continue to help me. Did you want me to book those reservations? Should I send a message to people as an invite?
Our next evolution in voice is understanding that it is not perfect, and that is why we continue to read and write. It is why we take pictures and record video or audio. The phrase “muscle memory” speaks to the speed that physical interactions offer and that voice can’t replicate. Our future is the ability to connect all of these mediums in a way that truly feels seamless, and I think we all may have missed this point. Give me what I want, where I want it, and when I want it.
What we all want is not one way to reach the world around us but many ways. The future is multi-sensory and tuned to the best combination of equipment at your disposal. If I’m in my bedroom, it may be my voice and my TV. In the office, my laptop (my voice may disturb others around me). On a walk or run, voice and a watch or phone. In the car, well, the car.