Hear no evil

January 7, 2017

Voice-controlled AI assistants are advanced enough to be dangerous

Useful voice recognition, combined with AI capable of parsing specific phrases and sentences, is finally here. Amazon’s Alexa, Apple’s Siri and Google’s Assistant are showing us what the future will be like.

However, the safeguards are lagging behind the capabilities, as the recent example of a TV anchor ordering dollhouses shows. The fact that the system picked up voice from the TV and interpreted it as a command sounds funny, but should be terrifying to anyone remotely interested in computer security. It sounds like a Hollywood adaptation of the classic remote code execution bug — but it’s not a fantasy any more.

We’re so happy that we have machines that can listen to us, that in our rush to use / buy / create them, we haven’t stopped and made sure they listen only to us. That’s why a kid can order a dollhouse while parents are asleep or away, TV anchor reporting on that can order hundreds more, and we can play fun pranks when visiting friends by ordering tons of toilet paper while they’re not looking :–)

Accidentally ordering something online can be terribly inconvenient and cost you a fine buck, but as these assistants get control over more devices in our homes and our lives (IoT anyone?), we’ll start seeing real problems. Here’s a stupid trick that might just work in a year of so: Alexa, unlock the front door!

Mobile phone voice assistants show one way of handling this: by requiring the phone to be unlocked for (most) commands to work. Yet while may make sense for phones (and only slightly inconvenience the user), it’s a non-starter for home automation systems. If I have to walk over and press a button, I might just as well do the entire action (such as turning the light off, or unlocking the door) myself.

Another possibility is speaker recognition. By analyzing how the words are uttered, not just what they are, such systems can distinguish voice of the authorized user. However, like many other biometric systems, it is easily fooled by a facsimile of the user — in this case, a simple recording. Thus anyone with a mobile phone can “hack” this kind of security.

More effective, and only slightly more inconvenient, would be the combination of requiring the physical presence of the user in the room (for example, by sensing their mobile phone, smartwatch, or other personal item they’d carry around most of the time) and speaker recognition. In this case, even if a hack is attempted, the user themselves would be around to prevent it.

So the good news is, it shouldn’t be that hard to build more secure voice-controlled systems. The bad news is, as we’ve seen with huge botnets made of compromised IoT devices, many companies in home automation space currently have no experience or incentives to focus more on security.

Voice-controlled AI assistants are here to stay, and it’s a good thing — they’re mightily convenient. But expect more fun anecdotes and scary stories in the years ahead.