In just a few years, gestures have become one of the primary ways we interact with computers. They’ve migrated from research devices to smartphones, from smartphones to tablets, from tablets to PC touchpads, and onward to gaming systems, kiosks, cars, and more.
A few touchscreen gestures — a one-finger swipe to scroll, pinch-and-spread to zoom in and out — have become a lingua franca almost everyone understands. They’re deployed in nearly the same way across almost all devices, and have been since some of the earliest research on multitouch input — long before smartphones made them popular.
The primary driver of this proliferation of gestures has been the maturity of multitouch. Most crude touchscreen interfaces can’t do true gestures. At your average touchscreen shopping kiosk, you’re limited to tapping software buttons or icons, in a kind of facsimile of either a hardware display or a genuine GUI.
But we should be clear: gestures aren’t limited to touchscreens. Gestures may not involve touch at all. They can be kinetic, like tilting a smartphone to change its orientation or shaking it to shuffle songs. On Motorola’s newest smartphones, two quick twists of the wrist activate the phone’s camera. In the newest version of Apple’s iOS, simply raising the phone out and upright activates Siri. Strictly speaking, that last one isn’t even a gesture — it’s simply an orientation that’s specifically useful for speech input, and (Apple hopes) unlikely to be used for anything else.
Gestures can also be indirect. If you use multitouch gestures on a laptop’s touchpad, you’re no longer “manipulating” an “object” “directly” — or at least not fooling your eyes, hands, and brain into thinking you are in quite the same way. Instead, you’re performing similar gestures at a 90-degree angle. You do the same if you use your smartphone or tablet as a remote control for your television set: gestures performed on the input device are realized on the second, networked screen. Of course, logically, nothing is different; you’re still translating touches or points into actions on a virtual object represented on a screen. Even phenomenologically, we can train ourselves to think of the gestures on one screen as occurring on the other, in the same way that we’ve trained ourselves to associate the movement of a mouse with the movement of a cursor.
Gestures become a sign language that you and your device understand — a third mode of input that complements longstanding methods like tapping or clicking on graphical icons. Text entry, by comparison, is downright ancient.
Gestural computing is only possible through a specific blend of hardware and software: accelerometers, gyroscopes, and even cameras detect movement and identify specific, meaningful inputs, which are then decoded and translated into actions in a computing system. Gestures are becoming such an important and data-greedy part of computing that new devices like the iPhone 5s and Moto X ship with extra chips to process these sensors’ input and make gesture recognition faster, more sensitive, and less power-hungry.
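As a rough illustration of that detect, identify, and translate pipeline, here is a minimal sketch of shake detection, written in TypeScript against the browser’s DeviceMotionEvent API as a stand-in for a phone’s accelerometer. The threshold, the cooldown, and the shufflePlaylist handler are illustrative assumptions, not any vendor’s actual implementation.

```typescript
// A minimal sketch, assuming a browser environment with DeviceMotionEvent support.
// Threshold and cooldown values are illustrative guesses, not production tuning.

const SHAKE_THRESHOLD = 15;      // non-gravity acceleration, in m/s^2 (assumed)
const SHAKE_COOLDOWN_MS = 1000;  // ignore repeat spikes for a second (assumed)

let lastShake = 0;

function shufflePlaylist(): void {
  // Hypothetical application-level action; stands in for "shake to shuffle."
  console.log("Shake detected: shuffling songs");
}

window.addEventListener("devicemotion", (event: DeviceMotionEvent) => {
  const a = event.acceleration; // accelerometer reading with gravity removed
  if (!a || a.x === null || a.y === null || a.z === null) return;

  // 1. Detect movement: how hard is the device being moved right now?
  const magnitude = Math.sqrt(a.x * a.x + a.y * a.y + a.z * a.z);

  // 2. Identify a meaningful input: a spike above the threshold, debounced.
  const now = Date.now();
  if (magnitude > SHAKE_THRESHOLD && now - lastShake > SHAKE_COOLDOWN_MS) {
    lastShake = now;

    // 3. Translate the gesture into an action in the computing system.
    shufflePlaylist();
  }
});
```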
We’re offloading gestural processing to dedicated silicon in the same way that we offloaded graphics processing a computational generation ago. That earlier shift both signaled and helped usher in a new reliance on fast, high-definition video; these new chips do the same for gestures. We will do less and less with points and clicks, and more and more through speech and gestures.
Along with speech, gestures are sometimes classified as a “natural” user interface. But we ought to be skeptical whenever anyone labels a gesture as “natural.” A gesture might be motivated by a correspondence with the physical world, but it has to be defined in very specific terms to be used by a computer. Really, gestures fall into the same structures as every other language, and every other kind of computer input: they need to be reasonably well-defined and differentiated, and they need to be made conventional. Even when they’re motivated by imitating other movements or affordances, they still add up to a kind of language.
Some of my computing gestures imitate real-world movements, like swinging a video game controller as if it were a baseball bat. Some copy skills I’ve learned from other technologies, like carrying smartphone gestures over to trackpads, or tilting a controller like a steering wheel. And some gestures, like the difference between a two-finger and a three-finger swipe, are almost totally arbitrary, whether they’re dictated by the system or defined by the user.
In 2011, Apple introduced a new feature in Mac OS X Lion called “natural” two-finger scrolling, which moves content the way you would move a sheet of paper, or the way scrolling works on an iPhone or iPad’s touchscreen. Users can also select the convention used by most versions of Windows and older versions of OS X, which imitates the scroll wheel or scroll bar of a traditional desktop. What’s strange is that whichever method you’ve spent much time with, that’s the one that feels “natural.”
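The mechanical difference between the two conventions is smaller than the perceptual one. Here is a toy sketch, under the assumption that scrolling reduces to a single vertical offset: the same trackpad delta is simply applied with an opposite sign. This illustrates the idea only; it is not how any real operating system implements scrolling.

```typescript
// A toy model of the two scrolling conventions, assuming content position is a
// single vertical offset in pixels. Not how any real OS implements scrolling.

type ScrollConvention = "natural" | "traditional";

function applyScroll(
  contentOffset: number,   // current vertical position of the content
  trackpadDeltaY: number,  // the delta reported for a two-finger swipe
  convention: ScrollConvention
): number {
  // "Natural" scrolling follows the touchscreen metaphor: the content tracks the
  // fingers. "Traditional" scrolling follows the scroll-bar metaphor: the viewport
  // tracks the fingers. The only difference here is the sign of the delta.
  const sign = convention === "natural" ? -1 : 1;
  return contentOffset + sign * trackpadDeltaY;
}

// The same downward swipe moves the content in opposite directions:
console.log(applyScroll(100, 10, "traditional")); // 110
console.log(applyScroll(100, 10, "natural"));     // 90
```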
At their limit, then, gestures either imitate analog-world actions so well — or fit into our set of already-acquired computing skills and routines so closely — that we don’t think about them as signs or gestures at all; we’re just acting naturally.
It can sometimes feel like a step backwards to try to force gestures from the old world into the new. In 2010, I interviewed Microsoft Research’s Bill Buxton, who has spent his career working on user interfaces at Microsoft, Xerox PARC, the University of Toronto, and elsewhere. Buxton stressed the importance of using gesture technologies to leverage the skills we already have.
“Skills are really expensive,” Buxton says. “We’ve invested time, energy, and physical and historical capital in acquiring them. And if we don’t take advantage of them, it isn’t just a waste of time or money, it shows disrespect to the user.”
If anything, Buxton says, most multitouch systems don’t borrow enough from our skills learned using other technologies. In a 2007 essay, “Multi-Touch Systems that I Have Known and Loved,” Buxton writes, “Many, if not most, of the so-called ‘multi-touch’ techniques that I have seen are actually ‘multi-point.’”
Touchscreens typically register just points of contact: they don’t register the pressure with which the device has been touched, the angle and articulation of the hand, or the velocity of movement. Interaction, Buxton says, is about both look and feel, but most multitouch systems overwhelmingly emphasize look over feel, sight over touch.
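To make Buxton’s “multi-point” distinction concrete, here is a sketch of what a typical touch event actually exposes, using the web’s TouchEvent as a stand-in for touchscreen input generally. (A force field exists in the spec, but many devices simply report zero.)

```typescript
// A sketch of the data most touchscreens report, assuming a browser environment.
// Each contact is little more than an identifier and a coordinate pair.

window.addEventListener("touchmove", (event: TouchEvent) => {
  for (const touch of Array.from(event.touches)) {
    console.log({
      id: touch.identifier, // which contact this is, so fingers can be tracked
      x: touch.clientX,     // position on screen
      y: touch.clientY,
      force: touch.force,   // pressure, where supported; often just 0
    });
  }
  // Nothing here describes the angle or articulation of the hand, which finger
  // made the contact, or how the rest of the hand is posed.
});
```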
Our hands are capable of a much wider range of gestures than our devices currently recognize. Prototypes already feature flexible displays; a greater range of optical, pressure, and motion sensors; haptic feedback; and more.
When it comes to gestures, we’re still just scratching the surface. (But until sensors can make sense of it, scratching the surface is a gesture most smartphone owners should try to avoid.)