Maybe you’ve seen “Snow Fall.” It’s long, it scrolls, and unlike most pieces on the web before it, it skillfully stitches together text and image and video.
Over the past year, what began as an article grew into a form. Individual features became conventions, reflecting a new interest in animation and screen-filling media, made available through increased bandwidth and evolving web browsers. It showed a way forward for longform reading, and the importance of designer and developer craft in that direction.
Yet for all the excitement, I can’t help but wish for more thoughtful discussion, both conceptually and practically. Often, I hear people refer to these designs as “intuitive” and “immersive,” but I find those words maddeningly vague. We — designers, developers, readers, writers, publishers — think we know what they mean in the abstract, but when we get down to the details, we end up disagreeing with each other on what the problems are and how they can be solved.
And without a common language for describing what works and what doesn’t, our work isn’t being pushed or explored further. I see example after example appearing online, pieces that people have clearly put time and thought into, which cover the same ground and share the same mistakes.
Experimentation is great if you’re learning. If you’re not, it’s just expensive.
The words we’ve been using so far, like “intuitive” and “immersive,” are overloaded with meaning. Let’s drop them. What are we really trying to say? By pulling these words apart, we may find more precise terms that pinpoint the different problems we are trying to solve.
Last year at Tools of Change, Tim Carmody gave a talk about the act of reading and the technologies behind it. Today we draw comparisons between the revolution of Gutenberg’s press and the revolution of the Internet, but there is an important 100-year period, from 1850–1950, that sits between these events. Carmody calls this era Paper Modernism, and it’s when much of what we consider “modern media” was actually developed.
Newspapers, cinema, the telegraph, typewriters, advertisements, and diagrammatic sentences — these things reshaped the ways people accessed and navigated news and information. They raised questions like: What’s worth reading? Who’s a professional and who’s an amateur? How do I know when to trust something and when to be skeptical? How do I organize and share information with other people? How should this get made? How does this make money?
The answers eventually came in the form of the invention of tools (microfiche projectors, underwater cables, QWERTY keyboards), the growth of new jobs (fact-checkers, secretaries, switchboard operators, graphic designers, and ad agencies), the creation of industries (family-owned papers, Hollywood), even the changing of culture in really weird ways (remember drive-in cinemas?). Today, we ask similar questions about smartphones, Twitter, privacy norms, cyber-bullying, citizen reporting, Kindle publishing — and we’re starting to uncover answers.
Now, the media of the Paper Modernist era seems almost quaint: simple and obvious. That’s because intuitiveness is largely an issue of literacy. Of fluency. Of exposure.
Intuitiveness can be broken down into a sort of ladder, or a cake with three layers: legibility at the top, metaphor in the middle, and skills at the bottom. By way of example, the concept of scrolling (which we hardly give any thought to) relies on all of them:
- It has to be legible. Legibility is about having cues or signals that are unambiguous and recognizable. Text goes down the page until it gets cut off by the bottom edge of the screen, or a scroll bar is visible on the side. These are table stakes — the minimum of what needs to be present to realize what you’re staring at.
- Metaphor is the framing concept. In the case of scrolling, either you think of a scroll (the papyrus kind!), where one end rolls up and the other end unrolls to reveal more text, or you think of a window in space that you pan around. Either way, it informs a mental model of the behavior of what’s onscreen, and how it will respond to your actions.
- Skills are the motor skills and mental skills that you pick up. It’s the physical ability to click your mouse on up/down arrows, or to slide your finger across the scroll wheel. Using a mouse may be easy for those of us who grew up playing with computers, but watch someone figure it out for the first time. For a few years, I taught seniors the basics of computers and the internet at a library, and quickly learned that moving the mouse is not what they have trouble with — they’ve been able to push things around on a table since they were three. But they haven’t needed to learn how to click something with their index finger until now — and that’s what trips people up.
The three layers of intuitiveness all depend on each other. If someone is confused, you have to see which layer they’re tripping on. Are there enough cues? Is the metaphor wrong? Or are they missing the necessary skills? The farther down the problem goes, the harder it gets to fix.
Making something more legible is easy; you might make a button more dimensional, or make an arrow stand out more obviously against its background. But if you’re using a weak or inappropriate metaphor, that takes more work to address. And if skills are the problem, that’s the most expensive — you either have to substitute a skill users have already learned, or invest time and money up front to teach them the gesture. When the iPhone first came out, pinch-to-zoom was not something we thought was possible, so every Apple commercial showed a person making the gesture. Once you’ve seen and internalized it, it becomes easy.
When people use the word “immersive,” I don’t think they mean getting lost in a world. They really mean the ability to focus — to avoid distraction. But another word people use, which can be dangerous, is “cinematic.”
It’s easy to fall into the trap of imitating another form without understanding how it works. Take a funny review by FILM CRITIC HULK, which, despite being written in all caps, is a spot-on critique of Tom Hooper’s 2012 film adaptation of Les Misérables. In it, FILM CRITIC HULK describes how the director wants to portray grand and dramatic emotion, but instead of drawing upon sweeping, panning shots (the standard way to evoke these feelings), Hooper has Hugh Jackman (as Jean Valjean) stare into the camera and make the audience uncomfortable:
SO HE WANTED TO TAKE A SOULFUL MOVIE, RIFE WITH DRAMA AND TRAGEDY, TELLING A TRULY EPIC, CLASSIC STORY BOTH IN TERMS OF SCOPE AND POLITICS, A STORY THAT FEATURES AN EMOTIONAL PERSONAL JOURNEY SPANNING DECADES WITH ALL THE CHARACTERS SINGING SONGS ABOUT HOPE AND LONGING...
AND HE FILMED IT IN A WAY THAT CONVEYS CHAOS AND DISCORD…AND HE OVERUSED THE MOST POWERFUL TOOL OF CINEMATIC STORY CONTROL, CLOSE-UPS, BY DOING IT THE ENTIRE TIME, MEANWHILE EMPLOYING AN EQUAL METHOD THAT UNDOES THAT CLOSE-UP EFFECT BY HAVING THE CHARACTERS LOOK DIRECTLY AT THE CAMERA, WHICH HAS THE SOLE EFFECT OF BREAKING THE FOURTH WALL AND MAKING THE AUDIENCE UNCOMFORTABLE!?!??!?!?!?!?!?!?
I’ve witnessed similar mistakes happening on the web. A lot of sites are now using large, lavish layouts in order to feel “cinematic.” It breaks my heart to use Flickr’s updated design, which expands images to be as gigantic as possible. The original Flickr site was as much about the photo as it was about who took it, what camera and settings the photographer used, and what collections the photo belonged to. The new site pushes everything else down, and loses so much of what made those old pages rich.
Something I think is worth imitating is the four-hour film Historias extraordinarias, by Argentinian director Mariano Llinás. It’s narrated continuously, for nearly the entire time, and what’s onscreen is almost a backdrop to the primary element — the words. If you’re an English speaker, the movie has a density to it: you’re constantly switching between the primary text (the subtitles) and the background (the visuals).
From Jose-Luis Moctezuma:
Amazingly enough, over a span of four hours (and a fifteen minute break if you’re watching it at a movie theater) the voice-over never seems to get gimmicky (not to mention that the narrator at key moments in the film hands narrative duties over to other characters, sometimes peripheral, sometimes hidden). This may be a result of the narration’s curious sublimation of the events that transpire onscreen, paradoxical when considering the subversive effect the voice-over has on reducing the eminence of the plastic image in favor of the descriptive and textual imagination of the narrator. The experience is quite literally akin to that of reading a novel, and even more so for those who don’t understand Spanish, since they will have to read the relentless flow of subtitles that swim beneath the images. These qualities give Historias extraordinarias a complex structuring that is primarily textual, and secondarily visual, with a third layering composed of the plot’s intertextual and seemingly infinite multiplication.
Unlike the adage “show, don’t tell,” this film tries to do both. Likewise, in our work online, the design must negotiate between text and image, tipping the balance one way or another, according to what’s being said. Historias extraordinarias provides a particularly elegant solution, and serves as a starting point for new aspects of onscreen reading that I’d like to explore.
We know how to design on paper, but screens exist in a state of flux. They can change. Our old metaphors don’t necessarily hold up. So what are the mechanics of balancing all the elements in our content? What factors should be kept in mind?
I propose three materials to work with: attention, rhythm, and weight.
Much of attention, as in a film like Historias extraordinarias, is about distinguishing between what is foreground and what is background. The viewer can focus on different things onscreen, which act like frames for his or her attention. When you first come to a web page, your frame is the whole window, because your eyes are still looking around. But once you begin reading, the frame locks around that particular body of text.
When you’re reading, it’s not very calm. Your eyes are moving from word to word, bouncing around, but they stay inside the current frame. So if you’re reading and you glance over at some side notes, it doesn’t feel very distracting. But if something tries to catch your attention outside the current frame, your attention diminishes. This happens quite a bit online, when you’re trying to read: as you scroll, pictures appear from the side (swooping in with an animation, even) and completely wreck your attention. Much of the criticism of multimedia storytelling is that animations are too gratuitous and distracting, and should be minimal and subtle. But I don’t think that’s true — what jars a story has less to do with animation itself, and more to do with how many components appear in the wrong frame, and how often.
Think of attention as a limited resource that each reader brings to the page, that you must budget. Each person will start with a different amount, and a designer must be aware of how much they’re using: once attention disappears, so does the reader.
Rhythm is made up of two components. The first is pacing, which should be purposeful. Sometimes you want readers to stop and take a breath: examples include moving from one chapter to the next, or breaking from one major idea and switching to another. Media that takes up the whole width of the screen or has heavy animation is useful in these instances.
We can’t apply this treatment to every single image — not all of them warrant it. Sometimes an image is supplementary, sets the tone or atmosphere, or illustrates a point the text is making. So we need to expand our range of design elements; in addition to large images, we should have smaller ones that can chaperone the text and not break the reading flow.
Sequence is the other half of rhythm, and is well understood by any writer. It addresses questions like, what part goes where? And, how do you take the reader through all these different elements? As a designer, you have additional tools in your palette. Slideshows, inline video, block quotes, timelines, audio players — they can all be used in concert to guide the reader through the structure of a text.
Lastly, I’ll touch on “weight,” a factor that revolves around the interactive parts of a story. How do things feel as you click or tap on them? “Weight” is a term with a long history among animators and video game designers — they learned early on that objects onscreen could convey a convincing degree of mass and movement.
Here’s an excerpt from The Animator’s Survival Kit, which is worth digging in to if you want to understand all the tricks animators use to create visceral reactions in their audience. This image is about anticipation: you don’t just see a character pick up a heavy rock in one movement; the person must crouch down and bend his or her back. Even as the rock tenuously rises from the ground, it moves slowly, while the person strains as far back as humanly possible. You actually feel the weight in your gut: that’s how you know the rock is heavy.
This is why animators use easing curves — so that an animation doesn’t play at a constant speed from start to finish. Instead, motion speeds up or slows down at the beginning and end of its timeline — the moments when people pay the most attention.
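To make the idea concrete, here is a hypothetical sketch of a cubic ease-in-out curve — not code from any particular animation library. It maps linear time onto eased progress, so motion accelerates out of the start and decelerates into the end:

```typescript
// Cubic ease-in-out: maps normalized time t in [0, 1] to eased progress.
// Motion starts slow, accelerates through the middle, and slows at the end,
// the two moments where viewers pay the most attention.
function easeInOutCubic(t: number): number {
  return t < 0.5
    ? 4 * t * t * t                    // accelerate during the first half
    : 1 - Math.pow(-2 * t + 2, 3) / 2; // decelerate during the second half
}

// Applying the curve to a property, e.g. sliding a panel 300px over 400ms.
// The function, durations, and distances here are illustrative assumptions.
function panelOffset(elapsedMs: number, durationMs = 400, distancePx = 300): number {
  const t = Math.min(elapsedMs / durationMs, 1); // clamp to the end of the timeline
  return easeInOutCubic(t) * distancePx;
}
```

The same shape is what CSS exposes through `animation-timing-function: ease-in-out` and `cubic-bezier()`; writing it out by hand just makes the speeding-up and slowing-down visible.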
There’s also a close relationship between the notion of “feel” and the interfaces and controls of video games, because games constantly grapple with responsiveness and feedback to input. Consider Nuclear Throne, a game where you control a green alien moving around a post-apocalyptic world, tasked with fighting off hungry beasts. The following explanation by the game’s designers gives just a small taste of how many different things the team must account for: there’s sound, the speed of the bullet as your weapon fires away, and the shake and flash of the screen when something blows up. All of these factors add up to a singular gestalt that the player feels, and each of them has parameters that must be tuned in order for that action to feel right.
Hitting an enemy also creates that hit effect, plays that enemy’s own specific impact sounds (which is a mix of a material — meat, plant, rock or metal — getting hit and that character’s own hit sound), adds some motion to the enemy in the bullet’s direction (3 pixels per frame) and triggers their “get hit” animation. The get hit animation always starts with a frame white, then two frames of the character looking hit with big eyes. The game also freezes for about 10-20 milliseconds whenever you hit something.
This is just the basic shooting. So many more systems come in to play here. Enemies dying send out flying corpses that can damage other enemies, radiation flies out at just the right satisfying speed, etc. We could keep going on and on. It’s that attention to details and the relationships of all those systems that matter. You might miss an enemy and hit a radiation canister, forcing you to run into danger to grab all that exp before it expires, etc. It’s the mix of things that matters, not the things themselves. I guess what our games have is our view on what makes those values feel and play good. That’s the Vlambeer “feel.”
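That brief freeze on impact is often called “hit-stop,” and the idea is simple enough to sketch. This is a hypothetical illustration, not the designers’ actual code; the class name and the 15ms default are my own assumptions, echoing the 10-20 millisecond pause described in the quote above:

```typescript
// Hypothetical "hit-stop" helper: freezes the simulation for a few
// milliseconds on impact so a hit reads as weighty. The 15ms default is an
// assumption based on the 10-20ms pause the designers describe.
class HitStop {
  private freezeUntil = 0;

  // Call when a bullet connects; overlapping hits extend the freeze.
  trigger(nowMs: number, durationMs = 15): void {
    this.freezeUntil = Math.max(this.freezeUntil, nowMs + durationMs);
  }

  // The game loop advances by this delta: zero while frozen, normal otherwise.
  scaledDelta(nowMs: number, dtMs: number): number {
    return nowMs < this.freezeUntil ? 0 : dtMs;
  }
}
```

A few frames of stopped time is imperceptible as a pause, but the player feels it as contact — the same trick, at a smaller scale, that the heavy-rock example uses.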
Of course, lasers and alien invaders appear less frequently in our digital work, but the design issues are the same: What happens when someone interacts with something in the environment we’ve made? Do images slide slightly left and right as you move through a slideshow? How much impact should there be when a sidebar opens and closes? Can well-timed sounds — long neglected on the web but increasingly common on phones and tablets — tell users what’s happening without looking? What can screens convey viscerally?
These are all questions. They lie on the boundaries of our practice. They are broad and expansive, because they belong to more than any one particular form. And every time designers ape what they’ve seen before, without learning why it works, these stones stay unturned.
For better or worse, we live in a world of media invention. Instead of reusing a stable of forms over and over, it’s not much harder for us to create new ones. Our inventions make it possible to explore the secret shape of our subject material, to coax it into saying more.
These new forms won’t follow the rules of the scroll, the codex, or anything else that came before, but we can certainly learn from them. We can ask questions from a wide range of influences — film, animation, video games, and more. We can harvest what’s still ripe today, and break new ground when necessary.