Kevin Kelly: “Becoming Screen Literate”

by Michael Nielsen on November 23, 2008

There’s a great piece in the New York Times from Kevin Kelly. Here are a few quotes, but there’s much more, and I definitely recommend the whole thing:

Rewriting video can even become a kind of collective sport. Hundreds of thousands of passionate anime fans around the world (meeting online, of course) remix Japanese animated cartoons. They clip the cartoons into tiny pieces, some only a few frames long, then rearrange them with video editing software and give them new soundtracks and music, often with English dialogue. This probably involves far more work than was required to edit the original cartoon but far less work than editing a clip a decade ago. The new videos, called Anime Music Videos, tell completely new stories. The real achievement in this subculture is to win the Iron Editor challenge. Just as in the TV cookoff contest “Iron Chef,” the Iron Editor must remix videos in real time in front of an audience while competing with other editors to demonstrate superior visual literacy. The best editors can remix video as fast as you might type.

In fact, the habits of the mashup are borrowed from textual literacy. You cut and paste words on a page. You quote verbatim from an expert. You paraphrase a lovely expression. You add a layer of detail found elsewhere. You borrow the structure from one work to use as your own. You move frames around as if they were phrases.


On Google SketchUp’s 3D Warehouse, you can find insanely detailed three-dimensional virtual models of most major building structures of the world. Need a street in San Francisco? Here’s a filmable virtual set. With powerful search and specification tools, high-resolution clips of any bridge in the world can be circulated into the common visual dictionary for reuse. Out of these ready-made “words,” a film can be assembled, mashed up from readily available parts. The rich databases of component images form a new grammar for moving images.

After all, this is how authors work. We dip into a finite set of established words, called a dictionary, and reassemble these found words into articles, novels and poems that no one has ever seen before. The joy is recombining them. Indeed it is a rare author who is forced to invent new words. Even the greatest writers do their magic primarily by rearranging formerly used, commonly shared ones. What we do now with words, we’ll soon do with images.


But merely producing movies with ease is not enough for screen fluency, just as producing books with ease on Gutenberg’s press did not fully unleash text. Literacy also required a long list of innovations and techniques that permit ordinary readers and writers to manipulate text in ways that make it useful. For instance, quotation symbols make it simple to indicate where one has borrowed text from another writer. Once you have a large document, you need a table of contents to find your way through it. That requires page numbers. Somebody invented them (in the 13th century). Longer texts require an alphabetic index, devised by the Greeks and later developed for libraries of books. Footnotes, invented in about the 12th century, allow tangential information to be displayed outside the linear argument of the main text. And bibliographic citations (invented in the mid-1500s) enable scholars and skeptics to systematically consult sources. These days, of course, we have hyperlinks, which connect one piece of text to another, and tags, which categorize a selected word or phrase for later sorting.

All these inventions (and more) permit any literate person to cut and paste ideas, annotate them with her own thoughts, link them to related ideas, search through vast libraries of work, browse subjects quickly, resequence texts, refind material, quote experts and sample bits of beloved artists. These tools, more than just reading, are the foundations of literacy.

If text literacy meant being able to parse and manipulate texts, then the new screen fluency means being able to parse and manipulate moving images with the same ease. But so far, these “reader” tools of visuality have not made their way to the masses. For example, if I wanted to visually compare the recent spate of bank failures with similar events by referring you to the bank run in the classic movie “It’s a Wonderful Life,” there is no easy way to point to that scene with precision. (Which of several sequences did I mean, and which part of them?) I can do what I just did and mention the movie title. But even online I cannot link from this sentence to those “passages” in an online movie. We don’t have the equivalent of a hyperlink for film yet. With true screen fluency, I’d be able to cite specific frames of a film, or specific items in a frame. Perhaps I am a historian interested in oriental dress, and I want to refer to a fez worn by someone in the movie “Casablanca.” I should be able to refer to the fez itself (and not the head it is on) by linking to its image as it “moves” across many frames, just as I can easily link to a printed reference of the fez in text. Or even better, I’d like to annotate the fez in the film with other film clips of fezzes as references.

With full-blown visuality, I should be able to annotate any object, frame or scene in a motion picture with any other object, frame or motion-picture clip. I should be able to search the visual index of a film, or peruse a visual table of contents, or scan a visual abstract of its full length. But how do you do all these things? How can we browse a film the way we browse a book?

It took several hundred years for the consumer tools of text literacy to crystallize after the invention of printing, but the first visual-literacy tools are already emerging in research labs and on the margins of digital culture. Take, for example, the problem of browsing a feature-length movie. One way to scan a movie would be to super-fast-forward through the two hours in a few minutes. Another way would be to digest it into an abbreviated version in the way a theatrical-movie trailer might. Both these methods can compress the time from hours to minutes. But is there a way to reduce the contents of a movie into imagery that could be grasped quickly, as we might see in a table of contents for a book?

My excerpts concentrate on material related to remix culture, especially in the visual arts. An important subtheme of the piece, which my excerpts don’t capture so clearly, is the extent to which technology shapes human behaviour. The medium isn’t just the message; it shapes the entire culture.

