Wednesday, December 30, 2015

New paper: The vocabulary of manga

I'm happy to announce that my new article with co-author Sean Ehly, "The vocabulary of manga: Visual morphology in dialects of Japanese Visual Language" is now published in the Journal of Pragmatics!

This paper is especially exciting, because my co-author is a former student who did this research as his class project. It now joins previous publications stemming from projects from that class, with more on the way!

Sean wanted to investigate the "morphology" of the Japanese Visual Language that is used in manga: the graphic elements like bloody noses for lust or a giant sweat drop for anxiety. I had discussed some of these in my book, but Sean recognized that there were many that I missed. He listed over 70 of these elements related to emotion alone! In fact, as a resource to other researchers and fans, we've now compiled this "visual vocabulary" into a list:

Morphology in Japanese Visual Language

We don't consider it exhaustive, so if you think of others that should be added, please let us know!**

We then used this list to investigate how these elements are used in 20 different manga (10 shojo and 10 shonen), which amounted to over 5,000 panels coded across these books. Overall, we show that most of these "visual morphemes" appear in both types of books, though certain morphemes are more prevalent in one type or another. We take this as the first empirical evidence that there may be distinct "dialects" within a broader Japanese Visual Language, at least for this one dimension of structure.
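For anyone curious what this kind of genre comparison looks like in practice, here is a minimal Python sketch. It is only an illustration, not the analysis from the paper: the morpheme labels, the counts, the panel totals, and the helper `rate_per_panel` are all made-up placeholders. The idea is simply to compare how often each morpheme turns up per coded panel in each genre.

```python
from collections import Counter

# Hypothetical coded data: one entry per occurrence of a visual morpheme.
# These values are invented for illustration and are NOT from the study.
shojo_codes = ["sweat_drop", "sweat_drop", "nose_bubble", "bloody_nose"]
shonen_codes = ["bloody_nose", "bloody_nose", "sweat_drop"]
shojo_panels = 40   # assumed number of panels coded in shojo books
shonen_panels = 30  # assumed number of panels coded in shonen books


def rate_per_panel(codes, n_panels):
    """Return each morpheme's occurrence count divided by the panel count."""
    counts = Counter(codes)
    return {morpheme: count / n_panels for morpheme, count in counts.items()}


shojo_rates = rate_per_panel(shojo_codes, shojo_panels)
shonen_rates = rate_per_panel(shonen_codes, shonen_panels)

# Morphemes that appear in both genres (the "shared vocabulary").
shared = sorted(set(shojo_rates) & set(shonen_rates))

# Side-by-side rates; a morpheme absent from a genre gets a rate of 0.0.
for morpheme in sorted(set(shojo_rates) | set(shonen_rates)):
    print(morpheme, shojo_rates.get(morpheme, 0.0), shonen_rates.get(morpheme, 0.0))
```

Dividing by the number of panels matters because the two genres will rarely contribute the same number of coded panels, so raw counts alone would not be comparable.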

The paper is available along with all others at my Downloadable Papers page, and directly as a pdf. Here's the full abstract:

The visual representations of non-iconic elements in comics of the world often take diverse and interesting forms, such as how characters in Japanese manga get bloody noses when lustful or have bubbles grow out their noses when they sleep. We argue that these graphic schemas belong to a larger “visual vocabulary” of a “Japanese Visual Language” used in the visual narratives from Japan. Our study first described and categorized 73 conventionalized graphic schemas in Japanese manga, and we then used our classification system to seek preliminary evidence for differences in visual morphology between the genres of shonen manga (boys’ comics) and shojo manga (girls’ comics) through a corpus analysis of 20 books. Our results find that most of these graphic schemas recur in both genres of manga, and thereby provide support for the idea that there is a larger Japanese Visual Language that pervades across genres. However, we found different proportions of usage for particular schemas within each genre, which implies that each genre constitutes their own “dialect” within this broader system.

Cohn, Neil and Sean Ehly. 2016. The vocabulary of manga: Visual morphology in dialects of Japanese Visual Language. Journal of Pragmatics. 92: 17-29.

** Longtime followers of this site may remember that we attempted a similar listing for morphology across different visual languages based on a discussion on my now defunct forum over 10 years ago. Perhaps I'll have to create additional pages for other visual languages as well, now that we have ongoing corpus research underway...

Tuesday, November 24, 2015

New Paper: A multimodal parallel architecture

I'm excited to share that I now have a new article in the latest issue of Cognition: "A multimodal parallel architecture: A cognitive framework for multimodal interactions." This paper presents my overall model of language, and then uses it to explore different aspects of multimodal communication.

The key distinctions in this paper are about multimodal relations that must balance grammar in multiple domains. Many models of multimodal relations describe the various meaningful (i.e., semantic) interactions between modalities. This paper extends beyond these relationships to talk about how the dominance of meaning in one modality or another must negotiate grammatical structure in one or multiple modalities.

This paper has had a long journey... I first had many of these ideas way back in 2003, and they were part of an early draft on my website about multimodality called "Interactions and Interfaces." In 2010, I started reconsidering how to integrate the theory in the context of my mentor Ray Jackendoff's model of language—the parallel architecture. The component parts were always the same, but articulating them in this way allowed for a better grounding in a model of cognition, and in further elaborating on how these distinctions about multimodality fit within a broader architecture. I then tinkered with the manuscript on and off for another 5 years...

So, 12 years later, this paper is finally out! It pretty much lays out how I conceive of language and different modalities of language (verbal, signed, visual), not to mention their relationships. I suppose that makes it a pretty significant paper for me.

The paper can be found on my Downloadable Papers page, and a direct link (pdf) is here. Here's the full abstract:

Human communication is naturally multimodal, and substantial focus has examined the semantic correspondences in speech–gesture and text–image relationships. However, visual narratives, like those in comics, provide an interesting challenge to multimodal communication because the words and/or images can guide the overall meaning, and both modalities can appear in complicated “grammatical” sequences: sentences use a syntactic structure and sequential images use a narrative structure. These dual structures create complexity beyond those typically addressed by theories of multimodality where only a single form uses combinatorial structure, and also poses challenges for models of the linguistic system that focus on single modalities. This paper outlines a broad theoretical framework for multimodal interactions by expanding on Jackendoff’s (2002) parallel architecture for language. Multimodal interactions are characterized in terms of their component cognitive structures: whether a particular modality (verbal, bodily, visual) is present, whether it uses a grammatical structure (syntax, narrative), and whether it “dominates” the semantics of the overall expression. Altogether, this approach integrates multimodal interactions into an existing framework of language and cognition, and characterizes interactions between varying complexity in the verbal, bodily, and graphic domains. The resulting theoretical model presents an expanded consideration of the boundaries of the “linguistic” system and its involvement in multimodal interactions, with a framework that can benefit research on corpus analyses, experimentation, and the educational benefits of multimodality.

Cohn, Neil. 2015. A multimodal parallel architecture: A cognitive framework for multimodal interactions. Cognition 146: 304-323.

Monday, November 09, 2015

How to analyze comics with narrative grammar

Over the past several years, I've presented a lot of evidence that panel-to-panel "transitions" cannot account for how we understand sequences of images in visual narratives like comics. Rather, I've argued that narrative sequential images use a "narrative grammar" that assigns roles to panels, and then groups panels into hierarchic relationships.

Though there are many reasons panel transitions don't work to explain how we understand sequential images, one reason the theory may be attractive is that it is intuitive. A person can easily look at a sequence and assign transitions between panels, and it "feels" right because that matches one's conscious experience of reading a comic (though it is not very cognitively accurate).

In contrast, my theory of narrative grammar is fairly complex, and much harder to intuit. I think this is somewhat as it should be, though: there are a lot of complicated things going on in sequential images that people don't realize! However, this complexity means that people might have a hard time implementing the theory in practice.

SO... to help rectify this issue I've now written a "tutorial" that aims to explain the process that people should follow when analyzing a visual narrative sequence and attempting to implement this theory of narrative grammar.

You can download a pdf of the tutorial here, and it can also be found on my Downloadable Papers page and my Resources page.

The simple summary is that one cannot simply look at a sequence and assign labels to things. There is a series of procedures and diagnostics to use, and there is an order of operations that is optimal for arriving at an analysis. This is the same as most any linguistic theory, which usually requires instruction or dedicated learning in order to implement.

This tutorial is aimed at researchers (or anyone curious) who wish to implement this theory in practice and/or learn more about the underlying logic for how it works. It is also aimed at teachers who might wish to teach this theory in their classrooms, but may not know how to do it with competence.**

As you'll find in the tutorial, it only partly covers the basic principles of the theory. For these you should reference my papers and my book, The Visual Language of Comics. The tutorial can thus supplement these works for a full understanding and implementation of the theory.

**On this point, note: anyone who wants to learn how to do this, especially with the intent of putting it into practice in research or instruction, should feel free to contact me for more guidance and resources.

Monday, November 02, 2015

Dispelling emoji myths

In my recent BBC article and my blog posts about emoji, I have tried to explain how emoji are not an emerging language, but that they do serve important functions that resemble other limited communicative systems.

Having now poked around online quite a bit looking at what people say about emoji, I'm particularly struck by the repetition of a few myths. Since these misunderstandings crop up all over the place, I wanted to address them here...

1. Emoji are not like hieroglyphics

First off, many people have compared emoji to Egyptian hieroglyphics, either saying that they work exactly the same and/or that emoji are a "modern hieroglyphics."

This is simply not true, mostly because hieroglyphics were a full-blown writing system where each sign had a mapping to sound. Hieroglyphics are not "symbol systems" made up of pictures. To me, this seems like the kind of misperception that people who are only used to an alphabet have about other writing systems: "if each sign isn't a sound like a letter, it must be just about meanings!"

There are actually several ways that hieroglyphics operated as a writing system. Some signs did indeed mean what they represented. For example, the sign for "owl" looked like an owl, and was pronounced "m":

However, the use of "rebus" signs meant that those signs could also be used without that iconic meaning, and only would be used for their sound value (i.e., that owl sign would be used for many words using the sound "m," but not for its meaning of "owl").

From there, both of these types of signs could be combined into compound signs. For example, this combination takes the rebus of owl (using just the sound "m") and the sign for ear (using its meaning, but not pronunciation) for the word "to hear":

This type of compound used signs both for their meaning value and for their sound value. There are no compounds made up of two signs that just contribute to meaning—they always have some sound-based sign present. Hieroglyphics also sometimes use fairly abstract representations, and purely sound-based signs which vary based on the number of consonants they represent.
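To make that constraint concrete, here is a toy Python sketch. It is a deliberate simplification and not a real model of Egyptian writing: the `Sign` class, the `role` labels, and the `is_valid_compound` check are hypothetical names invented for this illustration, and the "eye" sign is just a placeholder for a second meaning-only sign. It encodes the single rule described above, that a compound always includes at least one sign used for its sound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    name: str
    role: str  # "sound" = used for its pronunciation, "meaning" = used for what it depicts


def is_valid_compound(signs):
    """A compound needs at least one sign contributing sound (illustrative rule)."""
    return any(sign.role == "sound" for sign in signs)


owl_rebus = Sign("owl", "sound")  # rebus use: contributes only the sound "m"
ear = Sign("ear", "meaning")      # contributes its meaning, not its pronunciation
eye = Sign("eye", "meaning")      # placeholder meaning-only sign for the contrast case

to_hear = [owl_rebus, ear]
print(is_valid_compound(to_hear))     # True: sound + meaning
print(is_valid_compound([ear, eye]))  # False: meaning + meaning is not attested
```

An emoji string has no equivalent of the "sound" role at all, which is exactly where the comparison with hieroglyphics falls apart.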

In sum, unlike the purely imagistic meanings found in emoji, hieroglyphics are a fully functioning writing system that is intrinsically tied to the Egyptian language. This is totally different from emoji in context also because the imagistic emoji accompany a separate writing system (for English speakers, the alphabet). In the case of hieroglyphics, they are the writing system.

I'll note also, these same things apply to Chinese characters. Though they work a little differently than hieroglyphics, the same basic principles apply: it's a writing system tied to the sounds of languages, not a series of pictures that only have imagistic meaning.

2. There is no such thing as a universal language

I have seen many people claim that one of the exciting things about emoji is their quality of transcending spoken languages to be a "universal language." This is also hogwash, for many reasons. No language is universal, whether verbal, signed, or visual. Here are several reasons why images (including emoji) are not, and cannot be, universal:

How they convey meaning

Just because images may be iconic (they look like what they represent) does not mean that they are culturally universal. Even simple things like the way people dress do not translate across cultures, not to mention variation in facial expressions or, even more dramatically, fully conventionalized meanings like giant sweat drops to convey anxiety. Note that, since they were originally created in Japan, many emoji are already culturally specific in ways that do not translate well outside Japan.

This is not to mention the limitations of emoji that I discussed in my BBC article, such as that they rely on a non-producible vocabulary that does not allow the easy creation of new signs, and their sequences maintain a simple system characteristic of impoverished grammar. In other words, they are highly limited in what they can express, even as a graphic system.

Cultural exposure

We also know that images are not universal because a host of studies have shown that people who do not have cultural exposure to images often have difficulty understanding the meanings of images. Such deficits were widely investigated in the 1970s and 1980s under the umbrella of "visual literacy." Here's how I summarized one such study, by Fussell & Haaland (1978), which examined individuals from Nepal:

As described in the paper, the individuals tested "had significant deficits understanding many aspects of single images, even images of faces with simple emotions in cartoony styles (happy - 33%, sad - 60%). They had even more difficulty related to actions (only 3% understood an image trying to convey information about drinking boiled water). Some respondents had radically different interpretations of images. For example, a fairly simple cartoony image of a pregnant woman was interpreted as a woman by 75%, but 11% thought it was a man, and others responded that it was a cow, rabbit, bird, or other varied interpretations. They also had difficulty understanding when images cut off parts of individuals by the framing of a panel (such as images of only hands holding different ingredients)."

Such findings are not rare. For example, Research into Illustration by Evelyn Goldsmith summarizes several studies along these lines. Bottom line: Understanding drawings requires exposure to a graphic system, just like learning a language.

There is not just one visual language

Most discussion of the universality of images focuses on how they are comprehended. But, this overlooks the fact that someone also had to create those images, and images vary widely in their patterns across the world.

That is, as I argue in my book, there is not just one visual language, but rather there are many visual languages in the world too. There's a reason why the "style" of American superhero comics differs from Japanese manga or French bande dessinée or instruction manuals, etc. Drawing "styles" reflect the patterns of graphic information stored in the minds of those who create them. These patterns vary across the world, both within and between cultures.

This happens because people are different and brains are not perfect. There will always be variation and change within what is perceived to be a coherent system. This is in part because any given language is actually a socio-cultural label applied to the system(s) used by individual people. There is no "English" floating out in the ether to which we all link up. Rather, "English" is created by the similarities of patterning between the languages spoken by many people.

Indeed, though many who speak "English" can communicate in mutually intelligible ways, there are hundreds of sub-varieties of "English" with variations that range from subtle (slight accents) to dramatic (changing vocabulary and grammar), across geographic, cultural, and generational dimensions.

Similarly, even if there were to be a universal language, be it spoken or visual, sub-varieties would emerge based on who is using the system and how they do it. Just because images are typically iconic does not mean that they are transparent and outside of cognitive/cultural patterns.

Emoji in part exemplify this facade that a language is external to the patterns in people's minds, since the vocabulary is provided by tech companies and does not directly emerge from people's creations. Someone (the Unicode Consortium) decides which emoji can be used, and then makes them available. This is the opposite of how actual languages work, as manifestations of similarities between cognitive structures across speakers.

In sum, drawings are not universal because drawings differ based on the cultural "visual languages" that result from people using different patterns across the world.

Tuesday, October 27, 2015

New Paper: Narrative Conjunction's Junction Function

I'm excited to announce that my new paper, "Narrative Conjunction's Junction Function" is now out in the Journal of Pragmatics! This is the first major theoretical paper I've had in a long time, and it goes into extensive detail about several aspects of my theory of how narrative image sequences are comprehended, Visual Narrative Grammar.

The main topic of this paper is "conjunction" which is when multiple panels are grouped together and play the same role in a sequence. I argue that this narrative pattern is mapped to meaning in several different ways. In addition to these arguments, the paper provides a fairly extensive treatment of the basics of my narrative theory along with the underlying logic it is guided by (i.e., diagnostic tests).

You can find the paper here (pdf) or along with my other downloadable papers. Here's the full abstract:


While simple visual narratives may depict characters engaged in events across sequential images, additional complexity appears when modulating the framing of that information within an image or film shot. For example, when two images each show a character at the same narrative state, a viewer infers that they belong to a broader spatial environment. This paper argues that these framings involve a type of “conjunction,” whereby a constituent conjoins images sharing a common narrative role in a sequence. Situated within the parallel architecture of Visual Narrative Grammar, which posits a division between narrative structure and semantics, this narrative conjunction schema interfaces with semantics in a variety of ways. Conjunction can thus map to the inference of a spatial environment or an individual character, the repetition or parts of actions, or disparate elements of semantic associative networks. Altogether, this approach provides a theoretical architecture that allows for numerous levels of abstraction and complexity across several phenomena in visual narratives.

Cohn, Neil. 2015. "Narrative conjunction’s junction function: The interface of narrative grammar and semantics in sequential images." Journal of Pragmatics 88:105-132. doi: 10.1016/j.pragma.2015.09.001.

Tuesday, October 13, 2015

Emoji and visual languages

I'm excited that my recent article on the BBC website about emoji has gotten such a good response. So, I figured I'd write an addendum here on my blog to expand on things I couldn't get a chance to write in the article. I of course had a lot to say in that article, and it was inevitable that not everything could be included.

The overall question I was addressing was, "are emoji a visual language?" or "could emoji become a visual language?" My answer to both of these is "no."

Here's a quick breakdown of why, which I say in the article:

1. Emoji have a limited vocabulary set that is made of whole-unit pieces, and that vocabulary has no internal structure (i.e., you can't adjust the mouth of the faces while keeping other parts constant, or change the heads on bodies, or change the position of arms)

2. Emoji force these stripped-down units into unit-unit sequences, which just isn't how drawings work to communicate. (More on this below)

3. Emoji use a limited grammatical system, mostly using the "agent before act" heuristic found across impoverished communication systems.

All of these things limit emoji from being able to communicate like actual languages. Plus, these also limit emoji from communicating like actual drawings that are not mediated by a technological interface.

There are two addendums I'd like to offer here.

First, these limitations are not just constrained to emoji. They are limitations of every so-called "pictogram language," which are usually created to be "universal" across spoken languages. Here, the biggest problem is in believing that graphic information works the way that writing does: putting individual units, each of which has a "single meaning," into a unit-unit sequence.

However, drawings don't work this way to communicate. There are certainly ways to put images in sequence, such as what is found in the visual language of comics. The nature of this sequencing has been my primary topic of research for about 15 years. When images are put into sequence, they have characteristics unlike any of the ways that are used in these "writing imitative" pictogram sequences.

For example, actual visual language grammars typically depict events across the image sequence. This requires the repetition of the same information in one image as in the other, only slightly modified to show a change in state. Consider this emoji sequence:

This can either be seven different monkeys, or it can be one monkey at seven different points in time (and the recognition of this difference requires at least some cultural learning). Visual language grammars allow for both options. Note though that it doesn't parcel out the monkey as separate from the actions. It does not read "monkey, cover eyes" and then "monkey, cover mouth" etc. where the non-action monkey just gives object information and the subsequent one just gives action information. Rather, both object and event information are contained in the same unit.
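If it helps to see that contrast spelled out, here is a small Python sketch. It is purely illustrative: the `Panel` class and the `to_pictogram_units` helper are hypothetical names made up for this post, and I've used just two panels to keep it short. Each panel in a visual sequence bundles the object and the event into one unit, while a "writing imitative" pictogram sequence would have to split them into separate units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Panel:
    entity: str  # who or what is depicted
    action: str  # what that entity is doing in this same image


# Visual-language style: each unit carries object AND event information together.
visual_sequence = [
    Panel("monkey", "cover eyes"),
    Panel("monkey", "cover mouth"),
]


def to_pictogram_units(panels):
    """Split each panel into separate 'object' and 'action' units (illustrative)."""
    units = []
    for panel in panels:
        units.append(panel.entity)
        units.append(panel.action)
    return units


pictogram_sequence = to_pictogram_units(visual_sequence)
print(len(visual_sequence))  # 2 units, each combining object and event
print(pictogram_sequence)    # ['monkey', 'cover eyes', 'monkey', 'cover mouth']
```

The split version needs twice as many units to say the same thing, and none of those units works as a drawing of an event on its own.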

So, what I'm saying is that the natural inclination for grammars in the visual form is not like the grammars that operate in the verbal or written form. They just don't work the same, and pushing graphics to try to work in this way will never work, because it goes against the way in which our brains have been built to deal with graphic information.

Again: No system that strips down graphics into isolated meanings and puts them in a sequence will ever communicate on par with actual languages. Nor will it actually communicate the way that actual visual languages do...

And this is my second point: There are already visual languages in the world that operate as natural languages that don't have the limitations of emoji.

As I describe in my book, The Visual Language of Comics, the structure of drawing naturally is built like other linguistic systems. It becomes a "full" visual language when a drawing system is shared across individuals (not a single person's style) and has 1) a large visual vocabulary that can create new and unique forms, and 2) vocabulary items that can be put into sequences with underlying hierarchic structure.

This structure often becomes the most complex and robust in the visual languages used in comics, but we find complex visual languages in other places too. For example, in my book I devote a whole chapter to the sand drawings of Australian Aboriginals, which is a full visual language far outside the context of comics (and is used in real-time interactive communicative exchanges). But, whether a drawing system becomes a full visual language or not, the basis for those parts is similar to other linguistic systems that are spoken or signed.

The point here is this: emoji are not a visual language, and can never be one because of the intrinsic limitations on the way that they are built. Drawings don't work like writing, and they never will.

However, the counter point is this: we already have visual languages out in the world—we just haven't been using them in ways that "feel" like language.

... yet.

Monday, September 28, 2015

New paper: Getting a cue before getting a clue

It seems the last few months on this blog have been all about inference generation... I'm happy to say this post is no exception! I'm excited to announce that I have a new paper out in the journal Neuropsychologia entitled "Getting a cue before getting a clue: Event-related potentials to inference in visual narrative comprehension."

This paper examines the brain response to the generation of inference in a particular narrative construction in comics. As far as I know, it's the first neuroscience paper to examine inference specifically in visual narratives. Specifically, our analysis focused on comparing sequences like these:

The top sequence (a) is from an actual Peanuts strip. What is key here is that you never see the main event of the sequence: Linus retrieving the ball. In my narrative structure, this "climactic" state would be called a "Peak." Instead, the ambiguous image of Charlie watching hides this event, and that panel is more characteristic of a "Prolongation" that extends the narrative further without much action.

Contrast this with (b), which has a structure that also appears in several Peanuts strips. Here, the third panel also does not show the main event (the same event as in "a"), but the exclamation mark implies that at least some event is happening. In my narrative structure, this cue is enough to tell you that this panel is the climax, despite not showing you what the climax is.

We were curious then if the brain distinguishes between these types of sequences which both should require inference (indeed, the same inference) but differ in their narrative structure (spoiler: it does!). You can read a full pdf of the paper here. Here's the full abstract and reference:


Inference has long been emphasized in the comprehension of verbal and visual narratives. Here, we measured event-related brain potentials to visual sequences designed to elicit inferential processing. In Impoverished sequences, an expressionless “onlooker” watches an undepicted event (e.g., person throws a ball for a dog, then watches the dog chase it) just prior to a surprising finale (e.g., someone else returns the ball), which should lead to an inference (i.e., the different person retrieved the ball). Implied sequences alter this narrative structure by adding visual cues to the critical panel such as a surprised facial expression to the onlooker implying they saw an unexpected, albeit undepicted, event. In contrast, Expected sequences show a predictable, but then confounded, event (i.e., dog retrieves ball, then different person returns it), and Explicit sequences depict the unexpected event (i.e., different person retrieves then returns ball). At the critical penultimate panel, sequences representing depicted events (Explicit, Expected) elicited a larger posterior positivity (P600) than the relatively passive events of an onlooker (Impoverished, Implied), though Implied sequences were slightly more positive than Impoverished sequences. At the subsequent and final panel, a posterior positivity (P600) was greater to images in Impoverished sequences than those in Explicit and Implied sequences, which did not differ. In addition, both sequence types requiring inference (Implied, Impoverished) elicited a larger frontal negativity than those explicitly depicting events (Expected, Explicit). These results show that neural processing differs for visual narratives omitting events versus those depicting events, and that the presence of subtle visual cues can modulate such effects presumably by altering narrative structure.

Cohn, Neil, and Marta Kutas. 2015. Getting a cue before getting a clue: Event-related potentials to inference in visual narrative comprehension. Neuropsychologia 77:267-278. doi: 10.1016/j.neuropsychologia.2015.08.026.

Tuesday, September 01, 2015

Inference generating comic panels

Since my last post discussed my new paper on action stars, I thought it would be worth doing a refresher on these types of panels in the visual language of comics. "Action stars" are a type of panel that replaces a primary action of a sequence with a star-shaped flash, which on its own usually represents an impact. In the case of action stars, this representation is blown up so large that it encompasses the whole panel, as in the third panel here:

Interestingly, the "star shaped flash" of action stars does not necessarily just convey an impact: my study has shown that it seems to generalize to lots of events even without an impact. One reason might be that the "star shaped flash" representation is also the typical way to represent the "carrier" of sound effects. Sound effects like "Pow!" do typically (but not always) accompany action stars. So, this representation is technically polysemous between impacts and loud sounds (the same physical representation can have multiple meanings), and in the case of action stars it is a little ambiguous.

The key thing I want to focus on here though is that action stars replace the primary actions of the sequence, and thus cause those events to be inferred. In the example above, you infer that Snoopy is run over by the boys playing football, though you don't see it. This doesn't happen "in between the images," but happens at the action star itself, though you don't know what that event is until the next panel.

I discuss these types of "replacing panels" ("suppletive" in linguistic parlance) quite a bit in my book, The Visual Language of Comics, where I pointed out that not all images can work in this way. For example, the "fight cloud" in (b) does work effectively to replace panels—here meaning specifically a fight, not a general action like action stars. But, not all panels can play this "replacing" role. For example, using a heart to replace a whole panel doesn't work as well (c), even when it's used in a context where it could be possible (d):

So, not all elements can replace actions in panels. Recently, I stumbled on another one though in the comic Rickety Stitch and the Gelatinous Goo where an imp uses magic to transform into looking like a gnome:

Again, a full panel here does not depict the action, but replaces the event, leaving it to be inferred. In this case, the "poof cloud" provides a particularly useful covering for avoiding the representation of the physical transformation (which might be a pain to draw). Instead, this transformation is left to the audience's imagination.

In many cases, the onomatopoeia is not needed for these replacement panels, and I've found examples both with and without text. Similar replacement seems to occur without the visual language of the images (clouds, stars), and with the onomatopoeia alone, as in this Calvin and Hobbes strip:

So, onomatopoeia and these replacing panels can go together, or separately. In all though we seem to have an overarching technique for "replacing actions" with visual and/or verbal information which causes inference for the missing information. In the case of the visual information, it seems we have at least three systematic usages: action stars, fight clouds, and poof clouds. Perhaps there are more?

Wednesday, August 26, 2015

New paper: Action starring narratives and events

Waaaay back in 2008 I first posted about a phenomenon in comics that I called an "action star", such as the third panel in this sequence:

I argued that these panels force a reader to make an inference about the missing information (in this case Snoopy getting hit by football players), and that these images also play a narrative role in the sequence—they are narrative climaxes. Because this inference omits information within this panel, it is different than the type of "closure" proposed by McCloud to take place between the panels. Rather, you need to get to the last panel to figure out what happened in the one prior, not what happens between panels 3 and 4.

So, to test this back 7 years ago, I ran a few experiments...

At long last, those studies are now published in my new paper, "Action starring narratives and events" in the Journal of Cognitive Psychology. Though McCloud placed inference as one of the most important parts of sequential image understanding over 20 years ago, and this has been stressed in almost all theories of comics, this is one of the first papers to explore inference with actual experiments. I know of a few more papers that will be following too, both by me and others. Exciting!

You can find the paper along with all of my other downloadable papers, or you can check it out directly here (pdf).

Here's the full abstract:

Studies of discourse have long placed focus on the inference generated by information that is not overtly expressed, and theories of visual narrative comprehension similarly focused on the inference generated between juxtaposed panels. Within the visual language of comics, star-shaped “flashes” commonly signify impacts, but can be enlarged to the size of a whole panel that can omit all other representational information. These “action star” panels depict a narrative culmination (a “Peak”), but have content which readers must infer, thereby posing a challenge to theories of inference generation in visual narratives that focus only on the semantic changes between juxtaposed images. This paper shows that action stars demand more inference than depicted events, and that they are more coherent in narrative sequences than scrambled sequences (Experiment 1). In addition, action stars play a felicitous narrative role in the sequence (Experiment 2). Together, these results suggest that visual narratives use conventionalized depictions that demand the generation of inferences while retaining narrative coherence of a visual sequence.

Cohn, Neil, and Eva Wittenberg. 2015. Action starring narratives and events: Structure and inference in visual narrative comprehension. Journal of Cognitive Psychology.

Wednesday, August 19, 2015

Comicology conference in Japan

For anyone who might happen to be in Japan at the end of next month, I'll be speaking at Kyoto Seika University's upcoming conference, Comicology: Probing Practical Scholarship from September 25-27th. The conference will be hosted by the Kyoto International Manga Museum, and there's an impressive lineup of speakers, so it should be a great time.

You can find more information online here (link in Japanese... looks like their English site hasn't been updated with it yet), though you can email for information here.

Here's the official poster (right click on it to check out a larger version):

I'll actually be giving a few talks and workshops while I'm in Japan, both in Tokyo and Kyoto. Most are by invitation only, but you can email me if you're interested in learning more. My talk as part of the Comicology conference will be on Saturday the 26th.

I'm very excited to meet many of the other speakers, and it will especially be nice to see Natsume Fusanosuke again, given the great time I spent with him the last time I spoke in Japan.

(Interesting tidbit: yes, ニール•コーン is the standard way to write my name in katakana, though when I was living in Japan I started using my chosen kanji of 公安寝留. If you read kanji, it might help to know there's a little Buddhist joke in it, a remnant of my undergrad studies. I did that in part because my last name is how you spell "corn" in Japanese. I still use my hanko stamp with the kanji, and I used to have it on my business card up until just this year).

Tuesday, August 04, 2015

Comic reading fluency

At my ComicCon panel, someone asked me whether I have a measure for comic reading experience. Indeed, I do! I've been using the Visual Language Fluency Index (VLFI) score, which is computed by asking participants to self-rate how often they read various types of comics, how often they draw comics, and their expertise at reading and drawing comics. For those doing research with comics and visual narratives, this measure can be downloaded from my Resources page, along with full documentation and files for computing it.

I've used this measure across many studies now, and we often find that aspects of comprehension related to the visual language of comics correlate with participants' VLFI scores. That is, this appears to be a decent measure of proficiency that can correlate with ratings, reaction times, and even brainwave amplitudes to show differences based on participants' "fluency" in this structure.

Given this, I got to thinking... I wonder if this data could tell us something interesting about comic readers in general? So, I spent the other day combining VLFI data from over 23 experiments that we've now done on comics over the past 8 years, which amounted to over 1000 participants. Here are some interesting things that came out...

First, VLFI scores also correlate with people's habits for reading books, watching movies, and watching cartoons. So, more proficient comic readers also consume other types of media in greater quantities (shocking, I know!). 

The average age for people to start reading comics was 8.4, with the average age of drawing comics at 9.8. These numbers are a little after when children start being able to comprehend sequential images (roughly 5 years old), so these make sense given the developmental literature.

The VLFI scores correlated with the age of participants, suggesting that people read more comics, and become more proficient at understanding them, as they age. However, an additional correlation suggested that higher VLFI scores occur for people who start reading comics at younger ages. So, proficiency benefits from starting earlier in life. These findings also echo the developmental literature.

I'm sure there are additional things we can suss out of this data, especially when incorporating the things we actually measured in these experiments. These seem to be some interesting findings to start though.

Monday, July 20, 2015

Comic-Con 2015: The Scientific Study of The Visual Language of Comics

Thanks to my good friend Dan Christensen, I here present my talk from Comic-Con 2015, "The Scientific Study of The Visual Language of Comics." This was my introductory talk for a panel that consisted of three undergraduates who had been working with me, and it provides an overview of how I believe that research on comics (or rather, the visual language used in comics) should progress. Alas, note there are a few blips where the video jumps due to some technical difficulties, but only a few seconds are lost.

For those wishing to follow up on things mentioned in the video... Note that my book is available here, and the experiments/diagnostics discussed are available in papers here. In particular, for more advocacy of methods of scholarship about comics, I recommend the paper "Building a better comic theory" (pdf). My experiment about page layouts is in "Navigating Comics II" (pdf). Prior works using corpus analyses are in the "Cross-cultural" papers section.


Tuesday, July 14, 2015

ComicCon 2015 postscript

As always, I had a great time at ComicCon this year! I thought my panel on "The scientific study of the visual language of comics" went great, and I greatly enjoyed talking with everyone who came by the booth to chat with me and check out my book. I am also quite thankful to Reading with Pictures for kindly hosting me at their booth!

In my panel, I gave an overview of how to do "scientific studies" of comics, taking the hard line that this is the only way we can truly get ahold of how the visual language used in comics is structured and comprehended. Much of it echoed the argument from my "Building a better comic theory" paper, along with hints from my introduction to my upcoming book, The Visual Narrative Reader.

After this, three of my students each presented their own projects, including an experiment about hierarchy in page layout (Barak Tzori), and coding studies examining how American superhero comics have changed from the 1940s to the present with regards to page layout (Kaitlin Pederson) and text-image interactions (Ryan Taylor).

They totally crushed it! One way I could tell that their presentations were well-received was that most of the questions were directed at them, not me. I was super proud of them, and may try to make this a regular thing at ComicCon.

We attempted to record the presentation, so I'm looking into ways to put at least parts of it, if not the whole thing, online for all to see. More to come!

Monday, June 15, 2015

The non-universality of cartoony images and comics

There are many who assume that cartoony images and the ability to understand sequential images are universal. The 1978 paper "Communicating with Pictures in Nepal" by Fussell and Haaland reports on a study exploring these issues...

This study examined the understanding of images by individuals in Nepal. The researchers desired to communicate things related to nutrition, hygiene, reforestation, water supply construction, etc. as part of a UNICEF effort and assumed that wordless pictures would be an effective method. They therefore carried out these studies as a way to confirm that these intuitions were true by presenting 410 Nepalese individuals with drawings and asking for their interpretations. They quickly found that assumptions of universality were wrong.

First they report how Nepalese individuals understood different visual styles by asking them to interpret different types of images (for example, of people, like the image to the right). They tested photos, "blockout" photos with backgrounds omitted, highly detailed line drawings, semi-detailed line drawings with no shading, silhouettes, and cartoony schematic figures. They found that the content of different visual styles was recognized at very different rates. People "accurately" recognized the content of cartoony (stick figure-esque) styles the worst (49%), while blockout photos without backgrounds (67%) and highly detailed line drawings (79%) were the best.

They had significant deficits understanding many aspects of single images, even images of faces with simple emotions in cartoony styles (happy - 33%, sad - 60%). They had even more difficulty with actions (only 3% understood an image trying to convey information about drinking boiled water). Some respondents had radically different interpretations of images. For example, a fairly simple cartoony image of a pregnant woman was interpreted as a woman by 75%, but 11% thought it was a man, and others responded that it was a cow, rabbit, bird, or other varied interpretations. They also had difficulty understanding when images cut off parts of individuals by the framing of a panel (such as images of only hands holding different ingredients).

It's worth noting that when looking at the images in the paper, they did not seem overly "poorly drawn" or ambiguous, as the image above shows. By American standards, they were fairly straightforward and drawn in a simple but clear manner. So, it's not just that they were "bad drawings."

Sequences of images fared no better. A two-image comparison of a mother bottle-feeding vs. breastfeeding children was only recognized by 19% as being about bottle-feeding at all, while only 3% recognized that the image pair was making a comparison.

They also had no assumptions about a left to right (or reversed) reading order, with fewer than 50% going in this intended order. With a 3-image sequence, some even started in the middle. Even among those who read them in order, most did not understand the connections between images. The authors note that individuals' degree of literacy was highly predictive of linear reading orders (though it's unknown whether they could also understand those connections).

They state their biggest lesson echoed "Alan Holmes after a study carried out in Kenya in 1961-2: 'It is never safe to act on assumptions as to what people will or will not understand visually without first testing the assumptions.'"

The remainder of the paper discusses efforts to improve instructional papers based on feedback from Nepalese.

In all, these findings are similar to others showing that cartoony images and sequential image understanding are not "universal" without exposure to an external cultural system. From the visual language perspective, these results are expected: one must have exposure and practice to a visual language—just like any other language—in order to understand it.

Fussell, Diana & Ane Haaland. (1978). Communicating with Pictures in Nepal: Results of Practical Study Used in Visual Education. Educational Broadcasting International, 11 (1), 25-31.

Tuesday, May 19, 2015

Review: Unflattening by Nick Sousanis

Nick Sousanis’ recent book Unflattening has been receiving praise for its freeing message and artistic execution. The book was Sousanis’ doctoral dissertation, and in being a graphic work, it thus embodies its message of attempting to break through the confines of the "flatlands" of received viewpoints that unconsciously pervade the ways we see the world. This advocacy specifically targets the hegemony of words, countering it with the freeing power of images, particularly through “comics.”

In these regards, the book deserves all the praise it has received. The message is a good one, and the execution is wonderful and sensory. It is a pleasure to read, and nicely balances the artistic and expository, accessible and academic. Clearly, Sousanis knows his craft and carries it out effectively. It also succeeds quite well as just an example of graphic non-fiction, especially without relying on a "narrator character" like McCloud (and included). For these reasons alone the book is worth reading.

For me, the book resonated with my personal experience, but in doing so also betrayed some limitations to its own core message by upholding its own flatlands throughout. I'll spend the rest of this review focusing on these.

The book's advocacy to look beyond the limited viewpoints that one holds unconsciously is a great message. In truth, at many times I felt it was speaking directly about my work: In attempting to teach about the theory of visual language, I often feel like Sousanis' referenced example of the sphere telling squares that there is something beyond the flat 2D world as they understand it. As stated in the introduction to my book, I offer an alternative view of the world related to language, drawing, and “comics,” and am trying to teach others to see through my eyes.

For example, I cringe whenever I hear someone summarize that I believe “comics are a language,” because it is the exact opposite of what I actually believe. And, it shows that they did not fully understand the alternative viewpoint I offer, but rather pushed my spherical message into the square holes they already held.

It was in relation to my research, though, that my few criticisms arose...

First, despite the book's argument for an interdependence between text and image in their expressive capacities, I found it ironic that not until late in the book does the text stop leading this dance, and then not for long. The images throughout are certainly not negligible, but they mostly enhance, supplement, and enrich the message beyond what text could serve on its own. The text still provides the primary weight of meaning throughout. Until the last three chapters, there are rarely portions where the images would remain comprehensible if the text were omitted entirely, whereas the text could often stand without the images. However, I would expect this from a more expository and academic work like this, so it's less of a criticism than an observation.

Second, it was also curious that in its discussion of “comics,” Unflattening mostly retained what I would say is the “flatland” viewpoint of what they “are” and how they work. To those unfamiliar with comics (and/or their scholarship), this message might be revelatory, but it largely repeats the “party line” of standard beliefs about “comics” and drawing:

Comics "is" a medium that transcends cultural boundaries, historical periods, etc.

While language is a channel that constrains thought, drawings are freeing, because they encroach on “perception.” Drawings thus get closer to actual pure thought in a unique and individualized way that is not constrained by the memorized patterns of language.

These are the standard perspectives upheld throughout most theories of “comics” and “visual communication,” and Sousanis’ references support these views.

Yet, my work argues that all of these viewpoints are misplaced, and uphold a “flatland” of their own that does not square with our actual cognition or with much of the research on drawing. I would argue that my theory of visual language provides the “upwards not northwards” that Sousanis strives for in this regard...

Drawings are not a siphon for our perception; they reflect entrenched and learned cognitive patterns stored in our memories just as much as language, because they are built just like language. For example, drawing is not about re-presenting perception, but about learning and producing patterned graphic schemas in order to express our concepts. If you don't learn enough patterns, you won't draw proficiently. This makes drawing less like perception, and more like language: both involve stored information in memory. As I argue in my paper “Framing ‘I can’t draw,’” the assumptions about "drawing as tied to perception" actually limit people’s ability to learn to draw, and thereby limit their ability to carry out the mission Sousanis advocates.

Along these lines, Sousanis nicely evokes Lakoff and Johnson’s ideas of conceptual metaphor and Fauconnier and Turner’s ideas of conceptual blending in language. Yet, he does not mention that drawings also use these same conceptualizations (see for example work by Forceville, who will have a summary chapter in my upcoming book).

Furthermore, contrary to the received wisdom, I argue that "comics" "is" not a medium, but rather "comics" are a social construct in which we use two methods of communication: writing (a convergence of spoken language into the visual) and drawing (natural visual language). The union of these things is not “comics” but is our default capacity for multimodal expression, which often happens to appear in a social construct we call “comics,” but not always. As Horrocks pointed out, McCloud and others conflated the notion of (sequential) drawing and/or writing into being "comics", a definition constrained by their own flatlands.

Whether you buy into my theories or not, by collapsing these independent but intertwined facets of expression (writing, drawing) into a single construct (“comics”), Sousanis betrays his own mission of attempting to break apart limiting conceptualizations. As a unified thing tied to its stereotypes, it becomes constrained, rather than freed by the unlimited potential of just writing and drawing unbound to such a social construct.

Thus, overall I do recommend the book and wholeheartedly agree with Sousanis’s messages, especially for using text (written/spoken language) and image (visual language) in concert. Indeed I would go further to say that this is our default cognitive human orientation for expression (plus gesture/sign language). However, in its framing of this position, Unflattening retains a “flatland” of its own, the very thing it strives to overcome, which perhaps in part further speaks to the overall message that escaping one’s frames of knowledge is harder than we might realize.

Thursday, March 12, 2015

New paper on comic page layouts

I'm excited to announce that my paper, "Navigating Comics II" on people's preferences for moving from panel-to-panel in comic page layouts is now published in the latest issue of Applied Cognitive Psychology! This project was undertaken by my student (and co-author), Hannah Campbell, for my course on the Cognition of Comics at UC San Diego.

This project is a follow-up to my previous study looking at participants' preferences for how to navigate through comic page layouts, also discussed in my book on the visual language of comics. While we tested several different features of page layouts, here's a graphic version of our most interesting finding:

You can find the paper at my Downloadable Papers page, or directly here (pdf).


Although readers typically believe that comic page layouts should be read following the left to right and down ‘Z-path’ inherited from written language, several spatial arrangements can push readers to deviate from this order. These manipulations include separating panels from each other, overlapping one panel onto another, and using a long vertical panel to the right of a vertical column to ‘block’ a horizontal row. We asked participants to order empty panels in comic page layouts that manipulated these factors. All manipulations caused participants to deviate from the conventional Z-path, and this departure was modulated by incremental changes to spatial arrangements: The more layouts deviated from a grid, the less likely participants were to use the Z-path. Overall, these results reinforce that various constraints push comic readers to engage with panels in predictable ways, even when deviating from the traditional Z-path of written language.


Cohn, Neil and Hannah Campbell. 2015. Navigating comics II: Constraints on the reading order of page layouts. Applied Cognitive Psychology. 29: 193-199
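
For readers who think better in code, here is a minimal Python sketch of the naive Z-path that serves as the baseline in the abstract above: rows from top to bottom, and left to right within each row. The `Panel` type, the `z_path_order` function, the `row_tolerance` parameter, and the coordinate convention are all hypothetical names made up for this illustration. This is not the procedure or analysis used in the paper, and it deliberately ignores separation, overlap, and blockage.

```python
from typing import List, NamedTuple


class Panel(NamedTuple):
    """A hypothetical panel rectangle; (0, 0) is the top-left of the page."""
    name: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def z_path_order(panels: List[Panel], row_tolerance: float = 0.0) -> List[str]:
    """Return panel names in naive Z-path order.

    Panels whose top edges lie within `row_tolerance` of the first panel
    in the current row are treated as belonging to that row.
    """
    # Sort by top edge first so that rows are discovered from top to bottom.
    by_top = sorted(panels, key=lambda p: (p.y, p.x))

    rows: List[List[Panel]] = []
    for panel in by_top:
        if rows and abs(panel.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(panel)
        else:
            rows.append([panel])

    # Within each row, read left to right.
    return [p.name for row in rows for p in sorted(row, key=lambda p: p.x)]


# A "blockage"-style layout: two stacked panels on the left,
# one long vertical panel on the right.
blocked_layout = [
    Panel("A", 0, 0, 1, 1),
    Panel("B", 0, 1, 1, 1),
    Panel("C", 1, 0, 1, 2),
]
print(z_path_order(blocked_layout))
```

For this layout the sketch reads the top row first and returns A, then C, then B. Layouts like this are exactly where the abstract reports that readers deviate from the Z-path, so the sketch is only useful as the point of comparison, not as a model of what readers do.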

Thursday, February 05, 2015

New paper: Notion of the Motion

I'm excited to say that my paper, "The notion of the motion: The neurocognition of motion lines in visual narratives" with Steve Maher is now published in the latest issue of Brain Research. It examines the comprehension of motion lines in comics. We show that panels with no lines are harder to process than panels with normal motion lines, but that panels with backwards, anomalous lines are harder still. In their context in a sequence of images, processing of these anomalies may activate brain areas typically related to language processing. In addition, this understanding is also modulated by people's experience reading comics, suggesting that they are conventionalized pieces of "vocabulary" in the visual language of comics.

The paper is available here (pdf) and in the "visual vocabulary" section of my Downloadable Papers page along with all other papers about visual language research. A short, graphic summary is readable here.

Full abstract:

Motion lines appear ubiquitously in graphic representation to depict the path of a moving object, most popularly in comics. Some researchers have argued that these graphic signs directly tie to the “streaks” appearing in the visual system when a viewer tracks an object (Burr, 2000), despite the fact that previous studies have been limited to offline measurements. Here, we directly examine the cognition of motion lines by comparing images in comic strips that depicted normal motion lines with those that either had no lines or anomalous, reversed lines. In Experiment 1, shorter viewing times appeared to images with normal lines than those with no lines, which were shorter than those with anomalous lines. In Experiment 2, measurements of event-related potentials (ERPs) showed that, compared to normal lines, panels with no lines elicited a posterior positivity that was distinct from the frontal positivity evoked by anomalous lines. These results suggested that motion lines aid in the comprehension of depicted events. LORETA source localization implicated greater activation of visual and language areas when understanding was made more difficult by anomalous lines. Furthermore, in both experiments, participants' experience reading comics modulated these effects, suggesting motion lines are not tied to aspects of the visual system, but rather are conventionalized parts of the “vocabulary” of the visual language of comics.

Cohn, Neil and Stephen Maher. 2015. The notion of the motion: The neurocognition of motion lines in visual narratives. Brain Research. 1601: 73-84