Friday, April 22, 2016

New paper: Reading without words

One of the most frequent questions that people ask about reading comics is "what are people's eyes doing when comprehending comics?" More and more, I see people planning eye-tracking experiments with comics, in which eye movements are recorded as readers move across a page or strip. I've reviewed a number of these studies on this blog, and many use fairly ad hoc methods without systematically manipulating elements within the experiment.

I'm very proud to say that my new paper, "Reading without words: Eye movements in the comprehension of comic strips," with Tom Foulsham and Dean Wybrow in the journal Applied Cognitive Psychology, addresses these very issues. I was very happy to collaborate with Tom on this project, and it should be the first of several papers related to eye-tracking that we will produce. To our knowledge, this is the first paper on eye-tracking of comics that systematically manipulates aspects of the sequences in controlled experiments, rather than relying on free-form reading or post-hoc alterations.

We did two experiments where participants read Peanuts comics in either a coherent or scrambled sequence. In Experiment 1, participants were presented with each panel one at a time, while Experiment 2 presented them in a 3x2 grid. Overall, we found that people had more dispersed eye-movements for the scrambled strips, which also created more "regressions" (looks backward) to other panels in the sequence (a toy sketch of these measures appears below). This study also addressed a few myths of how comics are understood:

1) By and large, reading order in the 3x2 grid resembled that of text—a left-to-right and down motion with regressions to adjacent units. There was no "scanning" of the page prior to reading, as some have claimed.

2) We also found no real difference in eye-movements for the content of panels between layouts. That is, changing the layout did not affect the comprehension of the sequence.
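For readers curious what the "dispersion" and "regression" measures amount to, here is a toy sketch of one way they could be computed from eye-tracking output. This is not the analysis pipeline from the paper: the data format, coordinates, and panel assignments below are invented purely for illustration.

```python
# Toy illustration (not the paper's actual analysis): computing a simple
# dispersion measure and a regression count from hypothetical fixation data.
import numpy as np

def fixation_dispersion(xs, ys):
    """Root-mean-square distance of fixations from their centroid (pixels)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    dx, dy = xs - xs.mean(), ys - ys.mean()
    return float(np.sqrt(np.mean(dx**2 + dy**2)))

def count_regressions(panel_sequence):
    """Count fixations that land on an earlier panel than the one just viewed."""
    return sum(1 for prev, cur in zip(panel_sequence, panel_sequence[1:]) if cur < prev)

# Hypothetical fixations from one participant reading a 6-panel strip:
xs = [120, 140, 380, 420, 660, 150, 900]   # x coordinates (pixels)
ys = [200, 210, 205, 215, 198, 204, 210]   # y coordinates (pixels)
panels = [1, 1, 2, 2, 3, 1, 4]             # panel containing each fixation

print(f"dispersion: {fixation_dispersion(xs, ys):.1f} px")
print(f"regressions: {count_regressions(panels)}")  # -> 1 (the look back to panel 1)
```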

You can download the paper on my Downloadable Papers page or click here for a pdf.

Here's the full abstract:

"The study of attention in pictures is mostly limited to individual images. When we ‘read’ a visual narrative (e.g., a comic strip), the pictures have a coherent sequence, but it is not known how this affects attention. In two experiments, we eyetracked participants in order to investigate how disrupting the visual sequence of a comic strip would affect attention. Both when panels were presented one at a time (Experiment 1) and when a sequence was presented all together (Experiment 2), pictures were understood more quickly and with fewer fixations when in their original order. When order was randomised, the same pictures required more attention and additional ‘regressions’. Fixation distributions also differed when the narrative was intact, showing that context affects where we look. This reveals the role of top-down structures when we attend to pictorial information, as well as providing a springboard for applied research into attention within image sequences."



Foulsham, Tom, Dean Wybrow, and Neil Cohn. 2016. "Reading without words: Eye movements in the comprehension of comic strips." Applied Cognitive Psychology.

Tuesday, March 29, 2016

Dispelling myths of comics understanding

In reading through various works about comics understanding, I keep hearing certain statements repeated over and over. But many of these statements do not reflect the way people actually understand comics. So, I'd like to go through several of these myths about "understanding comics" and explain why they aren't true:

1. Page layout ≠ sequential image understanding

This is one of the biggest recurring myths that I see, and it has been leveled against both McCloud's panel transitions (see below) and my own model of narrative grammar. Somehow, because we focus on the relations between panel content, it is seen as denying the "meaning" that arises from a page layout. This myth conflates content and layout.

Content refers to the meaningful connections between the depictions within panels. Page layout is the physical arrangement of panels on a canvas (like a page). While there are some cases where page layout can factor into the relations between panel content, these are not the same thing; they are fully independent structures.

First off, it is easy to see that layout and content are different because you can rearrange panels into different layouts without changing the meaning. So long as the order of panels remains the same, it doesn't matter whether six panels appear in a 2 x 3 grid, in a vertical column, or in a horizontal row. Now, you may end up manipulating the visual composition of panel relations by rearranging them, but that is still not necessarily the same as the "understanding" that is derived from the relations between the meaningful content in images.

Second, we also know that page layout is different from content because we've done experiments on them. In two separate studies, we gave people comic pages with empty panels and asked them to number the panels in the order they'd read them. We find that people choose consistent orderings of panels, even in unusual layouts, and these orderings do not rely at all on panel content (pdf, pdf). That is, knowing how to "read" a page is not contingent on panel content.
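To make "consistent orderings" concrete, here is a minimal sketch of one way such consistency could be quantified: the proportion of participants whose numbering matches the most common (modal) ordering for a page. The orderings below are invented for illustration; this is not the analysis used in the actual studies.

```python
# Toy illustration of ordering agreement across participants.
from collections import Counter

def modal_order_agreement(orderings):
    """Return the modal panel ordering and the share of participants who chose it."""
    counts = Counter(tuple(o) for o in orderings)
    modal_order, modal_count = counts.most_common(1)[0]
    return modal_order, modal_count / len(orderings)

# Hypothetical orderings of a 6-panel page from five participants
# (each list gives the panels in the order that participant numbered them):
orderings = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6],
    [1, 2, 4, 3, 5, 6],
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6],
]

order, agreement = modal_order_agreement(orderings)
print(f"modal order: {order}, agreement: {agreement:.0%}")  # -> 80%
```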

Also, in a recent study we actually tested the difference between single-panel viewing and panels arranged in a 3 x 2 grid. We found effectively little difference in what people look at and how they understand the panels in the different "layouts." In this case, alterations of layout made no difference to the comprehension of content.

Finally, when we do an experiment where we present panels one at a time, it does not confound how people actually understand sequential images. In fact, it is the opposite. These types of experiments aren't looking at "sequential image understanding" in general; each experiment focuses on specific aspects of comprehension. In doing science, you want to factor out as many variables as you can so that your experiment focuses on the specific thing you are examining. If we included page layout, that would be one additional variable that might confound the actual specific things we're examining.


2. There is no "closure" that takes place between juxtaposed images

I have great fondness for McCloud's seminal theories on comics, but we've reached a point where it is clear that several of his claims are not accurate. In particular, the belief that we "fill in the gap" between panels is simply not correct. While readers definitely make inferences for information that is not depicted, this inference does not occur "in the gutter," nor does it arise from panel-to-panel juxtapositions (pdf).

We've now shown evidence of this in several experiments. First, when we examine how people make inferences (like with the action stars in the image to the left, or when that panel is deleted), the evidence for understanding missing information is not reflected in the gap between images, but in understanding the content of the second image relative to the first (or relative to several previous ones). We see this in slower reading times at the panel after an inference (pdf), and in particular brain responses associated with "mental updating" at the panel after an inference (pdf).

Also, we've shown that people make predictions about what content will come in subsequent images. In one study, we showed that disrupting a sequence within a grouping of panels is worse than disrupting it between groupings (pdf, video). Crucially, the disruption within the first grouping was understood worse than at the break between groupings. Because both of these disruptions appeared before the break between images, people could not have been using the "panel transition" at the break as a cue for those groupings. Rather, people's brains had to have been predicting upcoming structure. This means that there was no "filling in the gaps" because the subsequent image relationship had not yet even been reached.


3. Not all juxtaposed image relationships are meaningful

There is a pervasive myth that all possible juxtapositions of panels can somehow be meaningful. This implies that, no matter which panels are placed next to each other, some type of meaning will be construed, and thus that any image has roughly equal probability of appearing after any other. This is simply untrue. Not only do we show that different panel relations are understood by the brain in different ways, but people also choose to order panels in particular ways rather than at random. This emerges in findings such as these:

1. Scrambled panel orders are comprehended worse than coherent narrative orders (pdf, pdf)
2. Fully random panels pulled from different comics are comprehended worse than narrative orders, and worse than sequences of random panels that share meaningful associations (like random panels that are all about baseball) (pdf)
3. Switching the position of some panels in a sequence is worse than switching others—the greater the distance of the switch, the worse the sequence. Also, switches across groupings of panels are worse than switches within groupings (pdf)
4. People choose to omit some panel types more than others, and those same types are also recognized as missing more often than the types that people choose to delete less. (pdf)
Etc., etc.

You can also just ask people about relations between panels: if you give them a "yes/no" rating of whether panel relations are comprehensible, they will consistently say that those expected to be anomalous are indeed incongruous. Or if they rate sequences on a 1 to 7 scale, they will consistently rate the incongruous ones as lower than the congruous ones. While conscious interpretation can be somewhat problematic (see below), people are fairly uniform in their assessment of whether something "makes sense" or does not.


4. Consciously explaining a relation between panels is different from an immediate, unconscious brain response

This one is particularly important, especially for understanding experimental results like reaction times or brain responses. When you derive meaning from a relationship between panels, your brain responds in a way that attempts to integrate those meanings together. Sometimes no relation can be made, and you can measure this process by comparing different types of relations to each other. This brain response is also very fast: brainwaves reveal that people recognize the difference between incongruous or out-of-place panels and congruous panels in a sequence in less than half a second.

This type of "meaning" is different than what might be consciously explained. Sure, you may be able to concoct some far flung way in which two totally unrelated images might have some relationship. However, this post hoc conscious explanation does not mean that is the way you brain is deriving meaning from the relation between images, and is far slower than that brain process.

In fact, such explanations are evidence in and of themselves that the relationship may be incongruous: if you have to do mental gymnastics to consciously explain a relation, you are compensating for the lack of any actual relationship between those images.

-----

Want more advice about how to do research on the visual language of comics? Check out this paper and these blog posts.

Wednesday, February 17, 2016

Mayan visual narratives in the BBC!

I'm very happy to say that David Robson over at the BBC has a new article out discussing Jesper Nielsen and Søren Wichmann's chapter in my new book, The Visual Narrative Reader.  Their chapter, and the BBC article, examine the structural properties of Mayan visual narratives found on the sides of pottery.

There are a lot of great things in their chapter that motivated me to invite them to be a part of the collection. Foremost, they nicely show that these Mayan systems share many properties with the "visual languages" used in comics and other cultures, ranging from the way they show sequences to the way they use text-image relationships and graphic signs like lines to show smells or speech.

In my conception of sequential image systems being like language, there is no one visual language, but rather there are many throughout the world. In addition, just as spoken languages change and die off over time, so do visual languages. The system used in the Mayan visual narratives thus reflects a “(Classic) Mayan Visual Language” tied to a particular time period and location. Similarly, we could identify historical visual languages from different time periods all over the world.

I’ll point out also that this is different than saying that Mayans used “comics.” This is not the case. “Comics” are the context in which we use some visual languages in contemporary society, and casting that label back in time is inappropriate. Rather, they have a visual language that is used in its own context tied to its era.

What makes the Mayan examples nicely illustrative is that they are an older version of this that is preserved in the historical record. The visual language used in sand drawings (also discussed in two chapters of The Visual Narrative Reader) disappears once it is drawn, because of the medium of sand, while the accompanying gestures/signs and speech disappear because they are produced verbally. This means there is no historical record of them. But because the Mayan examples on pottery and ceramics are drawn and include writing, those artifacts can provide a window into the past behavior of humans as multimodal animals.

Finally, what I really liked about this article—beyond the subject matter itself—was the way in which it was analyzed using systematic linguistic methods. I think this nicely shows how much of what has previously been discussed in "art history" can really be transported to the linguistic and cognitive sciences given the theory of visual language. If we're talking about the structure of older drawing systems, then we're not discussing "art" per se, but rather ancient visual languages and their structure. Further focus like this can contribute towards building a study of historical visual linguistics that can then analyze such material the same way as we think of any other type of linguistic system.

Monday, February 01, 2016

New Paper: The pieces fit

Magical things happen at conferences sometimes. Back at the Cognitive Neuroscience Society conference in 2014, I ran into my graduate school friend, Carl Hagmann, who mentioned he was doing interesting work on rapid visual processing, where people are asked to detect certain objects within an image sequence that changes at very fast rates (as fast as 13 milliseconds per image). He noticed that I was doing things with image sequences too and thought we should try this rapid pace with visual narratives (similar to this old paper I blogged about).

Lo and behold, it actually happened, and now our paper is published in the journal Acta Psychologica!

Our paper examines how quickly people process visual narrative sequences by showing participants the images from comics for either 1 second or half a second each. In some sequences, we switched the order in which two of the images appeared. In general, we found that "switches" of greater distances were recognized with better accuracy, and those sequences were rated as less comprehensible. Also, switches between groupings of panels were recognized better than those within groupings, again providing further evidence that visual narratives group information into constituents.

This was quite the fun project to work on, and it marks a milestone: It's the first "visual language" paper I've had published where I'm not the first author! Very happy about that, and there will be several more like it coming soon...

You can find the paper via direct link here (pdf) or on my downloadable papers page.


Abstract:

Recent research has shown that comprehension of visual narrative relies on the ordering and timing of sequential images. Here we tested if rapidly presented 6-image long visual sequences could be understood as coherent narratives. Half of the sequences were correctly ordered and half had two of the four internal panels switched. Participants reported whether the sequence was correctly ordered and rated its coherence. Accuracy in detecting a switch increased when panels were presented for 1 s rather than 0.5 s. Doubling the duration of the first panel did not affect results. When two switched panels were further apart, order was discriminated more accurately and coherence ratings were low, revealing that a strong local adjacency effect influenced order and coherence judgments. Switched panels at constituent boundaries or within constituents were most disruptive to order discrimination, indicating that the preservation of constituent structure is critical to visual narrative grammar.


Hagmann, Carl Erick, and Neil Cohn. 2016. "The pieces fit: Constituent structure and global coherence of visual narrative in RSVP." Acta Psychologica 164:157-164. doi: 10.1016/j.actapsy.2016.01.011.

Thursday, January 28, 2016

New Book: The Visual Narrative Reader

I'm very excited to announce that today is the release date for my new book, The Visual Narrative Reader! What makes this one so fun is that I didn't write the whole thing—it features chapters from many luminaries in the study of visual narratives. Here's how it came about...

Shortly after the release of my last book, The Visual Language of Comics, I was biking home from work and started reflecting on the important papers that had influenced me along my journey of doing this research.

I thought about David Wilkins's great paper on Australian sand narratives that fully challenged my conceptions of drawings, which I read from a third-generation photocopy right after college. I thought about Charles Forceville's great work on visual metaphor in comics, and Jun Nakazawa's psychology experiments looking at how kids (and adults) in Japan comprehend comics. Or Brent Wilson's 40 years of research looking at how kids across the world draw visual narratives. Or the interesting dissertations that looked at the relations between McCloud's panel transitions and linguistic discourse theories.

All of this work greatly influenced my theories. And yet, many of the people in Comics Studies or other fields looking at visual narratives had no idea that most of this work existed. In many cases, these papers were incredibly hard to find! (I had to print one of the dissertations off of microfiche, and another paper couldn't be found through interlibrary loan.)

So, I decided that someone ought to compile this work so that it would be readable by a larger audience, and I decided that that "someone" should be me! The result is the new book that just became available.

I feel very proud to have been able to combine these great works into one volume that can hopefully enrich people's knowledge of visual narratives and the various research that has gone into its cross-disciplinary study over the years.

You can find more information about the book on my website here, along with praise from scholars and creators of comics alike. I hope you like it as much as I do!

Here's the table of contents:

Preface

1. Interdisciplinary approaches to visual narrative, Neil Cohn

Section 1: Theoretical approaches to sequential images
2. Linguistically-oriented comics research in Germany, John Bateman and Janina Wildfeuer
3. No content without form: Graphic style as the primary entrance to a story, Pascal Lefèvre
4. Conceptual Metaphor Theory, Blending Theory, and other Cognitivist perspectives on comics, Charles Forceville
5. Relatedness: Aspects of textual connectivity in comics, Mario Saraceni
6. A little cohesion between friends; Or, we're just exploring our textuality, Eric Stainbrook

Section 2: Psychology and development of visual narrative
7. Manga literacy and manga comprehension in Japanese Children, Jun Nakazawa
8. What happened and what happened next: Kids’ visual narratives across cultures, Brent Wilson

Section 3: Visual narratives across cultures
9. The Walbiri sand story, Nancy Munn
10. Alternative representations of space: Arrernte Narratives in Sand, David Wilkins
11. Sequential text-image pairing among the Classic Maya, Søren Wichmann and Jesper Nielsen
12. Linguistic relativity and conceptual permeability in visual narratives: New distinctions in the relationship between language(s) and thought, Neil Cohn

Further Reading
Index

Wednesday, December 30, 2015

New paper: The vocabulary of manga

I'm happy to announce that my new article with co-author Sean Ehly, "The vocabulary of manga: Visual morphology in dialects of Japanese Visual Language" is now published in the Journal of Pragmatics!

This paper is especially exciting because my co-author is a former student who did this research as part of a class project. It now joins previous publications stemming from projects from that class, with more on the way!

Sean wanted to investigate the "morphology" of the Japanese Visual Language that is used in manga—graphic elements like bloody noses for lust or a giant sweat drop for anxiety. I had discussed some of these in my book, but Sean recognized that there were many that I missed. He listed over 70 of these elements related to emotion alone! In fact, as a resource for other researchers and fans, we've now compiled this "visual vocabulary" into a list:

Morphology in Japanese Visual Language

We don't consider it exhaustive, so if you think of others that should be added, please let us know!**

We then used this list to investigate how these elements are used in 20 different manga—10 shojo and 10 shonen—which amounted to over 5,000 panels coded across these books. Overall, we show that most of these "visual morphemes" appear in both types of books, though certain morphemes are more prevalent in one type or another. We take this as the first empirical evidence that there may be distinct "dialects" within a broader Japanese Visual Language, at least for this one dimension of structure.
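To give a sense of what this kind of corpus comparison involves, here is a minimal sketch of computing per-genre proportions for each morpheme. The morpheme counts and panel totals below are made up for illustration; they are not our actual corpus data.

```python
# Toy illustration of a per-genre proportion comparison for visual morphemes.
from collections import defaultdict

# Hypothetical coding: (genre, morpheme) -> number of panels containing it
counts = {
    ("shonen", "sweat drop"): 180, ("shojo", "sweat drop"): 150,
    ("shonen", "bloody nose"): 40,  ("shojo", "bloody nose"): 5,
    ("shonen", "background flowers"): 10, ("shojo", "background flowers"): 95,
}
panels_coded = {"shonen": 2500, "shojo": 2600}  # hypothetical panel totals

proportions = defaultdict(dict)
for (genre, morpheme), n in counts.items():
    proportions[morpheme][genre] = n / panels_coded[genre]

for morpheme, by_genre in proportions.items():
    shares = ", ".join(f"{g}: {p:.1%}" for g, p in sorted(by_genre.items()))
    print(f"{morpheme:20s} {shares}")
```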

The paper is available along with all others at my Downloadable Papers page, and directly as a pdf. Here's the full abstract:

Abstract
The visual representations of non-iconic elements in comics of the world often take diverse and interesting forms, such as how characters in Japanese manga get bloody noses when lustful or have bubbles grow out their noses when they sleep. We argue that these graphic schemas belong to a larger "visual vocabulary" of a "Japanese Visual Language" used in the visual narratives from Japan. Our study first described and categorized 73 conventionalized graphic schemas in Japanese manga, and we then used our classification system to seek preliminary evidence for differences in visual morphology between the genres of shonen manga (boys’ comics) and shojo manga (girls’ comics) through a corpus analysis of 20 books. Our results find that most of these graphic schemas recur in both genres of manga, and thereby provide support for the idea that there is a larger Japanese Visual Language that pervades across genres. However, we found different proportions of usage for particular schemas within each genre, which implies that each genre constitutes their own "dialect" within this broader system.

Cohn, Neil, and Sean Ehly. 2016. "The vocabulary of manga: Visual morphology in dialects of Japanese Visual Language." Journal of Pragmatics 92: 17-29.



** Longtime followers of this site may remember that we attempted a similar listing for morphology across different visual languages based on a discussion on my now defunct forum over 10 years ago. Perhaps I'll have to create additional pages for other visual languages as well, now that we have ongoing corpus research underway...

Tuesday, November 24, 2015

New Paper: A multimodal parallel architecture

I'm excited to share that I now have a new article in the latest issue of Cognition: "A multimodal parallel architecture: A cognitive framework for multimodal interactions." This paper presents my overall model of language, and then uses it to explore different aspects of multimodal communication.

The key distinctions in this paper are about multimodal relations that must balance grammar in multiple domains. Many models of multimodal relations describe the various meaningful (i.e., semantic) interactions between modalities. This paper extends beyond these relationships to talk about how the dominance of meaning in one modality or another must negotiate grammatical structure in one or multiple modalities.

This paper has had a long journey... I first had many of these ideas way back in 2003, and they were part of an early draft on my website about multimodality called "Interactions and Interfaces." In 2010, I started reconsidering how to integrate the theory into the context of my mentor Ray Jackendoff's model of language—the parallel architecture. The component parts were always the same, but articulating them in this way allowed for better grounding in a model of cognition, and for further elaboration of how these distinctions about multimodality fit within a broader architecture. I then tinkered with the manuscript on and off for another 5 years...

So, 12 years later, this paper is finally out! It pretty much lays out how I conceive of language and different modalities of language (verbal, signed, visual), not to mention their relationships. I suppose that makes it a pretty significant paper for me.

The paper can be found on my Downloadable Papers page, and a direct link (pdf) is here.

Abstract:
Human communication is naturally multimodal, and substantial focus has examined the semantic correspondences in speech–gesture and text–image relationships. However, visual narratives, like those in comics, provide an interesting challenge to multimodal communication because the words and/or images can guide the overall meaning, and both modalities can appear in complicated "grammatical" sequences: sentences use a syntactic structure and sequential images use a narrative structure. These dual structures create complexity beyond those typically addressed by theories of multimodality where only a single form uses combinatorial structure, and also poses challenges for models of the linguistic system that focus on single modalities. This paper outlines a broad theoretical framework for multimodal interactions by expanding on Jackendoff’s (2002) parallel architecture for language. Multimodal interactions are characterized in terms of their component cognitive structures: whether a particular modality (verbal, bodily, visual) is present, whether it uses a grammatical structure (syntax, narrative), and whether it "dominates" the semantics of the overall expression. Altogether, this approach integrates multimodal interactions into an existing framework of language and cognition, and characterizes interactions between varying complexity in the verbal, bodily, and graphic domains. The resulting theoretical model presents an expanded consideration of the boundaries of the "linguistic" system and its involvement in multimodal interactions, with a framework that can benefit research on corpus analyses, experimentation, and the educational benefits of multimodality.


Cohn, Neil. 2015. "A multimodal parallel architecture: A cognitive framework for multimodal interactions." Cognition 146: 304-323.

Monday, November 09, 2015

How to analyze comics with narrative grammar

Over the past several years, I've presented a lot of evidence that panel-to-panel "transitions" cannot account for how we understand sequences of images in visual narratives like comics. Rather, I've argued that narrative sequential images use a "narrative grammar" that assigns roles to panels, and then groups panels into hierarchic relationships.

Though there are many reasons panel transitions don't work to explain how we understand sequential images, one reason the theory may be attractive is that it is intuitive. A person can easily look at a sequence and assign transitions between panels, and it "feels" right because that matches one's conscious experience of reading a comic (though it is not very cognitively accurate).

In contrast, my theory of narrative grammar is fairly complex, and much harder to intuit. Though, I think this is somewhat as it should be—there are a lot of complicated things going on in sequential images that people don't realize! However, this complexity means that people might have a hard time implementing the theory in practice.

SO... to help rectify this issue I've now written a "tutorial" that aims to explain the process people should follow when analyzing a visual narrative sequence and attempting to implement this theory of narrative grammar.

You can download a pdf of the tutorial here, and it can also be found on my Downloadable Papers page and my Resources page.

The simple summary is that one cannot simply look at a sequence and assign labels to things. There are a series of procedures and diagnostics to use, and there is an order of operations that is optimal for arriving at an analysis. This is the same as most any linguistic theory, which usually requires instruction or dedicated learning in order to implement.

This tutorial is aimed at researchers (or anyone curious) who wish to implement this theory in practice and/or learn more about the underlying logic of how it works. It is also aimed at teachers who might wish to teach this theory in their classrooms, but may not know how to do it with competence.**

As you'll find, the tutorial only partially covers the basic principles of the theory itself. For those you should reference my papers and my book, The Visual Language of Comics. The tutorial can thus supplement these works for a full understanding and implementation of the theory.


**On this point, note: anyone who wants to learn how to do this, especially with the intent of putting it into practice in research or instruction, should feel free to contact me for more guidance and resources.

Monday, November 02, 2015

Dispelling emoji myths

In my recent BBC article and my blog posts about emoji, I have tried to explain how emoji are not an emerging language, but that they do serve important functions that resemble other limited communicative systems.

Having now poked around online quite a bit looking at what people say about emoji, I'm particularly struck by the repetition of a few myths. Since these misunderstandings creep up all over the place, I wanted to address them here...

1. Emoji are not like hieroglyphics

First off, many people have compared emoji to Egyptian hieroglyphics, either saying that they work exactly the same and/or that emoji are a "modern hieroglyphics."

This is simply not true, mostly because hieroglyphics were a full-blown writing system where each sign had a mapping to sound. Hieroglyphics are not "symbol systems" made up of pictures. To me, this seems like the kind of misperception that people who are only used to an alphabet have about other writing systems: "if each sign isn't a sound like a letter, it must be just about meanings!"

There are actually several ways that hieroglyphics operated as a writing system. Some signs did indeed mean what they represented. For example, the sign for "owl" looked like an owl, and was pronounced "m":


However, the use of "rebus" signs meant that those signs could also be used without that iconic meaning, and only would be used for their sound value (i.e., that owl sign would be used for many words using the sound "m," but not for its meaning of "owl").

From there, both of these types of signs could be combined into compound signs. For example, this combination takes the rebus of the owl (using just the sound "m") and the sign for ear (using its meaning, but not its pronunciation) for the word "to hear":


This type of compound used signs both for their meaning value and for their sound value. There are no compounds made up of two signs that just contribute to meaning—they always have some sound-based sign present. Hieroglyphics also sometimes use fairly abstract representations, and purely sound-based signs which vary based on the number of consonants they represent.

In sum, unlike the purely imagistic meanings found in emoji, hieroglyphics are a fully functioning writing system that is intrinsically tied to the Egyptian language. This also differs from how emoji appear in context, because the imagistic emoji accompany a separate writing system (for English speakers, the alphabet). In the case of hieroglyphics, they are the writing system.

I'll note also that these same things apply to Chinese characters. Though they work a little differently than hieroglyphics, the same basic principles apply: they are a writing system tied to the sounds of a language, not a series of pictures that only have imagistic meaning.


2. There is no such thing as a universal language

I have seen many people proclaim that one of the exciting things about emoji is their quality of transcending spoken languages to be a "universal language." This is also hogwash, for many reasons. No language is universal, whether verbal, signed, or visual. Here are several reasons why images (including emoji) are not, and cannot be, universal:

How they convey meaning

Just because images may be iconic—they look like what they represent—does not mean that they are culturally universal. Even simple things like the way people dress do not translate across cultures, not to mention variation in facial expressions or, even more dramatically, fully conventionalized meanings like giant sweat drops to convey anxiety. Note that, since they were originally created in Japan, many emoji are already culturally specific in ways that do not translate well outside Japan.

This is not to mention the limitations of emoji that I discussed in my BBC article, such as that they rely on a non-producible vocabulary that does not allow the easy creation of new signs, and that their sequences maintain a simple structure characteristic of impoverished grammars. In other words, they are highly limited in what they can express, even as a graphic system.

Cultural exposure

We also know that images are not universal because a host of studies have shown that people who do not have cultural exposure to images often have difficulty understanding their meanings. Such deficits were investigated extensively in the 1970s and 1980s under the umbrella of "visual literacy." Here's how I summarized one such study by Fussell & Haaland (1978) examining individuals in Nepal:

As described in the paper, the individuals tested "had significant deficits understanding many aspects of single images, even images of faces with simple emotions in cartoony styles (happy - 33%, sad - 60%). They had even more difficulty related to actions (only 3% understood an image trying to convey information about drinking boiled water). Some respondents had radically different interpretations of images. For example, a fairly simple cartoony image of a pregnant woman was interpreted as a woman by 75%, but 11% thought it was a man, and others responded that it was a cow, rabbit, bird, or other varied interpretations. They also had difficulty understanding when images cut off parts of individuals by the framing of a panel (such as images of only hands holding different ingredients)."

Such findings are not rare. For example, Research into Illustration by Evelyn Goldsmith summarizes several studies along these lines. Bottom line: Understanding drawings requires exposure to a graphic system, just like learning a language.

There is not just one visual language

Most discussion of the universality of images focuses on how they are comprehended. But, this overlooks the fact that someone also had to create those images, and images vary widely in their patterns across the world.

That is, as I argue in my book, there is not just one visual language, but rather there are many visual languages in the world. There's a reason why the "style" of American superhero comics differs from Japanese manga or French bande dessinée or instruction manuals, etc. Drawing "styles" reflect the patterns of graphic information stored in the minds of those who create them. These patterns vary across the world, both within and between cultures.

This happens because people are different and brains are not perfect. There will always be variation and change within what is perceived to be a coherent system. This is in part because any given language is actually a socio-cultural label applied to the system(s) used by individual people. There is no "English" floating out in the ether to which we all link up. Rather, "English" is created by the similarities of patterning between the languages spoken by many people.

Indeed, though many who speak "English" can communicate in mutually intelligible ways, there are hundreds of sub-varieties of "English" with variations that range from subtle (slight accents) to dramatic (changing vocabulary and grammar), across geographic, cultural, and generational dimensions.

Similarly, even if there were a universal language—be it spoken or visual—sub-varieties would emerge based on who is using the system and how they do it. Just because images are typically iconic does not mean that they are transparent and outside of cognitive/cultural patterns.

Emoji in part exemplify this facade that a language is external to the patterns in people's minds, since the vocabulary is provided by tech companies and does not directly emerge from people's creations. Someone (the Unicode Consortium) decides which emoji can be used, and then makes them available. This is the opposite of how actual languages work, as manifestations of similarities between cognitive structures across speakers.

In sum, drawings are not universal because drawings differ based on the cultural "visual languages" that result from people using different patterns across the world.

Tuesday, October 27, 2015

New Paper: Narrative Conjunction's Junction Function

I'm excited to announce that my new paper, "Narrative Conjunction's Junction Function," is now out in the Journal of Pragmatics! This is the first major theoretical paper I've had in a long time, and it goes into extensive detail about several aspects of my theory of how narrative image sequences are comprehended, Visual Narrative Grammar.

The main topic of this paper is "conjunction," which occurs when multiple panels are grouped together and play the same role in a sequence. I argue that this narrative pattern is mapped to meaning in several different ways. In addition to these arguments, the paper provides a fairly extensive treatment of the basics of my narrative theory, along with the underlying logic that guides it (i.e., diagnostic tests).

You can find the paper here (pdf) or along with my other downloadable papers. Here's the full abstract:

Abstract

While simple visual narratives may depict characters engaged in events across sequential images, additional complexity appears when modulating the framing of that information within an image or film shot. For example, when two images each show a character at the same narrative state, a viewer infers that they belong to a broader spatial environment. This paper argues that these framings involve a type of “conjunction,” whereby a constituent conjoins images sharing a common narrative role in a sequence. Situated within the parallel architecture of Visual Narrative Grammar, which posits a division between narrative structure and semantics, this narrative conjunction schema interfaces with semantics in a variety of ways. Conjunction can thus map to the inference of a spatial environment or an individual character, the repetition or parts of actions, or disparate elements of semantic associative networks. Altogether, this approach provides a theoretical architecture that allows for numerous levels of abstraction and complexity across several phenomena in visual narratives.


Cohn, Neil. 2015. "Narrative conjunction’s junction function: The interface of narrative grammar and semantics in sequential images." Journal of Pragmatics 88:105-132. doi: 10.1016/j.pragma.2015.09.001.

Tuesday, October 13, 2015

Emoji and visual languages

I'm excited that my recent article on the BBC website about emoji has gotten such a good response. So, I figured I'd write an addendum here on my blog to expand on things I couldn't get a chance to write in the article. I of course had a lot to say in that article, and it was inevitable that not everything could be included.

The overall question I was addressing was, "are emoji a visual language?" or "could emoji become a visual language?" My answer to both of these is "no."

Here's a quick breakdown of why, which I say in the article:

1. Emoji have a limited vocabulary set that is made of whole-unit pieces, and that vocabulary has no internal structure (i.e., you can't adjust the mouth of the faces while keeping other parts constant, or change the heads on bodies, or change the position of arms)

2. Emoji force these stripped-down units into unit-unit sequences, which just isn't how drawings work to communicate. (More on this below)

3. Emoji use a limited grammatical system, mostly using the "agent before act" heuristic found across impoverished communication systems.

All of these things limit emoji from being able to communicate like actual languages. Plus, these also limit emoji from communicating like actual drawings that are not mediated by a technological interface.

There are two addendums I'd like to offer here.

First, these limitations are not unique to emoji. They are limitations of every so-called "pictogram language," which are usually created to be "universal" across spoken languages. Here, the biggest problem is in believing that graphic information works the way that writing does: putting individual units, each of which has a "single meaning," into a unit-unit sequence.

However, drawings don't work this way to communicate. There are certainly ways to put images in sequence, such as what is found in the visual language of comics. The nature of this sequencing has been my primary topic of research for about 15 years. When images are put into sequence, they have characteristics unlike any of those used in these "writing-imitative" pictogram sequences.

For example, actual visual language grammars typically depict events across the image sequence. This requires the repetition of the same information in one image as in the other, only slightly modified to show a change in state. Consider this emoji sequence:


This can either be seven different monkeys, or it can be one monkey at seven different points in time (and the recognition of this difference requires at least some cultural learning). Visual language grammars allow for both options. Note though that it doesn't parcel out the monkey as separate from the actions. It does not read "monkey, cover eyes" and then "monkey, cover mouth" etc. where the non-action monkey just gives object information and the subsequent one just gives action information. Rather, both object and event information is contained in the same unit.

So, what I'm saying is that the natural inclination for grammars in the visual form is not like the grammars that operate in the verbal or written form. They just don't work the same, and pushing graphics to try to work in this way will never work, because it goes against the way in which our brains have been built to deal with graphic information.

Again: No system that strips down graphics into isolated meanings and puts them in a sequence will ever communicate on par with actual languages. Nor will it actually communicate the way that actual visual languages do...

And this is my second point: There are already visual languages in the world that operate as natural languages that don't have the limitations of emoji.

As I describe in my book, The Visual Language of Comics, the structure of drawing is naturally built like other linguistic systems. It becomes a "full" visual language when a drawing system is shared across individuals (not a single person's style) and has 1) a large visual vocabulary that can create new and unique forms, and 2) the ability to put those vocabulary items into sequences with an underlying hierarchic structure.

This structure often becomes the most complex and robust in the visual languages used in comics, but we find complex visual languages in other places too. For example, in my book I devote a whole chapter to the sand drawings of Australian Aboriginals, which constitute a full visual language far outside the context of comics (and one used in real-time interactive communicative exchanges). But, whether a drawing system becomes a full visual language or not, the basis for those parts is similar to other linguistic systems that are spoken or signed.

The point here is this: emoji are not a visual language, and can never be one because of the intrinsic limitations on the way that they are built. Drawings don't work like writing, and they never will.

However, the counter point is this: we already have visual languages out in the world—we just haven't been using them in ways that "feel" like language.

... yet.

Monday, September 28, 2015

New paper: Getting a cue before getting a clue

It seems the last few months on this blog have been all about inference generation... I'm happy to say this post continues that trend! I'm excited to announce that I have a new paper out in the journal Neuropsychologia entitled "Getting a cue before getting a clue: Event-related potentials to inference in visual narrative comprehension."

This paper examines the brain response to the generation of inference in a particular narrative construction in comics. As far as I know, it's the first neuroscience paper to examine inference specifically in visual narratives. Specifically, our analysis focused on comparing sequences like these:


The top sequence (a) is from an actual Peanuts strip. What is key here is that you never see the main event of the sequence: Linus retrieving the ball. In my narrative structure, this "climactic" state would be called a "Peak." Instead, the image of Charlie watching ambiguously hides this event, and that panel is more characteristic of a "Prolongation" that extends the narrative further without much action.

Contrast this with (b), which has a structure that also appears in several Peanuts strips. Here, the third panel also does not show the main event (the same event as in "a"), but the exclamation mark implies that at least some event is happening. In my narrative structure, this cue is enough to tell you that this panel is the climax, despite not showing you what the climax is.

We were curious, then, whether the brain distinguishes between these types of sequences, which both should require inference (indeed, the same inference) but differ in their narrative structure (spoiler: it does!). You can read a full pdf of the paper here. Here's the full abstract and reference:

Abstract:

Inference has long been emphasized in the comprehension of verbal and visual narratives. Here, we measured event-related brain potentials to visual sequences designed to elicit inferential processing. In Impoverished sequences, an expressionless “onlooker” watches an undepicted event (e.g., person throws a ball for a dog, then watches the dog chase it) just prior to a surprising finale (e.g., someone else returns the ball), which should lead to an inference (i.e., the different person retrieved the ball). Implied sequences alter this narrative structure by adding visual cues to the critical panel such as a surprised facial expression to the onlooker implying they saw an unexpected, albeit undepicted, event. In contrast, Expected sequences show a predictable, but then confounded, event (i.e., dog retrieves ball, then different person returns it), and Explicit sequences depict the unexpected event (i.e., different person retrieves then returns ball). At the critical penultimate panel, sequences representing depicted events (Explicit, Expected) elicited a larger posterior positivity (P600) than the relatively passive events of an onlooker (Impoverished, Implied), though Implied sequences were slightly more positive than Impoverished sequences. At the subsequent and final panel, a posterior positivity (P600) was greater to images in Impoverished sequences than those in Explicit and Implied sequences, which did not differ. In addition, both sequence types requiring inference (Implied, Impoverished) elicited a larger frontal negativity than those explicitly depicting events (Expected, Explicit). These results show that neural processing differs for visual narratives omitting events versus those depicting events, and that the presence of subtle visual cues can modulate such effects presumably by altering narrative structure.


Cohn, Neil, and Marta Kutas. 2015. Getting a cue before getting a clue: Event-related potentials to inference in visual narrative comprehension. Neuropsychologia 77:267-278. doi: 10.1016/j.neuropsychologia.2015.08.026.