Thursday, June 15, 2017

New paper: A picture is worth more words over time

I'm excited to announce we have another new paper, "A picture is worth more words over time: Multimodality and narrative structure across eight decades of American superhero comics," now out in the journal Multimodal Communication. This paper examines the changes in text-image relations and storytelling in American superhero comics from the 1940s through the 2010s.

This project was first undertaken by students in my 2014 Cognition of Comics class, and was later expanded into a larger study. My co-authors, Ryan Taylor and Kaitlin Pederson, coded 40 comics across 8 decades (over 9,000 panels), complementing Kaitlin's study of page layout across time in superhero comics.

We examined three aspects of structure: multimodality (text-image relationships and their balance of meaning and narrative), the framing of information in panels (image above), and the linear changes in meaning that occur between panels.

Overall, we found evidence that American superhero comics have shifted to rely less on text, with the visual narrative sequence carrying more of the weight of the storytelling. This shift has been accompanied by changes in the framing of information in panels, which now use fewer elements (as in the example figure), and by fewer changes in spatial location and more changes in time across panels.

In addition, this trend is not new, but has been occurring steadily over the past forty years. That means it cannot be attributed solely to the influence of manga since the 1980s (and indeed, as we discuss, our results suggest the influence of manga may be more complicated than people suspect).

You can download the paper here (pdf), or along with all my other downloadable papers. You can also see Ryan presenting this study at Comic-Con International in our panel in 2015:


The visual narratives of comics involve complex multimodal interactions between written language and the visual language of images, where one or the other may guide the meaning and/or narrative structure. We investigated this interaction in a corpus analysis across eight decades of American superhero comics (1940–2010s). No change across publication date was found for multimodal interactions that weighted meaning towards text or across both text and images, where narrative structures were present across images. However, we found an increase over time of narrative sequences with meaning weighted to the visuals, and an increase of sequences without text at all. These changes coincided with an overall reduction in the number of words per panel, a shift towards panel framing with single characters and close-ups rather than whole scenes, and an increase in shifts between temporal states between panels. These findings suggest that storytelling has shifted towards investing more information in the images, along with an increasing complexity and maturity of the visual narrative structures. This has shifted American comics from being textual stories with illustrations to being visual narratives that use text.


Cohn, Neil, Ryan Taylor, and Kaitlin Pederson. 2017. A picture is worth more words over time: Multimodality and narrative structure across eight decades of American superhero comics. Multimodal Communication. 6(1): 19-37.

Wednesday, May 24, 2017

New paper: What's your neural function, narrative conjunction?

I'm excited to announce that my new paper "What's your neural function, visual narrative conjunction? Grammar, meaning, and fluency in sequential image processing" is now out in the open access journal Cognitive Research: Principles and Implications. This study was co-authored by Marta Kutas, who was my advisor while I was a postdoctoral fellow at UC San Diego.

Simple take-home message: The way you process the sequences in comics depends on which comics you read.

The more detailed version... This study is maybe the coolest brain study I've done. Here, we examine a particular pattern in the narrative grammar used in comics: Environmental-Conjunction. This is when you have characters in different panels at the same narrative state, but you infer that they belong to the same spatial environment.

Most approaches to comprehending sequential images focus only on the comprehension of meaning (as in "panel transitions"). However, my theory says that this pattern involves both the construction of meaning and the processing of the narrative pattern itself. That pattern is handled by a narrative grammar, which is independent of meaning.

When analyzing people's brain responses to conjunction patterns, we found support for two processes. We found one brainwave associated with an "updating" of a mental model (the P600), and another associated with grammatical processing (an anterior negativity). Crucially, this grammatical processor was insensitive to manipulations of meaning, indicating that it was only processing the conjunction pattern. So, you don't just process meaning, but also the narrative pattern.

But, that's only the first part...

In other analyses, we've shown that Japanese manga use more Environmental-Conjunction than American or European comics. So, we used a statistical analysis to examine whether participants' background reading habits influenced their brain processing of conjunction. And... they did!

Specifically, we found that participants who more frequently read manga "while growing up" tended to rely more on grammatical processing, while infrequent manga readers relied more on updating. In other words, since frequent manga readers were exposed to the conjunction pattern more in their reading habits, their brains used a more automatic, grammatical process to comprehend it. Note: this result is especially cool, because our comics stimuli were not manga; they were manipulated Peanuts strips that used a pattern frequent in manga.

This result contradicts the idea that comics are uniformly understood by all people, or even the idea that their processing uses a single cognitive process (like "closure"). Rather, comics are understood based on people's fluency with the patterns found in specific visual languages across the world.

You can read the paper online here, download the pdf here, or check out the poster summary.


Visual narratives sometimes depict successive images with different characters in the same physical space; corpus analysis has revealed that this occurs more often in Japanese manga than American comics. We used event related brain potentials to determine whether comprehension of “visual narrative conjunctions” invokes not only incremental mental updating as traditionally assumed, but also, as we propose, “grammatical” combinatoric processing. We thus crossed (non)/conjunction sequences with character (in)/congruity. Conjunctions elicited a larger anterior negativity (300-500ms) than non-conjunctions, regardless of congruity, implicating “grammatical” processes. Conjunction and incongruity both elicited larger P600s (500-700ms), indexing updating. Both conjunction effects were modulated by participants’ frequency of reading manga while growing up. Greater anterior negativity in frequent manga readers suggests more reliance on combinatoric processing; larger P600 effects in infrequent manga readers suggest more resources devoted to mental updating. As in language comprehension, it seems that processing conjunctions in visual narratives is not just mental updating but also partly grammatical, conditioned by comic readers’ experience with specific visual narrative structures.


Cohn, Neil and Marta Kutas. 2017. "What’s your neural function, visual narrative conjunction? Grammar, meaning, and fluency in sequential image processing." Cognitive Research: Principles and Implications. 2(27): 1-13

Sunday, April 16, 2017

Tourist traps in comics land*: Unpublished comics research

In a series of Twitter posts, I recently reflected on the pitfalls of various comics research that hasn't been published. Since I think it contains some valuable lessons, I'm going to repeat and expand on them here...

Although I've written the most about psychological studies of how people understand comics, other people were doing these types of studies before me. What's interesting is that many of these studies were not published, because they found null results. There are a few trends in this work...

Space = Time

The topic I've heard about the most is the testing of McCloud's idea that panel size relates to the duration of conceived time, and that longer vs. shorter gutters relate to longer vs. shorter spans of "time" between panels. I critiqued the theory behind this idea that "space = time" back in this paper, but I've heard of several scholars who have tested it with experiments. Usually these studies involved presenting participants with different-sized panels/gutters and then having participants rate their perceived durations.

In almost all of these studies, no one found any support for the idea that "physical space = conceived time". I can only think of one study that did find something supporting it, and it was only for a subset of the stimuli, and thus warranted further testing (which hasn't been done yet).

Because these studies found null results, they weren't deemed noteworthy enough to warrant publication. And since none got published, other labs didn't know about them, so they tried it too, with the same null results. I think it's a good case for the importance of publishing null results: they serve both to disprove hypotheses and to inform others not to grab at the same smoke.


The other type of study on comics that usually doesn't get published is eye-tracking. I know of at least half-a-dozen unpublished eye-tracking studies looking at people reading comic pages. The main reason these studies aren't published is that they're often exploratory, with no real hypotheses to be tested. Most comics eye-tracking studies just examine what people look at, which doesn't really tell you much if you don't manipulate anything. This can be useful for telling you basic facts about what people look at (types of information, how long, etc.), but without a specific manipulation, it is less informative and has lots of confounds.

An example: Let's say you run an eye-tracking study of a particular superhero comic and find that people spend more time fixating on text than on the images (which is a frequent finding). Now the questions arise: Is it because of the specific comic you chose? Is it because your comic had a particular uncontrolled multimodal interaction that weights meaning more to the text? Is it because your participants lacked visual language fluency, and so they relied more on text than images? Is it because you chose a superhero comic, but your participants read more manga? Without more controls, it's hard to know anything substantial.

Good science means testing a hypothesis, which means having a theory that can be tested by manipulating something. Without a testable theory you don't have any real hypothesis from which to create a manipulation, and without a manipulation you don't have a publishable eye-tracking study about comics. Eye-tracking is an informative tool, but the real "meat" of the research needs to be in the thing that is being manipulated.

I'll note that the same applies when people use (or advise using) fMRI or EEG to study the processing of (visual) narratives in the brain. I've seen several studies of "narrative" or "visual narrative" where they simply measure the brain activity to non-manipulated materials and then claim that "these are the brain areas involved in comics/visual narrative/narrative!"

In fact, such research is wholly uninformative, because nothing specific is being tested, and such research betrays an ignorance of just how complex these structures actually are. It would be inconceivable for any serious scholar of language to simply have someone passively read sentences and then claim that they "know how they work" by measuring fMRI or eye-tracking to them. Why then the presumption of simplicity for visual narratives?

Final remarks

One feature of unpublished research on comics is that it is often undertaken by very good researchers who have little knowledge of what goes on in comics and/or of the background literature of that field. It is basically "scientific tourism." While it is of course great that people are interested enough in the visual language of comics to invest the time and effort to run experiments, it's also a recipe for diminishing returns. Without background knowledge or intuition, it's hard to know why your experiment might not be worth running.

Nevertheless, I also agree that it would be useful to know what types of unpublished studies people have done. Publishing such results would be informative for what isn't found, and would prevent future researchers from chasing topics they maybe shouldn't.

So, let me conclude with an "open call"...

If you've done a study on comics that hasn't been published (or know someone who has!): Please contact me. At the least, I'll feature a summary of (or a link to) your study on this blog, and if I accrue enough of them, perhaps I can curate a journal or article for reporting such results.

*Thanks to Emiel van Miltenburg for the post title!

Friday, February 24, 2017

New paper: When a hit sounds like a kiss

I'm excited to announce that I have a new paper out in the journal Brain and Language entitled "When a hit sounds like a kiss: an electrophysiological exploration of semantic processing in visual narrative." This was a project by the first author Mirella Manfredi, who worked with me during my time in Marta Kutas's lab at UC San Diego.

Mirella has an interest in the cognition of humor, and also wanted to know how the brain processes different types of information, like words vs. images. So, she designed a study using "action stars": the star-shaped flashes that fill a whole panel to indicate that an event happened, without showing what it was. Into these action stars, she placed either onomatopoeia (Pow!), descriptions (Impact!), anomalous onomatopoeia or descriptions (Smooch!, Kiss!), or grawlixes (#$%?!).

We then measured people's brainwaves for these action star panels. We found that a brainwave effect sensitive to semantic processing (the "N400"), which reflects how people process meaning, suggested that the anomalies were harder to understand than the congruous ones. This suggested that the meaning built up by the context of the visual sequence affected how people processed the written words. In addition, the grawlixes showed very few signs of this type of processing, suggesting that they don't hold specific semantic meanings.

In addition, we found that descriptive sound effects evoked another type of brain response (a late frontal positivity) often associated with the violation of very specific expectations (like getting a slightly different word than expected, even though it might not be anomalous).

This response was especially interesting, because we also recently showed that American comics use descriptive sound effects far less often than onomatopoeia. What this means is that this brain response wasn't just sensitive to certain words, but was sensitive to the low expectations for a certain type of word: descriptive sound effects in the context of comics.

Mirella and I are now continuing to collaborate on more studies about the interactions between multimodal and crossmodal information, so it's nice to have this one kick things off!

You can find the paper along with all my other Downloadable Papers, or directly here (pdf).


Researchers have long questioned whether information presented through different sensory modalities involves distinct or shared semantic systems. We investigated uni-sensory cross-modal processing by recording event-related brain potentials to words replacing the climactic event in a visual narrative sequence (comics). We compared Onomatopoeic words, which phonetically imitate action sounds (Pow!), with Descriptive words, which describe an action (Punch!), that were (in)congruent within their sequence contexts. Across two experiments, larger N400s appeared to Anomalous Onomatopoeic or Descriptive critical panels than to their congruent counterparts, reflecting a difficulty in semantic access/retrieval. Also, Descriptive words evinced a greater late frontal positivity compared to Onomatopoetic words, suggesting that, though plausible, they may be less predictable/expected in visual narratives. Our results indicate that uni-sensory cross-modal integration of word/letter-symbol strings within visual narratives elicit ERP patterns typically observed for written sentence processing, thereby suggesting the engagement of similar domain-independent integration/interpretation mechanisms.

Manfredi, Mirella, Neil Cohn, and Marta Kutas. 2017. When a hit sounds like a kiss: an electrophysiological exploration of semantic processing in visual narrative. Brain and Language. 169: 28-38.

Saturday, February 04, 2017

New paper: Drawing the Visual Narratives

I'm happy to announce that we have a new paper in the latest issue of the Journal of Experimental Psychology: Learning, Memory, and Cognition entitled "Drawing the Line Between Constituent Structure and Coherence Relations in Visual Narratives."

This was my final project at Tufts University, and was carried out by my former assistant (and co-author) Patrick Bender, who is now in graduate school at USC.

We wanted to examine the relationship between meaningful panel-to-panel relationships ("panel transitions") and the hierarchic constructs of my theory of narrative grammar. Many discourse theories have posited that people assess meaningful relations between each pair of adjacent images in a visual sequence, and (as in my theory) that people make groupings. Yet, in these theories, the groupings are signaled by major changes in meaning, such as a "transition" with a big character change. We hypothesized that groupings were not actually motivated by changes in meaning, but by narrative category information that aligns with larger narrative structures.

So, we simply gave people various visual sequences and asked them to "draw a line" between panels that would best divide the sequence into two meaningful parts—i.e., to break up the sequence into groupings. People then continued to draw lines until all panels had lines between them, and we looked at what influenced their groupings. Similar tasks have been used in many studies of discourse and event cognition.

We found that panel transitions did indeed influence their segmentation of the sequences. However, narrative category information was a far greater predictor of where they divided sequences than these meaningful transitions between panels. That is: narrative structure better predicts how people intuit groupings in visual sequences than semantic "panel transitions."

The paper is downloadable here (pdf) or along with all of my other papers.

Full abstract:

Theories of visual narrative understanding have often focused on the changes in meaning across a sequence, like shifts in characters, spatial location, and causation, as cues for breaks in the structure of a discourse. In contrast, the theory of visual narrative grammar posits that hierarchic “grammatical” structures operate at the discourse level using categorical roles for images, which may or may not co-occur with shifts in coherence. We therefore examined the relationship between narrative structure and coherence shifts in the segmentation of visual narrative sequences using a “segmentation task” where participants drew lines between images in order to divide them into subepisodes. We used regressions to analyze the influence of the expected constituent structure boundary, narrative categories, and semantic coherence relationships on the segmentation of visual narrative sequences. Narrative categories were a stronger predictor of segmentation than linear coherence relationships between panels, though both influenced participants’ divisions. Altogether, these results support the theory that meaningful sequential images use a narrative grammar that extends above and beyond linear semantic shifts between discourse units.

Full Reference:

Cohn, Neil and Patrick Bender. 2017. Drawing the line between constituent structure and coherence relations in visual narratives. Journal of Experimental Psychology: Learning, Memory, and Cognition. 43(2): 289-301.