Tuesday, November 24, 2015

New Paper: A multimodal parallel architecture

I'm excited to share that I now have a new article in the latest issue of Cognition: "A multimodal parallel architecture: A cognitive framework for multimodal interactions." This paper presents my overall model of language, and then uses it to explore different aspects of multimodal communication.

The key distinctions in this paper concern multimodal relations that must balance grammar across multiple domains. Many models of multimodal relations describe the various meaningful (i.e., semantic) interactions between modalities. This paper goes beyond these relationships to address how the dominance of meaning in one modality or another must negotiate grammatical structure in one or multiple modalities.

This paper has had a long journey... I first had many of these ideas way back in 2003, and they were part of an early draft on my website about multimodality called "Interactions and Interfaces." In 2010, I started reconsidering how to integrate the theory in the context of my mentor Ray Jackendoff's model of language, the parallel architecture. The component parts were always the same, but articulating them in this way allowed for a better grounding in a model of cognition and a fuller elaboration of how these distinctions about multimodality fit within a broader architecture. I then tinkered with the manuscript on and off for another 5 years...

So, 12 years later, this paper is finally out! It pretty much lays out how I conceive of language and different modalities of language (verbal, signed, visual), not to mention their relationships. I suppose that makes it a pretty significant paper for me.

The paper can be found on my Downloadable Papers page, and a direct link (pdf) is here.

Abstract:
Human communication is naturally multimodal, and substantial focus has examined the semantic correspondences in speech–gesture and text–image relationships. However, visual narratives, like those in comics, provide an interesting challenge to multimodal communication because the words and/or images can guide the overall meaning, and both modalities can appear in complicated "grammatical" sequences: sentences use a syntactic structure and sequential images use a narrative structure. These dual structures create complexity beyond those typically addressed by theories of multimodality where only a single form uses combinatorial structure, and also poses challenges for models of the linguistic system that focus on single modalities. This paper outlines a broad theoretical framework for multimodal interactions by expanding on Jackendoff's (2002) parallel architecture for language. Multimodal interactions are characterized in terms of their component cognitive structures: whether a particular modality (verbal, bodily, visual) is present, whether it uses a grammatical structure (syntax, narrative), and whether it "dominates" the semantics of the overall expression. Altogether, this approach integrates multimodal interactions into an existing framework of language and cognition, and characterizes interactions between varying complexity in the verbal, bodily, and graphic domains. The resulting theoretical model presents an expanded consideration of the boundaries of the "linguistic" system and its involvement in multimodal interactions, with a framework that can benefit research on corpus analyses, experimentation, and the educational benefits of multimodality.


Cohn, Neil. 2015. A multimodal parallel architecture: A cognitive framework for multimodal interactions. Cognition 146: 304-323

Monday, November 09, 2015

How to analyze comics with narrative grammar

Over the past several years, I've presented a lot of evidence that panel-to-panel "transitions" cannot account for how we understand sequences of images in visual narratives like comics. Rather, I've argued that narrative sequential images use a "narrative grammar" that assigns roles to panels, and then groups panels into hierarchic relationships.

Though there are many reasons panel transitions don't work to explain how we understand sequential images, one reason the theory may be attractive is that it is intuitive: a person can easily look at a sequence and assign transitions between panels, and it "feels" right because it matches one's conscious experience of reading a comic (though it is not very cognitively accurate).

In contrast, my theory of narrative grammar is fairly complex and much harder to intuit. I think this is somewhat as it should be: there is a lot going on in sequential images that people don't realize! However, this complexity means that people might have a hard time implementing the theory in practice.

SO... to help rectify this issue, I've now written a "tutorial" that explains the process people should follow when analyzing a visual narrative sequence and attempting to implement this theory of narrative grammar.

You can download a pdf of the tutorial here, and it can also be found on my Downloadable Papers page and my Resources page.

The short summary is that one cannot simply look at a sequence and assign labels to things. There is a series of procedures and diagnostics to use, and an order of operations that is optimal for arriving at an analysis. This is the same as with most any linguistic theory, which usually requires instruction or dedicated learning to implement.

This tutorial is aimed at researchers (or anyone curious) who wish to implement this theory in practice and/or learn more about the underlying logic of how it works. It is also aimed at teachers who might wish to teach this theory in their classrooms, but may not know how to do so competently.**

As you'll find, the tutorial only briefly covers the basic principles of the theory itself. For those, you should reference my papers and my book, The Visual Language of Comics. The tutorial can thus supplement these works for a full understanding and implementation of the theory.


**On this point, note: anyone who wants to learn how to do this, especially with the intent of putting it into practice in research or instruction, should feel free to contact me for more guidance and resources.

Monday, November 02, 2015

Dispelling emoji myths

In my recent BBC article and my blog posts about emoji, I have tried to explain that emoji are not an emerging language, but that they do serve important functions that resemble other limited communicative systems.

Having now poked around online quite a bit looking at what people say about emoji, I'm particularly struck by the repetition of a few myths. Since these misunderstandings crop up all over the place, I wanted to address them here...

1. Emoji are not like hieroglyphics

First off, many people have compared emoji to Egyptian hieroglyphics, saying either that they work in the same way or that emoji are a kind of "modern hieroglyphics."

This is simply not true, mostly because hieroglyphics were a full-blown writing system whose signs mapped to sound. Hieroglyphics are not "symbol systems" made up of pictures. To me, this seems like the kind of misperception that people who are only used to an alphabet have about other writing systems: "if each sign isn't a sound like a letter, it must be just about meanings!"

There are actually several ways that hieroglyphics operated as a writing system. Some signs did indeed mean what they represented. For example, the sign for "owl" looked like an owl, and was pronounced "m":


However, the use of "rebus" signs meant that those signs could also be used without their iconic meaning, only for their sound value (i.e., the owl sign would be used for many words containing the sound "m," but not for its meaning of "owl").

From there, both of these types of signs could be combined into compound signs. For example, this combination takes the rebus of the owl (using just the sound "m") and the sign for an ear (using its meaning, but not its pronunciation) to form the word "to hear":


This type of compound uses signs both for their meaning value and for their sound value. There are no compounds made up of two signs that contribute only meaning: they always include some sound-based sign. Hieroglyphics also sometimes use fairly abstract representations, as well as purely sound-based signs that vary in the number of consonants they represent.

In sum, unlike the purely imagistic meanings found in emoji, hieroglyphics are a fully functioning writing system that is intrinsically tied to the Egyptian language. Emoji also differ in context: imagistic emoji accompany a separate writing system (for English speakers, the alphabet), whereas hieroglyphics are the writing system.

I'll also note that these same points apply to Chinese characters. Though they work a little differently than hieroglyphics, the same basic principles apply: they are a writing system tied to the sounds of a language, not a series of pictures that only have imagistic meaning.


2. There is no such thing as a universal language

I have seen many people proclaim that one of the exciting things about emoji is that they transcend spoken languages to form a "universal language." This is also hogwash, for many reasons. No language is universal, whether verbal, signed, or visual. Here are several reasons why images (including emoji) are not, and cannot be, universal:

How they convey meaning

Just because images may be iconic (they look like what they represent) does not mean that they are culturally universal. Even simple things like the way people dress do not translate across cultures, not to mention variation in facial expressions or, even more dramatically, fully conventionalized meanings like giant sweat drops to convey anxiety. Note that, since emoji were originally created in Japan, many are already culturally specific in ways that do not translate well outside Japan.

This is not to mention the limitations of emoji that I discussed in my BBC article, such as that they rely on a non-producible vocabulary that does not allow the easy creation of new signs, and that their sequences maintain a simple structure characteristic of an impoverished grammar. In other words, they are highly limited in what they can express, even as a graphic system.

Cultural exposure

We also know that images are not universal because a host of studies have shown that people who do not have cultural exposure to images often have difficulty understanding their meanings. Such deficits were widely investigated in the 1970s and 1980s under the umbrella of "visual literacy." Here's how I summarized one such study from Fussell & Haaland (1978), which examined individuals in Nepal:

As described in the paper, the individuals tested "had significant deficits understanding many aspects of single images, even images of faces with simple emotions in cartoony styles (happy - 33%, sad - 60%). They had even more difficulty with images related to actions (only 3% understood an image trying to convey information about drinking boiled water). Some respondents had radically different interpretations of images. For example, a fairly simple cartoony image of a pregnant woman was interpreted as a woman by 75%, but 11% thought it was a man, and others responded that it was a cow, rabbit, bird, or gave other varied interpretations. They also had difficulty understanding when images cut off parts of individuals by the framing of a panel (such as images of only hands holding different ingredients)."

Such findings are not rare. For example, Research into Illustration by Evelyn Goldsmith summarizes several studies along these lines. Bottom line: Understanding drawings requires exposure to a graphic system, just like learning a language.

There is not just one visual language

Most discussion of the universality of images focuses on how they are comprehended. But this overlooks the fact that someone also had to create those images, and that images vary widely in their patterns across the world.

That is, as I argue in my book, there is not just one visual language; rather, there are many visual languages in the world. There's a reason why the "style" of American superhero comics differs from Japanese manga, French bande dessinée, instruction manuals, etc. Drawing "styles" reflect the patterns of graphic information stored in the minds of those who create them. These patterns vary across the world, both within and between cultures.

This happens because people are different and brains are not perfect. There will always be variation and change within what is perceived to be a coherent system. This is in part because any given language is actually a socio-cultural label applied to the system(s) used by individual people. There is no "English" floating out in the ether to which we all link up. Rather, "English" is created by the similarities of patterning between the languages spoken by many people.

Indeed, though many who speak "English" can communicate in mutually intelligible ways, there are hundreds of sub-varieties of "English," with variations ranging from subtle (slight accents) to dramatic (differences in vocabulary and grammar), across geographic, cultural, and generational dimensions.

Similarly, even if there were a universal language, be it spoken or visual, sub-varieties would emerge based on who uses the system and how they use it. Just because images are typically iconic does not mean that they are transparent or outside of cognitive/cultural patterns.

Emoji in part exemplify this facade that a language is external to the patterns in people's minds, since their vocabulary is provided by tech companies and does not directly emerge from people's creations. Someone (the Unicode Consortium) decides which emoji can be used, and then makes them available. This is the opposite of how actual languages work, as manifestations of similarities between cognitive structures across speakers.

In sum, drawings are not universal because they differ based on the cultural "visual languages" that result from people using different patterns across the world.