As people have now started reading my book and papers, they've naturally started to try to apply my theories of "narrative grammar" to sequential images found in comics. My "narrative grammar" is a model of how the "storytelling" of sequential images is understood, which extends beyond previous approaches like Scott McCloud's theory of panel-to-panel transitions.
As people have now opened up comics and tried to use the theory to describe randomly found pages and sequences, they have no doubt discovered that it is not easy. In fact, they may have thrown up their hands in frustration. If this theory is psychologically real, then why is it so hard to analyze sequences? Does this mean the theory is wrong?
No. There are many reasons why analysis may be challenging...
First, the theory is designed to account for sequences without text. Once text is introduced, the sequence must balance both structure (i.e., grammar) and meaning in multiple modalities, and it becomes manifestly more complex. I'm hoping to have a paper detailing this soon.
Second, what about wordless sequences? Just as it would be really difficult to read a paper about linguistic syntax and then analyze sentences, applying this theory properly requires some training. At the very least, it helps to follow procedures for how to go about analysis.
Even I don't just look at a sequence and immediately know what the analysis is. I go through a series of procedures that test the structure at each step of the way. (These procedures are found in Chapter 6 of my book and in the section on "diagnostics" in the "Visual Narrative Structure" paper, though neither enumerates the order in which to go through them.)
Here's how I train students in my classes and workshops to analyze sequences: the first thing we do is find the Peak panels, because the rest of the sequence hangs around the Peaks. How do we know what is a Peak? We test panels by trying to delete them (if the sequence is weird without a panel, then it's likely a Peak), by replacing them with an action star (if the substitution works, the panel is likely a Peak), or by deleting everything else except them (Peaks should be able to paraphrase a sequence on their own). From here, other procedures are used to determine the other categories and the hierarchy of the sequence.
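As a rough illustration (not something taken from the book or papers), the three Peak tests can be sketched in Python as a helper that builds the altered sequences for a human analyst to judge. The panel descriptions, the function names, and the `ACTION_STAR` placeholder are all hypothetical. The code only constructs the variants; deciding whether a variant reads as "weird" remains a human judgment.

```python
# Hypothetical sketch: build the three test variants for a candidate Peak.
# Panels are stand-in text descriptions; a real analysis uses the images.

ACTION_STAR = "<action star>"  # assumed placeholder for an action-star panel


def deletion_variant(panels, i):
    """Return the sequence with panel i deleted."""
    return panels[:i] + panels[i + 1:]


def action_star_variant(panels, i):
    """Return the sequence with panel i replaced by an action star."""
    return panels[:i] + [ACTION_STAR] + panels[i + 1:]


def paraphrase_variant(panels, candidates):
    """Return only the candidate panels, with everything else deleted."""
    return [p for j, p in enumerate(panels) if j in candidates]


def peak_test_variants(panels, i):
    """Collect all three variants for panel i so they can be judged by eye."""
    return {
        "deletion": deletion_variant(panels, i),
        "action_star": action_star_variant(panels, i),
        "paraphrase": paraphrase_variant(panels, {i}),
    }


# Made-up three-panel sequence, testing the middle panel.
panels = ["batter winds up", "batter hits ball", "ball flies away"]
variants = peak_test_variants(panels, 1)
for test_name, sequence in variants.items():
    print(test_name, "->", sequence)
```

Each variant is then read as a sequence in its own right: a weird-sounding deletion, a successful action-star substitution, and a workable paraphrase each count as evidence that the panel is a Peak.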
The point being: you can't just look at a sequence and intuit the structure (not even I can). That's why I describe tests and diagnostics, so that you can do it without just relying on intuition at every step of the way. Procedure matters.
3. Theory as framework
Third, the theory is a framework, not a catch-all. Theories of syntax in language are not "fully formed" when they are written about, and no theory of syntax in any book or any paper, in any linguistic model, is designed to immediately encompass every sequence one could encounter "out of the box." Rather, the theory provides a framework by which to account for the diversity found in sentences. One then uses the framework (or changes the framework) to describe the various phenomena that are found in actual language use.
I consider a theory to be "good" if it can do two things: 1) account for more of the phenomena found in a structure (here, visual sequences) than other theories, and 2) be shown through experimentation to have psychological validity.
Much of syntactic theory about language is not simply a matter of finding things in sentences and then describing them using a particular theoretical model. Rather, the examples found in sentence structures are both described with theories of syntax and used to illustrate how they pose challenges to those theories, such that the theories must grow and change. Throughout the 1960s and 1970s, many of the "wars" that were fought throughout linguistics had this characteristic: finding various patterns of syntax that would force changes towards preferring one theory of grammar or another.
Likewise, my theory of "Visual Narrative Grammar" is not meant as an "out of the box" analysis tool that should apply to every sequence of images from a comic based only on the chapter from my book. There are many, many more aspects of the theory that have yet to be published, all of which deal with more complicated sequences and the non-trivial issue of combining sequential images with text. What is in my book and papers so far is less than what I teach even in an introductory class on visual language: I have a draft manuscript of over 300 pages (and growing) detailing various phenomena with the theory, most of which hasn't been published yet.
In addition, various sequences should challenge the theory, which is exactly the method I've used for the past 15 years to build the theory in the first place. I've had a theory, then found sequences that force changes to the architecture, and then altered the theory to be able to account for those issues. It's an organic process. The theory gives us a way to discuss and analyze such complexity and see how it might work. It's a framework, not a catch-all, just like all linguistic theories.
This is exactly the opposite of something like panel-to-panel transitions, which are based solely on the low-level changes in meaning that occur between images. Such a theory is simple: there will always be meaningful changes between panels, and so it always seems to work. That's its appeal. The problem is that such an approach doesn't explain much of the data, which is far more complex than it can manage. Indeed, my approach first started by expanding McCloud's panel transitions, and then altering it as I found sequences that it couldn't handle.
The fact is, the way we express meaning—be it through verbal language, visual language, or their combination—is very complex. There are no simple answers, and we should distrust the simple answers that might be offered. Recognizing this complexity, and building a framework that can let us study it, is the first step to exploring how it is understood.