Gernsbacher's 1985 paper "Surface information loss in comprehension" is an important article on the comprehension of sequential images, and one that has informed much of my current research. It is based on her dissertation, and describes several experiments.
Overall, Gernsbacher had participants read the Mercer Mayer book Frog, where are you? to test whether people can accurately recall the exact surface form of the images in the story, or whether (as with language) they retain only the gist of the meaning.
First, she asked participants to read this "picture story" and choose where they would divide it into parts. They simply drew lines between images where they felt that one episode ended and another began. She found that people largely agreed on where to place these boundaries between segments.
She then asked another group of people to read the stories, but with the composition of certain images flipped horizontally. These images came either before or after the boundaries that people agreed upon in the previous experiment. She found that people had a harder time accurately remembering the horizontal composition if the image came after the boundary as opposed to before it. This provided evidence that people were building up context throughout a segment, and that the start of a new segment incurred a cost on memory.
These experiments were important for several reasons. First, they confirmed her hypothesis that people mostly retain the gist of meaning and not the surface information of images. Given that people's comprehension did not appear overly damaged by flipping the composition of images, this finding could be pertinent to discussions of how much impact the left-right composition of images really has, such as in the 180° rule.
More importantly, however, these experiments provided very strong evidence that people group images together into segments. This poses a problem for theories like McCloud's panel transitions, which envision no stopping point for linear transitions: they keep going on and on throughout a visual narrative (either linearly or promiscuously between multiple panel relationships).
Rather, this experiment shows that people have some intuitions for dividing up visual narratives into segments (what I called in my book "visual sentences"), and that moving between those segments incurs a cost to comprehension.
Gernsbacher, Morton Ann (1985). Surface information loss in comprehension. Cognitive Psychology, 17(3), 324-363. DOI: 10.1016/0010-0285(85)90012-X