The Patron Saint of Superheroes

Chris Gavaler Explores the Multiverse of Comics, Pop Culture, and Politics

I am delighted to announce that Bloomsbury just officially published The Comics Form: The Art of Sequenced Images. My last three books were two-in-one team-ups, so it feels pleasantly strange to be solo authoring again. This is also a culmination of work I’ve been building toward for about seven years (a very very early version of a chapter subsection appeared as an essay in 2016). I’m humbly hoping to offer some helpful ways to rethink foundational ideas in comics theory, but others will have to determine just how helpful they are. The book is priced for institutions (hardback and ebook), so probably not something most folks will be buying for their own shelves, but I do hope that anyone with a deep interest in comics will nudge their nearest library to purchase a copy.

I also thought the clearest way to preview the content would be to start at the end. That’s how you write a mystery novel, but in this case I mean it literally. Below is the actual Conclusion to The Comics Form. It condenses the preceding 206 pages into three very, very dense pages. Rereading them now out of context hurts my brain a little, so good luck. Still, I hope there are enough intriguingly shaped fragments in there to pique your interest in exploring the book. Raise your hand if you have any questions (i.e., email me).


A poem consisting of fourteen iambic pentameter lines following an ABABCDCDEFEFGG rhyme scheme is a Shakespearean sonnet because it is in the Shakespearean sonnet form. A work in the comics form is formally a comic for similar reasons. Unlike Shakespearean sonnets, however, a comic may be defined by other than form. A work may be a comic contextually, stylistically, conventionally, or by other criteria independent of or non-exclusive to form. Though it may be a comic according to multiple sets of defining criteria, if the work satisfies one set but not another set, it is both a comic and not a comic. The apparent paradox is due to each set using the same term, even though each usage is distinct. A ‘comic’ is not a ‘comic’ is not a ‘comic.’

The Comics Form defines the comics form by extracting the two most common physical features from a range of comics definitions and combining them as ‘sequenced images.’ Sequenced images may or may not define comics generally, but if a work consists of sequenced images, the work is in the comics form and so can be analyzed formally as a comic. Although that may be sufficient for the work to be considered a comic, others might understand that a work must be, for example, a mass-produced replica or be created, produced, and purchased with the understanding that it is a comic. I refer to such media-defined works as the comics medium. A single-image cartoon in a newspaper comics section is in the comics medium, but because it is not in the comics form, it is outside the scope of this study.

The terms ‘discourse’ and ‘diegesis’ differentiate images’ physical qualities and representational qualities. All images have discourses and many also have diegeses. Since I derive ‘image’ from extant comics definitions, I also infer its discursive constraints: an image in the comics form is a visual, static, flat image juxtaposed with another. If it is also a representational image, it represents some subject matter: the diegesis experienced in the mind of a viewer interpreting its discourse. Like ‘discourse,’ ‘diegesis’ has other usages, but I adopt an expansive meaning: all representational content, either overtly depicted or implied, including the larger context of a world. Diegeses vary between viewers but also presumably overlap significantly. Non-representational images have no diegeses, only discourses—which must still be mentally experienced, but without the construction of a mental model with diegetic qualities understood to be separate from the discourse.

Analyzing representational images involves a range of approaches for relating their discursive qualities to the diegetic qualities they produce. Rather than focusing on unknowable authorial intentions or the illusion of intentions in characters, I focus on viewers’ experiences of intentionality in author-constructed narrators, focusing first on image-narrators, which communicate diegetic content through an image’s discursive qualities. An image’s style is in one sense discursive: an arrangement of marks on a surface. In another sense, style is diegetic: subjects depicted in a certain manner. Style may then be understood as semi-representational: discursive qualities that represent subjects non-literally and indirectly. Style may follow certain norms or modes, including cartooning and naturalism, as well as other combinations of exaggeration and simplification. Those norms are understood to represent subject matter through overspecified or underspecified details that are not aspects of the diegesis, except indirectly through connotations. Viewpoint and framing effects are similarly semi-representational.

To be in the comics form, a work must include more than one visual, static, flat image. Distinguishing multiple images from a single image with multiple units poses a challenge. Common terms ‘panels,’ ‘frames,’ and ‘gutters’ are metaphors to describe drawn qualities that are not determining. Unless images are physically divided—two framed paintings hanging on the same wall, for example, or two facing pages—image division is determined by viewer perception. A physically unified page or canvas of visual subunits consists of multiple images only if a viewer perceives it to be multiple images. Like style, a layout of gutter-divided panels straddles discourse and diegesis. The division of images is neither a discursive quality nor a diegetic quality in the same sense as the images’ subject matter, because the divisions are not part of that diegetic world. Where style seems diegetically transparent (subjects are drawn as if accurately reflecting their literal appearance in their world), image divisions seem discursive (images are drawn as if separated physically in the discourse). Distinguishing the two effects, style is semi-representational and layout is pseudo-formal. Because pseudo-formal qualities are not physical qualities like page dimensions and divisions, pseudo-form is a kind of diegesis, but one separate from the primary diegesis of the representational content, and so a secondary diegesis.

For images to be sequenced and so to be in the comics form, they must be juxtaposed in one of three possible ways: 1) contiguous: images appear simultaneously within a single visual field; 2) temporal: an image appears immediately after a previous image in the same visual field; or 3) distant: a non-contiguous image that does not immediately follow a previous image is mentally recalled while observing a current image. Contiguous juxtaposition describes the pages of most works in the comics medium in which viewers understand panels to be separate images. Temporal juxtaposition is the norm of films but occurs in static images when, for instance, a viewer turns a page. Distant juxtaposition is dependent on memory and so may be juxtaposition in only a metaphorical sense. A viewer of a sequence may at any time recall a previous image and relate it to a current image, discursively, diegetically, or both. Contiguous juxtaposition also includes braiding effects (with and without repetition) in which the discursive relationship of visual elements influences a viewer’s understanding of their corresponding diegetic qualities.

Juxtaposed images trigger inferences. The most fundamental inference is recurrence: marks in separate images are understood to be representations of the same subject. Recurrence is reinforced by a parallel phenomenon, diegetic erasure, in which discursive qualities that would produce diegetic contradictions are ignored. Juxtaposition produces ten additional types of inferences: 1) spatial: images share a diegetic space; 2) temporal: images share a diegetic timeline; 3) causal: undepicted action occurs between depicted moments; 4) embedded: one image is perceived as multiple images; 5) non-sensory: differences between representational images do not represent sensory reality; 6) associative: dissimilar images represent a shared subject; 7) semi-continuous: discursively continuous but representationally non-continuous images are perceived as a single image; 8) continuous: images are perceived as a single image; 9) match: otherwise dissimilar images share matching similarities; and 10) linguistic: images relate primarily through accompanying text. The first seven are diegetic only; the next two can occur both discursively and diegetically, or discursively only; and the last is not primarily a result of image juxtaposition and so arguably is not a type of juxtapositional inference. While the subtypes of diegetic inferences are only discernible through analysis of the story world, discursive inferences must be analyzed at the level of the page. Purely discursive inferences involve only discursive marks understood in terms such as shapes and values without reference to representational content.

For images to be sequenced, they must be juxtaposed, but the relationship between sequence and juxtaposition is ambiguous. The juxtaposed images of a sequence follow a specific order. The juxtaposed images of a non-sequence, or set, follow no specific order. Since a set can be juxtaposed contiguously, when, for example, organized into a book or gallery, order has two kinds: 1) discursive order: the successive but non-diegetic arrangement of images, reflecting only happenstance, convenience, and/or the needs of physical presentation; and 2) sequential order: the successive arrangement of the images, reflecting some diegetic quality of the representational content. Sequential order and discursive order are identical for sequences.

When images, whether sets or sequences, are contiguously juxtaposed in a visual field, viewing them produces a viewing path that is either: 1) directed: determined by the image order of the sequence; or 2) variable: indeterminate and so open to multiple discursive orders. Since image orders and image viewing paths are apprehended simultaneously, both are an additional type of juxtapositional inference in which a viewer determines relationships between contiguous images. If other inferences (recurrence, spatial, temporal, etc.) suggest a sequential order organized in a directed viewing path, the images are hinged. Unhinged viewing describes variable viewing paths of a set arranged discursively but non-sequentially.

The image qualities of content, relationship, order, paths, and hinges suggest a six-part typology: 1) representational sequence: two or more related and ordered representational images, with, if contiguously juxtaposed, hinges that produce a directed viewing path; 2) non-representational sequence: two or more related and ordered non-representational images, with, if contiguously juxtaposed, hinges that produce a directed viewing path; 3) representational set: two or more related but unordered representational images, that, if contiguously juxtaposed, are viewed in variable paths; 4) non-representational set: two or more related but unordered non-representational images, that, if contiguously juxtaposed, are viewed in variable paths; 5) representational arrangement: two or more unrelated and unordered representational images with no contiguous hinges; and 6) non-representational arrangement: two or more unrelated and unordered non-representational images with no contiguous hinges.

Representational sequences also provide a means to explain and constrict McCloud’s closure. Viewers make inferences about a diegetic world based on image content in combination with independent knowledge and assumptions about the depicted world. Those assumptions are usually mimetic, applying, for example, laws of physics to objects and human psychology to characters. The undrawn content that viewers experience through the juxtaposition of two or more representational images is the minimal content required by a viewer’s mental construction of a partially drawn but fully implied event, consisting of definable subunits. While anything could occur in the ambiguous lapse of time implied by paired images of discrete moments, viewers understand the images as parts of a unified event. According to event inferencing, any content that is not part of that event is not implied.

Though text is not a necessary quality of the comics form, many sequenced images contain text. An image-text is an image that combines linguistic and non-linguistic content, and sequenced image-texts are in the comics form. Since all text is necessarily images, there are two types: 1) word-image: an image with linguistic content; and 2) word-image art: a word-image rendered as graphic art. Since word-images and non-linguistic images function as though on parallel and independent paths, image-texts involve three kinds of narration: 1) image-narration of non-linguistic content; 2) text-narration of linguistic content; and 3) image-text narration of combinational effects of linguistic and non-linguistic content, which produces embedded relationships including double referents.

These are the qualities of sequenced images, which together explain the comics form.

