Reading Race in the Comics Medium

Chris Gavaler (Lexington, VA)

Fig. 1: Grell; Kirby; Kirby and Colletta.

INTRODUCTION: Race and Reading

The four characters in Fig. 1 were drawn by White artists working in the comics medium: »Tyroc« by Mike Grell, »Non-Fat« by Jack Kirby, and »Marny« and »Larry« by Jack Kirby and Vince Colletta. Additional images not shown here but discussed below include »Ebony White« by Will Eisner and »Whitewash Jones« by Charles Nicholas Wojtkoski, racist caricatures repeating blackface minstrel traditions in the 1940s. All of the images represent characters intended by their artists and typically understood by their viewers to be Black. How they do so is contestable. I therefore pose two questions:

  • Is each image read? And if so, is the race of each character also read?

The verb ›read‹ is commonly applied to the act of viewing comics images, and comics viewers are commonly identified as ›readers.‹ If the terms are accurate, then the race of characters is readable, meaning racial identity is communicated linguistically. This would be true because information in comics generally is communicated linguistically, and race is one kind of information. If racial identity is not communicated linguistically, then the terms ›read,‹ ›readers,‹ etc. are used metaphorically or otherwise imprecisely, and racial identity is communicated by some other, nonlinguistic means.

Because determining what it might mean to »read race« requires first determining what it means to ›read‹, my analysis divides into two sections:

Part 1 explores the general linguistic question: can representational images be read? After assessing current semiotic approaches, I offer a distinction: the meaning of linguistic signs comes through their resemblance to linguistic types, and the meaning of nonlinguistic signs comes through their resemblance to spatiotemporal objects. The distinction clarifies the role of semiotics as applied to the comics medium by identifying three kinds of representational images: ones that are linguistic signs, ones that are nonlinguistic signs, and ones that are both, requiring both linguistic reading and spatiotemporal observing.

Part 2 explores the ramifications of Part 1 for understanding race in comics images. The reading/observing distinction, which comics semiotics does not currently account for, is central. Reading requires two participants: the reader and the creator of the object being read. Observing requires only one participant: the observer. All comics images are created and so involve creators, but if the image is observed, the observer treats the image as if directly observing its subject. The race of a depicted character is visually interpreted in the same way that the race of an actual person is visually interpreted. The artist could conceivably not be aware of the race of a character they have rendered and only be intending to represent optical experience. The opposite is true of reading. The artist must intend to communicate a character’s racial identity to a reader through shared visual conventions. Some of those conventions, such as blackface minstrelsy, are virulently racist. Viewers may experience that racism through complex combinations of spatiotemporal observing of impossible bodies and linguistic reading of culturally structured signs.

PART 1: Reading

Joseph Witek identifies »reading« as the medium’s defining quality: »to be a comic text is to be read as a comic« with »an evolving set of reading protocols« (149). Thierry Smolderen uses »slow read« to describe viewing a single but complex image »that invites the eye to lose itself in the details« (8). Sam Cowling and Wesley Cray consider the verb a »linguistic accident,« because »whatever reading we do when we engage with comics, it is not the same activity as the reading we undertake when we engage with a novel« (77). They adopt Fredric Wertham’s »picture-reading,« but without stipulating a meaning. Wertham uses »picture reading« pejoratively, naming some lesser form of reading, which »consists in gazing at the successive pictures of the comic book with a minimal reading of printed letters« (139). Picture-reading also implies a distinction from word-reading, and since both are typically involved in the comics medium, »reading« seems to be a superordinate category combining both. Since »reading« is already the term for word-reading, it remains unclear what the combined activity entails.

The Oxford English Dictionary lists dozens of definitions for »read,« some specific to written words, some to interpreting and discerning generally (Read). Semiotics, or the study of signs, tends to obscure that difference. According to John Berger, »people are ›speaking‹ all the time, even when they aren’t saying anything verbally,« because »[h]airstyles, eye-glasses, clothes, facial expressions, posture, gestures, and many other things communicate or ›speak‹« as communicative signs that can be read (15). Setting aside the ambiguity of whether the individual intends to communicate through the selection of objects and other actions or if an object or action can itself be the communicator, the use of ›read‹ seems to be another linguistic accident since »reading« posture or clothing is unlike reading written language. Unlike American Sign Language, which contains fundamental features of language including rules for word formation and word order, body language is a language only in a metaphorical sense.

Semiotics has been applied to comics studies in a manner that further blurs the distinction between reading literally and reading metaphorically. The two founders, Peirce and Saussure, meant different things by »sign.« Saussure studied linguistic signs, and Peirce studied signs in a broader, nonlinguistic sense. The difference poses a special challenge to comics studies because the medium includes both words and pictures. While words are unquestionably linguistic, pictures are linguistically ambiguous. That ambiguity overlays a larger ambiguity within semiotics that must be addressed before applying semiotics to comics images generally and then to representations of race specifically.

Linguistic Signs and Nonlinguistic Signs

Semiotics developed from the works of American philosopher Charles Sanders Peirce (who coined »semiotics«) and Swiss linguist Ferdinand de Saussure (who preferred »semiology«) in the late nineteenth and early twentieth centuries. Though both study »signs,« Halina Yakina and Andreas Totua identify a defining difference: »a sign for Saussure is something delivered by someone with a purpose and specific meaning intentionally,« and all such signs are also conventions, »something that is mutually or commonly agreed by all those involved in the particular culture« (7). According to Peirce, however, »everything can be a sign, as long as it has the ability to represent something according to the individual’s interpretation« (7).

Saussure’s sign systems require learned conventions to communicate, while Peirce’s signs are not necessarily linguistic, cultural, communicative, or even human. Albert Atkin provides an example of Peirce’s broader use of »signs«: »Consider, for instance, a molehill in my lawn taken as a sign of moles ... since moles make molehills, molehills signify moles« (Atkin). Presumably the moles in Atkin’s lawn are not attempting to communicate with him, linguistically or otherwise. They construct molehills for independent reasons, and despite the active voice construction of Atkin’s sentence, molehills signify without intentionality. As Atkin explains: »a sign signifies only in being interpreted.«

While drawing its name and its vast range of available signs from Peirce, semiotics emphasizes Saussure’s linguistic approach. According to Umberto Eco in A Theory of Semiotics, »semiotics is concerned with everything that can be taken as a sign,« and a »sign is everything which can be taken as significantly substituting for something else« (7). While Eco’s definition does not define »sign« linguistically, the term carries Saussure’s linguistic connotation. As Berger celebrates: »the essential breakthrough of semiology is to take linguistics as a model and apply linguistic concepts to other phenomena – texts – and not just to language itself« (6). A »text« in this sense is any object that can be interpreted. Berger and Eco do not differentiate between linguistic and nonlinguistic signs, treating both linguistically. This essential breakthrough is also the source of semiotics’ essential ambiguity.

If anything can be read as a »text,« »reading« becomes synonymous with »interpreting,« even though »reading« denotes an act of linguistic communication. Semioticians, explains Berger, »treat texts as being like languages« (6–7), which produces language-like analysis that incorrectly implies that a nonlinguistic »text« involves actual language. A nonlinguistic object, such as a molehill, never becomes linguistic as a result of linguistic analysis. The molehill remains outside of language. In an often-cited passage, Saussure explains his central linguistic-specific definition of a sign: »The linguistic sign unites not a thing and a name, but a concept and a sound-image« (66–67). In semiotic practice, Saussure’s »concept« and »sound-image« are the »signified« and »signifier,« semiotics’ two most central terms. They originated with specifically linguistic meanings: the combination of a signified and a signifier is a linguistic sign, not Peirce’s broader nonlinguistic sign. If »signified« and »signifier« are applied to things such as moles and molehills, it is in a nonlinguistic, non-communicative sense.

Peirce, though his semiotic approach is not language-based, provides a means for distinguishing linguistic signs and nonlinguistic signs—an essential first step in determining what images can and cannot be read. Peirce coined the terms »token« and »type.« John Lyons in Semantics clarifies:

The relationship between tokens and types will be referred to as one of instantiation; tokens, we will say, instantiate their type ... Tokens are unique physical entities, located at a particular place in space or time. They are identified as tokens of the same type by virtue of their similarity with other unique physical entities and by virtue of their conformity to the type that they instantiate. (13–14)

A token of a word is the individual appearance of a word in a specific context. A word type is an abstraction independent of any of the word’s individual appearances. The following word token ›face‹ is the ink marks (or pixels depending on how you, the actual reader, are accessing this sentence) that combine to create its physical presence on this page (or screen). It is a token because the ink marks or pixels are additionally recognized as combining into the word type ›face‹ that is a learned abstract category in your, the actual reader’s, mind. Having recognized the word type (which is the combination of the four letterforms ›f,‹ ›a,‹ ›c,‹ and ›e‹ in that order regardless of variations in font or handwriting), you then may associate it with a previously learned meaning (the front part of a head presumably) also stored in your memory.

The token/type distinction may also be used to distinguish linguistic and nonlinguistic images. Lyons concludes that the »linguist ... is interested in types, not tokens« (16, 28), because a word token only has meaning through its word type. Since Peirce was not making a linguistic distinction in his use of »token« and »type« (which are as broadly applicable as his »signs«), I stipulate uses that are specific to linguistic units: a token is a specific occurrence of a linguistic unit; the linguistic unit is an abstract category called a type; and the type is associated with a learned meaning. A linguistic unit is a linguistic unit because it has the three-part token-type-meaning quality, which a nonlinguistic unit necessarily lacks. Lyons acknowledges that »there is room for considerable disagreement as to where the bound should be drawn between language and non-language« (61), but the presence of token-type-meaning processing may be a defining quality of languages that distinguishes them from nonlinguistic sign systems. Token-type-meaning processing is reading, and only images that involve token-type-meaning are read.

Words and Snapshots

While semiotics covers a vast range of possible signs, the comics medium includes only a subset: two-dimensional marks on surfaces, prototypically ink on paper. The subset usually includes two kinds: rendered words and pictures. Both are visual images, but sets of two-dimensional marks that are understood as letterforms combined into words have an arbitrary and conventional relationship to what they linguistically refer to, while representational images in some way resemble what they refer to and that resemblance is how they refer.

Lyons identifies four design features that are unique to languages (arbitrariness, duality, productivity, and discreteness), which are »interconnected in various ways« and »present in all languages ... Whether they are to be found in any semiotic system other than language is questionable. But, if they are, they do not appear to be present on the same scale or to be interconnected in the same way« (79). Of the four, I pause only on arbitrariness, because Saussure similarly observes the arbitrariness of the linguistic sign. In contrast, a resemblance-based representational image does not have an arbitrary relationship to the subject it represents because the image resembles the subject. Peirce terms such non-arbitrary resemblance-based signs »likenesses, or icons; which serve to convey ideas of the things they represent simply by imitating them,« distinguishing them from »symbols, or general signs, which have become associated with their meanings by usage. Such are most words, and phrases, and speeches, and books, and libraries« (1894), which are arbitrary in Saussure’s sense. Peirce’s »likenesses« are not linguistic signs.

Referring-by-resembling typically involves a viewer experiencing the set of two-dimensional marks as a spatiotemporal representation: a three-dimensional object viewed from an implied viewer’s specific angle and proximity at a specific historical moment captured in the representation. By understanding a comics discourse’s two-dimensional marks as three-dimensional diegetic objects, viewers also understand that a represented object could be viewed from different angles, proximities, and moments. That is because the object is perceived as part of a diegesis (represented subject matter) that has spatiotemporality (it exists in space and time) and so is part of an implied larger world. The image refers to that world. Visual images that consist of letterforms also require viewers to recognize their resemblance to something else: the two-dimensional shapes of the individual letters that combine to form the word type (which do not imply the ability to be viewed from different angles, proximities, or moments). If a set of two-dimensional marks resembles letterforms that combine as a word known to a viewer, that viewer next accesses the word’s linguistic meaning. That is the token-type-meaning processing discussed above.

The two processes (spatiotemporal and token-type-meaning) are distinct. Representational images create the illusion of a viewer’s direct interaction with the represented content in its diegetic context (however minimal or implied), and their meaning is a product of that interactive illusion. Rendered words create no such illusion. They are recognized as word tokens which reference word types which trigger meanings and, when in relationships with other word types, have grammar. Word tokens have meanings only to the degree that they are recognized as word types. They are what Peirce terms ›symbols.‹

Though symbols may have some iconic or pictorial qualities, symbols and icons function differently. Lyons clarifies further: »The conventionality, or arbitrariness, of symbols, in contrast with what might be called the naturalism of iconic signs is grounded in the user’s knowledge or awareness of the conventions,« and although »there are many iconic features in language,« such as »the characters and hieroglyphs of so-called ideographic writing systems,« »it is a relatively weak kind of iconicity that is found in language« (102–3). The iconicity is weak because language types with iconic features display only minimal resemblance to what they represent, and they do not represent any spatiotemporal particulars. The hieroglyphic sign for »mountain,« for example, minimally resembles the contours of two round mountain peaks, but it can refer to any mountain regardless of the number and roundness of its peaks. The resemblance is irrelevant to its being read. A token of a type with iconic features must foremost resemble its type, and then the type provides the connection to the represented subject. That the token also minimally resembles the represented subject is secondary to and unnecessary for its linguistic function.

Though words and pictures understood in this sense are distinct, they can also combine. Representational images can include words (Magritte’s This Is Not a Pipe), and words can be rendered in ways that create representational images (calligrams arrange words in the shape of a poem’s subject). Despite such overlaps, words and representational images require independent processing. Where a language requires token-type-meaning processing, the relationship between a spatiotemporal image and its subject is paradigmatically »one unique physical entity, located at a particular place in space and time« (the rendered image) representing another »unique physical entity, located at a particular place in space and time« (the represented content) (Lyons, 13-14). Recognizing a drawing of a particular person, for example, requires accessing mentally stored information about that person, but the drawing represents a specific spatiotemporal instantiation, not the person as a general concept. It is a spatiotemporal particular representing another spatiotemporal particular. Roughly speaking, token-type-meaning processing is object-idea-idea, and spatiotemporal processing object-object. If viewers recognize a person in a drawing, they do so by accessing knowledge about the person, producing a different three-step relationship: object-object-idea. The first step, object-object, is not part of a linguistic process because the drawing is no more a linguistic sign than is the person when viewed directly. If for some reason the person is not recognizable in the image (due to a quality of shading, for example), the drawing still represents the spatiotemporal event.

Calling a resemblance-based image a »sign« invites confusion. The colloquial term »snapshot« may be clarifying. In film, »shot« denotes the distance and angle of a camera in relationship to its subject, and »snap« suggests a nearly instantaneous passage of time: the spatiotemporal combination absent in linguistic tokens and types. Andrei Molotiu observes similarly of realistically drawn comics images: »Because of its approximation of a photograph, a photo-realistic panel can seem close to a snapshot – that is, a representation of only the fraction of a second in which the photo was taken« (168). Since »snapshot« denotes a photograph, I use a broader term. A spatiotemporal image is an image that resembles a spatiotemporal occurrence. The occurrence includes the subject matter and the angle and proximity of an implied viewer at a moment in time. A spatiotemporal image is also itself an object that exists spatiotemporally as a physical discourse distinct from its diegetic content. Viewing a spatiotemporal image is viewing the viewing of something else, creating the illusion of interaction in a diegetic world.

Grammar and Events

An assumption that spatiotemporal images are linguistic signs hinders rather than aids visual analysis. The study of works in the comics medium, as well as visual arts generally, requires differentiating an image’s linguistic and nonlinguistic elements. Comics semiotics does not recognize this difference and so misreads the medium by claiming always to read it.

Thierry Groensteen calls layout »the device upon which the language is founded« (2007: 28), and he calls any three panels »composed of the panel that is currently being read, the panel that preceded it, and the panel that immediately follows it« a syntagma, a sequence of linguistic units in syntactic relationship (111). Neil Cohn clarifies that »comics are written in visual languages in the same way that novels or magazines are written in English« (2), and he categorizes panels according to their representational content into categories he calls Visual Language Grammar. Cohn acknowledges that »the combination of images may be closer to the structure used between whole sentences« and so operating »at a higher level than syntax« (65). The same is true of what Groensteen calls syntagma, since previous-current-next triads could also apply to sentences, paragraphs, chapters, and books in a series.

According to the OED, »syntax« and »grammar« are synonyms, denoting the »set of rules and principles in a language according to which words, phrases, and clauses are arranged to create well-formed sentences« or the »ways in which a particular word or part of speech can be arranged with other words or parts of speech« (Syntax, Grammar). The terms are limited to the scope of a single sentence and do not apply between sentences or to larger structures such as paragraphs. The OED offers a secondary set of definitions for »syntax« and »grammar« not related to language but to arrangement and connection generally, making »structure« and »organization« additional synonyms. This seems to be the nonlinguistic sense that Groensteen and Cohn use, while connotatively implying that a comic’s spatiotemporal images are linguistic. Neither author provides reasons for why such images should be considered linguistic, and I consider them nonlinguistic for reasons detailed above (the absence of token-type-meaning processing).

Unlike molehills, however, comics images do involve human intentions to communicate. The communication also typically involves culturally defined conventions, but the images lack the arbitrariness of a linguistic sign. Instead of grammar, Cohn’s narrative panel types categorize segments of a structure applicable to any actual event, not just an event’s graphic representation. Unlike actual events, the panels are two-dimensional marks and to that degree resemble written language, but instead of recognizing an image as a linguistic token, viewers respond to each image’s represented content which determines the image’s categorization. Just as actual spatiotemporal events are unrelated to grammar, two-dimensional renderings of spatiotemporal events are also unrelated to grammar.

The process for understanding spatiotemporal images is event structure. Jeffrey M. Zacks and Barbara Tversky define an event as »a segment of time at a given location that is conceived by an observer to have a beginning and an end,« and they define »event structure perception« as »the process by which observers identify these beginnings and endings, and their relations« (3). Ray Jackendoff accordingly proposes an event structure model divided into three parts around a central goal: »a Head (the main action), with an optional Preparation (things that have to be done before the Head can be begun) and an optional Coda (things that are done to restore the status quo ante)« (201). Cohn bases his narrative panel types on Jackendoff’s model, acknowledging that »each narrative category maps to prototypical event structures« (8). For example, a »Peak,« Cohn’s variation on Jackendoff’s »Head,« »marks the height of narrative tension and point of maximal event structure« (70). Where event structure is a mental process, Cohn’s panel types are categories of two-dimensional marks that represent event segments. What Cohn calls grammar is based on the spatiotemporal event of an image’s represented content. The processes for understanding an actual event and a represented event are the same, and that shared process is not linguistic but spatiotemporal. Jackendoff’s model is spatiotemporal too. If a Prep-Head-Coda event were rendered into images, the images would still convey the meaning of the event they represent because the images are processed as if they were actually spatiotemporal, that is, directly observed in the actual world. Grammar is not involved.

Cohn argues against the objection that panels are not linguistic signs and therefore are not part of a visual language by noting that some languages, including American Sign Language, use »both arbitrary and non-arbitrary features« (18). While true, the non-arbitrary iconic features are still processed as tokens of types. As Lyons says of hieroglyphs: »it is a relatively weak kind of iconicity« (103). A pictogram, for example, has non-arbitrary features, but those features are not involved in its token-type-meaning processing. If the non-arbitrary features were substituted with arbitrary ones, the linguistic sign would function the same. Cohn also notes that, while Saussure emphasized the arbitrariness of linguistic signs, Peirce recognized other kinds of »signs,« including resemblance-based »icons« (18). While true, Peirce did not claim that icons were therefore a kind of linguistic sign, since his »signs« include many things (such as donkeys and drunken men) that are not parts of any language.

Cohn also likens comics art to synthetic languages in which units smaller than words must be combined to form meanings, equating that linguistic process to artists following norms for representing such things as hands and faces. He concludes that rather than »the articulation of perception,« drawing »uses schemas that are stored in memory and then combined using rule systems,« and therefore »drawers must use graphic schemas to represent their intended meanings« (33). Yet many drawers in the comics medium, most overtly photorealistic ones such as Jon J Muth, Bill Sienkiewicz, and Dave McKean, have repeatedly articulated their visual perceptions without the use of graphic schemas.

Cohn uses the linguistic term »dialect« to label shared sets of comics drawing norms within what he calls »American Visual Language,« such as »Kirbyan,« »Barksian,« and »Independent,« equating drawing norms with graphic schemas (139–143). He provides an illustration of how artist Erik Larsen draws various body parts in a repeated style and then combines those units to create an image of a specific character in »Kirbyan« dialect (29). Since the rules for combining body-part units are those of human anatomy, the combining is not a kind of grammar. When linguistic signs are involved in a representational image, they do not combine in a way that constitutes a language. Consider a face composed by placing words in areas that correspond to their linguistic meanings (Fig. 2). The individual words are processed linguistically, but their arrangement is processed spatiotemporally. If the words are in a language unfamiliar to the viewer (the second example includes seven Slovak words), their linguistic meanings are likely inferable.

Fig. 2: Verbal linguistic signs in a spatiotemporal arrangement.

If the placement of the words is misaligned with their meanings, the wrong meanings will be implied (the Slovak word for mouth is »usta,« not »krku,« which means neck). There is no grammar-analogous structure for organizing the diegetic elements of a representational image. Cohn’s »dialects« may seem linguistic because they potentially involve tokens recognized as types, but different drawing norms may reference the same spatiotemporal content non-linguistically. An artistic style is not linguistic unless it reproduces a set of marks that viewers recognize as repeatable tokens. Even then, the combination of those tokens does not constitute grammar.

Andrei Molotiu similarly identifies a »basic vocabulary« of »concept-images,« such as »a dot for an eye,« but though each »sign« is »closer to a conventionalized symbol than a mimetic copy of the feature it stands for,« they »do not function exactly like words« because »they are not arranged linearly« but by »the gestalt of their configuration« (164). In his 2002 poster comic »Whitney Prevaricator,« Chris Ware calls cartooning »a complicated pictographic language intended to be READ, not really SEEN!« (2002). The claim is partly true. Cartooning includes a range of conventions familiar to artist and viewers, and when any of those conventions are used and recognized as tokens of types, they function as »pictographic« linguistic signs. They do not, however, combine into a »pictographic language,« because their combinations are spatiotemporal rather than grammatical. Applying Ware’s playful claims to Molotiu, a vocabulary of concept-images is both READ as conventionalized symbols and SEEN in their gestalt configuration.

Marks and Viewers

Works in the comics medium include both linguistic signs (two-dimensional marks recognized as two-dimensional tokens of types) and spatiotemporal images (two-dimensional marks recognized as spatiotemporal objects). Though the two do not combine in a language, they do combine since an image may contain both linguistic marks and spatiotemporal marks.

The notion of a combined category is not specific to the comics medium. In »Optical Laws or Symbolic Rules? The Dual Nature of Pictorial Systems,« John Willats explores »whether representational systems are natural or conventional« in the »sense that the relations between the spatial systems in pictures and the scenes they represent are arbitrary and can therefore be determined by cultural conventions rather than natural laws« (125). The arbitrariness suggests that such conventions might also be linguistic, which Willats acknowledges with a reference to Goodman: »pictures such as Byzantine mosaics and cubist paintings ... are often said to be ›languagelike‹ because they symbolize features of the scene rather than represent them optically« (126). Willats, however, instead asserts »that these two modes of description – in terms of the laws of optics or symbolic rules – are not mutually exclusive but complementary and that in many cases descriptions given in terms of symbolic rules can also be related back to optical laws« (126). Since some of Willats’ examples of images that combine optical and symbolic laws are still nonlinguistic (because their conventions are merely »languagelike«), I explore a smaller subcategory.

Combined sets of linguistic marks and spatiotemporal marks pose a challenge since they involve two distinct kinds of processing, which viewers might experience simultaneously, serially, or vacillatingly. If, for example, a viewer ultimately understands combined marks as linguistic, then the marks are linguistic marks accessed through an initially spatiotemporal process. If a viewer instead ultimately understands combined marks as spatiotemporal, then the marks are spatiotemporal marks accessed through an initially linguistic process. Some marks may instead remain mixed, and some may not be understood through either process.

This produces a four-part spectrum:

  • 1) Linguistic marks
      • 1a) initially processed linguistically
      • 1b) initially processed spatiotemporally
  • 2) Spatiotemporal marks
      • 2a) initially processed spatiotemporally
      • 2b) initially processed linguistically
  • 3) Mixed spatiotemporal-linguistic marks
      • 3a) serially mixed (including 1b and 2b)
      • 3b) ultimately mixed
  • 4) Non-representational marks

Subsections 1a and 2a are default assumptions: for 1a, a set of marks is recognized as a word token and no further recognition follows; for 2a, the set of marks is recognized as a representation of a spatiotemporal object and no further recognition follows. 1b, 2b, and 3b instead involve some additional recognition.

Consider Paul Agule’s optical illusion »Liar« in Fig. 3. For 1b, what is at first registered as a set of letters forming the word »Liar« is then recognized as also comprising a drawing of a face; for 2b, what at first is registered as a drawing of a face is then recognized as also comprising the word »Liar.« 3a includes 1b and 2b, categorizing the marks according to how they ultimately resolve, either as a word or as a representational image. 3b indicates marks that maintain both processes, never resolving one way or the other but vacillating between two perceptions. Agule’s »Liar« vacillates largely according to its orientation on a page, but Will Eisner renders the letters of his title character as spatiotemporal objects (buildings) on his The Spirit splash page, prompting both perceptions though a viewer/reader likely attends to only one at a time.

Fig. 3: Agule and Eisner.

Lukas Wilde discusses an example of a serially mixed image that is initially processed spatiotemporally before resolving linguistically. Using »kie’yu,« Natsume Fusanosuke and Takekuma Kentarô’s term for pictogram, Wilde writes: »A picture of a light bulb, for instance, can be used as a kie’yu, if placed over a character’s head. It then becomes a conventionalized way to represent that he or she experiences a sudden realization« (73). Evoking the linguistic term »prefix,« Cohn calls such an image an »upfix,« defining it as a »class of bound morphemes [the smallest language units] ... that appears above the head of characters ... most often to depict emotional or cognitive states« (42). Cohn acknowledges that none »of these objects literally float above people’s head,« yet »up« refers to an effect of gravity in a diegetic setting, so his term is itself spatiotemporal and conflates discourse and diegesis (43). His example illustrations include a simplified lightbulb with emanata rays over an emoji-like head and the label »Inspiration« (42).

»Kie’yu« is a linguistic term too, but Wilde acknowledges the complexity of placing the realization-lightbulb in an otherwise spatiotemporal context: »this difference – representing a physical light bulb versus using the light bulb to represent that a character is experiencing a revelation – is tremendous« (73). When describing the kie’yu, Wilde identifies the essential quality of a word token: »A pictogram is a graphic configuration that may be understood as a picture, but this picture is itself a (more or less) conventionalized symbol. Its pictoriality is relevant only insofar as it lets recipients infer the appropriate assertion (that something is the case within the diegesis)« (73). Recall again Lyons’ description of hieroglyphs: »it is a relatively weak kind of iconicity« (103). The rendering of the lightbulb might include any of a range of physical details, but none are relevant because none represent anything spatiotemporally. As with letterform words, differences in pictogram rendering are no more significant than differences in fonts or handwriting. Though Wilde notes the »picture theoretical consequence« that the pictogram’s »›closeness to perception‹ is drastically reduced« (74), it may instead be eliminated entirely since no spatiotemporal object is perceived. Linguistic marks make no spatiotemporal claims.

Wilde also identifies the mixed nature of the processing: »In the case of the light bulb, it usually must be decided on which side of the ›threshold‹ we are: either there is a physical thing within the storyworld or there is not« (74). Since a viewer must first recognize the lightbulb as a lightbulb, the initial processing is spatiotemporal: the lightbulb is a lightbulb because it resembles a lightbulb. However, upon next recognizing the lightbulb as a token of the realization-lightbulb type, the processing shifts to linguistic, with its meaning derived from a viewer’s knowledge of the type. While some mixed images remain mixed (such as Eisner’s and Agule’s above), the realization-lightbulb resolves as exclusively linguistic, possessing no spatiotemporal qualities. If an artist instead rendered the letters »IDEA« within a circle of emanata above a character’s head, the overall image would change discursively but not diegetically. Since the realization-lightbulb has a primarily arbitrary relationship to its meaning, it meets the necessary requirement for being a linguistic token-type.

The presence of a pictogram in an otherwise spatiotemporal image also reveals the necessity of clarifying not just the nature of marks, but the knowledge of viewers. The meaning of linguistic signs, whether letterform words or pictograms, is available to viewers familiar with the cultural conventions of the linguistic sign. An unfamiliar viewer would only understand the marks spatiotemporally. A viewer unfamiliar with the realization-lightbulb pictogram, for example, would understand it to represent a spatiotemporal object floating above the spatiotemporal object of the character’s head. A viewer unfamiliar with the word »IDEA« might understand it as non-representational marks drawn above a character’s head, even if they suspect the marks are intended to communicate some meaning unknown to them. For that viewer, the linguistic marks would be non-representational, what is often termed »abstract.« Since convention-familiar viewers may also understand certain marks to be neither spatiotemporal nor linguistic, the non-representational category is necessary generally. While convention-familiar viewers access the four-part structure outlined above, unfamiliar viewers understand two-dimensional marks within only a two-part structure: representational or non-representational.

A lightbulb drawn above a character’s head suggests nothing about the race of the character. Other drawings include linguistic and spatiotemporal marks that viewers may read or observe as indicators of race, which the next section explores.

PART 2: Race

Part 1 established a reading/observing distinction and a four-part structure for the range of ways an image may be processed. Though works in the comics medium do contain linguistic images, those images are not part of a larger system with grammar or syntax and so not a language. Part 2 applies these concepts to depictions of race in the comics medium, demonstrating how the race of a character is read, observed, or both read and observed, depending partly on qualities of specific images and partly on the knowledge and preferences of specific viewers.

To clarify the difference between linguistic signs and spatiotemporal images as applied to race, consider a criminology study correlating sentencing with racial appearance. In »A Punishing Look: Skin Tone and Afrocentric Features in the Halls of Justice,« Ryan D. King and Brian D. Johnson analyzed »850 booking photos of black and white male offenders in two Minnesota counties« and »coded and then matched to detailed sentencing records,« concluding that »darker skin tone and Afrocentric facial features are associated with harsher sanctions« (90).

Facial features and skin tones are signs in Peirce’s very broad sense but not in Saussure’s linguistic sense. Unlike Berger’s list of »[h]airstyles, eyeglasses, clothes, facial expressions, posture, gestures,« any of which might be selected and performed in an attempt to »speak« something (15), an individual’s physiognomy is independent of the individual’s intentions. The offenders and their photographs are not communicating anything except in the loose sense that their appearances can be interpreted. »Afrocentric facial features« and »darker skin tone« apparently communicated to White judges that the individuals deserved harsher treatment.

King and Johnson’s study focused on photographs, which rarely appear in the comics medium. Images of Black and White characters, however, appear regularly, and viewers may understand characters to have »Afrocentric facial features« and »darker skin tone,« which they use to determine race. Artist Mike Grell, for example, created the Black superhero Tyroc (the first image in Fig. 1) for Superboy Starring the Legion of Super-Heroes #216 (April 1976). He told an interviewer: »I modeled him somewhat after Fred ›The Hammer‹ Williamson, who was a movie star at the time« (89-90). Williamson starred in the 1972 Hammer, and Grell’s drawings of Tyroc’s facial features resemble Williamson’s facial features in the movie poster. Whether perceived as »Afrocentric facial features« or not, they are processed spatiotemporally. The resemblance makes it difficult to separate the representational qualities of the image from the qualities of the subject they represent. This suggests why linguistic signs are functionally arbitrary. If a sign resembles its meaning, then its meaning is that resemblance. The spatiotemporal representation is non-arbitrary because it evokes the relevant qualities of the represented subject. If a semiotician wishes to analyze a spatiotemporal image’s race-signifying signs, she must do so spatiotemporally, as if analyzing the subject directly.

Grell’s Tyroc presumably appears Black because Fred Williamson appears Black, and so the image is nonlinguistic. Can other representational images’ race-suggesting details, such as »Afrocentric facial features« and »darker skin tone,« be linguistic? That is, can race be communicated through pictogramic conventions meaningful only to viewers familiar with them? Answering requires first identifying the drawing conventions of racial depiction.

Spatiotemporal Racial Marks and Linguistic Racial Marks

To further distinguish kinds of racial perception in the comics medium, consider two projects by Jack Kirby. The first involves spatiotemporal racial marks, the second linguistic racial marks.

Mark Evanier recalls the history of a never-published and so unnamed romance comic intended for a Black audience in the early 1970s. Kirby acquired »copies of Ebony magazine to use as reference for faces and dress,« but when »a magazine distributor who was said to have expertise on the kind of mostly-black neighborhoods where DC hoped to sell most of the press run« looked at Kirby’s art, he »felt that the faces were ›too realistic,‹« and so DC instructed inker Vince Colletta to redraw them so that »all the women look like Diahann Carroll and all the men look like Sidney Poitier,« »two popular black stars of the day who were considered very attractive and perhaps more acceptable in some circles« (7). Based on the white-outed interior facial areas on the artboards (as seen in Fig. 1), Colletta did not alter Kirby’s eyes, only his cheeks, mouths, and noses. Jerry Boyd argues that it

wasn’t really necessary for them to be altered (they’re fine). Since black faces cover a spectrum (Harry Belafonte doesn’t look like Sidney Poitier who doesn’t look like Robert Hooks, and Diahann Carroll doesn’t look like Aretha Franklin who doesn’t look like Eartha Kitt, and I could go on and on), the »faces« issue should’ve been a minor concern. ... If it had reached the newsstands, believe me, black readers would’ve given their input and the matter would’ve been resolved. (63-64)

Based on Evanier’s and Boyd’s assessments, Kirby drew spatiotemporal images. He imitated real-world source materials of actual Black people in a way that created impressions that certain viewers experienced as realistic. Even Colletta’s alterations are spatiotemporal since they attempt to resemble the faces of two specific real-world actors. Though adopting Carroll and Poitier as models for Black characters generally is a stereotype and so a kind of convention, it is not conventional in the sense that linguistic signs are conventional. Viewers are meant to recognize each image’s definingly non-arbitrary resemblances to an actor. As discussed above, the spatiotemporal processing focuses on the content of the representational images as though that content were actual and not rendered two-dimensionally. It occurs at the diegetic level triggered by the images, not at the discursive level of the physical marks. In short, nothing pictogramic is involved.

Kirby’s approach for a later humor comic about four teenagers differs. Originally intended as an on-going title, »Dingbats of Danger Street« appeared as a one-off in 1st Issue Special #6 (September 1975). When two additional unpublished issues were collected and completed for Dingbat Love, John Morrow explained: »Tom Ziuko ... colored the whole batch in a 1970s-appropriate style. DC had oddly chosen to color Non-Fat with a Caucasian skin tone in 1st Issue Special #6, but we’re staying true to Kirby’s vision for this book’s presentation« (111). Rob Steibel notes that the differences are more than coloring: »you can see the character Non-Fat appears to have African American features in the original pencils; in the published version the character has been changed to look more like a Caucasian« (Steibel). It is unclear why the character was redrawn, but Steibel suggests it was a general policy to avoid »accusations of stereotyping and racism,« even though DC had mandated stereotyping in Kirby’s earlier Black romance art. Regardless of the editorial intent, Kirby redrew three areas of Non-Fat’s face: the cheeks appear flatter, the lips thinner, and the nose longer. The tip of the nose, originally a circle, became a half-oval extending closer to the top lip and so reducing the undrawn area of the philtrum by roughly half. None of the other three non-Black characters have a circle for a nose tip, and so it seems to be intended as a specifically Black trait.

Did Kirby draw a Black character with a circular nose tip because Black people commonly have round nose tips or instead because a circle for a nose tip was a norm for drawing Black faces? Quantifying actual nose shapes is complex. For their study »Investigating the case of human nose shape and climate adaptation,« Arslan A. Zaidi and his co-authors »captured high resolution 3D images of participants’ faces using the 3dMD Face system,« placed five »positioning landmarks (two on the inner corner of the eyes, two on the outer corners of the mouth, and one on the tip of the nose) ... to establish facial orientation,« mapped a »spatially dense mesh of 7,150 quasi-landmarks ... onto each image,« removed »differences in position and orientation« using a »Generalized Procrustes Superimposition« program, calculated »linear distances ... using seven standard anthropometric landmarks,« and averaged the »Procrustes coordinates ... to remove effects of bilateral asymmetry« (Zaidi). Their results reveal nothing about the roundness of nose tips.

Regardless of whether a rounded nose tip is a common Black feature, Kirby did not derive his character design for Non-Fat from a human model. Unlike his photo-referenced images for the unpublished romance comic, Non-Fat instead reiterates an established drawing convention. The small circular nose has been common in racist Black caricatures since at least the early twentieth century, appearing in the 1945 Little Black Sambo board game, the 1941 Walter Lantz Studio cartoon »Scrub Me Mama with a Boogie Beat,« and Will Eisner’s character Ebony White beginning in the first 1940 The Spirit newspaper installment. The caricatures are better known for other features, most especially their grotesquely exaggerated lips, none of which Kirby reproduces.

Fig. 4: 20th century Black characters with circle noses in Strömberg 2003.

That Non-Fat and Ebony White share no features other than the circles of their nose tips suggests that Kirby, while intending to avoid racist caricature, was still using a different drawing approach than when he was using Ebony photo references. Fredrik Strömberg’s 2003 Black Images in the Comics: A Visual History includes over ninety depictions of Black characters, nearly all from the 20th century and most drawn in a cartoon style. Roughly one third of the images include a nose represented by a full or partial circle. Tallied by decades, the images map the rise, fall, and contextual transformation of the drawing norm:

Blackface conventions, including circle noses, peaked from 1925 to 1945. While all of the pre-1970 images include circular noses in the context of other minstrel norms, three-quarters of the post-1970 images isolate and extract the circular nose from that tradition and recontextualize it within racial but non-blackface depictions. Kirby’s Non-Fat, which is too obscure to merit inclusion in Strömberg’s visual history, belongs to this later tradition.

Non-Fat’s nose then is distinct from spatiotemporal images that are entirely resemblance-based. The circle is as simple as a letterform, and though it represents the idea of a nose more than it resembles any actual nose, it still bears the rudimentary iconic quality of a pictogram. However, unlike a pictogram, the circle is recognizable as a nose only in the context of the larger spatiotemporal image of Non-Fat’s face. That spatiotemporal location is its most determining quality, since a wide range of dissimilar marks are understood as representing a nose if drawn in the same location (as demonstrated in Fig. 2), and the identical circle drawn in a different location would not produce the impression of a nose. Where a realization-lightbulb pictogram requires a viewer to recognize it first as a representation of a lightbulb, the circle-nose drawing convention requires a viewer to recognize the context of a face first. Reading in this case involves simultaneously observing.

Mixed Marks and Mixed Processing

Kirby’s circle nose tip, like many other drawing conventions, does not function simply as a pictogram, but it still may possess linguistic qualities. As detailed in Part 1, arbitrariness distinguishes linguistic signs and spatiotemporal images, but some sets of marks are neither clearly arbitrary nor clearly non-arbitrary because the image’s resemblance to its subject matter is ambiguous. The less resemblance-based the image is, the more arbitrary it is and so the more potentially linguistic. Racist caricatures provide a further example, because, as Daniel Stein concludes about Edward Windsor Kemble’s 1898 Comical Coons, the racist »images seem less rooted in an intimate knowledge of African American life than in a lineage of visually coded racial fantasies« (213).

Like Eisner’s Ebony White, Charles Nicholas Wojtkoski’s Whitewash Jones in Marvel’s Young Allies #1 (June 1941) reiterates blackface minstrel norms that defy human anatomy and so resemblance-based representation. When Wojtkoski draws Whitewash Jones in profile, the character’s lips and jaw appear mule-like. Human lips, regardless of an individual’s race, cannot extend to such proportions. Instead of any specific spatiotemporal reference, the image more closely resembles other instances of the blackface minstrel tradition. The original blackface images were drawn not on paper but on White skin, creating the illusion of lips extending roughly an inch beyond the edges of the performer’s actual lips. If a later iteration on paper is recognized and processed as a drawing-convention type understood to represent its subject non-literally and so semi-arbitrarily, it is a token and so a kind of linguistic sign. If its meaning is the racist denotation »Black lips,« that meaning is accessed through a viewer recognizing the linguistic type in Wojtkoski’s rendered token, not through spatiotemporal resemblance to any actual Black lips.

Nicholas Sammond traces the evolution of nineteenth-century blackface minstrels into early twentieth-century animation as embodied in Mickey Mouse, Bugs Bunny, and other cartoon characters that »exhibited a number of physical features that marked them as minstrels,« including »the wide, expressive mouths and eyes of the minstrel painted onto black bodies,« »accentuating the eyes and mouth to make them seem larger and wider« (28, 26, 19). These became the norms of static cartoons too. E. C. Matthews instructed aspiring cartoonists in his 1928 How to Draw Funny Pictures: »The wide nose, heavy lips and fuzzy hair are all as important for a colored cartoon character as the dark complexion« (24, 64). Rebecca Wanzo argues that »the use of caricature and other aesthetic techniques to stereotype bodies is essential to constructing meaning« in the comics medium, and that »stereotypes—repetition of generalized typology—[are] embedded in [its] foundations« (23, 2). Wanzo explores the tradition of racist Black caricature that Eisner and Wojtkoski reproduce, noting how a 1902 example by George Herriman »clearly evokes blackface,« and how in a 2007 example by Kyle Baker the »grotesque representation has no relationship to real phenotype,« but »readers’ knowledge of whom the [figure] references doubly emphasizes the phantasmagoric nature of stereotype« (1). Though Wanzo calls caricature a »language« and stereotypes »visual grammar« (2, 4, 5, 6, 24), she also refers to »visual vocabulary« (24), which may better reflect her general meaning. The repeated units of blackface caricature are linguistic signs because viewers perceive them as tokens of types rather than as resemblance-based spatiotemporal images.

Wojtkoski’s Whitewash Jones, however, is not exclusively linguistic. The face, though grotesquely disproportionate and reproduced from a tradition of similarly racist images, maps onto a diegetic reality—that is, viewers understand the character’s body to function in a spatiotemporal environment. Unlike a realization-lightbulb pictogram which is initially perceived spatiotemporally but resolves into a linguistic sign only, Whitewash Jones also remains a spatiotemporal image. Described as 3b above, an image may be perceived as ultimately mixed, because viewers process the same marks as both spatiotemporal and linguistic, with neither quality fully dominating.

Again, viewers play a key role. If a viewer is familiar with blackface minstrel conventions, the marks can function linguistically as tokens of blackface types. A viewer not familiar with the blackface drawing conventions that Wojtkoski repeats could only understand Whitewash Jones spatiotemporally. Since the grotesque features lack spatiotemporal resemblance to the features of any actual person, Black or otherwise, they may not be interpreted as racial markers at all. If so, the nonlinguistically perceived image would not »read« as Black but instead be observed as representing some apparently fantastical creature.

Manga Faces and Japanese Faces

The same double processing applies to other ultimately mixed images, including non-racist representations. While blackface drawing conventions for representing Black people were developed by White artists for White audiences, manga drawing conventions for representing Japanese people were developed by Japanese artists initially for Japanese audiences.

Conventional manga faces often include pointed chins rendered with a wide »v«-like line and a minimal nose rendered as a sideways »v« as if the nose were viewed in profile even when the face is forward-looking. Viewers unfamiliar with these manga drawing norms would not recognize the wide v-mark as a token of a chin type or the sideways v-mark as a token of a nose type, but, because the marks still have spatiotemporal properties, those viewers would still likely understand the marks to represent a chin and a nose through their minimal but sufficient resemblances and anatomical positions. Manga-familiar viewers would recognize the marks as linguistic signs but also process them spatiotemporally, understanding the facial features and the rest of the character to exist in a specific location and moment in a diegetic world viewed from an implied angle and proximity. Such drawing norms are not exclusive to manga. Fredrik Strömberg observes a similar tendency in Marjane Satrapi’s Persepolis, where »even though a face might be shown directly from the front, the nose is still depicted as if seen from the side, with variations of an L-shaped line,« suggesting that »over-dimensional, non-anatomical noses« are an international norm (220). Aaron McGruder also uses a small sideways »v« nose for Black characters in his 1999 comic strip The Boondocks, the final image in Strömberg’s visual history. Satrapi’s and McGruder’s artworks, however, do not reproduce other manga drawing norms and so their discursive noses likely do not trigger manga-based reading.

If manga-familiar viewers associate manga-style faces with Japan, those viewers may also perceive manga characters as Japanese. This would not be due to any spatiotemporal resemblance of manga facial features to any actual Japanese facial features. Unlike racist blackface caricatures which grotesquely exaggerate a White social perception of Black lips being definingly larger than White lips, the exaggerations of manga features do not overlay racial stereotypes. The chins of Japanese or, more generally, Asian people are not imagined to be more pointed than the chins of other racial groups. Though White artists and viewers might participate in a racist drawing norm of straight horizontal lines representing Asian eyes, manga eyes are typically non-naturalistically large and round. A viewer unfamiliar with manga would have limited basis to associate manga facial features with any specific racial group and might perceive a manga figure as humanlike but not necessarily human (because manga drawing norms are more naturalistic for bodies, the contradictory combination of semi-arbitrary facial features with resemblance-based bodies may produce effects especially surreal for unfamiliar viewers).

Manga drawing conventions also further demonstrate how cultural conditioning shapes individual viewer perception. Race in manga, Terry Kawashima argues, »is generated through a visual reading process in which certain features are highlighted and others suppressed or ignored to ensure a coherent result« (164). From the »white-privileging subject position« common in the U.S., »blond hair and blue eyes are almost unquestionably considered ›white‹ characteristics, whereas ›olive‹ (or less flatteringly, ›yellow‹) skin, ›almond‹ eyes and straight black hair are understood as markers of ›Asian-ness‹« (163), but »only because that viewer has been culturally conditioned to read visual images in specific racialized ways that privilege certain cues at the expense of others and lead to an overdetermined conclusion« (161). Unlike many White viewers, Japanese viewers do not »routinely perceive big-eyed, non-black-haired characters as being ›white‹ rather than being ›Japanese‹« because »style is used in representing both ›white‹ and ›Japanese‹ characters« and is part of »a certain kind of aesthetic promoted in contemporary Japan« (169, 173). Kawashima is describing manga facial features as linguistic signs that vacillate according to whether viewers are familiar with their cultural-specific meanings.

Generalizing beyond White and Asian examples found in manga, Kawashima also argues that this »visual reading process operates at a level below everyday awareness and is thus naturalized; it is central to the ways in which ›race‹ itself is conceptualized, perpetuated, and constantly reconfigured« (162). He claims:

every new encounter with a stranger or a visual representation of a human figure ... sets into motion this reading process in which we discern, consciously or subconsciously, the person’s racial category. We »read« and thereby produce »race« on a daily basis, and the charged arena of Japanese shojo manga renders the process most visible. (176)

Unlike his earlier and accurate linguistic claims, here Kawashima blurs the meaning of »read« in the ambiguous semiotic sense, applying it both to interpreting the races of actual people through their actual appearances and to decoding two-dimensional marks produced by artists with the intent of communicating drawn characters’ racial categories. The first is spatiotemporal observing, and the second is linguistic reading.

The reading/observing difference is especially critical for understanding race because spatiotemporal observations of race are inherently inexact. Observers of actual individuals may misattribute race due to the necessary absence of linguistic racial marks. »The differences in visual appearance, or phenotype, between different racial or ethnic groups are real,« explain Benson and Singsen in their study of Whiteness in U.S. comics, »but no scientific basis exists for separating people by phenotype into racial or ethnic categories« because »phenotypical attributes such as skin, hair, and eye color or the shape of various facial features vary heavily within racial and ethnic groups as well as between them« (2022: 8). Though real, phenotypes may or may not reflect racial drawing norms.

Jeeshan Gazi understands manga-style characters not as racially ambiguous, as Kawashima claims, but as specifically Japanese »to the exclusion of all other peoples or races« (127). Gazi argues:

The apparent ambiguity in the ethnicity of manga characters, then, is specifically related to the Japanese self-image. This self-image transcends real-world racial signifiers and instead trades in what, in real terms, would be considered a kind of hybridity. Yet, as an accepted self-image, it is in real terms that this default visage of ambiguity is accepted, by some Japanese, as constituting the concrete visual cues for their own specific ethnicity. (124)

Japanese phenotypes and manga drawing norms for depicting Japanese characters diverge, producing two visual processes with minimal overlap, one based on the nonlinguistic phenotypes observed in actual Japanese people and rendered as spatiotemporal images, and one based on drawing norms that represent Japanese people without necessarily resembling them.

Others and Tyranny

Visual representations that supersede »real-world racial signifiers« are more common and significantly more harmful when applied between racial groups. Gerald Vizenor refers to the misrepresentations of native people as »simulations of the other,« meaning »the image is an invention with no connection or referent to the real. Simulations deliver popular stereotypes, and they are very familiar messages, very powerful« (1999: 158). Mary Gregg observes similarly that:

WW2 propaganda against the Jews and Japanese forces was pervasive because of the fabricated traits its cartoons so successfully attributed to the real human subjects it was clearly intended to refer to out in the world, although the depiction itself bore no physical resemblance to its referent (1325)

Writing four decades after creating Ebony White, Eisner calls stereotypes an »accursed necessity« of the comics medium, specifically »the simplification of images into repeatable symbols,« including ones that can be used »as a weapon of propaganda or racism. Where it simplifies and categorizes an inaccurate generalization, it can be harmful, or at the least offensive« (11). Though Art Spiegelman repeatedly refers to cartooning as a »language« (47), one with an »impoverished vocabulary« (45), the use of linguistic terms does not mean that cartoons are a language. However, that cartooning »is mostly limited to deploying a handful of recognizable visual symbols and cliches« defined by a »compression of ideas into memorable icons« does further suggest that the drawing approach includes learned cultural conventions reproduced by artists with the intention of being recognized by viewers as linguistic tokens/types—including Eisner’s harmfully »inaccurate generalizations« and Vizenor’s false »simulations.«

According to Spiegelman, cartooning also has a »mocking tone built into« itself due in part to its »use of the discredited pseudoscientific principles of physiognomy to portray character through a few physical attributes and facial expressions« (47, 45). Ernst Kris and Ernst Gombrich credit the Carracci brothers for developing cartooning’s forebear, caricature, near the end of the sixteenth century through »conscious distortion ... with the aim of ridicule« in which »a person is represented by one salient characteristic only« (1938). To create »a likeness more true than mere imitation,« caricaturists seek what Kris and Gombrich call »the perfect deformity, thus penetrating through the mere outward appearance to the inner being in all its littleness or ugliness« (1938). Literally dehumanizing animal motifs were especially prevalent:

A human head, for instance, becomes gradually transformed into the head of an animal without losing the portrait note, the likeness. The Carracci ... drew this idea from the dogma of physiognomy. The pseudo-Aristotelian idea according to which the human character can be determined from the similarity of the human countenance to that of certain animals was widely disseminated through the writings of Giovanni Battista Porta. The illustrations to Porta’s treatises always show the heads of men and animals in convincing similarity.... Caricature exploited for its own purpose what was here the illustration of a 'scientific' doctrine. It turns man into an animal. (1938)

Benson and Singsen similarly observe that Ebony White »was depicted by Eisner and the artists he hired with enormous eyes and round, doughnut-shaped lips that appear more simian than human« (2022: 34). Viewers of Wojtkoski’s Whitewash Jones likely recognized a horse-like resemblance. Both White and Jones conform to what Benson and Singsen identify as »the standard representation of characters in the blackface minstrel tradition« (2022: 34). Andrew Kunka instead describes White’s »big saucer eyes, giant red clown lips, and minstrel speech« in relation to Black stereotypes in Hollywood films, noting Eisner’s paradoxical self-defense of being »powerless against the zeitgeist for the images that he uses« despite his also being »hailed as an innovative genius« (71). Eszter Szép might place such racist images on the »nontransparent« extreme of her line-subjectivity scale, understanding the marks to be »born out of conventions and conventionalized systems, such as culture, education, and expectation« and so »socially conditioned« by »institutions, training, and contexts« (2020: 42).

»Any visual text,« Strömberg similarly writes, »produced individually cannot help but articulate visions of the culture it belongs to« (2022: 205). Strömberg accounts for such cultural transmission through Gombrich, who argues that artists learn to represent subjects not by observing those subjects directly but by studying how other artists have represented them and then reproducing their schemata—what might constitute linguistic signs. Strömberg clarifies Gombrich’s varying uses of »schemata« by proposing three subcategories: 1) »visual ideas,« which are »thoughts on how we perceive patterns in certain phenomena in art and real life ... based on the general characteristics of something« (97), 2) »visual building blocks,« which are »based on the practical methods an artist works with, conventions for representations of different phenomena« (96), and 3) »visual elements,« specific examples of depictions that embody the »mental concepts« of the first two (98, 194). Coining the phrase »tyranny of visual schema,« Strömberg explains:

In a specific comics culture, there seems to be unwritten rules as to which visual schemata, which conventions to use to create visual elements. Breaking with these rules ... [may] run the risk of failing to live up to the expectations of what the art is supposed to look like. [...] Consciously or unconsciously, a comics artist may feel a need to fit his or her art into certain visual parameters, i.e. sets of schemata and engage in schema preserving (205-6).

If an artist copies »other artists’ visual solutions more or less directly,« Strömberg terms the process »assimilation,« and if the artist produces »new, personalized versions« of visual building blocks, the process is »accommodation« (98). Both produce »visual genealogies,« including »sets of visual patterns of a certain style that can be traced between different pieces of art« and »how influences ... have been communicated and developed over time« and integrated »between different cultures« (229).

A Gombrich-Strömberg approach accounts for the development of Eisner’s White, Wojtkoski’s Jones, and Kirby’s Non-Fat. All three have visual elements traceable to the general visual idea of blackface minstrelsy, which produced the visual building blocks of distorted facial features such as lips, which Eisner and Wojtkoski assimilated to produce the specific visual elements in their racist caricatures—presumably because the tyranny of those visual schema dominated the subculture for depicting Black characters in a comics context intended to be humorous to a White U.S. viewership in the early 1940s. Kirby’s use of a round nose is also a visual building block traceable to the visual idea of blackface minstrelsy, but in Non-Fat’s case, Kirby accommodated rather than assimilated, by including the isolated visual element in a face free of other minstrelsy visual building blocks, especially the most distorting and so defining ones.

Gregg, who explores »implications of Gombrich’s schemata that don’t become apparent until they are applied to cartoons,« identifies three kinds of potential harm:

  • 1) Harm to viewer by lying visually, spreading misinformation
  • 2) Harm to subject of depiction by defamation
  • 3) Harm to group affiliated by the schema we use to misinform the public, training them to treat the group as the depiction informs them (1310, 1324)


These harms would most likely have occurred during the period in which the images were first published and viewed by predominately White audiences who read the misinformation as factual. Contemporary viewers of White and Jones may also process their visual elements linguistically, but while also recognizing them as »simulations of the other« that express offensively »inaccurate generalizations« with no relation to »real-world racial signifiers.« Contemporary viewers of Kirby’s Non-Fat, however, likely do not recognize the discursive circle of his nose as a blackface linguistic sign and so instead observe the visual element spatiotemporally, understanding the character to have a diegetically round nose rendered discursively as a circle. If those viewers interpret the round nose as specific to the character and not as a generalized Black trait, then the image spreads no misinformation, defamatory or otherwise. Viewers who do recognize the circle nose as a linguistic sign inherited from blackface conventions—and those viewers include myself and now presumably readers of this essay—will likely both read and observe Non-Fat’s nose.

Conclusion: Reading and Observing

Different drawing conventions tend to trigger different processing. Naturalistic drawing emphasizes techniques that produce spatiotemporal effects without repeating recognizable tokens-types, allowing two images of the same subject to share a close resemblance to their subject without sharing any similar marks. Cartooning, because it both simplifies and exaggerates representational content, is more prone to repeat a set of marks when representing the same subject multiple times. The repeatable marks become not only drawing conventions for an artist but, when recognized as tokens of representational types, they become linguistic signs for viewers. Unlike linguistic signs in other visual contexts though, each set of marks is also spatiotemporal, requiring two kinds of processing.

The comics medium divides accordingly. At one end of the spectrum, a representational work that is drawn naturalistically is processed spatiotemporally only. At the other end, a representational work drawn in a cartoon style that repeats recognizable marks is processed spatiotemporally and linguistically both. In contrast, a line of emojis in a text message is linguistic because each image is a token of a type and their juxtapositions and order are also linguistic. The representational content of each emoji does not exist in a diegetic space and so cannot share a diegetic space with other emoji. Even if a work in the comics medium consisted entirely of linguistic signs, they could still be arranged to produce a shared diegetic space, making them also spatiotemporal images. Comics semiotics does not account for this double quality, and so exclusively »reading« comics misunderstands the medium.

Applying these principles to the initiating challenge of »reading race« reveals two ways to perceive race in a work, the same two ways any sets of marks may be processed. One process is akin to actual reading: resemblance-independent and therefore at least partially arbitrary signs are recognized as tokens of race-denoting types familiar to the viewer. The other is akin to observing actual individuals and recognizing human features that the viewer believes denote race. When viewing comics images, viewers either read and observe simultaneously, or observe only, but never only read because the images include more than linguistic content. When artists render linguistic marks—such as manga or minstrelsy facial norms—only viewers familiar with the drawing conventions may recognize and so read them as tokens. While racist caricatures may be composed of racist linguistic signs, spatiotemporal images that are observed only may still evoke racist stereotypes. Racist imagery is not limited to token-type processing.

While observing generally is unlearned, observing race is learned and inherently ambiguous. Observing the race of an actual individual requires two participants: the viewer and the viewed. Spatiotemporal images recreate that two-person impression, even though no actually viewed individual is present. If an artist intends to communicate a drawn character’s race, the means of communication is still nonlinguistic. The artist does not represent race per se but represents an individual whose race the viewer interprets, and the viewer’s interpretation may be correct, incorrect, or indeterminate, and it may or may not match the artist’s intentions.

Reading generally and reading race are both learned and comparatively precise. Reading the race of a drawn character requires multiple actual participants: the reader, the artist-writer, the members of the (mis)read racial group, and a culture of additional individuals structuring the linguistic tradition. The artist-writer is representing race directly by incorporating a linguistic racial sign into a character’s drawn appearance. The racial sign is generic and so not a representation of the character per se but of a racial group. The sign indicates that the character is a member of that group, and while the artist-writer could be using the sign incorrectly, if the reader knows the sign, the reader’s interpretation of the sign is necessarily correct because it is the sign’s established meaning. If the artist-writer has used the linguistic sign according to its convention, and the reader reads the sign according to its convention, then the reader’s interpretation matches the artist-writer’s intention.

While reading and observing apply to many kinds of images, the difference is especially significant for racial depictions. Race is a non-scientific social construction that cannot be determined by appearance—despite its central conceit and synonym, Color, implying otherwise. Observing race in spatiotemporal images retains race’s inherent indeterminacy. A viewer may misinterpret a drawn character’s race just as a viewer of an actual person may misinterpret that actual person’s race. Reading race in linguistic images instead reinforces race’s claim of definitive and visually determinable difference by applying socially structured drawing conventions to denote socially structured categories, now twice removed from reality. In short, the precision of reading is anathema to the imprecision of race.



  • Agule, Paul: Liar. In: BrainDen. http://brainden.com/word-illusions.htm#prettyPhoto. 2012. Accessed 30 September 2022.
  • Atkin, Albert: Peirce’s Theory of Signs. In: Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/entries/peirce-semiotics/. 15 November 2010. Accessed 30 September 2022.
  • Berger, John: Media Analysis Techniques. Thousand Oaks: Sage, 2005.
  • Boyd, Jerry: Let Your Soul...Love! In: Dingbat Love. Ed. John Morrow. Raleigh: TwoMorrows, 1992.
  • Cadigan, Glen: The Legion of Super-Heroes Companion. Raleigh: TwoMorrows, 2003.
  • Cohn, Neil: The Visual Language of Comics. London: Bloomsbury, 2013.
  • Cohn, Neil: Your Brain on Comics. A Cognitive Model of Visual Narrative Comprehension. In: Topics in Cognitive Science, 12.1 (2019), p. 1–35.
  • Cowling, Sam, and Wesley D. Cray: Philosophy of Comics. London: Bloomsbury, 2022.
  • Eco, Umberto: A Theory of Semiotics. Bloomington: Indiana University Press, 1979.
  • Eisner, Will: The Spirit, June 26, 1943. In: Will Eisner’s The Spirit Archives, 6. New York: DC, 2001.
  • Evanier, Mark: True-Life Divorce. An Introduction. In: Dingbat Love. Ed. John Morrow. Raleigh: TwoMorrows, 1992.
  • Gazi, Jeesham: De/facing Race. Towards a Model for a Universal World Comics. In: Journal of Graphic Novels and Comics, 8.2 (2017), p. 119–138.
  • Grammar. In: Oxford English Dictionary. Oxford: Oxford University Press. 2023. https://www.oed.com/search/dictionary/?scope=Entries&q=grammar. Accessed 6 October 2023.
  • Gregg, Mary: The Unique Depictive Damage of Gombrichian Schemata in Cartoons. In: Philosophia, 51.3 (2023), p. 1309–1331.
  • Groensteen, Thierry: The System of Comics. Trans. Bart Beaty and Nick Nguyen. Jackson: University Press of Mississippi, 2007.
  • Kawashima, Terry: Seeing Faces, Making Races. Challenging Visual Tropes of Racial Difference. In: Meridians, 3.1 (2002), p. 161–190.
  • Kirby, Jack: Dingbat Love. Ed. John Morrow. Raleigh: TwoMorrows, 1992.
  • Kunka, Andrew J.: How Else Could I Have Created a Black Boy in That Era? Racial Caricature and Will Eisner’s Legacy. In: Desegregating Comics. Debating Blackness in the Golden Age of American Comics. Ed. Qiana Whitted. New Brunswick: Rutgers, 2023.
  • Jackendoff, Ray: Parallels and Nonparallels Between Language and Music. In: Music Perception, 26.3 (2009), p. 195–204.
  • King, Ryan D., and Brian D. Johnson: A Punishing Look. Skin Tone and Afrocentric Features in the Halls of Justice. In: American Journal of Sociology, 122.1 (2016), p. 90–124.
  • Kris, Ernst, and Ernst Gombrich: The Principles of Caricature. In: British Journal of Medical Psychology, 17 (1938), p. 319-42. https://gombricharchive.files.wordpress.com/2011/05/showdoc85.pdf. Accessed 24 August 2023.
  • Lyons, John: Semantics. Cambridge: Cambridge University Press, 1977.
  • Matthews, E. C.: How to Draw Funny Pictures. Chicago: Frederick J. Drake & Co., 1928.
  • Molotiu, Andrei: Cartooning. In: Comics Studies. A Guidebook. Eds. Charles Hatfield and Bart Beaty. New Brunswick: Rutgers University Press, 2020.
  • Peirce, Charles S.: What is a Sign? (1894). In: Marxist Internet Archive. https://www.marxists.org/reference/subject/philosophy/works/us/peirce1.htm. Accessed 24 August 2023.
  • Read. In: Oxford English Dictionary. Oxford: Oxford University Press. 2023. https://www.oed.com/search/dictionary/?scope=Entries&q=read. Accessed 6 October 2023.
  • Sammond, Nicholas: Birth of an Industry. Blackface Minstrelsy and the Rise of American Animation. Durham: Duke University Press, 2015.
  • Saussure, Ferdinand de: A Course in General Linguistics. Trans. Wade Baskin. New York: Columbia University Press, 1959.
  • Showcase Presents Legion of Super-Heroes 5. New York: DC, 2014.
  • Smolderen, Thierry: The Origins of Comics. From William Hogarth to Winsor McCay. Trans. Bart Beaty and Nick Nguyen. Jackson: University Press of Mississippi, 2009.
  • Spiegelman, Art: Drawing Blood: Outrageous Cartoons and the Art of Outrage. In: Harper’s Magazine (June 2006).
  • Steibel, Rob: Dingbats Part 2: White-out? In: KirbyMuseum.org. https://kirbymuseum.org/blogs/dynamics/category/uncategorized/page/104/. 30 March 2011. Accessed 30 September 2022.
  • Stein, Daniel: Racialines. Interrogating Stereotypes in Comics. In: The Cambridge Companion to Comics. Ed. Maaheen Ahmed. Cambridge: Cambridge University Press, 2023.
  • Strömberg, Fredrick: Black Images in the Comics. A Visual History. Seattle: Fantagraphics, 2003.
  • Strömberg, Fredrick: Comics and the Middle East. Representation, Accommodation, Integration. Malmö: Malmö University, 2022.
  • Syntax. In: Oxford English Dictionary. Oxford: Oxford University Press. 2023. https://www.oed.com/search/dictionary/?scope=Entries&q=syntax. Accessed 6 October 2023.
  • Wanzo, Rebecca: The Content of our Caricature. African American Comic Art and Political Belonging. New York: New York University Press, 2020.
  • Wertham, Fredric: Seduction of the Innocent. New York: Rinehart, 1954.
  • Wilde, Lukas R. A.: Material Conditions and Semiotic Affordances: Natsume Fusanosuke’s Many Fascinations with the Lines of Manga. In: Mechademia 12.2 (2020), p. 62-82.
  • Willats, John: Optical Laws or Symbolic Rules? The Dual Nature of Pictorial Systems. In: Looking Into Pictures. An Interdisciplinary Approach to Pictorial Space. Cambridge: MIT Press, 2003.
  • Witek, Joseph: The Arrow and the Grid. In: A Comics Studies Reader. Eds. Jeet Heer and Kent Worcester. Jackson: University Press of Mississippi, 2008.
  • Wojtkoski, Charles Nicholas (P): Young Allies #1 (June 1941). In: Marvel Firsts: WWII Superheroes. New York: Marvel, 2013.
  • Yakin, Halina Sendera Mohd., and Andreas Totu: The Semiotic Perspectives of Peirce and Saussure: A Brief Comparative Study. In: Procedia. Social and Behavioral Sciences, 155 (2014), p. 4–8.
  • Zacks, Jeffrey M., and Barbara Tversky: Event Structure in Perception and Conception. In: Psychological Bulletin, 127.1 (2001), p. 3–21.
  • Zaidi, Arslan A. et al.: Investigating the Case of Human Nose Shape and Climate Adaptation. In: PLoS Genetics, 14.1 (2017). https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006616. 16 March 2017. Accessed 30 September 2022.

Table of Figures

  • Figure 1: Grell, Mike (P): Showcase Presents Legion of Super-Heroes. Volume 5. New York: DC, 2014, p. 434. Kirby, Jack (P): Dingbat Love. Ed. John Morrow. Raleigh: TwoMorrows, 1992, p. 112. Kirby, Jack (P) and Vince Colletta (I): Dingbat Love. Ed. John Morrow. Raleigh: TwoMorrows, 1992, p. 98.
  • Figure 2: Chris Gavaler.
  • Figure 3: Agule, Paul: Liar. In: BrainDen. http://brainden.com/word-illusions.htm#prettyPhoto. 2012. Accessed 30 September 2022. Eisner, Will: The Spirit, June 26, 1943. In: Will Eisner’s The Spirit Archives 6. New York: DC, 2001. See also: https://www.comics.org/issue/272645/. Accessed 7 October 2023.
  • Figure 4: Chris Gavaler, based on Strömberg, Fredrick: Black Images in the Comics: A Visual History. Seattle: Fantagraphics, 2003.