When claiming that the graphic form (especially in sequence) is structured as a language, it helps to parse out how to make the analogy reasonably. It's not as if no one has ever noted the similarities between the forms; in fact, it's fairly common. In my opinion, however, mistakes are often made.
First, I assume that there is "Equivalence" between the different modalities, which can be summarized as "the expectation that the human mind/brain treats all modalities in an equal way, given modality specific constraints."
On this account, we would expect all modalities to feature the same sort of storage in memory for patterned signs of differing sizes and functions (from sound patterns of phonemes to sentence patterns of idioms), and to feature ways to combine all those elements at different levels of structure (phonology through discourse). We would also expect development to be similar, with a critical learning period and drop-offs after that.
However, this is not the approach that most comparisons of the verbal and visual forms take. Rather, they often try to make direct, superficial analogies between specific types of structures: for example, "such and such" is the equivalent of a "word" or "sentence." This is often why many want to claim that single images have "grammar": a single image has lots of information in it, like a "sentence" and unlike a "word." Yet composition within single images behaves nothing like a grammar. (...nor should we expect it to, given the differences between sound and light!)
A similar endeavor has tried to find "minimal units" of the structure of the forms, following the school of Structuralism (most popular in American linguistics from around 1920 to 1960 or so). However, again, just knowing the minimal units doesn't tell you about the broader structure, and units larger than minimal units might also be useful and insightful. It also offers no useful comparison beyond the observation that "minimal units" exist in both domains.
All of this is an argument for looking beyond superficial understandings of "language" and for seeking comparisons in deeper, more fundamental aspects of structuring.