Cuneiform Signs

Analysis and reports to support an international standard for computer encoding of the Cuneiform writing system

Research on the development of Cuneiform signs

 

Single Signs with Two Components vs. a Sign Sequence?

This page outlines some aspects of the problem of distinguishing between a sequence of two independent signs, SIGN-1 followed by SIGN-2, and a sequence of two components, COMPONENT-1 and COMPONENT-2, within a single sign. Other kinds of evidence also show that some of these same signs need a distinctive encoding.

For Han (Chinese-Japanese-Korean) characters this is easy, since a single sign always occupies a single square block. The sign U+67F4 (Japanese readings SHI, SAI, shiba, 'brushwood, firewood') is composed of the complex character (itself two components) U+6B64 (Japanese readings SHI, ko(no) 'this, current') and the semantic radical U+6728 (Japanese reading ki 'tree'). Unicode nevertheless does not divide this character into its first and second halves; it is encoded as a single character.
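The Han case can be checked directly against the Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the variable names are only illustrative labels for the three code points cited above.

```python
import unicodedata

# The three code points cited in the text.
compound = "\u67f4"     # U+67F4, 'brushwood, firewood'
first_half = "\u6b64"   # U+6B64, 'this, current'
second_half = "\u6728"  # U+6728, 'tree'

for ch in (compound, first_half, second_half):
    # Each one is a single character with its own code point and name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "length:", len(ch))

# The character database records no decomposition of U+67F4 into its
# halves, so normalization leaves it untouched.
print(repr(unicodedata.decomposition(compound)))
print(unicodedata.normalize("NFD", compound) == compound)
```

The compound is one code point, and nothing in the encoding links it to U+6B64 or U+6728. That is the kind of single-sign treatment at issue for multi-component Cuneiform signs.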

In the Cuneiform situation, we do not have the luxury of constant square blocks to help us identify what counts as a character, and characters are of variable widths. There are however other reflections of status as single characters vs. as sequences of characters. We can even take certain aspects of traditional sign lists as a default, since the compilers of those lists made distinctions between what they treated as single characters and what they treated as sequences of characters. We certainly need not follow those slavishly, and we may also detect some cases in which sign sequences are given single identifiers such as a number in a list. Such a list numbering is not by itself determinative. But it is also misguided to disregard the evidence traditional sources may be pointing to. They often presuppose things they do not say, or that the compilers were not even conscious of. It is well established that unconscious usage is often more systematic and reliable than conscious theories about usage. We should be very careful before disregarding traditional authorities.

For Cuneiform, there are in addition multiple "readings", some of them single-word (morpheme) readings of what really are sequences of signs, not a single sign. So the fact that we have both single-morpheme and multi-morpheme readings for the same sign does not help us, unless we have some more detailed reasoning.

 

Some History of Discussion

A group meeting under the label ICE did suggest splitting things which are named in the format "SIGN.SIGN". There are certainly examples to confirm this as a good choice, since true sequences of signs are named that way. But single signs are also named that way.

When a first draft came back, the same individuals felt that some obvious traditional signs had been excluded. One or two of these are part of the very common syllabary, and it was thought a good idea to encode these as distinct characters even if they did not fit the generalization about names "SIGN.SIGN". So they were treated as exceptions included for a practical reason.

But the need to pull back signals something much larger: what are indubitably single signs are also named "SIGN.SIGN". The exceptions mentioned above may be only the tip of the iceberg, a sign that we had over-generalized. So we may perhaps come back in part to the traditional lists (the specified exceptions were already doing that in a tiny way). WHICH of the signs that are named this way, or that can be named this way, should follow the pattern of splitting "SIGN.SIGN", and which should be encoded as single signs with multiple components? The pullback may be telling us that we have not distinguished sufficiently among the various things named SIGN.SIGN, since some of them are merely "COMPONENT.COMPONENT". Obviously the criterion did not mean that we should split anything which we *could conceivably name* as SIGN.SIGN, since a large number of signs commonly acknowledged as single signs can also be named that way (even KI *could* be named "U.KU"), and some are normally named that way.

I have to admit that I was merely worried about this earlier, but with increasing immersion in the actual vocabulary of signs, and considering implementation difficulties (something ICE did not discuss), I have come to realize this is a much bigger problem.

 

More Subtle Evidence

So we need to go to more subtle kinds of evidence for Cuneiform sign identification. We do not have the luxury of the constant block size which delimits single Han characters. The more subtle evidence can be in spacing or in line breaks. Any tendency of a combination of components to avoid being split across line breaks, in any scribal tradition, would suggest that we may have a single sign in that scribal tradition.
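The line-break evidence could be tallied mechanically once texts are available line by line. The sketch below is only an illustration under stated assumptions: the function name `split_tendency`, the representation of a text as a list of lines of unit labels, and the placeholder labels "A", "B", "C" are all hypothetical, and the sample lines are invented, not drawn from any tablet.

```python
def split_tendency(lines, first, second):
    """Count how often `first` immediately followed by `second` stays on
    one line ("together") or straddles a line break ("split")."""
    counts = {"together": 0, "split": 0}
    for i, line in enumerate(lines):
        # Adjacent pairs inside a single line.
        for a, b in zip(line, line[1:]):
            if (a, b) == (first, second):
                counts["together"] += 1
        # Last unit of this line followed by first unit of the next line.
        if i + 1 < len(lines) and line and lines[i + 1]:
            if (line[-1], lines[i + 1][0]) == (first, second):
                counts["split"] += 1
    return counts


# Invented sample: four consecutive lines of one hypothetical text.
lines = [
    ["A", "B", "C"],
    ["C", "A", "B"],
    ["A", "B", "A"],
    ["B", "C"],
]
print(split_tendency(lines, "A", "B"))  # {'together': 3, 'split': 1}
```

A low "split" count means little on its own. It would have to be compared with how often other adjacent pairs in the same scribal tradition fall across a line break before it could count as evidence for single-sign status.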

As an analogy, consider the English sequence "th" in "hothouse". These are clearly two separate letters. In "brother", although the "th" is pronounced as a single sound, it is still two letters, spaced the same as in "hothouse". However, if contrary to the present reality we had a ligature of "t" and "h", and that ligature were used *only* when the pronunciation is as in "brother" rather than as in "hothouse", we would have the linkage of form and function which would establish an independent character (whether or not the sequence "th" retained one or two pronunciations). We do have examples of both of these types. The two uses of present-day "th" *can* of course be hyphenated differently, based on a list of words.

A kind of evidence which can confirm such distinctions, but is rarely going to be available, is a contrast between two signs which simply happen to occur one after another in a single sentence and a combination of two components which form a single sign (not merely a sequence of signs supporting an idiomatic single-word reading). Occurrences of this sort of contrast are very difficult for specialists even to remember; it is like the gestalt shift in the classical image of a vase vs. two faces. We simply don't mentally catalog these together. But some specialists may remember such cases.
(Linguists have a term "minimal pairs" which applies, among other things, to this sort of case. Minimal pairs are sufficient to establish the distinctive status of contrasting units in a communication system, but they are not necessary in every case.)

 

Extreme vs. Normal Usage

It is a given for Assyriologists that there do exist extreme treatments where even single signs can be split into glyphs. We need to distinguish these extremes from normal usage, that is, from usage where spacing does matter in the original documents or in texts produced on computer for easier reading. This matter can be addressed elsewhere. Suffice it to say that a special kind of hyphenation algorithm can handle the split of a single cuneiform sign across a line break into glyphic fragments.
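To make the idea concrete, here is a minimal sketch of what such a line breaker might look like. It is not a description of any existing implementation. The function `break_lines`, the `allow_split` switch, the representation of a sign as a list of component labels, and the simplification that every component has width 1 are all assumptions made for illustration.

```python
def break_lines(signs, width, allow_split=False):
    """Lay out signs (each a list of component labels) on lines holding at
    most `width` components. Normally a sign is atomic; with
    allow_split=True a sign may be divided between components."""
    lines, current, used = [], [], 0
    for sign in signs:
        rest = list(sign)
        while rest:
            room = width - used
            if len(rest) <= room:
                # The (remaining part of the) sign fits on this line.
                current.append(rest)
                used += len(rest)
                rest = []
            elif allow_split and room > 0:
                # Extreme treatment: fill the line with a fragment.
                current.append(rest[:room])
                rest = rest[room:]
                lines.append(current)
                current, used = [], 0
            elif current:
                # Normal treatment: move the whole sign to a new line.
                lines.append(current)
                current, used = [], 0
            else:
                # Sign wider than an empty line and unsplittable: overflow.
                lines.append([rest])
                rest = []
    if current:
        lines.append(current)
    return lines


# Placeholder components; the second sign has two of them.
signs = [["a"], ["b", "c"], ["d"]]
print(break_lines(signs, 2))                    # sign kept whole
print(break_lines(signs, 2, allow_split=True))  # sign split as b | c
```

In the default mode the two-component sign moves to the next line intact. Only the explicit switch allows it to be broken into fragments, which keeps the extreme treatment separate from normal usage.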

For the possibility of analysis as kerning of two independent signs, please see the discussion of IGI, SAL, and RU.