Cuneiform Signs

Analysis and reports to support an international standard for computer encoding of the Cuneiform writing system

Research on the development of Cuneiform signs

Notes for verbal presentation to UTC meeting, 3 February 2004, accompanying the paper "Fitting Cuneiform Coding to Cuneiform Script" (doc. L2/04-041).
This is UTC Doc. No. L2/04-086, revised.
Lloyd Anderson, Ecological Linguistics, PO Box 15156, Wash. DC 20003 ecoling@aol.com , (202) 547-7683

1. Importance of Cuneiform: foundations of western civilization; both the invention of writing and the paradox (if you will) that it was used to support bureaucracy and complete control of the lives of people

2. Optimism on encoding, some movement towards convergence of views:
(a) Only a PORTION of older signs need to be encoded now – those which are secure, not the others
(b) Some flexibility is now evident: the time boundary is not as sharp as it was previously

3. Numbers: (see now the document on Number Signs Unification)
(a) Almost all archaic numbers are already included; it would be a shame to prevent use for the older period by omitting just a few.
(b) The only decision in principle is diacritic vs. single-sign listing for the "texture-marked" signs
(c) The statement at end of N2698 section 3.7 "will need to be revisited" should be "can be revisited … if warranted"

4. Why consider the full range of cuneiform now, not later, and why encode all securely distinguishable signs?
(a) Because the criteria in N2664(R) and WG2 N2698 for what signs to encode yield unstable results depending on inclusion or exclusion of particular stages, even within the more limited time range favored by some. The sign UMBIN 'talon' illustrated below from Labat [see Labat #92b] is historically not a sequence of GAD, TAG4, and DU. That analysis is possibly valid only for the NeoAssyrian times (upper right). What Labat calls "Classical Sumerian" shows the sign originated as something like "leg" or "talon" x TAG4 (5 clear examples are cited for Fara under LAK#289), also appearing similar to a combination of two components GAD and DU x TAG4 (4 clear Fara examples). Also in Old and Middle Babylonian the sign does not look like a sequence. Even for NeoAssyrian, the question remains open whether this is a single sign made up of three apparent components, or a sequence of three signs written closely together. Since the single-sign analysis is clearly proven, there is as yet no demonstrated need for a splitting into a sequence of signs. Han CJK characters show examples of similar changes of internal component structure while the identity of the single sign remains constant through time. Fragmenting single signs into sequences which disregard this normal type of development will break many instances of simple linear historical descent of single signs. There is simply no need for such counter-productive false analyses.

[illustration here]

(b) Because omission of needed signs will lead some users to wrong encodings (example just above)
(c) Because the entirety is "manageable" now. Triage of ZATU and use of lexical lists and scholarly reviews and articles, and at least unification where possible with secure parts of other older lists (ATU, UET II, KWU, LAK, Recueil) will be complete during February. "Comprehensiveness" is not aimed for in the sense of encoding anything that anyone has thought might be a sign, no matter how insecure. There is no need for the division into stages (N2698 section 3.1), and no need for the statement "Archaic may require an entire separate block." Additional signs, yes, but no principled separation has been justified in any way.

(d) [Added note after UTC: The claim that the position advocated here implies encoding ENSI as a single sign was careless, negligent and false, both as to the particular example and as to the generalization intended, namely that it implies encoding as single signs what everyone knows are combinations or sequences of separate signs. Negligent in not checking first. False because documents long in the hands of the person making the statement specifically marked ENSI as not a single sign. False because the general position advocated here, taking the long scholarly tradition as the default guide until individual exceptions are made, means that ENSI would by default not be regarded as a single sign (neither Labat nor Borger lists ENSI as a single sign).]

5. Why use the combination of the long scholarly tradition (sign lists) as the default criterion for what are the signs of cuneiform?
(a) Because that tradition is much more stable and reflects many kinds of knowledge accumulated over generations. It does not rely on only a single criterion in disregard of many other criteria.

(b) Because that tradition is confirmed by what happens where typography allows a distinction between spacing between signs and spacing within signs (between components of a single sign). The typography of the Gudea statues and that of the Codex Hammurabi confirm each other. Although the single sign U is written with much space around it, as in Hammurabi Prologue xv.61, the component U as part of the sign UL is never so separated even when enormous space is available, as in Hammurabi Prologue vii.6, and UL is not broken across the end of a line ("indent") even when that could yield more spacious typography, as it could have in Hammurabi Prologue vii.24. UL simply is not "U.GU4" even if it may have arisen etymologically that way. Recognizing components via a name U x GU4 is more accurate -- please see http://www.CuneiformSigns.org/ContainerTypes.htm

[Illustrations here for Codex Hammurabi lines cited just above]

(c) Because the scholarly tradition confirms the decisions made in correcting N2664 to N2664R (or WG2 N2698), and suggests a limited number of further additions, not additions without any stopping point as claimed in the revised proposal. Typography where there is enough extra space to see distinctions also confirms that scholarly tradition. The signs UL, AR, U3, and a number of others are thus confirmed as truly single signs, not merely exceptions noted to keep cuneiform professionals comfortable because they were in the basic school syllabary of Mesopotamia. In a reasonably careful survey of the Gudea statues and the Codex Hammurabi, I have found not a single occurrence of the sign AR as two signs IGI and RI separated by the kind of space which typically does separate independent signs. The tiny amount of space between components IGI and RI in the second example following does not signal a division of signs. In a line with so much extra space overall, only a very wide space could signal that. In addition, the two components, if they were that, were even more fused at the stage of Gudea (one wedge entirely lost).

[Illustrations here for Codex Hammurabi lines cited just above]

(d) Justification works quite normally by default in lines of cuneiform text which have enough space to show its effects, if we encode as single signs what the scholarly tradition normally treats as single signs. The sign EL, now regarded by all as a single sign, has more space around it and is more stretched in the Codex Hammurabi Prologue viii.60 than it is in viii.57 (note the different width of the single remaining sign at the end of each of these lines), but it does not lose the connection between its component parts. This reflects its acknowledged status as a single sign.

[Illustrations here for Codex Hammurabi lines cited just above]

(EL is etymologically perhaps from SAL followed by SI or something which early came to look like SI.)

(e) Because the scholarly tradition conforms to what linguistic analysis shows to be the productive units of cuneiform script. The most recent revision of N2664R is improved by including a series of the type TUR3 x SAL as single signs, because treating them as a sequence NUN followed by "LAGAR x SAL" would require encoding a list of signs "LAGAR x SAL" with varying infixed components, signs which do not exist independently. But the revision does not include the sign TUR3 itself (which is demonstrably the container in this sign of the type "container x infix"). Rather it still would force users into an incorrect encoding of TUR3 as NUN followed by LAGAR instead of the correct NUN x LAGAR proven by what Labat calls "Classical Sumerian" (2nd column in the next illustration) and by the archaic forms (1st column). That is to say, TUR3 x SAL is structurally (NUN x LAGAR) x SAL, not NUN.(LAGAR x SAL). The correct linguistic analysis identifying the combinatorially productive components of such signs thus confirms the conclusion drawn in the traditional scholarship recognizing TUR3 as a single sign. For a survey of some of these sets of signs and which ones have unambiguous linguistic analyses of their components, please see the web page http://www.CuneiformSigns.org/SignOverSign.htm . Please also note that the CDLI use of the notation "SIGN x SIGN" using "x" does not refer only to infixation into a completely surrounding container component, even if that is a salient subset of the "SIGN x SIGN" types. Rather, it refers also to components overlapping. I am exploring lineal historical ancestors and descendants of signs in the core of the "SIGN x SIGN" type, to discover the extent of the group, which strong linkages of components are retained across history, which not.

[Illustrations here for Labat signs mentioned, his numbers 87, 87a, 87b]
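
The difference between the two analyses of TUR3 x SAL discussed in (e) can be sketched as a small data structure. This is only an illustrative model, not anything proposed in N2664R or N2698: the class names `Simple`, `Infixed` and `Sequence` and the helper functions are hypothetical, and "x" is treated here simply as "container x infix" even though, as noted above, the CDLI notation also covers overlapping components.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Simple:
    """A sign with a single component, e.g. NUN."""
    name: str


@dataclass(frozen=True)
class Infixed:
    """One single sign built as 'container x infix' (hypothetical model)."""
    container: "Node"
    infix: "Node"


@dataclass(frozen=True)
class Sequence:
    """Several independent signs written one after another ('A.B')."""
    parts: tuple


Node = Union[Simple, Infixed, Sequence]


def notation(node, top=True):
    """Render a node in the 'A x B' / 'A.B' notation used in the text."""
    if isinstance(node, Simple):
        return node.name
    if isinstance(node, Infixed):
        text = f"{notation(node.container, False)} x {notation(node.infix, False)}"
    else:
        text = ".".join(notation(part, False) for part in node.parts)
    return text if top else f"({text})"


def encoded_units(node):
    """List the signs that would each need their own encoded character."""
    if isinstance(node, Sequence):
        units = []
        for part in node.parts:
            units.extend(encoded_units(part))
        return units
    return [notation(node)]


NUN, LAGAR, SAL = Simple("NUN"), Simple("LAGAR"), Simple("SAL")
TUR3 = Infixed(NUN, LAGAR)

single_sign = Infixed(TUR3, SAL)                     # (NUN x LAGAR) x SAL
split_reading = Sequence((NUN, Infixed(LAGAR, SAL)))  # NUN.(LAGAR x SAL)

for analysis in (single_sign, split_reading):
    print(notation(analysis), "->", encoded_units(analysis))
```

In this model the split reading yields two units, one of which is "LAGAR x SAL", a form that (as argued above) does not exist independently; the single-sign reading yields one unit whose container is TUR3 itself.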

6. Particular notes on WG2 N2698, on page 7 unless otherwise noted:
[I reaffirm the appropriateness of the comparisons with Han CJK characters, the contrast between Sign and Component of Character being clearer there. Also the appropriateness of the comparisons with Latin "w" which we do not encode as the sequence "vv" with ligaturing, and which is not a ligature even though it arose that way. And the comparison with the Danish letter "æ", which is not a ligature even though it arose that way, and which we do not encode as the sequence "ae" with ligaturing. Please see the illustrations of this in paper L2/04-041. There is nothing misleading about these comparisons. This is indeed the issue for Cuneiform as well.]
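
The comparison with Latin "w" and Danish "æ" can be checked against the current Unicode character data. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `nfkd_codepoints` is hypothetical, and the "fi" ligature (U+FB01) is added here only as a contrasting case of a character that Unicode does treat as decomposable into a sequence.

```python
import unicodedata


def nfkd_codepoints(ch):
    """Return the code points of ch after compatibility decomposition (NFKD)."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKD", ch)]


# "w" and "æ" arose historically from two letters; U+FB01 is a typographic ligature.
for ch in ("w", "\u00e6", "\ufb01"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", nfkd_codepoints(ch))
```

Under these assumptions "w" and "æ" stay single characters even under the most aggressive standard decomposition, while the typographic ligature is mapped to the sequence "f" + "i". That is the same sign versus sequence distinction at issue for Cuneiform.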

Section 3.5 on TA* -- this is a special case under 3.3 Mergers and Splits, a late split. The phrase "to be used only when the new interpretation applies" is an implementation question, odd in this context. Fuller statements on implementation are needed, at least pointing to alternatives. In some cases the opposite decision may be made, as plausibly with the signs which later merged as KU, to name complex signs containing them, and to encode them, as their original forms making the distinction at all times, so that historical continuity is preserved. This may or may not be the chosen implementation, but the question does exist. It would be wise to discuss it before a final encoding proposal.

The sentence ending section 3.4, "In general, signs which shift from compound to complex or vice versa have been treated according to their Ur III manifestation," contradicts section 3.3 on mergers and splits. By the principle of section 3.3, the status as single sign is to be encoded if it occurs at any time, including later, not merely earlier.

Under "complex signs" towards the end of section 3.4, the phrase "which present a relative visual unity" applies equally to some signs which are regarded by the proposers as "compound signs". This is one source of the instability of criteria: the subjectivity of these judgements. Notice the slight space in one vs. the other example of AR above (called "IGI.RI"). That tiny difference in white space is irrelevant. Another source of instability of criteria arises through omitting clearly unitary or complex signs which are the lineal descendants or ancestors of ones incorrectly rejected as "compound". Evidence from Gudea and Hammurabi texts in lines whose typography has enough white space to make distinctions applies precisely to this issue of "visual unity". Ancient scribes treated many of these as single signs, by these typographic criteria.

Under "compound signs" in the middle of section 3.4, the phrase "generally treated by scholars as a single unit" is dangerously ambiguous about the kind of unit -- text unit (sequence of signs) or single sign? This makes it rather explicit that proposal WG2 N2698 is proposing to encode "wedge clusters" (components) as if they were full signs if a full sign exists which is similar. This is at the very least an encoding of components, not of signs. And not a consistent encoding of components, as discussed from various other points of view.

Section 3.9 Character order
This is actually an encoding by components, but it has mixed with it a renaming of many signs away from their traditional names, such as replacing MUL (for constellations etc.) with the radically new "AN OVER AN AN" for a cluster of three AN components ('star'). These two issues can be separated. We can support the encoding by components, even by alphabetical order of the most usual naming of components, without any need to discard traditional names.

[Added note after the UTC meeting. There is a frequent assumption that the native tradition of the "DIRI" lists is in fact a list of compounds, or sequences of signs. That may not be correct. It may rather be a listing of "readings" which do not directly reflect components, whether those components are merely components of a sign or are a sequence of components which are themselves independent signs. Or the "DIRI" list may be a mixture of more than one structural category. In any case, native tradition, like all meta-discussion about language and writing, is statistically a less reliable witness to the structural units of language or writing than is study of the actual behavior of units in language or writing. More on these questions later.]

Results of a first-level triage on the first 200 signs in ZATU (Uruk period)
The remainder of ZATU will be completed to probably better than this level of validity by mid-February. Additional checking with the Lexical Lists and with reviews of ZATU and the lexical lists and with a number of scholarly articles will be completed before the end of February.

No additional encoded characters are needed (51% of the first 200)
a) 68 unifiable with later signs
b) 38 insecure
Additional encoded characters are needed (49% of the first 200)
c) 83 complex, known parts or types of modification, not unifiable
d) 17 simple (new components = signs)
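
The arithmetic behind the two percentages can be reproduced with a short sketch. The counts are copied from the list above; the variable and function names are hypothetical. Note that the four counts as listed add up to 206, slightly more than 200, and the stated 51% and 49% match shares of that sum after rounding.

```python
# Counts copied from the triage list above (categories a-d).
counts = {
    "a": ("unifiable with later signs", 68),
    "b": ("insecure", 38),
    "c": ("complex, known parts or types of modification, not unifiable", 83),
    "d": ("simple (new components = signs)", 17),
}

NO_NEW_CHARACTERS = ("a", "b")   # no additional encoded characters needed
NEW_CHARACTERS = ("c", "d")      # additional encoded characters needed


def group_total(keys):
    """Sum the sign counts for the given category letters."""
    return sum(counts[key][1] for key in keys)


def share(part, whole):
    """Percentage of the whole, rounded to the nearest integer."""
    return round(100 * part / whole)


total = group_total(counts)
no_new = group_total(NO_NEW_CHARACTERS)
new = group_total(NEW_CHARACTERS)

print(f"total signs tallied: {total}")
print(f"no new characters:   {no_new} ({share(no_new, total)}%)")
print(f"new characters:      {new} ({share(new, total)}%)")
```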

Balancing of Criteria
This balancing will be both refined and made more explicit as the triage progresses. This is merely a sample to show that balancing has in fact occurred. What we have here is the approximate statistical relation of number of occurrences to presence in lexical lists or not, and to the four categories distinguished above. What this shows is that the recognizability of component parts of complex signs makes their inclusion on average more secure with a smaller number of attestations than for those signs having unique or new single components. Also that among those signs listed as "unidentified" (not unifiable with later signs), the weighting among those proposed for encoding after this first-level triage is strongly towards those attested in higher frequencies. No criterion stands alone as determinative; rather, a combination of all known factors bears on triage. Especially at the frontiers, many criteria are of course much more subtle than merely whether some experts have chosen to include a form in a list of "signs". Later updates of this triage will appear by mid-February 2004 at the web site http://www.CuneiformSigns.org/ZATUSignTriage.htm

Statistics, numbers of signs from different sources and with different frequencies in administrative texts:

C. Complex with known parts, not in lexical lists (note the large proportion with attestation in ATU or elsewhere)
Other 9, ATU 6, 1x 8, 2x 7, 3x 6, 7x 1, 10x 1

C. Complex with known parts, in lexical lists (note the large proportion with only one occurrence)
1x 27, 2x 3, 3x 5, 5x 3, 8x 1, 9x 2, 19x 1, 29x 1, 40x 1, 60x 1, 100x 1

D. Simple signs (single component), not in lexical lists (note the larger proportion with many occurrences)
2x 2, 5x 1, 12x 1, 28x 1, 34x 1

D. Simple signs (single component), in lexical lists (note the larger proportion with many occurrences)
1x 1, 2x 1, 9x 2, 10x 1, 12x 1, 15x 1, 17x 1, 23x 1, 72x 1

"Unidentified" signs (ZATU numbers 620 to 762) (note the especially large proportion with many occurrences)
ATU 4, 1x 11, 2x 3, 3x 2, 4x 1, 6x 4, 7x 1, 8x 1, 9x 6, 10x 1, 11x 1, 12x 1, 13x 1, 14x 2, 15x 2, 16x 2, 17x 2, 20x 1, 21x 1, 28x 1, 30x 1, 45x 1, 48x 1, 59x 1
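
The frequency rows above can be tallied mechanically. The sketch below assumes that an entry "Nx M" means "M signs attested N times" (the reading suggested by the note on the row with "1x 27"), and that entries such as "ATU 4" or "Other 9" count signs attested in those sources instead of by frequency. The row labels and function names are hypothetical.

```python
import re

# Rows copied from the statistics above.
rows = {
    "C not in lexical lists": "Other 9, ATU 6, 1x 8, 2x 7, 3x 6, 7x 1, 10x 1",
    "C in lexical lists": "1x 27, 2x 3, 3x 5, 5x 3, 8x 1, 9x 2, 19x 1, 29x 1, 40x 1, 60x 1, 100x 1",
    "D not in lexical lists": "2x 2, 5x 1, 12x 1, 28x 1, 34x 1",
    "D in lexical lists": "1x 1, 2x 1, 9x 2, 10x 1, 12x 1, 15x 1, 17x 1, 23x 1, 72x 1",
    "Unidentified": (
        "ATU 4, 1x 11, 2x 3, 3x 2, 4x 1, 6x 4, 7x 1, 8x 1, 9x 6, 10x 1, 11x 1, "
        "12x 1, 13x 1, 14x 2, 15x 2, 16x 2, 17x 2, 20x 1, 21x 1, 28x 1, 30x 1, "
        "45x 1, 48x 1, 59x 1"
    ),
}


def parse_row(text):
    """Split a row into {occurrences: sign_count} and {source_label: sign_count}."""
    by_frequency, by_source = {}, {}
    for item in text.split(","):
        label, count = item.split()
        match = re.fullmatch(r"(\d+)x", label)
        if match:
            by_frequency[int(match.group(1))] = int(count)
        else:
            by_source[label] = int(count)
    return by_frequency, by_source


def total_signs(text):
    """Total number of signs counted in a row, under the stated reading."""
    by_frequency, by_source = parse_row(text)
    return sum(by_frequency.values()) + sum(by_source.values())


def signs_with_at_least(text, threshold):
    """Number of frequency-counted signs attested at least `threshold` times."""
    by_frequency, _ = parse_row(text)
    return sum(n for occurrences, n in by_frequency.items() if occurrences >= threshold)


for label, text in rows.items():
    print(f"{label}: {total_signs(text)} signs, "
          f"{signs_with_at_least(text, 5)} attested 5 or more times")
```

The threshold of 5 occurrences is arbitrary and chosen only for illustration; any cut-off can be passed to `signs_with_at_least`.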

A correction note to the version of this partial first-level triage previously posted on the web site (and in the paper for UTC): What had been references to the "Professions" list should instead read "Lu" list. That correction has been made on the web page.