Title: Unicode for ten and eleven?
Shaun - May 26, 2008 06:30 PM (GMT)
Has anyone applied to have Unicode codes assigned to the numerals for ten and eleven?
The following page might be useful:
http://evertype.com/standards/csur/
uaxuctum - May 27, 2008 09:14 PM (GMT)
| QUOTE (Shaun @ May 26 2008, 06:30 PM) |
Has anyone applied to have Unicode codes assigned to the numerals for ten and eleven?
The following page might be useful: http://evertype.com/standards/csur/ |
I think I remember reading Michael Everson saying something somewhere about the dozenal digits—somewhere in the early discussions of Unicode (possibly at some Usenet newsgroup, but I can't remember now). They've certainly already discussed at length the issue of hexadecimal digits, although this was a different matter altogether since all proposals I know of for encoding hex digits were aimed at encoding the standard A to F symbols used in computing, and encoding Latin letters separately for their use as digits was outright rejected (even though each Latin letter is already encoded several times in different blocks, and Roman letter-numerals have their own codes, so personally I'm not really convinced by their arguments). Quite another thing would be to encode a set of transdecimal digits that are not merely Latin letters doubling as digits; for example, Pitman's digits.
In fact, I think Pitman's digits have every right to be assigned their own codepoints in Unicode. They have a venerable-enough origin and have already been used to a sizeable extent in real life, not only by Pitman himself in his writings but also, for example, in the publications of the DSGB. The need to have them properly encoded has already been felt by the dozenal community. Some examples of this real-life need: this forum, on which we have to rely on smiley-image insertion to display them, with awkward and quite poor results; the DSGB website, where documentation containing them has to be provided through manually crafted PDFs instead of simple webpage text; and e-mail, where they cannot be used at all and the only choice is the unsatisfying, makeshift solution of hex-style letter-digits. It's important to note that this need to have them encoded is completely independent of the never-ending internal discussions within the dozenal community regarding personal preferences on what set of dozenal digits should eventually prevail. Pitman's digits have an established history of use as dozenal digits in real life (a variety of published materials by different people), being the only thing anywhere close to "standard" transdecimal non-letterlike, non-punctuationlike digits, which is what really matters for inclusion in Unicode. So it doesn't matter what set of transdecimal digits the dozenal community at large finally decides to settle on in the future (if ever), just as obsolete characters no longer in current use, such as the long S, are nonetheless encoded since, at the very least, they are needed for the proper digitization of a range of older documents containing them.
Regarding an actual application to the Unicode Consortium proposing the inclusion of Pitman's digits in their standard, anyone can submit it and I've already thought about submitting it myself, but I think the DSGB, as the most notable current real-life user of these typographical symbols and the keeper of the largest collection of printed materials using them, would be the most appropriate body to do so. Who else can provide better evidence that these symbols have been (and still are) in actual use, that they are not just someone's passing whim or private invention, and that there is a community of users with a real practical intercommunication need to have them properly encoded in a standard character set, so that we can at last easily use them on webpages, e-mails, forum posts, word-processor documents, digitized versions of DSGB journals, etc. without having to resort to clumsy ad hoc workarounds?
Note, however, that the referenced webpage (the "ConScript Unicode Registry"), even though maintained by someone (M. Everson) closely affiliated with Unicode, doesn't actually constitute a part of the Unicode Standard. It is merely a coordination tool for conlangers, so that they don't step on each other's toes when choosing codepoints from the Private Use Area (*) to encode their own conscripts, and so that fonts and texts in their private scripts can be shared with the rest of the conlanging community more easily. This "Registry" is neither binding nor official, and anyone is perfectly free to use those codepoints for whatever other private use they see fit.
I think Pitman's digits should be encoded in the main, "public" area, because they are not merely someone's private-script characters, but characters that appear in the real-life publications of a community of users and are potentially of general interest. Not to mention that they're just two additional characters, so they wouldn't take up much "codepoint real estate" and could be assigned, for example, a pair of the fourteen currently empty slots in the Number Forms block. One suggestion might be U+2150 for Pitman's digit ten [rotated 2] and U+2151 for Pitman's digit eleven [horizontally flipped 3], as well as reserving the adjacent U+2152 for a possible future addition of a "neo-Pitman's" digit twelve [which could be a vertically flipped 4, for example], which among other things is necessary for dozenal-based positional ordinals (**). Since they are based on existing digit shapes, adding Pitman's digits to existing Unicode fonts once they get codepoints assigned would be very easy (it can even be done at home using simple character-retouching software).
(*) A region of the Basic Multilingual Plane, from U+E000 to U+F8FF, whose 6400 codepoints the Unicode Standard deliberately leaves permanently undefined, reserving them for private uses. Too many of them, I might add, since they are taking up a very significant chunk (nearly 10%) of "premium" codepoint real estate.
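For anyone who wants to check the figures in this footnote, here is a minimal Python sketch. The variable names are illustrative only; the range bounds are the ones quoted above, and the BMP is taken to be the 65,536 code points from U+0000 to U+FFFF.

```python
# Private Use Area bounds as quoted in the footnote.
PUA_FIRST = 0xE000
PUA_LAST = 0xF8FF
BMP_SIZE = 0x10000  # 65,536 code points in the Basic Multilingual Plane

# Inclusive range, hence the + 1.
pua_size = PUA_LAST - PUA_FIRST + 1
pua_share = pua_size / BMP_SIZE

print(pua_size)             # number of private-use code points in the BMP
print(f"{pua_share:.2%}")   # share of the BMP they occupy
```

The count comes out at 6400, and the share at a little under a tenth of the plane.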
(**) Positional ordinals (a general mathematical concept not restricted nor particularly linked to dozenal) is a topic I've already talked about on a different, Spanish-speaking forum, and I'm planning to start a thread about them on this one to introduce them to the dozenal audience (since as far as I know this simple and really intuitive concept has incredibly been ignored by mathematicians, even though in a certain but subconscious and neglected way it is already in use and, if properly implemented, it could, among other things, elegantly solve once and for all a long-standing calendrical nuisance).
icarus - May 28, 2008 12:31 PM (GMT)
Folks,
This will be a topic in the DSA October general meeting. I'd propose that the Pitman and "Dwiggins" or DSA-Classic transdecimal digits be assigned codepoints. I've dealt with the 60+ years of the Duodecimal Bulletin rather closely in the past year; the "Dwiggins" chi-like "dek" and the flat-bottomed rotated "3" "el" have been in rather continuous use in the publication since the mid 1940s. The dek and el, as conceived in the Duodecimal Bulletin, aired in 1973 on US Saturday morning network television in the "Schoolhouse Rock!" segment "Little Twelvetoes". The gap in the usage of the "Dwiggins" dek and el in the Bulletin is attributable to the layout and composition process of the 1980s and 1990s Bulletins. (The classic transdecimals will be restored in Volume 49, Number 2, to be published in September 2008.)
William Addison Dwiggins was a notable graphic designer and typographer.
I think both the DSA and DSGB characters merit codepoints. Obtaining a codepoint in the mathematical area may be more challenging than in the "conlang" area, since the dozenal transdecimals are not well-established among general mathematicians.
Once the Duodecimal Bulletin has been fully digitally archived (we are 5/6 of the way through that process) and Volume 49, Number 2 is laid out and sent to press (August 2008), I am willing to collaborate with anyone interested in this Unicode issue.
In the meantime the Duodecimal Bulletin can support any symbology now, so this is no longer an impediment to widespread application of symbols to dozenal numbers. Any table or figure produced for the Bulletin can be converted between DSA-Classic and Pitman, or any other symbology, by changing fonts, provided the fonts for these symbologies are created in a certain way. We will produce fonts for any submitted paper that would be published in the Bulletin, to properly display the symbols the author desires to use in that paper. We may offer DSA Members the production of custom fonts for their symbols (this is still in the works). This is an attempt to get symbology "out of the way" of the expression of dozenal thought. Once a sufficient body of symbols is published, and given practical applications for the symbols, the market will decide which symbols will be used for dozenal and any "transdecimal" numerals. That said, the 60+ year usage of "Dwiggins" and the Pitman numerals currently merit codepoints, as these have de facto usage and notable historical creators.
Shaun - May 28, 2008 01:24 PM (GMT)
Fonts - I adapted the Palatino font to allow for our ten and eleven, and also for negative digits as used in the reverse notation.
uaxuctum - May 28, 2008 02:13 PM (GMT)
| QUOTE (icarus @ May 28 2008, 12:31 PM) |
| Obtaining a codepoint in the mathematical area may be more challenging than in the "conlang" area, since the dozenal transdecimals are not well-established among general mathematicians. |
They are well-established in published materials among those interested in the field of duodecimal arithmetic; that is, they are well-established among the community of those concerned. Mathematicians who have never cared a single bit about dozenal should not be permitted to have a veto or to influence the decision by the sheer amount of their numbers; just like, for example, people who do not, have never and could never be bothered to care about the use of the Cherokee script—people who incidentally amount to the immense majority of humanity and could thus easily crush any attempt by the Cherokees to have their minority script encoded—should not be allowed to have a say as to whether the Cherokee syllable-letters should or shouldn't be in Unicode (they're already in, by the way; I've just used this example for the sake of argument).
Unicode already encodes things like the interrobang, which is not established in the punctuation conventions of any language, and whose only history of use was restricted to a passing fad decades ago. What's more, Unicode has even approved the encoding of the inverted interrobang (the "gnaborretni"), whose only conceivable use would be in Spanish (plus a few minor neighboring languages like Galician and Asturian), a language that already has a different established orthographical convention for the cases where these non-standard punctuation symbols could be of use (in Spanish you can combine an opening question mark with a closing exclamation mark or vice versa to express what the gnaborretni-interrobang pair would express, and this usage, although not very common nowadays, is officially sanctioned in the Diccionario Panhispánico de Dudas). So if the gnaborretni got in, whose inclusion might rightfully be criticised as completely ludicrous, the duodecimal digits have every right to be in.
| QUOTE (icarus @ May 28 2008, 12:31 PM) |
| I'd propose that the Pitman and "Dwiggins" or DSA-Classic transdecimal digits be assigned codepoints. [...] the "Dwiggins" chi-like "dek" and the flat-bottomed rotated "3" "el" have been in rather continuous use in the publication since the mid 1940s. |
The problem I see here is whether the Unicode Consortium would accept the separate encoding of both sets. Typographically, these two sets combined could amount to just three distinct graphemes, since the rotated banker's 3 and the rotated or flipped 3 are mere glyphic variants of graphemically the same thing with the same meaning (just like a banker's 3 and a plain 3 are stylistic variants not encoded separately). Since the stated purpose of Unicode is to encode graphemes rather than glyphs, they could argue that the Pitman's and Dwiggins' digit sets are not disjunct, and decide to assign the same codepoint to both variant forms of digit eleven.
Since the Unicode Consortium has gotten its way with the much-criticized Han Unification, we wouldn't have many arguments to put before them for a separate encoding of Pitman's and Dwiggins' elevens. That said, they themselves have repeatedly compromised their stated intentions: very unlike what they did with Chinese characters, they have separate encodings for the glyph variants of Eastern Arabic digits that are associated with different languages/regions (general vs. Persian/Urdu style). In fact, in this case they have incredibly chosen to encode the whole set of digits twice, even though the variants only affect three of the ten digits (the shapes of 6 and especially of 4 are noticeably different in each set, while the shape of 5 is only minimally divergent; all the others are exactly the same).
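The double encoding of the Eastern Arabic digits can be seen directly in the character database. The following is a small Python sketch using the standard unicodedata module; the constant names are just labels chosen for this example.

```python
import unicodedata

ARABIC_INDIC_ZERO = 0x0660           # general Arabic style, U+0660..U+0669
EXTENDED_ARABIC_INDIC_ZERO = 0x06F0  # Persian/Urdu style, U+06F0..U+06F9

for value in range(10):
    general = chr(ARABIC_INDIC_ZERO + value)
    extended = chr(EXTENDED_ARABIC_INDIC_ZERO + value)
    # Both sets are classified as decimal digits and carry the same value.
    assert unicodedata.category(general) == "Nd"
    assert unicodedata.category(extended) == "Nd"
    assert unicodedata.decimal(general) == unicodedata.decimal(extended) == value
    print(value, unicodedata.name(general), "|", unicodedata.name(extended))
```

All ten pairs have identical properties apart from their names, which is the point being made above: the two sets differ only in regional glyph style for a few of the digits.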
A much stronger objection is very likely to be raised against the separate encoding of any of the symbols the DSA has used for ten, since their shape is either just an X (U+0058 <LATIN CAPITAL LETTER X>, U+2169 <ROMAN NUMERAL TEN>, or U+03A7 <GREEK CAPITAL LETTER CHI>), an asterisk, or a struck-through X which can be readily obtained as a compound character using U+0335 <COMBINING SHORT STROKE OVERLAY> or U+0336 <COMBINING LONG STROKE OVERLAY> (or alternatively by using the appropriate typesetting tags for strike-through in HTML webpages and word-processor documents). The simple or slightly altered X shape is especially unlikely to get encoded, since it will be difficult to convince them that subtly adorning an X with curved ends to try to differentiate its usage as a letter from its usage as a letter-numeral (both usages already separately encoded, since the latter is available as <ROMAN NUMERAL TEN>) amounts to much more than glyph decoration.
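To make the "compound character" point concrete, here is a short Python sketch that builds a struck-through X from the existing code points named above. How well the overlay renders depends entirely on the font, so this only illustrates the encoding side.

```python
import unicodedata

SHORT_STROKE = "\u0335"  # COMBINING SHORT STROKE OVERLAY
LONG_STROKE = "\u0336"   # COMBINING LONG STROKE OVERLAY

# A base letter followed by a combining mark forms one visible unit.
struck_x = "X" + LONG_STROKE

print(struck_x)
print([unicodedata.name(ch) for ch in struck_x])

# Normalization leaves the sequence as two code points, since there is
# no precomposed "X with stroke" for it to collapse into.
print(len(unicodedata.normalize("NFC", struck_x)))
```

The sequence is still an X as far as software is concerned, with a combining mark attached, which is why it is unlikely to be treated as a distinct character.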
On the other hand, some might argue for a conceptual encoding of transdecimal digits (i.e., assigning codepoints simply to <TRANSDECIMAL DIGIT TEN> and <TRANSDECIMAL DIGIT ELEVEN> without implying any particular shape of those digits), so as to avoid the controversial issue of competing transdecimal digit sets by leaving the choice entirely to font designers. This would, however, conflict with the fact that Unicode already encodes several different sets of decimal digits separately (European, Eastern Arabic, Devanagari, etc.) rather than encoding them as abstract mathematical symbols independent of their particular appearances in different scripts.
icarus - May 29, 2008 06:06 PM (GMT)
two things - am on road. One, if we can get provisional codespaces for transdecimal, why stop at transdecimal eleven? Also, perhaps Dsa and dsgb should indeed decide once & for all re: digits. I know mr. Whillock was highly interested in that. This way, both publications would employ same symbols. As editor of Bulletin I am not necessarily wedded to dwiggins nor pitman; both are deficient. I guess in this way perfect is enemy of good. There is significant contingent in dsa which supports classic dsa numerals. I know dsgb stands behind pitman.
I guess this standoff points to reserving codepoints but not defining form.
Shaun - May 29, 2008 06:34 PM (GMT)
The key digit is that for "ten".
Both DSA and DSGB have a form of E or 3 and the precise shape - flat bottomed or round - is not that important if the basic shape for the glyph is acceptable.
If we can get agreement on a symbol for "ten" ... We've had plenty of ideas posted on this forum; can we select one of them that will do both for DSA and DSGB?
After all, we've only been debating the issue for *40 years or so ...
Shaun - May 29, 2008 06:38 PM (GMT)
Unicode - see the topic started by Endi in this forum.
uaxuctum - May 29, 2008 07:27 PM (GMT)
As I've explained before, whether or not the DSGB and DSA finally decide on a consensus character for ten (a totally new one, or an already-used one) shouldn't affect the need to encode the "classic" Pitman's and DSA's characters, because these are necessary at least for the archival digitization of historical publications where those characters were extensively used.
As I see it, Pitman's set is the most readily acceptable for encoding. Unlike some of DSA's choices, Pitman's digits easily look like digits, since they are based on the shapes of existing digits, and there are currently no other characters in Unicode that could properly be used for them. There is a character, used in some African languages, from the Latin Extended B block, numbered U+0190 and called <LATIN CAPITAL LETTER OPEN E> (formerly <LATIN CAPITAL LETTER EPSILON>), whose shape (Ɛ) may resemble that of Pitman's eleven. However, this character is clearly a letter (it has the property "Category: Letter, Uppercase [Lu]" and the function Character.isDigit() returns "No"), whereas Pitman's eleven should have the property: "Category: Number, Other [No]" (or a new "Category: Number, Transdecimal [Nt]") and the function Character.isDigit() should return "Yes". Also, and more importantly, while Pitman's eleven may conceivably be displayed with variant glyphic shapes based on the flat-topped banker's 3, such glyphs (especially the one with the flat part at the bottom, as in DSA's "classic" eleven) would not be acceptable for <LATIN CAPITAL LETTER OPEN E>, which means both characters are definitely distinct, even though some of their glyphic variants may appear indistinguishable in some fonts.
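The properties mentioned for the open E can be looked up programmatically. The post refers to Java's Character.isDigit(); the sketch below uses Python's unicodedata module and str.isdigit() instead, which is a comparable but not identical test, so treat it as an illustration rather than a reproduction of the Java result.

```python
import unicodedata

open_e = "\u0190"  # the letter whose shape resembles Pitman's eleven
three = "3"        # an ordinary decimal digit, for comparison

for ch in (open_e, three):
    # Print code point, character name, general category and digit test.
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        ch.isdigit(),
    )
```

The open E reports an uppercase-letter category and fails the digit test, while the plain 3 is a decimal digit, which is the distinction the argument above relies on.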
The DSA's digits are more problematic. On the one hand, once Pitman's eleven gets encoded it could easily double as DSA's eleven, since they are essentially the same thing; so I find it very unlikely that they would agree to encode each one separately. On the other hand, they may raise the objection that Dwiggins' ten looks too much like a letter rather than a digit, and that it can be thought of as a variety of <ROMAN NUMERAL TEN> (which has the property "Category: Number, Letter [Nl]", unlike <LATIN CAPITAL LETTER X>, which is categorized as an uppercase letter). However, the function Character.isDigit() returns "No" for it, because Roman numerals are not exactly digits, so this may be an argument in favour of a separate encoding (although they have outright rejected the separate encoding of letterlike hexadecimal A-F, so hex numbers have to use as digits characters that Unicode-aware software can't easily identify as digits, which is one of the inconveniences that those proposing their separate encoding as hexadecimal digits were trying to solve). In any case, I think the more distinct crossed form X might have a better chance of getting encoded (or it may be proposed to consider the crossed and the curly shapes as glyph variants).
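The difference between a letter, a letter-numeral and a hex "digit" can be illustrated with a few lookups. This is a Python sketch using unicodedata; Python's str.isdigit() and str.isnumeric() stand in here for the Java test mentioned in the post and are not guaranteed to agree with it in every case.

```python
import unicodedata

roman_ten = "\u2169"  # ROMAN NUMERAL TEN
latin_x = "X"         # LATIN CAPITAL LETTER X
hex_a = "A"           # a plain letter doubling as hexadecimal ten

# General categories: letter-numeral versus ordinary uppercase letters.
print(unicodedata.category(roman_ten), unicodedata.category(latin_x))

# The Roman numeral carries a numeric value but does not count as a digit.
print(unicodedata.numeric(roman_ten), roman_ten.isdigit(), roman_ten.isnumeric())

# A hex letter has no numeric property at all; its value exists only
# by convention inside the parser.
print(unicodedata.category(hex_a), hex_a.isdigit(), int(hex_a, 16))
```

The last line shows the inconvenience described above: software can only tell that "A" means ten because the caller says the base is 16, not because of anything in the character's properties.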
uaxuctum - May 29, 2008 07:50 PM (GMT)
| QUOTE (icarus @ May 29 2008, 06:06 PM) |
two things - am on road. One, if we can get provisional codespaces for transdecimal, why stop at transdecimal eleven? Also, perhaps Dsa and dsgb should indeed decide once & for all re: digits. I know mr. Whillock was highly interested in that. This way, both publications would employ same symbols. As editor of Bulletin I am not necessarily wedded to dwiggins nor pitman; both are deficient. I guess in this way perfect is enemy of good. There is significant contingent in dsa which supports classic dsa numerals. I know dsgb stands behind pitman. I guess this standoff points to reserving codepoints but not defining form. |
I think reserving codepoints for a future set of transdecimal digits going further than eleven would be very nice (or simply assigning a defined meaning to those codepoints, but without any defined shape, so that they would function as "slots" where any set of transdecimal digits a font maker chooses could fit). However, from reading their forms, I know proposals submitted to Unicode have to be very clear and precise as to the identity and definition of the characters that are to be added. The idea of encoding completely "abstract characters", identified only semantically rather than graphically, although it could be a very convenient construct for certain purposes (such as to be able to write a document containing transdecimal numbers while allowing the user to choose which concrete set of digits will be used to show them), is not acceptable for Unicode, as far as I know.
icarus - May 29, 2008 08:49 PM (GMT)
Now that I am back I can reply a little more clearly.
I agree with uax that the transdecimals for 10, 11 can be reserved, that forms can be submitted. I think other practitioners have reserved a segment of codepoints (i.e. many of the "klingon" unused points), and that the DSA/DSGB should reserve in a similar fashion. It may be that these unused slots are simply getting the "klingon" numerals to be copacetic with the hexadecimal end digits.
Regarding "deficient" I mean that no proposal is going to be completely acceptable to everyone, and that waiting for the one proposal that everyone will love will mean nothing will ever happen. For instance, I've never liked the reverse 3 for eleven but completely accept and use the Pitman ten (independently arrived at the same form). Some folks crib about the pitman ten for reasons I think are antique (7 segment displays). However I will support any clear motion to move forward and will, so long as it's in line and tidy, and not excessively expensive, step forward and fund it. Note that I will do what orgs agree on.
BTW I do not think it's completely fair to discuss DSA digit ten without some input from a DSA member (other than me) who can support its rationale. (I am not a fan of X or chi as ten.) If there seems enough input and interest on this board I would contact active DSA members to probe whatever you all feel should be considered.
The best thing that could happen would be that at minimum two characters be reserved, one for transdecimal digit ten, other transdecimal digit eleven. As shaun states I think the two orgs agree on the digit eleven. The debate is over digit ten.
My office just finished a major project and after we invoice it tomorrow, handle a board meeting, and prep for a new employee, I will call some DSA folks about this.
icarus - May 30, 2008 07:04 PM (GMT)
Discussed Unicode with some Members of DSA. We will write an editorial to invite input from the Membership in the next (August-September) Duodecimal Bulletin. This way we hope to obtain a wider base of input and support for the action. We'll discuss it at the 4 October 2008 general meeting at Nassau Community College on Long Island, NY. (All interested in dozenal are invited to the meeting. The college is a mile from the Carle Place stop on the Long Island Rail Road, which departs regularly from Penn Station in Manhattan.)
The DSA does not support any particular symbols, but does use the "Dwiggins" or DSA-Classic symbols as a "lingua franca" in articles it publishes in the Duodecimal Bulletin. (In issues between 1980 and the most recent Volume 49 Number 1, the * asterisk and # "octothorpe" were used because these were more convenient, especially in the 80s and 90s. The Bulletin is returning to the "Classic" or "Dwiggins" numerals).
If this interest in Unicode codepoints is indeed an interest to unify around a pair of transdecimal digits for 10 and 11, the DSA is supportive of the effort. The Membership will thus be contacted via the Bulletin, as many are perhaps not active on this site, and are thus unaware of the discussion here.
Pending the October meeting the DSA should be in a better position to support further developments regarding the Unicode codepoints. (Contact editor@dozenal.org for direct comments.)
Endi - August 18, 2008 08:18 PM (GMT)
You may want to do a search for "eudcedit".
Shaun - August 19, 2008 08:23 AM (GMT)
| QUOTE (Endi @ Aug 18 2008, 08:18 PM) |
| You may want to do a search for "eudcedit". |
Interesting! But no good for my Mac!
Dan - August 22, 2008 04:15 AM (GMT)
It would sure be convenient if CSS would let us write something like <span style="text-flip: horizontal">3</span> and display it as Ɛ. Of course, that would have the problem that it wouldn't degrade gracefully.
Fanguo - September 23, 2008 04:58 AM (GMT)
Actually, as long as you can establish that the character is in fact distinct, that multiple reputable sources use the character, and that a substantial user base needs it, it's easy to get characters added to Unicode.
I know a good bit about the proposal process, and I think these digits can be successfully proposed.
A reasonable place to put them would probably be U+218A for DIGIT TEN and U+218B for DIGIT ELEVEN, in the Number Forms block. (This also has the slight advantage of A=TEN and B=ELEVEN.) If they were added, they would probably have properties like this:
218A;DIGIT TEN;No;0;EN;;;10;10;N;;;;;
218B;DIGIT ELEVEN;No;0;EN;;;11;11;N;;;;;
By comparison, U+0030 DIGIT ZERO's properties are:
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
For those who aren't in the know, here's why some of the properties have to be different:
Nd/No: This is the general category. Nd indicates Number, Decimal, and No is Number, Other. *DIGIT TEN and *DIGIT ELEVEN are ineligible for Nd because of their use for a non-decimal numbering system.
0;0;0/;10;10/;11;11: This is the numerical value and type. The order is decimal; digit; numeric. The reason *DIGIT TEN and *DIGIT ELEVEN only have entries in the latter two is that they act as digits, but not decimal digits.
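To see which slot each value falls into, the semicolon-separated records can be split into named fields. The sketch below is Python; the field labels are my own descriptive names for the fifteen positions in a UnicodeData.txt record, and the *DIGIT TEN line is the hypothetical entry proposed above, not an assigned character.

```python
# Descriptive labels for the 15 semicolon-separated fields of a record.
FIELD_NAMES = [
    "code_point", "name", "general_category", "combining_class",
    "bidi_class", "decomposition", "decimal", "digit", "numeric",
    "mirrored", "unicode_1_name", "iso_comment",
    "uppercase", "lowercase", "titlecase",
]


def parse_record(line: str) -> dict:
    """Split one UnicodeData-style line into a dict of named fields."""
    fields = line.strip().split(";")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))


# Hypothetical record from the proposal, and the real DIGIT ZERO record.
proposed_ten = parse_record("218A;DIGIT TEN;No;0;EN;;;10;10;N;;;;;")
digit_zero = parse_record("0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;")

for record in (digit_zero, proposed_ten):
    numeric_fields = [record[key] for key in ("decimal", "digit", "numeric")]
    print(record["name"], record["general_category"], numeric_fields)
```

The proposed record leaves the decimal slot empty and fills only the digit and numeric slots, which is exactly the difference from DIGIT ZERO described above.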
Of course, number parsers may have to be reworked for *DIGIT TEN and *DIGIT ELEVEN, but that's outside the scope of Unicode.
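As a rough idea of what such a reworked parser might look like, here is a minimal Python sketch. It assumes the two code points suggested above (U+218A for ten, U+218B for eleven); those assignments are the proposal's, and the function and table names are made up for this example.

```python
# Hypothetical assignments taken from the proposal in this post.
DIGIT_TEN = "\u218A"
DIGIT_ELEVEN = "\u218B"

# Map every dozenal digit character to its value.
DOZENAL_VALUES = {str(d): d for d in range(10)}
DOZENAL_VALUES[DIGIT_TEN] = 10
DOZENAL_VALUES[DIGIT_ELEVEN] = 11


def parse_dozenal(text: str) -> int:
    """Convert a string of dozenal digits to an integer (base twelve)."""
    if not text:
        raise ValueError("empty string is not a dozenal number")
    total = 0
    for ch in text:
        if ch not in DOZENAL_VALUES:
            raise ValueError(f"not a dozenal digit: {ch!r}")
        # Shift the running total one dozenal place and add the new digit.
        total = total * 12 + DOZENAL_VALUES[ch]
    return total


print(parse_dozenal("10"))                          # one dozen
print(parse_dozenal("1" + DIGIT_TEN + DIGIT_ELEVEN))  # 1*144 + 10*12 + 11
```

An explicit lookup table is used because a general-purpose parser cannot be expected to infer base-twelve values from the character properties alone.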