A comment on Hacker News led to 4½ new Unicode characters (2016)(unicodepowersymbol.com)
I posted a message on the Unicode mailing list, which eventually led to a proposal to accept a large number of new characters that encode symbols used in the old 8- and 16-bit micros.
My original question was specifically about the C64 character set, but we managed to get several others covered as well, including several symbols from the Atari ST character set.
The proposal was accepted, and the work continues to create a new proposal covering the character sets of even more old computers.
I'm quite happy that my modest question led to some real progress.
I'm disappointed to find that the Atari ST character set doesn't contain a bomb symbol. Obviously two entirely different things have been confused by my childhood mind.
The Atari ST did display bombs when an application crashed. But it was never part of the character set. It's a graphic that is displayed by the trap handler in the operating system. The number of bombs indicates the trap type, so three bombs means trap handler 3 (address error).
The symbols that do exist but were not included in the proposal were the Atari logo and the J.R. Dobbs picture. Both are copyrighted, which is why they were not included.
These types of comments are why I come to Hacker News and waste so much time here. Thank you.
Thank you for the appreciation. The Atari ST was a very important computer for me when I was growing up, and I love talking about it.
Early macs had a bomb symbol. It was what showed up for the equivalent of the BSOD.
Hmm. The Atari ST did display bombs when it crashed. The number of bombs was a hint about the cause. It always felt very alarming and had me scrambling for the 'off' switch, though.
This is absolutely fantastic. I was under the impression that the Unicode Consortium reviled box-drawing characters, but I find them incredibly useful for documenting code relating to grids.
If I'm mistaken about something, I would appreciate clarification. Is my mistake using box-drawing characters? Or is it thinking that they're disliked?
I've seen different opinions on the topic. I'm not sure there is any single opinion of the Unicode Consortium here, but I'm not a member of the Consortium.
What we can see is that these characters have become very popular and useful, so it doesn't really matter whether the original intent was to move these things to a higher level protocol. Today they are here, and they are useful.
There was a discussion on the mailing list some time ago when there was a suggestion to add codes for underline, bold, italics etc. I can tell you that that is not a very popular idea.
Sometimes characters were added to Unicode just because they already existed in another character set. The idea is that Unicode should support lossless round-trip conversions to any other character set. Box drawing characters were part of the DOS character sets that were carried over to Windows.
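To make the round-trip idea concrete, here is a minimal Python sketch using the standard library's `cp437` codec (the original IBM PC code page). The sample bytes and the helper name `roundtrips` are choices made for this illustration, not anything taken from the thread.

```python
# Three bytes from DOS code page 437 that draw the top edge of a double-line box.
dos_bytes = bytes([0xC9, 0xCD, 0xBB])

# Decoding maps each legacy byte to a Unicode box-drawing character.
text = dos_bytes.decode("cp437")
print(text)
print([f"U+{ord(ch):04X}" for ch in text])


def roundtrips(data: bytes, codec: str = "cp437") -> bool:
    """Return True if decoding to Unicode and re-encoding restores the bytes."""
    return data.decode(codec).encode(codec) == data


print(roundtrips(dos_bytes))
print(roundtrips(bytes(range(256))))  # every byte value of the code page
```

Because each byte of the code page has its own Unicode counterpart, converting in either direction loses nothing, and that is the property the comment describes.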
real heroes don't wear capes
That is awesome! Nice work. :)
These are the worst symbols though.
⭘ should be on ⏽ should be off
Can you post a picture instead? Right now I can only see empty squares in your comment:
https://i.imgur.com/zXAwnbK.png — (I am using Safari 13.0.3 on macOS Catalina 10.15.1)
Edit: After seeing the image posted by @iruoy — https://i.imgur.com/OH3QTXQ.png — I have to disagree with you because the circle represents a zero (low voltage - FALSE) and the vertical line represents a one (high voltage - TRUE). If you studied computer science, logic, electrical engineering, or attended any basic course on electronics you would know that ones-and-zeroes are the written representation of an electric current, in fact, they are the basics of computing: Bits.
Here is more information about the power symbol:
I'm getting a rather strange rendering in Firefox 68.2.1 on Android 9
As the article mentions, the O was already in Unicode; its definition was just updated, so your font already includes it, while the I is a new symbol that is too recent to be in your font and thus is not shown.
Works in latest Firefox on latest MacOS
No problems on firefox-esr (68.2.0esr) on Linux.
Yeah, the circle represents a zero - but it isn't a zero, and the line represents a one - but it isn't a one. They're graphics that seem removed from the meanings they represent. Semantically they're removed too, since turning a device on seems to have complex, multi-byte significance.
And as graphics, the zero seems to represent "circuit connected" and the one seems to represent "circuit disconnected". Yet they have the exact opposite meanings!
They're meant for public consumption after all.
I did an image search on a few search engines to make sure "on off switch" still returned the same ⏽/⭘ switches I've seen forever. Pretty sure the back of your printer probably has one.
Where are these switches uncommon? Where are they flipped to the opposite meaning as you described? Where has the circle ever represented a connected circuit? Most circuit diagrams are squared off, so which one was circular? I'm so confused.
It's clearly supposed to be a metaphor for genitalia - with the line suggesting 'presence', and the circle suggesting absence. (Note the latin 'vagina', literally means sheath). It works better in the case of the numeral 1, which even has a glans.
Whoever downvoted this was apparently offended by the association. However, there is insight in the point you raised, and you stated it intellectually - not in a juvenile sense, which was in the eye of the beholder.
In anthropology and psychology, primitive symbols like a vertical line or a circle are often found to represent the male and female dichotomy. In fact, it's quite prevalent cross-culturally as basics of a visual language. 
Insight? It's false. The symbol for zero originated as a dot, indicating an unfilled place in the place value system, and evolved from there. 
Obviously, your source doesn't support your argument. Just because something once had one shape doesn't mean a new shape can't have new connotations. You're jumping from 'the zero was not always a hole', to 'the zero can never symbolize a hole'. Which is pretty strange.
This 1/0 man/woman white/black kind of binary is so ubiquitous in our culture it's kind of redundant to go and find examples - but it is in itself interesting that when you mention stuff like this, some people will always find a way to claim it is nonsense.
I think it's a desire for security amongst political turbulence - you say, this stuff is abstract and clean and without cultural baggage, so I can hide in it from the world, which is ugly and ambiguous and provokes uncomfortable reactions.
It's a kind of desire that's really common around engineering, mathsy people - I mean, part of the attraction of these subjects is you don't have to navigate anything sticky. So that's why it usually provokes a pretty extreme reaction if you profane the temple by bringing cultural stuff in.
A word or symbol does not represent every single thing that can be free-associated with it. If you want to show that "1" represents the Washington Monument, it is not sufficient (or even necessary) to point out that they're the same shape, you have to show some evidence that there once was a culture that used the symbol "1" whenever they wanted to say "2 15th St NW, Washington, DC."
Using '1' to symbolise the washington monument would be reasonable, if the washington monument was something people very commonly referred to. It would also be reasonable to describe that as phallic. It wouldn't be reasonable to use 1 to refer to an address in DC, because you're jumping from three obviously formally similar things, to the street address of one of those things.
It's a pretty good example of what people mean by free association - and why what I'm(1) doing isn't.
1- I mean, obviously I'm about the millionth person to make this particular observation.
You got a point there (pun intended :) about the symbol for zero.
I guess I was being generous with interpreting the parent comment, by mentioning the (possibly common and cross-cultural) visual association of a circle with womanhood, a straight vertical line with manhood.
I didn't mean to imply that the symbols for zero and one have direct historical origins in those ideas - like the parent comment might have suggested - but I do think there is a philosophical or artistic merit in drawing the analogy.
I never heard of this. Have you ever heard of the symbols also used for Mars and Venus indicating male and female?
According to that page, in some contexts, a circle means female and a square means male, but I've never heard of that before either.
Interesting, I've seen the symbols for Mars and Venus being used to represent genders, but not a square or triangle for male.
As for zero/circle/woman and one/vertical line/man.. I guess in my mind, they couldn't be more obvious, universal and simple - to put it crudely: the hole and the stick.
But also: off and on, dark and light, absence and presence.
(So far, I have found no historical evidence that supports the above theory, no basis in logic or fact. I'll leave it as idle speculation on a possible primordial/mythological way of thinking in pictures, with perhaps two of the most primitive symbols.)
I have always seen it this way too: a broken or completed circuit. But I've accepted that my initial understanding of the visual metaphor doesn't match its intent. I think misunderstandings like these result when we abandon written language to point and grunt at tiny pictures.
I've always looked at the line as a connection and the open circle as no connection. It's just what I came up with when I was young as a way to make sense of the markings.
Interesting. I always took ⭘ to be an open circuit and ⏽ to be a closed circuit, using the symbol for a switch as the rough guide. Perhaps they're poor symbols if people can look at them and come up with rational reasons for each interpretation to be correct.
To be extremely pedantic, I think they’re correct at an implementation level: for most chips, as long as they’re receiving power at all from a power supply, they’ll run, unless a reset line is being held high. In most consumer electronics, that reset line is held high by default (by power coming off the wall or from the battery), unless power is supplied to a NAND gate coming from (a flip-flop in front of) the power switch on the front of the computer.
To “turn on” a computer (using the push-button at the front, rather than by flipping a power-supply toggle-switch) is, then, actually to SET that flip-flop, feeding a high input to the NAND gate, which in turn will turn off the reset line to the CPU.
(And, vice-versa, if your computer has a “reset button” on the front like some old computers do, that one throws the flip-flop back to its RESET state, which puts the NAND gate back low, which brings the reset line to the CPU back high. Wiring for push-button toggles is weird!)
Hmm, in most embedded processors I know (e.g., the ESP8266/32, RPi's BCM2835, or TI's MSP430), you send the line down for reset but hold it high for running (either by internal pull-up or by actually holding the line).
 https://www.raspberrypi.org/app/uploads/2012/04/Raspberry-Pi... In particular, see first page, D15
Why? It's clearly a binary switch and zero is traditionally false, i.e. Off.
In several cars the air vents are marked with filled circle and empty circle and I always struggle to remember which means which.
I was looking at the dials on a refrigerator, and realized I never really knew which way is which - does turning the dial towards "cold" make it colder, or does the opposite, revealing more of the indicator from the "cold" end make it colder?
There is a tapered blue line with numbers, so is the wider end colder, because blue is cold and more blue is colder...or is the other end colder, because it has smaller numbers, and lower temperatures are smaller numbers?
Filled circle means "vent is blocked, so no air can pass". Empty circle is an unblocked vent through which air can pass.
I think the problem is those meanings can plausibly be reversed, so it's hard to remember which way round it goes.
A better way to do it that I've seen is to represent air flow with wavy lines or similar on the 'open' symbol.
Pretty sure in the last car it was the opposite. Filled circle meant air was coming.
So you can see why this is confusing
This is a great story.
Next up: I'd love to see the sub/super proposal get some more attention and effort.
Having the combining character version of that would be fantastic. Eventually, it would be amazing if all the details of math rendering could make it into Unicode.
> Eventually, it would be amazing if all the details of math rendering could make it into Unicode.
I'm not sure if I agree with you or not... Generally I'd say I do, but we're going to have a hard time "finding the line". Meaning what counts as "math"? Surely 1 + 1 is, as is ∇×𝐇, and we can start to do things like x⁰. However, what about a graph with nodes and edges (just as an example)? Is that "math"?
One thing strikes me about strings of characters... you can select and copy/paste them (at least in my native alphabet of Latin) very reliably. This property is not present with Unicode in general.
It’s not easy to select characters in Arabic and many Vedic languages.
Microsoft published a technical note on using a Unicode linear encoding for math rendering several years back:
Office apps support it, apparently, as one among many representations for math zones. A blog post on it:
1. A code-point that starts a fraction.
2. A code-point that separates the numerator from the denominator.
3. A code-point that ends the fraction.
So the sequence [START]x[SEP]y[END] would be rendered with x above the y with a line separating.
[START][START]x[SEP]y[END][SEP]z[END] would be a fraction with a fraction inside.
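The three-code-point scheme above can be sketched as a small recursive parser. This is only an illustration: Unicode defines no such fraction controls, so the sketch borrows three Private Use Area code points as hypothetical stand-ins for START, SEP and END, and the names `parse` and `render` are invented here. It does not validate malformed input.

```python
# Hypothetical control code points taken from the Private Use Area purely for
# illustration; Unicode has no START/SEP/END fraction characters.
START, SEP, END = "\ue000", "\ue001", "\ue002"


def parse(text: str, pos: int = 0, stop: str = ""):
    """Parse from pos until a character in `stop`; return (nodes, new_pos)."""
    nodes = []
    while pos < len(text) and text[pos] not in stop:
        if text[pos] == START:
            # The numerator runs up to SEP, the denominator up to END.
            numerator, pos = parse(text, pos + 1, SEP)
            denominator, pos = parse(text, pos + 1, END)
            nodes.append(("frac", numerator, denominator))
            pos += 1  # step over END
        else:
            nodes.append(text[pos])
            pos += 1
    return nodes, pos


def render(nodes) -> str:
    """Flatten the parsed tree into a one-line form such as (x)/(y)."""
    out = []
    for node in nodes:
        if isinstance(node, tuple):
            _, numerator, denominator = node
            out.append(f"({render(numerator)})/({render(denominator)})")
        else:
            out.append(node)
    return "".join(out)


simple = f"{START}x{SEP}y{END}"
nested = f"{START}{START}x{SEP}y{END}{SEP}z{END}"
print(render(parse(simple)[0]))  # (x)/(y)
print(render(parse(nested)[0]))  # ((x)/(y))/(z)
```

The nesting falls out of the recursion: a START seen while reading a numerator simply opens another fraction before the outer SEP is reached.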
The fraction slash (U+2044) was supposed to be a simpler version of that, but it's not very well supported. https://en.wikipedia.org/wiki/Unicode_subscripts_and_supersc...
"All details of math rendering" probably fall squarely into "markup" territory, thus outside Unicode. At least so far any efforts trying to standardize this led to this. Math is generally nothing I'd call plain text.
Should Unicode be able to represent Egyptian Hieroglyphs? The lack of similar facilities for Egyptian is why Unicode is useless for representing hieroglyphs, despite having a goodly number of signs encoded.
Or--dare we dream it?--chemistry notations.
This would be great. I need a way to cleanly document code implementing equations that contain letters in the superscripts and subscripts. x_i^j, for example, might be the position of Body i in Frame of Reference j. I can mentally pack it back together when it's just the one, but it would be nice if it could more closely resemble the original equation.
Alas, I was told that was "exactly what Unicode doesn't want to endorse."
I give this achievement 4½ stars.
I love how this was apparently accepted into the standard in 2016, and yet Chrome still displays an empty square instead. Unicode is such an unbelievable mess when it comes to support, it's crazy. Windows displays it correctly for me in Word, but when pasted into Teams it comes out as a semicolon instead. Brilliant.
But it’s not a mess, there is a clear chain of responsibility: * consortium -> font developers -> app developers. If anything is broken, bitch to the first level (typically app devs) and wait for your grievance to percolate upwards as necessary; if it doesn’t happen, you know who to blame.
* most modern operating systems have hopefully sorted out their issues a long time ago.
If you have to complain, it's broken. As a consumer I don't care what the issue is and who the responsible party is.
Unicode is a clusterfuck exactly because the chain is too long and the implementation errors are too easy to make and the world is rife with incomplete implementations.
Every global standard is a clusterfuck at some level. That is just the nature of the beast.
Look at any widely adopted standard, be it exit signage, networking protocols, time standards anything... you’ll find a cluster.
If you tore down every one of those and tried to rebuild “from first principles”, eventually it too will become a clusterfuck.
Is it really so surprising that updating a universal standard takes time?
IMO, these processes should happen slowly.
It has nothing to do with the update speed and everything to do with the standard itself. Even if you were freezing Unicode now, you wouldn't encounter complete, correct implementations in the wild more than 50% of the time in the next 5 years.
Makes sense to just use image-replacement if you need emoji to show up reliably, like on a landing page.
Also spares you the problem of emoji implementation differences. https://www.mentalfloss.com/article/516048/22-emojis-look-co...
>* most modern operating systems have hopefully sorted out their issues a long time ago.
Except Android where the system fonts have to be updated by the manufacturer. What a fucking mess.
It has been 3 years, and Android & iOS still don't fully support Unicode 9. Any ideas who I should bug about that?
Google's Noto font hasn't released anything for 2 years. No idea if Apple's default font is open source.
I love how iOS doesn't understand this URL, and still doesn't have these characters. Too busy removing the Taiwanese flag, and implementing "Animojis".
I still think unicode should not have added emojis. The big guys are adding animated emojis now, and that's clearly out of scope for character sets. If they continue down that path unicode will eventually become SVG with animation. IMHO they should have stopped short of emojis.
I'm glad they decided to be inclusive since 100-200 million people were using emoji before they got added to unicode and without adding them all of those people's conversations and communication patterns would have been pissed on and likely those countries with high populations of people using them would never have switched to unicode because their customers would have demanded to keep them which would mean we'd have the nightmare of multiple encodings again. No thanks.
I love emoji and have since I started using them in 1999 (weren't added to unicode til 2010). Most of my friends and family love them, it makes their communication richer and more easily nuanced so I'm really glad they were added.
whether you like the emojis or not, they've certainly helped to drive unicode adoption. Good luck making the business case for why you're building support for some obscure international character set into your product, but tell somebody you're adding support for the latest emojis and all of a sudden having unicode 10 support is a big deal.
> whether you like the emojis or not, they've certainly helped to drive unicode adoption.
Especially since many of them are outside the BMP, so adoption forced people to go beyond UCS-2.
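A short Python sketch of why emoji break UCS-2 assumptions: a code point above U+FFFF needs two 16-bit units (a surrogate pair) in UTF-16. The thinking-face emoji is picked only as a sample, and `fits_in_ucs2` is a helper name made up for this example.

```python
thinking = "\U0001F914"  # THINKING FACE, outside the Basic Multilingual Plane

print(hex(ord(thinking)))  # 0x1f914

# Encode as big-endian UTF-16 and split into 16-bit code units.
utf16 = thinking.encode("utf-16-be")
units = [utf16[i:i + 2].hex() for i in range(0, len(utf16), 2)]
print(units)  # ['d83e', 'dd14'] (a high and a low surrogate)


def fits_in_ucs2(ch: str) -> bool:
    """True if a single character fits in one 16-bit unit."""
    return ord(ch) <= 0xFFFF


print(fits_in_ucs2("A"), fits_in_ucs2(thinking))  # True False
```

Software that treated every character as exactly one 16-bit unit would split or mangle this character, which is the pressure the comment refers to.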
IMO Unicode shouldn't have ever added characters with color. Let color be the property of the markup. The original emoji were all monochrome, and mainly displayed on monochromatic flip-phone screens. The fact that there now exist emoji that are otherwise identical but have different color is absolutely idiotic.
> The original emoji were all monochrome
Emoji were in color before they were added to Unicode. Even the original iPhone 3G Emoji release pre-dated their addition to Unicode (Apple used the SoftBank encoding). They were even in color on flip phones as far back as 1999 https://emojipedia.org/softbank/1999/
In Unicode, they're represented by a combination of emoji character + colour character, not by new characters: http://www.unicode.org/reports/tr51/#Diversity
Seems a decent solution to me. Monochrome screens can just ignore the colour character.
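As a small illustration of the base-plus-modifier design, the Python sketch below builds one such sequence and shows how a display that cannot render skin tones could drop the modifier. The waving-hand example and the helper name `strip_modifiers` are assumptions for this sketch.

```python
import unicodedata

base = "\U0001F44B"      # WAVING HAND SIGN
modifier = "\U0001F3FD"  # EMOJI MODIFIER FITZPATRICK TYPE-4
sequence = base + modifier

# One visible symbol, but two code points.
for ch in sequence:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")


def strip_modifiers(text: str) -> str:
    """Drop the five skin-tone modifiers (U+1F3FB through U+1F3FF)."""
    return "".join(ch for ch in text if not 0x1F3FB <= ord(ch) <= 0x1F3FF)


print(len(sequence), len(strip_modifiers(sequence)))  # 2 1
```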
That doesn't add colour to, say, the smiley face, it just changes the colour from the default, usually yellow.
As well as adding colour browsers display emoji at a larger point size. It is really annoying when some pre-emoji symbol (like the phases of the moon in TFA) gets emojified and where your document had some discreet use of symbols it now has huge colourful icons drawing the eye.
And afaict there is no way of turning off this behaviour globally.
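There is at least a per-character mechanism: the variation selectors U+FE0E and U+FE0F request text or emoji presentation for a preceding symbol. The Python sketch below uses the heart U+2764 as its sample, and `as_text` is a helper name invented here. Whether a given browser or font honors the request varies, and this is not a global switch.

```python
TEXT_STYLE = "\ufe0e"   # VARIATION SELECTOR-15: request text presentation
EMOJI_STYLE = "\ufe0f"  # VARIATION SELECTOR-16: request emoji presentation


def as_text(symbol: str) -> str:
    """Replace any trailing variation selector with the text-style one."""
    return symbol.rstrip(TEXT_STYLE + EMOJI_STYLE) + TEXT_STYLE


heart = "\u2764"  # HEAVY BLACK HEART
print([f"U+{ord(ch):04X}" for ch in as_text(heart)])
print([f"U+{ord(ch):04X}" for ch in as_text(heart + EMOJI_STYLE)])
```

Both calls yield the same two code points, U+2764 followed by U+FE0E, which asks the renderer for the monochrome glyph.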
There are at least 7 variations of the heart symbol ️ differing only in color ️🧡, not including the heart (playing card) which also has multiple variations.
Edit: apparently HN limits the number of emoji per post, I originally included several.
There are 27 variations of the capital letter A in Unicode (see the confusable list https://unicode.org/cldr/utility/confusables.jsp?a=A&r=None).
Redundancy has never been a problem with Unicode. Once the decision was made to add the symbols of the original encoding to Unicode, they had to add all the heart variations; otherwise it wouldn't have been backward compatible.
Or alternatively add a color modifier? I'm not sure that would have been a better solution.
> apparently HN limits the number of emoji per post, I originally included several.
I am surprised HN allows any emoji at all. Pretty sure they have been totally banned in the past. It seems like the orange heart is now (or has always been) whitelisted. 🧡🧡🧡
Now? Emoji were animated even before they were added to Unicode https://emojipedia.org/softbank/2006/
Animoji is very different from using an animated gif in place of an emoji.
I didn't take it the GP was talking about Animoji, as those are motion-captured, rendered in 3D and sent as pre-rendered video files (GP mentioned something that could be represented as animated SVG). I don't think anyone is proposing a way to encode facial motion-capture into Unicode.
While I agree with your sentiment, can you define "emoji"?
A graphical or iconographic concept that cannot be unambiguously represented by a single character in any natural language and which does not have a single standardized rendering (e.g. the Power symbol does have a single standard rendering and is therefore not an emoji).
I'm no character or Unicode expert - that's just a wild stab in the dark, largely based on why I dislike emoji being in Unicode.
I still prefer ASCII style smilies to emojis ;-)
I promise to never do that on HN again!
My sister pointed out that I'm a little random about mixing them. I use both, based solely on what I feel like typing at the time. And I'm acutely aware that the emojis she sees are not the ones I do, but the ASCII ones are pretty much the same, since I use Android and she's an Apple-head.
Besides, I love the slobbering ASCII guy ;-p~~~~ None of the emojis let you amplify slobber according to need like that.
One emoji I wish I had a good ASCII alternative for is :thinking: U+1F914. There is :^) but only the skype generation will make sense of that.
I've changed my emoji writing style from the canonical ;) to ; ) just so that editors won't auto "correct" my proper ASCII emoji into an unholy yellow winking face.
That is a great tip! I wish I had thought of that : )
At a certain point it becomes much simpler to encode your "text" as SVG rather than use a convoluted and bloated standard like Unicode.
For 99% of all development, Unicode is a non-issue. Perhaps you have to consider Right-To-Left support and that is about it. Even then good RTL support is more about page layout than the actual mechanics of displaying the text.
If you have a better idea to enable people around the world to use computers in any language that doesn't look something like Unicode, I’m sure everybody will adopt it.
I get that I'm in a minority here, but I'm not even sure such a thing is necessary or a good idea. Generally speaking simple things are better than complex things, and from a computing perspective human written languages are ludicrously complicated. I'm not convinced that complicating all our software to support poop emoji is better than allowing human language to adapt to the medium in the manner it has done since language was first written down.
> For 99% of all development, Unicode is a non-issue.
Until it suddenly is an issue. Like, dealing with unpaired surrogates in Windows file paths.
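A minimal Python sketch of the unpaired-surrogate problem: a string holding a lone surrogate is not valid Unicode text, so strict UTF-8 encoding refuses it. The file name is hypothetical, and `surrogatepass` is just one of the error handlers Python offers for carrying such data through unchanged.

```python
# Hypothetical file name containing a lone high surrogate, as can occur in
# names built from arbitrary 16-bit units.
lone = "report_\ud800.txt"

try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)  # False

# Opting in to surrogatepass lets the value survive a round trip.
raw = lone.encode("utf-8", errors="surrogatepass")
restored = raw.decode("utf-8", errors="surrogatepass")
print(restored == lone)  # True
```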
I mean, we're talking about an encoding where you need a rather large database and quite a bit of code just to tell how many characters are actually in a string.
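To show why "how many characters" has several answers, the Python sketch below counts the same text in UTF-8 bytes, UTF-16 units and code points. The helper name `counts` is made up here. None of these numbers is the count of user-perceived characters; that needs the grapheme segmentation rules and data tables the comment alludes to, which this sketch does not implement.

```python
import unicodedata


def counts(text: str) -> dict:
    """Count one string three ways; none is 'user-perceived characters'."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "code_points": len(text),
    }


composed = "\u00e9"     # e with acute accent as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT
print(counts(composed))
print(counts(decomposed))

# Normalization makes the two spellings compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# One visible emoji made of a base plus a skin-tone modifier.
wave = "\U0001F44B\U0001F3FD"
print(counts(wave))
```

Both spellings of the accented letter look identical on screen, yet they differ in every count until normalized.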
Replacing Unicode characters with SVG representations would be terrible for accessibility.
Seriously, try sending a link to this page to a friend on iMessage... they won't be able to click it.
Power up stars for the first person to successfully argue for removal of a UNICODE character.
Applicable Version: Unicode 2.0+
Once a character is encoded, it will not be moved or removed.
This policy ensures that implementers can always depend on each version of the Unicode Standard being a superset of the previous version. The Unicode Standard may deprecate the character (that is, formally discourage its use), but it will not reallocate, remove, or reassign the character.
Awww.. no fair
Scanning quickly through I found this one: ℻ (https://unicode-table.com/en/213B/).
Apparently it's semantically fac·sim·i·le. Which means (according to Google) "an exact copy, especially of written or printed material."
My argument is something like, why not just write FAX. Or is the counter that some fonts will specialize this character to something closer to the native language? That seems unlikely, and instead people will probably learn that FAX means "to make alike", from Latin. Or is it that we need to make it just a little bit above the baseline to indicate that it's special. Surely "FAX" isn't the only thing that should be allowed to be special, right? But then that's a whole can of worms. Anyway, I'm rambling...
This was the best I could do in the limited time I had (I really should be asleep by now).
On business cards, in letters, etc you write your phone numbers.
TEL 0123 45 67 89
FAX 0123 45 67 89-0
The "TEL" and "FAX" parts should be superscript small capitals, though. That's where these special symbols come in.
Some fonts also replace those with icons for phone/fax.
They're actually still in use in Germany today.
Well, fax machines are still in use here (most notably in medical offices, it seems), but superscript abbreviations? Never seen them in the last 40 years. I mean, yes, I have seen the letters FAX or TEL in front of the number, but never really explicitly as a new character (as far as I can tell).
This would also fit best with the experience that people crafting cards or letterheads never really know all the intricacies of Word, or Unicode, or whatever they use, and just "make it look good" - use tabstops instead of tables, simply type FAX and mark it as superscript, etc...
> This would also fit best with the experience that people crafting cards or letterheads never really know all the intricacies of Word, or Unicode, or whatever they use, and just "make it look good" - use tabstops instead of tables, simply type FAX and mark it as superscript, etc...
I mean, that’s widely known anyway. Look at most letterhead templates online, pretty much all of them are broken and quite painful in the way they’re built. Broken tables, tabstops, all combined painfully.
Often enough proper tables would simplify a lot, if combined with columns one can create amazing things. If one even adds automated hide/unhide elements (e.g. page count, automated Internetmarke or hiding it if unused, etc) one can create stuff that’d save hours of work every day.
Not just in Germany (where they are a legally accepted alternative to postal letters, which is a significant difference to emails!), also in Japan: https://www.noted.co.nz/money/money-business/japan-has-a-biz...
Actually, nowadays PDFs in email are also accepted instead of fax! (or emails signed with the ePerso S/MIME function).
So far a court, a health insurance company, the national pension fund, and some municipal administrations I've had contact with all have accepted PDFs attached to emails just fine :)
And if you ever need actual fax functionality, the most common home router (Fritz!Box) has a virtual fax machine integration, so you can send and receive faxes with an app or an email gateway from your own VoIP SIP "landline" connection.
I assume it was intended to indicate after a telephone number that a telefax was connected to it. For example when writing contact details on a business card.
Probably right on the rationale, but I cannot remember anyone using FAX as a symbol. It's not the only obsolete bit of technology in Unicode, though.
It is in regards to a facsimile machine, its operation, and the output from it. FAX doesn't refer to every meaning of facsimile out there.
https://www.google.ca/search?q=define%3Afacsimile+machine "a device that can send or receive pictures and text over a telephone line."
This search is more appropriate.
Most likely, some legacy character set had a ℻ character, so Unicode had to include it for compatibility reasons.
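The compatibility status of this character is visible in the Unicode data itself: it carries a compatibility decomposition to the plain letters. A quick Python check, using only the standard `unicodedata` module:

```python
import unicodedata

fax = "\u213b"  # FACSIMILE SIGN
tel = "\u2121"  # TELEPHONE SIGN

print(unicodedata.name(fax), "->", unicodedata.normalize("NFKC", fax))
print(unicodedata.name(tel), "->", unicodedata.normalize("NFKC", tel))

# Canonical normalization (NFC) leaves the symbol alone; only the
# compatibility forms (NFKC/NFKD) fold it to ordinary letters.
print(unicodedata.normalize("NFC", fax) == fax)  # True
```

So software that applies NFKC, for example when building a search index, already treats the symbol as the three letters FAX.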
I imagine the only way that will ever happen is a situation like the gun to water pistol transition, where enough implementers agreed to change the representation.
Imagine if we didn't just have a snowman, but also happened to have a cute bear.
> There was some discussion around ⏾ as several “moon” characters already existed. None of them [...] convey the semantic meaning of “Sleep” – so ⏾ was accepted.
I'm not convinced ⏾ conveys that meaning either, unless it's explicitly used alongside the other new symbols. And if it is used alongside those symbols, a couple of the existing moons could also work just fine.
(⏾ is appearing as tofu on HN for me, but I'm just going to roll with it.)
It's disappointing that OS-included font support for these has apparently lagged. iOS 13.2 supports emoji from Unicode 12, but doesn't ship with a font that includes these symbols. Anyone know if any OSes do support them?
iOS 13.3 supports these symbols.
 The 13.3 Beta I'm currently running on my 6S Plus renders them fine.
By the way, if you're viewing them on the unicodepowersymbol.com website, they show up because of a web font included on the page. So here they are on a site without the font: ⏻⏼⭘⏽⏾
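When the glyphs show up as empty boxes, the code points can still be inspected. The Python sketch below lists the five symbols from the comment above by code point and name; `describe` is a helper name made up for this example, and the names printed depend on the Unicode version bundled with the running Python.

```python
import unicodedata

# The five symbols, written as escapes so the source stays readable even
# where the glyphs themselves do not render.
symbols = "\u23fb\u23fc\u2b58\u23fd\u23fe"


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME', or a placeholder if the name is unknown here."""
    name = unicodedata.name(ch, "<not in this Unicode data>")
    return f"U+{ord(ch):04X} {name}"


for ch in symbols:
    print(describe(ch))

print("Unicode data version:", unicodedata.unidata_version)
```

A missing glyph is a font problem rather than an encoding problem: the text carries the right code points even when nothing on the system can draw them.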
Ah, that makes sense. I retract my statement – those symbols are still MIA on 13.3.
Does anyone know how to add these to Ubuntu 19.10? When I search for Power in Characters, it has the 'Power On' character but you can't see the symbol 
Now if only we could get Hacker News to support emojis
Edit: This comment ended in man-shrugging, but Hacker News stripped the emoji.
They used to be supported until they were abused.
Maybe only allow them if you have a certain reputation/points like downvoting and flagging has? Then if it’s abused, remove that privilege for that person (IIUC, flag rights can be revoked)
You get a single emoji on HN 🧡
> What other useful and/or important symbols are missing from Unicode?
Seems like a good time to ask this question again. Any new answers?
The USB, Wi-Fi and Bluetooth symbols.
A spray can to represent spray paint, insect spray, or spray lubricant.
There's a power plug but no power socket.
There's no staples or stapler.
There are no appliances—no oven, microwave, toaster, mixer, washer, or vacuum cleaner.
There's no traffic cone.
> The USB, Wi-Fi and Bluetooth symbols.
I believe those are all trademarked. Unicode tries to avoid those.
> A spray can to represent spray paint, insect spray, or spray lubricant.
Interesting idea! I wonder if there'd be any interest in getting wider coverage of other bitmap paint tools: rectangular selection, lasso, paint bucket...
> There's a power plug but no power socket.
That's problematic from a localization perspective, as electrical sockets vary widely from country to country, and some of them may be difficult to recognize as a socket at a small size. For example, some European countries use a socket which is made of three circular pins in a straight line -- it'd just look like an ellipsis.
Besides, there isn't a lot of symbolic meaning that's conveyed by a socket that couldn't be expressed just as well with a plug.
> There's no staples or stapler.
Maybe. There isn't a lot of symbolic meaning to these either, though.
> There are no appliances—no oven, microwave, toaster, mixer, washer, or vacuum cleaner.
A lot of those will just look like white boxes at text size, and -- again -- they don't have a lot of symbolic meaning.
You might be able to make a case for an upright vacuum cleaner, though, since that's visually distinctive and is associated with cleaning.
> There's no traffic cone.
Oh, I like that idea. It's got some symbolic meanings, too, like "warning" and "under construction". There is already a construction sign (U+1F6A7), though.
To be fair, a lot of existing Unicode code points don't really have a semantic meaning either. Eggplant only got a semantic meaning after becoming a part of Unicode.
It means aubergine, doesn't it? At least originally. It's distinctive.
For cleaning, a broom is probably better than a vacuum cleaner.
A broom already exists 🧹
"Aubergine" and "eggplant" refer to the same vegetable. The "semantic meaning" is probably the NSFW one.
The Bluetooth logo is two runes, which perhaps could have a zero width joiner between them in iOS emoji abuse style.
I guess the power socket might prove contentious, since they look very different across countries... power plugs, although variable, can be identified more easily since they look more alike and have a power cord attached.
The traffic cone would be good for VLC! They could rename their app so the name matches the icon :-) That got me thinking: are there any common apps that use emojis in their name?
Rename VLC to “TRAFFIC CONE GLYPH”?
A dim/dark light bulb to complement the bright light bulb.
The mouse pointer symbol.
More than one breed of dog.
There are a few sets of punctuation marks that look quite interesting. Wikipedia currently uses SVGs to represent the characters. https://en.wikipedia.org/wiki/Punctuation#%22Love_point%22_a...
I miss 7-segment displays. It could be 8 Unicode characters, for instance (empty, plus one per segment), or some other scheme.
A hiking/backpack emoji would be useful at times, but then a lot of other activities could make it in: Rollerblades, music instruments, etc.
One pet peeve of mine is the "person shrugging" emoji looking identical to "woman shrugging" in most fonts. I prefer to use genderless emoji, but that does NOT look genderless. I also dislike skin color selectors, but again, that's personal preference.
Prince's love symbol: https://parkerhiggins.net/2013/01/writing-the-prince-symbol-...
Discussed at the time: https://news.ycombinator.com/item?id=11958682
Input into a decision like this might be one of the best approximations of achieving immortality.
That post inspired me to submit some Unicode characters as well! I found 11 Hakka & Taiwanese characters when trying to scrape & parse the Bible as plain-text, and wrote a blog post about my experience of the submission process.
Great work, but we are still missing a "tinfoil hat" symbol in Unicode (or as Emoji). This is a major letdown! :D
Even the existing repurposed one doesn't work on my up-to-date Linux Mint installation.
The "advisor" to the effort is doing interesting work on nano-grids, timely because of the PG&E shutoffs.
> We had the very generous help of Bruce Nordman, who was involved in the original IEEE 1621 standard.
Previous discussion: https://news.ycombinator.com/item?id=11958682
Is there a practical upper limit of Unicode set capacity? Kind of IPv4 limits with all reservations and local/multicast quirks.
Unicode has 17 planes. Each plane has 65,536 code points, so the total capacity is 1,114,112 code points. In practice it's a bit less, thanks to surrogates, private areas, and a bunch of "non-character" code points. That still leaves close to a million code points.
Last time I checked, just over 13% of the available public space was allocated. Most of the planes remain unused.
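For anyone who wants to check the arithmetic above, here is a rough Python sketch. The breakdown of the excluded ranges (surrogates, noncharacters, private-use areas) is my own tally from the standard's range definitions, not something spelled out in the comment, and the constant names are purely illustrative.

```python
# Sketch of the code point arithmetic; constant names are illustrative.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total = PLANES * CODE_POINTS_PER_PLANE

# Ranges that can never hold publicly assigned characters:
SURROGATES = 0xDFFF - 0xD800 + 1          # U+D800..U+DFFF
NONCHARACTERS = 32 + 2 * PLANES           # U+FDD0..U+FDEF plus the last two code points of every plane
PRIVATE_USE = (0xF8FF - 0xE000 + 1) + 2 * (CODE_POINTS_PER_PLANE - 2)  # BMP area plus planes 15 and 16

available = total - SURROGATES - NONCHARACTERS - PRIVATE_USE

print(f"total capacity:      {total:,}")
print(f"publicly assignable: {available:,}")
```

So "close to a million" holds up: the publicly assignable space comes out a little under 975,000 code points.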
And combinations are used, so e.g. a new emoji only takes zero to one new code point, not one code point per variation. (zero because if I remember right the emojis for families are just something like "woman + boy + girl + man", all existing characters, joined by Zero-Width Joiners)
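To make the joiner idea concrete, here is a small Python sketch that builds a family sequence out of existing characters. The particular member order (man, woman, girl, boy) is just one example I picked; whether a font renders it as a single glyph depends on the platform.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Four existing characters: MAN, WOMAN, GIRL, BOY (example order, chosen for illustration).
members = ["\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466"]
family = ZWJ.join(members)

# The "new" emoji is 7 code points, none of them new to Unicode.
names = [unicodedata.name(ch) for ch in family]
for ch, name in zip(family, names):
    print(f"U+{ord(ch):04X} {name}")
```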
Why 17 and not a round number like 16? That would give a nice “round” one mebicodepoint.
BMP + 16 planes.
You can blame UTF-16 for this mess. Unicode was originally meant to be able to encode two billion (2^31) characters. It bent over backwards to accommodate the limits of the bastard child that is UTF-16.
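Here is where the 17 comes from, as a hedged Python sketch. UTF-16 reserves 1,024 high and 1,024 low surrogates, and each pair addresses one code point beyond the BMP. The helper name `to_surrogates` is made up for this example.

```python
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1  # 1,024
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1   # 1,024

# Every (high, low) pair addresses one code point beyond the BMP.
supplementary = HIGH_SURROGATES * LOW_SURROGATES   # 1,048,576 = 16 planes
max_code_point = 0xFFFF + supplementary            # 0x10FFFF

def to_surrogates(cp):
    """Split a supplementary code point into its UTF-16 surrogate pair (hypothetical helper)."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

print(supplementary // 0x10000, "planes beyond the BMP")
print([hex(unit) for unit in to_surrogates(0x1F4A9)])
```

The pairs cover exactly 16 planes on top of the BMP, which is why the ceiling sits at U+10FFFF rather than at a power of two.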
Maybe they did it because Windows was backwards and used UCS-2, and later, UTF-16? If somehow, Windows managed to switch to UTF-8, I’m sure they (Microsoft) would mess it up and keep the 4 byte limit (imposed by Unicode) there even if it’s later removed (for backwards compatibility). What Microsoft really needs to do, IMO, is rewrite the Windows API to use UTF-8 or UTF-32. Make a `wwchar` type or something...
I really like the smiling poop and everything but [personally] I've only ever needed a feed icon⸮
I have some good ideas for emojis, but I'm guessing that's even harder to get through...
Poop is in there, and the love hotel.
Those have a compatibility story though
Very cool. Never really considered how new characters were added. Thanks for sharing!
Wonderful :) Now can we please have this:?
We already have the interrobang, no? So why not this?
>> ⏻ To The People!
That line at the end made me smile.
led, not lead.
Fixed. Thanks! Those are surprisingly hard to spot sometimes.
Moderation software idea: globally replace lede/lead with led in any discussion about lede vs lead. Would bury the led once and for all.
Is there a reason this is being posted now, or does this need a (2016) in the title?
I'm guessing it seemed like it would be even more awesome than https://news.ycombinator.com/item?id=21686643.
What is a half character?
For power off button they reused Heavy Circle.
From the article:
> "So, ⭘ is our ½ character "
Oh the irony of this comment not working. :)
Works for me :P
I can't tell if the rendering I see is correct or not. Can you describe the correct rendering and some incorrect yet probable one(s)?
It is a "heavy circle".
This page may help: https://decodeunicode.org/en/u+02B58
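If the glyphs don't render for you, one way to see what was actually posted is to look up the code points. A minimal Python sketch follows; the names come from whatever Unicode database ships with your Python, so the four newer power symbols may show the fallback text on old versions. `describe` is a made-up helper name.

```python
import unicodedata

# The five characters from the article, written as escapes so they survive any font.
POWER_SYMBOLS = "\u23fb\u23fc\u2b58\u23fd\u23fe"

def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<not in this Python's Unicode data>"))
        for ch in text
    ]

for code_point, name in describe(POWER_SYMBOLS):
    print(code_point, name)
```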
A character is a character. You can't say it's only a half.
The symbol was already present, but it was given new documented semantic content ("meaning"). Previously it was just a circle, now it's also the symbol for "power off".
>You can't say it's only a half
The character is only half a character.
There, I just did.
I know a guy who fought hard with the Unicode Consortium to add two characters meaning "beginning of blob" and "end of blob". Imagine how much simpler coding would be without the need to escape anything. Unfortunately he didn't succeed. They were too busy adding smileys.
You'd have to escape those new symbols if they occur within the blob.
Just add a `\` /s
That's too lazy, we need unicode symbols for escaped start-blob and escaped end-blob /s
Not sure why /s. Also, you only need escaped-end-blob. start-blob can safely be used inside a blob. So it's only 3 new codepoints, which is genuinely an interesting proposal.
Because that doesn't round trip. You need to be able to distinguish whether the blob originally had "end-blob" or "escaped-end-blob". So now you need another character for double-escaping, and so on, and so on.
To avoid that issue you're back to either adding a backslash, or doubling up the character inside the blob... but if you're doing that you could have just used " all along. No need for new characters!
If you really don't want to change the contents of the blob, and can't length-prefix, then you could also use a new UUID as your delimiter each time you embed a blob.
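A rough Python sketch of the fresh-delimiter idea, in the spirit of MIME boundaries. The framing layout (delimiter, newline, blob, delimiter) and the names `wrap`/`unwrap` are my own assumptions for illustration, not an existing format.

```python
import uuid

def wrap(blob: bytes) -> bytes:
    """Frame blob with a fresh random delimiter without modifying the blob itself."""
    while True:
        delim = uuid.uuid4().hex.encode("ascii")
        # Retry in the very unlikely case that the delimiter would be found
        # before the real end of the blob (including partial overlaps).
        if (blob + delim).find(delim) == len(blob):
            return delim + b"\n" + blob + delim

def unwrap(framed: bytes):
    """Return (blob, remaining bytes after the closing delimiter)."""
    delim, rest = framed.split(b"\n", 1)
    end = rest.index(delim)
    return rest[:end], rest[end + len(delim):]

payload = bytes(range(256))             # every byte value, nothing escaped
nested = wrap(wrap(payload))            # a blob containing a framed blob
inner, _ = unwrap(nested)
original, _ = unwrap(inner)
print(original == payload)
```

The trade-off is that the reader has to learn the delimiter from the header first, so this is framing rather than a fixed pair of characters.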
No. It depends only on developers. If they started embedding it in the format, they would be breaking the rules. If not, it would work.
If you're telling developers not to embed that character anywhere then you only need one end of blob character. So not 'no', I'm still right in saying that having two end characters is a non-solution.
And you still didn't explain why the existing ASCII control codes don't solve your problem. The suitable ones are also not supposed to appear inside text.
What's wrong with U+0001 and U+0004?
They have different purposes. See ISO 1745 aka ECMA-16. Basically it goes:
(SOH header STX body ETX)⁺ EOT
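As a toy illustration of that grammar, here is a Python sketch that frames and parses (header, body) pairs with the ASCII control codes. It only mirrors the one-line pattern above and is not a conforming ISO 1745 implementation; `frame` and `parse` are hypothetical names.

```python
SOH, STX, ETX, EOT = b"\x01", b"\x02", b"\x03", b"\x04"
CONTROLS = (SOH, STX, ETX, EOT)

def frame(messages):
    """Encode a non-empty list of (header, body) pairs as (SOH header STX body ETX)+ EOT."""
    if not messages:
        raise ValueError("at least one message is required")
    out = b""
    for header, body in messages:
        for part in (header, body):
            # The scheme only works if the control codes never appear in the payload.
            if any(control in part for control in CONTROLS):
                raise ValueError("payload contains a framing control code")
        out += SOH + header + STX + body + ETX
    return out + EOT

def parse(data):
    """Decode bytes produced by frame() back into (header, body) pairs."""
    if not data.endswith(EOT):
        raise ValueError("missing EOT")
    messages, pos, end = [], 0, len(data) - 1
    while pos < end:
        if data[pos:pos + 1] != SOH:
            raise ValueError("expected SOH")
        stx = data.index(STX, pos)
        etx = data.index(ETX, stx)
        messages.append((data[pos + 1:stx], data[stx + 1:etx]))
        pos = etx + 1
    if not messages:
        raise ValueError("no messages before EOT")
    return messages

print(parse(frame([(b"h1", b"hello"), (b"h2", b"world")])))
```

Note that `frame` has to reject payloads containing the control codes, which is the same restriction being argued about in this thread.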
Or two private use characters, if you want to be ASCII-clean or 8-bit-ish-clean.
It's an interesting idea, but you still wouldn't be able to encode the byte sequence for "end of blob" as part of the blob data... without escaping it.
They shouldn't occur in a blob, as they would be designated just for this purpose. All other characters can occur in the real world; those two could only get there by mistake.
What about a blob containing a page which contains blobs itself?
What if I want to put them in a comment like this, to discuss them? Well, you might say I shouldn't do it and should just refer to their code points, but basically that's a form of escaping as well.
How much blob is blob? If it is binary, then there is a chance of a clash, no matter how much you dare those bytes not to. If it is always-valid UTF-8, then use the bytes 11111111 and 11111110 (or any other out-of-band markers available for the source encoding) to open and close your message. Not sure what the point of it is, though.
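The UTF-8 half of this can be checked by brute force. A Python sketch, assuming the payload is always valid UTF-8 text: it encodes every Unicode scalar value, confirms that the bytes 0xFF and 0xFE never show up, and then uses them as open/close markers. The function names are invented for the example.

```python
OPEN, CLOSE = b"\xff", b"\xfe"  # 11111111 and 11111110

def used_utf8_bytes():
    """Collect every byte value that appears when encoding all Unicode scalar values."""
    text = "".join(chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)
    return set(text.encode("utf-8"))

def frame(text: str) -> bytes:
    """Wrap UTF-8 text in markers that cannot occur inside it."""
    return OPEN + text.encode("utf-8") + CLOSE

def unframe(data: bytes) -> str:
    if data[:1] != OPEN or data[-1:] != CLOSE:
        raise ValueError("missing markers")
    return data[1:-1].decode("utf-8")

used = used_utf8_bytes()
print(0xFF in used, 0xFE in used, hex(max(used)))
print(unframe(frame("no escaping needed")))
```

This does nothing for arbitrary binary data, where any byte value can occur.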
but what if you need to send a word document containing an explanation about how to use the blob characters?
Turtles all the way down....
Are we talking putting a sequence of 8-bit bytes (octets) into Unicode characters by mapping all 256 possible byte values to the first 256 Unicode code points?
If so, if I'm not mistaken, this is actually less efficient than just using Base64 encoding.
When you put those Unicode characters into UTF-8, the first 128 code points are going to require one byte (with a leading 0 bit). The other 128 of them are going to require two bytes. So that's 50% overhead (assuming the blob's bytes are evenly distributed) because half of the values have 0% overhead and the other half have 100% overhead.
Meanwhile, Base64 sticks 6 bits in each encoded character. In 4 characters, you can fit 3 bytes of your raw info. So that's only 33% overhead.
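The two overhead figures are easy to confirm in Python. This sketch assumes the mapping described above (byte value N becomes code point U+00NN, which is what Latin-1 decoding does) and an evenly distributed blob.

```python
import base64

# Evenly distributed blob: every byte value appears equally often (3,072 bytes).
blob = bytes(range(256)) * 12

# Map each byte to the code point with the same number, then serialize as UTF-8.
as_utf8 = blob.decode("latin-1").encode("utf-8")
as_base64 = base64.b64encode(blob)

overhead_utf8 = len(as_utf8) / len(blob) - 1
overhead_base64 = len(as_base64) / len(blob) - 1

print(f"bytes-as-code-points in UTF-8: {overhead_utf8:.0%} overhead")
print(f"Base64:                        {overhead_base64:.0%} overhead")
```

For data that is mostly ASCII-range bytes the comparison flips, since those bytes cost nothing extra in UTF-8.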
Also, it was done already. See ISO 2111 aka ECMA-24.