Hacker News

196 Comments:
lokedhs said 5 days ago:

I posted a message on the Unicode mailing list, which eventually led to a proposal to accept a large number of new characters that encode symbols used in the old 8- and 16-bit micros.

My original question was specifically about the C64 character set, but we managed to get several others covered as well, including several symbols from the Atari ST character set.

The proposal was accepted, and the work continues to create a new proposal covering the character sets of even more old computers.

https://www.unicode.org/L2/L2019/19025-terminals-prop.pdf

I'm quite happy that my modest question led to some real progress.

jimnotgym said 5 days ago:

I'm disappointed to find that the Atari ST character set doesn't contain a bomb symbol. Obviously two entirely different things have been confused by my childhood mind.

lokedhs said 5 days ago:

The Atari ST did display bombs when an application crashed, but they were never part of the character set. It's a graphic that is displayed by the exception handler in the operating system. The number of bombs indicates the exception vector, so three bombs means vector 3 (address error).

The symbols that do exist but were not included in the proposal are the Atari logo and the J.R. Dobbs picture. Both are copyrighted, which is why they were left out.
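A rough lookup sketch of the bomb-count reading described above, assuming the count maps onto the standard Motorola 68000 exception vector numbers (the table comes from the 68000 vector layout, not from TOS source code; `explain_bombs` is a hypothetical helper):

```python
# Low-numbered Motorola 68000 exception vectors. Assumed mapping: the number
# of bombs the Atari ST draws after a crash equals the vector number.
M68K_EXCEPTIONS = {
    2: "bus error",
    3: "address error",
    4: "illegal instruction",
    5: "division by zero",
    6: "CHK instruction",
    7: "TRAPV instruction",
    8: "privilege violation",
    9: "trace",
}


def explain_bombs(count: int) -> str:
    """Return a human-readable cause for a given number of bombs."""
    cause = M68K_EXCEPTIONS.get(count, "other/unknown exception")
    return f"{count} bomb(s): {cause}"


print(explain_bombs(3))  # 3 bomb(s): address error
```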

ghostbrainalpha said 5 days ago:

This type of comment is why I come to Hacker News and waste so much time here. Thank you.

lokedhs said 5 days ago:

Thank you for the appreciation. The Atari ST was a very important computer for me when I was growing up, and I love talking about it.

bradleyjg said 5 days ago:

Early macs had a bomb symbol. It was what showed up for the equivalent of the BSOD.

gmac said 5 days ago:

Hmm. The Atari ST did display bombs when it crashed. The number of bombs was a hint about the cause. It always felt very alarming and had me scrambling for the 'off' switch, though.

http://www.atari-forum.com/viewtopic.php?t=15500

slavik81 said 5 days ago:

This is absolutely fantastic. I was under the impression that the Unicode Consortium reviled box-drawing characters, but I find them incredibly useful for documenting code relating to grids.
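As an illustration of documenting grid code this way, a small hypothetical helper (the name `draw_grid` is made up for this sketch) that renders an empty grid using the Unicode box-drawing block:

```python
def draw_grid(rows: int, cols: int, cell_width: int = 3) -> str:
    """Render an empty rows x cols grid with Unicode box-drawing characters."""
    horiz = "─" * cell_width

    def border(left: str, mid: str, right: str) -> str:
        # One horizontal rule, e.g. ┌───┬───┐
        return left + mid.join([horiz] * cols) + right

    lines = [border("┌", "┬", "┐")]
    for r in range(rows):
        lines.append("│" + "│".join([" " * cell_width] * cols) + "│")
        if r < rows - 1:
            lines.append(border("├", "┼", "┤"))
    lines.append(border("└", "┴", "┘"))
    return "\n".join(lines)


print(draw_grid(2, 3))
```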

slavik81 said 5 days ago:

If I'm mistaken about something, I would appreciate clarification. Is my mistake using box-drawing characters? Or is it thinking that they're disliked?

lokedhs said 5 days ago:

I've seen different opinions on the topic. I'm not sure there is any single opinion of the Unicode Consortium here, but then again I'm not a member of the Consortium.

What we can see is that these characters have become very popular and useful, so it doesn't really matter whether the original intent was to move these things to a higher level protocol. Today they are here, and they are useful.

There was a discussion on the mailing list some time ago when there was a suggestion to add codes for underline, bold, italics etc. I can tell you that that is not a very popular idea.

mark-r said 5 days ago:

Sometimes characters were added to Unicode just because they already existed in another character set. The idea is that Unicode should support lossless round-trip conversions to any other character set. Box drawing characters were part of the DOS character sets that were carried over to Windows.
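Python ships a `cp437` codec, so the round trip for the DOS box-drawing bytes can be checked directly. A minimal sketch (byte values are from the IBM PC code page 437 table; `roundtrip` is a hypothetical helper):

```python
# Box-drawing bytes from IBM code page 437, the DOS character set.
DOS_BOX = bytes([0xDA, 0xC4, 0xBF, 0xB3, 0xC0, 0xD9])


def roundtrip(data: bytes, codec: str = "cp437") -> bool:
    """Decode legacy bytes to Unicode and check they encode back unchanged."""
    return data.decode(codec).encode(codec) == data


text = DOS_BOX.decode("cp437")
print(text)                                       # ┌─┐│└┘
print(" ".join(f"U+{ord(c):04X}" for c in text))
print(roundtrip(DOS_BOX))                         # True
```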

adhoc32 said 5 days ago:

real heroes don't wear capes

Baeocystin said 5 days ago:

That is awesome! Nice work. :)

eddieone said 5 days ago:

These are the worst symbols though.

⭘ should be on ⏽ should be off

guessmyname said 5 days ago:

Can you post a picture instead? Right now I can only see empty squares in your comment:

https://i.imgur.com/zXAwnbK.png (I am using Safari 13.0.3 on macOS Catalina 10.15.1)

---

Edit: After seeing the image posted by @iruoy https://i.imgur.com/OH3QTXQ.png I have to disagree with you because the circle represents a zero (low voltage - FALSE) and the vertical line represents a one (high voltage - TRUE). If you studied computer science, logic, electrical engineering, or attended any basic course on electronics you would know that ones-and-zeroes are the written representation of an electric current, in fact, they are the basics of computing: Bits.

Here is more information about the power symbol:

https://en.wikipedia.org/wiki/Bit

https://en.wikipedia.org/wiki/Power_symbol

http://www.decisionsciencenews.com/2016/10/28/off-switch-rem...

non-entity said 5 days ago:

I'm getting a rather strange rendering in Firefox 68.2.1 on Android 9

http://imgur.com/a/Q2isAPT

dwild said 4 days ago:

As the article mentions, the O was already in Unicode; its definition was just updated, so your font already includes it. The I is a new symbol that is too recent to be in your font, so it isn't shown.
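A quick way to check which of these code points your Python's Unicode database knows about. It says nothing about whether an installed font can draw them, which is the actual problem here. The names assume a Python built with Unicode 9.0 or later:

```python
import unicodedata

# Power-related code points: the ones added in Unicode 9.0, plus the
# pre-existing HEAVY CIRCLE whose definition was updated for "power off".
POWER_SYMBOLS = ["\u23FB", "\u23FC", "\u23FD", "\u2B58", "\u23FE"]

print("Unicode database:", unicodedata.unidata_version)
for ch in POWER_SYMBOLS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unknown to this Python>')}")
```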

parliament32 said 5 days ago:

No problems on firefox-esr (68.2.0esr) on Linux.

iruoy said 5 days ago:

Works in latest Firefox on latest MacOS

https://i.imgur.com/OH3QTXQ.png

squiggleblaz said 5 days ago:

Yeah, the circle represents a zero - but it isn't a zero, and the line represents a one - but it isn't a one. They're graphics that seem removed from the meanings they represent. Semantically they're removed too, since turning a device on seems to have complex, multi-byte significance.

And as graphics, the zero seems to represent "circuit connected" and the one seems to represent "circuit disconnected". Yet they have the exact opposite meanings!

They're meant for public consumption after all.

ckrailo said 5 days ago:

I did an image search on a few search engines to make sure "on off switch" still returned the same ⏽/⭘ switches I've seen forever. Pretty sure the back of your printer probably has one.

Where are these switches uncommon? Where are they flipped to the opposite meaning as you described? Where has the circle ever represented a connected circuit? Most circuit diagrams are squared off, so which one was circular? I'm so confused.

pasabagi said 5 days ago:

It's clearly supposed to be a metaphor for genitalia - with the line suggesting 'presence', and the circle suggesting absence. (Note that the Latin 'vagina' literally means sheath.) It works better in the case of the numeral 1, which even has a glans.

lioeters said 5 days ago:

Whoever downvoted this was apparently offended by the association. However, there is insight in the point you raised, and you stated it intellectually - not in a juvenile sense, which was in the eye of the beholder.

In anthropology and psychology, primitive symbols like a vertical line or a circle are often found to represent the male and female dichotomy. In fact, it's quite prevalent cross-culturally as basics of a visual language. [citation needed]

whatshisface said 5 days ago:

Insight? It's false. The symbol for zero originated as a dot, indicating an unfilled place in the place value system, and evolved from there. [0]

[0] https://www.newscientist.com/article/2147450-history-of-zero...

pasabagi said 5 days ago:

Obviously, your source doesn't support your argument. Just because something once had one shape doesn't mean a new shape can't have new connotations. You're jumping from 'the zero was not always a hole', to 'the zero can never symbolize a hole'. Which is pretty strange.

This 1/0 man/woman white/black kind of binary is so ubiquitous in our culture it's kind of redundant to go and find examples - but it is in itself interesting that when you mention stuff like this, some people will always find a way to claim it is nonsense.

I think it's a desire for security amongst political turbulence - you say, this stuff is abstract and clean and without cultural baggage, so I can hide in it from the world, which is ugly and ambiguous and provokes uncomfortable reactions.

It's a kind of desire that's really common around engineering, mathsy people - I mean, part of the attraction of these subjects is you don't have to navigate anything sticky. So that's why it usually provokes a pretty extreme reaction if you profane the temple by bringing cultural stuff in.

whatshisface said 5 days ago:

A word or symbol does not represent every single thing that can be free-associated with it. If you want to show that "1" represents the Washington Monument, it is not sufficient (or even necessary) to point out that they're the same shape, you have to show some evidence that there once was a culture that used the symbol "1" whenever they wanted to say "2 15th St NW, Washington, DC."

pasabagi said 5 days ago:

Using '1' to symbolise the washington monument would be reasonable, if the washington monument was something people very commonly referred to. It would also be reasonable to describe that as phallic. It wouldn't be reasonable to use 1 to refer to an address in DC, because you're jumping from three obviously formally similar things, to the street address of one of those things.

It's a pretty good example of what people mean by free association - and why what I'm(1) doing isn't.

1- I mean, obviously I'm about the millionth person to make this particular observation.

lioeters said 4 days ago:

You got a point there (pun intended :) about the symbol for zero.

I guess I was being generous with interpreting the parent comment, by mentioning the (possibly common and cross-cultural) visual association of a circle with womanhood, a straight vertical line with manhood.

I didn't mean to imply that the symbols for zero and one have direct historical origins in those ideas - like the parent comment might have suggested - but I do think there is a philosophical or artistic merit in drawing the analogy.

perl4ever said 4 days ago:

I never heard of this. Have you ever heard of the symbols also used for Mars and Venus indicating male and female?

https://en.wikipedia.org/wiki/Gender_symbol

According to that page, in some contexts, a circle means female and a square means male, but I've never heard of that before either.

lioeters said 4 days ago:

Interesting, I've seen the symbols for Mars and Venus being used to represent genders, but not a square or triangle for male.

As for zero/circle/woman and one/vertical line/man.. I guess in my mind, they couldn't be more obvious, universal and simple - to put it crudely: the hole and the stick.

But also: off and on, dark and light, absence and presence.

(So far, I have found no historical evidence that supports the above theory, no basis in logic or fact. I'll leave it as idle speculation on a possible primordial/mythological way of thinking in pictures, with perhaps two of the most primitive symbols.)

JasonFruit said 5 days ago:

I have always seen it this way too: a broken or completed circuit. But I've accepted that my initial understanding of the visual metaphor doesn't match its intent. I think misunderstandings like these result when we abandon written language to point and grunt at tiny pictures.

jaywalk said 5 days ago:

I've always looked at the line as a connection and the open circle as no connection. It's just what I came up with when I was young as a way to make sense of the markings.

nirvdrum said 5 days ago:

Interesting. I always took ⭘ to be an open circuit and ⏽ to be a closed circuit, using the symbol for a switch as the rough guide. Perhaps they're poor symbols if people can look at them and come up with rational reasons for each interpretation to be correct.

derefr said 5 days ago:

To be extremely pedantic, I think they’re correct at an implementation level: for most chips, as long as they’re receiving power at all from a power supply, they’ll run, unless a reset line is being held high. In most consumer electronics, that reset line is held high by default (by power coming off the wall or from the battery), unless power is supplied to a NAND gate coming from (a flip-flop in front of) the power switch on the front of the computer.

To “turn on” a computer (using the push-button at the front, rather than by flipping a power-supply toggle-switch) is, then, actually to SET that flip-flop, feeding a high input to the NAND gate, which in turn will turn off the reset line to the CPU.

(And, vice-versa, if your computer has a “reset button” on the front like some old computers do, that one throws the flip-flop back to its RESET state, which drives that NAND input low again, so the gate's output, the reset line to the CPU, goes back high. Wiring for push-button toggles is weird!)
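A toy truth-table model of the circuit as described in this comment, assuming an active-high reset line and a NAND gate fed by the supply and the flip-flop output. The class and method names are hypothetical, and as the reply below points out, many real chips use an active-low reset instead:

```python
def nand(a: bool, b: bool) -> bool:
    return not (a and b)


class PowerButtonLatch:
    """Toy model: supply and flip-flop Q feed a NAND whose output is the
    (assumed active-high) CPU reset line."""

    def __init__(self) -> None:
        self.q = False  # flip-flop starts in its RESET state

    def press_power(self) -> None:
        self.q = True  # SET the flip-flop

    def press_reset(self) -> None:
        self.q = False  # RESET the flip-flop

    def reset_line(self, supply_on: bool = True) -> bool:
        # High (CPU held in reset) unless both NAND inputs are high.
        return nand(supply_on, self.q)


latch = PowerButtonLatch()
print(latch.reset_line())  # True: CPU held in reset
latch.press_power()
print(latch.reset_line())  # False: CPU runs
latch.press_reset()
print(latch.reset_line())  # True: back in reset
```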

LolWolf said 5 days ago:

Hmm, in most embedded processors I know (e.g., the ESP8266/32 [0], RPi's BCM2835 [1], or TI's MSP430 [2]), you pull the line low for reset but hold it high for running (either with an internal pull-up or by actively driving the line).

---

[0] https://tttapa.github.io/ESP8266/Chap06%20-%20Uploading.html

[1] https://www.raspberrypi.org/app/uploads/2012/04/Raspberry-Pi... In particular, see first page, D15

[2] https://e2e.ti.com/support/microcontrollers/msp430/f/166/t/3...

Wildgoose said 5 days ago:

Why? It's clearly a binary switch and zero is traditionally false, i.e. Off.

alecmg said 5 days ago:

In several cars the air vents are marked with filled circle and empty circle and I always struggle to remember which means which.

perl4ever said 4 days ago:

I was looking at the dials on a refrigerator, and realized I never really knew which way is which - does turning the dial towards "cold" make it colder, or does the opposite, revealing more of the indicator from the "cold" end make it colder?

There is a tapered blue line with numbers, so is the wider end colder, because blue is cold and more blue is colder...or is the other end colder, because it has smaller numbers, and lower temperatures are smaller numbers?

Slartie said 5 days ago:

Filled circle means "vent is blocked, so no air can pass". Empty circle is an unblocked vent through which air can pass.

gmac said 5 days ago:

I think the problem is those meanings can plausibly be reversed, so it's hard to remember which way round it goes.

A better way to do it that I've seen is to represent air flow with wavy lines or similar on the 'open' symbol.

alecmg said 3 days ago:

Pretty sure in the last car it was the opposite. Filled circle meant air was coming.

So you can see why this is confusing

3JPLW said 6 days ago:

This is a great story.

Next up: I'd love to see the sub/super proposal get some more attention and effort.

https://github.com/stevengj/subsuper-proposal

moultano said 5 days ago:

Having the combining-character version of that would be fantastic. Eventually, it would be amazing if all the details of math rendering could make it into Unicode.

nixpulvis said 5 days ago:

> Eventually, it would be amazing if all the details of math rendering could make it into Unicode.

I'm not sure if I agree with you or not... Generally I'd say I do, but we're going to have a hard time "finding the line". Meaning what counts as "math"? Surely 1 + 1 is, as is ∇×𝐇, and we can start to do things like x⁰. However, what about a graph with nodes and edges (just as an example)? Is that "math"?

One thing strikes me about strings of characters: you can select and copy/paste them (at least in my native Latin alphabet) very reliably. This property is not present with Unicode in general.

chris_wot said 5 days ago:

It’s not easy to select characters in Arabic and many Vedic languages.

WorldMaker said 5 days ago:

Microsoft published a technical note on using a Unicode linear encoding for math rendering several years back:

http://www.unicode.org/notes/tn28/UTN28-PlainTextMath-v3.pdf

Office apps support it, apparently, as one among many representations for math zones. A blog post on it:

https://blogs.msdn.microsoft.com/murrays/2016/09/07/unicodem...

billpg said 5 days ago:

We need...

  1. A code-point that starts a fraction.
  2. A code-point that separates the numerator from the denominator.
  3. A code-point that ends the fraction.

So the sequence [START]x[SEP]y[END] would be rendered with x above y, separated by a horizontal line.

[START][START]x[SEP]y[END][SEP]z[END] would be a fraction with a fraction inside: x/y over z.
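A sketch of how a renderer might interpret such a sequence, emitting LaTeX `\frac` markup. The three control characters do not exist in Unicode; Private Use Area code points stand in for them here purely as an assumption, and `to_latex` is a hypothetical name:

```python
from typing import Tuple

# Hypothetical control characters; Private Use Area code points stand in for
# code points that do not exist in Unicode.
START, SEP, END = "\uE000", "\uE001", "\uE002"


def to_latex(s: str) -> str:
    """Render [START]num[SEP]den[END] sequences (possibly nested) as \\frac."""

    def parse(i: int, stop: str) -> Tuple[str, int]:
        # Consume characters until one in `stop` (or end of input); return the
        # rendered text and the index of the stopping character.
        out = []
        while i < len(s) and s[i] not in stop:
            if s[i] == START:
                num, i = parse(i + 1, SEP)
                den, i = parse(i + 1, END)
                out.append(f"\\frac{{{num}}}{{{den}}}")
                i += 1  # skip the END marker
            else:
                out.append(s[i])
                i += 1
        return "".join(out), i

    return parse(0, "")[0]


print(to_latex(START + "x" + SEP + "y" + END))  # \frac{x}{y}
print(to_latex(START + START + "x" + SEP + "y" + END + SEP + "z" + END))  # \frac{\frac{x}{y}}{z}
```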

slavik81 said 5 days ago:

The fraction slash (U+2044) was supposed to be a simpler version of that, but it's not very well supported. https://en.wikipedia.org/wiki/Unicode_subscripts_and_supersc...
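One place the fraction slash does show up reliably is normalization: the precomposed vulgar fractions decompose to digit, U+2044, digit under NFKC. A minimal check with Python's standard `unicodedata` module:

```python
import unicodedata

half = "\u00BD"  # ½ VULGAR FRACTION ONE HALF
nfkc = unicodedata.normalize("NFKC", half)

print([f"U+{ord(c):04X}" for c in nfkc])  # ['U+0031', 'U+2044', 'U+0032']
print(unicodedata.name(nfkc[1]))          # FRACTION SLASH

# Arbitrary fractions can be spelled the same way; whether they render as a
# proper fraction is up to the font and shaping engine.
twenty_two_sevenths = "22\u20447"
print(twenty_two_sevenths)
```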

ygra said 5 days ago:

"All details of math rendering" probably fall squarely into "markup" territory, and thus outside Unicode. At least so far, every effort to standardize this has ended up there. Math is generally not something I'd call plain text.

wl said 5 days ago:

Should Unicode be able to represent Egyptian Hieroglyphs? The lack of similar facilities for Egyptian is why Unicode is useless for representing hieroglyphs, despite having a goodly number of signs encoded.

logfromblammo said 5 days ago:

Or--dare we dream it?--chemistry notations.

slavik81 said 5 days ago:

This would be great. I need a way to cleanly document code implementing equations that contain letters in the superscripts and subscripts. x_i^j, for example, might be the position of Body i in Frame of Reference j. I can mentally pack it back together when it's just the one, but it would be nice if it could more closely resemble the original equation.

Alas, I was told that was "exactly what Unicode doesn't want to endorse."
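For the specific x_i^j case, Unicode happens to have the needed letters, but only because some letters have phonetic modifier or subscript forms; coverage of the alphabet is partial. A hypothetical helper (`script` and its partial tables are made up for this sketch) that fails loudly when a letter is missing:

```python
import unicodedata

# Partial, hypothetical tables: Unicode only has these forms for some letters.
SUBSCRIPT = {"i": "\u1D62", "j": "\u2C7C", "r": "\u1D63", "u": "\u1D64", "v": "\u1D65"}
SUPERSCRIPT = {"i": "\u2071", "j": "\u02B2", "n": "\u207F"}


def script(base: str, sub: str = "", sup: str = "") -> str:
    """Build e.g. x_i^j from Unicode sub/superscript letters."""
    try:
        return (base
                + "".join(SUBSCRIPT[c] for c in sub)
                + "".join(SUPERSCRIPT[c] for c in sup))
    except KeyError as missing:
        raise ValueError(f"no sub/superscript form for {missing}") from None


x_ij = script("x", sub="i", sup="j")
print(x_ij)                                 # xᵢʲ
print(unicodedata.normalize("NFKC", x_ij))  # xij (NFKC flattens the scripts)
```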

gambiting said 5 days ago:

I love how this was apparently accepted into the standard in 2016, and yet Chrome still displays an empty square instead. Unicode support is such an unbelievable mess, it's crazy. Windows displays it correctly for me in Word, but when pasted into Teams it comes out as a semicolon instead. Brilliant.

toyg said 5 days ago:

But it’s not a mess, there is a clear chain of responsibility: * consortium -> font developers -> app developers. If anything is broken, bitch to the first level (typically app devs) and wait for your grievance to percolate upwards as necessary; if it doesn’t happen, you know who to blame.

* most modern operating systems have hopefully sorted out their issues a long time ago.

LunaSea said 5 days ago:

If you have to complain, it's broken. As a consumer I don't care what the issue is and who the responsible party is.

Unicode is a clusterfuck exactly because the chain is too long and the implementation errors are too easy to make and the world is rife with incomplete implementations.

spookthesunset said 5 days ago:

Every global standard is a clusterfuck at some level. That is just the nature of the beast.

Look at any widely adopted standard, be it exit signage, networking protocols, time standards anything... you’ll find a cluster.

If you tore down every one of those and tried to rebuild “from first principles”, eventually it too will become a clusterfuck.

Wowfunhappy said 5 days ago:

Is it really so surprising that updating a universal standard takes time?

IMO, these processes should happen slowly.

LunaSea said 5 days ago:

It has nothing to do with the update speed and everything to do with the standard itself. Even if you were freezing Unicode now, you wouldn't encounter complete, correct implementations in the wild more than 50% of the time in the next 5 years.

hombre_fatal said 5 days ago:

Makes sense to just use image-replacement if you need emoji to show up reliably, like on a landing page.

Also spares you the problem of emoji implementation differences. https://www.mentalfloss.com/article/516048/22-emojis-look-co...

mschuster91 said 5 days ago:

>* most modern operating systems have hopefully sorted out their issues a long time ago.

Except Android where the system fonts have to be updated by the manufacturer. What a fucking mess.

edent said 5 days ago:

It has been 3 years, and Android & iOS still don't fully support Unicode 9. Any ideas who I should bug about that?

Google's Noto font hasn't released anything for 2 years. No idea if Apple's default font is open source.

nixpulvis said 5 days ago:

I love how iOS doesn't understand this URL, and still doesn't have these characters. Too busy removing the Taiwanese flag, and implementing "Animojis".

phkahler said 5 days ago:

I still think unicode should not have added emojis. The big guys are adding animated emojis now, and that's clearly out of scope for character sets. If they continue down that path unicode will eventually become SVG with animation. IMHO they should have stopped short of emojis.

greggman2 said 5 days ago:

I'm glad they decided to be inclusive. 100-200 million people were using emoji before they got added to Unicode, and without adding them, all of those people's conversations and communication patterns would have been pissed on. The countries with high populations of emoji users would likely never have switched to Unicode, because their customers would have demanded to keep them, which would mean we'd have the nightmare of multiple encodings again. No thanks.

I love emoji and have since I started using them in 1999 (they weren't added to Unicode until 2010). Most of my friends and family love them; it makes their communication richer and more easily nuanced, so I'm really glad they were added.

notatoad said 5 days ago:

whether you like the emojis or not, they've certainly helped to drive unicode adoption. Good luck making the business case for why you're building support for some obscure international character set into your product, but tell somebody you're adding support for the latest emojis and all of a sudden having unicode 10 support is a big deal.

microtherion said 5 days ago:

> whether you like the emojis or not, they've certainly helped to drive unicode adoption.

Especially since many of them are outside the BMP, so adoption forced people to go beyond UCS-2.
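The reason is that UTF-16 (unlike UCS-2) represents code points above U+FFFF as a surrogate pair. The arithmetic, sketched in Python and cross-checked against the built-in `utf-16-be` codec (`utf16_surrogates` is a hypothetical helper name):

```python
from typing import Tuple


def utf16_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a pair")
    v = cp - 0x10000            # 20 bits remain
    high = 0xD800 + (v >> 10)   # top 10 bits
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits
    return high, low


hi, lo = utf16_surrogates(0x1F914)             # THINKING FACE
print(f"{hi:04X} {lo:04X}")                    # D83E DD14
print(chr(0x1F914).encode("utf-16-be").hex())  # d83edd14
```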

Grue3 said 5 days ago:

IMO Unicode shouldn't have ever added characters with color. Let color be the property of the markup. The original emoji were all monochrome, and mainly displayed on monochromatic flip-phone screens. The fact that there now exist emoji that are otherwise identical but have different color is absolutely idiotic.

kalleboo said 5 days ago:

> The original emoji were all monochrome

Emoji were in color before they were added to Unicode. Even the original iPhone 3G Emoji release pre-dated their addition to Unicode (Apple used the SoftBank encoding). They were even in color on flip phones as far back as 1999 https://emojipedia.org/softbank/1999/

icebraining said 5 days ago:

In Unicode, they're represented by a combination of emoji character + colour character, not by new characters: http://www.unicode.org/reports/tr51/#Diversity

Seems a decent solution to me. Monochrome screens can just ignore the colour character.
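A small illustration of that design with the standard `unicodedata` module: the toned emoji is two code points, and dropping the modifier (the range U+1F3FB to U+1F3FF) leaves the base emoji. The `strip_skin_tones` helper is hypothetical:

```python
import unicodedata

THUMBS_UP = "\U0001F44D"    # THUMBS UP SIGN
TYPE_4 = "\U0001F3FD"       # EMOJI MODIFIER FITZPATRICK TYPE-4
toned = THUMBS_UP + TYPE_4  # displayed as one medium-skin-tone thumbs up


def strip_skin_tones(s: str) -> str:
    """Drop the five emoji modifier characters, U+1F3FB..U+1F3FF."""
    return "".join(ch for ch in s if not 0x1F3FB <= ord(ch) <= 0x1F3FF)


for ch in toned:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print(len(toned))                            # 2 code points
print(strip_skin_tones(toned) == THUMBS_UP)  # True
```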

buckminster said 5 days ago:

That doesn't add colour to, say, the smiley face, it just changes the colour from the default, usually yellow.

As well as adding colour, browsers display emoji at a larger point size. It is really annoying when some pre-emoji symbol (like the phases of the moon in TFA) gets emojified and where your document had some discreet use of symbols it now has huge colourful icons drawing the eye.

And afaict there is no way of turning off this behaviour globally.
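There is a per-character (not global) workaround: Unicode defines variation selector 15 (U+FE0E) to request text presentation and selector 16 (U+FE0F) to request emoji presentation, for characters that have standardized variation sequences. Renderers may still ignore it. A hypothetical helper, assuming you know which symbols you want kept as text:

```python
TEXT_STYLE = "\uFE0E"   # VARIATION SELECTOR-15: request text presentation
EMOJI_STYLE = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation

# Hypothetical default: symbols that have standardized text/emoji sequences.
DEFAULT_TEXT_SYMBOLS = "\u263A\u2600\u2764"  # ☺ ☀ ❤


def prefer_text(s: str, symbols: str = DEFAULT_TEXT_SYMBOLS) -> str:
    """Request text-style rendering for the given symbols, one at a time."""
    out = []
    prev = ""
    for ch in s:
        if ch in (TEXT_STYLE, EMOJI_STYLE) and prev and prev in symbols:
            continue  # replace an existing selector on a target symbol
        out.append(ch)
        if ch in symbols:
            out.append(TEXT_STYLE)
        prev = ch
    return "".join(out)


print(ascii(prefer_text("I \u2764\uFE0F it")))  # 'I \u2764\ufe0e it'
```

Selectors on other characters (e.g. keycap sequences) are left alone.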

zyx321 said 5 days ago:

There are at least 7 variations of the heart symbol differing only in color (🧡 is one of them), not including the heart playing-card suit, which also has multiple variations.

Edit: apparently HN limits the number of emoji per post, I originally included several.

https://emojipedia.org/white-heart/

bhaak said 5 days ago:

There are 27 variations of the capital letter A in Unicode (see the confusable list https://unicode.org/cldr/utility/confusables.jsp?a=A&r=None).

Redundancy has never been a problem with Unicode. After the decision was made to add the symbols of the original encoding to the list of Unicode characters, they had to add all the heart variations; otherwise Unicode wouldn't have been backward compatible with that encoding.

Or alternatively add a color modifier? I'm not sure that would have been a better solution.
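NFKC normalization folds some of those A variants (fullwidth, mathematical alphanumerics) back to plain A, but not lookalikes from other scripts, which is why the confusables data is maintained separately. A quick check (`folds_to_latin_a` is a hypothetical helper):

```python
import unicodedata

VARIANTS = {
    "FULLWIDTH LATIN CAPITAL LETTER A": "\uFF21",
    "MATHEMATICAL BOLD CAPITAL A": "\U0001D400",
    "CYRILLIC CAPITAL LETTER A": "\u0410",
    "GREEK CAPITAL LETTER ALPHA": "\u0391",
}


def folds_to_latin_a(ch: str) -> bool:
    """True if NFKC normalization turns the character into ASCII 'A'."""
    return unicodedata.normalize("NFKC", ch) == "A"


for name, ch in VARIANTS.items():
    print(f"{name}: {folds_to_latin_a(ch)}")
```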

ahakki said 4 days ago:

> apparently HN limits the number of emoji per post, I originally included several.

I am surprised HN allows any emoji at all. Pretty sure they were totally banned in the past. It seems like the orange heart is now (or has always been) whitelisted. 🧡🧡🧡

kalleboo said 5 days ago:

Now? Emoji were animated even before they were added to Unicode https://emojipedia.org/softbank/2006/

RandallBrown said 5 days ago:

Animoji is very different from using an animated gif in place of an emoji.

kalleboo said 5 days ago:

I didn't take it the GP was talking about Animoji, as those are motion-captured, rendered in 3D and sent as pre-rendered video files (GP mentioned something that could be represented as animated SVG). I don't think anyone is proposing a way to encode facial motion-capture into Unicode.

nixpulvis said 5 days ago:

While I agree with your sentiment, can you define "emoji"?

NateEag said 5 days ago:

A graphical or iconographic concept that cannot be unambiguously represented by a single character in any natural language and which does not have a single standardized rendering (e.g. the Power symbol does have a single standard rendering and is therefore not an emoji).

I'm no character or Unicode expert - that's just a wild stab in the dark, largely based on why I dislike emoji being in Unicode.

jimnotgym said 5 days ago:

I still prefer ASCII-style smilies to emojis ;-)

I promise to never do that on HN again!

ksaj said 5 days ago:

My sister pointed out that I'm a little random about mixing them. I use both, based solely on what I feel like typing at the time. And I'm acutely aware that the emojis she sees are not the ones I do, but the ASCII ones are pretty much the same, since I use Android and she's an Apple-head.

Besides, I love the slobbering ASCII guy ;-p~~~~ None of the emojis let you amplify slobber according to need like that.

schindlabua said 5 days ago:

One emoji I wish I had a good ASCII alternative for is :thinking: U+1F914. There is :^) but only the Skype generation will make sense of that.

commandlinefan said 5 days ago:

I've changed my emoji writing style from the canonical ;) to ; ) just so that editors won't auto "correct" my proper ASCII emoji into an unholy yellow winking face.

jimnotgym said 5 days ago:

That is a great tip! I wish I had thought of that : )

AnIdiotOnTheNet said 5 days ago:

At a certain point it becomes much simpler to encode your "text" as SVG rather than use a convoluted and bloated standard like Unicode.

spookthesunset said 5 days ago:

For 99% of all development, Unicode is a non-issue. Perhaps you have to consider Right-To-Left support and that is about it. Even then good RTL support is more about page layout than the actual mechanics of displaying the text.

If you have a better idea to enable people around the world to use computers in any language that doesn't look something like Unicode, I’m sure everybody will adopt it.

AnIdiotOnTheNet said 5 days ago:

I get that I'm in a minority here, but I'm not even sure such a thing is necessary or a good idea. Generally speaking simple things are better than complex things, and from a computing perspective human written languages are ludicrously complicated. I'm not convinced that complicating all our software to support poop emoji is better than allowing human language to adapt to the medium in the manner it has done since language was first written down.

> For 99% of all development, Unicode is a non-issue.

Until it suddenly is an issue. Like, dealing with unpaired surrogates in Windows file paths.
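For what it's worth, Python exposes this exact situation through its error handlers: strict UTF-8 refuses a lone surrogate ("surrogates not allowed"), while `surrogatepass` encodes it anyway, producing bytes that are not valid UTF-8. A minimal sketch; the function name `encode_name` is hypothetical:

```python
def encode_name(name: str) -> bytes:
    """UTF-8 encode a name, passing lone surrogates through if present."""
    try:
        return name.encode("utf-8")
    except UnicodeEncodeError:
        # Strict UTF-8 rejects unpaired surrogates; surrogatepass keeps them.
        return name.encode("utf-8", "surrogatepass")


lone = "report\ud800.txt"  # unpaired high surrogate, as a UTF-16 file name can contain
raw = encode_name(lone)
print(raw)                                            # b'report\xed\xa0\x80.txt'
print(raw.decode("utf-8", "surrogatepass") == lone)  # True: round-trips
```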

I mean, we're talking about an encoding where you need a rather large database and quite a bit of code just to tell how many characters are actually in a string.
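To make that concrete, here is one family emoji (man, ZWJ, woman, ZWJ, girl) measured three different ways with only the standard library. None of them gives the single user-perceived character; grapheme-cluster segmentation (UAX #29) needs extra data, e.g. the third-party `regex` module's `\X`:

```python
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man ZWJ woman ZWJ girl

code_points = len(family)
utf16_units = len(family.encode("utf-16-le")) // 2
utf8_bytes = len(family.encode("utf-8"))

print(code_points, utf16_units, utf8_bytes)  # 5 8 18
```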

ewoodrich said 5 days ago:

Replacing Unicode characters with SVG representations would be terrible for accessibility.

nixpulvis said 5 days ago:

Seriously, try sending a link to this page to a friend on iMessage... they won't be able to click it.

ggm said 5 days ago:

Power up stars for the first person to successfully argue for removal of a UNICODE character.

taejo said 5 days ago:

Encoding Stability

Applicable Version: Unicode 2.0+

Once a character is encoded, it will not be moved or removed.

This policy ensures that implementers can always depend on each version of the Unicode Standard being a superset of the previous version. The Unicode Standard may deprecate the character (that is, formally discourage its use), but it will not reallocate, remove, or reassign the character.

http://www.unicode.org/policies/stability_policy.html#Encodi...

ggm said 5 days ago:

Awww.. no fair

nixpulvis said 5 days ago:

Scanning quickly through I found this one: ℻ (https://unicode-table.com/en/213B/).

Apparently it's semantically fac·sim·i·le. Which means (according to Google) "an exact copy, especially of written or printed material."

My argument is something like, why not just write FAX. Or is the counter that some fonts will specialize this character to something closer to the native language? That seems unlikely, and instead people will probably learn that FAX means "to make alike", from Latin. Or is it that we need to make it just a little bit above the baseline to indicate that it's special. Surely "FAX" isn't the only thing that should be allowed to be special, right? But then that's a whole can of worms. Anyway, I'm rambling...

This was the best I could do in the limited time I had (I really should be asleep by now).

kuschku said 5 days ago:

On business cards, in letters, etc you write your phone numbers.

Usually as

TEL 0123 45 67 89

FAX 0123 45 67 89-0

The "TEL" and "FAX" parts should be superscript small capitals, though. That's where these special symbols come in.

Some fonts also replace those with icons for phone/fax.

They're actually still in use in Germany today.

cyxxon said 5 days ago:

Well, fax machines are still in use here (most notably in medical offices, it seems), but superscript abbreviations? Never seen them in the last 40 years. I mean, yes, I have seen the letters FAX or TEL in front of the number, but never explicitly as a separate character (as far as I can tell).

This would also fit best with the experience that people crafting cards or letterheads never really know all the intricacies of Word, or Unicode, or whatever they use, and just "make it look good" - use tabstops instead of tables, simply type FAX and mark it as superscript, etc...

kuschku said 5 days ago:

> This would also fit best with the experience that people crafting cards or letterheads never really know all the intricacies of Word, or Unicode, or whatever they use, and just "make it look good" - use tabstops instead of tables, simply type FAX and mark it as superscript, etc...

I mean, that’s widely known anyway. Look at most letterhead templates online, pretty much all of them are broken and quite painful in the way they’re built. Broken tables, tabstops, all combined painfully.

Often enough proper tables would simplify a lot, if combined with columns one can create amazing things. If one even adds automated hide/unhide elements (e.g. page count, automated Internetmarke or hiding it if unused, etc) one can create stuff that’d save hours of work every day.

mschuster91 said 5 days ago:

Not just in Germany (where they are a legally accepted alternative to postal letters, which is a significant difference to emails!), also in Japan: https://www.noted.co.nz/money/money-business/japan-has-a-biz...

kuschku said 5 days ago:

Actually, nowadays PDFs in email are also accepted instead of fax! (or emails signed with the ePerso S/MIME function).

So far a court, a health insurance company, the national pension fund, and some municipal administrations I've had contact with all have accepted PDFs attached to emails just fine :)

And if you ever need actual fax functionality, the most common home router (Fritz!Box) has a virtual fax machine integration, so you can send and receive faxes with an app or an email gateway from your own VoIP SIP "landline" connection.

mongol said 5 days ago:

I assume it was intended to indicate after a telephone number that a telefax was connected to it. For example when writing contact details on a business card.

flomo said 5 days ago:

Probably right on the rationale, but I can't remember anyone using FAX as a symbol. Then again, it's not the only bit of obsolete technology in Unicode.

ksaj said 5 days ago:

It is in regards to a facsimile machine, its operation, and the output from it. FAX doesn't refer to every meaning of facsimile out there.

https://www.google.ca/search?q=define%3Afacsimile+machine "a device that can send or receive pictures and text over a telephone line."

This search is more appropriate.

amyjess said 5 days ago:

Most likely, some legacy character set had a ℻ character, so Unicode had to include it for compatibility reasons.

notatoad said 5 days ago:

I imagine the only way that will ever happen is a situation like the gun to water pistol transition, where enough implementers agreed to change the representation.

jakeogh said 5 days ago:

Imagine if we didn't just have a snowman, but also happened to have a cute bear.

Wowfunhappy said 5 days ago:

> There was some discussion around ⏾ as several “moon” characters already existed. None of them [...] convey the semantic meaning of “Sleep” – so ⏾ was accepted.

I'm not convinced ⏾ conveys that meaning either, unless it's explicitly used alongside the other new symbols. And if it is used alongside those symbols, a couple of the existing moons could also work just fine.

(⏾ is appearing as tofu on HN for me, but I'm just going to roll with it.)

re said 6 days ago:

It's disappointing that OS-included font support for these has apparently lagged. iOS 13.2 supports emoji from Unicode 12[1], but doesn't ship with a font that includes these symbols. Anyone know if any OSes do support them?

[1] https://emojipedia.org/apple/ios-13.2/new/

cosmie said 5 days ago:

iOS 13.3[1] supports these symbols.

[1] The 13.3 Beta I'm currently running on my 6S Plus renders them fine.

re said 5 days ago:

By the way, if you're viewing them on the unicodepowersymbol.com website, they show up because of a web font included on the page. So here they are on a site without the font: ⏻⏼⭘⏽⏾

Also here: https://en.wikipedia.org/wiki/Power_symbol#Unicode

cosmie said 5 days ago:

Ah, that makes sense. I retract my statement – those symbols are still MIA on 13.3.

aembleton said 5 days ago:

Does anyone know how to add these to Ubuntu 19.10? When I search for Power in Characters, it has the 'Power On' character but you can't see the symbol [1]

1. https://imgur.com/a/4qMgPkv

yoloClin said 5 days ago:

Now if only we could get Hacker News to support emojis

Edit: This comment ended in man-shrugging, but Hacker News stripped the emoji.

athenot said 5 days ago:

They used to be supported until they were abused.

colejohnson66 said 5 days ago:

Maybe only allow them once you reach a certain reputation/points threshold, like downvoting and flagging? Then if they're abused, remove that privilege for that person (IIUC, flagging rights can be revoked).

ahakki said 4 days ago:

You get a single emoji on HN 🧡

Buttons840 said 5 days ago:

> What other useful and/or important symbols are missing from Unicode?

Seems like a good time to ask this question again. Any new answers?

sjwright said 5 days ago:

The USB, Wi-Fi and Bluetooth symbols.

A spray can to represent spray paint, insect spray, or spray lubricant.

There's a power plug but no power socket.

There's no staples or stapler.

There are no appliances—no oven, microwave, toaster, mixer, washer, or vacuum cleaner.

There's no traffic cone.

duskwuff said 5 days ago:

> The USB, Wi-Fi and Bluetooth symbols.

I believe those are all trademarked. Unicode tries to avoid those.

> A spray can to represent spray paint, insect spray, or spray lubricant.

Interesting idea! I wonder if there'd be any interest in getting wider coverage of other bitmap paint tools: rectangular selection, lasso, paint bucket...

> There's a power plug but no power socket.

That's problematic from a localization perspective, as electrical sockets vary widely from country to country, and some of them may be difficult to recognize as a socket at a small size. For example, some European countries use a socket which is made of three circular pins in a straight line -- it'd just look like an ellipsis.

Besides, there isn't a lot of symbolic meaning that's conveyed by a socket that couldn't be expressed just as well with a plug.

> There's no staples or stapler.

Maybe. There isn't a lot of symbolic meaning to these either, though.

> There are no appliances—no oven, microwave, toaster, mixer, washer, or vacuum cleaner.

A lot of those will just look like white boxes at text size, and -- again -- they don't have a lot of symbolic meaning.

You might be able to make a case for an upright vacuum cleaner, though, since that's visually distinctive and is associated with cleaning.

> There's no traffic cone.

Oh, I like that idea. It's got some symbolic meanings, too, like "warning" and "under construction". There is already a construction sign (U+1F6A7), though.

Doxin said 5 days ago:

To be fair, a lot of existing Unicode code points don't really have a semantic meaning either. Eggplant only got a semantic meaning after becoming a part of Unicode.

Symbiote said 5 days ago:

It means aubergine, doesn't it? At least originally. It's distinctive.

For cleaning, a broom is probably better than a vacuum cleaner.

A broom already exists 🧹

majewsky said 5 days ago:

"Aubergine" and "eggplant" refer to the same vegetable. The "semantic meaning" is probably the NSFW one.

Symbiote said 5 days ago:

ᚼᛒ

The Bluetooth logo is two runes, which perhaps could have a zero width joiner between them in iOS emoji abuse style.

joosters said 5 days ago:

I guess the power socket might prove contentious, since they look very different across countries... power plugs, although variable, can be identified more easily since they look more alike and have a power cord attached.

The traffic cone would be good for VLC! They could rename their app so the name matches the icon :-) That got me thinking: are there any common apps that use emojis in their name?

colejohnson66 said 5 days ago:

Rename VLC to “TRAFFIC CONE GLYPH”?

sjwright said 5 days ago:

A dim/dark light bulb to complement the bright light bulb.

The mouse pointer symbol.

More than one breed of dog.

MayeulC said 4 days ago:

I miss 7-segment displays. It could be 8 Unicode characters, for instance (empty, plus one per segment), or some other scheme.
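To make the 7-segment idea concrete, here is a minimal sketch of one possible scheme, assuming a base "blank" character followed by one combining mark per lit segment. The code points are placeholders borrowed from the Private Use Area (U+E000 to U+E007), not real or proposed assignments, and `encode`/`encode_digit` are hypothetical helpers. Eight code points are enough to spell all 2^7 = 128 segment patterns.

```python
# Hypothetical scheme: one base character plus one combining mark per lit segment.
# Private Use Area code points serve purely as placeholders.
BASE = 0xE000
SEGMENTS = "abcdefg"  # conventional names: a = top, then clockwise to f; g = middle

DIGITS = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}


def encode(segments_lit: str) -> str:
    """Base glyph followed by one placeholder mark per lit segment, in a fixed order."""
    marks = sorted(BASE + 1 + SEGMENTS.index(s) for s in set(segments_lit))
    return chr(BASE) + "".join(chr(m) for m in marks)


def encode_digit(d: int) -> str:
    return encode(DIGITS[d])


print(2 ** len(SEGMENTS))    # 128 distinct patterns from 8 code points
print(len(encode_digit(8)))  # 8: the base plus all seven segment marks
```

Sorting the marks gives each pattern a single canonical spelling, much as Unicode normalization orders combining marks.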

A hiking/backpack emoji would be useful at times, but then a lot of other activities could make it in: Rollerblades, music instruments, etc.

One pet peeve of mine is the "person shrugging" [1] emoji looking identical to "woman shrugging" [2] with most fonts. I prefer to use genderless emoji, but that does NOT look genderless. I also dislike skin color selectors, but again, that's personal preference.

[1] https://emojipedia.org/shrug/

[2] https://emojipedia.org/woman-shrugging/

slavik81 said 5 days ago:

There are a few sets of punctuation marks that look quite interesting. Wikipedia currently uses SVGs to represent the characters. https://en.wikipedia.org/wiki/Punctuation#%22Love_point%22_a...

dang said 5 days ago:
oska said 5 days ago:

Input into a decision like this might be one of the best approximations of achieving immortality.

peterburkimsher said 5 days ago:

That post inspired me to submit some Unicode characters as well! I found 11 Hakka & Taiwanese characters when trying to scrape & parse the Bible as plain-text, and wrote a blog post about my experience of the submission process.

https://news.ycombinator.com/item?id=17968110

kensai said 5 days ago:

Great work, but we are still missing a "tinfoil hat" symbol in Unicode (or as Emoji). This is a major letdown! :D

ericfrederich said 5 days ago:

Even the existing repurposed one doesn't work on my up-to-date Linux Mint installation.

http://www.fileformat.info/info/unicode/char/2B58/browsertes...

a3n said 5 days ago:

The "advisor" to the effort is doing interesting work on nano-grids, timely because of the PG&E shutoffs.

> We had the very generous help of Bruce Nordman, who was involved in the original IEEE 1621 standard.

http://nordman.lbl.gov/

kuharich said 5 days ago:
imhoguy said 5 days ago:

Is there a practical upper limit to Unicode's capacity? Something like IPv4's limits, with all its reservations and local/multicast quirks?

kijin said 5 days ago:

Unicode has 17 planes. Each plane has 65,536 code points, so the total capacity is 1,114,112 code points. In practice it's a bit less, thanks to surrogates, private-use areas, and a bunch of "noncharacter" code points. That still leaves close to a million code points.

Last time I checked, just over 13% of the available public space was allocated. Most of the planes remain unused.
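The arithmetic behind these figures can be checked against the standard ranges: surrogates U+D800 to U+DFFF, the three Private Use Areas, and the 66 noncharacters (U+FDD0 to U+FDEF plus the last two code points of every plane). A minimal sketch:

```python
PLANES = 17
PLANE_SIZE = 0x10000
total = PLANES * PLANE_SIZE  # 1,114,112

surrogates = 0xDFFF - 0xD800 + 1  # 2,048, reserved for UTF-16 surrogate pairs
private_use = (
    (0xF8FF - 0xE000 + 1)          # BMP Private Use Area: 6,400
    + (0xFFFFD - 0xF0000 + 1)      # Supplementary PUA-A (plane 15): 65,534
    + (0x10FFFD - 0x100000 + 1)    # Supplementary PUA-B (plane 16): 65,534
)
noncharacters = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES  # 32 + 34 = 66

available = total - surrogates - private_use - noncharacters
print(f"{available:,}")  # 974,530
```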

detaro said 5 days ago:

And combinations are used, so e.g. a new emoji only takes zero to one new code point, not one code point per variation. (zero because if I remember right the emojis for families are just something like "woman + boy + girl + man", all existing characters, joined by Zero-Width Joiners)
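A small sketch of the zero-width-joiner mechanism, using the standard family sequence (man, woman, girl, boy). Every piece is an existing code point; whether it renders as one family glyph or as four separate people is up to the font.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Man, woman, girl, boy: the order used by the standard family emoji sequence.
members = ["\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466"]
family = ZWJ.join(members)

print(len(family))  # 7 code points: 4 people + 3 joiners, none of them new
print([f"U+{ord(c):04X}" for c in family])
```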

colejohnson66 said 5 days ago:

Why 17 and not a round number like 16? That would give a nice, “round” 2^20 (one mebi) code points.

kijin said 5 days ago:

BMP + 16 planes.

You can blame UTF-16 for this mess. Unicode was originally meant to be able to encode two billion (2^31) characters. It bent over backwards to accommodate the limits of the bastard child that is UTF-16.

https://en.wikipedia.org/wiki/Plane_(Unicode)
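The 17-plane ceiling falls out of UTF-16's surrogate pairs: a high surrogate carries 10 bits and a low surrogate another 10, so a pair can address 2^20 code points beyond the BMP, i.e. exactly 16 extra planes. A minimal encoder for illustration (`utf16_units` is a hypothetical helper, not a library function):

```python
def utf16_units(cp: int) -> list[int]:
    """Encode one code point as UTF-16 code units (a surrogate pair above the BMP)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogates are not encodable scalar values")
    if cp < 0x10000:
        return [cp]                  # BMP: one unit
    v = cp - 0x10000                 # 20 bits remain: 0x00000..0xFFFFF
    high = 0xD800 + (v >> 10)        # top 10 bits -> 1,024 high surrogates
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> 1,024 low surrogates
    return [high, low]


# 1,024 * 1,024 supplementary code points = 16 planes, plus the BMP = 17.
print((1024 * 1024) // 0x10000 + 1)  # 17
```

The largest value a pair can express, U+10FFFF, is exactly the end of plane 16, which is why the code space stops there.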

colejohnson66 said 4 days ago:

Maybe they did it because Windows was backwards and used UCS-2, and later UTF-16? If Windows somehow managed to switch to UTF-8, I’m sure they (Microsoft) would mess it up and keep the 4-byte limit (imposed by Unicode) there even if it’s later removed (for backwards compatibility). What Microsoft really needs to do, IMO, is rewrite the Windows API to use UTF-8 or UTF-32. Make a `wwchar` type or something...

kingludite said 5 days ago:

I really like the smiling poop and everything but [personally] I've only ever needed a feed icon⸮

blt said 5 days ago:

I have some good ideas for emojis, but I'm guessing that's even harder to get through...

phkahler said 5 days ago:

Poop is in there, and the love hotel.

kalleboo said 5 days ago:

Those have a compatibility story though

joshdance said 5 days ago:

Very cool. Never really considered how new characters were added. Thanks for sharing!

Havoc said 5 days ago:

Any idea why 4/5 don't display until I enable JavaScript?

unhammer said 5 days ago:

Wonderful :) Now can we please have this:?

https://www.xefer.com/2008/03/interrocolon

colejohnson66 said 5 days ago:

We already have the interrobang, no? So why not this?

deaps said 5 days ago:

>> ⏻ To The People!

That line at the end made me smile.

pdq said 6 days ago:

led, not lead.

dang said 5 days ago:

Fixed. Thanks! Those are surprisingly hard to spot sometimes.

pvg said 5 days ago:

Moderation software idea: globally replace lede/lead with led in any discussion about lede vs lead. Would bury the led once and for all.

notatoad said 6 days ago:

Is there a reason this is being posted now, or does this need a (2016) in the title?

CaliforniaKarl said 6 days ago:

It indeed was covered here at the time...

https://news.ycombinator.com/item?id=11958682

https://news.ycombinator.com/item?id=11952765

... so yes, as per convention it should have a (2016). The best way to get the message to the mods is via email (using the Contact link at the bottom of the page).

dang said 5 days ago:

I'm guessing it seemed like it would be even more awesome than https://news.ycombinator.com/item?id=21686643.

2016 added.

nsxwolf said 5 days ago:

What is a half character?

andai said 5 days ago:

For power off button they reused Heavy Circle.

grzm said 5 days ago:

From the article:

> "So, ⭘ is our ½ character "

LilBytes said 5 days ago:

Oh the irony of this comment not working. :)

nixpulvis said 5 days ago:

Works for me :P

s_gourichon said 5 days ago:

I can't tell if the rendering I see is correct or not. Can you describe the correct rendering and some incorrect yet probable one(s)?

oftenwrong said 5 days ago:

It is a "heavy circle".

This page may help: https://decodeunicode.org/en/u+02B58

saagarjha said 5 days ago:

A character is a character. You can't say it's only a half.

unwind said 5 days ago:

The symbol was already present, but it was given new documented semantic content ("meaning"). Previously it was just a circle, now it's also the symbol for "power off".

cyborgx7 said 5 days ago:

>You can't say it's only a half

The character is only half a character.

There, I just did.

stiray said 5 days ago:

I know a guy who fought hard with the Unicode Consortium to add two characters meaning "beginning of blob" and "end of blob". Imagine how much simpler coding would be without the need to escape anything. Unfortunately he didn't succeed. They were busier adding smileys.

bonoboTP said 5 days ago:

You'd have to escape those new symbols if they occur within the blob.

nixpulvis said 5 days ago:

Just add a `\` /s

joosters said 5 days ago:

That's too lazy, we need unicode symbols for escaped start-blob and escaped end-blob /s

majewsky said 5 days ago:

Not sure why /s. Also, you only need escaped-end-blob. start-blob can safely be used inside a blob. So it's only 3 new codepoints, which is genuinely an interesting proposal.

Dylan16807 said 5 days ago:

Because that doesn't round trip. You need to be able to distinguish whether the blob originally had "end-blob" or "escaped-end-blob". So now you need another character for double-escaping, and so on, and so on.

To avoid that issue you're back to either adding a backslash, or doubling up the character inside the blob... but if you're doing that you could have just used " all along. No need for new characters!
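A sketch of the "double the character inside the blob" approach, using `"` as the delimiter the way CSV does; `frame` and `unframe` are illustrative names. One caveat: two frames placed directly next to each other become ambiguous (`"a""b"` reads as a single blob), so real formats also put a separator between them, as CSV does with commas.

```python
QUOTE = '"'


def frame(blob: str) -> str:
    # Double every quote inside the blob, then wrap the result in single quotes.
    return QUOTE + blob.replace(QUOTE, QUOTE * 2) + QUOTE


def unframe(data: str) -> tuple[str, str]:
    """Parse one framed blob from the start of `data`; return (blob, remaining)."""
    if not data.startswith(QUOTE):
        raise ValueError("expected opening quote")
    out = []
    i = 1
    while i < len(data):
        ch = data[i]
        if ch == QUOTE:
            if i + 1 < len(data) and data[i + 1] == QUOTE:
                out.append(QUOTE)  # doubled quote = literal quote in the blob
                i += 2
                continue
            return "".join(out), data[i + 1:]  # lone quote = end of blob
        out.append(ch)
        i += 1
    raise ValueError("unterminated blob")


print(frame('say "hi"'))  # "say ""hi"""
```

Every blob round-trips, including ones that contain or end with the delimiter, and no new characters are needed.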

If you really don't want to change the contents of the blob, and can't length-prefix, then you could also use a new UUID as your delimiter each time you embed a blob.
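A minimal sketch of the fresh-delimiter idea, assuming text blobs and a newline after the opening delimiter; the layout and the `wrap`/`unwrap` names are assumptions, similar in spirit to MIME multipart boundaries or shell heredocs. Because the delimiter is checked against the blob before use, the first occurrence of it after the body has to be the real end marker.

```python
import uuid


def wrap(blob: str) -> str:
    # Pick a random delimiter; retry in the astronomically unlikely case it occurs in the blob.
    while True:
        delim = uuid.uuid4().hex
        if delim not in blob:
            break
    return f"{delim}\n{blob}\n{delim}"


def unwrap(wrapped: str) -> str:
    delim, rest = wrapped.split("\n", 1)
    # The delimiter is hex (no newlines) and absent from the blob, so the first
    # "\n" + delim must be the closing marker; the blob is left untouched.
    return rest[: rest.index("\n" + delim)]


print(unwrap(wrap("no escaping needed\neven across lines")))
```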

stiray said 5 days ago:

No. It depends only on developers. If they started embedding it in a format, they would be breaking the rules. If not, it would work.

Dylan16807 said 5 days ago:

If you're telling developers not to embed that character anywhere then you only need one end of blob character. So not 'no', I'm still right in saying that having two end characters is a non-solution.

And you still didn't explain why the existing ASCII control codes don't solve your problem. The suitable ones are also not supposed to appear inside text.

chungy said 5 days ago:

What's wrong with U+0001 and U+0004?

kps said 5 days ago:

They have different purposes. See ISO 1745 aka ECMA-16. Basically it goes:

  (SOH header STX body ETX)⁺ EOT

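A simplified sketch of the framing kps describes, (SOH header STX body ETX)⁺ EOT, using the ASCII control codes. It deliberately omits things a real ISO 1745 link would add (such as block check characters), and `build`/`parse` are hypothetical helpers. It also shows the limitation raised elsewhere in the thread: it only works if headers and bodies never contain these control characters.

```python
SOH, STX, ETX, EOT = "\x01", "\x02", "\x03", "\x04"


def build(blocks):
    """blocks: iterable of (header, body) pairs -> (SOH header STX body ETX)+ EOT."""
    return "".join(f"{SOH}{h}{STX}{b}{ETX}" for h, b in blocks) + EOT


def parse(msg):
    if not msg.endswith(EOT):
        raise ValueError("missing EOT")
    blocks = []
    # Everything before the first SOH is empty, so skip it.
    for chunk in msg[:-1].split(SOH)[1:]:
        header, _, rest = chunk.partition(STX)
        body, _, _ = rest.partition(ETX)
        blocks.append((header, body))
    return blocks


print(parse(build([("to:a", "hello"), ("to:b", "world")])))
```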
Dylan16807 said 5 days ago:

Or two private use characters, if you want to be ASCII-clean or 8-bit-ish-clean.

gregmac said 5 days ago:

It's an interesting idea, but you still wouldn't be able to encode the byte sequence for "end of blob" as part of the blob data... without escaping it.

stiray said 5 days ago:

They shouldn't occur in a blob, since they would be designated just for this purpose. All other characters can occur in the real world; those two could only get there by mistake.

viraptor said 5 days ago:

What about a blob containing a page which contains blobs itself?

bonoboTP said 5 days ago:

What if I want to put them in a comment like this, to discuss them? Well, you might say I shouldn't do it and should just refer to their code points, but basically that's a form of escaping as well.

wruza said 5 days ago:

How much blob is blob? If it is binary, there is always a chance of a clash, no matter which bytes you pick as markers. If it is always-valid UTF-8, then use 11111111 and 11111110 (0xFF and 0xFE, which never occur in UTF-8; or any other out-of-band markers available for the source encoding) to open and close your message. Not sure what the point of it is, though.
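A sketch of the out-of-band-marker idea: the bytes 0xFF and 0xFE never occur in well-formed UTF-8, so they can delimit UTF-8 messages without any escaping. `frame` and `unframe_all` are illustrative names, and the approach does not help for arbitrary binary blobs.

```python
OPEN, CLOSE = b"\xff", b"\xfe"  # bytes that never occur in well-formed UTF-8


def frame(text: str) -> bytes:
    return OPEN + text.encode("utf-8") + CLOSE


def unframe_all(stream: bytes) -> list[str]:
    """Split a byte stream of back-to-back frames back into strings."""
    messages = []
    pos = 0
    while pos < len(stream):
        if stream[pos:pos + 1] != OPEN:
            raise ValueError(f"expected 0xFF at offset {pos}")
        end = stream.index(CLOSE, pos + 1)  # safe: 0xFE cannot appear inside UTF-8
        messages.append(stream[pos + 1:end].decode("utf-8"))
        pos = end + 1
    return messages


print(unframe_all(frame("héllo") + frame("⏻ 🧡")))
```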

rtpg said 5 days ago:

but what if you need to send a word document containing an explanation about how to use the blob characters?

Turtles all the way down....

adrianmonk said 5 days ago:

Are we talking putting a sequence of 8-bit bytes (octets) into Unicode characters by mapping all 256 possible byte values to the first 256 Unicode code points?

If so, if I'm not mistaken, this is actually less efficient than just using Base64 encoding.

When you put those Unicode characters into UTF-8, the first 128 code points are going to require one byte (with a leading 0 bit). The other 128 of them are going to require two bytes. So that's 50% overhead (assuming the blob's bytes are evenly distributed) because half of the values have 0% overhead and the other half have 100% overhead.

Meanwhile, Base64 sticks 6 bits in each encoded character. In 4 characters, you can fit 3 bytes of your raw info. So that's only 33% overhead.
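A quick check of both overhead figures, using a blob that contains every byte value equally often (the even-distribution assumption above). The blob length is a multiple of 3 so Base64 needs no padding:

```python
import base64

raw = bytes(range(256)) * 3  # 768 bytes, every byte value equally often

# Map each byte to the code point with the same value, then encode as UTF-8:
# 0x00-0x7F take one byte each, 0x80-0xFF take two.
as_utf8 = "".join(chr(b) for b in raw).encode("utf-8")
as_b64 = base64.b64encode(raw)

print(len(as_utf8) / len(raw))  # 1.5 -> 50% overhead
print(len(as_b64) / len(raw))   # 1.3333333333333333 -> 33% overhead
```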

kps said 5 days ago:

Also, it was done already. See ISO 2111 aka ECMA-24.