2015-09-28

This ex-libertarian endorses Sanders

I have less than billionaires, or even the wealthy millionaires, but more than most people. What I have, I built "myself": this is to say, with much personal effort; but also with critical, non-negligible components of luck, and considerable help and work done by the right other people. My own personal work has been indispensable – but on its own, it wouldn't have been enough.

From my little perch, it seems to me that claiming one's little empire and yelling "I built all of this myself!" is nothing short of hypocrisy and egocentrism. There's no way one person builds an empire. Hundreds, or even thousands of people build it. The leadership provided by one person or several is critical, but you are not yourself building the empire. Other people are building it for you, with your partial guidance.

For there to be thousands of people who can help you build your empire, there has to be infrastructure before your business even starts. There have to be schools for your future employees to be educated. There have to be roads and telecommunications. There have to be hospitals and doctors. There has to be peace and order. There has to be a community.

Standing on top of this empire, and yelling to the world how "I built all of this myself! I want all of the rewards for me, and only me! I am an island! Any taxation is stealing!" seems to me so utterly... blind; hypocritical; narrow-minded; self-absorbed; and oblivious to other people's contributions.

So, yeah. I definitely believe that US politics would benefit a lot from moving in the direction of Sanders.

Full disclosure:

I'm originally from a high-tax European country, where taxation and bureaucracy felt oppressive. I was libertarian, and moved to Costa Rica, where I currently benefit from no taxation on foreign income. I have spent years in multiple countries where I saw the results of low taxation and low public investment. I'm now applying to move to the US, and if accepted, I expect to pay significant income tax there, in exchange for a better lifestyle, mostly due to those taxes.

I used to be libertarian before I succeeded (relatively speaking: I can't afford a jet, or anything like that). Now that I have – it seems fairly obvious how much of other people's work it takes. Not just to succeed; but to have an environment in which it is worth succeeding.

2015-09-11

When monospace fonts aren't: The Unicode character width nightmare

Some things haven't changed since the 1970s. Programming is still done in text files; and though we have syntax highlighting and code completion, source code is still best displayed in monospace.

Other aspects of computing also work best with monospace: the Unix shells, PowerShell, the Windows Command Prompt. Email is still sent with a copy in plaintext, which has to be wrapped on a monospace boundary. Not least, this persists because HTML email is excessively difficult to render securely, and there are user agents that still work better with plaintext.

In all of these situations, the originator has to anticipate in advance how the text will be rendered. You cannot just send text and expect the recipient to flow it. You have to predict the effects of Tab characters correctly, and word wrap the text in advance, often without knowing what software will be used for display. In terminal emulation, e.g. xterm via SSH, when the server sends the client a character to render, the server and the client need to agree on how many positions to advance the cursor. If they disagree, the whole screen can become corrupted.
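
As a rough illustration of what "predicting the effects of Tab characters" involves, here is a minimal sketch of column tracking. The function names and the tab size of 8 are assumptions for the example, and it deliberately assumes that every non-Tab character advances the cursor by exactly one position, which only holds in the simple cases.

```python
def advance(column, ch, tab_size=8):
    """Return the cursor column after rendering one character.

    Assumes one column per non-Tab character (an illustrative
    simplification) and tab stops every `tab_size` columns.
    """
    if ch == "\t":
        # Jump to the next tab stop strictly after the current column.
        return (column // tab_size + 1) * tab_size
    return column + 1


def end_column(text, tab_size=8):
    """Return the column the cursor ends up in after rendering `text`."""
    column = 0
    for ch in text:
        column = advance(column, ch, tab_size)
    return column


print(end_column("ab\tc"))  # 9: "ab", then a jump to column 8, then "c"
```

If the sender and the receiver disagree about the width of any character before the Tab, they also disagree about where the Tab lands, so one wrong width can shift the rest of the line.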

As long as you stick to precomposed Unicode characters, and Western scripts, things are relatively straightforward. Whether it's A or Å, S or Š – so long as there are no combining marks, you can count a single Unicode code point as one character width. So the following works:
	aeioucsz
	áéíóúčšž
Nice and neat, right?
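
A small sketch of why the "no combining marks" condition matters, using Python's standard unicodedata module. The helper names are made up for the example, and skipping characters with a non-zero combining class is only an approximation: it does not catch every zero-width character.

```python
import unicodedata

precomposed = "\u00e1"   # á as a single code point
decomposed = "a\u0301"   # a followed by COMBINING ACUTE ACCENT


def naive_width(text):
    """Count one column per code point."""
    return len(text)


def width_skipping_combining(text):
    """Count one column per code point, ignoring combining marks."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(naive_width(precomposed))             # 1
print(naive_width(decomposed))              # 2, although it renders as one character
print(width_skipping_combining(decomposed)) # 1

# Normalizing to NFC recomposes the pair into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```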

Unfortunately, problems appear with Asian characters. When displayed in monospace, many Asian characters occupy two character widths. How do we know which ones?

Our problems would be solved if the Unicode standard included this information. Unfortunately – as far as I can tell – the Unicode Consortium takes the stance that display issues are completely the renderer's problem, and makes no effort to include information about monospace character widths. (Edit – incorrect: see update below.)

If you're on Unix, you may have access to wcwidth. However: "This function was removed from the final ISO/IEC 9899:1990/Amendment 1:1995 (E), and the return value for a non-printable wide character is not specified." What this means is that the results of wcwidth are system-specific.

In 2007, Markus Kuhn implemented a generic version of wcwidth, which we now use in the graphical SSH terminal console in Bitvise SSH Client. However, this is more than 8 years old at this point, and is based on Unicode 5.0, whereas the current version is 8.0.

So I had the idea that maybe we could "just" extract up-to-date information from Windows. It's 2015, the following should render well, right?
	aeioucsz
	áéíóúčšž
	台北1234		(leading characters should be 2 spaces each)
	abcdefgh
	ＱＲＳ12		(fullwidth latin; should be 2 spaces each)
	abcdefgh
	ｱｲｳ1234		(halfwidth kana; should be 1 space each)
	abcdefgh
It turns out – no. Perhaps you have an operating system with proper monospace fonts, which displays all of the above lined up. On my Windows 8.1, the problem looks like this:

[Screenshots: IE, Chrome, Firefox, Notepad, VS 2015]

Note how nothing lines up: not in Internet Explorer; not in Chrome; not in Firefox; not in Notepad; not in the latest version of Visual Studio – the environment in which Windows is developed (Edit: apparently not - see comments). Half-width kana are displayed kinda correctly by the Consolas font used in Notepad and Visual Studio; but that's it.

It turns out, when locale is set to English (United States), Windows just doesn't seem to use monospace fonts for Asian characters. Indeed, setting the Windows locale to Chinese (Simplified) produces this:


This is better; but now, the half-width kana are borked. sigh

Note that the above isn't a Windows problem only. This is how the same text displays on Android:


It boggles my mind that it's 2015, and we still don't have a single, authoritative answer to this question: how many character positions should each Unicode character occupy in a monospace font?

Discussion

Because I'm providing examples of incorrect character rendering, this may offer the misleading impression that this is just a font problem.

This isn't just a font problem. It's that there's no standard monospace character width information, independent of font used.

The above incorrect renderings involve systems using non-monospace fallback fonts. However:
  • Even if you only have a fallback font that's not mono, you can coerce it into the right character positions if you know the character widths. The above examples could work correctly – although the renderings might be less than perfect – if software knew the intended character widths.
  • Even if you do not have a fallback font, and are just displaying placeholder boxes – you still need to know character widths to render the rest of the text properly, and for Tab characters to work.
Operating systems could work around this problem by providing better font support. We now have terabyte hard drives, so there's no reason all cultures shouldn't be simultaneously supported. However, that still leaves the underlying issue – that we need standardized monospace character widths.

Update and additional information

It turns out that Unicode does in fact provide character width information for East-Asian characters. It's just not as neat as one number. When is it ever? :)

The information is in EastAsianWidth.txt, which is part of the Unicode character database. The data provides an East_Asian_Width property, which is explained in Unicode Standard Annex #11, "East Asian Width".
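
For a quick look at this property without parsing EastAsianWidth.txt by hand, Python's standard unicodedata module exposes it as east_asian_width. The sample characters below are chosen for illustration, and the values come from whatever Unicode version the Python interpreter bundles.

```python
import unicodedata

samples = [
    "A",        # U+0041, plain ASCII letter
    "\u53f0",   # 台, CJK ideograph
    "\uff31",   # Ｑ, fullwidth Latin capital Q
    "\uff71",   # ｱ, halfwidth katakana A
    "\u00f7",   # ÷, division sign
    "\u20a9",   # ₩, Won sign
]

widths = {ch: unicodedata.east_asian_width(ch) for ch in samples}

for ch, prop in widths.items():
    print(f"U+{ord(ch):04X} {prop}")
# U+0041 Na
# U+53F0 W
# U+FF31 F
# U+FF71 H
# U+00F7 A
# U+20A9 H
```

The property values are Na (narrow), W (wide), F (fullwidth), H (halfwidth), A (ambiguous) and N (neutral).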

This is basically what is needed... with some unfortunate limitations:
  • Hundreds of characters are categorized as ambiguous width (property value A). These characters include anything from U+00A1 (inverted exclamation mark, ¡) to U+2010 (hyphen, ‐) to U+FFFD (replacement character, �). Many of these characters (but not all!) have different widths depending on system locale. For example, U+00F7 (division sign, ÷) has a width of 1 on Windows under English (United States), but a width of 2 under Chinese (Simplified, China).
  • In some cases, width can differ even between different fonts under the same locale. For example, on Windows under Chinese (Simplified, China), U+FFFD (replacement character) renders as narrow (1 position) with a raster font, and wide (2 positions) as TrueType.
  • Some characters categorized as one width are still displayed as another width by certain systems. For example, U+20A9 (Won sign, ₩) has width property value H (half-width), but is displayed as wide (two positions) by Windows under locale Chinese (Simplified, China). It is displayed as narrow under locale English (United States).
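
One way to live with these limitations is to make the ambiguous case an explicit parameter. The following is a minimal sketch, not a replacement for wcwidth: the function name and the ambiguous_wide flag are invented for the example, it ignores control characters, and it cannot model cases where a system renders a character at a width that contradicts its property value.

```python
import unicodedata


def column_width(text, ambiguous_wide=False):
    """Estimate how many monospace columns `text` occupies.

    W and F count as 2 columns. A counts as 2 only when
    `ambiguous_wide` is set (e.g. to mimic an East Asian locale).
    Combining marks count as 0. Everything else counts as 1.
    """
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining mark: drawn over the previous character
        prop = unicodedata.east_asian_width(ch)
        if prop in ("W", "F"):
            total += 2
        elif prop == "A" and ambiguous_wide:
            total += 2
        else:
            total += 1
    return total


print(column_width("\u53f0\u53171234"))             # 8: 台北1234
print(column_width("\uff31\uff32\uff3312"))         # 8: ＱＲＳ12
print(column_width("\u00f7"))                       # 1
print(column_width("\u00f7", ambiguous_wide=True))  # 2
```

The caller still has to decide which setting matches the receiving system, which is exactly the information that is not available in advance.
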
There are also scripts like Devanagari that just don't seem to have a monospace representation. I was unable to get Windows to display Devanagari characters in console. They do display in Notepad, but they don't obey any kind of monospace font rules, at all.

There are other efforts to provide information on character widths, including the utf8proc library that's part of Julia. Interestingly, this library derives its information by extracting it from Unifont. Unifont, in turn, is an impressive open source Unicode font with a huge coverage of characters.

2015-09-03

C++ Relocator proposal

Last month, I spent two weeks working on the following formal proposal for a new C++ feature:

Relocator: Efficiently moving objects

After incorporating much feedback in the C++ Proposals forum, I believe this proposal represents not only my ideas; but close to a consensus of everyone who expressed interest in this feature. I believe the document is fairly polished. I have submitted it as proposal P0023 via Ville Voutilainen, chair of the Evolution section of the ISO C++ working group (WG21).

It so happens that Ville is also the person who most vocally disagreed with my observations last month about problems in C++ standardization – to the extent of us colliding in a somewhat fiery altercation:

Ossification and Hawaii: Impressions of a C++ working group

When I submitted this proposal, Ville reiterated his position that I need to find a champion to represent it in the next WG21 meeting in October in Hawaii. This is how the Working Group goes about its work.

So far – despite considerable positive feedback – no one has volunteered to actually go to Hawaii for this. It seems most likely that I, also, will not be in a position to make the trip.

I think this will provide a data point about whether, and to what extent, ISO C++ has a problem.

I make the following prediction:

If I don't go to Hawaii; and no one else goes to champion this proposal; then it will not even be considered, despite support and interest in the C++ Proposals forum, and relative lack of opposition.

I'm not saying that the proposal should be accepted. I'm saying: if no one goes to Hawaii for it, it will not even be considered. It will be as though the work was never done; and the proposal never existed.

And if this happens, as I think it will, then, unfortunately, it will reinforce the idea that decent proposals, with decent support, are being lost, simply due to the nature of the standardization process.

Acetaminophen (paracetamol, Tylenol) more toxic than thought

Study:
Real-time monitoring of oxygen uptake in hepatic bioreactor shows CYP450-independent mitochondrial toxicity of acetaminophen and amiodarone

Prediction of drug-induced toxicity is complicated by the failure of animal models to extrapolate human response, especially during assessment of repeated dose toxicity for cosmetic or chronic drug treatments. [...] Importantly, exposure to widely used analgesic, acetaminophen, caused an immediate, reversible, dose-dependent loss of oxygen uptake followed by a slow, irreversible, dose-independent death, with a TC50 of 12.3 mM. Transient loss of mitochondrial respiration was also detected below the threshold of acetaminophen toxicity.
I posted previously about a paper suggesting that acetaminophen may cause autism and ADHD in vulnerable children.