The doghouse: Vagaries of the Excel COM interface

Wow. It's been a while since I've seen a more fragrant turd than the COM interface exposed by Excel which would, theoretically, allow you to interact with workbooks from e.g. PowerShell.

Theoretically is the key word, I'm sad to say. In practice, it's so buggy and poorly designed that you might save more time if you did what you wanted by hand.

Let's start with the simplest possible script:
  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")

What does this do?

Each invocation of this script creates a copy of Excel which then just hangs around. Indefinitely.

If you just created the COM object, Excel would exit. But since you opened a workbook, it does not. Not even closing the PowerShell window exits these instances. You have to actually go and kill them using a task manager application.

Okay. Let's try modifying the script to close the workbook:

  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")
  $wb.Close($false)

Does this work?

Har har har. No. PowerShell complains that there is no such method.

But according to the docs, there's supposed to be a method named Close. Why is it not there? Hmmm.

Let's try this:

  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")
  Start-Sleep -Seconds 1
  $wb.Close($false)

Yeah. We have to wait unspecified amounts of time in unspecified places. This increases the chances that the script will succeed.

This gives you an impression of the type of quality and robustness we are dealing with here.

But the Excel processes are still hanging around. What else do we need to do for them to quit?

  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")

Ah! Now the processes are cleaning up. It's like a magic incantation: we have to tell Excel three times for it to leave.

Note that the $false parameter to Workbook.Close is necessary. Otherwise, Excel opens an interactive dialog asking if we want to save changes to the unmodified file, opened by script.

Non-performance, non-concurrency

I eventually got a script running that would process workbooks and extract information from specific columns. As I type this, the script is chugging along, at a leisurely pace of... 8 cells per second.

It's 2015; we have multi-core computers running at GHz speeds. We have gigabytes of RAM, and solid-state hard drives. And the architecture of PowerShell, combined with the Excel COM interface, allows me to extract information out of workbooks at about the speed of a dot-matrix printer in 1990.

I saw in Process Explorer that the Excel instance spawned by the script is using nearly 100% of a single core. Computers have multiple cores, so I thought I'd tweak the script to run several concurrent instances, and get the job done 4x or 8x faster.

Nope. After launching 5 concurrent instances, instead of a single Excel process consuming 11% total CPU, I had five Excel processes consuming 2% total CPU each. So not only is it slow; but there's also a global lock somewhere that prevents concurrency.


"auto" considered (often) harmful

Edit: Thanks to Drammon, Simon Brand, and Nicola Gigante for important corrections (see comments).

It's now in vogue to write C++ code like this:

  auto const& container = Function();
  for (auto const& element : container)
      auto const& member = element.AccessMember();

const& is necessary because auto strips references and const/volatile qualifiers. This is good: it is apparent when you're not copying a whole object, even if its type is hidden from view.
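To make the stripping concrete, here is a small sketch. Widget and GetWidget are hypothetical names invented for this example; the static_asserts only record what the compiler deduces in each case.

```cpp
#include <string>
#include <type_traits>

// Hypothetical type and accessor, used only for illustration.
struct Widget { std::string name; };

inline Widget const& GetWidget()
{
    static Widget const w { "gear" };
    return w;
}

// Plain auto drops the reference and the top-level const: this is a copy.
inline bool AutoCopies()
{
    auto copy = GetWidget();
    static_assert(std::is_same<decltype(copy), Widget>::value,
                  "auto deduced a value type");
    return &copy != &GetWidget();
}

// auto const& keeps the reference: no copy is made.
inline bool AutoConstRefAliases()
{
    auto const& ref = GetWidget();
    static_assert(std::is_same<decltype(ref), Widget const&>::value,
                  "auto const& deduced a const reference");
    return &ref == &GetWidget();
}
```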

But please don't do this (too much).

The value of strong typing in C++ is not only in ensuring consistency at compile-time, it's also to document what the code is doing.

The above snippet requires that the reader knows:
  1. what Function returns
  2. what container.begin() or begin(container) returns
  3. what element.AccessMember() returns
in order to just know that the code is doing this:
  Container const& container = Function();
  for (Container::Element const& element : container)
      Member const& member = element.AccessMember();

This reduces readability of the code. By using auto this way, you're throwing away an important self-documenting property of the language.

There's an argument that this improves maintainability because if the return type of Function changes to a type that behaves the same, you need to make fewer code changes.

But maintainability is not just about reducing code changes; it's about ensuring their correctness. A developer needs to understand the changes they are making, and that a change is propagated correctly throughout the program. auto makes code harder to understand, and hides places affected by changes.

auto is definitely needed with lambdas:
  auto lambda = [&] (Seq x) -> bool { ... };

In this case, auto is the right thing to do – otherwise, you're wrapping the unnamed, compiler-specific raw lambda type into an std::function, and doubly declaring the function signature.
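A rough sketch of the difference, with int standing in for Seq so that the snippet is self-contained. That substitution and the names CountWithAuto and CountWithFunction are assumptions made for illustration.

```cpp
#include <functional>

// auto binds the lambda's own unnamed closure type: no wrapper, and the
// signature is written once.
inline int CountWithAuto(int limit)
{
    int calls = 0;
    auto lambda = [&] (int x) -> bool { ++calls; return x < limit; };
    lambda(1);
    lambda(2);
    return calls;
}

// std::function also works, but the signature now appears twice, and the
// closure is wrapped in a type-erased object.
inline int CountWithFunction(int limit)
{
    int calls = 0;
    std::function<bool (int)> lambda = [&] (int x) -> bool { ++calls; return x < limit; };
    lambda(1);
    lambda(2);
    return calls;
}
```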

If you find that the classes you are using require really obtuse syntax to use explicit types:
  std::map<unsigned int, std::string> const& container = Function();
  for (std::map<unsigned int, std::string>::value_type const& element : container)

... then maybe that's the fault of an unfriendly design of the library you are using. In cases like that, I much prefer that, instead of auto, we use a suitable type alias:
  using MyMap = std::map<unsigned int, std::string>;

  MyMap const& container = Function();
  for (MyMap::value_type const& element : container)


Our bitching and moaning: Why the Middle East cannot have peace

In a nutshell: Terrorism and wars in the Middle East are not about Islam vs. the West. Instead, there cannot be peace in the Middle East as long as major world powers – the US and its allies on the one hand, and Russia on the other – have strong and opposite preferences about the price of oil; and of course, as long as the price is controlled by who has more influence over large oil producing countries.

It's easy to see ourselves as enlightened, and to perceive Middle Easterners as these backwards people with fundamentalist beliefs, who can't stop fighting each other and us, regardless of our "noble" attempts to "free" them. We try to "liberate" them from the Taliban, Hussein, and al-Assad – and they attack us!

Except we have never intervened in the Middle East to liberate; or to help build anyone's peaceful country. We only intervene to stir the shit, but secular values require peace to come about. As we stir shit, unrest and extremism prosper. After World War II, the Middle East had perfectly good trends towards democracy and secularism, and might have been peaceful and enlightened today. But then we helped!

We could bring peace to the Middle East if we invested into 30 years of governing and rebuilding each country. But that would be colonialism. So instead we shoot things up and leave. Preferably, from the air.

We have our allies in the Middle East, one of which is Saudi Arabia. Just today, The Guardian wrote about how a Saudi court sentenced a poet to death for renouncing Islam. There is no shortage of this: just two months ago, they sentenced a young man to death by crucifixion for protesting the government.

Our allies in our war against terrorism. So much better than terrorists!

The explanation is not as clear-cut, or as favorable to us, as we would like. We are allies with Saudi Arabia because they help prop up the petrodollar; support our military presence in the Middle East; and are willing to sacrifice economically by helping the US and its allies. The price of gas at US pumps wouldn't be under $2 right now if the Saudis weren't pumping oil to sink the price, which costs them both in short-term revenue and long-term reserves. The official reason is to "protect global market share", which makes as little sense as it sounds. Demand for oil is inelastic, so the supply only needs to be reduced by 1-3% for a 10% increase in price. But they can't exactly come out and say it's due to US pressure, or that a big reason is to clip the wings of Putin - whose strength is backed by Russian oil and gas exports, and who has been invading countries with whom the West would have wanted to be allies.

Why does the US want to topple al-Assad – a secular, but brutal dictator? To free the people? Or because he's allied with Russia, and provides their only Mediterranean base in the Syrian port of Tartus?

It's a geopolitical game in which you can't always pick all your friends if you're invested in the outcome. Saudi Arabia decapitates protesters, but they want to be our friend? Saudi Arabia is our friend now.

This pragmatism does not extend just to Saudi Arabia. Consider the endemic rape of young boys in Afghanistan, where the US army silently condones allied local men using young boys as sex slaves.

Of course, instead of all this screwing over the Middle East, which does nothing to liberate anyone, or to bring peace to any country, we could also invest in nuclear energy and renewables, to wean our Western economies from oil dependency. But that would require either not so much panic against nuclear power – because terrorism and wars in the Middle East are not better, but most people don't connect the two – or a breakthrough in energy storage, so that renewables could sustain constant power supply.

For example, the Germans will panic against nuclear energy, and force a plan to shut down all their nuclear plants. But they need energy from somewhere, and the Middle East controls the price of this energy. So then the entire Middle East is at war, and Germany has to accept Syrian refugees. And then they moan about the state of the world, and blame Americans, but fail to see that the refugees they're taking in are a not-so-indirect consequence of their opposition to nuclear power.

On the other side of the Atlantic, Americans will bitch and moan about nuclear plants requiring so much government investment, because Americans are individualist and everything must be privately run. But then they also blame Obama if the price at the gas pump is high. So Obama, like George W. before him, brings the full might of the US military to bear on the Middle East, and fixes global oil prices. And then Americans bitch and moan about the expense of paying for all this military.

We can have peace in the world. But first, we must come to terms with our want to push problems away. The attitudes "not with my tax dollars", and "not in my backyard", may seem to provide short-term relief. But in the long run, they only make big problems bigger. Not least our problem with climate – which has helped create the Syrian crisis.


The advantages of Seq, and the demerits of std::string const&

A few weeks ago, I was reviewing some code, and found something similar to this:
  std::string host = ...;

  int port = 23;
  size_t pos = host.find(":");
  if (pos != host.npos)
  {
      port = atoi(host.substr(pos + 1).c_str());
      host = host.substr(0, pos);
  }

Looks reasonable enough. This code parses a parameter in “host:port” format. It's perfectly decent where it appears. It will run rarely, once per process invocation, so performance is irrelevant. It’s an acceptable way to achieve what’s intended.

But suppose this code was in a tight inner loop. Suppose it was part of a parser that needs to digest hundreds of megabytes of data, and performance is relevant.

In that case, this code does two suboptimal things:
  • A heap allocation to copy the “port” portion of the string into. Why not just read the number from the original string?
  • Another heap allocation to copy the “host” portion of the string. Again, why not just read from the original string?

Seq as an improvement over std::string const&

There are two purposes for which C++ programmers commonly use strings:
  1. To store and own character content. This is what string and wstring do.
  2. To pass character content without passing ownership. This is what string const& does.
I argue that passing string const& is almost always a mistake. It constrains the string provider in ways that aren't necessary for the consumer to read the string: the provider must hold the content in an actual string object, even when it only wants to pass a substring. All you really need is to pass a pointer and a length. You need a lightweight Seq object.

A Seq object, essentially, is this:
  struct Seq
  {
      byte const* p { nullptr };
      size_t      n { 0 };
  };

In practice, a useful Seq implementation will also contain numerous methods that a user can use to read from the Seq. My implementation has these, among others:
  struct Seq
  {
      uint   ReadByte             (...)
      uint   ReadHexEncodedByte   (...)
      uint   ReadUtf8Char         (...)
      Seq    ReadBytes            (...)
      Seq    ReadUtf8_MaxBytes    (...)
      Seq    ReadUtf8_MaxChars    (...)
      Seq    ReadToByte           (...)
      Seq    ReadToFirstOf        (...)
      Seq    ReadToFirstOfType    (...)
      Seq    ReadToFirstNotOf     (...)
      Seq    ReadToFirstNotOfType (...)
      Seq    ReadToString         (...)
      Seq    ReadLeadingNewLine   (...)
      uint64 ReadNrUInt64         (...)
      int64  ReadNrSInt64         (...)
      uint32 ReadNrUInt32         (...)
      uint16 ReadNrUInt16         (...)
      byte   ReadNrByte           (...)
      uint64 ReadNrUInt64Dec      (...)
      int64  ReadNrSInt64Dec      (...)
      uint32 ReadNrUInt32Dec      (...)
      uint16 ReadNrUInt16Dec      (...)
      byte   ReadNrByteDec        (...)
      double ReadDouble           (...)
      Time   ReadIsoStyleTimeStr  (...)
  };

You get the idea. All the basic primitives you'd need to read character content belong in Seq.

The basic benefit of Seq is that it's lightweight, containing only a pointer and a length, and can point not just to a whole string, but also a substring. It does not require unnecessary functionality, like a whole string object, just to pass a sequence of characters without ownership.

A second, and even more central, benefit is that it serves as a focal point for a powerful set of string reading methods that leverage each other, allowing for both elegant and efficient string reading.

Using Seq, the earlier "host:port" example can be rewritten like this:
  Seq hostPort = ...;

  Seq host = hostPort.ReadToByte(':');
  uint32 port = 23;
  if (hostPort.n)
      port = hostPort.Drop(1).ReadNrUInt32Dec();

This is not more complex than the string version. Yet this version does its task without unnecessary heap allocations, and would be much more efficient if implemented where performance matters.
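To show that the read methods need not be complicated, here is a minimal sketch of a Seq with just enough to run the example above. This is not the implementation the text refers to: the method bodies, the use of char instead of byte, the Str helper, and the ParseHostPort wrapper are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// A non-owning pointer and length, plus three of the read methods named in
// the text. The bodies are illustrative guesses at their behavior.
struct Seq
{
    char const* p { nullptr };
    std::size_t n { 0 };

    Seq() = default;
    Seq(char const* ptr, std::size_t len) : p(ptr), n(len) {}
    Seq(char const* z) : p(z), n(std::strlen(z)) {}

    // Consumes bytes up to, but not including, the first occurrence of c.
    // Returns the consumed part; if c is absent, consumes everything.
    Seq ReadToByte(char c)
    {
        std::size_t i = 0;
        while (i != n && p[i] != c)
            ++i;
        Seq read { p, i };
        p += i;
        n -= i;
        return read;
    }

    // Skips up to count bytes and returns *this so that calls can be chained.
    Seq& Drop(std::size_t count)
    {
        if (count > n)
            count = n;
        p += count;
        n -= count;
        return *this;
    }

    // Consumes leading decimal digits and returns their value.
    // Overflow handling is omitted to keep the sketch short.
    std::uint32_t ReadNrUInt32Dec()
    {
        std::uint32_t v = 0;
        while (n != 0 && *p >= '0' && *p <= '9')
        {
            v = v * 10 + static_cast<std::uint32_t>(*p - '0');
            ++p;
            --n;
        }
        return v;
    }

    // Copies the referenced bytes into an owning string.
    std::string Str() const { return std::string(p, n); }
};

struct HostPort { std::string host; std::uint32_t port; };

// The "host:port" parse from the text. The only allocation is the final
// std::string copy of the host, made so that the result can be inspected.
inline HostPort ParseHostPort(Seq hostPort, std::uint32_t defaultPort = 23)
{
    Seq host = hostPort.ReadToByte(':');
    std::uint32_t port = defaultPort;
    if (hostPort.n)
        port = hostPort.Drop(1).ReadNrUInt32Dec();
    return { host.Str(), port };
}
```

Under these assumptions, ParseHostPort("example.com:8022") yields the host example.com with port 8022, and ParseHostPort("localhost") falls back to port 23.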

So, it's like std::string_view?

The proposed C++ extension std::string_view implements a similar concept. Main differences:
  • std::string_view is mainly the lightweight reference. It lacks a powerful library of string reading methods. Seq, as in the example above, shows emphasis on stream-like reading. Read methods consume part of Seq and return the part that was read as another Seq. A fully useful Seq implementation covers the basic primitives of string reading in an elegant way.
  • std::string_view is an std::long_inconvenient_name. However, this is understandable given a standard library designed by dark warlocks whose mystical powers derive from conjuring, and causing the world to use, long inconvenient names. :)
I emphasize the use of Seq as a default for string passing and reading, not a special case. This is encouraged by giving it a practical name, and building a library of string reading methods around it.

std::string_view could do the same, but it needs more power than just remove_prefix and remove_suffix.
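For comparison, here is roughly how the same "host:port" parse might look with std::string_view as it was later standardized in C++17. It's a sketch, not a recommendation. HostPortView and ParseHostPortSv are hypothetical names, and std::from_chars stands in for the number-reading method that string_view itself lacks.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string_view>

// The host member refers to the caller's buffer; nothing is copied.
struct HostPortView { std::string_view host; std::uint32_t port; };

inline HostPortView ParseHostPortSv(std::string_view hostPort, std::uint32_t defaultPort = 23)
{
    // Split at the first ':' by hand. string_view has find and substr, but
    // no read-and-consume primitive.
    std::size_t const pos = hostPort.find(':');
    std::string_view host = hostPort.substr(0, pos);   // pos == npos yields the whole view
    std::uint32_t port = defaultPort;
    if (pos != std::string_view::npos)
    {
        hostPort.remove_prefix(pos + 1);
        // from_chars leaves port untouched if no digits are found.
        std::from_chars(hostPort.data(), hostPort.data() + hostPort.size(), port);
    }
    return { host, port };
}
```

It works without allocating, but the find, substr and remove_prefix bookkeeping is written out by hand at the call site rather than wrapped in a read method.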


Can we stop with this idiocy of private courts?

There are smart people out there – people in many ways a lot like me, i.e. borderline idiot savants – who are attracted to the idea that the world needs to be saved through some kind of easy, adversarial revolution; rather than through a huge amount of incremental and cooperative effort. I suspect this is because cooperation seems boring; imposing one's will on others with violence seems fun; and plain old effort is hard and boring. Frequently, these are white middle-class Americans who do not recognize just how damn good they're having it, and how much worse things are in many other parts of the world.

Not in all parts for all things, of course. There are specific things that are genuinely better in other places. But overall, things are pretty great in the US for the middle and upper class. And yet, some of these same people can't stop going on about how awful everything is; and how everything could be much better if we just overthrew the entire system, and replaced it with something completely different. Like, for example: Let us fix real problems in our justice system by replacing it with private courts!

This is all based on the libertarian delusion, the foundation of which is to pretend that some obvious facts of life do not exist, and then to come up with solutions that might work in a fictional world that conforms to those assumptions. In this way, libertarianism and communism are the same mistake expressed in different ways. Both are ideologies that make bullheaded assumptions – about man, about the world – and then try to shoehorn people into them.

The fact of life that people are ignoring here is that not everyone has equal power. If we are to measure power and influence, some people have not just thousands, but millions of times more than others.

Private courts are essentially arbitration. Arbitration can work for equally powerful parties. But that's the only situation where it works.

What happens in arbitration is that, if the bigger party has equal choice in what arbitrators they're willing to deal with, arbitration overwhelmingly favors the bigger party, because the bigger party controls a much bigger chunk of who gets all the arbitration business than the smaller parties.

Allowing large businesses to dictate terms of dispute resolution effectively prevents class action lawsuits, which are an important way to hold large businesses accountable over systemic abuse.

As the small party in arbitration, you have no choice. You either go with the corrupt arbitrators chosen by large corporations, where decisions always favor the large businesses; or you don't get service. If there are any payouts, they are such that it doesn't hurt the corporation, and they can continue systemic abuse as a business model, because paying out small amounts from time to time is cheaper.

You either get to agree to use their courts, or you don't get service. Good luck.


On Daylight Saving Time

Around this time of year, complaints pop up about the need to adjust clocks by an hour. I am usually one of the complainers, so I wondered: why do we still change clocks to DST and back, when apparently, most people hate it? If most of us prefer a single time throughout the year - what are the obstacles to making this happen?

After some research, I suspect the answer is:
  • We do in fact prefer more daylight during evenings in summer. Because of this, we might prefer permanent DST.
  • However, we can only borrow the evening daylight from mornings during summer. In winter, there's no light to borrow, unless we want mornings to be black.
In northern places, like Seattle, permanent DST would mean sunrise at 9 am at end of December and early January. Even as far south as Dallas, with permanent DST, sunrise in January would be at 8:30 am.

It appears that permanent DST would be great for those of us who get up late (more daylight!); but it would make winter mornings dreary for people who need to be at work or school at 8 am or 9 am.

Conversely, permanent standard time would make summer evenings end early, and in exchange we'd get the sun waking us up at 4:15 am.

It follows that there is wisdom in the current arrangement, resulting from forces of nature. sigh

Interestingly, the British did try permanent DST from 1968 to 1971. Apparently, mornings were dreary. Though note that Britain is further north than most of the US.

Russia – even further north – also tried year-round DST in 2011, but moved to year-round standard time in 2014 after people wearied of dark winter mornings. And in summer, their sunsets are at 9 pm, anyway.


Modern-day pyramid building

A commonly understood truth is that consumer spending is good for the economy. A common misunderstanding is that this is true for all forms of consumer spending: even $500 million superyachts; private jets; and the building and maintenance of luxurious villas that most of the year, no one lives in.

A superyacht is absolutely no better than an Egyptian tomb. It is to build a huge and expensive thing for one person (and their close friends), instead of spending the time and resources building something more widely useful. To think this is better than a pyramid is to fool ourselves with economic sophistry. Anyone who looks at this from a 1,000 year perspective will consider us foolish.

To say that superyachts and private jets are "good for the economy" is equivalent to the broken window fallacy. Superyachts, jets, and villas that no one lives in, are extremely expensive broken windows.

Progressive consumption tax

I do not suggest defining some arbitrary categories of products as "luxury goods", and taxing them out of existence. We shouldn't be in the business of defining what is luxury and what isn't.

Instead, I suggest a progressive personal consumption tax, where personal spending is taxed at 0% up to some threshold of $X per year, then 5%, then 10%, and so on. Optimally, a smooth formula should be used such that the tax percentage just keeps growing, indefinitely. The more you spend on personal consumption, the higher the consumption tax you pay.

The tax rate would automatically rise to some level resembling 900% if your annual personal consumption is in the hundreds of millions. It doesn't matter what you spend it on – one superyacht, one hundred supercars, or 1,000 paintings.

This does require spending to be differentiated as to whether it's personal spending, or an investment, or a business expense. However, this does not invent new complexity that is not needed for income tax administration already.

But what of the lost jobs?

When you impose a tax on luxury goods that causes some jobs to go away, of course that produces tangible, noticeable pain when workers producing those goods are no longer needed. When the economy restructures to reabsorb those people, this is diffuse, distributed, and subtle. You notice the pain, but not the relief. This may lead a person to think that damage was done by discouraging production of luxuries.

But this is not so. Even outside of tax collected, the very fact that pointless things are not being produced is a benefit to the economy. The tax is meant to reduce this type of consumption, which means it will cause job losses in the short term. But longer term, the money will go somewhere else, and jobs lost will be reabsorbed. The real benefit is in the long run (generations), not the short run (next few tax years).

But consumption moves abroad

Unfortunately, it does not do much to progressively tax consumption when people can quite simply board their jets, and take their yachting to another country, without taxation. A consumption tax is subject to tax avoidance even more so than taxation of income. Who hasn't yet taken advantage of state or country borders to benefit from shopping in a city with no sales tax? When it comes to superyachts, jurisdiction shopping is a no-brainer. Of course the yachts are going to be built and used where taxes are minimal.

To the extent that consumption can move abroad, any progressive taxation of personal consumption therefore has to be accompanied with measures to discourage avoidance. Most Western economies already tax worldwide income. Perhaps taxation of worldwide spending might not be so inconceivable.


Spec author "fixes mistake", wastes everyone's man-year

I came across the Referrer Policy proposal, which is already implemented in Chrome and Firefox, and allows sites to provide more privacy for their users by restricting referrer information sent to other websites when users follow links.

For example, with default browser behavior, if you are browsing the following page:
... then if you click any links on that page which take you to a third-party site, for example:
... your browser will kindly send to that site the full address of the "Big Dicks" page you came from.

This has some unfortunate privacy implications, so finally, browsers (except Microsoft's, of course) are allowing sites to exert more control over what referrer information is sent with outgoing links.

One of the nice new policies a site can choose is origin-when-cross-origin. Or is it? In 2014, the First Public Working Draft of the spec made a "mistake", and defined this policy without the third dash, as origin-when-crossorigin. In 2015, this was noticed, and the spec author decided to "fix" it, adding a third dash in later Editor's Drafts.

This has resulted in a situation where Firefox versions 36 - 40 implement the previous spelling (two dashes), and versions 41+ implement the new spelling (three dashes). As of today, the Mozilla Developer Network still documents the old, two-dash spelling. FxSiteCompat (not affiliated with Mozilla) documents the fix, and states "The legacy wrong value will no longer be supported in the future."

Meanwhile, announcements like this one continue to link to the old version of the spec, and the old version is still what you will find if you look up the spec on W3.org. If you want to use this feature, you'll spend an hour figuring it out. And if you don't, you are likely to use the old version instead of the new one – possibly leading to your referrer policy breaking in the future.

Ahh – the great results of "fixing" things that already shipped, and perhaps were not even broken. :-)


This ex-libertarian endorses Sanders

I have less than billionaires, or even the wealthy millionaires, but more than most people. What I have, I built "myself": this is to say, with much personal effort; but also with critical, non-negligible components of luck, and considerable help and work done by the right other people. My own personal work has been indispensable – but on its own, it wouldn't have been enough.

From my little perch, it seems to me that claiming one's little empire and yelling "I built all of this myself!" is nothing short of hypocrisy and egocentrism. There's no way one person builds an empire. Hundreds, or even thousands of people build it. The leadership provided by one person or several is critical, but you are not yourself building the empire. Other people are building it for you, with your partial guidance.

For there to be thousands of people who can help you build your empire, there has to be infrastructure before your business even starts. There have to be schools for your future employees to be educated. There have to be roads and telecommunications. There have to be hospitals and doctors. There has to be peace and order. There has to be a community.

Standing on top of this empire, and yelling to the world how "I built all of this myself! I want all of the rewards for me, and only me! I am an island! Any taxation is stealing!" seems to me so utterly... blind; hypocritical; narrow-minded; self-absorbed; and oblivious to other people's contributions.

So, yeah. I definitely believe that US politics would benefit a lot from moving in the direction of Sanders.

Full disclosure:

I'm originally from a high-tax European country, where taxation and bureaucracy felt oppressive. I was libertarian, and moved to Costa Rica, where I currently benefit from no taxation on foreign income. I have spent years in multiple countries where I saw the results of low taxation and low public investment. I'm now applying to move to the US, and if accepted, I expect to pay significant income tax there, in exchange for a better lifestyle, mostly due to those taxes.

I used to be libertarian before I succeeded (relatively speaking: I can't afford a jet, or anything like that). Now that I have – it seems fairly obvious how much of other people's work it takes. Not just to succeed; but to have an environment in which it is worth succeeding.


When monospace fonts aren't: The Unicode character width nightmare

Some things haven't changed since the 1970s. Programming is still done in text files; and though we have syntax highlighting and code completion, source code is still best displayed in monospace.

Other aspects of computing work best with monospace, also. The Unix shells; PowerShell; the Windows Command Prompt. Email is still sent with a copy in plaintext, which has to be wrapped on a monospace boundary. Not least, this persists because HTML email is excessively difficult to render securely, and there are user agents that still work better with plaintext.

In all of these situations, the problem presents itself that the originator has to anticipate how text will be rendered in advance. You cannot just send text and expect the recipient to flow it. You have to predict the effects of Tab characters correctly, and word wrap the text in advance, often not knowing the software that will be used for display. In terminal emulation, e.g. xterm via SSH, when the server sends the client a character to render, the server and the client need to agree by how many positions to advance the cursor. If they disagree, the whole screen can become corrupted.

As long as you stick to precomposed Unicode characters, and Western scripts, things are relatively straightforward. Whether it's A or Å, S or Š – so long as there are no combining marks, you can count a single Unicode code point as one character width. So the following works:
Nice and neat, right?

Unfortunately, problems appear with Asian characters. When displayed in monospace, many Asian characters occupy two character widths. How do we know which ones?

Our problems would be solved if the Unicode standard included this information. Unfortunately – as far as I can tell – the Unicode Consortium takes the stance that display issues are completely the renderer's problem, and makes no effort to include information about monospace character widths. (Edit – incorrect: see update below.)

If you're on Unix, you may have access to wcwidth. However: "This function was removed from the final ISO/IEC 9899:1990/Amendment 1:1995 (E), and the return value for a non-printable wide character is not specified." What this means is that the results of wcwidth are system-specific.

In 2007, Markus Kuhn implemented a generic version of wcwidth, which we now use in the graphical SSH terminal console in Bitvise SSH Client. However, this is more than 8 years old at this point, and is based on Unicode 5.0, whereas the current latest version is 8.0.
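To make concrete what such a function has to do, here is a much-reduced sketch: decode UTF-8, then look each code point up in a table of double-width ranges. The table is a small illustrative subset picked for this example, not Kuhn's data and not anything authoritative. The names DecodeUtf8, CellWidth and DisplayWidth are hypothetical as well.

```cpp
#include <cstddef>
#include <string>

// Decodes one UTF-8 sequence starting at s[i] and advances i past it.
// Assumes well-formed input; no validation is attempted.
inline char32_t DecodeUtf8(std::string const& s, std::size_t& i)
{
    unsigned char const b = static_cast<unsigned char>(s[i++]);
    char32_t cp;
    int extra;
    if      (b < 0x80) { cp = b;        extra = 0; }
    else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
    else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
    else               { cp = b & 0x07; extra = 3; }
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Returns 2 for code points inside a few well-known wide ranges, else 1.
// The ranges are an abbreviated, illustrative subset. Combining marks and
// control characters are not handled.
inline int CellWidth(char32_t cp)
{
    struct Range { char32_t first, last; };
    static Range const wide[] = {
        { 0x1100,  0x115F  },   // Hangul Jamo
        { 0x2E80,  0x303E  },   // CJK radicals, symbols and punctuation
        { 0x3040,  0xA4CF  },   // kana, CJK ideographs, Yi
        { 0xAC00,  0xD7A3  },   // Hangul syllables
        { 0xF900,  0xFAFF  },   // CJK compatibility ideographs
        { 0xFF01,  0xFF60  },   // fullwidth forms
        { 0xFFE0,  0xFFE6  },   // fullwidth signs
        { 0x20000, 0x3FFFD },   // supplementary ideographic planes
    };
    for (Range const& r : wide)
        if (cp >= r.first && cp <= r.last)
            return 2;
    return 1;
}

// Sums the cell widths of all code points in a UTF-8 string.
inline int DisplayWidth(std::string const& utf8)
{
    int width = 0;
    for (std::size_t i = 0; i < utf8.size(); )
        width += CellWidth(DecodeUtf8(utf8, i));
    return width;
}
```

With this table, DisplayWidth gives 8 columns for two ideographs followed by 1234, and 7 for three halfwidth kana followed by 1234. The hard part is not the lookup; it's agreeing on what belongs in the table.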

So I had the idea that maybe we could "just" extract up-to-date information from Windows. It's 2015, the following should render well, right?
	台北1234		(leading characters should be 2 spaces each)
	ＱＲＳ12		(fullwidth latin; should be 2 spaces each)
	ｱｲｳ1234		(halfwidth kana; should be 1 space each)
It turns out – no. Perhaps you have an operating system with proper monospace fonts, which displays all of the above lined up. On my Windows 8.1, the problem looks like this:

[Screenshots of the same text in: IE, Chrome, Firefox, Notepad, VS 2015]

Note how nothing lines up: not in Internet Explorer; not in Chrome; not in Firefox; not in Notepad; not in the latest version of Visual Studio – the environment in which Windows is developed (Edit: apparently not - see comments). Half-width kana are displayed kinda correctly by the Consolas font used in Notepad and Visual Studio; but that's it.

It turns out, when locale is set to English (United States), Windows just doesn't seem to use monospace fonts for Asian characters. Indeed, setting the Windows locale to Chinese (Simplified) produces this:

This is better; but now, the half-width kana are borked. sigh

Note that the above isn't a Windows problem only. This is how the same text displays on Android:

It boggles my mind that it's 2015, and we still don't have a single, authoritative answer to this question: how many character positions should each Unicode character occupy in a monospace font?


Because I'm providing examples of incorrect character rendering, this may offer the misleading impression that this is just a font problem.

This isn't just a font problem. It's that there's no standard monospace character width information, independent of font used.

The above incorrect renderings involve systems using non-monospace fallback fonts. However:
  • Even if you only have a fallback font that's not mono, you can coerce it into the right character positions if you know the character widths. The above examples could work correctly – although the renderings might be less than perfect – if software knew the intended character widths.
  • Even if you do not have a fallback font, and are just displaying placeholder boxes – you still need to know character widths to render the rest of the text properly, and for Tab characters to work.
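
To make both points concrete, here is a small Python sketch of how a renderer might assign column positions once it has widths from somewhere. The width function is deliberately a parameter: demo_width below is a stand-in that only knows two characters, and the names layout, width_of and demo_width are invented for this illustration.

```python
def layout(line, width_of, tab_stop=8):
    """Assign a starting column to each character of one line.

    width_of is any callable returning the number of columns a character
    occupies. Tabs are not emitted as cells; they advance to the next
    multiple of tab_stop. Returns (cells, end_column).
    """
    col = 0
    cells = []
    for ch in line:
        if ch == "\t":
            col = (col // tab_stop + 1) * tab_stop
            continue
        cells.append((col, ch))
        col += width_of(ch)
    return cells, col


def demo_width(ch):
    # Stand-in table for this example only: two known wide characters.
    return 2 if ch in "台北" else 1


def naive_width(ch):
    # What software effectively does when it has no width information.
    return 1


good, _ = layout("台北1234\tx", demo_width)
bad, _ = layout("台北1234\tx", naive_width)
print(good[-1][0], bad[-1][0])
```

With the right widths, the text before the Tab already fills 8 columns, so the Tab moves to column 16. With every character counted as 1, the same Tab lands on column 8, and everything after it is misplaced regardless of which glyphs the font can draw.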
Operating systems could work around this problem by providing better font support. We now have terabyte hard drives, so there's no reason all cultures shouldn't be simultaneously supported. However, that still leaves the underlying issue – that we need standardized monospace character widths.

Update and additional information

It turns out that Unicode does in fact provide character width information for East-Asian characters. It's just not as neat as one number. When is it ever? :)

The information is in EastAsianWidth.txt, which is part of the Unicode character database. The data provides an East_Asian_Width property, which is explained in this technical report.
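
Python's standard unicodedata module exposes this property, which makes it easy to check what the database says about the characters from the earlier examples. The values come from whatever Unicode version the Python build bundles, and SAMPLES and describe are names invented for this sketch.

```python
import unicodedata

# CJK ideograph, fullwidth Latin, halfwidth kana, ASCII digit,
# division sign, Won sign, replacement character.
SAMPLES = "台Ｑｱ1÷₩\ufffd"


def describe(ch):
    """Format code point, character name and East_Asian_Width value."""
    name = unicodedata.name(ch, "?")
    return f"U+{ord(ch):04X} {name}: {unicodedata.east_asian_width(ch)}"


for ch in SAMPLES:
    print(describe(ch))
```

The property values are W (wide), F (fullwidth), H (halfwidth), Na (narrow), A (ambiguous) and N (neutral).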

This is basically what is needed... with some unfortunate limitations:
  • Hundreds of characters are categorized as ambiguous width (property value A). These characters include anything from U+00A1 (inverted exclamation mark, ¡) to U+2010 (hyphen, ‐) to U+FFFD (replacement character, �). Many of these characters (but not all!) have different widths depending on system locale. For example, U+00F7 (division character, ÷) has a width of 1 on Windows under English (United States), but a width of 2 under Chinese (Simplified, China).
  • In some cases, width can differ even between different fonts under the same locale. For example, on Windows under Chinese (Simplified, China), U+FFFD (replacement character) renders as narrow (1 position) with a raster font, and as wide (2 positions) with a TrueType font.
  • Some characters categorized as one width are still displayed as another width by certain systems. For example, U+20A9 (Won sign, ₩) has width property value H (half-width), but is displayed as wide (two positions) by Windows under locale Chinese (Simplified, China). It is displayed as narrow under locale English (United States).
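
One way to live with the ambiguous category is to make it an explicit policy choice. The sketch below is an assumption about how one might do that in Python, not an established algorithm: cell_width, text_width and the ambiguous_wide flag are names invented here, and treating every character with a nonzero combining class as zero width is a simplification.

```python
import unicodedata


def cell_width(ch, ambiguous_wide=False):
    """Columns for one character, derived from East_Asian_Width.

    ambiguous_wide selects how property value A is treated: 2 columns
    (as under an East Asian locale) or 1 column (as elsewhere).
    """
    if unicodedata.combining(ch):
        return 0  # simplification: combining marks take no column
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2
    if eaw == "A":
        return 2 if ambiguous_wide else 1
    return 1  # H, Na and N


def text_width(s, ambiguous_wide=False):
    return sum(cell_width(ch, ambiguous_wide) for ch in s)


print(text_width("台北1234"), text_width("ＱＲＳ12"), text_width("ｱｲｳ1234"))
print(cell_width("÷"), cell_width("÷", ambiguous_wide=True))
```

A single flag covers the first limitation only. It cannot express per-font differences, and it will report the Won sign as 1 column under either setting, because the property says H no matter how a given system draws it.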
There are also scripts like Devanagari that just don't seem to have a monospace representation. I was unable to get Windows to display Devanagari characters in the console. They do display in Notepad, but they don't obey any kind of monospace font rules, at all.

There are other efforts to provide information on character widths, including the utf8proc library that's part of Julia. Interestingly, this library derives its information by extracting it from Unifont. Unifont, in turn, is an impressive open source Unicode font with a huge coverage of characters.


C++ Relocator proposal

Last month, I spent two weeks working on the following formal proposal for a new C++ feature:

Relocator: Efficiently moving objects

After incorporating much feedback in the C++ Proposals forum, I believe this proposal represents not only my ideas; but close to a consensus of everyone who expressed interest in this feature. I believe the document is fairly polished. I have submitted it as proposal P0023 via Ville Voutilainen, chair of the Evolution section of the ISO C++ working group (WG21).

It so happens that Ville is also the person who most vocally disagreed with my observations last month about problems in C++ standardization – to the extent of us colliding in a somewhat fiery altercation:

Ossification and Hawaii: Impressions of a C++ working group

When I submitted this proposal, Ville reiterated his position that I need to find a champion to represent it in the next WG21 meeting in October in Hawaii. This is how the Working Group goes about its work.

So far – despite considerable positive feedback – no one has volunteered to actually go to Hawaii for this. It seems most likely that I, also, will not be in a position to make the trip.

I think this will provide a data point about whether, and to what extent, ISO C++ has a problem.

I make the following prediction:

If I don't go to Hawaii; and no one else goes to champion this proposal; then it will not even be considered, despite support and interest in the C++ Proposals forum, and relative lack of opposition.

I'm not saying that the proposal should be accepted. I'm saying: if no one goes to Hawaii for it, it will not even be considered. It will be as though the work was never done; and the proposal never existed.

And if this happens; as I think it will – then, not fortunately at all, it will reinforce the idea that decent proposals, with decent support, are being lost, simply due to the nature of the standardization process.