2015-11-25

The doghouse: Vagaries of the Excel COM interface

Wow. It's been a while since I've seen a more fragrant turd than the COM interface exposed by Excel which would, theoretically, allow you to interact with workbooks from e.g. PowerShell.

Theoretically is the key word, I'm sad to say. In practice, it's so buggy and poorly designed that you might save more time if you did what you wanted by hand.

Let's start with the simplest possible script:
 
  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")

What does this do?

Each invocation of this script creates a copy of Excel which then just hangs around. Indefinitely. Like this:


If you just created the COM object, Excel would exit. But since you opened a workbook, it does not. Not even closing the PowerShell window exits these instances. You have to actually go and kill them using a task manager application.

Okay. Let's try modifying the script to close the workbook:

 
  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")
  $wb.Close($false)

Does this work?

Har har har. No:


But according to the docs, there's supposed to be a method named Close. Why is it not there? Hmmm.

Let's try this:

 
  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")
  [System.Threading.Thread]::Sleep(1000)
  $wb.Close($false)


Yeah. We have to wait unspecified amounts of time in unspecified places. This increases the chances that the script will succeed.

This gives you an impression of the type of quality and robustness we are dealing with here.

But the Excel processes are still hanging around. What else do we need to do for them to quit?

 
  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")
  [System.Threading.Thread]::Sleep(1000)
  $wb.Close($false)
  $xl.Workbooks.Close()
  $xl.Quit()

Ah! Now the processes are cleaning up. It's like a magic incantation: we have to tell Excel three times for it to leave.

Note that the $false parameter to Workbook.Close is necessary. Otherwise, Excel opens an interactive dialog asking if we want to save changes to the unmodified file, opened by script.

Non-performance, non-concurrency

I eventually got a script running that would process workbooks and extract information from specific columns. As I type this, the script is chugging along, at a leisurely pace of... 8 cells per second.

It's 2015; we have multi-core computers running at GHz speeds. We have gigabytes of RAM, and solid-state hard drives. And the architecture of PowerShell, combined with the Excel COM interface, allows me to extract information out of workbooks at about the speed of a dot-matrix printer in 1990.

I saw in Process Explorer that the Excel instance spawned by the script is using nearly 100% of a single core. Computers have multiple cores, so I thought I'd tweak the script to run several concurrent instances, and get the job done 4x or 8x faster.

Nope. After launching 5 concurrent instances, instead of a single Excel process consuming 11% total CPU, I had five Excel processes consuming 2% total CPU each. So not only is it slow; but there's also a global lock somewhere that prevents concurrency.

2015-11-22

"auto" considered (often) harmful

Edit: Thanks to Drammon, Simon Brand, and Nicola Gigante for important corrections (see comments).

It's now in vogue to write C++ code like this:
 

  auto const& container = Function();
  for (auto const& element : container)
  {
      auto const& member = element.AccessMember();
      ...
  }

const& is necessary because auto strips references and const/volatile qualifiers. This is good: it is apparent when you're not copying a whole object, even if its type is hidden from view.

But please don't do this (too much).

The value of strong typing in C++ is not only in ensuring consistency at compile-time, it's also to document what the code is doing.

The above snippet requires that the reader knows:
  1. what Function returns
  2. what container.begin() or begin(container) returns
  3. what element.AccessMember() returns
in order to just know that the code is doing this:
 
  Container const& container = Function();
  for (Container::Element const& element : container)
  {
      Member const& member = element.AccessMember();
      ...
  }

This reduces readability of the code. By using auto this way, you're throwing away an important self-documenting property of the language.

There's an argument that this improves maintainability because if the return value of Function changes to a type that behaves the same, you need to make fewer code changes.

But maintainability is not just about reducing code changes; it's about ensuring their correctness. A developer needs to understand the changes they are making, and that a change is propagated correctly throughout the program. auto makes code harder to understand, and hides places affected by changes.

auto is definitely needed with lambdas:
 
  auto lambda = [&] (Seq x) -> bool
      {
          ...
      };

In this case, auto is the right thing to do – otherwise, you're wrapping the unnamed, compiler-specific raw lambda type into an std::function, and doubly declaring the function signature.

If you find that the classes you are using require really obtuse syntax to use explicit types:
 
  std::map<unsigned int, std::string> const& container = Function();
  for (std::map<unsigned int, std::string>::value_type const& element : container)
      ...

... then maybe that's the fault of an unfriendly design of the library you are using. In cases like that, I much prefer that, instead of auto, we use a suitable type alias:
 
  using MyMap = std::map<unsigned int, std::string>;

  MyMap const& container = Function();
  for (MyMap::value_type const& element : container)
      ...

2015-11-21

Our bitching and moaning: Why the Middle East cannot have peace

In a nutshell: Terrorism and wars in the Middle East are not about Islam vs. the West. Instead, there cannot be peace in the Middle East as long as major world powers – the US and its allies on the one hand, and Russia on the other – have strong and opposite preferences about the price of oil; and of course, as long as the price is controlled by who has more influence over large oil producing countries.

It's easy to see ourselves as enlightened, and to perceive Middle Easterners as these backwards people with fundamentalist beliefs, who can't stop fighting each other and us, regardless of our "noble" attempts to "free" them. We try to "liberate" them from the Taliban, Hussein, and al-Assad – and they attack us!

Except we have never intervened in the Middle East to liberate; or to help build anyone's peaceful country. We only intervene to stir the shit, but secular values require peace to come about. As we stir shit, unrest and extremism prosper. After World War II, the Middle East had perfectly good trends towards democracy and secularism, and might have been peaceful and enlightened today. But then we helped!

We could bring peace to the Middle East if we invested into 30 years of governing and rebuilding each country. But that would be colonialism. So instead we shoot things up and leave. Preferably, from air.

We have our allies in the Middle East, one of which is Saudi Arabia. Just today, The Guardian wrote about how a Saudi court sentenced a poet to death for renouncing Islam. There is no shortage of this: just two months ago, they sentenced a young man to death by crucifixion for protesting the government.

Our allies in our war against terrorism. So much better than terrorists!

The explanation is not as clear-cut, or as favorable to us, as we would like. We are allies with Saudi Arabia because they help prop up the petro dollar; support our military presence in the Middle East; and are willing to sacrifice economically by helping the US and its allies. The price of gas at US pumps wouldn't be under $2 right now if the Saudi weren't pumping oil to sink the price, which costs them both in short-term revenue and long-term reserves. The official reason is to "protect global market share", which makes as little sense as it sounds. The price of oil is inelastic, so the supply only needs to be reduced by 1-3% for a 10% increase in price. But they can't exactly come out and say it's due to US pressure, or that a big reason is to clip the wings of Putin - whose strength is backed by Russian oil and gas exports, and who has been invading countries with whom the West would have wanted to be allies.

Why does the US want to topple al-Assad – a secular, but brutal dictator? To free the people? Or because he's allied with Russia, and provides their only Mediterranean base in the Syrian port of Tartus?

It's a geopolitical game in which you can't always pick all your friends if you're invested in the outcome. Saudi Arabia decapitates protesters, but they want to be our friend? Saudi Arabia is our friend now.

This pragmatism does not extend just to Saudi Arabia. Consider the endemic rape of young boys in Afghanistan, where the US army silently condones allied local men using young boys as sex slaves.

Of course, instead of all this screwing over the Middle East, which does nothing to liberate anyone, or to bring peace to any country, we could also invest in nuclear energy and renewables, to wean our Western economies from oil dependency. But that would require either not so much panic against nuclear power – because terrorism and wars in the Middle East are not better, but most people don't connect the two – or a breakthrough in energy storage, so that renewables could sustain constant power supply.

For example, the Germans will panic against nuclear energy, and force a plan to shut down all their nuclear plants. But they need energy from somewhere, and the Middle East controls the price of this energy. So then the entire Middle East is at war, and Germany has to accept Syrian refugees. And then they moan about the state of the world, and blame Americans, but fail to see that the refugees they're taking in are a not-so-indirect consequence of their opposition to nuclear power.

On the other side of the Atlantic, Americans will bitch and moan about nuclear plants requiring so much government investment, because Americans are individualist and everything must be privately run. But then they also blame Obama if the price at the gas pump is high. So Obama, like George W. before him, takes the full might of the US military to bear on the Middle East, and fixes global oil prices. And then Americans bitch and moan about the expense of paying for all this military.

We can have peace in the world. But first, we must come to terms with our want to push problems away. The attitudes "not with my tax dollars", and "not in my backyard", may seem to provide short-term relief. But in the long run, they only make big problems bigger. Not least our problem with climate – which has helped create the Syrian crisis.

2015-11-15

The advantages of Seq, and the demerits of std::string const&

A few weeks ago, I was reviewing some code, and found something similar to this:
 
  std::string host = ...;

  int port = 23;
  size_t pos = host.find(":");
  if (pos != host.npos)
  {
      port = atoi(host.substr(pos + 1).c_str());
      host = host.substr(0, pos);
  }

Looks reasonable enough. This code parses a parameter in “host:port” format. It's perfectly decent where it appears. It will run rarely, once per process invocation, so performance is irrelevant. It’s an acceptable way to achieve what’s intended.

But suppose this code was in a tight inner loop. Suppose it was part of a parser that needs to digest hundreds of megabytes of data, and performance is relevant.

In that case, this code does two suboptimal things:
  • A heap allocation to copy the “port” portion of the string into. Why not just read the number from the original string?
  • Another heap allocation to copy the “host” portion of the string. Again, why not just read from the original string?

Seq as an improvement over std::string const&

There are two purposes for which C++ programmers commonly use strings:
  1. To store and own character content. This is what string and wstring do.
  2. To pass character content without passing ownership. This is what string const& does.
I argue that passing string const& is almost always a mistake. It chains the string provider in ways that aren't necessary for the consumer to read the string. All you really need is to pass a pointer and a length. You need a lightweight Seq object.

A Seq object, essentially, is this:
 
  struct Seq
  {
      byte const* p { nullptr };
      size_t      n { 0 };
  };

In practice, a useful Seq implementation will also contain numerous methods that a user can use to read from the Seq. My implementation has these, among others:
 
  struct Seq
  {
      ...
      uint   ReadByte             (...)
      uint   ReadHexEncodedByte   (...) 
      uint   ReadUtf8Char         (...) 
      Seq    ReadBytes            (...) 
      Seq    ReadUtf8_MaxBytes    (...) 
      Seq    ReadUtf8_MaxChars    (...) 
      Seq    ReadToByte           (...) 
      Seq    ReadToFirstOf        (...) 
      Seq    ReadToFirstOfType    (...) 
      Seq    ReadToFirstNotOf     (...) 
      Seq    ReadToFirstNotOfType (...)
      Seq    ReadToString         (...) 
      Seq    ReadLeadingNewLine   (...) 
      uint64 ReadNrUInt64         (...) 
      int64  ReadNrSInt64         (...) 
      uint32 ReadNrUInt32         (...) 
      uint16 ReadNrUInt16         (...) 
      byte   ReadNrByte           (...) 
      uint64 ReadNrUInt64Dec      (...) 
      int64  ReadNrSInt64Dec      (...) 
      uint32 ReadNrUInt32Dec      (...) 
      uint16 ReadNrUInt16Dec      (...) 
      byte   ReadNrByteDec        (...) 
      double ReadDouble           (...) 
      Time   ReadIsoStyleTimeStr  (...)
      ...
  };

You get the idea. All the basic primitives you'd need to read character content belong in Seq.

The basic benefit of Seq is that it's lightweight, containing only a pointer and a length, and can point not just to a whole string, but also a substring. It does not require unnecessary functionality, like a whole string object, just to pass a sequence of characters without ownership.

A secondary, but even more central benefit is that it serves as a focal point for a powerful set of string reading methods that leverage each other, allowing for both elegant and efficient string reading.

Using Seq, the earlier "host:port" example can be rewritten like this:
 
  Seq hostPort = ...;

  Seq host = hostPort.ReadToByte(:);
  uint32 port = 23;
  if (hostPort.n)
      port = hostPort.Drop(1).ReadNrUInt32Dec();

This is not more complex than the string version. Yet this version does its task without unnecessary heap allocations, and would be much more efficient if implemented where performance matters.

So, it's like std::string_view?

The proposed C++ extension std::string_view implements a similar concept. Main differences:
  • std::string_view is mainly the lightweight reference. It lacks a powerful library of string reading methods. Seq, as in the example above, shows emphasis on stream-like reading. Read methods consume part of Seq and return the part that was read as another Seq. A fully useful Seq implementation covers the basic primitives of string reading in an elegant way.
  • std::string_view is an std::long_inconvenient_name. However, this is understandable given a standard library designed by dark warlocks whose mystical powers derive from conjuring, and causing the world to use, long inconvenient names. :)
I emphasize the use of Seq as a default for string passing and reading, not special case. This is encouraged by giving it a practical name, and building a library of string reading methods around it.

std::string_view could do the same, but it needs more power than just remove_prefix and remove_suffix.

2015-11-13

Can we stop with this idiocy of private courts?

There are smart people out there – people in many ways a lot like me, i.e. borderline idiot savants – who are attracted to the idea that the world needs to be saved through some kind of easy, adversarial revolution; rather than through a huge amount of incremental and cooperative effort. I suspect this is because cooperation seems boring; imposing one's will on others with violence seems fun; and plain old effort is hard and boring. Frequently, these are white middle-class Americans who do not recognize just how damn good they're having it, and how much worse things are in many other parts of the world.

Not in all parts for all things, of course. There are specific things that are genuinely better in other places. But overall, things are pretty great in the US for the middle and upper class. And yet, some of these same people can't stop going on about how awful everything is; and how everything could be much better if we just overthrew the entire system, and replaced it with something completely different. Like, for example: Let us fix real problems in our justice system by replacing it with private courts!

This is all based on the libertarian delusion, the foundation of which is to pretend that some obvious facts of life do not exist; and then coming up with solutions that might work in a fictional world that conforms to those assumptions. In this way, libertarianism and communism are the same mistake expressed in different ways. Both are ideologies that make bullheaded assumptions – about man, about the world – and then try to shoehorn people into it.

The fact of life that people are ignoring here is that not everyone has equal power. If we are to measure power and influence, some people have not just thousands, but millions of times more than others.

Private courts are essentially arbitrage. Arbitrage can work for equally powerful parties. But that's the only situation where it works.

What happens in arbitrage is, if the bigger party has equal choice in what arbitrators they're willing to deal with, arbitrage overwhelmingly favors the bigger party, because the bigger party controls a much bigger chunk of who gets all the arbitrage business than the smaller parties.

Allowing large businesses to dictate terms of dispute resolution effectively prevents class action lawsuits, which are an important way to hold large businesses accountable over systemic abuse.

As the small party in arbitrage, you have no choice. You either go with the corrupt arbitrators chosen by large corporations, where decisions always favor the large businesses; or you don't get service. If there are any payouts, they are such that it doesn't hurt the corporation, and they can continue systemic abuse as a business model, because paying out small amounts from time to time is cheaper.

You either get to agree to use their courts, or you don't get service. Good luck.