The doghouse: Vagaries of the Excel COM interface

Wow. It's been a while since I've seen a more fragrant turd than the COM interface exposed by Excel which would, theoretically, allow you to interact with workbooks from e.g. PowerShell.

Theoretically is the key word, I'm sad to say. In practice, it's so buggy and poorly designed that you might save more time if you did what you wanted by hand.

Let's start with the simplest possible script:
 
  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")

What does this do?

Each invocation of this script creates a copy of Excel which then just hangs around. Indefinitely. Like this:


If you just created the COM object, Excel would exit. But since you opened a workbook, it does not. Not even closing the PowerShell window exits these instances. You have to actually go and kill them using a task manager application.

Okay. Let's try modifying the script to close the workbook:

 
  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")
  $wb.Close($false)

Does this work?

Har har har. No:


But according to the docs, there's supposed to be a method named Close. Why is it not there? Hmmm.

Let's try this:

 
  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")
  [System.Threading.Thread]::Sleep(1000)
  $wb.Close($false)


Yeah. We have to wait unspecified amounts of time in unspecified places. This increases the chances that the script will succeed.

This gives you an impression of the type of quality and robustness we are dealing with here.

But the Excel processes are still hanging around. What else do we need to do for them to quit?

 
  $xl = New-Object -com Excel.Application
  $wb = $xl.Workbooks.Open("C:\Temp\file.xls")
  [System.Threading.Thread]::Sleep(1000)
  $wb.Close($false)
  $xl.Workbooks.Close()
  $xl.Quit()

Ah! Now the processes are cleaning up. It's like a magic incantation: we have to tell Excel three times for it to leave.

Note that the $false parameter to Workbook.Close is necessary. Otherwise, Excel opens an interactive dialog asking if we want to save changes to the unmodified file, opened by script.

Non-performance, non-concurrency

I eventually got a script running that would process workbooks and extract information from specific columns. As I type this, the script is chugging along, at a leisurely pace of... 8 cells per second.

It's 2015; we have multi-core computers running at GHz speeds. We have gigabytes of RAM, and solid-state hard drives. And the architecture of PowerShell, combined with the Excel COM interface, allows me to extract information out of workbooks at about the speed of a dot-matrix printer in 1990.

I saw in Process Explorer that the Excel instance spawned by the script is using nearly 100% of a single core. Computers have multiple cores, so I thought I'd tweak the script to run several concurrent instances, and get the job done 4x or 8x faster.

Nope. After launching 5 concurrent instances, instead of a single Excel process consuming 11% total CPU, I had five Excel processes consuming 2% total CPU each. So not only is it slow; but there's also a global lock somewhere that prevents concurrency.

Comments

Popular posts from this blog

When monospace fonts aren't: The Unicode character width nightmare

"Unreachable" beauty standards

VS 2015 projects: "One or more errors occurred"