BusyBeaver and the state of near-native code in browsers

Curious how cryptography performs in asm.js vs. Google Native Client? Check it out for yourself:
A description of my path to these examples follows.

I was impressed recently with what I read about Google Native Client, which is embedded in Chrome, and Firefox's asm.js and Emscripten. I was further impressed by how I can play Prince of Persia in my browser, and it works via DOSBox inside JavaScript, and it's actually perfectly playable, aside from key lag. :-)

I've recently been working on BusyBeaver, a new password hashing, key derivation, and proof of work function. Existing commonly used password hashing approaches suffer from extreme brute forcing susceptibility on hardware as widely available as an AMD GPU. BusyBeaver is my attempt to improve upon PBKDF2 and scrypt in its resistance to optimization via execution on a GPU, FPGA, or most anything other than the intended usage platform - a 64-bit, general purpose CPU.

It seems I've done a decent enough job with BusyBeaver, as far as resisting GPU optimization is concerned, so I had the idea to see if I can use Emscripten, or Google Native Client, to have it run in browsers. It would, after all, be useful to a website that wants to offload the CPU cost of rigorous password hashing to the person who is intended to benefit from it — the user.

Oh boy.

Well, first of all, I was able to get BusyBeaver to compile with Emscripten, and it does run in Firefox, IE, and Chrome... in descending order of performance. It will become clear why I say this.

It takes a while for this to execute, but... BAM!

The test vectors run! And they are correct! Yay!

Note that this is an achievement in its own right. This is C++ code, compiled without modification into JavaScript, and it runs in a browser. That's pretty awesome.

But. There are buts.
  • The native EXE compiled with GCC is 100 kB. With Visual Studio 2013, it's 210 kB. With Emscripten, the .js file is 700 kB. Yikes. That's a lot of bandwidth to pay for a key derivation function.
  • In Firefox, the module loads in 150 ms on my computer. That's thanks to the SpiderMonkey optimizations for asm.js. It's probably much more on mobiles... But those run Chrome or Safari.
  • In Firefox, an execution of BusyBeaver with 50,000 operations, which takes about 6 ms in native code, runs for... 500 ms. Mmmm... okay. I guess that asm.js's claims of a factor of 2 slowdown over native compilation with clang are circumstantial, or hyperbolic, or clang must really suck.
  • Even so, I guess the figures in Firefox could work in practice. But how about Chrome?
  • In Chrome, the page is still loading...
  • Okay, so Chrome takes about 30 seconds to load the test page and run one execution of BusyBeaver. Once the page is loaded, subsequent runs take about 2 seconds. Sigh. Clearly, Chrome developers didn't implement the JavaScript optimizations for asm.js that Firefox did - which I guess is because they favor their Native Client.
  • Hey, how about IE? Well, would you look at that. The page loads in about 8 seconds, and then subsequent executions take about 4 seconds each. Not too much different from Chrome; just with a different - in this case better - initial analysis vs. subsequent performance tradeoff.
  • In Chrome on my Samsung Galaxy S4, the script took 311 seconds to load. It then took 13 seconds to calculate a subsequent digest.

Google Native Client

I also adapted BusyBeaver for Google Native Client. Results?
  • The portable executable compiles to 100 kB. Compare to 700 kB with Emscripten.
  • A single digest is calculated in 10 ms. Compare to 500 ms with Firefox and asm.js.
In my opinion - asm.js has:
  • no readability, compatibility, or safety advantage over a portable binary format; and is
  • an order of magnitude less efficient, both in performance and code size.
If you're going to have something as sophisticated as Emscripten, and a proposed standard like asm.js, why not just adopt a proper binary format?

Until then, we have statements like these, coming from Mozilla and Opera:
  • "NaCl seems to be 'yearning for the bad old days, before the web', to paraphrase Tim Berners-Lee."
    -- Håkon Wium Lie, Opera CTO
  • "We don't want to go back to a place where it's just binary delivery over the internet. We've seen people try to do it before."
    -- Chris Blizzard, Mozilla


Against the hating of cheaters

Cheating, as in relationships, is not an upstanding thing to do. It is exemplary of an immature, confused, untrustworthy, undependable character. Let me be clear: I don't think it's wrong to fuck around. But it's important to be open about it. Relationships and sex are a big deal to people. If you're going to be doing that with more than one person, make sure that everyone involved is aware, and understands.

Now, that being said - there are those who hate cheaters. I must say I dislike that more than cheating. When it comes to cheating, the well-adjusted person is one who neither cheats, nor hates with a passion. But if I had to choose between a cheater, and someone really sour about it...

The cheater, you see, is only selfish in their seeking of pleasure. But the hater is ignorant and bitter; coercive and entitled in imposing their views. These views tend to be naively monogamous, as if to imply: My expectations should be met. This should just work. I shouldn't have to try to find the right person for it.

Cheating happens within a relationship. If you get cheated on, then you, also, fucked up. You accepted a task: to pick a person, and create a mutually functional relationship with them. You could choose anyone in the world. This is the person you chose; this is the relationship you built.

The pain of being cheated on is emotional. It comes from within, from a clash between expectations and reality. If you didn't take the relationship seriously, there is no pain in the first place. But if you did, then there can be great pain. Those who would rather blame their partner than deal with their emotions try to rationalize reasons to see cheating as an objective wrong, rather than a trigger for their pain.

One way that people try to make cheating an objective wrong is by arguing that cheaters expose their partners to health risks. Yes, technically. But I disagree with a majority of what this is meant to imply.

If you're alive, chances are you've had the flu a number of times, and you're not holding grudges against people about it. You've probably infected people with your flu, and you aren't expecting to be blamed. It's only in the sexual realm that we have this insane expectation that an infection is living proof of shameful sin, instead of something that just happens to people, and that we're trying to solve with medicine.

It's also not your partner's job to protect you. It's your job to protect you. Safety is not something you check off and then forget about. It's something to keep in mind, even with your long-term partner.

Finally, you chose your partner. By choosing your partner, you're choosing all the baggage that comes with them, STDs or not. If you're a poor judge of character; if you create a relationship in which someone cannot be honest, is being coerced, where you don't communicate; if your partner keeps secrets from you, and you don't detect this; all of that is your blindness. All of that is not only on them, but also on you.

A hater of cheating wants to have their cake, and eat it too. They want to have sex, but not incur any of the risks associated with bodily activity. They don't want to take responsibility for choosing a partner. They make themselves vulnerable while building a relationship whose parameters are downright adversarial. Then, in order to feel safer in this choice, they'll consider themselves owner of their partner's genitals.

The cheater seeks only pleasure. The cheater hater seeks to clutch love in their fist. They build relationships on top of taboos that discourage communication. They seek to coerce, to ignore; to make their partner into something they're not. Then, they blame their partner for dishonesty and betrayal.


Visual C++ ternary operator considered harmful

We've recently begun upgrading our code from Visual Studio 2005 to a newer version, and we ran into the following doozy.

Imagine that you have classes with the following relationships:

// Lightweight reference to data owned by another object.
struct Light { Light(); Light(Light const&); };

// Owns data freed on release. Constructor copies data. Conversion to Light does not copy.
struct Heavy { Heavy(); Heavy(Heavy const&); Heavy(Light const&); operator Light() const; };

In a reasonable world, you would then expect the following code, using the C++ conditional operator:

Light DataOrDefault(Heavy const* h)
    { return h ? *h : Light(); }          // BAD BAD BAD!

... to work like this code using a simple if statement:

Light DataOrDefault(Heavy const* h)
    { if (h) return *h; return Light(); }       // OK

... or like this code, using a template:

template <class T> inline T Cond(bool expr, T a, T b)
    { return expr ? a : b; }

Light DataOrDefault(Heavy const* h)
    { return Cond<Light>(h, *h, Light()); }     // OK

But it does not. Do you know how it works? This is how it works.

Light DataOrDefault(Heavy const* h)
    // Actual behavior of: h ? *h : Light()
    Heavy t;
    if (h)
        t = *h;
        t = Light();
    return t;

Yes. The compiler decides that the common type of the two expressions is Heavy. A temporary Heavy object is then constructed to hold this result. A copy of the data is constructed; the copy is assigned to Light; and then it is promptly destroyed, so that Light now points to invalid data.

In GCC, this produces an error. But in Visual Studio, there is no warning, even with -Wall:

Which of the two behaviors is standards-compliant, I do not know. (Added: In comments, Lee Killough argues persuasively that GCC is compliant; VC++ is not.)

On the MSDN page for the conditional operator, you can find this notice:

The compiler authors are themselves warning you this behavior is bad, and telling you not to use this operator.

Let it be clear: my concern is not making the ambiguous code work. It's to prevent it from compiling in the first place. It should at least cause a warning, so that ambiguities can be found and rooted out, and so that new ambiguities will not be implemented.

I can visually check the code, but can I write it in a way that, if it was unsafe, it would produce a warning? What if there's a conversion I didn't think of? The code would just compile silently.


Due to namespace issues, I dislike macros, but this lambda-based macro is the best I've come up with:

#define If(EXPR, T, A, B) \
    (([&]()> T { if (EXPR) return (A); return (B); })())

You might be surprised at the performance. Using Visual Studio 2010, in the Debug configuration, it's only slightly slower than the ternary operator. In the Release version, it is several times faster. (And I did attempt to make sure that the useful effect of the code is not being optimized away.)

Further discussion

Some argue the real danger is the implicit conversions between Heavy and Light. I find this not to be the case. The conversions are safe and intuitive in other situations. The following, for example, will not build:

template <class T> inline T Cond(bool expr, T a, T b)
    { return expr ? a : b; }

Light DataOrDefault(Heavy const* h)
    { return Cond(h, *h, Light()); }    // Compiler error: T is ambiguous

Here, the compiler observes that the template type T could be either Light or Heavy. It refuses to make a potentially unsafe and incorrect assumption, and requires the type to be specified:

Light DataOrDefault(Heavy const* h)
    { return Cond<Light>(h, *h, Light()); }     // OK

In the case of the ternary operator, MSVC simply goes ahead with one of the possible paths; often one that is very counter-intuitive. There is no error, no warning.

If we just swap the operands as follows:

Light DataOrDefault(Heavy const* h)
    { return !h ? Light() : *h; }

... the compiler decides the proper type to use is Light, and the code will work fine.

2015-06-05: Ongoing edits in response to comments.
2015-06-06: Used tohtml.com/cpp for syntax highlighting instead of images.


The reasons for the wars

Much of the public may still believe that the wars being fought in the Middle East are to protect the US from "terrorism". This is, of course, a sham. The rest of the public, however, appears to increasingly subscribe to the simplistic explanation that the wars are just to benefit a few "evil corporations".

The wars are about more than that. The wars do have a motivation in US national security (and even the national security of EU and other satellites of the US), it's just a deeper interpretation of "national security" than what the public would accept without extensive flattery and well, lying.

The wars are about US global economic dominance. They have nice side effects in terms of juicy bones that can be tossed to allies, but for the most part, they're about keeping the US economy running. Other than military hardware, there are few things the US still actually manufactures at this point, so inventing reasons to keep the military busy is a huge source of employment. Hypocrisy looms large in the US: huge chunks of the population are employed by the war industry more or less directly, at the same time as they espouse anti-government rhetoric and vote Republican.

Even more important than that, though, is keeping a firm military footprint in the Middle East to ensure that a large and substantial proportion of oil worldwide is sold in no other currency but US dollars. This ensures worldwide demand for USD, even in the face of such confidence-destroying events as the financial crisis, and even with the US hardly manufacturing anything these days outside of entertainment and intellectual property (which two thirds of the world do not respect).

The flip side of the coin is that allowing the petro-dollar to be abandoned would be a huge boost to China and Russia, which are not the world's most liberal regimes; and arguably, for better or worse, aren't ultimately whom you want to benefit.

It's not only about money either, but about control. When Russia is ever ready to invade and annex neighboring countries, it's crucially important that the US is in a position to tank the global oil market via its allies in the Middle East, making it costly for Russia to be expansionist. Being able to reduce Russia's revenues brings more sting to economic embargoes, and allows the West to oppose Russia without risking thermonuclear war.

Wars in the Middle East are of course not about humanitarianism, freedom, or democracy - at least, not for the people who live there. They are about realpolitik; about long-term strategic interests of the US, and of the West in general. This does ultimately benefit regular people in the US and the West; even to the extent of propping up our economies; protecting Europe's freedom from invasion by Russia; and the same freedom of Japan, South Korea, and Taiwan from China. But it is ugly and awful, and much of the public would not go along with it, if the whole thing was explained to it clearly.

And then, of course, there's Israel.


You get what you pay for: Infographic

One of my biggest frustrations is that I can't get this point across to people. For some reason, people think that driven, principled, high quality individuals should be attracted to politics simply out of the goodness of their hearts!

The US problem is not Republicans vs. Democrats. It's that corporations buy your politicians because you begrudge paying them in any way proportional to good service. You get what you pay for, and the pittance you pay attracts few honest people, but primarily those who would abuse and sell power. It's not possible to fund a re-election campaign otherwise!

My previous posts about this:
  1. Study: Congress literally doesn’t care what you think (May 2015)
  2. Thoughts in favor of much better compensation for elected officials (July 2007)


Acetaminophen may cause autism, ADHD in vulnerable children

I came across the following paper, which suggests convincing - though perhaps not yet fully conclusive - evidence that the incidence of autism might be greatly increased by acetaminophen exposure in genetically susceptible children:

Evidence that Increased Acetaminophen use in Genetically Vulnerable Children Appears to be a Major Cause of the Epidemics of Autism, Attention Deficit with Hyperactivity, and Asthma

Cuba has some of the lowest rates of autism in the world. In comparison, vaccination rates are some of the world's highest; but, acetaminophen is not available without prescription. In fact: their medical system hardly ever uses acetaminophen, preferring a different antipyretic to treat fever.

In the US, on the other hand, there are physicians with the bizarre practice of prescribing acetaminophen prophylactically, every day for 5 days prior to a child's immunization.

The following is another paper suggesting that acetaminophen may be a culprit:

Did acetaminophen provoke the autism epidemic?

Article about a related study in TIME, February 2014:

Tylenol During Pregnancy Linked to Higher Risk of ADHD


You get what you pay for.

In 2007, I posted:

You get what you pay for?
Thoughts in favor of much better compensation for elected officials

Now check out this study:

Study: Congress literally doesn’t care what you think

Specifically - check out this graphic:


That is the entire problem of US politics.

The smart people in your country are leveraging $5.8 billion over a decade to receive in return a thousand times more in favorable regulation and subsidies. It all ends up being paid to them at your expense. It's literally taken out of your paycheck.

The only reason this can happen is because you just don't want to pay your politicians well. They run a $10 trillion dollar country, and you're too stingy to pay them anything remotely resembling that responsibility.

So corporations pay instead. With your money. And it ends up costing you a thousand times more.

We can argue about getting money out of politics all day, but that's not really how this works. You need to get money into politics. If politicians are properly compensated by you, they will be harder to buy by others.

What they want is power. They wouldn't bend to corporations if they didn't happen to also need money.


BusyBeaver key derivation function

June 27, 2015:

BusyBeaver is now available in your browser!

I present BusyBeaver, a password-based key derivation function (PBKDF) which I believe to be original and new. BusyBeaver attempts to improve on lessons of the past, which I would summarize thusly:
  • Recklessly foolish systems store passwords in plaintext. Anyone who peeks at database has everyone's passwords.
  • Naive systems try to protect passwords by storing their one-way hashes. It turns out it's cheap and easy to mount dictionary attacks.
  • Less naive systems store salted one-way hashes of passwords. It turns out low-iteration hashes are easy to brute-force.
  • PKCS#5 specifies PBKDF2, a salted one-way hash, repeated many times. Along comes Bitcoin, proving just how efficiently standard hashes can be brute-forced with GPUs, FPGAs, and ASICs.
  • Colin Percival creates scrypt, a PBKDF that attempts to increase the cost of brute-forcing by being memory-hungry. Along comes Litecoin, showing how GPU, FPGA, and ASIC optimizations are still quite feasible, and their improvements still measure in orders of magnitude.
BusyBeaver attempts to improve on this by taking advantage of what CPUs do well - serial execution of arbitrary code, 64-bit instructions, cache prefetch - in order to thwart optimizations that are made easier by the order of operations being static. BusyBeaver does not try to be as memory hungry as scrypt; top-end GPUs that might be used in brute-forcing now include so much memory per core as to make topping this expensive for legitimate users.

BusyBeaver operates as follows:
  1. Takes as input a salt and a password.
  2. Uses provided inputs to generate two 64 kB blocks of data using SHA-512 and HMAC-SHA512.
  3. Interprets the two blocks as byte code for a virtual machine where all inputs are valid as instructions and arguments. To avoid side channel attacks based on cache timing, all memory read/write offsets are controlled by the first block, which is generated from salt information only. Instructions perform calculations and changes on the second block, using operations that preserve entropy (jumps, additions, rotations, substitutions, XOR, and multiplication/modulo).
  4. The executed algorithm is unpredictable, but deterministically dependent on input.
  5. The algorithm is exceedingly unlikely to run into infinite loops, so much so that if it does, it's not a problem. Any infinite loops are local to very rare combinations of salt and password, and the attacker's cost of detecting them is greater than the maximum possible benefit.
  6. Processing stops when the requested number of operations have been executed.
  7. The final digest is produced as HMAC-SHA512 dependent on salt, password, and the final processed 64kB block.
BusyBeaver's C++ source code can be downloaded here:

BusyBeaver-20150703.zip (31 kB)

It is platform-agnostic, includes a test program, and comes with a permissive, free software license.

My implementation is parameterized by the number of operations to be executed. I find that a good value is on the order of 50,000 - enough to ensure that execution is not dominated by SHA-512, and nearly all bytes of the second 64 kB block have been replaced with different values.

The full KDF, parameterized with 50,000 operations, executes in about 6 ms on my machine as 64-bit code (of which 20% is SHA-512), and 18 ms as 32-bit code (of which 25% is SHA-512).

June 25, 2015:

We have verified that, at least on AMD R9 290X, BusyBeaver appears to resist attempts at GPU optimization. A colleague implemented and tested BusyBeaver with OpenCL, attempting to see if the insights that allow an efficient GPU implementation of scrypt, as used in LiteCoin mining, would also work on BusyBeaver. In the time allotted to this task, he was unable to implement a GPU version of BusyBeaver that would perform substantially better - or even significantly better - than what is achievable on CPU.

The crucial requirement to efficiently implement an algorithm on the GPU is to make calculations fit within the 32 kB of local memory available to each Compute Unit. In the scrypt case, normally more than 32 kB of memory would be required. However, scrypt has weaknesses (or features, if that's what you prefer) that allow the memory requirement to be reduced in exchange for computing more information on the fly. The parameters for scrypt, as used in LiteCoin, further require a low memory footprint, and the combination makes it possible to run the brunt of the algorithm within 32 kB.

With BusyBeaver, such optimizations are more difficult. Whereas scrypt only reads from the large memory buffer, BusyBeaver also writes to it. Whereas scrypt reads from the memory in large fixed chunks, BusyBeaver reads and writes in unpredictable sizes - often as small as one byte. In the time allotted to the task, my colleague has not been able to figure out how to meaningfully reduce the memory requirement. This means that each Compute Unit has to use global memory, which with this pattern of memory access is quite slow.

I expect that BusyBeaver would run orders of magnitude faster on a GPU with 128 kB, or at least 96 kB, of memory local to each Compute Unit. However, if we are to draw conclusions from the history of level 1 cache sizes for the CPU, we can suspect that there are tradeoffs involved with making this memory bigger. If similar pressures exist on the GPU that are keeping level 1 caches small on the CPU, major increases in local memory for general purpose GPUs may be unlikely. However, if GPUs were to introduce an intermediate category of memory, similar to level 2 cache on the CPU, that might also make them much more effective at running BusyBeaver.

I would love to see research into, and would welcome feedback from, any attempts to implement BusyBeaver efficiently on an FPGA. For the time being, I have confidence that BusyBeaver does favor the CPU more than alternative algorithms. We are therefore going ahead, and will be using it in the next version of Bitvise SSH Server.

July 2-3, 2015:

A parallel implementation, ParaBeaver, is now implemented separately in ParaBeaver.h/.cpp. It uses C++11 threading in order to run a number of concurrent threads processing BusyBeaver instances. ParaBeaver is useful on clients that can expect to have most cores idle, and can benefit from parallelization to gain security at no cost to the user.

The implementation of MulModOp64 now uses a right shift by 4 bits between multiplication and modulo in order to use a portion of multiplication result that preserves more entropy. The result of MulModOp64 is now used via rotBits in all instructions, to prevent the CPU from detecting it's not used and optimizing it away.

An additional parameter, swapBlockThreshold, is now supported to provide a tradeoff between side channel resistance and preventing an attacker from rearranging the order in which instructions are executed. BusyBeaver will now operate in side-channel-free mode the first 1/4 of the time (all memory offsets are exclusively salt-based), and will swap blocks the remainder of the time (memory offsets become also password based, and both blocks are read from and written to).

ParaBeaver supports swapBlockThreshold as well, and implements it sensibly for multithreaded execution.


Honey badger ought to give a fuck

Sometimes people ask me, "Why do you care?" About some thing that they ostensibly don't care about.

Caring is important. Not caring is harmful.

I don't consider people less valuable if they are strangers. So many people will one day influence my life, who are now strangers to me. They all have value. Others, who will not be in my life, also have value. Their opinions do, as well.

I am ultimately connected to everyone. Therefore, I find it valuable to correct incorrectness when I can.

You could say I'm opposed to the church of "Honey Badger Doesn't Give a Fuck". Honey badger ought to give a fuck. People not giving a fuck - about the world, about people - is a disease. It's what is causing most of our trouble.

To not care about others is, ultimately, to not care about ourselves. Eventually, the not caring comes around, and bites us all in our collective ass.


Atheism's holy cow

To be atheist, as opposed to agnostic, is to find predictability and comfort in the belief that the prime constituent of the universe is matter. In this view, consciousness is an emergent phenomenon that arises from matter that has accidentally been organized sufficiently.

Belief in a materialist universe offers an illusion of metaphysical certainty. There's comfort in believing that our current scientific understanding is substantially complete, and substantially correct. That all that's left to work out are kinks and details. The belief suggests what is to be valued, and provides purpose: science, knowledge, technology, experimentation. The atheist points to the many successes of this process as evidence that brightness lies this way.

What the atheist gets from the materialist hypothesis is much the same thing that the Christian gets from God: answers of a firmness that otherwise could not be obtained; a metaphysical framework on which to build life, and find meaning. Much like Pascal rationalized the irrational using Pascal's Wager; or like Thomas Aquinas did the same with his five proofs; so the atheist does this using Occam's Razor - an excuse to believe that which has no proof. Like with a person who is religious, what compels the atheist to defend materialism so ardently is not ultimately reason; it is fear of losing the entire foundation of that which he's built.

The atheist fears that allowing investigation of a non-materialist hypothesis threatens to undermine science, and to impede the holy grail of technological progress, which the atheist believes is our main worthwhile goal. Anything doubting the materialist belief is seen as supportive of the religious; and anything religious is seen as a source of evil, backwardness, suffering, and darkness. Therefore, any suggestion of a non-materialist hypothesis is met with extreme prejudice, since it's perceived as threatening to the one thing which the atheist considers holy.

The atheist values investigation - but a non-materialist hypothesis is the one area he will not willingly investigate. You are either with us, or against us; and if you doubt the universe is made primarily of matter, you are against.


The water-efficiency of dinosaur foods!

I'm happy to find out that dinosaur foods - eggs and chicken - are actually by far the most water-efficient sources of protein! Way more calorie-water-efficient than mangos and asparagus, too! Yay!

(Now if only the poor dinosaurs weren't put through lives of absolute torture by the meat industry :-/ )

Source article in LA Times:

From steak to mangoes, here are some water-hogging foods