Posts

Reinventing wheels: A better, sliding-matrix diff algorithm

Image
A week ago, I posted about a simple diff algorithm which works, produces tolerable results, but is ultimately a toy. I wanted an algorithm that produces excellent results while keeping consistent worst-case performance.

I didn't want to implement the Myers algorithm because:
I find it hard to understand. I'd need a better understanding to implement it.It uses recursion. This is unfit for production. I would need to understand the algorithm even better to fix this.It has worst-case performance O(NM). This is unfit for production. I would need to understand it completely, inside-out, to figure out how to implement a compromise between optimality and performance.My capable colleague already attempted all of the above, suggesting a different approach is worthwhile.I was inspired by this excellent visualization from Robert Elder's page about the Myers diff:


I built on the core idea from my simple diff, the unique unit map, to devise a sliding matrix algorithm:
Instead of a full m…

Reinventing wheels: A straightforward, two-pass diff algorithm

For over two decades of writing software, I was scared of writing a diff algorithm. It seemed so formidable, like something an impressively bearded inventor of Unix might do, while the rest of us are doomed to fail and suffer in our cluelessness.

If you want to be discouraged, just look it up! Instead of finding a simple algorithm anyone could implement, you'll find that diffs are a special case of the Longest Common Subsequence problem. You'll learn this is a difficult problem. It's NP-hard. You'll find solutions that are mathematical and arcane and recursive, with horrible worst-case performance. There will be suggestions of wisdom along these lines:
How to put N grains of sugar in coffee: - If N=0, stop algorithm. - Otherwise, pick first grain of sugar. Mix it with coffee. - Repeat for remaining N-1 grains.When I needed a diff algorithm, on two separate occasions, in two separate decades, I hired two separate people to do it. They studied the same literature, found t…

My beloved aunt and her All-Adoption

Across the ocean lives my superficially kind, perfidiously manipulative aunt. Before I talk shit, I must note she was a second mother figure for me growing up. My actual mother is not quite a functional human, and my dad came to see me every two weeks. The occasional visits I had from my dad, along with visits to my aunt, were invaluable to me. It helped me grow up less screwed up than I could be. I needed that, so I am eternally grateful.

My aunt has also gone out of her way to help me. Most recently this year, she spent quite a bit of effort on a favor I asked, while refusing compensation beyond the costs. Back in 2007, she contributed awesomely at Jana and I's wedding with games*, poetry, and singing. Overall, in my life, she has been a kind, if subtly calculating presence.

She comes across as the nicest person. Practically a saint. That makes it disconcerting when you realize she finagles her way to be everyone's friend – only to smugly judge and diss them when they'r…

Forced-birth activists as paperclip maximizers

Related to previous post: Life continues, not begins, at conception

Through further arguments with forced-birth fundamentalists – those who call themselves "pro-life" – I've come to some interesting conclusions.

They don't care about suffering! The fact that their policies lead to suffering is irrelevant to them.

They care about minimizing gray areas. What fundamentally bothers them is how, if the rights of a person are granted at any other time than conception, the stage where these rights are granted is a judgment call. It bothers them there's no one-size answer that fits all.

The ultimate purpose of forced-birth activism is not to reduce suffering, or to maximize the number of people, or to increase happiness. It's a cover-your-ass, bureaucratic type of ethics that provides deniability. If the rules are clear, we can be sure we didn't break them.

The purpose is not to try and do what's most right, but to avoid being wrong at any cost. This makes th…

AI can change scenery, can't fix suffering

A number of people I respect work on the problem of AI friendliness. The problem is how to ensure that when a highly capable artificial intelligence arises – either as a sentience, or as a tool for some humans to use – this doesn't kill us.

This is a legitimate problem, because highly capable AI will arise – it is arising right now – and humans are problematic. We create a lot of suffering for ourselves and everything that has the misfortune to meet us. To a super-intelligent AI that wants to maximize happiness, ending us might seem a merciful and desirable outcome. The AI could then replace the entire planet with a quivering mass of neurons in a stable, never-ending state of bliss. Maximum happiness!

It appears our continued existence – and also all of its accompanying strife – could be ensured if an AI is engineered to respect and defend human free will, including – no, especially! – when this leads to unfortunate outcomes.

Lots of people have hopes beyond that: that AI will rel…

"Support" incompetence at Amazon (and elsewhere)

For a while now, I've been paying Amazon an extra $29.00 per month in hope to get at least perfunctory support if ever needed. Now, I need the most basic thing - remove the SMTP sending limit and create a reverse DNS record for a new IP address - and those $29.00 are accomplishing nothing. I have filed three requests already over several days. I'm just getting canned and inappropriate responses, and it seems nowhere closer to being done. I stand a real risk of not being able to reply to customers if this is not sorted.

I've been happy with Amazon Web Services in general. 99% of the time, it works great – as long as you don't have to interact with anyone. But when you need a human to look at something, it's only happening if your request is so ordinary that it just needs to be rubber-stamped.

I had no trouble, two times in the past, having Amazon process a similar request for new instances. But now, I'm trying to move an existing email server to a new Elastic IP…

What's wrong with computing?

What's wrong is that we are:
Using a bad universal data format.Depending on a universe of tools that make this bad format seem like the best choice.The bad universal format are text files. HTML, XML, JSON, and most programming languages are based on them. The universe of tools are all manner of utilities to create, search, process, edit, compile, compare, and store versions of them.

We need that universe of tools. But we need them for a better data format.

What's wrong with plain text, then?

It is fundamentally incongruous with the data we store. Almost all data is structured: HTML, XML, JSON, TOML are all ways to store structured data in text files. Programming languages are structured with complex grammars. Where we use binary formats, almost all of them store structured data. ZIP files, DOC files, PNG files, everything is structured.

The incongruity is in the use of in-band signaling to delineate data. We can signal start and end of data in two ways:
Length-prefixed encoding.