Posts

Showing posts from November, 2019

Reinventing wheels: A better, sliding-matrix diff algorithm

Image
A week ago, I posted about a simple diff algorithm which works, produces tolerable results, but is ultimately a toy. I wanted an algorithm that produces excellent results while keeping consistent worst-case performance.

I didn't want to implement the Myers algorithm because:
I find it hard to understand. I'd need a better understanding to implement it.It uses recursion. This is unfit for production. I would need to understand the algorithm even better to fix this.It has worst-case performance O(NM). This is unfit for production. I would need to understand it completely, inside-out, to figure out how to implement a compromise between optimality and performance.My capable colleague already attempted all of the above, suggesting a different approach is worthwhile.I was inspired by this excellent visualization from Robert Elder's page about the Myers diff:


I built on the core idea from my simple diff, the unique unit map, to devise a sliding matrix algorithm:
Instead of a full m…

Reinventing wheels: A straightforward, two-pass diff algorithm

For over two decades of writing software, I was scared of writing a diff algorithm. It seemed so formidable, like something an impressively bearded inventor of Unix might do, while the rest of us are doomed to fail and suffer in our cluelessness.

If you want to be discouraged, just look it up! Instead of finding a simple algorithm anyone could implement, you'll find that diffs are a special case of the Longest Common Subsequence problem. You'll learn this is a difficult problem. It's NP-hard. You'll find solutions that are mathematical and arcane and recursive, with horrible worst-case performance. There will be suggestions of wisdom along these lines:
How to put N grains of sugar in coffee: - If N=0, stop algorithm. - Otherwise, pick first grain of sugar. Mix it with coffee. - Repeat for remaining N-1 grains.When I needed a diff algorithm, on two separate occasions, in two separate decades, I hired two separate people to do it. They studied the same literature, found t…