2024-06-13

  • I am re-reading Kleppmann’s excellent book, Designing Data-Intensive Applications. Early in the book, Kleppmann tells the story of how the SQL/relational model came to be dominant. He notes a couple of other competing data models for databases, most notably the CODASYL model. I thought it was interesting that the main reason the CODASYL model lost, according to Kleppmann, was that it was more work for application developers to deal with. And that happened because the CODASYL model failed to sufficiently insulate/isolate applications from common changes to the underlying schema/data layout. If, over time, an application developer wanted to evolve their schema and change relationships between entities, they would have to go back and rewrite all the queries throughout the application that used those entities. That becomes a huge burden, since those kinds of changes come up regularly. Contrast this with SQL, which you can count on to be essentially rock solid for a large number of common schema transformations (e.g. adding a field to a table or connecting two new tables). In those cases you don’t have to rewrite anything you’ve done before; everything just keeps humming along (a small sketch below this entry illustrates the point). There are, of course, still times when a schema migration requires rewriting queries. But those situations are sufficiently uncommon that the burden is manageable, and much more so than with CODASYL. That ends up being such a huge win that it dwarfs the performance considerations associated with how CODASYL and relational systems store/access data.

    This emphasis on “what’s best for the bulk of software that uses this system?” determining the winning model reminds me of a hardware story. Admittedly, I only vaguely recall this story. But I think there was a Stratechery article a while back that talked about Pat Gelsinger arguing with a Stanford professor about computer architecture in like the 1970s or 80s. The professor, iirc, was saying something like “X architecture (RISC, I think) will beat Intel’s x86 because it can be made more performant than x86 overall.” And Gelsinger was like “no, x86 will win because it already has all the software, and by the time people could rewrite that software for X architecture, Moore’s Law will have made x86 hardware so much faster that the rewrite will never make sense.” Developers and companies would just wait for x86 to get faster and keep pushing their systems forward on x86, which then makes a rewrite even harder, locking in x86 even more. Gelsinger ended up being right, at least on the time horizons they cared about in that conversation. It’s another situation where the pain of writing (and rewriting) software has historically been, and still is today, much bigger than whatever pains come from slightly sub-optimal hardware or database designs.
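
    A minimal sketch of the SQL point above, using Python’s built-in sqlite3 (the users/orders tables and the column names are made up purely for illustration): a query written against the original schema keeps working unchanged after the schema grows.

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO users (name) VALUES ('Ada')")

        # A query written against the original schema.
        old_query = "SELECT id, name FROM users"

        # Later the schema evolves: a column is added and a new table is connected.
        conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
        conn.execute(
            "CREATE TABLE orders ("
            "  id INTEGER PRIMARY KEY,"
            "  user_id INTEGER REFERENCES users(id))"
        )

        # The old query still runs unchanged; only code that cares about the new
        # column or the new table has to mention them.
        print(conn.execute(old_query).fetchall())  # -> [(1, 'Ada')]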
