Good Things Weekly Roundup - 2024-06-24

Things I read, watched, and enjoyed this week

(data saved to/pulled from Readwise Reader)

YouTube Videos

James Hoffmann - The Beginner’s Guide to Coffee Machine Maintenance

First We Feast - Shane Gillis Pounds Milk While Eating Spicy Wings | Hot Ones

Marques Brownlee - AI The Product vs AI The Feature

Typecraft - What the Hell Is Zellij?

Tweets

For some reason, Readwise doesn’t return the full text from the endpoint I’m using, so here are the gists…

Reading a distributed systems paper?

- Phil Eaton (on Twitter)

Databases are the pragmatic intersection and application of all the…

- Phil Eaton (on Twitter)

Books

Designing Data-Intensive Applications

The locality advantage only applies if you need large parts of the document at the same time. The database typically needs to load the entire document, even if you access only a small portion of it, which can be wasteful on large documents. On updates to a document, the entire document usually needs to be rewritten—only modifications that don’t change the encoded size of a document can easily be performed in place [19]. For these reasons, it is generally recommended that you keep documents fairly small and avoid writes that increase the size of a document [9]. These performance limitations significantly reduce the set of situations in which document databases are useful.

Systems Performance

Performance on NUMA systems can be significantly improved by making the kernel NUMA-aware, so that it can make better scheduling and memory placement decisions.

The Mysteries of Money

What trips us up is money metaphors. I began to realize this when I noticed that I was relating very differently to the completely trivial amounts of money this blog makes, compared to my paycheck.

Audiobooks

Note: titles may be wrong (looks like an Audible/Readwise issue); images should be accurate, though.

Renaissance of the arts

Articles

Those I Read All the Way Through

Fast Crimes at Lambda School

Benjamin Sandofsky

Click for generated summary

Lambda School, once praised in Silicon Valley, faced criticism and controversy due to unethical practices and false promises. The company struggled with mismanagement, failed programs, and deceptive actions, leading to a loss of reputation and financial troubles. Despite attempts to recover, Lambda School’s downfall serves as a cautionary tale in the education and startup industry.

This one was a big bummer. I was a real believer in Lambda School, but the evidence here of bad behavior is really strong. Makes me sad.

I Add 3-25 Seconds of Latency to Every Page I Visit

howonlee.github.io

Click for generated summary

Reducing latency on websites can increase revenue, but the author has found a way to intentionally add latency to curb addiction to the internet. By artificially slowing down website loading times, the author believes they can dilute the addictive nature of the internet experience.

Loved this one. I’ve been looking for ways to reduce my consumption of Twitter/IG. Adding latency is really nice because it’s so subtle. I have struggled to stick with various approaches to outright blocking these apps. But maybe if I can just make them incredibly annoying to use, that’ll balance things out. Loved a quote in the blog that went like “Reddit at 100ms of latency is like cocaine. Reddit at 8000ms of latency is like a cup of coffee.”

Architectures for Central Server Collaboration

mattweidner.com

Click for generated summary

User interactions with the central server involve submitting operations that are validated and processed by the server using app-specific logic. In cases of concurrent operations, server-side rebasing or transformation techniques are used to handle updates to the server’s state. Different architectural choices, such as Server-Side Rebasing, Serializable CRDT-ish, and OT-ish, impact how operations are processed between clients and the server for effective collaboration.

This was a really nice summary of standard architectures for building collaborative/multi-player applications when you also have a central server (aka not fully P2P). Nothing necessarily surprising about the approaches, but a really clear writeup that was nice to read.

Atkinson Dithering

beyondloom.com

Click for generated summary

The Macintosh in 1984 had a high-resolution display and used dithering for images. Floyd-Steinberg and Atkinson dithering are popular techniques for simulating grayscale with limited colors. Atkinson’s algorithm spreads error differently, producing richer contrast in images.

Nice little writeup that introduced me to the concept of dithering. Very neat, and a reminder that the early Mac people had some really smart/exceptional programmers. Also cool to finally know the meaning behind Ben Thompson/John Gruber’s podcast name.
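To make the idea concrete for myself, here is a minimal sketch of Atkinson dithering in plain Python. It is not code from the linked writeup: the function name and the list-of-rows image format are my own assumptions, and the 128 threshold is just a common choice. The part that is specific to Atkinson is that each pixel's quantization error is split into eighths, and only six of those eighths are passed on to neighboring pixels.

```python
def atkinson_dither(pixels):
    """Dither a grayscale image to pure black (0) and white (255).

    `pixels` is a list of rows, each a list of ints in 0..255
    (an assumed format for this sketch).
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0

    # Work on a float copy so accumulated error is not truncated.
    work = [[float(v) for v in row] for row in pixels]
    out = [[0] * width for _ in range(height)]

    # Atkinson's six neighbors as (dx, dy); each receives 1/8 of the error,
    # so 2/8 of the error is deliberately dropped.
    neighbors = [(1, 0), (2, 0), (-1, 1), (0, 1), (1, 1), (0, 2)]

    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = new
            share = (old - new) / 8
            for dx, dy in neighbors:
                nx, ny = x + dx, y + dy
                # Skip neighbors that fall outside the image.
                if 0 <= nx < width and ny < height:
                    work[ny][nx] += share
    return out
```

Floyd-Steinberg has the same loop shape but spreads the full error over four neighbors with different weights, so swapping the neighbor table and shares is enough to compare the two.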

Partial Reads or Just Saved for Later

Making Your Writing Work Harder for You

kalzumeus.com

Click for generated summary

In this article, the author discusses the importance of writing in the software industry and how it can benefit businesses. They highlight the value of creating evergreen content that will remain relevant to your audience for years to come. The author advises against using blog posts as the primary platform for sharing valuable content, as they are often seen as quickly depreciating and less valuable. Instead, they suggest integrating your best work into the core navigation of your website to make it more discoverable. The article also emphasizes the need to align your writing with your business goals and target a broader audience, rather than just experts in the field.

This was linked in an HN comment about whether or not to add dates to blog posts. Patio11 advocates against dates, at least for evergreen content, as they make people think the information has gone stale. Good point, and he backs it up by referencing his classic post on salary negotiation. And indeed, that post was excellent when it was published years ago and continues to be excellent. It was directly responsible for a 15k bonus I negotiated just a month ago.

bloom_impl.h

GitHub

Click for generated summary

This text describes implementations of Bloom filters for efficient data querying and caching, including details on false positive rates and hash functions.
The implementations focus on optimizing query performance and accuracy by utilizing cache locality and SIMD instructions for faster processing.
Different strategies are discussed, such as using double hashing and cache line boundaries to improve the performance of Bloom filters in handling large datasets.

Didn’t read this, but have been meaning to look more closely at Bloom Filters as a data structure and this seems like a great implementation. Really excellent/clear comments explaining things too.
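As a note to self, here is a rough sketch of the double hashing idea the summary mentions. To be clear, this is not the linked implementation: the class name, the default sizes, and the use of SHA-256 are all my own assumptions, and it ignores the cache-line and SIMD tricks entirely. It only shows how two base hashes can stand in for many probe positions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter using double hashing (illustrative only)."""

    def __init__(self, num_bits=1024, num_probes=4):
        self.num_bits = num_bits
        self.num_probes = num_probes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive two 64-bit hashes from one digest, then combine them as
        # h1 + i * h2 to get the i-th probe position.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force a non-zero stride
        for i in range(self.num_probes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Usage is just `bf = BloomFilter(); bf.add(b"key"); b"key" in bf`. Anything that was added always tests as present, while something never added can still occasionally test as present, which is the false positive rate the summary refers to.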

V3 Session Management

GitHub

Click for generated summary

The text discusses the importance of secure session management for web-based applications and APIs. It outlines key security requirements for managing user sessions effectively, including generating unique session tokens, setting proper session timeouts, and using secure session handling methods like cookies. Additionally, it highlights specific security verification requirements and best practices to protect against session management exploits.

Read part of this because I’m constantly worried that my little web projects are poorly configured from a security perspective. I was surprised by how well-written and clear this documentation is, and by how much of it seems like common sense, i.e., stuff I’d mostly do naturally (though the recommendations are more thorough than what I typically implement in my personal projects).
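For my own reference, a tiny sketch of two of the basics the summary lists: unpredictable session tokens and an idle timeout. This is not taken from the document itself. The function names, the in-memory dict, and the 30-minute timeout are assumptions for illustration, and a real app would also need secure cookie settings and a proper session store.

```python
import secrets
import time

SESSION_TTL_SECONDS = 30 * 60  # assumed idle timeout, not a recommended value

# Hypothetical in-memory store: token -> session data.
_sessions = {}


def create_session(user_id, now=None):
    """Create a session and return its token."""
    now = time.time() if now is None else now
    # 32 random bytes from the OS CSPRNG, URL-safe encoded.
    token = secrets.token_urlsafe(32)
    _sessions[token] = {"user_id": user_id, "last_seen": now}
    return token


def get_user(token, now=None):
    """Return the user id for a live session, or None if unknown or expired."""
    now = time.time() if now is None else now
    session = _sessions.get(token)
    if session is None:
        return None
    if now - session["last_seen"] > SESSION_TTL_SECONDS:
        # Expired: drop it so the token cannot be reused.
        del _sessions[token]
        return None
    session["last_seen"] = now  # sliding idle timeout
    return session["user_id"]
```

The `now` parameter exists only so the timeout logic can be exercised without waiting; normal callers would leave it out.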

NewSQL Database Systems Are Failing to Guarantee Consistency, and I Blame Spanner

Daniel Abadi

Click for generated summary

NewSQL systems are struggling to ensure consistency, with many failing to deliver on their promises. The author attributes this issue to a controversial design choice made by Spanner, which has influenced other systems. Guaranteeing consistency can lead to minimal reductions in availability and improve application reliability.

Found this through Phil Eaton’s Twitter. Haven’t read it yet, but it seems like it would be a banger.

Upgrading GitHub From Rails 3.2 to 5.2

Eileen M. Uchitelle

Click for generated summary

GitHub successfully upgraded their main application from Rails 3.2 to 5.2 after a year and a half of work, improving their codebase along the way. They implemented a dual boot system to deploy changes for the next version without disrupting production, ensuring minimal customer impact and no downtime during the upgrade process. The upgrade allowed GitHub to address technical debt, improve their test suite, and open up new possibilities for their application and the community.

Why Your SSD (Probably) Sucks and What Your Database Can Do About It

CedarDB - The All-In-One-Database

Click for generated summary

This text discusses how SSDs impact database performance, highlighting their strengths in read throughput but limitations in write latency. It offers solutions like group commits and asynchronous processing to improve SSD performance, and showcases how CedarDB integrates these approaches for optimal performance. Other popular database systems like MongoDB and PostgreSQL are also mentioned for how they handle SSD latency bottlenecks.

In Defence of Swap: Common Misconceptions

chrisdown.name

Click for generated summary

Having swap is important for memory management in a system, not just for emergencies. Disabling swap doesn’t prevent disk I/O issues, and understanding swap can improve system performance. Swapping out rarely-used memory helps optimize the system, especially under memory contention.

I’ve been reading Brendan Gregg’s Systems Performance book, so this seemed relevant to that. But I didn’t get very far. Maybe next week (but realistically, no).

The Story of Reformatting 100k Files at Google in 2012

le-brun.eu

Click for generated summary

In 2012, Google engineers implemented a new tool, Buildifier, to reformat every Bazel BUILD file in the codebase. The tool enforced a strict formatting rule to improve code uniformity and maintenance ease. Despite initial concerns, the rollout was successful and greatly benefited productivity.

A cool story of a large-scale project that affected huge swaths (probably everything?) of the Google codebase.

Articles

JSON Tiles: Fast Analytics on Semi-Structured Data

Note: I went to a distributed systems meetup in NYC and one of the speakers mentioned this paper, so I saved it. Haven’t read it yet, but it seems conceptually interesting, and I recognize the Thomas Neumann name from a previous paper, so I bet this is pretty solid.

Click for generated summary

JSON Tiles is a technique that allows relational database systems to perform fast analytics on semi-structured data in JSON format. It automatically detects the most important keys in JSON data and extracts them, achieving scan performance similar to columnar storage. JSON Tiles can handle heterogeneous and changing data, making it a robust and efficient solution. The approach leverages the implicit structure of JSON data to speed up query processing and provides optimizations for access performance and query optimization. The integration of JSON Tiles into a relational database system involves storing the extracted key paths and value types, as well as using a binary format for fast access to infrequent keys.


Date

June 24, 2024
