2024-05-30

Idea: embeddings are a form of hash function, with the added property that inputs whose embeddings are close are likely to be pretty similar. What’s nice is that an embedding has some of the privacy benefits of a hash function (though not in the strictest/strongest sense): it would be kind of hard to go from a specific embedding output back to the specific input text. Of course, a hash value is usually very small and thus quick to check for equality (probably 100-1000x faster than comparing 1500-dim embeddings). So for those kinds of purposes, an embedding is not as good. But “embedding as hash value” is an interesting framing I hadn’t really thought about before.
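A rough sketch of the contrast, to make the idea concrete. Everything here is an assumption for illustration: `toy_embedding` is a hypothetical stand-in (hashed character trigrams) for a real embedding model, and the example strings are made up. It only shows the shape of the idea: a cryptographic hash answers "exactly equal or not", while an embedding gives a graded notion of closeness.

```python
import hashlib
import math


def sha256_hex(text: str) -> str:
    """Ordinary cryptographic hash: any change to the input scrambles the digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def toy_embedding(text: str, dim: int = 256) -> list[float]:
    """Hypothetical stand-in for a real embedding model (an assumption, not a real API).

    Counts character trigrams into `dim` buckets and L2-normalizes the result,
    so texts that share many trigrams end up with nearby vectors.
    """
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        digest = hashlib.md5(trigram.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; the vectors are already unit length, so this is a dot product."""
    return sum(x * y for x, y in zip(a, b))


original = "the cat sat on the mat"
near_copy = "the cat sat on the mat!"
unrelated = "quarterly revenue forecast"

# Hash view: a one-character edit gives a completely different digest.
print(sha256_hex(original) == sha256_hex(near_copy))

# Embedding view: the near copy stays close, the unrelated text does not.
print(cosine(toy_embedding(original), toy_embedding(near_copy)))
print(cosine(toy_embedding(original), toy_embedding(unrelated)))
```

On the size point: a SHA-256 digest is 32 bytes, while a 1500-dim vector stored as float32 (an assumed storage format) would be 6000 bytes, and comparing two of them takes a full dot product rather than a byte-equality check.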
Had a ton of fun imagining future LLM use cases with Ari today.
