2024-03-29

  • I’ve been continuing the foundational explorations on BaboTree. Wanted to write about some things that stand out to me as potentially interesting. (n.b. I got stuck for several minutes trying to figure out how to word this sensibly. Annoying/frustrating. I decided to brain dump instead.) Working with Language Models to turn web highlights into flashcards has got me thinking that Language Models can be thought of as a more flexible (and more unreliable) hyperlink/search tool. In the world of the internet, a hyperlink is, on the one hand/at the most basic level, a connection from one document to another. That’s fine and true. On the other hand, you can think of a hyperlink as an encoding of a (loose) question/answer pair. Or, maybe more accurately, it can be modeled as encapsulating a set of questions/answers.
    An example might be helpful. If you have a hyperlink to Napoleon’s Wikipedia page, that could be thought of as encoding some set of questions. Some basic ones would be “who is Napoleon?” or “when did Napoleon become Emperor of the French?” and the answer would (hopefully) be found in the contents of the linked document. The hyperlink is an explicit link that connects the question in the reader’s mind to an answer somewhere in the destination document (interestingly, the writer doesn’t even need to know what questions the reader might have; it’s open ended, which is also cool, but not the focus here). Language Models can perform a similar function. And just like the recursive nature of hyperlinked documents (one document contains a link to another, which contains a link to another, etc.), Language Models have recursive potential as well (an answer to one question can lead to other questions, which lead to other answers, etc.).
    At this point, I think it’s worth spelling out what’s interesting to me about all this. In the past, I’ve mostly thought about Language Models as information processing pipelines. They are pretty linear to me. But viewing Language Models like hyperlinks means you can engage with them from the perspective of a graph. A pipeline with its various stages can, of course, also be modeled as a graph (each stage is a node, with a single edge going to the next node). But graphs are much more flexible as a structure than pipelines. And thinking of Language Models as a graph traversal engine brings up different ideas about what is possible and what is worth exploring with them as tools.
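    To make the graph framing concrete, here’s a minimal sketch of what “LLM as graph traversal” could look like. The helper names and prompts are mine, purely for illustration; it uses OpenAI’s client.

```python
# A sketch of "LLM as graph traversal": each node is a question/answer pair,
# and the model itself generates the outgoing edges (follow-up questions),
# the way one document's hyperlinks lead to more documents.
from openai import OpenAI

client = OpenAI()

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

def follow_ups(question: str, answer: str) -> list[str]:
    # Ask for the "outgoing hyperlinks" of this answer.
    prompt = (
        f"Q: {question}\nA: {answer}\n\n"
        "List three follow-up questions this answer raises, one per line."
    )
    return [line.strip() for line in ask(prompt).splitlines() if line.strip()]

def traverse(question: str, depth: int = 2) -> dict:
    # Recursively expand the question graph to a fixed depth.
    answer = ask(question)
    children = []
    if depth > 0:
        children = [traverse(q, depth - 1) for q in follow_ups(question, answer)]
    return {"question": question, "answer": answer, "children": children}

tree = traverse("Who was Napoleon?")
```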
  • Took a deeper look at Langchain today because I finally reached a point in my usage of LLMs where I thought I’d hit the problem they are solving as a library/framework. tl;dr they do many things well and I’m not convinced adopting it is a good idea. Longer version: basically, my LLM pipelines got long enough, and I wrote enough of them (maybe 25?), that I started to think that reducing boilerplate/consolidating essential complexity would actually be worthwhile. So, I started exploring my own solutions. After writing some exploratory APIs for composable pipeline steps, I realized “ah, I think this is what Langchain was made to solve.” And now that I had finally experienced the problem and made a first pass at a solution myself, I felt ready to evaluate the Langchain approach more deeply than when I had heard about the framework in the past. Overall, I have come away impressed in some ways and nonplussed in others. I think Langchain has a really nice “pipeline description” syntax. Their use of the | operator leads to very clean pipeline definitions. Here’s an example:
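    Something like this (a sketch in the style their docs show; it assumes the langchain-openai package, and details may have shifted since):

```python
# Each stage below is a LangChain Runnable; | composes them left to right
# into a new Runnable.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Tell me a short joke about {topic}")
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()

print(chain.invoke({"topic": "flashcards"}))
```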
    Really good job there! Very pretty. I am also impressed by their support for a number of nice-to-have LLM features. I think their API for streaming responses is very nice/simple. Similarly, their approach to embeddings looks great. Their support for concurrency, with async/await built directly in, is also a big plus. So, yes, lots to like.
    With that said, some big things stood out as “bleh.” The first to mention is the general “framework overhead problem.” Langchain is clearly trying to do a lot. They want to be the platform for building LLM applications. Unsurprisingly, they end up exposing a relatively large API surface you have to grasp before going from 0 to 1. It is definitely harder to learn than ramping up with something like OpenAI’s library, where you throw together a list of message dictionaries and you’re off to the races. As far as I can tell after like 30 minutes of reading their docs, the complexity of their API surface is very much downstream of them relying so heavily on this Runnable type for everything. And they seem to use the Runnable type, and structure it the way they do, because they really want the | syntax for chaining stages of LLM pipelines. While I appreciate the simplicity of the | operator, I don’t think this is a good tradeoff. Maybe this structure is less a deliberate design decision and more about language limitations in Python. I’m not sure. In any case, it seems clear they really wanted “composable” objects to get the | operator to work, and this is what it took. Unfortunately, it kinda sucks to have that stuff exposed in the user-facing API. As a moderately experienced Python dev, I find their APIs pretty unnatural/un-Pythonic (compared with, among others, OpenAI, who really did a good job here). Anyways, I’m sure it’s really nice once you are deeply embedded in the Langchain world. But getting in there seems like not that fun and not that productive a process. Classic problem for frameworks, ofc.
    In addition to the API heaviness that hits you the moment you open the door, Langchain seems to suffer from another issue common to many frameworks: it ends up encouraging/requiring framework-specific handling of things. AKA, it doesn’t look super easy to migrate away from Langchain, because they chose to take the framework in a direction that is not particularly consistent with other relevant libs or the broader language style. Langchain is clearly opinionated about how you should build your application, and those opinions don’t seem to play super nice with other approaches. This contrasts with my current approach to working with LLMs: strewing dictionaries of {"role": "user", "content": "..."} everywhere. Look, this approach… it’s not super clean, I know that. But you know what it is: portable and rock solid. It is composable at the level of “will work with any library,” even if it’s not composable at the |-operator level. I’m fairly confident my messages dictionaries will work forever. I’ll very likely be able to take them to whatever future LLM providers pop up, either as a drop-in replacement or with only minor variations. And, as a more general concern, say I want to change a pipeline to do something that isn’t well supported by Langchain. My dictionaries will always be there for me. True, they’ll never reduce the inherent complexity of the things I want to do, but they’ll never add accidental complexity either.
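    For the record, the kind of thing I mean (a minimal sketch using OpenAI’s client; the prompt is made up):

```python
# Plain message dictionaries: not pretty, but portable across providers.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "You write concise flashcards."},
    {"role": "user", "content": "Turn this highlight into a Q/A flashcard: ..."},
]

resp = client.chat.completions.create(model="gpt-4", messages=messages)
print(resp.choices[0].message.content)
```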
Ultimately, after reviewing what Langchain does and comparing it with my initial approach to solving the same problem, I believe there’s a way to get like 80% of the benefits of Langchain without nearly as much API overhead or the concerns of framework lock-in. You’d lose some niceness around pipeline syntax (no more chaining stages with the | operator). But I generally don’t find that syntax improvement to be worth the likely framework headaches.
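A minimal sketch of what that 80% could look like (the step names are hypothetical; the point is that a pipeline can just be a list of plain functions):

```python
# A pipeline is a list of plain functions, folded over an input.
# No framework types, no operator overloading, nothing to migrate off of.
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]

def run_pipeline(steps: list[Step], value: Any) -> Any:
    return reduce(lambda acc, step: step(acc), steps, value)

# Hypothetical steps for a highlights-to-flashcards pipeline.
def build_messages(highlight: str) -> list[dict]:
    return [{"role": "user", "content": f"Write a flashcard for: {highlight}"}]

def call_model(messages: list[dict]) -> str:
    # Swap in any provider here; the messages dicts are portable.
    from openai import OpenAI
    resp = OpenAI().chat.completions.create(model="gpt-4", messages=messages)
    return resp.choices[0].message.content

flashcard = run_pipeline([build_messages, call_model], "Napoleon was crowned Emperor in 1804.")
```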
