2024/05/26


Shape-Rotators vs Wordcels (LLM Edition)




Here is a clip of Peter Thiel's contrarian take on who stands to gain the most from recent advances in generative AI, specifically large language models (LLMs).



Peter Thiel: AI is bad news for people with math skills and society is going to shift in favor of people with strong verbal skills. pic.twitter.com/4bfiMTJrbh

— Zain Kahn (@heykahn) May 26, 2024


Before contributing my own ideas here, it's worth mentioning that the shape-rotator vs wordcel dichotomy provides some humorous yet relevant context for the discussion that this clip brings up.



A useful way of conceptualizing an LLM (or a neural network more generally) is as a machine that performs massive amounts of data compression. Let's use llama3 as an example to appreciate the scale of these compression schemes. Training can involve taking terabytes of data from the Internet (e.g. 15 trillion tokens ~= 60TB, assuming 4 bytes / token) and storing a useful representation of that data within model parameters that require orders of magnitude less memory (e.g. 8 billion parameters ~= 16GB, assuming 2 bytes / parameter). It follows that building a "better" model amounts to learning a less lossy compression algorithm, one that captures as much of the underlying nuance of the training data distribution as possible.
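To make the scale concrete, here is a quick back-of-the-envelope sketch of that compression ratio, using the same rough bytes-per-token and bytes-per-parameter assumptions as above:

    # Rough compression ratio for llama3-8B, using the assumptions above:
    # ~4 bytes per training token and 2 bytes per parameter (fp16/bf16).
    tokens = 15e12                          # ~15 trillion training tokens
    data_bytes = tokens * 4                 # ~60 TB of raw training data
    params = 8e9                            # 8 billion parameters
    model_bytes = params * 2                # ~16 GB of weights

    print(f"data:  {data_bytes / 1e12:.0f} TB")
    print(f"model: {model_bytes / 1e9:.0f} GB")
    print(f"ratio: ~{data_bytes / model_bytes:.0f}x")  # ~3750x smaller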



Viewing these models as compression machines helps us pose the following question: "What kind of information do we want to compress/decompress as effectively as possible, and why?"



It is no secret that our modern society is built upon increasingly convoluted and intertwined layers of abstraction that keep accumulating over time. This has created a large demand for domain "experts" who are hired to untangle this fragile system, parsing through a jungle of jargon and sifting through an ungodly amount of boilerplate before any real work can begin. There is a tragic amount of opportunity cost in teaching, hiring, and training human beings for this kind of "work". Human lives are spent every day merely treating the symptoms of this inflammation within our industries, because any organization large enough has given up on curing the disease itself. LLMs feel like a match made in heaven for such a problem, as they can compress/decompress these abstractions at (our) will and help us crunch through the layers of information that we find tedious or not immediately relevant. This form of automation will be what saves us from the pathological bureaucracy that pervades our lives.
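As a toy sketch of what "decompressing" boilerplate on demand might look like in practice (this uses the OpenAI Python client purely as an example interface; any chat-style LLM API would do, and the model name and input file here are assumptions):

    # Toy example: asking an LLM to "decompress" a wall of boilerplate
    # into the few facts we actually care about. Assumes the OpenAI
    # Python client and an API key in the environment.
    from openai import OpenAI

    client = OpenAI()
    legalese = open("terms_of_service.txt").read()  # hypothetical input file

    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name; substitute whatever you have
        messages=[{
            "role": "user",
            "content": "Summarize in plain English what I am agreeing to, "
                       "and flag anything unusual:\n\n" + legalese,
        }],
    )
    print(response.choices[0].message.content)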



Circling back to the issue of whether the mathematically-inclined (shape-rotators) or the communicatively-inclined (wordcels) will prosper more from using these models, I opt for a third answer that may sound like a copout. Those who will benefit the most from the proliferation of LLMs will be individuals at the intersection of the shape-rotator and wordcel camps.



One will need just enough understanding of the low-level math / science / engineering in their field to intuit the scope of possibilities and the fundamental limitations of what they are working on. At the same time, one will also need just enough verbal acuity and communication skill to draw up and express the high-level vision of the task they want accomplished. These two skill sets will meet in the middle: sampling from the models and executing on their outputs, navigating the intermediate layers of abstraction while iterating between the currently implemented solution and the end goal they have in mind.



TLDR, the winners of today and tomorrow will be anyone who can combine:

- enough technical depth (shape-rotator) to intuit what is possible and what is fundamentally out of reach, and
- enough verbal acuity (wordcel) to clearly express the high-level vision they want executed.