2024/05/26


Shape-Rotators vs Wordcels (LLM Edition)




There is a clip of Peter Thiel giving a contrarian take on who stands to gain the most from recent advancements in generative AI, specifically LLMs.



Peter Thiel: AI is bad news for people with math skills and society is going to shift in favor of people with strong verbal skills. pic.twitter.com/4bfiMTJrbh

— Zain Kahn (@heykahn) May 26, 2024


Before contributing my own ideas here, it's worth mentioning that the shape-rotator vs wordcel dichotomy provides some humorous yet relevant context for the discussion that this clip brings up.



A useful way of conceptualizing a large language model (LLM), or neural networks more generally, is as a machine that performs massive amounts of data compression. Using llama3 as an example just to appreciate the scale of these compression schemes: the training process involves taking terabytes of data from the internet (eg. 15 trillion tokens ~= 60TB, assuming 4 bytes / token) and storing a useful representation of that data within the model's weights, which require orders of magnitude less memory (eg. 8 billion parameters ~= 16GB, assuming 2 bytes / parameter). It follows that creating a "better" model translates to learning a less lossy compression algorithm, since the ultimate goal is to capture as much of the underlying nuance in the training data distribution as possible.
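The scale of this compression is easy to sanity-check with some back-of-the-envelope arithmetic, using the same assumptions as above (4 bytes per token, 2 bytes per parameter):

```python
# Rough compression ratio for llama3 8B under the assumptions above.
tokens = 15e12          # ~15 trillion training tokens
bytes_per_token = 4     # assumed raw bytes per token of text
params = 8e9            # 8 billion parameters
bytes_per_param = 2     # assumed 2-byte (16-bit) weights

training_bytes = tokens * bytes_per_token   # ~60 TB of training data
model_bytes = params * bytes_per_param      # ~16 GB of weights
ratio = training_bytes / model_bytes

print(f"training data: {training_bytes / 1e12:.0f} TB")
print(f"model weights: {model_bytes / 1e9:.0f} GB")
print(f"compression ratio: ~{ratio:.0f}x")
```

Even this crude estimate puts the weights at roughly 1/3750th the size of the data they were trained on, which is what motivates the compression framing.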



I find it valuable to view these models as compression machines because it helps us pose the question: what kind of information benefits us the most once effectively compressed?



Our modern society is built upon increasingly convoluted and intertwined layers of abstraction that have accumulated over time. This has created a large demand for domain "experts" who are hired to untangle the mess, parsing through a jungle of jargon and sifting through boilerplate before anything can get done. This is a problem because this kind of subdomain-specific specialization carries a ton of opportunity cost: it merely treats the symptoms of inflammation within an industry instead of curing the disease itself. Large language models feel like a match made in heaven for such a problem, as they can compress these abstractions and help us crunch through the layers we find most tedious. This form of automation could be what saves us from drowning in the bureaucratic quicksand that pervades every industry.



Circling back to the issue of whether the mathematically-inclined (shape-rotators) or the communicatively-inclined (wordcels) will prosper more from using these large language models, I opt for a third answer that is somewhat of a copout: those who will benefit the most from these models will need to have both skillsets.



One will need just enough understanding of the low-level math / science / engineering in their field to intuitively know the fundamental limitations and boundaries of what they're working on (ie. first principles thinking a la Elon Musk). At the same time, one will also need just enough verbal acuity and communication skill to draw up and express a high-level vision of the task they want accomplished. These two skillsets meet in the middle when it comes to sampling from the models and executing on their outputs, navigating the intermediate layers of abstraction between the goal in mind and an implemented solution.



TLDR, the winning skillset will be a combination of:

- just enough technical depth to reason about a problem from first principles
- just enough verbal acuity to express a high-level vision and steer the model toward it

