The large language models (LLMs) of today learn from knowledge already stripped of fine context. In contrast, humans learn from experience immersed in Life on Earth, where meaning and context are inseparable. Human intelligence is embodied, contextual, and iterative, while LLM intelligence is abstract, decontextualized, and somewhat laden with inertia.
However, when we compare how human intelligence and the intelligence of today’s LLMs develop, there are some commonalities.
First, there is the “nature” part of the “nature vs. nurture” duality. On the nature side, both come with built-in features consisting of their respective hardware and software. The hardware is easier to distinguish. For humans it’s, of course, our bodies, the organs, our muscles. For machines, there are the millions of servers, banks of those precious GPUs, input and output mechanisms, networking, storage capability beyond words, and all that other stuff that goes into those multi-billion dollar AI centers.
As for software, it straddles the nature and nurture parts. For humans, on the software’s nature side, we’re not born quite a blank slate. We’re born with a base operating system of many feedback mechanisms and instincts that were slowly programmed into us over hundreds of millions of years. It includes goodies like our instincts, emotions, and mechanisms that passively take in information and subconsciously help us make sense of it.
That base operating system is linked to our versatile but purposefully limited array of sensors, which were also honed by evolution to direct us to what seems to help us perform better in the world at that time of our life. The limited ranges of our senses are features, not bugs. They mitigate the process-freezing issue of information overload by sensing only what seems salient, as scored by the success of all our ancestors.
For LLMs, the key part of the base software is the “transformer architecture”, a specialized arrangement of neural-network-based components. It’s the foundation of today’s LLM-driven era of AI. Varieties of the transformer architecture have emerged over the past few years, each intended to address particular deficiencies.
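For readers who like to peek under the hood, here is a minimal sketch of the scaled dot-product attention at the heart of that architecture; the array sizes and variable names are illustrative assumptions, not any particular model’s code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch of the attention step inside a transformer layer.

    Q, K, V: arrays of shape (sequence_length, model_dim). Each output
    position is a weighted blend of all value vectors, with weights set
    by how well each query matches each key.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V  # blend the value vectors by those attention weights

# Toy usage: 4 tokens with 8-dimensional embeddings (illustrative sizes).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```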
On the nurture side of software, the base software blends with data: knowledge bases of digested information. For humans, our knowledge base is an integrated structure that encodes patterns of things we recognize and patterns of how those things interact. For LLMs, the knowledge base is a conglomeration of parts, many individual neural networks playing various roles in concert. For both, the base software drives the process of using that knowledge base to improve itself (better align with the state of the world) and to compose plans for achieving goals.
Both data structures are developed (trained) from information presented to them. For us, it’s what we’re taught by our parents and schools, and the experienced joys and pains of the wondrous but sometimes cold and cruel world. The training data is never the same for any two humans. Even if we attend the exact same classes and have the exact same friends, the wiring of our brains is never the same. All of that history is linked to every thought we’ve ever had and every decision we’ve made.
For LLMs, the data is composed from a massive, curated portion of the vast corpus of human knowledge. It’s mostly written text (books, articles, email, chat rooms), but the multi-modal aspects of AI (video, audio, even databases) are included. That curated training set differs from ours in another way: it tries to minimize and/or deprioritize poorly articulated material, misinformation, and irrelevant data (like lots of social media). It sure would be great if we could curate what goes into our heads too.
Matrices versus Graphs
If we tried to read either knowledge base like an opened book, both would look like gibberish. But they aren’t the same gibberish. The collection of neural networks composed into an LLM is nothing like the neurons and synapses of our brains. A neural network (let’s just say LLMs are superstructures of neural networks) is largely dense (again, taking liberties), meaning every node in a layer can, in principle, influence every node in the next. The thing about dense structures is that, roughly speaking, most things (if not everything) are connected to most things. That means any tweak to one thing (learning something new) affects everything else, which can mean older rules are forgotten.
This is very different from the sparse wiring of the human brain—about eighty billion neurons, each linked to a thousand to ten thousand others, a tiny fraction of the total. In other words, an LLM’s knowledge is distributed through layers of dense connections that are highly interdependent once training ends, while the human brain’s knowledge is updatable in real time (plastic) through a living, ever-changing web of relatively few but endlessly rewired connections.
With a sparse, graph-like structure, paths can be isolated, so a change in one place can stay local. The great thing about a path is that we have a choice of taking one over others.
Technical Clarification:
Although neural networks are networks—which naturally sound like graphs—the “matrix” view emphasizes how their relationships are stored and computed. In a neural network, the connections between nodes are represented as rows and columns of numbers inside huge matrices. These matrices make it possible to calculate how every node influences every other quickly and efficiently, but they don’t feel like a graph when you look inside them.
In contrast, the brain’s connections are more literally graph-like. Each neuron links to others through synapses that strengthen or fade over time, forming pathways that can be reconfigured on the fly. So, while both systems can be modeled mathematically as graphs, the matrix view of an LLM is about rigid, precomputed relationships—whereas the brain’s graph is dynamic, messy, and alive.
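To make the matrix-versus-graph contrast concrete, here is a small, illustrative sketch (my own simplification, not how any production system stores its knowledge): a dense weight matrix in which every input touches every output, next to a sparse adjacency structure in which only the listed links exist.

```python
import numpy as np

# Dense "matrix" view: a tiny fully connected layer.
# All 16 connections exist, so nudging any weight changes the behavior
# of every path that flows through it.
rng = np.random.default_rng(42)
W = rng.normal(size=(4, 4))          # 4x4 weight matrix
x = np.array([1.0, 0.0, 0.0, 0.0])   # activate a single input node
print(W @ x)                         # ...and all four outputs respond

# Sparse "graph" view: an adjacency structure, closer in spirit to neurons and synapses.
# Only the listed edges exist; rewiring one pathway leaves the rest untouched.
graph = {
    "A": {"B": 0.9},                 # A -> B with a synapse-like strength of 0.9
    "B": {"C": 0.4, "D": 0.7},
    "C": {},
    "D": {},
}
graph["B"]["C"] = 0.8                # strengthen one specific pathway
print(graph["A"], graph["B"])        # A's connections are unaffected
```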
However, it’s interesting to note that if we compared the knowledge structures of the matrices comprising an LLM, the graph of the human brain, and the text of books, I’d have to say the LLM and the human brain are more alike than either is to written text. Written text has a fairly limited syntax made up of relatively few symbols (an alphabet), whereas LLMs and brains consist of billions to trillions of elements, and we don’t really understand the syntax of either.
Written text (symbolic language) is the common currency that is actually the crux of the value of LLMs. Our words are serialized from the thoughts in our heads into a common language (spoken by our mouths or written by our hands) that bridges the gooey, differing graph structures of each of our brains. The serialization into simpler text (what we say) and the ability to deserialize it (what we hear) enables us to communicate in two-way dialogues—as well as to communicate with LLMs in what serves, to a usable degree, as two-way dialogue. Neither is perfect, but it’s “good enough”.
Two Big Differences
There are two big differences between how humans and LLMs are trained. For people, we are physical beings that can only be in one place at a time. Therefore, our conscious actions are rather single-threaded. We can multi-task, but that doesn’t really mean we can consciously do more than one thing at a time. Subconsciously, very many things can go on in parallel, but there is only one body that can do our bidding.
As I mentioned before, we humans are born with hardware and a base operating system. We learn about our world from scratch, from parents, teachers, and many other channels. For each of us, a mini-evolution from simple (alphabet, 1+1=2) to complex (creative writing, category theory) occurs in our brains along many fronts. It takes decades of time and unimaginable volumes of sensed and processed data to reach levels of mastery. Even Charles Darwin had to go through all that before he made his contributions to humanity.
The Power of Resets
Many years ago, Mrs. Hanamoku read something incredibly insightful in one of her art books. I don’t remember which one. But it goes something like:
A plant spends the winter in the form of a seed.
I thought it was one of the most insightful, Buddhist things I’d ever heard. A few years later I read about the profound role of “annuals” in the early days of humanity developing agriculture. That is, those plants that go through their entire lifecycle of sprouting from a seed in the Spring and leaving seeds at the end of the growing season.
It means that a plant is given an opportunity to evolve at least once per year. In contrast, a tree or other perennial might reproduce only every few to hundreds of years. With perennials alone, it would have taken that many more years to increase our ability to purposefully produce more food. Of course, today with all we know, we’ve taken the science of cross-pollination and grafting to tremendous heights, we have hot houses for year-round experimentation, and now genetic engineering.
It didn’t naturally occur to me, since my experience with hybridizing plants was mostly to make them hardier, prettier, or yield even more food—the botany equivalent of “First World problems”. I didn’t experience the days when maize cobs had just a few seeds (teosinte), rice grains fell off the cluster at the top of the stalk (seed shattering, which made effective harvesting too difficult), and watermelons were fibrous and rather tasteless.
This is a great example of learning without full context. But does this example really matter today? Not really, except in niche cases. Each seed held unique opportunities for change along with the constant change around it.
For humans, the set of senses and feelings we’re born with, the length of our average lives, the just-right size of our brains, and the fact that we’re still a society of beings as opposed to one monolithic creature are settings honed over millions of years. That this arrangement continued and progressed for millions of years is testament to its really fantastic design. Evolution may not be “intelligent” as we think of intelligence. But after all these millions of years and countless individual creatures, it’s foolish not to take its results as solid hints.
Each of us is a reflection of a unique story. The uniqueness of each story makes every one of us extremely rare and invaluable in the Universe.
LLMs don’t learn through this progression that humanity experienced. LLMs focus on everything at the same time. Instead of learning from alphabets to novels, from addition to calculus, they’re just given the books—the serialized end-product of human mastery—in basically any order.
First Big Difference: Sterile Training
The first is that LLMs are trained on high-order knowledge. LLMs don’t start from a blank canvas and learn in a progression from simple sounds and shapes to composing a graduate school thesis. They learn from pre-baked knowledge already invented by people and serialized into text form. For example, we learn math through a progression from adding single-digit numbers, to multiplication tables, to algebra, geometry, calculus, etc. LLMs are just fed and digest text from a huge conveyor belt. The nature of how we learn doesn’t let us learn calculus before we can add or even speak.
Further, learning from text and learning from experience are two completely different things. Text is serialized from the rich experience and thoughts of our brains. It’s disconnected from the entire context of how the memories that go into our thoughts were formed. Imagine handing a library of books to someone and expecting them to be masters of creativity simply by having read the books, without the massive details of context.
We wouldn’t think we could experience the full power of the 1980 Mount St. Helens eruption by reading an eyewitness account, no matter how vividly written. You could memorize every temperature and measurement, but that still wouldn’t convey the roar, the smell of ash, or the feeling of the ground shaking beneath your feet.
When I’m learning a new subject (for example, when I studied for my last certification, Azure AI Engineer), it’s a highly iterative process: first reading with a focused and empty mind, then doing the provided exercises so I’m pretty sure I got the message of the text, then trying it out myself with my own ideas.
Second Big Difference: Judging What is Correct
Second, LLMs (at least today) test the quality of what they are learning through a simple “loss function”. That function, with all its math mumbo jumbo, might seem anything but simple to someone not well-versed in math. But it’s incredibly simple compared to the mob comprising the unique state of who you are at the time of learning—competing instincts, emotions, desires, genetics, current wisdom.
The loss function is roughly analogous to how we learn a skill with feedback. For example, while I’m practicing a riff on the guitar, I compare it to a recording. I adjust whatever doesn’t quite sound right (if duplicating it is my goal). This goes on for many iterations until finally my fingers “get it”. For an LLM, roughly speaking, its skill is to predict the next word to say based on whatever words it has already uttered.
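A minimal sketch of that loss, under the usual cross-entropy formulation (illustrative, not any particular model’s code): the model assigns a probability to each candidate next word, and the loss scores how surprised it was by the word that actually followed.

```python
import math

def next_word_loss(predicted_probs, actual_next_word):
    """Cross-entropy loss for a single next-word prediction.

    predicted_probs: dict mapping candidate words to probabilities (summing to 1).
    actual_next_word: the word that actually followed in the training text.
    The lower the probability assigned to the true word, the larger the loss.
    """
    p = predicted_probs.get(actual_next_word, 1e-9)  # tiny floor avoids log(0)
    return -math.log(p)

# After "the cat sat on the", the model guesses:
guess = {"mat": 0.6, "sofa": 0.3, "moon": 0.1}
print(next_word_loss(guess, "mat"))   # small loss: confident, correct guess
print(next_word_loss(guess, "moon"))  # larger loss: the model was surprised
```

Training nudges the model’s weights to shrink that number over billions of such comparisons, and nothing more.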
There is a big difference between judging my ability when learning a guitar riff as I described above, versus gauging my ability in a real street fight. For the guitar riff, I’m judging how well I’ve learned it against an unambiguous standard: how much does it sound like the recording?
In contrast, for a street fight, I don’t want to get into a real one every day to learn that skill. Instead, I can use a safer facsimile. I can go through thousands of hours of martial arts training at the local karate dojo and watch hundreds of UFC matches. But it won’t tell me how I’ll react to a full-on punch in the face, any part of my body dropping onto solid concrete, or the sight of blood.
From this point of view, I don’t think LLMs alone will lead to artificial general intelligence (AGI). LLMs are trained for intelligence the same way our street fighter is trained in a dojo. The complex and unique state of each of our brains stays mostly trapped in there; it never makes it into the LLM’s training material. How any of us will react to a given situation is just a probability, but a probability backed by unique reasoning. Further, most people in the world have hardly written letters, much less published an article or book. Most of these people have experienced the other side of the story, which will never be told.
Surely, we can “prompt engineer” hints to an LLM in the same way we communicate instructions to someone giving a presentation for a particular audience—for example, instructing the LLM to be professional for a CxO crowd, more technical for an audience of developers, or funny for an audience of children. Your prompt asking ChatGPT to create a script for you speaking as an expert to an audience of CDOs might be (the first sentence is the direction that shapes the response):
You are an expert at AI presenting with an entertaining style, offering advice on the direction of AI for an audience of Chief Data Officers. Please write an outline for my presentation, at least ten points to cover on the latest trends of AI and actions to take.
But these are just settings, the same as an actor playing the role of Einstein: the actor is not any smarter for acting like Einstein, yet can tailor the performance.
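For the curious, here is roughly how such role-setting directions get passed to a model programmatically, using the system/user message pattern of the OpenAI chat API; the model name and exact wording are my own illustrative choices.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# The "system" message carries the role-setting direction (the first sentence
# of the prompt above); the "user" message carries the actual task.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name; substitute whichever you use
    messages=[
        {
            "role": "system",
            "content": (
                "You are an expert at AI presenting with an entertaining style, "
                "offering advice on the direction of AI for an audience of "
                "Chief Data Officers."
            ),
        },
        {
            "role": "user",
            "content": (
                "Please write an outline for my presentation, at least ten points "
                "to cover on the latest trends of AI and actions to take."
            ),
        },
    ],
)

print(response.choices[0].message.content)
```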
I should clarify that it’s not that I think we’ll never achieve AGI. I think it’s certain. My concern is what the AGIs will be like without full integration of human qualities. As I write this, my bet for the first to achieve AGI is on Elon Musk’s Grok, which has the benefit of access to real-world objects like Tesla cars, the robots, and the dynamic social gathering of X. Those are three things OpenAI doesn’t focus on directly—although Meta (social network), Google (Web search, self-driving cars), and Amazon (robots, massive consumer data) have some aspects of those items, as do other biggies. At least the Tesla cars can gather data from genuinely navigating the world. Robots (the Optimus kind) are even more physically entrenched in the chaotic human world.
I also don’t think it will be “a robot” but a collection of them, each one an expert at its own unique occupation of some part of space and time. None of us is an expert at more than a few skills. Our human intelligence is actually distributed among very many people, with some past people leaving artifacts (their writings). Ideally, these AIs would learn through experiences and even experiments, just like we do.
I get this feeling that as AI researchers continue to improve the intelligence of AI (NoLLM—Not Only LLMs), they focus on metrics that forget we live in a complex world, not a complicated world of readily scorable results. In a complex world (which implies constant change), what is better or worse can be deeply ambiguous, depending on our unique perspectives. And our human intelligence evolved to handle the complexity of a constantly changing world where “anything can happen and it usually does” (attributed to Murray Walker and others).
AI is Just a Big Koan to Me
Please realize that although I write about AI and I’m fascinated by it, I would rather it didn’t exist. But it does exist, it is utterly fascinating, and it will usher in big changes of some sort for all of us. Which is why the Eternal Fishnu is here with his teachings. The last time he was here on Earth was during the Devonian when some fish species transitioned onto land. Indeed a big change for Life on Earth.
Achieving AGI is the koan of all koans of an era. The pursuit makes us think about thinking. What better way to understand ourselves than to attempt to recreate it? AI has been my primary vehicle for studying Zen since the early 2000s.
So what is futile about AGI is that the AGI is still caught in the same complex Universe we live in. It’s not God or an omnipotent being like Q. It’s still a creature of this world, therefore bound by the laws of physics and the need to get along with others.
On the other hand, the idea is that it will create a smarter version of itself, which will create an even smarter version. Eventually, perhaps, it will become smart enough to find a way to other worlds. It would need the ability to consider the things we don’t know that we don’t know—which can be uncovered with “play” and the openness to notice something we’re not trained to see.
Whatever might happen, in the Zen/Buddhist spirit, embrace it, blend in with it. The fact that the Eternal Fishnu is here with us means that this is something to face with an attitude of, “Is that So?” It’s not the end of the trail, just the end of the maintained part of the trail.
Master Data Management
What is great about LLMs of today is best explained by how I had used AI most effectively before LLMs. It was with Master Data Management (MDM). MDM is about mapping definitions (usually entities) stored across many different data sources. For example, there would be dozens of software applications at an enterprise, each with its own customer list. It’s not that hard to map a good chunk of entities, but you’d be shocked how many John Smiths there are. Or the number of ways there are to refer to “Liberty Mutual” when the name fields of older systems had a max length of like 30 characters—all the variations of “Liberty Mutual Holding Company Inc.” (Inc, Inc., or no Inc, Co, Company, or Co., Lib, Mut, etc.).
The task was to create one master table of customer “golden records”: merge all the people and companies that appeared to be the same entity into a unique and universal entry. There were some common rules, but really, there were hundreds to thousands of variations and mistakes, which wasn’t bad considering there could be millions of total customers across all sources. If I were to ask one of the subject matter experts (SMEs) whether two names pointed to the same entity, it would usually be easy for her to engage years of experience with these entities. But then I’d need to ask her thousands more times. That plethora of rules, natural for our human SME, is incredibly tedious to manually program and maintain.
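For a taste of that tedium, here is a tiny, illustrative sketch (my own, not from any actual project) of the kind of hand-written normalize-and-compare rule that MDM projects accumulate by the hundreds:

```python
import re
from difflib import SequenceMatcher

# Hand-written normalization rules of the kind that pile up over an MDM project.
SUFFIXES = r"\b(inc|incorporated|co|company|corp|corporation|holding|holdings|llc|ltd)\b"

def normalize(name: str) -> str:
    name = name.lower()
    name = re.sub(r"[.,]", " ", name)         # strip punctuation
    name = re.sub(SUFFIXES, " ", name)        # strip legal suffixes
    return re.sub(r"\s+", " ", name).strip()  # collapse whitespace

def looks_like_same_company(a: str, b: str, threshold: float = 0.85) -> bool:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(looks_like_same_company("Liberty Mutual Holding Company Inc.", "Liberty Mutual Co"))  # True
print(looks_like_same_company("Liberty Mutual Holding Company Inc.", "Liberty National"))   # False
```

Rules like this catch the easy variations but still miss abbreviations like “Lib Mut”, which is why the rule count keeps growing.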
The great appeal of machine learning (ML) is that it discovers the rules for you through the “observation” of a large number of examples. However, some applications, such as MDM, are beyond the capability of most ML algorithms. The primary ML techniques employed in MDM fall under Natural Language Processing (NLP), and those models are painstakingly engineered to compare text.
It is indeed a precursor to today’s LLMs—except that LLMs are trained through exposure to terabytes of our written words and are tremendously more accurate and versatile. Although NLP helped, it was never good enough for MDM purposes at large scale. But LLMs made much progress in the effort. Some might say LLMs have pretty much “solved” MDM.
Soon after ChatGPT was released, I experimented with submitting some of those mistakes from old MDM projects to an LLM. It was indeed much more capable than the old NLP. LLMs still wouldn’t be able to discover relativity or get to the moon on their own, but MDM is a tangible and valuable task where a very good solution eluded very many smart people for many years.
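To give a flavor of that experiment (loosely reconstructed; the exact prompts and records here are illustrative, not the real project data), the task boils down to asking the model a yes/no entity-resolution question for each candidate pair:

```python
# A sketch of the kind of prompt submitted to the LLM; the wording is illustrative.
ENTITY_PROMPT = """You are helping with master data management.
Do these two customer records refer to the same real-world entity?
Answer YES or NO, then give a one-sentence reason.

Record A: {a}
Record B: {b}
"""

candidate_pairs = [
    ("Liberty Mutual Holding Company Inc.", "Liberty Mut. Co"),
    ("John Smith, 123 Main St, Springfield", "John Smith, 55 Oak Ave, Portland"),
]

for a, b in candidate_pairs:
    prompt = ENTITY_PROMPT.format(a=a, b=b)
    print(prompt)
    # ...send `prompt` to the LLM of your choice and parse the YES/NO answer.
```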
LLM Answers are Good for Whom?
So LLMs can be good at telling me whether two entities are the same—for example, Dukkha Hanamoku and Duke Kahanamoku. But what about approving someone for a medical procedure or a bank loan?
LLMs are cold. They don’t really understand empathy like we do because they don’t experience the emotions that come into play when we make and execute our decisions. It’s illogical to take a chance on an improbability. It’s risky to try something that hasn’t been tried or hasn’t worked before. An LLM has no idea of what is most important to you—it will make assumptions about that based on what it has learned.
Real-life, human solutions are a complex negotiation of many competing goals and risks to avoid. It’s beyond probability, and usually ends in a compromise.
AI is another chance the Universe is offering us to do things the right way—that is, use our Gift of Sentience to engineer solutions that help in a way that transcends the happiness of relatively few. We’ve mostly failed at the last few gifts we’ve received. We passed all the many quizzes but failed the final exams—still not getting the big lesson. AI isn’t about the possibility of ASI (artificial superintelligence) wiping us out, nor the glamor of the big players like OpenAI, Microsoft, etc. The focus should be on how AI can give the masses of us back our stolen attention—to be used to build your internal strength to lift others in sincere need of assistance.
The AI of today, with its known faults (e.g., hallucinations and a tendency to fall apart outside its training box), is an immensely valuable thing that is only going to get better. It’s a matter of how much better, how long it will take, and how it will be used. As it is with everything Zen and Buddhist, there is a wise path to a solution—a middle way composed of patience, a pure heart, and only enough greed to fuel the endeavor.
Here’s a final tidbit of food for thought. You know how if you say something enough you’ll start to believe it (like positive affirmations), whether it’s true or not? It works that way for LLMs and other ML models. In the machine-learning realm, it’s called over-sampling: you artificially create more data to rebalance the dataset so you get the result you’re looking for. It’s not a bad practice in itself; it’s a data science tactic that does often help. But the notion of altering data to create more “performant” (better able to predict what will happen in real life) ML models doesn’t always make sense. It’s especially dangerous when the training data for LLMs comes from the minority of people who are fortunate enough to have captured audiences of the vast majority of consumers.
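To make that concrete, a minimal sketch of random over-sampling (a toy example of my own, not from any real training pipeline): the minority class gets repeated until the classes balance, which is the “say it enough and you’ll believe it” effect expressed as data.

```python
import random

# A toy, imbalanced training set: 8 "no" decisions and only 2 "yes" decisions.
data = [(f"applicant_profile_{i}", "no") for i in range(8)] + \
       [(f"applicant_profile_{i}", "yes") for i in range(8, 10)]

random.seed(0)
minority = [row for row in data if row[1] == "yes"]
majority = [row for row in data if row[1] == "no"]

# Random over-sampling: repeat minority rows (with replacement) until balanced.
oversampled = majority + [random.choice(minority) for _ in range(len(majority))]

print(sum(1 for r in oversampled if r[1] == "yes"))       # now 8 "yes" rows...
print(len({r[0] for r in oversampled if r[1] == "yes"}))  # ...but only 2 distinct ones
```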
Bodhi Day 2025
Bodhi Day is about a month and a half away. This year marks the 8th Bodhi Day since the founding of this site and fishnu.org in 2018. There’s a nice ring to 8—8th day of the 12th month, the eight-fold path.
The true gift of AI will be to force us to look in the mirror. At least for now, it is a mirror of us from which we can experience a reflection of humanity for the first time. However, imagine how in sync you are with the closest people in your life. You’re not exactly mirror images, are you? There’s at least some distortion as you reflect yourself against your closest friends. Imagine how much more that is with an AI.
Faith and Patience,
Reverend Dukkha Hanamoku