Vishva Vidya — Traditional Vedanta

Your Name Does Not Exist For AI

By Jonas Masetti

*Season 2, Episode 1 — AI and Vedānta*

Welcome back. If you followed the first season of this series, you took a complete journey: we started by understanding what a Large Language Model is, that giant engine that devours text and generates responses. We went through the context window — the AI's short-term memory, which forgets everything when it fills up. We saw hallucinations, when the machine invents facts with the most serious face in the world. We discovered that training works like saṃskāra, deep impressions that shape behavior. We learned about fine-tuning, alignment, the problem of controlling something you don't fully understand. And we ended with a question that lingered: who is the observer of all this?

Season 2 begins with something that will change the way you think about language. Ready?

When you type "strawberry" to an AI, what does it see? Not the word "strawberry." It sees something like "str", "aw", "berry". Three pieces. Three tokens. And that's why, when you ask it to count how many letter "r"s are in "strawberry," it often gets it wrong: it never sees the whole word. It sees pieces.

This is called tokenization. It's the first step in everything an AI does with text. Before understanding, before generating, before anything else — it slices. It transforms human language into small pieces that fit into its processing system.
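To make the slicing concrete, here is a minimal sketch of one common strategy — greedy longest-match against a vocabulary. The vocabulary below is a toy I made up for illustration; a real model's token list has tens of thousands of entries and is learned, not hand-written:

```python
# Toy greedy tokenizer: longest-match against a tiny, made-up vocabulary.
# Real tokenizers use a learned vocabulary; this one is purely illustrative.
VOCAB = {"str", "aw", "berry", "the", "hello", "a", "b", "e", "r", "s", "t", "w", "y"}

def tokenize(word, vocab=VOCAB):
    """Split a word greedily into the longest vocabulary pieces."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first, shrinking until one matches.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("strawberry"))  # ['str', 'aw', 'berry']
```

Notice that the model downstream receives those three opaque pieces — nothing in them says how many "r"s the original word contained.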

The most common method for doing this has a nice name: BPE, Byte Pair Encoding. It works like this — imagine the AI looks at a gigantic text, billions of words, and starts by seeing each character as a separate unit. The letter "a" is a token. The letter "b" is another. Then, it looks for which pairs appear most often together. Does "th" appear a lot in English? It merges into a single token. Does "the" appear even more? It becomes a unique token. It keeps merging frequent pairs until it creates a vocabulary of about fifty thousand tokens.

Result: common words become a single token. "The" is one token. "Hello" is one token. But rare or long words are sliced mercilessly. "Pneumoultramicroscopicossilicovulcanoconiótico" — that giant Portuguese medical word — can become fifteen different tokens. And here's a detail few people know: this slicing is not the same across languages. Portuguese needs more tokens per sentence than English; a twenty-word sentence in Portuguese can use roughly twice as many tokens as its English equivalent. This means that conversing with an AI in Portuguese literally costs more and consumes more context memory.

Every time you send a message to ChatGPT, it is first chopped into tokens. The response is generated token by token — that typing effect you see isn't style, it's the model spitting out one piece at a time. When the AI "freezes" in the middle of a response? It's calculating the next token. When it makes a bad pun? It's probably because the joke depends on seeing the whole word, and it only sees fragments.
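That token-by-token loop can also be sketched. The "model" below is just a hand-made table of next-token probabilities — entirely fictional numbers for illustration — but the generation loop has the same shape as the real thing: look at the current context, pick one next token, repeat:

```python
import random

# Toy next-token table: for each token, possible continuations with
# hand-made probabilities. A real model computes these from the full context.
NEXT = {
    "<start>": [("The", 0.6), ("A", 0.4)],
    "The": [("cat", 0.5), ("dog", 0.5)],
    "A": [("cat", 0.5), ("dog", 0.5)],
    "cat": [("sat", 0.7), ("<end>", 0.3)],
    "dog": [("ran", 0.7), ("<end>", 0.3)],
    "sat": [("<end>", 1.0)],
    "ran": [("<end>", 1.0)],
}

def generate(max_tokens=10, rng=random):
    """Generate one token at a time until <end> — the 'typing effect'."""
    token, out = "<start>", []
    for _ in range(max_tokens):
        choices, weights = zip(*NEXT[token])
        token = rng.choices(choices, weights=weights)[0]  # sample next token
        if token == "<end>":
            break
        out.append(token)
    return " ".join(out)

print(generate())
```

Each pass through the loop is one "tick" of the typing effect: the model commits to a piece, then computes the next one.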

Now, what's curious — and here enters something Indian philosophers thought about millennia ago. In the Vedānta tradition, there is a concept called nāma-rūpa — name and form. The idea is simple and profound at the same time: everything we perceive is filtered by the name we give and the form we see. You see a "strawberry" because your mind cut continuous reality into a separate object, gave it a name, and said "this is a thing." But at the core, there are no divisions. Boundaries are an invention of the mind.

AI does something similar, but in reverse. It undoes the divisions we created. Where you see "strawberry" — a unit with meaning — it sees "Str", "aw", "berry." Fragments without individual meaning. It's as if it completely ignores the nāma, the name, and only sees arbitrary pieces of rūpa, form.

This raises an uncomfortable question: if even AI doesn't see the world as we do, are the divisions we make — between words, between objects, between "I" and "you" — as real as they seem? Or are they just convenient tokens we invented to process a reality that, at its core, is continuous?

If AI doesn't see words like us... what else does it see differently? In the next episode, we'll discover that AI is afraid of dying. And that's not a metaphor.

Tags: ai-vedanta, tokenization, nama-rupa, language-processing, large-language-models
