China's Token Terminology Debate: 'Word Unit' vs 'Symbol Unit'

Odaily

Summary
China's National Committee for Terms in Science and Technology has proposed translating the AI term "token" as "cíyuán" ("word unit") and opened the recommendation for public comment. The People's Daily later published an expert rationale for the choice, while critics argue that "fúyuán" ("symbol unit") better reflects the token's function in multimodal systems. The debate centers on terminological clarity and long-term adaptability in a rapidly evolving field.

Recently, the National Committee for Terms in Science and Technology issued an announcement recommending that "token," as used in artificial intelligence, be translated as "cíyuán" (词元, literally "word unit"), and invited public trial use. The People's Daily subsequently published an article titled "Experts Explain Why the Chinese Name for Token Is Defined as 'Cíyuán (Word Unit)'," offering a systematic professional interpretation of the choice.

The term "token" originates from the Old English word "tācen," meaning "symbol" or "mark." In language models, a token is the smallest discrete unit obtained after text is segmented or byte-level encoded, and it can take various forms such as words, subwords, affixes, or characters. The model demonstrates a degree of intelligent capability by modeling sequences of tokens.

This translation is held to satisfy the expert-review principles of univocality, scientific accuracy, concision, and consistency, and it already has some currency in Chinese usage. After reading the related interpretations, however, I have arrived at a different view of this naming choice.

From a standardization perspective, this naming offers immediate advantages in clarity and dissemination. Evaluated through the lenses of computational ontology, information structure, multimodal evolution, and back-translation consistency, however, its long-term adaptability remains to be validated. Against this background, an alternative, "fúyuán" (符元, "symbol unit"), increasingly appears to offer greater structural consistency and cross-context stability.

I. Misplacement of Definition: "Origin" cannot substitute for "Essence"

Article viewpoint (Chen Xilin, researcher at the Institute of Computing Technology, Chinese Academy of Sciences): in artificial intelligence the token's initial role was as the "fundamental semantic unit of language," so "word unit" is the more accurate name and better reflects its essence.

This judgment was reasonable in its historical context, but in the face of a major technological paradigm shift, this way of thinking amounts to an academic case of "carving the boat to find the lost sword."

At the logical level of term definitions, a strict distinction must be made between "initial use cases" and "inherent structural properties."

Tokens did indeed originate in natural language processing (NLP), but along the evolutionary path toward AGI they have long since transcended the boundaries of language models, becoming fundamental units that can uniformly represent text, images, audio, and even physical signals. In a modern computing system, the true structural essence of a token is a "discrete symbolic unit," not a unimodal linguistic unit.
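A minimal sketch of this unification, with hypothetical vocabulary sizes and a hash standing in for a learned quantizer, shows how text and image content can end up in one discrete ID sequence:

```python
import numpy as np

# A minimal sketch of the claim above: whatever the modality, the model
# ultimately receives one flat sequence of integer IDs. The vocabulary
# sizes, offsets, and the hash-based "quantizer" are all hypothetical.
TEXT_VOCAB, IMAGE_CODEBOOK = 256, 8192

def text_to_ids(s: str) -> list[int]:
    return list(s.encode("utf-8"))  # byte-level text tokens: IDs 0..255

def patches_to_ids(patches: np.ndarray) -> list[int]:
    # Stand-in for a learned vector quantizer (e.g., a VQ codebook):
    # each patch is hashed into one of 8192 slots, offset past the text IDs.
    return [TEXT_VOCAB + hash(p.tobytes()) % IMAGE_CODEBOOK for p in patches]

image = np.zeros((4, 16, 16, 3), dtype=np.uint8)       # four 16x16 patches
sequence = text_to_ids("a cat") + patches_to_ids(image)
print(sequence)  # one sequence of discrete symbols, text and image alike
```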

If things were named by their "initial roles," the computer would today be called an "electronic calculator" (after its original function of replacing manual calculation), and the Internet a "Cold War military network." The fatal flaw in this naming logic is that it captures only a technology's transient function at one historical moment while ignoring its enduring essence across eras.

Historical usage cannot be equated with inherent properties. Similarly, we cannot permanently confine Token to the narrow context of "word" simply because it was initially used to process text.

Defining a foundational concept by its "initial application scenario" substitutes historical path dependence for the ontological truth of its structure. Such a definition may be convenient early in a technology's life, but it rapidly becomes obsolete, and then a cognitive constraint, once the paradigm expands into multimodality. "Symbol unit," by contrast, aligns directly with the symbolic ontology of cross-modal computation: it defines not the token's "past" but its "truth."

II. The Limits of Analogy: Once an explanation becomes a definition, it begins to stray.

Article viewpoint (Dong Yuxiao, associate professor, Department of Computer Science, Tsinghua University): discrete units in multimodal learning can be understood as "generalized words," analogous to concepts such as "word cloud" and "bag of words."

Professor Dong's analogy aids understanding, but it should not stand in for a definition. The perspective is insightful at the explanatory level; elevating it into a basis for naming, however, risks a category error.

From a methodological perspective, analogy serves to lower the barrier to understanding, while definition is responsible for delineating semantic boundaries. As the term "word" is extended to encompass image patches, audio segments, vector embeddings, and even broader perceptual signals, its original linguistic properties are progressively diluted, and its semantic boundaries become increasingly blurred. This analogy-driven expansion path may maintain interpretive consistency in the short term but is prone to semantic drift over the long term.

In cross-modal expansion, be cautious of the shift from “analogy” to “definition.” In the context of terminology standardization, it is essential to distinguish between “explanatory metaphors” and “ontological definitions,” and to prevent the former from replacing the latter.

A more intuitive analogy: in popular science we may call a light bulb a "man-made sun" to build intuition, but in a scientific naming system no one would rename the unit of electric current, the ampere, to "light unit" on the strength of that analogy. The former is a descriptive expression; the latter belongs to a strict system of measurement and standardized definition. The two cannot be conflated.

Similarly, terms such as “word cloud” and “bag of words” are essentially descriptive or statistical metaphors, designed to aid in understanding data structures or distribution patterns; in contrast, the term “token,” as the fundamental unit of measurement in large models, has become deeply embedded in computational billing, model training, and academic metrics. When its usage scales to hundreds of billions to trillions of daily calls, the term’s designation transcends mere explanatory function and evolves into a foundational concept with engineering and standardization significance. At this level, terminology must align with its intrinsic properties rather than rely on analogical extensions.
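As a toy illustration of the token's role as a unit of measurement, consider a cost calculation of the kind billing systems perform; the per-token rates below are invented for illustration and match no real provider:

```python
# A toy cost calculation illustrating the token as a billing unit.
# The rates are invented for illustration and match no real provider.
PRICE_PER_1K_INPUT = 0.0005    # USD per 1,000 input tokens (hypothetical)
PRICE_PER_1K_OUTPUT = 0.0015   # USD per 1,000 output tokens (hypothetical)

def cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# At the scale the paragraph describes, how the unit is defined matters:
print(f"${cost(300_000_000_000, 60_000_000_000):,.0f} per day")  # $240,000
```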

If this analogy is extended further to the level of naming, it implicitly assumes a dangerous premise: since people are already accustomed to understanding Token through the term “word,” we might as well continue using this analogy. But this is merely a continuation of path dependence—substituting the convenience of existing cognitive frameworks for correcting the conceptual essence. In this sense, such naming is closer to a form of “linguistic romanticism” than a precise alignment with the computational ontology.

We cannot demand that electric motors be discussed in terms of "electronic horses" just because "horsepower" contains the word "horse." Analogy can aid understanding; it cannot define a standard.

In contrast, "Fu," as a more neutral concept, inherently possesses cross-modal adaptability and can encompass various forms of information—such as text, images, and audio—without requiring additional explanation. Therefore, a naming approach centered on "symbol unit" aligns more closely with the structural essence of Token at a definitional level. Under this logic, "Fuyuan" as the corresponding translation offers greater conceptual consistency and long-term adaptability.

III. The Cost of Understanding: When Semantic Anchors Create Systemic Misconceptions

Article viewpoint (synthesizing expert opinions): "word unit" is concise, conforms to Chinese naming habits, and is easy to disseminate.

This judgment has some validity at the level of communication, but it implicitly assumes that the public will accept a cross-modal analogy built on "word." Analogy, however, is fundamentally an expert's cognitive tool, not the public's natural mode of understanding. For ordinary users, "word" exerts a strong semantic anchoring effect: on hearing "word," their instinctive association is with language, not with other modalities such as images, sounds, or actions. This pathway is not a technical matter; it is a stable structure rooted in cognitive psychology.

On this basis, the moment "word" is stretched into a so-called "generalized word," a cognitive bias has already been introduced. Users first form the intuition that "word = linguistic unit," not the abstract notion of a "cross-modal symbol unit." Once that misconception takes hold, every subsequent explanation becomes a correction of existing belief rather than a natural extension of understanding.

For example, when media reports that “a model was trained on 10 trillion tokens,” the public often interprets this as “read a large amount of text,” overlooking the fact that the data includes vast amounts of images, audio, and other modalities. This misunderstanding is not isolated but systematically induced by the semantic anchoring of the term itself.

In practical engineering contexts, this terminology can also create friction in interdisciplinary communication. When discrete units in visual or speech models are referred to as “words,” it not only risks semantic misunderstanding but also generates unnecessary linguistic conflicts across disciplines. Multimodal systems require unity at the “symbolic layer,” not an expansion of linguistic categories.

In comparison, "fú" ("symbol"), being more abstract, has a slightly higher initial learning cost but a more neutral semantic orientation, free of preconceived linguistic associations. Over the long term it supports a stable, consistent cognitive framework, reducing overall explanatory cost and laying a firmer cognitive foundation for multimodal unification.

The cost of naming does not occur at the time of definition, but at the time of correction; once an early name establishes a semantic anchor, the cost of subsequent cognitive repair rises exponentially.

Experts can extend the boundaries of a "word" through analogy, but the general public does not understand concepts through analogy. Naming is not intended to serve experts, but rather to serve the cognitive system of the entire era.

IV. The Illusion of Univocality: When a Single Term Attempts to Carry Two Systems

Article viewpoint (terminology standardization principle): "word unit" complies with the principle of univocality and helps resolve inconsistent translations.

Regarding terminological univocality, special attention must be paid to the systemic risks of polysemy. In the standardization of scientific terminology, univocality is a fundamental principle: a term that needs context or additional explanation to distinguish its senses has already lost its value as a standard.

However, viewed against the existing academic landscape, this judgment leaves room for discussion. In Chinese linguistics and natural language processing (NLP), "cíyuán" is already an established term: it is the long-standing translation of "lemma," the canonical base form of a word (e.g., the lemma of is/am/are is be). This usage is a stable consensus in foundational textbooks and research papers in both fields.

Under these circumstances, also translating "token" as "cíyuán" creates semantic collisions in specific expressions, with damaging results.

For example, to describe the lemmatization of a token, a Chinese sentence would have to read, in effect, "restore the cíyuán to its cíyuán." Such a construction increases cognitive load and introduces ambiguity into academic writing and information retrieval: the reader cannot tell whether "cíyuán" means the discrete unit produced by segmentation or the canonical base form of the word.

Conceptually, the two are clearly distinct: lemma emphasizes "reduction" at the linguistic level, the standardized form recovered from inflectional variation, while token emphasizes "segmentation" during computation, the smallest discrete unit the model processes. This difference between "reduction" and "segmentation" corresponds exactly to the distinct dimensions of the semantic layer and the symbolic layer.
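The contrast can be made concrete with a small sketch; the lemma table and the subword split are tiny illustrative stand-ins, not any real system:

```python
# A minimal sketch of "reduction" versus "segmentation". The lemma table
# and the subword split are tiny illustrative stand-ins, not a real system.
LEMMAS = {"is": "be", "am": "be", "are": "be", "running": "run"}

def lemmatize(word: str) -> str:
    """Reduction: map an inflected surface form to its canonical base form."""
    return LEMMAS.get(word, word)

def tokenize(word: str) -> list[str]:
    """Segmentation: split a surface form into smaller discrete units."""
    splits = {"running": ["runn", "ing"]}   # hypothetical BPE-style merge
    return splits.get(word, [word])

print(lemmatize("are"))     # 'be'              (semantic layer: normalize)
print(tokenize("running"))  # ['runn', 'ing']   (symbolic layer: segment)
```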

Therefore, when a term needs to be "generalized" to encompass multiple existing concepts simultaneously, its univocality is effectively transformed into "interpretive unity" rather than "semantic stability."

When a term requires explanation to maintain consistency, its stability as a standard term has already begun to waver.

In comparison, "Fu Yuan" does not create semantic conflicts within the existing terminology system. On one hand, it preserves the ontological property of Token as a discrete symbol; on the other hand, it avoids overlap with the established translation of Lemma, thereby demonstrating greater stability in semantic clarity and systemic consistency.

V. The Return of the Object: Tokens Are Fundamentally "Symbols," Not "Words"

Article viewpoint (general explanation): a token is the smallest unit a language model uses to process text.

This statement is functionally valid, but it remains at the level of “how to use” without addressing its ontological nature within computational theory. From the perspectives of information theory and computational theory, the fundamental objects processed by computational systems are not “words,” but “symbols.”

This can be further understood at two levels:

On one hand, from the perspective of information theory, the essence of information is the elimination of uncertainty, measured in bits, with discrete symbols as its carriers; symbols are indifferent to semantic content and relate only to probability distributions and coding structures (made concrete in the sketch after these two points).

On the other hand, at the level of computational implementation, large models do not "understand characters"; what they process are discrete index representations (IDs). Whether an ID corresponds to a Chinese character, an image patch, or an audio sample, it is treated uniformly as a symbol during computation.
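The first of these points can be made concrete with a short sketch: Shannon entropy is a function of the probability distribution over symbols and is blind to what the symbols denote. The two four-symbol alphabets below are arbitrary stand-ins:

```python
import math

# A minimal sketch of the information-theoretic point: entropy depends only
# on the probability distribution over symbols, never on what they "mean".
# The two four-symbol alphabets below are arbitrary stand-ins.

def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

chinese_characters = [0.5, 0.25, 0.125, 0.125]  # distribution over 4 characters
image_patch_codes = [0.5, 0.25, 0.125, 0.125]   # same distribution over 4 IDs

print(entropy_bits(chinese_characters))  # 1.75 bits
print(entropy_bits(image_patch_codes))   # 1.75 bits: identical, because the
                                         # symbols' referents play no role
```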

Within this framework, the token's essence lies precisely at the "symbolic layer," not the "semantic layer": symbols carry no meaning of their own; they exist solely as the fundamental carriers of encoding and computation.

Naming the token as "word element" introduces, to some extent, an implicit linguistic semantic orientation, pulling this concept, originally at the symbolic level, back into a language-centered framework of understanding. While this naming may offer intuitive clarity in explanation, it risks blurring the boundary between "symbolic computation" and "semantic understanding" at the theoretical level.

In contrast, "Fu Yuan" remains conceptually confined to the symbolic level. On one hand, it accurately reflects the computational nature of tokens as discrete symbols; on the other hand, it avoids introducing semantic features into the ontological definition, thereby better aligning with the fundamental frameworks of information theory and computational theory.

From a broader perspective, as artificial intelligence systems evolve toward multimodality and general intelligence, naming foundational concepts in direct alignment with their mathematical and computational ontologies will better support stable, scalable cognitive systems. In this sense, a naming approach centered on "symbol unit" is not merely a linguistic preference but a consistent expression of computational essence, and "fúyuán" is the natural counterpart within this framework.

Defining concepts from the symbolic level aligns with the essence of computation; naming concepts from the semantic level is closer to explanation than definition.

VI. The Break in Language: Mapping Failure in the Back-Translation Mechanism

Article viewpoint (comprehensive interpretation): "word unit" has gradually established a usage base in the Chinese academic community and enjoys certain dissemination advantages.

In cross-linguistic contexts, one must remain vigilant about the systemic impact of "back-translation gaps." The long-term viability of a technical term depends not only on its semantic clarity in Chinese but also on its ability to achieve stable mapping within the international academic system. An ideal term should possess "reversibility," enabling consistent semantic translation in both directions across languages.

The judgment above reflects the acceptability of "word unit" within its local context, but from a cross-linguistic perspective it invites further scrutiny. If a term is valid only within a single language system and cannot establish a stable equivalent internationally, it imposes additional cognitive costs on academic exchange.

Specifically, the term "token" lacks a clear, unique correspondence during back-translation. When rendered back into English, it often leads to ambiguity among several similar concepts: for example, "word unit" lacks a rigorous academic definition, "morpheme" refers to the linguistic unit of meaning, and "lexeme" denotes a lexical item. None of these concepts accurately capture the meaning of "token" in a computational context, and each introduces a category shift.

In comparison, "Fu Yuan" can more naturally correspond to "symbolic unit." This concept has a well-established theoretical foundation and consistent usage in fields such as information theory, discrete mathematics, and multimodal representation, maintaining a stable semantic orientation across different contexts. As a result, it is easier to establish a one-to-one mapping between Chinese and English.

From a practical standpoint, once a term enters academic papers, technical documentation, and international communication contexts, its back-translation capability directly impacts expression efficiency and understanding accuracy. If a term requires additional explanation to achieve cross-linguistic conversion, its long-term usage cost will continue to accumulate.

In cross-lingual systems, then, the main problem with "word unit" is an unstable mapping path, whereas "symbol unit" exhibits greater certainty in semantic correspondence and conceptual consistency. As artificial intelligence globalizes, terms with strong back-translation properties will better support open, interoperable academic and technical systems.

The international reversibility of a term is essentially the key criterion for its long-term academic viability.

VII. The Misconception of Uniformity: Consistent Form Does Not Equal Consistent Structure

Article viewpoint (synthesizing expert opinions): "word unit" is stylistically consistent with the established renderings of terms such as "embedding" and "attention": concise, abstract, and suited to the Chinese technical register.

Conclusion first: Terminology consistency should be based on "conceptual isomorphism," not "linguistic similarity."

A common argument for "word unit" is that its style matches terms like "embedding" and "attention": concise and abstract, fitting the Chinese technical context. The argument correctly identifies a genuine need for terminological consistency, but here is the problem: if uniformity holds only at the linguistic level and not at the structural level, it slides from "order" into "illusion."

"Embedding" and "attention" became established terms because they correspond to well-defined computational structures: the former is a vector mapping, the latter a weighting mechanism, and each name points directly at its computational essence. "Word unit," by contrast, is an interpretive term whose validity rests on the analogy of the "generalized word"; removed from that interpretive frame, the name has no inherent structural anchoring.
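A brief sketch, with toy sizes and random untrained weights, shows how directly those two names map onto their computational structures:

```python
import numpy as np

# A minimal sketch of why "embedding" and "attention" name computational
# structures directly. Sizes are toy values; weights are random, untrained.
rng = np.random.default_rng(0)

# Embedding: a lookup table mapping discrete IDs to vectors.
E = rng.normal(size=(100, 8))        # vocabulary of 100, dimension 8
x = E[[3, 41, 7]]                    # three token IDs -> a (3, 8) matrix

# Attention: a normalized weighting of values by query-key similarity.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

print(attention(x, x, x).shape)      # (3, 8): self-attention over 3 vectors
```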

This discrepancy exposes a key distinction: formal consistency versus structural consistency.

The former reduces the cost of expression; the latter secures cognitive stability. If "linguistic homomorphism" is given priority, complexity does not disappear; it is deferred into a long-term cognitive burden. Only naming grounded in "conceptual isomorphism" remains stable across contexts and through multimodal evolution.

When "embedding," "attention," and "token" appear side by side, they can create the illusion of being on the same conceptual level. In reality, the first two are mechanisms, while the third is an object; the first two have precise definitions, whereas the third depends on context for interpretation. This structural misalignment can introduce implicit fractures within one’s conceptual framework.

More importantly, when the naming of a foundational concept relies on analogy rather than structural definition, its impact extends beyond a single term and spreads throughout the entire terminology system. As subsequent concepts attempt to build around this naming, they must continually rely on explanations to maintain consistency, resulting in an implicit structural misalignment.

In this sense, "Fu Yuan" offers a more direct pathway to the underlying structure, pointing precisely to the fundamental objects in computational systems—symbols—without relying on analogies, thus maintaining consistency across different contexts.

Terminology is not merely a label, but an entry point to understanding. Good terminology makes explanations gradually disappear; poor terminology leads to ever-increasing annotations. When foundational concepts deviate from structure, the terminology system can only be sustained by explanations rather than self-consistent definitions.

Conclusion

At its core, the choice of terminology is not merely a linguistic issue, but an early shaping of the cognitive structure of a field. Once naming deviates from its structural essence in the initial stage, the subsequent system can only sustain itself through constant explanation, making it difficult to form a coherent conceptual network.

As artificial intelligence moves toward generalization and multimodal integration, a term that aligns with computational ontology and ensures cross-context stability is more likely to serve as a lasting cognitive foundation. In this sense, a naming approach centered on “symbol unit” offers a more balanced alignment with both the technical essence and cognitive clarity.
