Better documentation about tokens

2026-02-19 00:32:29 -05:00
parent cf77eb8544
commit 9f286a1648
2 changed files with 66 additions and 30 deletions


@@ -251,15 +251,14 @@ Update 2: I don't remember using userdata objects at all. I am not sure that Upd
## Token Literal Syntax Patch
Tokens are lightuserdata values encoding short alphanumeric strings as base37 numbers (see `Tokens-A-New-Lua-Type.md`). Previously, tokens could only be created in C++ and inserted into the Lua environment via `LuaTokenConstant`. This patch adds a literal syntax to the Lua parser so that tokens can be written directly in Lua source code using the `@` prefix:
Tokens are lightuserdata values encoding short alphanumeric
strings as base37 numbers (see `Tokens-A-New-Lua-Type.md`).
This patch adds a literal syntax to the Lua parser so that
tokens can be written directly in Lua source code using the
`@` prefix:
```lua
local x = @null
local y = @found
```
The lexer (llex.c) recognizes `@` followed by one or more alphanumeric characters (a-z, 0-9, case insensitive, max 12 characters). It encodes the string as a base37 number using the same encoding as `LuaToken::parse()` in luastack.hpp and produces a `TK_TOKEN` token. The parser (lparser.c) handles `TK_TOKEN` in `simpleexp()` by storing it as a lightuserdata constant in the function's constant table via `luaK_lightuserdataK()` in lcode.c.
Underscores are not valid in token literals. Writing `@foo_bar` produces a lexer error rather than silently splitting into token `@foo` and identifier `_bar`.
This patch is live and functioning.
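The lexing rule described above can be sketched in standalone C++. This is only an illustration of the rule as stated, not the code in llex.c: the names `TokenLiteral` and `scan_token_literal` are hypothetical, and treating an over-long literal as an error is an assumption (the text only says the maximum is 12 characters).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type: the literal's text and how much source it used.
struct TokenLiteral {
    std::string text;    // characters after '@', lowercased
    std::size_t length;  // source characters consumed, including '@'
};

// ASCII-only classification, so the result does not depend on the locale.
constexpr bool is_token_char(char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z');
}

constexpr char to_lower_ascii(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Scans a token literal at the start of `src`.
// Returns nullopt in the cases where this sketch reports a lexer error.
std::optional<TokenLiteral> scan_token_literal(std::string_view src) {
    if (src.empty() || src[0] != '@') return std::nullopt;
    std::string text;
    std::size_t i = 1;
    while (i < src.size() && is_token_char(src[i])) {
        text.push_back(to_lower_ascii(src[i]));
        ++i;
    }
    // "@" alone is not a token; more than 12 characters is assumed to be an error.
    if (text.empty() || text.size() > 12) return std::nullopt;
    // "@foo_bar" is rejected rather than split into "@foo" and "_bar".
    if (i < src.size() && src[i] == '_') return std::nullopt;
    return TokenLiteral{text, i};
}

// Small usage example.
inline void scan_self_check() {
    assert(scan_token_literal("@null")->text == "null");
    assert(!scan_token_literal("@foo_bar").has_value());
}
```

Rejecting the underscore case explicitly is what prevents the silent split: without that check, the scan would stop at `_` and leave `_bar` to be lexed as an ordinary identifier.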


@@ -1,37 +1,72 @@
### A New Lua Type: Tokens
Tokens are a custom Lua data type built on top of Lua's lightuserdata. They are mainly intended for use as sentinels and special reserved values.
Tokens are a custom Lua data type built on top of Lua's
lightuserdata. They are mainly intended for use as sentinels
and special reserved values.
## Motivation
Tokens were invented when we were developing a JSON-to-Lua converter. Such a converter is mostly straightforward: JSON objects and Lua tables are very similar. However, we did encounter a stumbling block. Consider this JSON:
Tokens were invented when we were developing a JSON-to-Lua
converter. Such a converter is mostly straightforward: JSON
objects and Lua tables are very similar. However, we did
encounter a stumbling block. Consider this JSON:
```json
{ "foo": null }
```
In Lua, setting a table key to nil deletes the key. There is no way to represent "foo is present with value null" in a Lua table. You might try `{foo = 0}` or `{foo = "null"}`, but both are lossy: you can no longer distinguish JSON null from the number 0 or the string "null". Any sentinel value drawn from an existing Lua type collides with legitimate values of that type.
In Lua, setting a table key to nil deletes the key. There is
no way to represent "foo is present with value null" in a
Lua table. You might try `{foo = 0}` or `{foo = "null"}`,
but both are lossy: you can no longer distinguish JSON null
from the number 0 or the string "null". Any sentinel value
drawn from an existing Lua type collides with legitimate
values of that type.
The solution is to use lightuserdata. A lightuserdata is a distinct Lua type — it cannot be confused with a string, number, boolean, or nil, and unlike nil, it can be stored in a table. The Luprex engine does not use lightuserdata for any other purpose, so all lightuserdata values are available for use as tokens.
The solution is to use lightuserdata. A lightuserdata is a
distinct Lua type — it cannot be confused with a string,
number, boolean, or nil, and unlike nil, it can be stored in
a table. The Luprex engine does not use lightuserdata for
any other purpose, so all lightuserdata values are available
for use as tokens.
## What a Token Is
A token is a short string encoded as a base36 number and stored in the 8-byte lightuserdata value. The lightuserdata is not actually a pointer to anything — it holds the base36-encoded integer directly. Tokens may only contain the characters a-z and 0-9. Since 36^12 fits in 64 bits but 36^13 does not, the maximum token length is 12 characters. That is sufficient for most natural identifiers.
A token is a short string encoded as a base37 number and
stored in the 8-byte lightuserdata value. The lightuserdata
is not actually a pointer to anything — it holds the
base37-encoded integer directly. Tokens may contain only the
characters a-z and 0-9; the 37th symbol is the null
terminator. Since 37^12 fits in 64 bits but 37^13 does not,
the maximum token length is 12 characters. That is sufficient
for most natural identifiers.
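As a rough illustration of how such an encoding could work, here is a minimal sketch. The digit assignment (0 for the terminator, 1 to 10 for `0`-`9`, 11 to 36 for `a`-`z`), the big-endian digit order, and the names `encode_token` and `decode_token` are all assumptions made for this example; the actual layout used by `LuaToken::parse()` may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <string_view>

// Assumed digit values: 0 = terminator, 1..10 = '0'..'9', 11..36 = 'a'..'z'.
constexpr int digit_of(char c) {
    if (c >= '0' && c <= '9') return 1 + (c - '0');
    if (c >= 'a' && c <= 'z') return 11 + (c - 'a');
    if (c >= 'A' && c <= 'Z') return 11 + (c - 'A');  // case insensitive
    return -1;                                        // not a token character
}

constexpr std::uint64_t pow37(int n) {
    std::uint64_t r = 1;
    for (int i = 0; i < n; ++i) r *= 37;
    return r;
}

// 37^12 was computed above without overflow, but multiplying it by 37 again
// would exceed 64 bits, which is why 12 characters is the limit.
static_assert(pow37(12) > std::numeric_limits<std::uint64_t>::max() / 37,
              "37^13 does not fit in 64 bits");

// Encodes 1..12 token characters; nullopt for invalid or over-long input.
constexpr std::optional<std::uint64_t> encode_token(std::string_view s) {
    if (s.empty() || s.size() > 12) return std::nullopt;
    std::uint64_t v = 0;
    for (char c : s) {
        const int d = digit_of(c);
        if (d < 0) return std::nullopt;
        v = v * 37 + static_cast<std::uint64_t>(d);
    }
    return v;
}

// Inverse of encode_token; returns "" for values encode_token never produces.
inline std::string decode_token(std::uint64_t v) {
    std::string out;
    while (v != 0) {
        const int d = static_cast<int>(v % 37);
        if (d == 0) return {};  // terminator digit in the middle of a value
        out.push_back(d <= 10 ? static_cast<char>('0' + (d - 1))
                              : static_cast<char>('a' + (d - 11)));
        v /= 37;
    }
    std::reverse(out.begin(), out.end());
    return out;
}

// Small usage example.
inline void encoding_self_check() {
    static_assert(encode_token("null").has_value(), "valid token");
    assert(decode_token(*encode_token("null")) == "null");
}
```

Under this particular digit assignment no character maps to zero, so a token with a leading `0` such as `0a` stays distinct from `a`. That is a property of the sketch, not a statement about why the engine uses base37.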
Since lightuserdata is not used for anything else, it is safe to assume that any lightuserdata in our engine represents a token.
## The Lua Lexer
We have modified the Lua lexer/parser to support tokens.
To write a token in Lua, use an `@` sign:
    local x = @hello
This stores a lightuserdata constant in `x`.
## The C++ Side: struct LuaToken
On the C++ side, tokens are represented by `struct LuaToken` (in luastack.hpp). You can construct one from a string or from the raw integer:
On the C++ side, tokens are represented by `struct LuaToken`
(in luastack.hpp). You can construct one from a string:
```cpp
LuaToken("null") // parsed at compile time via consteval — becomes 0x10FAA9
LuaToken(0x10FAA9) // equivalent raw value
LuaToken("null")
```
The string form is preferred — it is readable, and because the constructor is `consteval`, it compiles down to the same constant as the raw integer. There is zero runtime cost. If the string contains invalid characters (anything outside a-z, 0-9) or is too long, the error is caught at compile time.
Because this constructor is `consteval`, it is as efficient
as a literal integer. If the string contains invalid characters
(anything outside a-z, 0-9) or is too long, the error is
caught at compile time.
There is also a runtime constructor that accepts `std::string_view`, for cases where the token string is not known at compile time.
There is also a runtime constructor that accepts
`std::string_view`, for cases where the token string is not
known at compile time.
The LuaStack API provides the usual accessors for tokens:
@@ -41,34 +76,36 @@ LuaToken t = LS.cktoken(slot) // extract a token (error if not lightuserdat
auto t = LS.trytoken(slot) // extract a token (returns empty optional on mismatch)
```
Named token constants can be auto-registered into the Lua environment using the `LuaTokenConstant` macro, which works the same way `LuaDefine` auto-registers functions:
Named token constants can be auto-registered into the Lua
environment using the `LuaTokenConstant` macro, which works
the same way `LuaDefine` auto-registers functions:
```cpp
LuaTokenConstant(null, "null", "Represents JSON null")
LuaTokenConstant(json_null, "null", "Represents JSON null")
```
## Properties
- **Distinct type.** Tokens are lightuserdata, a separate Lua type. They cannot collide with strings, numbers, booleans, tables, or nil.
- **Storable in tables.** Unlike nil, tokens can be used as both table keys and table values.
- **Storable in tables.** Tokens can be used as both table keys and table values.
- **No allocation.** Tokens are 8 bytes inline. There is no heap allocation and no string interning.
- **Fast comparison.** Comparing two tokens is just an integer comparison.
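The "no allocation" and "fast comparison" properties follow from storing the encoded integer directly in the pointer-sized lightuserdata slot. The sketch below shows the idea in isolation, without the Lua API. The helper names (`pack_token`, `unpack_token`, `same_token`) are hypothetical, and a `std::unordered_map` stands in for a Lua table keyed by tokens.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

static_assert(sizeof(void*) == 8, "this sketch assumes 64-bit pointers");

// Stores the encoded integer in the pointer bits. Nothing is allocated and
// the resulting pointer is never dereferenced.
inline void* pack_token(std::uint64_t encoded) {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(encoded));
}

// Recovers the encoded integer from the pointer bits.
inline std::uint64_t unpack_token(void* p) {
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
}

// Token equality is a single pointer (integer) comparison.
inline bool same_token(void* a, void* b) { return a == b; }

// Small usage example: tokens used as keys, as they could be in a Lua table.
inline void packing_self_check() {
    void* a = pack_token(12345);  // arbitrary stand-in for an encoded token
    void* b = pack_token(12345);
    assert(same_token(a, b));
    std::unordered_map<void*, std::string> table;
    table[a] = "value stored under a token key";
    assert(table.count(b) == 1);
}
```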
## Limitation: No Token Literals in Lua
Lua's parser has no syntax for token literals. In C++, you can write `LuaToken("null")` and it's clean and compile-time. In Lua, there is no equivalent — you cannot write a token literal the way you write `"hello"` or `42`.
Currently, the way tokens are made available to Lua is that C++ code uses `LuaTokenConstant` to insert specific token values into global tables. Lua scripts can then reference these pre-registered constants by name.
Modifying the Lua parser to add token literal syntax has been considered but is unappealing — it would be a significant and invasive patch. Adding a Lua function like `token("null")` to construct tokens at runtime is also possible and not off the table, but there hasn't been a need for it yet.
## Passing Tokens to Unreal
Tokens can get passed to Unreal in a variety of ways. For example, in animation step key-value pairs, the value can be a token. When animation queues are passed to Unreal, tokens are converted to FNames. Since both tokens and FNames are short identifier-like strings with fast comparison, the mapping is natural.
Tokens can get passed to Unreal in a variety of ways. For
example, in animation step key-value pairs, the value can be
a token. When tokens are passed to Unreal, they are
converted to FNames. Since both tokens and FNames are short
identifier-like strings with fast comparison, the mapping is
natural.
## Usage
Tokens are mainly intended as sentinels and special reserved values. The JSON null example above is the motivating case, but tokens can represent any short reserved constant the engine needs.
Tokens are mainly intended as sentinels and special reserved
values. The JSON null example above is the motivating case,
but tokens can represent any short reserved constant the
engine needs.
## Serialization and Difference Transmission