diff --git a/Docs/A-Summary-of-our-Lua-Patches.md b/Docs/A-Summary-of-our-Lua-Patches.md index a9854fbe..1998aa92 100644 --- a/Docs/A-Summary-of-our-Lua-Patches.md +++ b/Docs/A-Summary-of-our-Lua-Patches.md @@ -251,15 +251,14 @@ Update 2: I don't remember using userdata objects at all. I am not sure that Upd ## Token Literal Syntax Patch -Tokens are lightuserdata values encoding short alphanumeric strings as base37 numbers (see `Tokens-A-New-Lua-Type.md`). Previously, tokens could only be created in C++ and inserted into the Lua environment via `LuaTokenConstant`. This patch adds a literal syntax to the Lua parser so that tokens can be written directly in Lua source code using the `@` prefix: +Tokens are lightuserdata values encoding short alphanumeric +strings as base37 numbers (see `Tokens-A-New-Lua-Type.md`). +This patch adds a literal syntax to the Lua parser so that +tokens can be written directly in Lua source code using the +`@` prefix: ```lua local x = @null local y = @found ``` - -The lexer (llex.c) recognizes `@` followed by one or more alphanumeric characters (a-z, 0-9, case insensitive, max 12 characters). It encodes the string as a base37 number using the same encoding as `LuaToken::parse()` in luastack.hpp and produces a `TK_TOKEN` token. The parser (lparser.c) handles `TK_TOKEN` in `simpleexp()` by storing it as a lightuserdata constant in the function's constant table via `luaK_lightuserdataK()` in lcode.c. - -Underscores are not valid in token literals. Writing `@foo_bar` produces a lexer error rather than silently splitting into token `@foo` and identifier `_bar`. - This patch is live and functioning. diff --git a/Docs/Tokens-A-New-Lua-Type.md b/Docs/Tokens-A-New-Lua-Type.md index f584007d..a7e1212f 100644 --- a/Docs/Tokens-A-New-Lua-Type.md +++ b/Docs/Tokens-A-New-Lua-Type.md @@ -1,37 +1,72 @@ ### A New Lua Type: Tokens -Tokens are a custom Lua data type built on top of Lua's lightuserdata. 
They are mainly intended for use as sentinels and special reserved values. +Tokens are a custom Lua data type built on top of Lua's +lightuserdata. They are mainly intended for use as sentinels +and special reserved values. ## Motivation -Tokens were invented when we were developing a JSON-to-LUA converter. Such a converter is mostly straightforward: json tables and lua tables are very similar. However, we did encounter a stumbling block. Consider this JSON: +Tokens were invented when we were developing a JSON-to-Lua +converter. Such a converter is mostly straightforward: JSON +objects and Lua tables are very similar. However, we did +encounter a stumbling block. Consider this JSON: ```json { "foo": null } ``` -In Lua, setting a table key to nil deletes the key. There is no way to represent "foo is present with value null" in a Lua table. You might try `{foo = 0}` or `{foo = "null"}`, but both are lossy: you can no longer distinguish JSON null from the number 0 or the string "null". Any sentinel value drawn from an existing Lua type collides with legitimate values of that type. +In Lua, setting a table key to nil deletes the key. There is +no way to represent "foo is present with value null" in a +Lua table. You might try `{foo = 0}` or `{foo = "null"}`, +but both are lossy: you can no longer distinguish JSON null +from the number 0 or the string "null". Any sentinel value +drawn from an existing Lua type collides with legitimate +values of that type. -The solution is to use lightuserdata. A lightuserdata is a distinct Lua type — it cannot be confused with a string, number, boolean, or nil, and unlike nil, it can be stored in a table. +The solution is to use lightuserdata. A lightuserdata is a +distinct Lua type: it cannot be confused with a string, +number, boolean, or nil, and unlike nil, it can be stored in +a table.
The Luprex engine does not use lightuserdata for +any other purpose, so all lightuserdata values are available +for use as tokens. ## What a Token Is -A token is a short string encoded as a base36 number and stored in the 8-byte lightuserdata value. The lightuserdata is not actually a pointer to anything — it holds the base36-encoded integer directly. Tokens may only contain the characters a-z and 0-9. Since 36^12 fits in 64 bits but 36^13 does not, the maximum token length is 12 characters. That is sufficient for most natural identifiers. +A token is a short string encoded as a base37 number and +stored in the 8-byte lightuserdata value. The lightuserdata +is not actually a pointer to anything: it holds the +base37-encoded integer directly. Tokens may only contain the +characters a-z and 0-9. The base is 37 rather than 36 +because one digit value is reserved for the null terminator. +Since 37^12 fits in 64 bits but 37^13 does not, the maximum +token length is 12 characters. That is sufficient for most +natural identifiers. -Since lightuserdata is not used for anything else, it is safe to assume that any lightuserdata in our engine represents a token. +## The Lua Lexer + +We have modified the Lua lexer and parser to support tokens. +To write a token in Lua, use the `@` prefix: + + local x = @hello + +This stores a lightuserdata constant in `x`. ## The C++ Side: struct LuaToken -On the C++ side, tokens are represented by `struct LuaToken` (in luastack.hpp). You can construct one from a string or from the raw integer: +On the C++ side, tokens are represented by `struct LuaToken` +(in luastack.hpp). You can construct one from a string: ```cpp -LuaToken("null") // parsed at compile time via consteval — becomes 0x10FAA9 -LuaToken(0x10FAA9) // equivalent raw value +LuaToken("null") ``` -The string form is preferred — it is readable, and because the constructor is `consteval`, it compiles down to the same constant as the raw integer. There is zero runtime cost.
If the string contains invalid characters (anything outside a-z, 0-9) or is too long, the error is caught at compile time. +Because this constructor is `consteval`, the token is +computed at compile time and is as efficient as a literal +integer. If the string contains invalid characters +(anything outside a-z, 0-9) or is too long, the error is +caught at compile time. -There is also a runtime constructor that accepts `std::string_view`, for cases where the token string is not known at compile time. +There is also a runtime constructor that accepts +`std::string_view`, for cases where the token string is not +known at compile time. The LuaStack API provides the usual accessors for tokens: @@ -41,34 +76,36 @@ LuaToken t = LS.cktoken(slot) // extract a token (error if not lightuserdat auto t = LS.trytoken(slot) // extract a token (returns empty optional on mismatch) ``` -Named token constants can be auto-registered into the Lua environment using the `LuaTokenConstant` macro, which works the same way `LuaDefine` auto-registers functions: +Named token constants can be auto-registered into the Lua +environment using the `LuaTokenConstant` macro, which works +the same way `LuaDefine` auto-registers functions: ```cpp -LuaTokenConstant(null, "null", "Represents JSON null") +LuaTokenConstant(json_null, "null", "Represents JSON null") ``` ## Properties - **Distinct type.** Tokens are lightuserdata, a separate Lua type. They cannot collide with strings, numbers, booleans, tables, or nil. -- **Storable in tables.** Unlike nil, tokens can be used as both table keys and table values. +- **Storable in tables.** Tokens can be used as both table keys and table values. - **No allocation.** Tokens are 8 bytes inline. There is no heap allocation and no string interning. - **Fast comparison.** Comparing two tokens is just an integer comparison. -## Limitation: No Token Literals in Lua - -Lua's parser has no syntax for token literals. In C++, you can write `LuaToken("null")` and it's clean and compile-time.
In Lua, there is no equivalent — you cannot write a token literal the way you write `"hello"` or `42`. - -Currently, the way tokens are made available to Lua is that C++ code uses `LuaTokenConstant` to insert specific token values into global tables. Lua scripts can then reference these pre-registered constants by name. - -Modifying the Lua parser to add token literal syntax has been considered but is unappealing — it would be a significant and invasive patch. Adding a Lua function like `token("null")` to construct tokens at runtime is also possible and not off the table, but there hasn't been a need for it yet. - ## Passing Tokens to Unreal -Tokens can get passed to Unreal in a variety of ways. For example, in animation step key-value pairs, the value can be a token. When animation queues are passed to Unreal, tokens are converted to FNames. Since both tokens and FNames are short identifier-like strings with fast comparison, the mapping is natural. +Tokens can get passed to Unreal in a variety of ways. For +example, in animation step key-value pairs, the value can be +a token. When tokens are passed to Unreal, they are +converted to FNames. Since both tokens and FNames are short +identifier-like strings with fast comparison, the mapping is +natural. ## Usage -Tokens are mainly intended as sentinels and special reserved values. The JSON null example above is the motivating case, but tokens can represent any short reserved constant the engine needs. +Tokens are mainly intended as sentinels and special reserved +values. The JSON null example above is the motivating case, +but tokens can represent any short reserved constant the +engine needs. ## Serialization and Difference Transmission
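The base37 packing that the updated `Tokens-A-New-Lua-Type.md` describes can be sketched in C++ as follows. This is a minimal illustration, not the engine's code: the names `encode_token` and `decode_token` are hypothetical, and the digit mapping (`a`-`z` to 1-26, `0`-`9` to 27-36, with 0 left over as the null terminator) and most-significant-first ordering are assumptions. The real `LuaToken::parse()` in luastack.hpp may order digits differently, so the integers produced here are not guaranteed to match real token values.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical sketch of base37 token packing (NOT the real LuaToken::parse()).
// Assumed digit mapping:  'a'..'z' -> 1..26,  '0'..'9' -> 27..36.
// Digit value 0 is never produced by a character, so it plays the role of the
// null terminator: decoding stops once the remaining value reaches 0.
constexpr std::uint64_t kBase = 37;
constexpr std::size_t kMaxLen = 12;  // largest value, 37^12 - 1, fits in 64 bits

constexpr std::optional<std::uint64_t> token_digit(char c) {
    if (c >= 'a' && c <= 'z') return static_cast<std::uint64_t>(c - 'a' + 1);
    if (c >= '0' && c <= '9') return static_cast<std::uint64_t>(c - '0' + 27);
    return std::nullopt;  // anything outside a-z, 0-9 is invalid
}

// Packs s into an integer, or returns nullopt if s is empty, longer than
// 12 characters, or contains an invalid character. Being constexpr, it can
// run entirely at compile time, in the spirit of the consteval constructor.
constexpr std::optional<std::uint64_t> encode_token(std::string_view s) {
    if (s.empty() || s.size() > kMaxLen) return std::nullopt;
    std::uint64_t value = 0;
    for (char c : s) {
        auto d = token_digit(c);
        if (!d) return std::nullopt;
        value = value * kBase + *d;  // Horner's rule, first character most significant
    }
    return value;
}

// Unpacks a value produced by encode_token back into its string.
inline std::string decode_token(std::uint64_t value) {
    std::string out;
    while (value != 0) {
        std::uint64_t d = value % kBase;  // lowest digit = last character
        value /= kBase;
        out.insert(out.begin(), d <= 26 ? static_cast<char>('a' + d - 1)
                                        : static_cast<char>('0' + d - 27));
    }
    return out;
}

// Compile-time checks: invalid input is rejected before the program runs.
static_assert(*encode_token("a") == 1);
static_assert(*encode_token("ab") == 1 * 37 + 2);
static_assert(!encode_token("foo_bar").has_value());        // '_' is not allowed
static_assert(!encode_token("abcdefghijklm").has_value());  // 13 characters
```

Because every character maps to a digit from 1 to 36, no two distinct strings produce the same integer, which is what lets token comparison reduce to a single integer comparison.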