4.0 KiB
A New Lua Type: Tokens
Tokens are a custom Lua data type built on top of Lua's lightuserdata. They are mainly intended for use as sentinels and special reserved values.
Motivation
Tokens were invented when we were developing a JSON-to-LUA converter. Such a converter is mostly straightforward: json tables and lua tables are very similar. However, we did encounter a stumbling block. Consider this JSON:
{ "foo": null }
In Lua, setting a table key to nil deletes the key. There is
no way to represent "foo is present with value null" in a
Lua table. You might try {foo = 0} or {foo = "null"},
but both are lossy: you can no longer distinguish JSON null
from the number 0 or the string "null". Any sentinel value
drawn from an existing Lua type collides with legitimate
values of that type.
The solution is to use lightuserdata. A lightuserdata is a distinct Lua type — it cannot be confused with a string, number, boolean, or nil, and unlike nil, it can be stored in a table. The Luprex engine does not use lightuserdata for any other purpose, so all lightuserdata values are available for use as tokens.
What a Token Is
A token is a short string encoded as a base37 number and stored in the 8-byte lightuserdata value. The lightuserdata is not actually a pointer to anything — it holds the base37-encoded integer directly. Tokens may only contain the characters a-z and 0-9, and the null terminator. Since 37^12 fits in 64 bits but 37^13 does not, the maximum token length is 12 characters. That is sufficient for most natural identifiers.
The Lua Lexer
We have modified the lua lexer/parser to support tokens. To write a token in lua, use an @ sign:
local x = @hello
This actually stores a light user data constant in x.
The C++ Side: struct LuaToken
On the C++ side, tokens are represented by struct LuaToken
(in luastack.hpp). You can construct one from a string:
LuaToken("null")
This constructor is consteval, this is as efficient as a
literal integer. If the string contains invalid characters
(anything outside a-z, 0-9) or is too long, the error is
caught at compile time.
There is also a runtime constructor that accepts
std::string_view, for cases where the token string is not
known at compile time.
The LuaStack API provides the usual accessors for tokens:
LS.set(slot, LuaToken("null")) // store a token in a LuaSlot
LuaToken t = LS.cktoken(slot) // extract a token (error if not lightuserdata)
auto t = LS.trytoken(slot) // extract a token (returns empty optional on mismatch)
Named token constants can be auto-registered into the Lua
environment using the LuaTokenConstant macro, which works
the same way LuaDefine auto-registers functions:
LuaTokenConstant(json_null, "null", "Represents JSON null")
Properties
- Distinct type. Tokens are lightuserdata, a separate Lua type. They cannot collide with strings, numbers, booleans, tables, or nil.
- Storable in tables. Tokens can be used as both table keys and table values.
- No allocation. Tokens are 8 bytes inline. There is no heap allocation and no string interning.
- Fast comparison. Comparing two tokens is just an integer comparison.
Passing Tokens to Unreal
Tokens can get passed to Unreal in a variety of ways. For example, in animation step key-value pairs, the value can be a token. When tokens are passed to Unreal, they are converted to FNames. Since both tokens and FNames are short identifier-like strings with fast comparison, the mapping is natural.
Usage
Tokens are mainly intended as sentinels and special reserved values. The JSON null example above is the motivating case, but tokens can represent any short reserved constant the engine needs.
Serialization and Difference Transmission
I believe that we properly serialize and difference transmit tokens.
- the serialize_lua function handles tokens explicitly
- the difference transmitter has code for tokens
- eris always saved lightuserdata as 64-bit numbers