Files
integration/Docs/Tokens-A-New-Lua-Type.md

4.6 KiB

A New Lua Type: Tokens

Tokens are a custom Lua data type built on top of Lua's lightuserdata. They are mainly intended for use as sentinels and special reserved values.

Motivation

Tokens were invented when we were developing a JSON-to-LUA converter. Such a converter is mostly straightforward: json tables and lua tables are very similar. However, we did encounter a stumbling block. Consider this JSON:

{ "foo": null }

In Lua, setting a table key to nil deletes the key. There is no way to represent "foo is present with value null" in a Lua table. You might try {foo = 0} or {foo = "null"}, but both are lossy: you can no longer distinguish JSON null from the number 0 or the string "null". Any sentinel value drawn from an existing Lua type collides with legitimate values of that type.

The solution is to use lightuserdata. A lightuserdata is a distinct Lua type — it cannot be confused with a string, number, boolean, or nil, and unlike nil, it can be stored in a table. The Luprex engine does not use lightuserdata for any other purpose, so all lightuserdata values are available for use as tokens.

What a Token Is

A token is a short string encoded as a base36 number and stored in the 8-byte lightuserdata value. The lightuserdata is not actually a pointer to anything — it holds the base36-encoded integer directly. Tokens may only contain the characters a-z and 0-9. Since 36^12 fits in 64 bits but 36^13 does not, the maximum token length is 12 characters. That is sufficient for most natural identifiers.

Since lightuserdata is not used for anything else, it is safe to assume that any lightuserdata in our engine represents a token.

The C++ Side: struct LuaToken

On the C++ side, tokens are represented by struct LuaToken (in luastack.hpp). You can construct one from a string or from the raw integer:

LuaToken("null")       // parsed at compile time via consteval — becomes 0x10FAA9
LuaToken(0x10FAA9)     // equivalent raw value

The string form is preferred — it is readable, and because the constructor is consteval, it compiles down to the same constant as the raw integer. There is zero runtime cost. If the string contains invalid characters (anything outside a-z, 0-9) or is too long, the error is caught at compile time.

There is also a runtime constructor that accepts std::string_view, for cases where the token string is not known at compile time.

The LuaStack API provides the usual accessors for tokens:

LS.set(slot, LuaToken("null"))     // store a token in a LuaSlot
LuaToken t = LS.cktoken(slot)      // extract a token (error if not lightuserdata)
auto t = LS.trytoken(slot)          // extract a token (returns empty optional on mismatch)

Named token constants can be auto-registered into the Lua environment using the LuaTokenConstant macro, which works the same way LuaDefine auto-registers functions:

LuaTokenConstant(null, "null", "Represents JSON null")

Properties

  • Distinct type. Tokens are lightuserdata, a separate Lua type. They cannot collide with strings, numbers, booleans, tables, or nil.
  • Storable in tables. Unlike nil, tokens can be used as both table keys and table values.
  • No allocation. Tokens are 8 bytes inline. There is no heap allocation and no string interning.
  • Fast comparison. Comparing two tokens is just an integer comparison.

Limitation: No Token Literals in Lua

Lua's parser has no syntax for token literals. In C++, you can write LuaToken("null") and it's clean and compile-time. In Lua, there is no equivalent — you cannot write a token literal the way you write "hello" or 42.

Currently, the way tokens are made available to Lua is that C++ code uses LuaTokenConstant to insert specific token values into global tables. Lua scripts can then reference these pre-registered constants by name.

Modifying the Lua parser to add token literal syntax has been considered but is unappealing — it would be a significant and invasive patch. Adding a Lua function like token("null") to construct tokens at runtime is also possible and not off the table, but there hasn't been a need for it yet.

Passing Tokens to Unreal

Tokens can get passed to Unreal in a variety of ways. For example, in animation step key-value pairs, the value can be a token. When animation queues are passed to Unreal, tokens are converted to FNames. Since both tokens and FNames are short identifier-like strings with fast comparison, the mapping is natural.

Usage

Tokens are mainly intended as sentinels and special reserved values. The JSON null example above is the motivating case, but tokens can represent any short reserved constant the engine needs.