15 KiB
A Summary of Our Lua Patches
This document lists the set of patches we need to make to the lua interpreter. I'm doing my best to keep the set of patches to a minimum, and to keep the patches as simple as possible.
Error Line Number Patch
Standard Lua is pretty inconsistent about how it reports the line number and filename of errors:
- Some errors have the line number and filename right in the error message. Some don't.
- Some errors generate stack tracebacks, some don't.
- When there's a traceback, the file and line number in the error message is redundant.
The goal of this patch is to achieve a more consistent state of affairs. First, we follow the rule that every error must include a stack traceback. Doing this doesn't require any patches to lua. It just requires consistent use of Lua's traceback facilities at every entry point into lua.
Second, since the traceback will always contain the file and line number, and since there will always be a traceback, there's no reason to put the file and line number into the error message itself. We have taken steps to remove that feature. To accomplish this, the following patches to lua were made:
- removed calls to 'lua_where' (replacing them with 'lua_no_where')
- removed calls to 'addinfo' (replacing them with 'no_addinfo')
This patch is live and functioning.
Print Integers Patch
Lua numbers are actually double-precision floating point. There is no separate "int" data type. If you try to store an integer in lua, it gets stored as a double-precision float. That's usually just fine. Double precision can store integers up to 53 bits losslessly and precisely.
When you use the builtin lua function "tostring" that converts a number to a string, this is actually implemented using 'sprintf', using the format directive in LUA_NUMBER_FMT, which by default is "%.14d". As you can see, that's a number format for double-precision numbers, which makes sense, because lua numbers are double-precision floats.
That particular format has the effect of printing certain large integers in scientific notation. I found that to be problematic, because you can't see all the digits. It's a very bad format for printing, say, unique ID numbers.
So, I changed the default format from "%.14d" to "%.16d". That may seem like a trivial change, but it makes it so that every large integer up to 53 bits, which is the largest integer that can be losslessly stored in a double, is printed as a simple integer, showing all the digits.
The 'print', 'pprint,' and 'tostring' routines built into our lua interpreter do not use this builtin functionality. We have rewritten all of those functions from scratch, to take precise control over what is printed. None of those functions rely on the builtin sprintf.
The patch is probably still live, though. The builtin sprintf probably does get called implicitly from inside certain lua error message generating routines. It's just of very minor importance these days.
The Generalized Less-Than Patch
The standard lua less-than operator will throw an error if you try to compare two objects of different types, or if you try to compare two tables, two functions, or two threads.
This patch adds the lua function genlt and the C function lua_genlt. This is a generalized less-than operator to compare any two lua objects. It always returns true or false, and never generates an error. Here is how comparison works:
- boolean: false is less than true.
- number: ordered by ascending value.
- string: ordered lexically.
- table: all tables are equal (ie, not less than).
- functions: all functions are equal (ie, not less than).
- threads: all threads are equal (ie, not less than).
- two different types: they are ordered nil, bool, number, string, table, function, thread.
This patch is live and functioning. The generalized less-than operator is quite useful as the second parameter to lua's builtin table.sort function. We have also provided an iterator table.sortedpairs that is similar to the lua builtin table.pairs that iterates over a table in sorted order. This implicitly uses the genlt comparison operator.
We originally designed this patch to help with determinism. But we eventually realized it was the wrong design, and we ended up not needing it for determinism. It's still a very useful function, though.
The Table Iterator Patch
This patch is designed to address the nondeterminism of the lua 'next' iterator. In the original Lua design, table iteration was nondeterministic. By that, I mean that in the original lua, I can create two empty tables, I can then perform the same sequence of insertions and deletions on those two tables. The two tables are identical: they have the same keys, and they've had the same same sequence of operations applied to them. But I can iterate over them using "table.pairs", and they produce their keys in two different orders. That's nondeterminism.
We can't allow nondeterminism in our version of Lua. To fix this was a big deal: we had to change the internal representation of lua tables. Internally, a lua table now contains a separate "sequence" vector which works as follows:
-
When you insert a new key into a table (by mapping that key to non-nil), the new key is appended to the sequence.
-
When you remove a key from a table (by mapping that key to nil) that key is removed from the sequence, creating a gap in the sequence. The last key in the sequence is moved to fill the gap in the sequence.
-
When you call the iterator, next(table, k), it finds k in the sequence, and then returns the key immediately after k in the sequence.
Importantly, the sequence is deterministic - given a fixed set of operations on a table, it will always be in the same order. Therefore the next iterator and the lua_next function are now both deterministic, as is the pairs iterator.
There's a very special case. The standard lua 'next' iterator can find a successor to a key that was recently deleted. This functionality is necessary in order to be able to iterate over a table and delete elements while iterating. We support this same functionality, but only on the very most recently deleted key. It's still sufficient to allow deleting table elements while iterating.
This patch is live and is used implicitly whenever you iterate over a lua table.
The Table Length Patch
I've changed the lua length operator so that when it is applied to a table, it returns the number of keys in the table. It does this in constant time. This change affects lua_len, lua_rawlen, and the lua # operator.
You might be wondering what the lua length operator used to do? The lua documentation says this:
The length operator applied on a table returns a border in that table. A border in a table t is any non-negative integer that satisfies the following condition:
(border == 0 or t[border] ~= nil) and (t[border + 1] == nil or border == math.maxinteger)
Those are terrible semantics:
- They're not useful for anything.
- It's not deterministic.
- In no sense of the word "length" is this the length of the table.
Let me explain how that mess happened. They obviously wanted the length operator to return the number of keys in the table. Unfortunately, to count the number of keys in a lua table actually takes O(N) time. So they came up with a hack to make it faster: O(1). Unfortunately, the hack relies on the table being a vector. That is, the table must have numbered keys starting with 1. As long as you apply their hack to a vector, it works perfectly and returns the number of keys.
Unfortunately, if you apply the hacked length algorithm to a table that isn't a vector, it doesn't work at all.
But I think the lua documentation didn't want to admit, "it doesn't work at all." So instead, they invented this concept of "a border" and pretended that was in some way a helpful result. They should have just said, "the result is undefined."
I had to change the table internal representation for the table iterator patch (above). With the modified table representation, returning the number of keys in the table can be done in constant time, whether it's a vector or not. So I changed the length operator to just return the number of keys, full stop.
I've also added another function, lua_nkeys. This also returns the number of keys in the table. It doesn't add any functionality - I could use lua_rawlen and that would work just as well. However, using lua_nkeys emphasizes the fact that my code needs the real table length, not the "border" bullshit that lua used to provide.
This patch is live, and is necessary to the determinism of the system.
The Table Flag Bits Patch
Our difference transmission algorithm does a recursive walk of the tables in a given tangible. That recursive walk requires a visited bit in each table. Of course, the lua way of doing this would be to store the set of visited tables as a separate table. But that would be a lot slower than just setting a bit, and difference transmission is the core of our system's performance bottleneck.
We also need to store, in each table, a table-type enum. We have several subtypes of tables: general tables, tangible tables, class tables, and so forth. The difference transmitter treats different types of tables differently.
This patch adds a 16-bit "flagbits" field to every Lua table. We have added these lua API functions to access these flag bits:
uint16_t lua_getflagbits(lua_State *L, int index);
void lua_setflagbits(lua_State *L, int index, uint16_t flagbits);
void lua_modflagbits(lua_State *L, int index, uint16_t clearbits, uint16_t setbits);
The eris code for serializing the lua data structures has been modified to save and restore the flagbits. Aside from simply storing them, and saving and restoring them with eris, the lua runtime doesn't do anything at all with the flagbits.
The Luprex engine has set aside four of the bits to store the table-type enum. It has set aside one of the bits for the 'visited' bit of the difference transmission algorithm. The rest of the bits are currently unused.
This patch is live and in-use.
The Insert Frame Patch
When we write C functions for Lua, we allocate a "stack frame" on the lua stack. This is accomplished by class LuaDefStack. See the document "Our In-House Lua API" for more information.
This function has to insert some "nils" into the base of the stack. The lua API does have a function that can do this, but using it would be O(N^2). Since this functionality is used in every single C function for Lua, we decided to optimize things a little. We added a function to the lua API that can do it in O(N) time. The name of the function is lua_insert_frame, which sounds fancy, but all it does is insert N "nils" at the bottom of the stack.
This patch is live and is used in class LuaDefStack.
The C++ Exceptions Patch
We've compiled lua to use C++ exceptions instead of longjmp. The advantage of this is that if you do a lua_yield or lua_error, any C++ destructors on the stack will get called.
Although lua_yield and lua_error both throw C++ exceptions, Lua cannot deal with C++ exceptions except for those it generates itself. Therefore:
- Never call the lua interpreter inside a C++ catch-block!
- Never throw an exception from inside a LuaDefine!
Exception 1: If you throw an uncaught exception, all that does is terminate the program. It's always legal to terminate the program.
Exception 2: If you throw an exception inside a LuaDefine and then catch it inside the same LuaDefine, that's OK, because the lua interpreter is not getting unwound.
Using C++ exceptions in lua_yield and lua_error means that C++ destructors get called. Normally, calling destructors is a good thing. However, there is one known case where this causes issues: class LuaExtStack. Class LuaExtStack pushes values onto the lua stack in its constructor, and later, in its destructor, it pops those values back off. Straightforward enough. But if you throw an error using the lua_error function, then the error message is pushed on top of the lua stack. If the throw triggers the LuaExtStack destructor, then LuaExtStack will pop the stack, and in doing so, it will unintentionally throw out the error message. Oops.
To fix this, we had to add a lua patch, which adds a new API function "lua_isthrowing." This API function is used by the LuaExtStack destructor, to decide whether to clean up the stack or not. This new API function is not used anywhere else in Luprex, and I do not expect it will ever be needed anywhere else.
This patch is live and is needed to keep lua error messages working.
The Object-Oriented Lua Patch
We have a patch to make lua object-oriented. To really understand this patch, we need a separate markdown file, "Object-Oriented-Lua.md". See that file for an explanation.
The Print No Address Patch (Unimplemented)
The lua library function luaL_tolstring converts a lua object into a string, no matter what the type of the lua object is. In the case of tables, the string it generates looks like "table: 0123456701234567", with the address of the table being part of the string. That would make any code that uses that string dependent on the precise address of the table. That's considered nondeterminism for our purposes, it's not allowed.
We don't use luaL_tolstring in the Luprex codebase. However, the luaL_tolstring function is used from within the lua runtime in three different places:
- In the lua function string.format. This is no longer an issue. We wrote our own version of format which replaces the built-in version. Our implementation doesn't use luaL_tostring.
- In the builtin function tostring. This is no longer an issue. We wrote our own version of tostring which replaces the built-in version. Our implementation doesn't use luaL_tolstring.
- In the eris runtime. It is using luaL_tolstring to generate keys that become part of an associative map.
So the obvious thing to do would be to just change the code for luaL_tolstring to not include the table address. But that would totally break the eris code.
Another possibility would be to patch or override the code for string.format. That fixes the problem, but it leaves luaL_tolstring in there as a time-bomb where somebody else might use it not realizing that it's an issue.
There's no obvious approach to fixing this, so I haven't patched it yet.
The GC Finalizer Patch and the Weak Table Patch (Unimplemented)
GC Finalizers and weak tables both introduce nondeterminism into Lua execution. We can't allow that. It may be necessary to patch the lua interpreter to simply disable these functions. Alternately, we could simply ask the scripters not to use these features, and declare "undefined behavior" if they do.
Update 1: I'm using GC finalizers in some cases to clean up userdata objects. I think it's safe as long as the only thing the finalizer does is free memory. (NOTE: WHERE?)
Update 2: I don't remember using userdata objects at all. I am not sure that Update 1 is the truth any more.
Token Literal Syntax Patch
Tokens are lightuserdata values encoding short alphanumeric
strings as base38 numbers (see Tokens-A-New-Lua-Type.md).
This patch adds a literal syntax to the Lua parser so that
tokens can be written directly in Lua source code using the
@ prefix:
local x = @null
local y = @found
This patch is live and functioning.