Unknown mess

2026-02-17 13:28:09 -05:00
57 changed files with 2518 additions and 1318 deletions

@@ -61,6 +61,11 @@ This patch adds the lua function *genlt* and the C function *lua_genlt*. This is
This patch is live and functioning. The generalized less-than operator is quite useful as the second parameter to lua's builtin *table.sort* function. We have also provided an iterator *table.sortedpairs* that is similar to the lua builtin *table.pairs* that iterates over a table in sorted order. This implicitly uses the *genlt* comparison operator.
We originally designed this patch to help with determinism.
But we eventually realized it was the wrong design, and we
ended up not needing it for determinism. It's still a
very useful function, though.
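
To illustrate the idea of a generalized less-than used with *table.sort* and a sorted iterator, here is a minimal sketch in plain Lua. The names `genlt_sketch` and `sortedpairs_sketch` are hypothetical stand-ins, and the ordering rule (booleans, then numbers, then strings, everything else treated as equal) is an assumption made for this example; it is not necessarily how the real *genlt* orders values.

```lua
-- Hypothetical ordering between types; NOT taken from the real genlt.
local rank = { boolean = 1, number = 2, string = 3 }

-- Sketch of a "generalized less-than": comparable across different types.
local function genlt_sketch(a, b)
  local ta, tb = type(a), type(b)
  if ta ~= tb then
    -- Different types: order by the assumed type rank.
    return (rank[ta] or 4) < (rank[tb] or 4)
  end
  if ta == "boolean" then
    return (not a) and b          -- false sorts before true
  end
  if ta == "number" or ta == "string" then
    return a < b
  end
  return false                    -- other types: treated as equal here
end

-- Sketch of a sorted iterator: collect the keys, sort them, walk them.
local function sortedpairs_sketch(t)
  local keys = {}
  for k in pairs(t) do
    keys[#keys + 1] = k
  end
  table.sort(keys, genlt_sketch)
  local i = 0
  return function()
    i = i + 1
    local k = keys[i]
    if k ~= nil then
      return k, t[k]
    end
  end
end

-- Usage: mixed number and string keys come out in one fixed order.
local order = {}
for k in sortedpairs_sketch({ b = 2, a = 1, [3] = "x", [1] = "y" }) do
  order[#order + 1] = tostring(k)
end
print(table.concat(order, ","))   -- prints 1,3,a,b
```

This sketch sorts the keys on every iteration, so it costs O(N log N) per loop, and it does not handle NaN keys or give a stable order to tables, functions, or userdata used as keys.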
## The Table Iterator Patch
This patch is designed to address the nondeterminism of the lua 'next' iterator. In the original Lua design, table iteration was nondeterministic. By that, I mean that in the original lua, I can create two empty tables and then perform the same sequence of insertions and deletions on both of them. The two tables are identical: they have the same keys, and they've had the same sequence of operations applied to them. But I can iterate over them using "table.pairs", and they produce their keys in two different orders. That's nondeterminism.
@@ -81,23 +86,63 @@ This patch is live and is used implicitly whenever you iterate over a lua table.
## The Table Length Patch
The builtin lua function lua_len is nondeterministic. By that, I mean that two tables with the exact same keys might return different values for lua_len. We can't allow nondeterministic anything in our version of Lua. We have altered the implementation of lua_len so that it is deterministic. Two tables with the same keys will always return the same lua_len; that is now guaranteed.
I've changed the lua length operator so that when it is
applied to a table, it returns the number of keys in the
table. It does this in constant time. This change affects
lua_len, lua_rawlen, and the lua # operator.
Our new implementation of lua_len conforms to the specification in the documentation. I'm not sure that's the right thing to do.
You might be wondering what the lua length operator
used to do. The lua documentation says this:
It's obvious how this specification got written: they implemented an algorithm to find the length of a vector as efficiently as possible. By "vector," I mean a table whose keys are 1,2,3,4,5 and so forth. After they wrote this vector-length algorithm, somebody asked, "what happens if you apply that algorithm to a table that's not a vector?" The implementor replied, "it wasn't meant for non-vectors." "Ok, but what *does* it do if you apply it to a non-vector?" They puzzled it out, and they wrote down what it does as "the specification." But that specification when applied to non-vectors isn't *useful*. It's just what their vector-length algorithm happens to spit out when you feed it an input that it wasn't designed to handle.
> The length operator applied on a table returns a border in
> that table. A border in a table t is any non-negative
> integer that satisfies the following condition:
>
> (border == 0 or t[border] ~= nil) and
> (t[border + 1] == nil or border == math.maxinteger)
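
The quoted condition can be written out as a predicate, which makes it easy to see that a table with a hole has more than one border. This is a minimal sketch; `is_border` is a hypothetical helper written for this example, not part of Lua or of this patch.

```lua
-- Direct transcription of the documented border condition.
local function is_border(t, border)
  return (border == 0 or t[border] ~= nil)
     and (t[border + 1] == nil or border == math.maxinteger)
end

-- A table with a hole at index 3.
local t = { 10, 20, nil, 40 }

local borders = {}
for n = 0, 5 do
  if is_border(t, n) then
    borders[#borders + 1] = n
  end
end
print(table.concat(borders, ","))   -- prints 2,4
```

Both 2 and 4 satisfy the condition, so the documented specification permits either value as the length of this table.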
So why did they use an algorithm that only works on vectors? Why not use a better algorithm, one that can return the number of keys in the table regardless of whether the table is a vector? The answer is that given the lua table internal representation, returning the number of keys in the table is O(N), whereas the vector-only implementation is usually O(1).
Those are *terrible* semantics:
However, I had to change the table internal representation for the table iterator patch (above). With the modified table representation, returning the number of keys in the table can be done in constant time, whether it's a vector or not.
- They're not useful for anything.
- They're not deterministic.
- In no sense of the word "length" is this the
length of the table.
Let me explain how that mess happened. They obviously
wanted the length operator to return the number of keys in
the table. Unfortunately, to count the number of keys in a
lua table actually takes O(N) time. So they came up with a
hack to make it faster: O(1). Unfortunately, the hack relies
on the table being a vector. That is, the table must have
numbered keys starting with 1. As long as you apply their
hack to a vector, it works perfectly and returns the
number of keys.
Now, I'm seriously tempted to have lua_len just return the number of keys in the table. That would be so straightforward and self-explanatory, and faster than the current algorithm. The only reason I haven't done this is that it wouldn't conform to the specification! My new lua_len algorithm is similar to the original algorithm, in that it fails in exactly the same way on non-vectors, in order to be compliant with the specification.
Unfortunately, if you apply the hacked length algorithm to a
table that isn't a vector, it doesn't work at all.
Since this feels insane, I have also provided a totally new API function: lua_nkeys. This returns the number of keys in the table, full stop. It's constant-time.
But I think the lua documentation didn't want to admit, "it
doesn't work at all." So instead, they invented this
concept of "a border" and pretended that was in some way a
helpful result. They should have just said, "the result is
undefined."
This patch also includes a function lua_nthkey, to get the Nth item in the table iteration, random-access style. I am not certain that this is a good idea, and I have deliberately avoided the use of this function for now, until I am convinced that it's wise.
I had to change the table internal representation
for the table iterator patch (above). With the modified
table representation, returning the number of keys in the
table can be done in constant time, whether it's a vector or
not. So I changed the length operator to just return
the number of keys, full stop.
This patch is live, and is necessary to the determinism of the system.
I've also added another function, lua_nkeys. This also
returns the number of keys in the table. It doesn't add any
functionality - I could use lua_rawlen and that would work
just as well. However, using lua_nkeys emphasizes the fact
that my code needs the *real* table length, not the "border"
bullshit that lua used to provide.
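
To show what "number of keys, in constant time" means, here is a minimal sketch in plain Lua. It is only an illustration: the real patch changes the table's internal representation in C, whereas this example uses a hypothetical `counted_table` helper that keeps a counter up to date through a proxy table and `__newindex`.

```lua
-- Returns a proxy table and a function reporting its key count.
-- All names here are hypothetical; this is not the patch's implementation.
local function counted_table()
  local data, count = {}, 0
  local proxy = setmetatable({}, {
    __index = data,
    -- The proxy itself stays empty, so every assignment lands here.
    __newindex = function(_, k, v)
      local had = data[k] ~= nil
      if v ~= nil and not had then
        count = count + 1          -- new key inserted
      elseif v == nil and had then
        count = count - 1          -- existing key deleted
      end
      data[k] = v
    end,
  })
  return proxy, function() return count end
end

-- Usage: the count is right whether or not the table is a vector.
local p, nkeys = counted_table()
p.a = 1
p[1] = 2
p[10] = 3
print(nkeys())   -- prints 3
p.a = nil
p[1] = 5         -- overwriting an existing key does not change the count
print(nkeys())   -- prints 2
```

Reading the count is O(1) because the bookkeeping is paid for at insertion and deletion time instead of at query time.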
This patch is live, and is necessary to the determinism of
the system.
## The Table Flag Bits Patch
@@ -170,7 +215,7 @@ That's not bad, but it puts both values and methods into the same namespace:
- By putting both values and methods into the same namespace, we create the possibility of unintended mistakes.
I am thinking about implementing a new metatable entry: __METHODS = true. If this flag is present, then the colon operator *obj:method* looks for the method in the metatable, instead of looking for it in the object. With this new metamethod, the way to create a class would be to make a table full of methods, and then in the class table, put __METHODS = true. Then you would do this:
I am thinking about implementing a new metatable entry: __METHODS = true. If this flag is present, then the colon operator *obj:method* looks for the method in the metatable, instead of looking for it in the object. With this new metamethod, the way to create a class would be to make a table full of methods, and then in that table, also put __METHODS = true. Then you would do this:
```lua
setmetatable(obj, class)
@@ -200,6 +245,6 @@ There's no obvious approach to fixing this, so I haven't patched it yet.
GC Finalizers and weak tables both introduce nondeterminism into Lua execution. We can't allow that. It may be necessary to patch the lua interpreter to simply disable these functions. Alternately, we could simply ask the scripters not to use these features, and declare "undefined behavior" if they do.
Update 1: I'm using GC finalizers in some cases to clean up userdata objects. I think it's safe as long as the only thing the finalizer does is free memory.
Update 1: I'm using GC finalizers in some cases to clean up userdata objects. I think it's safe as long as the only thing the finalizer does is free memory. (NOTE: WHERE?)
Update 2: I don't remember using userdata objects at all. I am not sure that Update 1 is the truth any more.