Change the lua length operator to do the obvious thing.

## The Table Length Patch

I've changed the Lua length operator so that, when it is
applied to a table, it returns the number of keys in the
table. It does this in constant time. This change affects
`lua_len`, `lua_rawlen`, and the Lua `#` operator.

You might be wondering what the Lua length operator
used to do. The Lua documentation says this:

> The length operator applied on a table returns a border in
> that table. A border in a table t is any non-negative
> integer that satisfies the following condition:
>
>     (border == 0 or t[border] ~= nil) and
>     (t[border + 1] == nil or border == math.maxinteger)

Those are *terrible* semantics:

- They're not useful for anything.
- They're not deterministic: two tables with exactly the same
  keys might return different lengths.
- In no sense of the word "length" is this the
  length of the table.

Let me explain how that mess happened. They obviously
wanted the length operator to return the number of keys in
the table. Unfortunately, counting the keys in a stock Lua
table takes O(N) time. So they came up with a hack to make
it faster: usually O(1). The catch is that the hack relies
on the table being a vector, that is, a table whose keys
are the consecutive integers 1, 2, 3, ..., n. As long as you
apply their hack to a vector, it works perfectly and returns
the number of keys.

Unfortunately, if you apply the hacked length algorithm to a
table that isn't a vector, it doesn't work at all.

But I think the Lua documentation didn't want to admit "it
doesn't work at all." So instead, they invented this
concept of "a border" and pretended that was in some way a
helpful result. They should have just said, "the result is
undefined."

I had to change the table's internal representation
for the table iterator patch (above). With the modified
table representation, returning the number of keys in the
table can be done in constant time, whether it's a vector or
not. So I changed the length operator to just return
the number of keys, full stop.

I've also added another function, `lua_nkeys`. It also
returns the number of keys in the table. It doesn't add any
functionality; I could use `lua_rawlen` and that would work
just as well. However, using `lua_nkeys` emphasizes the fact
that my code needs the *real* table length, not the "border"
bullshit that Lua used to provide.

This patch is live and is necessary for the determinism of
the system.

## The Table Flag Bits Patch