integration/Docs/Multipass Difference Transmission.md
2026-02-05 12:41:07 -05:00


# Multipass Difference Transmission
To difference transmit, we have to compare a tangible in the master world model to the corresponding tangible in a client world model. For example, a tangible has an XYZ coordinate: you compare the XYZ of the tangible in the master model to the XYZ of the corresponding tangible in the client model. This is so straightforward that it hardly merits any explanation. Where it stops being straightforward is when it comes time to compare the Lua tables of the two tangibles.
Suppose that in the master world model, tangibles T1 and T2 both point to Lua table A, which points to Lua table B, which points back to Lua table A. Also, suppose that the client model doesn't contain any of this. We would like the difference transmitter to be powerful enough to recreate the entire graph, including the two tangibles both pointing to the same table, and including the cycle.
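As a concrete picture of that graph, here is a minimal Lua sketch. It assumes tangibles can be stood in for by plain tables with a hypothetical `data` field; real tangibles are engine objects, so this only illustrates the sharing and the cycle.

```lua
-- Hypothetical stand-ins: real tangibles are engine objects, not plain tables.
local A, B = {}, {}
A.next = B   -- A points to B
B.next = A   -- B points back to A, closing the cycle

local T1 = { data = A }
local T2 = { data = A }   -- both tangibles share the very same table A

-- Table identity, not just table contents, is what must be reproduced.
print(T1.data == T2.data)       -- true: one shared table, not two copies
print(T1.data.next.next == A)   -- true: following the cycle leads back to A
```

A transmitter that copied each tangible independently would give the client two separate copies of A, and the first comparison would come out false there.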
In order for that to be possible, the difference transmitter can't treat each tangible as a separate entity. Instead, it needs to view Lua tables as a big interconnected graph with cycles, and it needs to try to reproduce the graph as a whole.
Let's complicate matters further. Suppose that the player is standing next to tangible T1, but tangible T2 is far away. We want the algorithm to reproduce not the whole graph, but only the portion that is reachable from tangible T1. So, if the player is next to T1, but T2 is far away, then the difference transmitter should transmit tangible T1, which points to table A, which points to B, which points back to A. But it should omit tangible T2.
Next, let's complicate matters even further. Suppose the player is standing next to tangible T1, and his client model contains the subset of the graph rooted at T1. But then, he warps from where he is, to where tangible T2 is. At this point, the difference transmitter will try to recreate the portion of the graph rooted at T2.
When it does this, it's OK if the difference transmitter allows tangible T1 to get "out of date", because T1 isn't visible any more. Our difference transmitter creates two new tables, "updated A" and "updated B." T2 points to updated A, which points to updated B, which points to updated A. Meanwhile, T1 points to obsolete A, which points to obsolete B, which points to obsolete A. In short, tangible T1's portion of the graph is now out of date and T1 doesn't point to the updated portion of the graph which is visible from T2.
Next, suppose the player warps from where he is, to a point halfway between T1 and T2, such that both T1 and T2 are within his sight radius. Only now will the difference transmitter create the entire graph I described: when it is done, both T1 and T2 will point to updated A, updated A will point to updated B, and updated B will point to updated A.
The difference transmission algorithm has to do a fair amount of analysis. In order to make the algorithm comprehensible, we have divided it into multiple passes, each of which does something fairly simple. We call this algorithm "multipass difference transmission." This paper describes the multipass difference transmission algorithm.
Not every pass generates server-to-client messages. Some passes run purely locally, on the server, on the client, or on both. When a pass does generate messages, those messages are expected to be applied to the client models before the next pass begins. In other words, each pass begins only after the previous pass has completed fully.
This paper contains a section for each pass of the algorithm.
## Pass: Update Player Position
This pass compares and transmits the animation queue of the player.
The reason this step is first is that the difference transmitter wants to transmit tangibles that are near where the player is standing. Therefore, both client and server need to know where the player is standing. The player's position is actually stored in the animation queue: the XYZ coordinate of the last step in the animation queue is the player's position. So to make sure both client and server know the player's position, we must difference transmit the player's animation queue.
This pass also difference transmits all the other non-Lua data in the player tangible. That includes the player's print-buffer, the player's ID allocator, and possibly some other things. The only reason we do this is that we're transmitting one part of the player tangible, so we might as well transmit all of it.
This pass *only* applies to the player tangible. No other tangible is updated during this pass.
## Pass: Create and Delete Tangibles
The server does a *scanradius* around the player in the master model, and another *scanradius* around the player in the client model. These both return sets of tangible ID numbers. The server calculates the set-union of these two scans. This is the *visible tangible list*. It contains the tangible IDs of all the tangibles that are near the player in either the master or client model. Then, the following steps are taken to update the client:
For every tangible TAN in the visible tangible list:
- If TAN exists in both master and client, then compare the animation queues of the two. If necessary, send an animation queue correction to the client.
- If TAN does not exist in the client model, transmit a "create tangible" message to the client. The animation queue of the new tangible is transmitted along with the create tangible message. Nothing else about the new tangible is transmitted yet.
- If TAN does not exist in the master model, then send a "delete tangible" message to the client.
Once this pass is applied, the set of tangibles in the visible area is guaranteed to match between master and client models, and the tangibles in this radius will all have correct animation queues. Since they have matching animation queues, and since the tangible's XYZ position is stored in the animation queue, they all have matching XYZ positions as well.
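The decision logic of this pass can be sketched in Lua as follows. This is an illustration under stated assumptions, not the engine's code: the function name `plan_create_delete`, the message field names, and the representation of an animation queue as a plain string (so that `~=` can compare it) are all hypothetical.

```lua
-- master and client map tangible ID -> { queue = <string> } (assumed shape).
-- master_scan and client_scan are the ID arrays returned by the two scans.
local function plan_create_delete(master, client, master_scan, client_scan)
  -- Set-union of the two scan results: the visible tangible list.
  local visible, seen = {}, {}
  for _, scan in ipairs({ master_scan, client_scan }) do
    for _, id in ipairs(scan) do
      if not seen[id] then
        seen[id] = true
        visible[#visible + 1] = id
      end
    end
  end
  table.sort(visible)   -- fixed order, only so the output is reproducible

  local messages = {}
  for _, id in ipairs(visible) do
    local m, c = master[id], client[id]
    if m and c then
      if m.queue ~= c.queue then
        messages[#messages + 1] = { op = "queue", id = id, queue = m.queue }
      end
    elseif m then
      messages[#messages + 1] = { op = "create", id = id, queue = m.queue }
    else
      messages[#messages + 1] = { op = "delete", id = id }
    end
  end
  return messages
end

local master = { [1] = { queue = "a" }, [2] = { queue = "b2" }, [3] = { queue = "c" } }
local client = { [2] = { queue = "b1" }, [3] = { queue = "c" }, [4] = { queue = "d" } }
local msgs = plan_create_delete(master, client, { 1, 2, 3 }, { 2, 3, 4 })
for _, msg in ipairs(msgs) do
  print(msg.op, msg.id)   -- create 1, then queue 2, then delete 4
end
```

Tangible 3 produces no message because its queue already matches. Note that existence is checked against the models rather than the scan results, so a tangible that appears in only one scan but exists in both models still gets its queue compared.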
## Pass: Generate the Close Tangible List
The server has a *visibility radius*, which is fairly large, and a *close radius*, which is smaller. Only tangibles within the *close radius* will have their Lua data updated.
The server does a scanradius around the player, using the close radius and the master model. This yields the *close tangible list*.
The invariant from the "Create and Delete Tangibles" pass (above) guarantees that doing the same scanradius using the client model yields exactly the same result. For safety, we do verifications to ensure that the invariant hasn't been broken. The first verification step is that the master does the same scanradius using its own copy of the client model. It checks that the two lists are identical. Then, it sends the hash-value of the *close tangible list* to the client. When the client receives the hash-value, it does its own scanradius, and it verifies that the hash matches.
So at this point, both client and server have a copy of the *close tangible list*, and we are absolutely certain that they are in agreement. Several of the following passes use the *close tangible list*.
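The hash check could look roughly like the Lua sketch below. The text does not say which hash function is used, so the polynomial hash here, its constants, and the name `hash_id_list` are assumptions chosen only to make the idea concrete. The list is sorted first so that the result does not depend on scan order.

```lua
-- Hypothetical order-independent hash of a list of tangible IDs.
local function hash_id_list(ids)
  local sorted = {}
  for i, id in ipairs(ids) do sorted[i] = id end
  table.sort(sorted)
  local h = 7
  for _, id in ipairs(sorted) do
    -- Small modulus keeps the arithmetic exact even where numbers are doubles.
    h = (h * 31 + id) % 1000003
  end
  return h
end

-- Server side: hash the close tangible list and send the value.
local server_hash = hash_id_list({ 12, 5, 40 })
-- Client side: hash its own scan result and compare.
local client_hash = hash_id_list({ 5, 40, 12 })
print(server_hash == client_hash)   -- true: same set of IDs
```

A mismatch would mean the invariant from the previous pass has been broken. A real implementation would likely want a stronger hash than this one.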
## Pass: Number the Tables in the Client Model
We want to give every reachable table in the client model a unique ID number.
This numbering pass is done both on the server's copy of the client model, and the client's copy of the client model. Because of the invariant that both these models are identical at all times, the numbering pass should produce identical numbering on both the server and client. No messages are transmitted during this pass. Client and server both independently compute the exact same result.
Numbering is accomplished by doing a graph traversal algorithm. Since there can be cycles in the graph, we need to use a *visited* bit to ensure that we don't get into an infinite loop. When we visit a table for the first time, we assign it a unique ID number. ID numbers start at 1 and increment from there.
Only "ordinary tables" are numbered and recursed into by the graph traversal. The following are some examples of tables that are not ordinary tables, and therefore are not traversed:
- Tangible databases.
- Classes created by the "makeclass" function.
- The global environment table.
- The Lua registry.
We avoid all of these for various reasons. When the graph traversal finds one of these special tables, it just ignores it: it doesn't number it or recurse into it.
In this graph traversal, when we encounter a table entry Key→Value where the Key is not a string, number, or boolean, that table entry is ignored. Strings, numbers, and booleans are called "sortable" values, because they can be ordered with a less-than comparison. Nowhere in the difference transmission code do we actually do any sorting, but we still call these "sortable keys." Throughout the difference transmission code, we ignore table entries Key→Value if the Key is not sortable.
Metatables are treated as if they were just another Key→Value table entry, METATABLE→Value. They are traversed just like any other Key→Value pair. Throughout the remainder of this paper, we usually don't mention metatables explicitly. But whenever you see a loop over a table's Key→Value pairs, remember that one of the pairs might be METATABLE→Value. We consider METATABLE to be a sortable key.
Graph traversal algorithms need to have a set of "roots" to start at. In this algorithm, we loop over all the close tangibles, and then, within each close tangible, we find all ordinary tables pointed to by the tangible database. It is these ordinary tables that make up the roots of the graph traversal.
The numbers that are assigned by this pass aren't stored in the tables that are being traversed. They are stored in separate maps:
- The *tnmap*, the table-to-number map. This is a Lua table containing T→N, where T is a table that was traversed, and N is the number that was assigned.
- The *ntmap*, the number-to-table map. This is a Lua table containing N→T, where T is a table that was traversed, and N is the number that was assigned.
These tables get stored in the Lua registry of the client model, in the keys *registry.tnmap* and *registry.ntmap*. These maps get used by a lot of the following difference transmission passes. When difference transmission is done, these two tables are removed from the Lua registry.
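The numbering traversal can be sketched in plain Lua as below. Several things here are assumptions made for illustration: the function names, passing the special tables in as a set called `special`, and returning the two maps instead of storing them in the registry. The sketch also sorts the keys explicitly to get a repeatable visiting order in stock Lua; the real code does no sorting and relies on deterministic table iteration instead.

```lua
-- Keys the traversal is allowed to follow.
local function is_sortable(key)
  local t = type(key)
  return t == "string" or t == "number" or t == "boolean"
end

-- Assumption: order keys by type name, then by value, for repeatability.
local function sorted_keys(tbl)
  local keys = {}
  for k in pairs(tbl) do
    if is_sortable(k) then keys[#keys + 1] = k end
  end
  table.sort(keys, function(a, b)
    local ta, tb = type(a), type(b)
    if ta ~= tb then return ta < tb end
    if ta == "boolean" then return (not a) and b end   -- false before true
    return a < b
  end)
  return keys
end

-- roots: array of tables to start from.
-- special: set of tables that are not "ordinary" and must be skipped.
local function number_tables(roots, special)
  local tnmap, ntmap, count = {}, {}, 0
  local function visit(tbl)
    if type(tbl) ~= "table" or special[tbl] or tnmap[tbl] then return end
    count = count + 1
    tnmap[tbl] = count        -- doubles as the "visited" bit
    ntmap[count] = tbl
    for _, key in ipairs(sorted_keys(tbl)) do
      visit(tbl[key])
    end
    visit(getmetatable(tbl))  -- treated like one more Key→Value entry
  end
  for _, root in ipairs(roots) do visit(root) end
  return tnmap, ntmap
end

local A, B, G = {}, {}, {}
A.next, B.back = B, A         -- a cycle between A and B
A.env = G                     -- G plays the role of a special table
local tnmap, ntmap = number_tables({ A }, { [G] = true })
print(tnmap[A], tnmap[B], tnmap[G])   -- 1, 2, nil
```

The recursion is fine for a sketch, but a very deep table graph would call for an explicit stack.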
It probably would be a good idea to write some sort of verification step involving the transmission of a hash-value, to ensure that the server and client computed the same result. We haven't done that yet.
## Pass: Pair Up Tables in the Master and Client Models
We attempt to "pair up" tables in the master model to their equivalent tables in the client model.
The pairing algorithm runs only on the server. No messages are sent to the client. The server analyzes the client and master model, and tries to find a correspondence between the tables.
When this pairing process begins, the tables in the client model have already been numbered, by the previous pass. The client model contains a *tnmap* and an *ntmap*. But the tables in the master model have no ID numbers. There is no *ntmap* or *tnmap* in the master model. The pairing algorithm creates a *tnmap* and an *ntmap* in the master model, and it initializes them to be empty.
When the pairing algorithm decides to pair a table TC in the client model to an equivalent table TM in the master model, it does so by copying the table number from TC to TM. Copying the table number consists of the following steps: it fetches the entry TC→N from the *tnmap* in the client model. Then, it stores the entry TM→N in the *tnmap* of the master model, and N→TM in the *ntmap* of the master model. Now the two tables TC and TM both have the same table number.
When the pairing algorithm is done, the master model has a *tnmap* and an *ntmap* similar to the one in the client model, but there is a difference: the maps in the master model may be incomplete. There may be a reachable table in the master model that didn't have an equivalent in the client model. In that case, that particular table will still not have any table number. There will not be an entry for it in the master *tnmap* or *ntmap*.
The algorithm to do the pairing is another graph traversal. This time, it traverses the client model and the master model in parallel, in lockstep. The actual traversal algorithm is a little complicated, so bear with me.
The traversal algorithm maintains a stack containing pairs of tables - one from the master, one from the client. If a pair of tables is on the stack, it means that we have strong evidence that the tables should be paired. However, we still have to do some sanity checks before actually pairing the tables. If they pass the sanity checks, they will be paired.
Initially, the stack is empty. Then, we populate it with the "roots" of the graph traversal algorithm. Here's how we find all the roots:
```
for every tangible ID in the close tangible list:
    lookup the two tangibles TANCLIENT and TANMASTER with the specified ID
    for every table entry Key→Value in TANMASTER:
        if Key is Sortable AND TANCLIENT[Key] and TANMASTER[Key] are both tables:
            push the likely pair TANCLIENT[Key], TANMASTER[Key] onto the stack
```
So now the stack contains all the roots of the traversal. Next, we come to the code that pops the stack and does the sanity checks:
```
while the stack is not empty:
    pop a pair TABLECLIENT and TABLEMASTER from the stack
    if either of the tables is not an ordinary table, then don't pair them
    if TABLEMASTER already has a table number, then don't pair them
    if TABLECLIENT doesn't have a table number, then don't pair them
    if all the checks above pass, then pair the two tables
```
And finally, we have the code that is used when the final decision is made to pair two tables. After pairing the new tables, it generates new possible pairings and pushes them onto the stack:
```
Pair TABLECLIENT, TABLEMASTER:
    Copy the number from TABLECLIENT to TABLEMASTER using tnmap and ntmap.
    For every table entry Key→Value in TABLEMASTER:
        if Key is Sortable AND TABLECLIENT[Key] and TABLEMASTER[Key] are both tables:
            push the likely pair TABLECLIENT[Key], TABLEMASTER[Key] onto the stack
```
When the algorithm is done, the master's *tnmap* and *ntmap* exist and are populated. There may be tables in the master model that still aren't paired, however. These do not have entries in the *tnmap* and *ntmap*.
No messages are transmitted during this pass. This pass happens on the server only.
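Putting the three pseudocode fragments together, the pairing pass might look like the following Lua sketch. It is an illustration only: `pair_tables`, the `special` set, and the convention of passing the roots as an array of `{ client_table, master_table }` pairs are assumptions. It uses stock `pairs`, whose order is unspecified, whereas the real code relies on deterministic table iteration. Metatable entries are left out for brevity.

```lua
local function is_sortable(key)
  local t = type(key)
  return t == "string" or t == "number" or t == "boolean"
end

-- root_pairs: array of { client_table, master_table } likely pairs.
-- client_tnmap: table-to-number map produced by the client numbering pass.
-- special: set of tables that are not "ordinary".
local function pair_tables(root_pairs, client_tnmap, special)
  local master_tnmap, master_ntmap = {}, {}   -- start out empty
  local stack = {}
  for _, p in ipairs(root_pairs) do stack[#stack + 1] = p end

  while #stack > 0 do
    local p = table.remove(stack)
    local tc, tm = p[1], p[2]
    local n = client_tnmap[tc]
    -- The three sanity checks: both ordinary, master not yet numbered,
    -- client already numbered.
    if not special[tc] and not special[tm]
       and master_tnmap[tm] == nil and n ~= nil then
      master_tnmap[tm] = n    -- copy the table number from client to master
      master_ntmap[n] = tm
      for key, vm in pairs(tm) do
        local vc = tc[key]
        if is_sortable(key) and type(vc) == "table" and type(vm) == "table" then
          stack[#stack + 1] = { vc, vm }   -- a new likely pair
        end
      end
    end
  end
  return master_tnmap, master_ntmap
end

-- Client graph: CA <-> CB, already numbered 1 and 2.
local CA, CB = {}, {}
CA.next, CB.back = CB, CA
local client_tnmap = { [CA] = 1, [CB] = 2 }

-- Master graph: same shape, plus a table MC the client has never seen.
local MA, MB, MC = {}, {}, {}
MA.next, MB.back, MB.extra = MB, MA, MC

local master_tnmap = pair_tables({ { CA, MA } }, client_tnmap, {})
print(master_tnmap[MA], master_tnmap[MB], master_tnmap[MC])   -- 1, 2, nil
```

MC stays unnumbered because the client has nothing under the key `extra`, which is exactly the situation the next pass deals with.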
## Pass: Number the Remaining Tables in the Master Model
The goal of this pass is to assign numbers to any reachable tables in the master model that don't already have numbers.
The tables in the client model were already numbered from 1 to N by a previous pass. We're going to start our numbering from N+1. In other words, the table numbers we're using in this pass pick up where the client numbers left off.
Numbering is accomplished by doing another graph traversal algorithm in the master model. It visits only ordinary tables, ignoring special tables. It again ignores table entries where the key isn't sortable. It uses the same roots: the ordinary tables pointed to by the close tangibles. When the traversal finds a table that doesn't already have a number, it assigns a value starting from N+1 and incrementing from there. When a table number is assigned, it is stored in the master *tnmap* and *ntmap*.
After this traversal is done, all the reachable tables in the master model have numbers. The table numbers in the master model now fall into two different ranges:
- Tables 1 to N: these numbers were copied from the client model by the pass "Pair Up Tables in the Master and Client Models." Therefore, these tables are all paired to tables in the client model.
- Tables N+1 to M: these numbers were assigned by this pass, "Number the Remaining Tables in the Master Model." Therefore, these tables aren't paired up with anything in the client model.
The graph traversal is done only on the server. But after the graph traversal is done, the final step of this pass is to send a message to the client, instructing it to create empty tables numbered N+1 to M. After the client has done this, *all* the tables in the master model have pairings to tables in the client model.
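A minimal Lua sketch of this pass, under the same caveats as before: the names `number_remaining` and `create_empty_tables`, the `special` set, and the way the maps are passed around are all assumptions made for illustration.

```lua
local function is_sortable(key)
  local t = type(key)
  return t == "string" or t == "number" or t == "boolean"
end

-- Server side: number every reachable master table that has no number yet,
-- starting at n + 1, where n is the highest client table number.
local function number_remaining(roots, special, tnmap, ntmap, n)
  local visited, first_new = {}, n + 1
  local function visit(tbl)
    if type(tbl) ~= "table" or special[tbl] or visited[tbl] then return end
    visited[tbl] = true
    if tnmap[tbl] == nil then
      n = n + 1
      tnmap[tbl], ntmap[n] = n, tbl
    end
    for key, value in pairs(tbl) do
      if is_sortable(key) then visit(value) end
    end
    visit(getmetatable(tbl))
  end
  for _, root in ipairs(roots) do visit(root) end
  return first_new, n   -- the client must create tables first_new..n
end

-- Client side: handle the "create empty tables" message.
local function create_empty_tables(client_tnmap, client_ntmap, first_new, last_new)
  for num = first_new, last_new do
    local t = {}
    client_ntmap[num], client_tnmap[t] = t, num
  end
end

-- Master: MA and MB were paired (1 and 2); MC is new.
local MA, MB, MC = {}, {}, {}
MA.next, MB.back, MB.extra = MB, MA, MC
local tnmap, ntmap = { [MA] = 1, [MB] = 2 }, { MA, MB }
local first_new, last_new = number_remaining({ MA }, {}, tnmap, ntmap, 2)
print(first_new, last_new, tnmap[MC])   -- 3, 3, 3

-- Client: two existing tables, then one empty table is created as number 3.
local CA, CB = {}, {}
local client_tnmap, client_ntmap = { [CA] = 1, [CB] = 2 }, { CA, CB }
create_empty_tables(client_tnmap, client_ntmap, first_new, last_new)
print(type(client_ntmap[3]))            -- table
```

When nothing new is found, `first_new` ends up greater than `last_new` and the client-side loop simply does nothing.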
## Pass: Transmit Table Contents Non Recursively
This is the step that actually sends the table corrections to the ordinary tables. In this pass, the server will be sending the client messages of this form:
```
in table [table number], store the entry [Key]→[Value]
```
The client will receive these messages and will just follow the instructions. The algorithm to send the messages is a loop over all the tables:
```
for every N → TABLEMASTER in the master ntmap:
    look up the paired table N → TABLECLIENT in the client ntmap
    for every table entry Key→Value in TABLEMASTER:
        if Key is Sortable AND TABLECLIENT[Key] != TABLEMASTER[Key]:
            send "in table number N, store the entry Key→TABLEMASTER[Key]"
    for every table entry Key→Value in TABLECLIENT:
        if Key is Sortable AND TABLEMASTER[Key] == nil:
            send "in table number N, store the entry Key→Nil"
```
The equality comparison used in the algorithm above has certain rules:
- Strings, Numbers, Booleans, and Nil are compared in the usual way.
- Two tables are considered equal if they have the same table number.
- Two tangibles are considered equal if they have the same tangible ID.
- Two class tables are considered equal if they have the same class name.
- The client global environment is considered equal to the master global environment.
The content of the *Value* part of the message depends on the type of *Value*:
- Strings: send the string.
- Numbers: send the number.
- Booleans: send the boolean.
- Nil: send "nil"
- Tables: send "table [table number]"
- Tangibles: send "tangible [tangible ID]"
- Class tables: send "class [classname]"
- The global environment: send "globalenv"
- Functions: cannot be transmitted, send "nil" instead.
- Threads: cannot be transmitted, send "nil" instead.
This is all packed into an efficient binary representation. When this pass is done, the contents of all the ordinary tables have been updated. Only the tangible databases have not been updated.
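To make the comparison and encoding rules concrete, here is a reduced Lua sketch of both ends of this pass. It is not the real wire format: messages are plain Lua tables rather than a binary encoding, and only strings, numbers, booleans, nil, and ordinary tables are handled. Tangibles, class tables, the global environment, and the METATABLE pseudo-key are left out. All function and field names are hypothetical.

```lua
local function is_sortable(key)
  local t = type(key)
  return t == "string" or t == "number" or t == "boolean"
end

-- Describe a value as (kind, payload). Tables become their table number, so
-- two tables compare equal exactly when they carry the same number.
local function describe(value, tnmap)
  local t = type(value)
  if t == "string" or t == "number" or t == "boolean" then
    return t, value
  elseif t == "table" and tnmap[value] then
    return "table", tnmap[value]
  end
  return "nil", nil   -- nil, functions, threads, anything unhandled here
end

-- Server side: collect "in table n, store key -> value" messages.
local function diff_tables(master_ntmap, client_ntmap, master_tnmap, client_tnmap)
  local out = {}
  for n, tm in pairs(master_ntmap) do
    local tc = client_ntmap[n]
    for key in pairs(tm) do
      if is_sortable(key) then
        local mk, mv = describe(tm[key], master_tnmap)
        local ck, cv = describe(tc[key], client_tnmap)
        if mk ~= ck or mv ~= cv then
          out[#out + 1] = { n = n, key = key, kind = mk, value = mv }
        end
      end
    end
    for key in pairs(tc) do
      if is_sortable(key) and tm[key] == nil then
        out[#out + 1] = { n = n, key = key, kind = "nil" }
      end
    end
  end
  return out
end

-- Client side: follow the instruction in one message.
local function apply_message(client_ntmap, msg)
  local value = msg.value
  if msg.kind == "table" then value = client_ntmap[msg.value] end
  client_ntmap[msg.n][msg.key] = value
end

local MA, MB = { hp = 10, name = "orc" }, {}
MA.next = MB
local CA, CB = { hp = 7, name = "orc", old = true }, {}
CA.next = CB
local out = diff_tables({ MA, MB }, { CA, CB },
                        { [MA] = 1, [MB] = 2 }, { [CA] = 1, [CB] = 2 })
for _, msg in ipairs(out) do apply_message({ CA, CB }, msg) end
print(#out, CA.hp, CA.old)   -- 2, 10, nil
```

The `next` entry generates no message even though `MB` and `CB` are different Lua objects, because both carry table number 2. Messages are collected before being applied, so no table is modified while it is being iterated.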
## Pass: Transmit Tangible Differences
For each master tangible database in the *close tangible list*, compare it to the corresponding tangible in the client model.
This pass works the same way as the previous pass, "Transmit Table Contents Non Recursively," except that instead of comparing the ordinary tables, we compare the tangible databases. Aside from that difference, the code and logic are the same. The subroutine used by both passes is the same. We will therefore omit the detailed explanation.
There is one small difference with the previous pass. Throughout the difference transmission code, we have treated metatables as if they were just another Key→Value pair, METATABLE→Value. We traversed them the same as any other Key→Value pair. But in this pass, we skip over metatables. Tangible metatables are handled in a separate pass.
## Pass: Compare Tangible Metatables
A tangible's metatable is a place where the engine hides a great deal of information about the tangible, including for example its threads. Most of this data should not be difference transmitted at all. Therefore, we have exempted tangible metatables from the general-purpose table transmission passes above. Instead, we handle them separately.
Currently the only piece of data in the tangible metatable which is transmittable is the tangible's classname. The comparison process is simple: we just compare the class name as a string, and if there's a mismatch, we send the class name as a string.
## Afterword
Those are currently all the passes of the difference transmission algorithm.
## About Determinism
In order to ensure that the table numbering is identical on both client and server, we need the graph traversal algorithm to visit Key→Value table entries in a deterministic order. We had to make certain patches to the Lua runtime to achieve this. See the document "A Summary of our Lua Patches" for more information about deterministic table iteration.