Rewrite some bad AI-generated documentation.
@@ -1,58 +1,138 @@
|
||||
### About Determinism
|
||||
|
||||
The driven portion of the Luprex engine is deterministic. This document explains what that means and why it matters. For the specific rules you must follow to maintain determinism, see "The Event-Driven Structure of the Engine."
|
||||
Luprex uses two different kinds of determinism.
|
||||
|
||||
## Two Degrees of Determinism
|
||||
**Synchronous Model Determinism** Predictive reexecution
|
||||
uses four world models, including a server-synchronous and
|
||||
client-synchronous model. These two models are fed the same
|
||||
events, and must remain in the same state after executing
|
||||
the same events. See the document "Predictive Reexecution"
|
||||
for an explanation of why these models exist. If you were to
|
||||
do a comparison of the two models, they would be equal in
|
||||
the lisp sense of `equal`, but not in the sense of `eq`,
|
||||
because corresponding data structures are not at the
|
||||
same memory address.
|
||||
|
||||
There are two distinct degrees of determinism in the engine, each serving a different purpose.
|
||||
**Replay Log Determinism** The server stores a log of all
|
||||
events it feeds into the Luprex DLL. It can replay a log by
|
||||
feeding the same events into a new copy of the Luprex DLL.
|
||||
When replaying a log, the new copy of the Luprex DLL
|
||||
reproduces the original execution right down to the memory
|
||||
level: every data structure is at the same address, every
|
||||
byte of memory is the same. This is the `eq` level of
|
||||
equivalence.
|
||||
|
||||
**Value-level determinism** is the property that the server-synchronous and client-synchronous models stay in the same logical state. These two models run on different machines and receive the same command acknowledgements in the same order. Value-level determinism guarantees that they end up containing the same Lua tables with the same keys and values. If you go into both models and print things out, everything looks the same. However, a Lua table in one model is not necessarily at the same memory address as the corresponding table in the other, because they are running on different machines with different memory layouts. The two models are *equal* in the Lisp sense of `equal`, but not in the sense of `eq`.
|
||||
These two forms of determinism serve different purposes and
|
||||
impose different costs.
|
||||
|
||||
**Bit-exact determinism** is the property that a recorded event log, when replayed into a fresh DrivenEngine, reproduces the original execution right down to the memory level: every data structure is at the same address, every byte of memory is the same. This is the `eq` level of equivalence.
|
||||
## Implementing Synchronous Model Determinism
|
||||
|
||||
The engine currently aims for both, but they serve different purposes and impose different costs.
|
||||
To get the two synchronous models to be deterministic
|
||||
enough, we had to take several steps:
|
||||
|
||||
## Value-Level Determinism: Synchronous Model Pairing
|
||||
- **Deterministic Lua table iteration.** We patch the Lua
|
||||
runtime so that iterating over a table always produces
|
||||
keys in the same order. The order depends only on
|
||||
the order in which the keys were inserted, not on the
|
||||
memory layout.
|
||||
- **No iterating over C++ unordered maps.** Unordered maps
|
||||
produce elements in an order that depends on memory
|
||||
addresses. Since addresses differ between the two models,
|
||||
iteration order would differ, breaking value-level
|
||||
determinism. An exception: iterating an unordered map and
|
||||
then immediately sorting the results into a predictable
|
||||
order is allowed, because the randomness is sandboxed.
|
||||
- **No genuinely random numbers.** We do not use random
|
||||
numbers in the world model. We do use pseudorandom
|
||||
numbers, but we store the generator's state as part of the
|
||||
world model and maintain it using difference transmission.
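The unordered-map exception above can be sketched as follows. This is a minimal illustration, not engine code: `sorted_keys` and `join_names` are hypothetical names, and the map contents are made up. The unordered iteration is confined to one helper, and callers only ever see the sorted result.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: collect the keys of an unordered map, then sort them.
// The unspecified iteration order never escapes this function.
template <typename K, typename V>
std::vector<K> sorted_keys(const std::unordered_map<K, V>& map) {
    std::vector<K> keys;
    keys.reserve(map.size());
    for (const auto& entry : map) {       // order here is unspecified
        keys.push_back(entry.first);
    }
    std::sort(keys.begin(), keys.end());  // order here is fully determined
    return keys;
}

// Hypothetical caller: builds a string whose content depends on visit order,
// so it only visits keys through sorted_keys().
std::string join_names(const std::unordered_map<std::string, int>& scores) {
    std::string out;
    for (const std::string& name : sorted_keys(scores)) {
        if (!out.empty()) out += ",";
        out += name + "=" + std::to_string(scores.at(name));
    }
    return out;
}

// Two maps with the same contents but different insertion order and bucket
// counts must produce the same string.
void sorted_iteration_example() {
    std::unordered_map<std::string, int> a{{"zed", 3}, {"amy", 1}, {"bob", 2}};
    std::unordered_map<std::string, int> b;
    b.reserve(1024);
    b["bob"] = 2;
    b["zed"] = 3;
    b["amy"] = 1;
    assert(join_names(a) == join_names(b));
}
```

Sorting costs O(n log n) per traversal, so where a container is iterated often, an ordered container may be the simpler choice.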
|
||||
|
||||
Value-level determinism is what makes the multiplayer architecture work. It is non-negotiable.
|
||||
|
||||
Luprex uses four types of world models to handle multiplayer networking (see "Predictive Reexecution" for the full explanation). Two of these models are critical to understand here:
|
||||
|
||||
- The **server-synchronous** model runs on the server.
|
||||
- The **client-synchronous** model runs on the client.
|
||||
|
||||
These two models receive the same command acknowledgements in the same order. Because the driven portion is deterministic at the value level, the two models always end up in the same logical state: the same Lua tables, the same values, the same game state. They never need to exchange full state to stay in sync.
|
||||
|
||||
The two models are running on different machines, so naturally they have different memory layouts and different pointer addresses. That's fine. All that matters is that the values match. This is why value-level determinism is sufficient for synchronous model pairing.
|
||||
|
||||
The constraints that maintain value-level determinism are:
|
||||
|
||||
- **Deterministic Lua table iteration.** We patch the Lua runtime so that iterating over a table always produces keys in the same order, regardless of memory layout. Without this, two engines processing the same commands could iterate tables in different orders and produce different results.
|
||||
- **No iterating over unordered maps.** Unordered maps produce elements in an order that depends on memory addresses. Since addresses differ between the two models, iteration order would differ, breaking value-level determinism. (An exception: iterating an unordered map and then immediately sorting the results into a predictable order is allowed, because the randomness is sandboxed.)
|
||||
- **No genuinely random numbers.** Pseudorandom numbers are fine as long as the state is privately owned by the driven portion and seeded deterministically.
|
||||
- **Controlled use of real-time clocks.** The driven portion (the Luprex DLL) cannot call system functions to obtain the current time, because the result would differ between runs and between machines. However, the driver can feed the current time into the driven portion as an event. Since events are the same during paired execution and during replay, the time value is deterministic from the driven portion's perspective.
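The last two constraints can be sketched together. This is an assumption-laden illustration rather than the engine's actual code: `WorldModel`, `next_random`, and `on_time_event` are hypothetical names, and the generator is a splitmix64-style mixer chosen only because it uses nothing but fixed-width integer arithmetic. The generator state lives inside the model, and the time arrives as an event payload instead of a system call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical world-model fragment. The generator state is ordinary model
// data, so it travels with the rest of the model.
struct WorldModel {
    std::uint64_t rng_state;     // seeded deterministically
    std::uint64_t last_time_ms;  // most recent time delivered by the driver
};

// splitmix64-style step. Same starting state in, same sequence out, on any
// machine, because only 64-bit unsigned arithmetic is involved.
std::uint64_t next_random(WorldModel& model) {
    model.rng_state += 0x9E3779B97F4A7C15ULL;
    std::uint64_t z = model.rng_state;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Hypothetical event handler. The driver reads the clock and passes the
// value in; the driven side never asks the OS for the time itself.
void on_time_event(WorldModel& model, std::uint64_t now_ms) {
    model.last_time_ms = now_ms;
}

// Two models with the same seed and the same events stay in lockstep.
void paired_models_example() {
    WorldModel server{42, 0};
    WorldModel client{42, 0};
    on_time_event(server, 1000);
    on_time_event(client, 1000);
    for (int i = 0; i < 10; ++i) {
        assert(next_random(server) == next_random(client));
    }
    assert(server.last_time_ms == client.last_time_ms);
}
```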
|
||||
|
||||
## Bit-Exact Determinism: Replay Debugging
|
||||
|
||||
Bit-exact determinism enables replay debugging. It is valuable but expensive, and its cost-benefit tradeoff is an open question.
|
||||
Bit-exact determinism enables replay debugging. It is
|
||||
valuable but expensive, and its cost-benefit tradeoff is an
|
||||
open question.
|
||||
|
||||
As the server runs, the driver can write a log of every event it feeds into the driven portion. Later, a new DrivenEngine can be created and fed those same events from the log file. The goal of bit-exact determinism is that during this replay, the DrivenEngine does the *exact* same thing it did during the live run, right down to every data structure being at the same memory address.
|
||||
As the server runs, the driver can write a log of every
|
||||
event it feeds into the driven portion. Later, a new
|
||||
DrivenEngine can be created and fed those same events from
|
||||
the log file. The goal of bit-exact determinism is that
|
||||
during this replay, the DrivenEngine does the *exact* same
|
||||
thing it did during the live run, right down to every data
|
||||
structure being at the same memory address.
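The log-and-replay flow can be sketched roughly as below. Everything here is hypothetical: the `Event` layout, the toy `DrivenEngine` (which shares only its name with the real class), and the `Driver` are stand-ins meant to show the shape of the idea. The sketch checks only that replay reproduces the same values; it does not demonstrate the same-address property, which depends on the allocator.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical event record: everything the driven side learns about the
// outside world arrives in one of these.
struct Event {
    enum class Kind { Time, Command } kind;
    std::uint64_t payload;
};

// Toy stand-in for the driven portion. State changes only inside feed().
class DrivenEngine {
public:
    void feed(const Event& e) {
        switch (e.kind) {
        case Event::Kind::Time:
            now_ms_ = e.payload;
            break;
        case Event::Kind::Command:
            checksum_ = checksum_ * 31 + e.payload + now_ms_;
            break;
        }
    }
    std::uint64_t now_ms() const { return now_ms_; }
    std::uint64_t checksum() const { return checksum_; }

private:
    std::uint64_t now_ms_ = 0;
    std::uint64_t checksum_ = 0;
};

// Toy driver: records every event before handing it to the engine.
class Driver {
public:
    void deliver(DrivenEngine& engine, const Event& e) {
        log_.push_back(e);
        engine.feed(e);
    }
    const std::vector<Event>& log() const { return log_; }

private:
    std::vector<Event> log_;
};

// Replay: feed the recorded events into a brand-new engine.
DrivenEngine replay(const std::vector<Event>& log) {
    DrivenEngine fresh;
    for (const Event& e : log) {
        fresh.feed(e);
    }
    return fresh;
}

void replay_example() {
    DrivenEngine live;
    Driver driver;
    driver.deliver(live, Event{Event::Kind::Time, 1000});
    driver.deliver(live, Event{Event::Kind::Command, 7});
    DrivenEngine again = replay(driver.log());
    assert(again.checksum() == live.checksum());
}
```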
|
||||
|
||||
Why does this matter? If the server crashed during the live run, the replay will crash in exactly the same way. You can run the replay inside a debugger, single-step right up to the crash, and examine the exact same pointers and memory layout that existed during the original crash.
|
||||
Why does this matter? If the server crashed during the live
|
||||
run, the replay will crash in exactly the same way. You can
|
||||
run the replay inside a debugger, single-step right up to
|
||||
the crash, and examine the exact same pointers and memory
|
||||
layout that existed during the original crash.
|
||||
|
||||
Value-level determinism alone is not sufficient for this. If the replay produces the same logical state but at different memory addresses, then pointer-related bugs (buffer overruns, use-after-free, etc.) might not reproduce. Bit-exact determinism ensures they do.
|
||||
Value-level determinism alone is not sufficient for this. If
|
||||
the replay produces the same logical state but at different
|
||||
memory addresses, then pointer-related bugs (buffer
|
||||
overruns, use-after-free, etc.) might not reproduce.
|
||||
Bit-exact determinism ensures they do.
|
||||
|
||||
The additional constraints that maintain bit-exact determinism, beyond those needed for value-level determinism, are:
|
||||
To implement replay determinism, we took several
|
||||
difficult steps:
|
||||
|
||||
- **The eng::malloc heap.** A custom memory allocator positioned at a fixed address, used exclusively by the driven portion. Because the driven portion is deterministic, the sequence of allocations and frees is identical between the live run and the replay, so every data structure ends up at the same address. See "The Event-Driven Structure of the Engine" for details.
|
||||
- **No threads in the driven portion.** Thread scheduling is nondeterministic at the OS level. Even if two threaded programs produce the same final values, the interleaving of operations differs between runs, which would cause memory allocations to occur in different orders and at different addresses.
|
||||
- **The Driver/Driven Partition.** The Luprex engine is split into an
|
||||
event-driven portion and an event driver. The driven
|
||||
portion contains all the game logic. The driver is mainly
|
||||
for I/O. The driven portion cannot contain any I/O. That
|
||||
includes:
|
||||
|
||||
Note that the constraints for value-level determinism (deterministic table iteration, no unordered maps, etc.) also contribute to bit-exact determinism. But they are *required* for value-level determinism regardless. The eng::malloc heap and the no-threads rule are the additional cost imposed specifically by the bit-exact guarantee.
|
||||
- **Clocks only in the Driver.** The driven portion cannot
|
||||
call system functions to obtain the current time.
|
||||
However, the driver can feed the current time into the
|
||||
driven portion as an event.
|
||||
- **Lua source files only in the Driver.** The driven
|
||||
portion cannot read Lua source files. It can, however,
|
||||
enter a state that indicates to the driver that it
|
||||
wants a Lua source file. Then, the driver can feed
|
||||
the Lua source file in as an event.
|
||||
- **Sockets only in the Driver.** The driven portion
|
||||
cannot open TCP/IP sockets. However, it can enter
|
||||
a state that indicates its desire to make a TCP/IP
|
||||
connection, and then the driver can do it and feed
|
||||
the data into the driven portion.
|
||||
|
||||
## The Practical Distinction
|
||||
- **The eng::malloc heap.** A custom memory allocator
|
||||
positioned at a fixed address, used exclusively by the
|
||||
driven portion. The memory allocator, if asked to
|
||||
perform the same sequence of malloc/free operations,
|
||||
will return the same addresses.
|
||||
|
||||
If the engine ever relaxed its determinism requirements, the value-level constraints would remain because they are essential to the multiplayer architecture. The bit-exact constraints (eng::malloc, no threads) could theoretically be dropped if replay debugging were deemed not worth the cost. Dropping the no-threads rule in particular would be a significant performance benefit.
|
||||
- **No threads in the driven portion.** Thread scheduling is
|
||||
nondeterministic at the OS level. We cannot use it in the
|
||||
driven portion.
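The allocator property described above (same sequence of malloc/free operations, same addresses) can be illustrated with a toy pool. This is not the eng::malloc implementation: `BlockPool` and `run_script` are hypothetical, and the sketch returns block indices rather than real addresses, since portable C++ cannot pin an arena at a fixed address. With a fixed base address, identical indices would translate to identical pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-block pool. Its decisions depend only on the history
// of allocate/release calls, never on the OS, the time, or other threads.
class BlockPool {
public:
    static constexpr std::size_t kNoBlock = static_cast<std::size_t>(-1);

    explicit BlockPool(std::size_t block_count) : block_count_(block_count) {}

    // Returns a block index, or kNoBlock when the pool is exhausted.
    std::size_t allocate() {
        if (!free_list_.empty()) {
            std::size_t index = free_list_.back();  // most recently freed first
            free_list_.pop_back();
            return index;
        }
        if (next_unused_ < block_count_) {
            return next_unused_++;
        }
        return kNoBlock;
    }

    void release(std::size_t index) { free_list_.push_back(index); }

private:
    std::size_t block_count_;
    std::size_t next_unused_ = 0;
    std::vector<std::size_t> free_list_;
};

// Runs one fixed script of allocations and frees and reports the indices
// handed out, in order.
std::vector<std::size_t> run_script(BlockPool& pool) {
    std::size_t a = pool.allocate();
    std::size_t b = pool.allocate();
    std::size_t c = pool.allocate();
    pool.release(b);
    pool.release(a);
    std::size_t d = pool.allocate();  // reuses the block a held
    std::size_t e = pool.allocate();  // reuses the block b held
    return {a, b, c, d, e};
}

// A "live" pool and a "replayed" pool hand out identical blocks.
void allocator_example() {
    BlockPool live(8);
    BlockPool replayed(8);
    assert(run_script(live) == run_script(replayed));
}
```

The sketch also shows why threads are a problem for this scheme: if two threads raced to call `allocate()`, the order of calls, and therefore the blocks handed out, would vary from run to run.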
|
||||
|
||||
## Should We Ditch Replay Determinism?
|
||||
|
||||
Implementing synchronous model determinism is necessary
|
||||
for predictive reexecution. It is non-negotiable.
|
||||
|
||||
On the other hand, replay log determinism is not necessarily
|
||||
required for us to have a usable engine. We could ditch it.
|
||||
It certainly does impose a lot of difficult constraints on
|
||||
the engine.
|
||||
|
||||
The driver/driven distinction certainly required us to tie
|
||||
ourselves into knots in some parts of the engine design.
|
||||
But that's pretty baked in at this point, and we're probably
|
||||
never going to change that.
|
||||
|
||||
However, replay determinism also imposes a no-threads requirement. That
|
||||
is certainly a bummer from a performance perspective.
|
||||
|
||||
## Lua Scripters Don't Need to Worry
|
||||
|
||||
The Lua environment is carefully sandboxed to be deterministic at both levels without any effort from the scripter. Lua's random number generators are seeded pseudorandom generators owned by the driven portion. Table iteration is patched to be deterministic. Lua "threads" (coroutines) are not real OS threads and don't run concurrently. The scripter writes ordinary Lua code and gets determinism for free.
|
||||
The Lua environment is carefully sandboxed to be
|
||||
deterministic at both levels without any effort from the
|
||||
scripter. Lua's random number generators are seeded
|
||||
pseudorandom generators owned by the driven portion. Table
|
||||
iteration is patched to be deterministic. Lua "threads"
|
||||
(coroutines) are not real OS threads and don't run
|
||||
concurrently. The scripter writes ordinary Lua code and gets
|
||||
determinism for free.
|
||||
|
||||