### About Determinism

Luprex uses two different kinds of determinism.

**Synchronous Model Determinism.** Predictive reexecution
uses four world models, including a server-synchronous model
and a client-synchronous model. These two models are fed the
same events, and must remain in the same state after
executing the same events. See the document "Predictive
Reexecution" for an explanation of why these models exist.
If you were to compare the two models, they would be equal
in the Lisp sense of `equal`, but not in the sense of `eq`,
because corresponding data structures are not at the same
memory address. The rest of this document calls this
value-level determinism.

**Replay Log Determinism.** The server stores a log of all
events it feeds into the Luprex DLL. It can replay a log by
feeding the same events into a new copy of the Luprex DLL.
When replaying a log, the new copy of the Luprex DLL
reproduces the original execution right down to the memory
level: every data structure is at the same address, every
byte of memory is the same. This is the `eq` level of
equivalence. The rest of this document also calls it
bit-exact determinism.

These two forms of determinism serve different purposes and
impose different costs.

## Implementing Synchronous Model Determinism

To get the two synchronous models to be deterministic
enough, we had to take several steps:

- **Deterministic Lua table iteration.** We patch the Lua
  runtime so that iterating over a table always produces
  keys in the same order. The order depends only on the
  order in which the keys were inserted, not on the memory
  layout.
- **No iterating over C++ unordered maps.** Unordered maps
  produce elements in an unspecified order that can depend
  on memory addresses. Since addresses differ between the
  two models, iteration order would differ, breaking
  value-level determinism. An exception: iterating an
  unordered map and then immediately sorting the results
  into a predictable order is allowed, because the
  unpredictable order never escapes the sort.
- **No genuinely random numbers.** We do not use true random
  numbers in the world model. We do use pseudorandom
  numbers, but we store the generator's state as part of the
  world model and maintain it using difference transmission.

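The unordered-map exception above can be sketched as follows. This is a minimal illustration, not engine code: `sorted_entries` and `demo_sorted_iteration` are hypothetical names, and the map type is chosen only for the example. The idea is that callers only ever see the sorted copy, so the map's own iteration order cannot leak into the world model.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: copy the entries out of an unordered map and sort
// them by key, so callers never observe the map's own iteration order.
std::vector<std::pair<std::string, int>>
sorted_entries(const std::unordered_map<std::string, int>& m) {
    std::vector<std::pair<std::string, int>> out(m.begin(), m.end());
    std::sort(out.begin(), out.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    return out;
}

// Two maps with the same contents but different insertion order and bucket
// counts may iterate differently, yet their sorted views are identical.
void demo_sorted_iteration() {
    std::unordered_map<std::string, int> first, second;
    first["sword"] = 1;
    first["shield"] = 2;
    first["potion"] = 3;
    second.reserve(64);  // force a different bucket layout
    second["potion"] = 3;
    second["shield"] = 2;
    second["sword"] = 1;
    assert(sorted_entries(first) == sorted_entries(second));
}
```

Sorting by key works here because keys in a map are unique, so the sorted order is fully determined by the contents.
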
## Bit-Exact Determinism: Replay Debugging

Bit-exact determinism enables replay debugging. It is
valuable but expensive, and its cost-benefit tradeoff is an
open question.

As the server runs, the driver can write a log of every
event it feeds into the driven portion. Later, a new
DrivenEngine can be created and fed those same events from
the log file. The goal of bit-exact determinism is that
during this replay, the DrivenEngine does the *exact* same
thing it did during the live run, right down to every data
structure being at the same memory address.

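The log-and-replay flow can be sketched in a few lines. Everything here is a toy under stated assumptions: the `Event` layout, the `Driver` class, and the body of `DrivenEngine::feed` are invented for illustration, and only the name `DrivenEngine` comes from this document. The sketch also only checks that the replayed state matches by value; it does not demonstrate the same-address property.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical event type: everything the driven portion learns about the
// outside world arrives as one of these.
struct Event {
    enum class Kind { Tick, Input } kind;
    std::int64_t payload;  // timestamp for Tick, input code for Input
};

// Toy stand-in for the driven portion: no I/O, no clocks, no threads.
class DrivenEngine {
public:
    void feed(const Event& e) {
        if (e.kind == Event::Kind::Tick) {
            now_ = e.payload;  // time is only known through events
        } else {
            state_ = state_ * 31 + static_cast<std::uint64_t>(e.payload)
                   + static_cast<std::uint64_t>(now_);
        }
    }
    std::uint64_t state() const { return state_; }
private:
    std::int64_t now_ = 0;
    std::uint64_t state_ = 0;
};

// Toy driver: records each event before feeding it to the engine.
class Driver {
public:
    void deliver(DrivenEngine& engine, const Event& e) {
        log_.push_back(e);
        engine.feed(e);
    }
    const std::vector<Event>& log() const { return log_; }
private:
    std::vector<Event> log_;
};

// Replay: feed a recorded log into a fresh engine.
DrivenEngine replay(const std::vector<Event>& log) {
    DrivenEngine fresh;
    for (const Event& e : log) fresh.feed(e);
    return fresh;
}

// Usage: run "live", then replay the log and compare the results.
void demo_replay() {
    DrivenEngine live;
    Driver driver;
    driver.deliver(live, Event{Event::Kind::Tick, 1000});
    driver.deliver(live, Event{Event::Kind::Input, 7});
    driver.deliver(live, Event{Event::Kind::Tick, 1016});
    driver.deliver(live, Event{Event::Kind::Input, 9});
    assert(replay(driver.log()).state() == live.state());
}
```
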
Why does this matter? If the server crashed during the live
run, the replay will crash in exactly the same way. You can
run the replay inside a debugger, single-step right up to
the crash, and examine the exact same pointers and memory
layout that existed during the original crash.

Value-level determinism alone is not sufficient for this. If
the replay produces the same logical state but at different
memory addresses, then pointer-related bugs (buffer
overruns, use-after-free, etc.) might not reproduce.
Bit-exact determinism ensures they do.

To implement replay determinism, we took several
difficult steps:

- **The Driver/Driven Partition.** The Luprex engine is
  split into an event-driven portion and an event driver.
  The driven portion contains all the game logic. The driver
  is mainly for I/O. The driven portion cannot contain any
  I/O. That restriction includes:

  - **Clocks only in the Driver.** The driven portion cannot
    call system functions to obtain the current time.
    However, the driver can feed the current time into the
    driven portion as an event.
  - **Lua Source files only in the Driver.** The driven
    portion cannot read Lua source files. It can, however,
    enter a state that indicates to the driver that it
    wants a Lua source file. Then the driver can feed
    the Lua source file in as an event.
  - **Sockets only in the Driver.** The driven portion
    cannot open TCP/IP sockets. However, it can enter
    a state that indicates its desire to make a TCP/IP
    connection, and then the driver can make the connection
    and feed the data into the driven portion.

- **The eng::malloc heap.** This is a custom memory
  allocator positioned at a fixed address, used exclusively
  by the driven portion. If asked to perform the same
  sequence of malloc/free operations, the allocator
  will return the same addresses.

- **No threads in the driven portion.** Thread scheduling is
  nondeterministic at the OS level, so we cannot use threads
  in the driven portion.

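To make the allocator property concrete, here is a much-simplified sketch. It is not the real `eng::malloc`: `DeterministicPool` and `run_sequence` are hypothetical, the pool only hands out fixed-size blocks, and its free list lives outside the region. What it does show is the key idea that the block returned depends only on the history of allocate/release calls. If the region were mapped at the same fixed address on every run, equal offsets would mean equal addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a pool of fixed-size blocks carved out of one region.
// Block choice depends only on the sequence of calls made so far.
class DeterministicPool {
public:
    DeterministicPool(unsigned char* base, std::size_t block_size,
                      std::size_t block_count)
        : base_(base), block_size_(block_size), block_count_(block_count) {}

    // Returns a block, or nullptr when the pool is exhausted.
    void* allocate() {
        if (!free_list_.empty()) {  // reuse the most recently released block
            std::size_t index = free_list_.back();
            free_list_.pop_back();
            return base_ + index * block_size_;
        }
        if (next_unused_ == block_count_) return nullptr;
        return base_ + (next_unused_++) * block_size_;
    }

    void release(void* p) {
        std::size_t offset = offset_of(p);
        assert(offset % block_size_ == 0);
        assert(offset / block_size_ < next_unused_);
        free_list_.push_back(offset / block_size_);
    }

    // Offset of a block from the start of the region.
    std::size_t offset_of(const void* p) const {
        return static_cast<std::size_t>(
            static_cast<const unsigned char*>(p) - base_);
    }

private:
    unsigned char* base_;
    std::size_t block_size_;
    std::size_t block_count_;
    std::size_t next_unused_ = 0;
    std::vector<std::size_t> free_list_;  // kept outside the region here
};

// Run one fixed sequence of operations and record the offsets handed out.
std::vector<std::size_t> run_sequence(DeterministicPool& pool) {
    std::vector<std::size_t> offsets;
    void* a = pool.allocate();
    void* b = pool.allocate();
    offsets.push_back(pool.offset_of(a));
    offsets.push_back(pool.offset_of(b));
    pool.release(a);
    void* c = pool.allocate();  // reuses the block that a occupied
    offsets.push_back(pool.offset_of(c));
    return offsets;
}
```

Running `run_sequence` against two pools over two different regions yields the same offsets, which is the value a replay would rely on.
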
## Should we Ditch Replay Determinism?

Implementing synchronous model determinism is necessary
for predictive reexecution. It is non-negotiable.

On the other hand, replay log determinism is not necessarily
required for us to have a usable engine. We could ditch it.
It certainly does impose a lot of difficult constraints on
the engine.

The driver/driven distinction certainly required us to tie
ourselves into knots in some parts of the engine design.
But that's pretty baked in at this point; we're probably
never going to change it.

However, replay determinism also imposes a no-threads
requirement. That is certainly a bummer from a performance
perspective.

## Lua Scripters Don't Need to Worry

The Lua environment is carefully sandboxed to be
deterministic at both levels without any effort from the
scripter. Lua's random number generators are seeded
pseudorandom generators owned by the driven portion. Table
iteration is patched to be deterministic. Lua "threads"
(coroutines) are not real OS threads and don't run
concurrently. The scripter writes ordinary Lua code and gets
determinism for free.

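The point about seeded pseudorandom generators can be illustrated with a small sketch in which the generator state is just another field of the model. The `WorldModel` struct and `next_random` function are hypothetical, and xorshift64 is used only because it is short; this document does not say which generator the engine actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical world model fragment: the generator state is ordinary data,
// so it is copied and compared along with the rest of the model.
struct WorldModel {
    std::uint64_t rng_state = 0x9E3779B97F4A7C15ull;  // arbitrary nonzero seed
};

// One xorshift64 step; advances the state stored in the model.
std::uint64_t next_random(WorldModel& model) {
    assert(model.rng_state != 0);  // xorshift cannot leave the zero state
    std::uint64_t x = model.rng_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    model.rng_state = x;
    return x;
}
```

Two models that start from the same state and draw the same number of values stay in step, and a copy of a model taken mid-stream continues with the same sequence as the original.
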