Most of the Luprex engine is 'event-driven'. The event-driven design makes certain things easier, and certain things harder. This document covers the consequences of this design decision.
## What does "Event-Driven" Mean?
To be clear about what I mean by "event-driven," think of a traditional Finite State Machine (FSM), such as the classic example of a coin-operated turnstile.
This turnstile FSM has two *events* that it can process: a person can insert a coin, and a person can push through the turnstile.
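The turnstile can be written down directly as a transition function. The sketch below is the textbook two-state version; the state names Locked and Unlocked are assumed here, and none of this is Luprex code. The point is that the only way anything happens is that an event goes in and a new state comes out.

```cpp
#include <cassert>

// Illustrative sketch of the classic two-state turnstile (not Luprex code).
enum class TurnstileState { Locked, Unlocked };
enum class TurnstileEvent { Coin, Push };

// The whole FSM is this transition function: (state, event) -> next state.
TurnstileState step(TurnstileState s, TurnstileEvent e) {
    switch (s) {
    case TurnstileState::Locked:
        // A coin unlocks it; pushing a locked turnstile changes nothing.
        return e == TurnstileEvent::Coin ? TurnstileState::Unlocked : TurnstileState::Locked;
    case TurnstileState::Unlocked:
        // Pushing through locks it again; an extra coin changes nothing.
        return e == TurnstileEvent::Push ? TurnstileState::Locked : TurnstileState::Unlocked;
    }
    return s;  // not reached for valid enum values
}
```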
The important thing for our purposes is how traditional FSMs deal with I/O. In a traditional FSM, every bit of I/O has to be expressed as an *event*. The event is fed into the FSM, and it drives the FSM forward to the next state.
When writing GUI software for the Win32 API, you have to write a 'WinMain' function, which typically contains a loop that calls GetMessage and then processes each message. "Message" is their word for "event." So at least partially, Windows programs are event-driven. However, in Windows, it is possible to do some I/O in a non-event-driven manner. For example, you can just call a function such as ReadFile to read a file. That's I/O, and it's not event-driven. So the Windows API is only partially event-driven.
Our system is *fully* event-driven. That means that all I/O, no matter where or what kind, is expressed as an event.
## The Driver and the Driven Portion
We have carefully separated the Luprex code into two halves: the *driver*, and the *driven portion*. The driven portion is the part of the system that is event-driven (naturally). The driver is the part that is responsible for generating events and feeding them into the driven portion.
The driven portion of the system is organized as a library of subroutines. The top-level API of the driven portion is class DrivenEngine. The driver is responsible for constructing the object of class DrivenEngine. Then, the driver feeds events into the driven portion by calling various event methods in class DrivenEngine. Finally, when there's nothing left to do, the driver calls DrivenEngine's destructor. The driven portion of the system is not allowed to call functions in the driver. The driven portion is also not allowed to do I/O. All I/O is done in the driver, and it's always the driver calling into the driven portion.
Here's an example of how input typically works in this system: if a TCP socket is open, and bytes have arrived, then the driver reads the bytes. Then, the driver calls DrivenEngine::drv_recv_incoming to pass the bytes into the driven portion: this is an input event.
So how does the driven portion *send* bytes to a TCP socket? The driven portion isn't allowed to do any I/O, and it's not allowed to call into the driver. But what it can do is leave the bytes in a buffer. The driver polls the buffer, by calling DrivenEngine::drv_get_outgoing, and it sees that the bytes are there. By the way, DrivenEngine::drv_get_outgoing doesn't count as an event, because it never changes the state of the driven portion. The driver then sends the bytes to the TCP socket. Finally, the driver calls DrivenEngine::drv_sent_outgoing, to notify the driven portion that it should remove the bytes from the buffer. That *does* count as an event, because it does modify the state of the driven portion. In this way, the driven portion can do output.
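Here is a minimal sketch of that input/output handshake. Only the method names drv_recv_incoming, drv_get_outgoing, and drv_sent_outgoing come from Luprex; their signatures, the single-buffer ToyDrivenEngine, the FakeSocket, and driver_tick are hypothetical stand-ins, and the toy engine simply echoes what it receives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for DrivenEngine with a single outgoing buffer.
class ToyDrivenEngine {
public:
    // Event: bytes arrived from the network. This toy engine echoes them back.
    void drv_recv_incoming(std::string_view bytes) {
        outgoing_ += "echo:";
        outgoing_.append(bytes);
    }
    // Not an event: a read-only look at the bytes waiting to be sent.
    std::string_view drv_get_outgoing() const { return outgoing_; }
    // Event: the driver reports that the first n bytes went out, so drop them.
    void drv_sent_outgoing(std::size_t n) { outgoing_.erase(0, n); }

private:
    std::string outgoing_;
};

// Hypothetical socket owned by the driver; it accepts at most 4 bytes per send.
struct FakeSocket {
    std::string inbox;  // bytes waiting to be read from the network
    std::string wire;   // bytes that have been sent to the network
    std::size_t max_send = 4;

    std::size_t send(std::string_view data) {
        std::size_t n = data.size() < max_send ? data.size() : max_send;
        wire.append(data.substr(0, n));
        return n;
    }
};

// One pass of a driver loop: do the real I/O, then report it to the engine.
void driver_tick(ToyDrivenEngine& engine, FakeSocket& sock) {
    if (!sock.inbox.empty()) {
        engine.drv_recv_incoming(sock.inbox);  // input event
        sock.inbox.clear();
    }
    std::string_view pending = engine.drv_get_outgoing();  // poll only
    if (!pending.empty()) {
        std::size_t sent = sock.send(pending);  // copies bytes before the buffer changes
        engine.drv_sent_outgoing(sent);         // output-completed event
    }
}
```

The FakeSocket deliberately accepts only four bytes per send, which shows why the driver reports how many bytes actually went out: drv_sent_outgoing removes exactly those bytes, and the rest wait for the next tick.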
If the driven portion wants to *open* a TCP connection, how does it do that? Again, it can't do I/O, and it can't call into the driver. So instead, here's what it does. It creates a "channel" data structure representing the new communication channel. This channel lives within the driven portion. The "channel" data structure has buffers to store incoming and outgoing bytes, but it doesn't have any actual socket in it: remember, the driven portion is not allowed to use raw I/O primitives. The driver polls for the existence of this channel, using DrivenEngine::drv_get_new_outgoing. Like all the other drv_get methods that don't change the state of the driven portion, this is not an event. When the driver sees the new channel, it opens the TCP socket. TCP sockets live in the driver, whereas channels and their buffers live in the driven portion. The driver maintains a one-to-one mapping between the two.
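The channel handshake can be sketched the same way. Channel, ToyEngine, open_channel, and ToyDriver are hypothetical, and so is the detail that drv_get_new_outgoing reports every channel the engine wants while the driver skips the ones it already maps. The text doesn't say how the real engine decides which channels are "new"; letting the driver-side map hold that knowledge keeps the poll free of side effects.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical channel record in the driven portion: buffers, but no socket.
struct Channel {
    std::string host;      // where the engine wants to connect (assumed field)
    std::string incoming;  // bytes received, not yet consumed
    std::string outgoing;  // bytes waiting to be sent
};

// Hypothetical engine that owns the channels.
class ToyEngine {
public:
    // Driven-portion code (e.g. game logic) asks for a new connection.
    int open_channel(const std::string& host) {
        int id = next_id_++;
        channels_[id] = Channel{host, {}, {}};
        return id;
    }
    // Poll, not an event: report every outgoing channel (assumed semantics).
    std::vector<int> drv_get_new_outgoing() const {
        std::vector<int> ids;
        for (const auto& entry : channels_) ids.push_back(entry.first);
        return ids;
    }
    const Channel& channel(int id) const { return channels_.at(id); }

private:
    int next_id_ = 1;
    std::map<int, Channel> channels_;  // ordered, so iteration is deterministic
};

// Hypothetical driver: keeps the one-to-one mapping from channel to socket.
struct ToyDriver {
    std::map<int, int> socket_for_channel;  // channel id -> fake socket handle
    int next_handle = 100;

    void poll_new_channels(const ToyEngine& engine) {
        for (int id : engine.drv_get_new_outgoing()) {
            if (socket_for_channel.count(id) == 0) {
                // A real driver would open a TCP socket to engine.channel(id).host here.
                socket_for_channel[id] = next_handle++;
            }
        }
    }
};
```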
File I/O is the most awkward thing to do with an event-driven design. Currently, the only file I/O needed by Luprex is to read the Lua source code. So the driven portion simply sets a flag indicating that it needs an updated copy of the Lua source. The driver polls the flag, and if it is set, the driver reads the Lua source code and feeds it into the driven portion using DrivenEngine::drv_invoke_lua_source. It works, but it's awkward for the driven portion of the engine. In all likelihood, this will eventually be replaced by a more sophisticated interface.
The driver is a very small piece of code. Its only function is to do socket I/O, file I/O, and a few other kinds of I/O. All the complexity of the game engine is inside the driven portion.
## Different Drivers for Different Operating Systems
Currently, we have two versions of the driver: one that runs from the command line using raw operating system primitives to do I/O, and one that runs within the Unreal Engine using Unreal primitives to do I/O. The command-line version uses ifdefs in order to support both Windows and Linux.
The driven portion, on the other hand, is entirely operating-system independent. There is only one version of the driven portion. The driven portion doesn't need any ifdefs for operating systems because there is no I/O code in the driven portion. It's pure standards-compliant C++.
To give a sense of scale, the command-line version of the driver is only about a thousand lines of code for the whole thing, including both the Windows and Linux versions of certain subroutines.
The event-driven design has the effect of concentrating all the operating-system-dependent code into one place, the driver.
## How Event-Driven Design is Useful for Debugging
When our server runs, it can maintain a log of every *event* that the driver feeds into the driven portion. Later, it is possible to replay the log: during replay, the driver creates a DrivenEngine. Then, it feeds that DrivenEngine the exact same events, in the exact same order as when the game was live. If we designed things correctly, then the driven portion will perform the exact same computations, in the exact same way as when the game was running live.
The advantage of this is that if the server crashes, you should be able to replay the crash a second time, this time inside a debugger. You can single-step the code right up to the point of the crash, examining variables and data structures using the debugger, as many times as necessary to figure out the cause of the crash.
In order for this to work, the driven portion needs to be *deterministic*. That means that if I create two identical DrivenEngines, and call the same event methods in the same order on both of them, they must both perform the same exact computations, and must both end up in the exact same state. Ideally, this should be the case even if the two DrivenEngines are running on different machines!
The replay code lives in the driver. The driver knows it is running in replay mode. In replay mode, the driver is generating events from a logfile, not from TCP sockets. Therefore, the driver is *not* in the exact same state as when the system was running live. However, the driven portion has no idea that the system is in replay mode. As far as the driven portion knows, when the driver calls DrivenEngine::drv_recv_incoming, it's because a TCP socket received some bytes. It has no idea that the event is actually coming from a logfile.
So the driven portion needs to be deterministic: it must run exactly the same way during recording and during replay. The driver, on the other hand, does not need to be deterministic, since what it's doing at recording time (receiving bytes from TCP sockets) and what it's doing at replay time (generating events from a logfile) are completely different.
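A toy version of record and replay: the driver appends every event to a log before delivering it, and a replay feeds the same log into a brand-new engine. LoggedEvent, ToyEngine, deliver_live, and replay are hypothetical names, and the real log format isn't described here. Because ToyEngine's state depends only on the events it receives, the replayed engine ends up in the same state as the live one.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical log record; the real Luprex log format is not described here.
struct LoggedEvent {
    int channel;
    std::string bytes;
};

// Deterministic toy engine: its state depends only on the events it receives.
class ToyEngine {
public:
    void drv_recv_incoming(int channel, const std::string& bytes) {
        state_ ^= static_cast<std::uint64_t>(channel);
        for (unsigned char c : bytes) {
            state_ = (state_ ^ c) * 1099511628211ULL;  // FNV-1a style mixing
        }
        ++event_count_;
    }
    std::uint64_t state() const { return state_; }
    int event_count() const { return event_count_; }

private:
    std::uint64_t state_ = 14695981039346656037ULL;
    int event_count_ = 0;
};

// Live mode: write the event to the log, then deliver it.
void deliver_live(ToyEngine& engine, std::vector<LoggedEvent>& event_log,
                  int channel, const std::string& bytes) {
    event_log.push_back({channel, bytes});
    engine.drv_recv_incoming(channel, bytes);
}

// Replay mode: a fresh engine receives the same events in the same order.
ToyEngine replay(const std::vector<LoggedEvent>& event_log) {
    ToyEngine engine;
    for (const LoggedEvent& e : event_log) engine.drv_recv_incoming(e.channel, e.bytes);
    return engine;
}
```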
## The Driven Portion is a DLL, the Driver is an EXE
The driven portion of the engine lives in a DLL. The driver is an EXE file. The driver loads the DLL. When running inside Unreal Engine, the driver and Unreal Engine are integrated, and the EXE contains both. The driven portion is the exact same DLL regardless of whether it's loaded by an Unreal Engine driver or a command-line driver.
Putting the driver in an EXE and the driven portion into a DLL helps enforce the rule that the driven portion is not supposed to call into the driver. It literally cannot, because the driver is not in the DLL.
The driven portion DLL is built using the Luprex build system. The command-line driver is also built using the Luprex build system. However, the driver that is integrated into the Unreal Engine is built using the Unreal build system.
Since the two build systems might use different C++ compilers, it is theoretically possible for the C++ in the DLL to not be entirely compatible with the C++ in the EXE. For example, the two compilers might not lay out virtual tables the same way, and they might not mangle names the same way. To avoid problems, we have given the DLL a pure C interface: on a given platform, the C calling convention and struct layout are fixed by the platform ABI, so a C interface is generally compatible across compilers.
The pure C interface of the DLL is implemented by *struct EngineWrapper*. Notice that it's a struct, not a class, because it's pure C, not C++. EngineWrapper is a very thin wrapper around the methods of class DrivenEngine. For example, DrivenEngine contains a method DrivenEngine::drv_recv_incoming. Meanwhile, struct EngineWrapper contains EngineWrapper::play_recv_incoming, which is a C function that passes its arguments to the C++ function DrivenEngine::drv_recv_incoming.
The wrappers in struct EngineWrapper, in addition to simply translating C calling conventions to C++ calling conventions, are also responsible for the logging code. For example, when the driver calls EngineWrapper::play_recv_incoming, the EngineWrapper checks if it is in logging mode. If so, it writes the event to the logfile before it calls DrivenEngine::drv_recv_incoming.
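A sketch of the log-then-forward pattern, under some assumptions: the real EngineWrapper fields and log format aren't shown in this document, so the void* fields, the one-line text log format, and the function name wrap_play_recv_incoming are all hypothetical. In the actual struct this function would be reached through a function pointer; here it is called directly to keep the focus on logging.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Toy stand-in for the C++ class DrivenEngine.
class DrivenEngine {
public:
    void drv_recv_incoming(int channel, const char* data, int len) {
        (void)channel;  // unused in this toy
        received.append(data, static_cast<std::size_t>(len));
    }
    std::string received;
};

// Plain C struct. Field layout is hypothetical: opaque pointers only, so a
// C compiler could read the same definition.
struct EngineWrapper {
    void* engine;  // really a DrivenEngine*
    void* log;     // really a std::ostream*, or null when not logging
};

// C-linkage wrapper (hypothetical name): log the event first, then forward it.
extern "C" void wrap_play_recv_incoming(EngineWrapper* w, int channel,
                                        const char* data, int len) {
    if (w->log != nullptr) {
        std::ostream& out = *static_cast<std::ostream*>(w->log);
        out << "recv " << channel << ' ' << len << ' ';
        out.write(data, len);
        out << '\n';
    }
    static_cast<DrivenEngine*>(w->engine)->drv_recv_incoming(channel, data, len);
}
```

Whoever fills in `log` must store a `std::ostream*` (not, say, a `std::ostringstream*`), because the wrapper casts the `void*` back to exactly that type.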
Struct EngineWrapper is considered part of the driver, not the driven portion. That's the case for two reasons. First, struct EngineWrapper passes events into class DrivenEngine. That's the driver's job, passing events into the DrivenEngine. Second, EngineWrapper does logging. That's I/O, so it must be part of the driver.
Struct EngineWrapper is a violation of the rule that the driver lives in the EXE, and the driven portion lives in the DLL. EngineWrapper is a piece of the driver, but it lives in the DLL. This is the only violation of the separation rules.
EngineWrapper does file I/O for logging. Fortunately, the C++ standard library provides operating-system independent file I/O primitives. Struct EngineWrapper does violate the rule that the driver code lives entirely in the EXE, but it doesn't violate the rule that the DLL is operating-system independent.
## Struct EngineWrapper Looks a Lot Like It Has Methods
In our driver code, you'll see a lot of code that looks like it calls methods in struct EngineWrapper. Here's a typical excerpt from the driver:
It looks like a method invocation, but struct EngineWrapper is supposed to be pure C. There are no 'methods' in pure C. What is going on here?
The trick is that it's not actually a C++ method invocation at all. It's a pure C function call, using a pure C function pointer. Look at this excerpt from struct EngineWrapper:
Putting a pointer to a C function into struct EngineWrapper gives the illusion of a "method" that you can call. It's an illusion that we deliberately encourage. We want you to be able to pretend that EngineWrapper is a class, full of methods. That makes working with EngineWrapper more intuitive. EngineWrapper contains a long list of function pointers.
However, there is one important way that C function calls differ from true C++ methods: when you call a C++ method, there is an implicit parameter called *this*. In other words, the address of the object automatically gets passed into the method. In pure C, that's not the case. Because of this, all the pseudo-methods in struct EngineWrapper require you to manually pass the wrapper itself. As you can see in both of the snippets above, the first parameter to play_recv_incoming is a pointer to the EngineWrapper.
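The pseudo-method trick, reduced to a sketch: a plain struct holds an opaque engine pointer plus a C function pointer, and every call passes the wrapper itself as the first argument. Only the name play_recv_incoming comes from the text; PlayRecvIncomingFn, impl_play_recv_incoming, make_wrapper, and the argument list are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy stand-in for DrivenEngine.
class DrivenEngine {
public:
    std::string received;
};

struct EngineWrapper;

// C function-pointer type for the pseudo-method (name and signature assumed).
extern "C" typedef void (*PlayRecvIncomingFn)(EngineWrapper* self, const char* data, int len);

// Plain C struct: opaque data plus function pointers that act like methods.
struct EngineWrapper {
    void* engine;                           // really a DrivenEngine*
    PlayRecvIncomingFn play_recv_incoming;  // pseudo-method
};

// The target of the pointer. C has no implicit 'this', so 'self' is explicit.
extern "C" void impl_play_recv_incoming(EngineWrapper* self, const char* data, int len) {
    auto* engine = static_cast<DrivenEngine*>(self->engine);
    engine->received.append(data, static_cast<std::size_t>(len));
}

// Hypothetical factory: fill in the data pointer and the "method table".
EngineWrapper make_wrapper(DrivenEngine* engine) {
    return EngineWrapper{engine, &impl_play_recv_incoming};
}
```

At a call site, `wp->play_recv_incoming(wp, "hi", 2)` reads like a method call, but the arrow is just a field access that fetches a function pointer, and `wp` has to be passed a second time as the explicit `self`.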
## Separation of the C++ Code
In the Luprex codebase, the driven portion lives in the directory "cpp". The command-line driver lives in the directory "drv". We intentionally keep them separate. The Unreal Engine driver lives in a totally separate git repository, with our Unreal code.
The driven portion does not include any header files from the driver portion. The driver portion only includes one header file from the driven portion: "enginewrapper.hpp".
This header file, enginewrapper.hpp, only contains the definition of struct EngineWrapper. This header very deliberately does not include any other header file, not even standard headers like <stdio.h>, because we don't want to create any operating-system dependencies. It's just the definition of struct EngineWrapper, and nothing else.
Therefore, when the driver includes enginewrapper.hpp, it's only getting the definition of struct EngineWrapper, and absolutely nothing else. The driver that is integrated with Unreal Engine also includes "enginewrapper.hpp", and it also does not include anything else from the driven portion.
## The Eng-Malloc Heap
It sure would be nice if the malloc heap were in *exactly* the same state during a replay as during logging. By the "exact same state," I mean that every malloc'ed data structure is at exactly the same memory address during replay as during logging. If that were the case, then any buffer overrun or memory overrun in the logging phase would be duplicated exactly during the replay.
Unfortunately, that's not going to happen. The malloc heap is used by Unreal Engine, the C++ standard library, and the driver, none of which are designed to be deterministic. Like it or not, the addresses of malloc'ed data structures are not going to be the same during replay as during logging.
Therefore, we have created a new memory allocator specifically for the use of the driven portion. This memory allocator is called "eng::malloc". It is based on Doug Lea's malloc, which is a very fast and well-trusted implementation of malloc. We have arranged that the eng::malloc heap is always positioned at the exact same memory address. And, since it is used exclusively by the driven portion, the sequence of 'malloc' and 'free' operations should be exactly the same during replay as during logging. Therefore, anything allocated by eng::malloc should be at deterministic addresses.
If a C++ class derives from eng::opnew, then that C++ class will inherit an *operator new* and an *operator delete*. This will make any 'new' operation on that class go to the eng::malloc heap.
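A sketch of how a base class like eng::opnew can redirect `new` and `delete`. To keep it self-contained, a bump allocator over one static array (toy::arena_malloc) stands in for eng::malloc, which is really based on dlmalloc; the toy version never reuses freed memory. toy::opnew and Monster are illustrative names, not the Luprex source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

namespace toy {

// Stand-in for the eng::malloc heap: one fixed array plus a bump pointer.
alignas(std::max_align_t) unsigned char arena[1 << 16];
std::size_t arena_used = 0;

void* arena_malloc(std::size_t n) {
    constexpr std::size_t align = alignof(std::max_align_t);
    std::size_t start = (arena_used + align - 1) / align * align;  // round up
    if (start > sizeof(arena) || n > sizeof(arena) - start) throw std::bad_alloc();
    arena_used = start + n;
    return arena + start;
}

void arena_free(void*) noexcept {}  // a bump allocator never reuses memory

bool in_arena(const void* p) {
    auto base = reinterpret_cast<std::uintptr_t>(arena);
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= base && addr < base + sizeof(arena);
}

// Analogue of eng::opnew: derived classes inherit these class-specific
// operators, so 'new' and 'delete' on them use the arena.
struct opnew {
    static void* operator new(std::size_t n) { return arena_malloc(n); }
    static void operator delete(void* p) noexcept { arena_free(p); }
};

}  // namespace toy

// Example driven-portion class.
struct Monster : toy::opnew {
    int hp = 10;
};
```

A class-specific operator new only covers single objects: `new Monster[3]` would still go to the global operator new[] unless the base class also defines operator new[] and operator delete[].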
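The STL function std::make_shared allocates both the data structure and the reference counts in the malloc heap. Overriding operator new doesn't change this behavior, unfortunately, because std::make_shared allocates the object and its reference counts as a single block through the global allocator, never calling the class's own operator new. Therefore, we must provide eng::make_shared, which allocates the data structure and the reference counts in the eng::malloc heap.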
By default, STL classes like *std::map*, *std::vector*, and so forth use the malloc heap to allocate their data. However, using template parameters, it is possible to force these classes to use alternate heaps. We have done exactly this. We have created thin wrappers around these classes, called *eng::map*, *eng::vector* and so forth that allocate their data in the eng::malloc heap. When coding for the driven portion, please use these classes instead of the std ones. These wrappers live in the header files "wrap-map.hpp", "wrap-set.hpp", and so forth.
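A sketch of both techniques from the last two paragraphs, again with a static bump arena standing in for the eng::malloc heap. toy::allocator is a minimal C++17 allocator; toy::vector and toy::map show the alias-template approach; and toy::make_shared uses the standard std::allocate_shared, which places the object and its reference counts in memory obtained from the given allocator. The toy names are hypothetical, and the real eng::vector and friends may be written differently (for example, as thin wrapper classes rather than aliases).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <new>
#include <utility>
#include <vector>

namespace toy {

// Stand-in for the eng::malloc heap (bump allocator over a fixed array).
alignas(std::max_align_t) unsigned char arena[1 << 16];
std::size_t arena_used = 0;

void* arena_malloc(std::size_t n) {
    constexpr std::size_t align = alignof(std::max_align_t);
    std::size_t start = (arena_used + align - 1) / align * align;
    if (start > sizeof(arena) || n > sizeof(arena) - start) throw std::bad_alloc();
    arena_used = start + n;
    return arena + start;
}

bool in_arena(const void* p) {
    auto base = reinterpret_cast<std::uintptr_t>(arena);
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= base && addr < base + sizeof(arena);
}

// Minimal C++17 allocator drawing from the arena (no over-aligned types, no reuse).
template <class T>
struct allocator {
    using value_type = T;
    allocator() noexcept = default;
    template <class U>
    allocator(const allocator<U>&) noexcept {}
    T* allocate(std::size_t n) { return static_cast<T*>(arena_malloc(n * sizeof(T))); }
    void deallocate(T*, std::size_t) noexcept {}
};
template <class T, class U>
bool operator==(const allocator<T>&, const allocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const allocator<T>&, const allocator<U>&) noexcept { return false; }

// Analogues of eng::vector and eng::map.
template <class T>
using vector = std::vector<T, allocator<T>>;
template <class K, class V, class Cmp = std::less<K>>
using map = std::map<K, V, Cmp, allocator<std::pair<const K, V>>>;

// Analogue of eng::make_shared: allocate_shared puts the object and its
// reference counts in one block obtained from the given allocator.
template <class T, class... Args>
std::shared_ptr<T> make_shared(Args&&... args) {
    return std::allocate_shared<T>(allocator<T>(), std::forward<Args>(args)...);
}

}  // namespace toy
```

The allocator ignores over-aligned types and never reuses memory; a real heap-backed allocator would have to handle both.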
Some STL classes, like *std::pair* and *std::string_view* are so simple that they don't ever allocate anything on any heap. We have not wrapped these classes. But that means if you say "new std::pair", it will end up in the malloc heap. Don't do that.
If you happen to run into an STL class or a third-party library that can't be configured to use the eng::malloc heap, that's not a disaster. It just means that *one* data structure won't be at a predictable address. As long as 99% of the data structures in the driven portion are at predictable addresses, it's okay if the occasional one isn't. Crashes are still very likely to be deterministic. Be careful, though, if you use third-party libraries. They may not be deterministic.
Driver code must not put anything into the eng::malloc heap, because that would inject nondeterminism into the eng::malloc heap. The driver should use plain old malloc. As long as the driver obeys the rules and doesn't include any header files other than "enginewrapper.hpp," then it won't have a declaration of eng::malloc, which helps keep things properly separate.
The eng::malloc heap maintains a running hash of all the addresses returned by eng::malloc. This hash value can be fetched using eng::memhash. As long as two eng::malloc heaps are in the exact same state, this hash should be exactly the same. The record and replay code uses eng::memhash as a check to verify the determinism of a replay.
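A sketch of a running address hash in the spirit of eng::memhash. The text doesn't say which hash function eng::malloc uses; FNV-1a over the bytes of each address is an assumption, and AddressHash is a hypothetical name. What matters is that the hash is order-sensitive, so two heaps agree only if they returned the same addresses in the same order.

```cpp
#include <cassert>
#include <cstdint>

// Order-sensitive running hash over allocation addresses (hypothetical name).
// FNV-1a is an assumption; the text doesn't specify eng::malloc's hash.
class AddressHash {
public:
    // Called once per allocation, e.g. record(reinterpret_cast<std::uintptr_t>(p)).
    void record(std::uint64_t address) {
        for (int i = 0; i < 8; ++i) {
            hash_ ^= (address >> (8 * i)) & 0xFFu;
            hash_ *= 1099511628211ULL;  // 64-bit FNV prime
        }
    }
    std::uint64_t value() const { return hash_; }

private:
    std::uint64_t hash_ = 14695981039346656037ULL;  // 64-bit FNV offset basis
};
```

Because the eng::malloc heap sits at a fixed address, hashing absolute addresses is enough; a heap that could move would need to hash offsets from its base instead.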
On Windows, none of this has been implemented yet. It is all stubbed out: on Windows, eng::malloc just calls regular malloc. Therefore, on Windows, determinism does not extend to the addresses of data structures. The driven portion should still run the same code in the same order, but memory overruns might not have predictable effects.
## Maintaining Determinism in the Driven Portion
Computers are naturally deterministic, but there are a few common sources of randomness that break the determinism. These must be avoided inside the driven portion:
*No genuinely random numbers*. It is legal to use pseudorandom numbers, on the condition that the pseudorandom state is privately owned by the driven portion and never touched by, say, Unreal. The pseudorandom state must be initialized from a seed that is the same during replay as during the original logging.
*No iterating over unordered maps*. Unordered maps produce their elements in what is effectively a random order. It's not technically random, but it depends on the hash function, the bucket count, and the insertion history. Those can differ between standard library implementations, and when the keys are pointers, the hashes depend on memory addresses, which may vary between runs. There is an exception to the rule: if you iterate over an unordered map, and then immediately sort the results into a predictable order, then it's OK. In this case, you have to very carefully sandbox the randomness, to ensure that the randomness doesn't influence anything else.
*No use of genuine real-time clocks*. The amount of time it takes to execute the exact same piece of code can vary randomly. Of course, reading a real-time clock is already ruled out by the prohibition on I/O in the driven portion, but it's worth mentioning separately. There are two exceptions. First, the driver occasionally passes the current time into the driven portion as an event. This is a real-time clock, but it's one that will have the exact same value during replay as during the original logging. Using that clock is fine anywhere. The other exception is for performance profiling. You can measure the amount of time it takes to execute a subroutine, as long as the "random" value (the amount of time) is printed and then immediately forgotten. Properly sandboxing the randomness can be tricky.
*No use of threads.* When two threads execute at the same time, the order in which various operations occur is effectively random. The rule against multithreading is probably the most problematic part of this whole determinism thing: it really is unfortunate from a performance perspective. However, you can bend this rule: threads can be allowed if you can somehow sandbox them so that they live in their own little worlds, apart from the "deterministic portion." But sandboxing is hard.
*You don't have to worry when writing Lua code.* It would seem like all of these things are potentially an issue when writing Lua code, but we've carefully sandboxed Lua from doing anything truly random. For example, the random number generators exposed to Lua are actually seeded pseudorandom generators whose seed is privately owned by the driven portion. Lua tables seem like a kind of unordered map, so you would think that you might have to worry. But we've patched the Lua runtime so that table iteration is actually ordered. The order isn't anything straightforward, but it is deterministic. Also, note that the rule against using threads doesn't apply to Lua threads, because Lua threads aren't really threads at all: they are coroutines, and they don't run concurrently.
I feel like I've forgotten a few other sources of randomness. I'll add to this list as I think of things.
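Two of these rules can be sketched in code. EnginePrng is a SplitMix64 generator (our choice for brevity; the text doesn't say which generator Luprex uses) whose whole state lives in the object, so the same seed always produces the same sequence. sorted_entries shows the unordered-map exception: the unpredictable iteration order exists only in a local vector and is erased by sorting on the unique keys before the result escapes. Both names are hypothetical, and in the driven portion the vector would be an eng::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Pseudorandom generator whose entire state is owned by the driven portion.
// SplitMix64 is used for brevity; the text doesn't name Luprex's generator.
class EnginePrng {
public:
    // The seed must come from somewhere that replays identically (e.g. an event).
    explicit EnginePrng(std::uint64_t seed) : state_(seed) {}

    std::uint64_t next() {
        std::uint64_t z = (state_ += 0x9E3779B97F4A7C15ULL);
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
        return z ^ (z >> 31);
    }

    // Integer in [0, n) for n > 0; modulo bias is ignored in this sketch.
    std::uint64_t below(std::uint64_t n) { return next() % n; }

private:
    std::uint64_t state_;
};

// Sandboxed iteration over an unordered_map: the unpredictable order exists
// only inside 'entries' and is erased by sorting on the unique keys.
std::vector<std::pair<std::string, int>>
sorted_entries(const std::unordered_map<std::string, int>& m) {
    std::vector<std::pair<std::string, int>> entries(m.begin(), m.end());
    std::sort(entries.begin(), entries.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    return entries;
}
```

Since std::sort is not stable, the sort key has to be a total order for the result to be deterministic; unique map keys guarantee that.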
## On Sandboxing Randomness in the Driven Portion
The rule against using true randomness in the driven portion has exceptions if you can "sandbox" the randomness. To successfully sandbox randomness, you must do two things. First, you must contain the random-like values to an "infected" region in memory. Second, the program must eventually reach a point where it is done using the random-like values, and it must wipe the "infected area," so that the randomness cannot affect the remainder of the program. Contain the randomness, then get rid of it. That is easier said than done.
Here's an example you can think about: profiling some expensive code.
```cpp
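// Real-time clock reads: these values differ between recording and replay.
double t1 = get_current_real_time();
// Expensive Code Goes Here
double t2 = get_current_real_time();
// Print the elapsed time, then immediately wipe the "infected" values.
util::dprint("Elapsed time:", t2 - t1);
t1 = t2 = 0;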
```
That uses real-time clocks, which is normally against the rules. But it would be pretty hard to do performance profiling without real-time clocks. So we try to contain the randomness, and then get rid of it.
By the way, util::dprint is our debugging print routine that sends its output someplace visible. For example, if you're running inside Unreal, it goes to the Unreal console.
In the first few lines of this example, we fetch the time and store it in the local variables t1 and t2. Now the "infected region" consists of variables t1 and t2, but the randomness hasn't infected anywhere else in memory. So it would appear that we've met the first criterion: we've contained the randomness to a known region of memory. Then, in the last line of code, we set both t1 and t2 to zero. So it would appear that we've met the second criterion: after using the random-like values, we have wiped the randomness out of memory so that it can't affect the remainder of the program.
However, it's not quite right. The print routine util::dprint allocates a temporary string for the message, in this case, "Elapsed time: 3075.287". The length of the string depends on the actual amount of time elapsed. Therefore, the amount of time elapsed affects how many bytes get allocated to hold the string, which in turn affects the layout of the heap.
In our old implementation of util::dprint, the string was allocated in the eng::malloc heap. Therefore, in the old implementation, the randomness ends up infecting the layout of the eng::malloc heap. When we realized this, we changed the implementation of util::dprint to use the malloc heap. The malloc heap is already known to be infected by randomness, because Unreal uses the malloc heap. If randomness escapes into the malloc heap, then that's not a problem, because we always treat the malloc heap as an "infected area." Now "util::dprint" contains an explicit guarantee that any randomness you pass to it will remain sandboxed. So now that we've fixed util::dprint, the code above is correct.
## Passing Binary Blobs Back and Forth
Sometimes, the driver sends binary encoded data into the driven portion, and the driven portion sometimes passes binary encoded data to the driver (by leaving binary data in a buffer and letting the driver poll it.) In order for this to be possible, the binary data has to be in a format that is known to both driver and driven portion.
To help the two different codebases share binary data, we have created a C++ library "base-buffer.hpp". This library defines standardized ways to store ints, floats, doubles, strings, and so forth into a binary blob. It also has ways of storing dynamically typed data into a binary blob. It ensures that both the driver and driven portion use the same endianness, that they both store string lengths in the same ways, and that they both use the same enum for dynamic types. In general, it ensures that the binary blobs created by one can be parsed by the other.
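A sketch of the kind of conventions base-buffer pins down: fixed little-endian byte order, strings as a 32-bit length followed by the bytes, and doubles stored as their IEEE-754 bit pattern. BlobWriter, BlobReader, and these exact encodings are assumptions for illustration; this document doesn't give the real base-buffer API or format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>

static_assert(sizeof(double) == 8, "sketch assumes 64-bit IEEE-754 doubles");

// Hypothetical encoder: little-endian integers, length-prefixed strings.
class BlobWriter {
public:
    void put_u32(std::uint32_t v) { put_le(v, 4); }
    void put_u64(std::uint64_t v) { put_le(v, 8); }
    void put_double(double d) {
        std::uint64_t bits;
        std::memcpy(&bits, &d, sizeof bits);  // store the raw bit pattern
        put_u64(bits);
    }
    void put_string(const std::string& s) {
        put_u32(static_cast<std::uint32_t>(s.size()));  // length first
        out_ += s;
    }
    const std::string& bytes() const { return out_; }

private:
    void put_le(std::uint64_t v, int n) {
        for (int i = 0; i < n; ++i) out_.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
    }
    std::string out_;
};

// Matching decoder; throws if the blob is shorter than the data it claims to hold.
class BlobReader {
public:
    explicit BlobReader(std::string in) : in_(std::move(in)) {}
    std::uint32_t get_u32() { return static_cast<std::uint32_t>(get_le(4)); }
    std::uint64_t get_u64() { return get_le(8); }
    double get_double() {
        std::uint64_t bits = get_u64();
        double d;
        std::memcpy(&d, &bits, sizeof d);
        return d;
    }
    std::string get_string() {
        std::size_t n = get_u32();
        need(n);
        std::string s = in_.substr(pos_, n);
        pos_ += n;
        return s;
    }

private:
    void need(std::size_t n) const {
        if (in_.size() - pos_ < n) throw std::runtime_error("blob truncated");
    }
    std::uint64_t get_le(std::size_t n) {
        need(n);
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < n; ++i)
            v |= static_cast<std::uint64_t>(static_cast<unsigned char>(in_[pos_ + i])) << (8 * i);
        pos_ += n;
        return v;
    }
    std::string in_;
    std::size_t pos_ = 0;
};
```

Writing bytes one at a time with shifts, instead of copying integers with memcpy, makes the byte order independent of the host CPU, which is what lets either side parse the other's blobs.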
The base-buffer library is not considered part of either the driver or the driven portion. Instead, it's considered a "third party library" that is included by both. As a third-party library, it goes in the directory for third-party libraries, "ext." It is a pure header-only C++ library, meaning that there's no need for an associated cpp file. It's a single source file. It includes some STL header files, but no other includes, to avoid creating dependencies. It's operating-system independent code.