


For those of you who have not heard of Cipher, it is a commercial game engine I have developed over the last few years. Its goals were simply to provide us with the complete solution we needed to make games and to put more control into the hands of the people making the content (the artists and level designers). I also wanted it to be as portable as possible to help reduce the costs involved in moving to the new consoles. A scripting language of some description seemed to satisfy these goals quite well. The list of benefits of implementing a scripting language was as follows…
What are the Options
When I started to investigate the problem, it struck me that there were 3 realistic options. The first was to use an existing scripting language such as Lua or one of the many other mini-languages that have popped up in recent years. The second option was to roll my own scripting language that could be custom designed to solve the problems I needed solutions for. The final option was to develop a virtual machine to run a bytecode from a more common language like Java or C.
Using an existing scripting language was certainly an appealing option when I first started looking into it. I don't have a great deal of experience with scripting languages, so just downloading the source to a working scripting language such as Lua or Perl sounded like it would be a real time saver. It did not take too long to realise that saving time was the last thing this approach would achieve.
Most of the languages I looked at turned out to be far bigger than I was expecting, often dwarfing the rest of Cipher. Adding a couple of hundred unsupported source files to the project and integrating them with the rest of Cipher is no easy task. Modifying this bulk of code to add in the features I needed to make the languages more suitable for use in games was also going to be a pretty big job. The lack of a debugger was also a pretty serious blow. Overall, using an existing scripting language did not look too appealing after I had thought about it for a while, with the possible exception of Lua, which is really small, neat and easy to extend. Even though I did not use Lua in the end, I might well add support for it later on, to give users of Cipher a choice about what they use to write their game code.
Creating a new scripting language from scratch offered a few advantages over using an existing language...
But there are also a bunch of disadvantages...
Man, that sounds like a lot of work!
The third option was to write a virtual machine (vm) that could execute a bytecode in a similar fashion to the way Java works. A bytecode is really just an instruction set that is not specific to any CPU. The instructions are normally simple, such as adding two numbers together or writing a value to an address. The problem is, how do you get your program into a bytecode form so you can execute it? You could use one of the solutions above (i.e., use a language that compiles to a bytecode, or write your own), so this doesn't seem to offer any advantage.
That was when I noticed a program called lcc. Lcc is a retargetable optimising C compiler that comes with full source code and provides all the tools needed to write your own back ends. Using this, it should be possible to produce assembler output for almost any CPU, even if it was one I just made up. The real advantage that lcc offered was that it came with a sample back end that produced bytecode assembler output, which was exactly what I needed and saved me doing any work at all (well almost). I would strongly recommend that anyone interested in this area should get the book that explains how lcc works and includes the full source code for the compiler and its tools.
I liked the list of advantages I got from using C, compiling it to a bytecode with lcc and running it on a vm.
The list of disadvantages was not too serious either.
The first step was to make sure that the bytecode produced by lcc was suitable for execution on a vm. A couple of days reading the book and studying the output showed that I only needed to make a couple of minor modifications to the bytecode back end of lcc to get the output I wanted.
In the end I probably only changed 4 or 5 lines of code in lcc, so that was about as painless as I could have hoped for.
Casm is a simple 2 pass assembler. It reads in all the .asm files produced by lcc and then assembles and links them together. The output from this program is a Cipher bytecode file (.cbc) that Cipher can load and execute directly on its vm. I won't go into detail on this component, as it is not very exciting, but e-mail me if you would like more information.
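To give a flavour of what "2 pass" means here, the following is a minimal sketch rather than Casm itself. The mnemonics (LABEL, PUSH, JUMP, HALT), the opcode numbers and the `assemble` function are all made up for illustration and do not match lcc's real bytecode output. The first pass only records the address of each label, and the second pass emits the instructions, which is what lets a jump refer to a label that appears later in the file.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Made-up opcodes for this sketch only. */
enum { OP_PUSH = 1, OP_JUMP = 2, OP_HALT = 3 };

typedef struct { char name[32]; int32_t address; } Label;

/* Returns the address recorded for name, or -1 if it is unknown. */
static int32_t find_label(const Label *labels, int count, const char *name)
{
    for (int i = 0; i < count; i++)
        if (strcmp(labels[i].name, name) == 0)
            return labels[i].address;
    return -1;
}

/* Words occupied by one source line: labels emit nothing, HALT is one
   word, everything else is an opcode plus one operand word. */
static int line_size(const char *op)
{
    if (strcmp(op, "LABEL") == 0) return 0;
    if (strcmp(op, "HALT") == 0) return 1;
    return 2;
}

/* Assembles lines into out. Returns the number of words written,
   or -1 on any error. */
static int assemble(const char *const *lines, int line_count,
                    int32_t *out, int out_cap)
{
    Label labels[16];
    int label_count = 0, address = 0, n = 0;
    char op[32], arg[32];

    /* Pass 1: walk the source and note where every label lands. */
    for (int i = 0; i < line_count; i++) {
        arg[0] = '\0';
        if (sscanf(lines[i], "%31s %31s", op, arg) < 1) return -1;
        if (strcmp(op, "LABEL") == 0) {
            if (label_count == 16) return -1;
            strcpy(labels[label_count].name, arg);
            labels[label_count++].address = address;
        }
        address += line_size(op);
    }
    if (address > out_cap) return -1;

    /* Pass 2: emit words, resolving label names to addresses. */
    for (int i = 0; i < line_count; i++) {
        arg[0] = '\0';
        sscanf(lines[i], "%31s %31s", op, arg);
        if (strcmp(op, "LABEL") == 0) {
            continue;
        } else if (strcmp(op, "HALT") == 0) {
            out[n++] = OP_HALT;
        } else if (strcmp(op, "PUSH") == 0) {
            out[n++] = OP_PUSH;
            out[n++] = (int32_t)strtol(arg, NULL, 10);
        } else if (strcmp(op, "JUMP") == 0) {
            int32_t target = find_label(labels, label_count, arg);
            if (target < 0) return -1;
            out[n++] = OP_JUMP;
            out[n++] = target;
        } else {
            return -1;
        }
    }
    return n;
}
```

A real assembler would also have to link several files together and write out the result, but the label bookkeeping is the part that forces two passes.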
The vm in Cipher is a stack based machine. There are no registers to speak of, and almost all instructions take values off the top of the stack, do something with them and put the results back on the top of the stack. Since each instruction performs a very specific task, the code to execute this instruction is generally quite simple. For example the ADDI4 instruction takes the 2 values at the top of the stack, adds them together and puts the result on the top of the stack. The code looks a bit like this.
    arg0 = PopOperand();
    arg1 = PopOperand();
    arg0 = arg1 + arg0;
    PushOperand(arg0);
There are about 100 instructions in total, and most of them are no more complicated than the add instruction. It does get pretty tedious though, as there are often lots of combinations that are very similar. For example, there are several add instructions - add float, add integer and add unsigned integer.
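For readers who have not written an interpreter before, here is a rough sketch of how an add instruction like this might sit inside a dispatch loop. It is not Cipher's actual code: only ADDI4 is named above, so the other opcodes (`OP_PUSHI4`, `OP_ADDU4`, `OP_HALT`), the `Cell` and `Vm` types and the `vm_run` function are assumptions made for the example, and it leaves out the stack bounds and overflow checks a real vm would need.

```c
#include <stddef.h>
#include <stdint.h>

/* Only ADDI4 comes from the article; the other opcodes are assumed. */
enum { OP_PUSHI4, OP_ADDI4, OP_ADDU4, OP_HALT };

/* One stack slot, viewed as signed or unsigned as the opcode requires. */
typedef union { int32_t i; uint32_t u; } Cell;

typedef struct {
    Cell stack[64];
    int sp;             /* number of operands currently on the stack */
} Vm;

/* No bounds checks here; a real vm would need them. */
static void PushOperand(Vm *vm, Cell value) { vm->stack[vm->sp++] = value; }
static Cell PopOperand(Vm *vm) { return vm->stack[--vm->sp]; }

/* Interprets code one instruction at a time until OP_HALT (or an
   unknown opcode) and returns the value on top of the stack, or 0 if
   the stack is empty. */
static int32_t vm_run(Vm *vm, const int32_t *code)
{
    size_t pc = 0;
    Cell arg0, arg1;

    vm->sp = 0;
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSHI4:                 /* operand follows the opcode */
            arg0.i = code[pc++];
            PushOperand(vm, arg0);
            break;
        case OP_ADDI4:                  /* signed add */
            arg0 = PopOperand(vm);
            arg1 = PopOperand(vm);
            arg0.i = arg1.i + arg0.i;
            PushOperand(vm, arg0);
            break;
        case OP_ADDU4:                  /* same shape, unsigned view */
            arg0 = PopOperand(vm);
            arg1 = PopOperand(vm);
            arg0.u = arg1.u + arg0.u;
            PushOperand(vm, arg0);
            break;
        default:                        /* OP_HALT or unknown opcode */
            return vm->sp > 0 ? PopOperand(vm).i : 0;
        }
    }
}
```

The two add cases differ only in which member of the union they touch, which is the kind of near-duplication that makes writing out the full instruction set tedious.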
At this stage I had a working vm, but it was still not very useful. I could write a simple program in C, compile it with lcc, assemble and link the results with Casm, and finally, load and execute the program on the vm. However, the vm has no operating system and no runtime library, so these programs could not produce any output or cause any side effects. Sure it was secure, but not too useful. I needed to be able to call functions in Cipher from my C programs.
In the end I decided to use a system similar to the old DOS interrupt calls. There is a single function that takes a variable number of parameters, which are passed on to Cipher. Cipher uses the first parameter to indicate which internal function to call, with the remaining parameters having a meaning specific to that internal function. For example, to call the function in Cipher that prints some text to the console I would do something like this:
cipher(FUNCTION_PRINT, "Hello world");
This made it simple to allow the game code to be compiled to either the bytecode or to a native DLL, as only a single function needs to be exported from Cipher to support this, and the list of internal functions exposed can be easily changed without having to change that interface. When running as a DLL, the function cipher is set up to point directly at the appropriate function inside Cipher that handles calls. When running on the vm, cipher is set to 0xFFFFFFFF. The vm traps any attempt to call a function at this address and passes it on to Cipher instead. A similar method is used to allow Cipher to call into the game code (e.g., to ask it to render the next frame). Extra care needs to be taken to support re-entrant calls (where the game code calls Cipher, and that call causes Cipher to call the game code again). Cipher's vm handles this case perfectly.
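The trap can be pictured with a small sketch along the following lines. This is an illustration under assumptions, not Cipher's implementation: the `Vm` struct, `vm_call`, `cipher_dispatch`, the `console` buffer and the numeric value of `FUNCTION_PRINT` are all invented here. Only the 0xFFFFFFFF address and the idea that the first parameter selects the internal function come from the description above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CIPHER_TRAP_ADDRESS 0xFFFFFFFFu

enum { FUNCTION_PRINT = 1 };    /* the numbering is an assumption */

typedef struct {
    uint8_t memory[64];         /* the only memory the game code can see */
    char console[64];           /* stands in for the engine's console */
} Vm;

/* Engine side: args[0] picks the internal function, the rest belong to
   that function. Returns 0 on success or -1 for a bad request. */
static int cipher_dispatch(Vm *vm, const uint32_t *args, int argc)
{
    if (argc < 1) return -1;
    switch (args[0]) {
    case FUNCTION_PRINT: {
        /* args[1] is an address inside vm memory, so check that the
           string starts and ends inside it before touching it. */
        if (argc < 2 || args[1] >= sizeof vm->memory) return -1;
        const uint8_t *text = vm->memory + args[1];
        size_t room = sizeof vm->memory - args[1];
        if (memchr(text, '\0', room) == NULL) return -1;
        snprintf(vm->console, sizeof vm->console, "%s", (const char *)text);
        return 0;
    }
    default:
        return -1;
    }
}

/* vm side: what a call instruction might do with its target address.
   Returns 1 if the call was trapped and handed to the engine, or 0 if
   it is an ordinary bytecode call (not modelled in this sketch). */
static int vm_call(Vm *vm, uint32_t target,
                   const uint32_t *args, int argc, int *result)
{
    if (target == CIPHER_TRAP_ADDRESS) {
        *result = cipher_dispatch(vm, args, argc);
        return 1;
    }
    return 0;
}
```

Because the game code only ever sees one entry point, new internal functions can be added by extending the switch without touching the interface between the two sides.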
Using the vm imposes a few limitations on the code that we can produce. The most obvious limitation is that I must code in C, as the only compiler that produces a suitable bytecode output is a C compiler. Anyone wanting to develop game code using C++ (or any other language for that matter) in Cipher will not get the benefits of the vm, as they will have to compile directly to executable components (such as DLLs).
Many of you might argue that C would make a pretty poor scripting language, and you would be right. Getting level designers to code up the game logic in C is not going to be pretty. However, having had more time to think about it, I believe that getting level designers to code up the game logic in any language is only going to end in tears. By using a simple scripting language, some people manage to convince themselves that the level designers will not really be programming. The reality is that they will be programming, and most people would be better off getting programmers to do the programming and leave the level designers to design the levels.
The only other limitation imposed by the vm that causes a bit of shock initially is that you cannot allocate memory in the way most people are used to. The vm has no operating system and no runtime library. All the vm's services (such as printing out some text to the console) are provided by calling into Cipher. However, the vm is unable to call into Cipher to allocate some memory as it is unable to access any memory outside of itself (the vm in Cipher implements a protected memory scheme to prevent game code from accessing any memory that does not belong to it). If you want to be able to call something like malloc() to allocate memory during runtime, then it is possible, but you will need to allocate your "heap" as a global variable (e.g. char RawHeap[16*1024*1024];) and then write the appropriate code to allocate memory from it. Currently, none of our game code allocates any memory. Instead it just sets up the data structures that it knows it needs and uses them. Console programmers will be more comfortable with this approach, but PC programmers, who are used to accessing an infinite supply of memory, will scream in terror until they get used to it.
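As a rough idea of what "the appropriate code" could look like, here is a minimal bump allocator over a global heap. It is a sketch of one simple approach, not something taken from Cipher's game code: the names `HeapAlloc`, `HeapReset` and `HeapUsed` are invented, there is no way to free an individual block, and the 8 byte alignment is applied to offsets within RawHeap, so it assumes the array itself starts on a suitably aligned address.

```c
#include <stddef.h>

#define HEAP_SIZE (16 * 1024 * 1024)

/* The whole "heap" is an ordinary global, so it lives inside the
   memory that the game code is allowed to touch. */
static char RawHeap[HEAP_SIZE];
static size_t HeapUsed = 0;     /* bytes handed out so far */

/* Hands out size bytes starting at the next 8 byte aligned offset,
   or returns NULL when the heap cannot satisfy the request. */
static void *HeapAlloc(size_t size)
{
    const size_t align = 8;
    size_t start = (HeapUsed + align - 1) & ~(align - 1);

    /* HEAP_SIZE is a multiple of 8, so start never exceeds it and the
       subtraction below cannot wrap around. */
    if (size > HEAP_SIZE - start)
        return NULL;

    HeapUsed = start + size;
    return RawHeap + start;
}

/* Releases everything at once, e.g. when a level is unloaded. */
static void HeapReset(void)
{
    HeapUsed = 0;
}
```

Game code would then call `HeapAlloc(sizeof(Thing))` where it would otherwise have called malloc(), and reset the heap at a known point such as a level change. Supporting a true free() would need a more elaborate allocator with per-block bookkeeping.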
There are a number of benefits the vm has brought to Cipher...
It is possible to get very close to the performance of a natively compiled version of the game code by extending the vm with platform specific compilers. Normally the vm loads in the instructions for the program in bytecode form, and interprets those instructions one at a time at runtime. A significant boost in performance can be gained by compiling the bytecode down to the equivalent machine code after it is loaded and executing this machine code directly at runtime. This compilation phase would obviously be different for each platform that is supported and can be fairly complex. I will leave this as an exercise for the reader.