diff --git a/Decompiling-Tips-IDA.md b/Decompiling-Tips-IDA.md index 4a92924..a379007 100644 --- a/Decompiling-Tips-IDA.md +++ b/Decompiling-Tips-IDA.md @@ -9,17 +9,14 @@ When you open IDA, load the openrct2.exe file from this repository. You will see Once you have the IDC file, load it by clicking "File -> Load Script" and loading it. ## Debugging in general - RCT2 is written in [x86][x86], which is about as close to the actual CPU instructions as you are going to get. Each of the 8 CPU registers can store 32 bits (in this game), and you can perform operations on the actual bits contained in those registers. ### Hex/Decimal/Binary - Lots of numbers and addresses in the [Tycoon Technical Depot][ttd] are written in hex. The prefix "0x" generally denotes a hex address. The letters used for hex are a superset of those used for decimal, so numbers which look like decimals ("12") can actually refer to a different value than you think (in this case, 0x12 = 1\*16 + 2 = 18). In IDA these are represented as numbers with the letter "h" as a suffix, like "1Ah". [ttd]: https://freerct.github.io/RCTTechDepot-Archive/ To convert between these, I find it very convenient to keep a Python REPL up and make transformations between the types as necessary. It's also easy to view the binary representation of a number. Alternatively Calculator can be used. - ``` # Print the decimal representation of a hex value: >>> 0x12 @@ -31,43 +28,35 @@ To convert between these, I find it very convenient to keep a Python REPL up and >>> bin(0x12) '0b10010' ``` - Note that RCT2 uses a [little endian][le] encoding for integers that span multiple bytes, so the most significant bit is in the 2nd byte of a 16-bit integer. [le]: https://en.wikipedia.org/wiki/Endianness ## Converting from x86 to C - ### Registers +The registers beginning with e and ending with x or i (edx, eax, ecx etc) are 32 bits each (4 bytes). The general purpose registers that are two letters ending in x (dx, ax, cx) are 16 bits each. 
The registers ending in "h" or "l" stand for "high" and "low" and are each 8 bits. -The registers beginning with e and ending with x or i (edx, eax, ecx etc) are 32 bits each (4 bytes). The general purpose registers that are two letters ending in x (dx, ax, cx) are 16 bits each. The registers ending in "h" or "l" stand for "high" and "low" and are each 8 bits. - -When converting to C, use an `int` type to represent a 32 bit register, a `short` to represent a 16 bit register, and `uint8` to represent an 8 bit register. +When converting to C, use an `int` type to represent a 32 bit register, a `short` to represent a 16 bit register, and `uint8` to represent an 8 bit register. ### Subroutines - The general unit of work in the x86 codebase is the subroutine. Subroutines are called like this: call sub_6CFFFF - This will cause execution to jump to the subroutine, and the subroutine will execute until a `retn` instruction is encountered. There are three functions in C you can use to replace a subroutine call. - -- **RCT2_CALLPROC_EBPSAFE**: Use this function if the subroutine does not +* **RCT2_CALLPROC_EBPSAFE**: Use this function if the subroutine does not use any registers from the calling program. For example, if the subroutine starts with a `pusha` instruction, this saves all of the registers from the calling routine. call sub_6CAB00 - And then in the subroutine: sub_6CAB00 proc near pusha - This generally means that the registers are saved. You may also see sub_6CAB00 proc near @@ -79,30 +68,23 @@ There are three functions in C you can use to replace a subroutine call. pop edx pop eax ; Restore the registers to their values retn - - -- **RCT2_CALLPROC_X**: Use this function if the subroutine starts using +* **RCT2_CALLPROC_X**: Use this function if the subroutine starts using registers without loading any data first. 
For example: sub_6CAB00 proc near add ebx, 7 - - This subroutine begins operating in whatever value is stored in ebx, so it's safe to assume the caller has deliberately put a value there to be manipulated. - -- **RCT2_CALLFUNC_X**: Use this function if the subroutine stores values in +* **RCT2_CALLFUNC_X**: Use this function if the subroutine stores values in registers to be used by the caller. For example sub_6CAB00 proc near add ebx, ecx retn - It's safe to assume that ebx is being used by the calling program. ### Pointers - You should brush up on how pointers work, if you are unfamiliar, or coming from a higher level language like Python. I haven't found a great tutorial for this yet, but [here's something on pointers that might help][ptr]. @@ -112,7 +94,6 @@ expression. In this case this generally means the value stored in the register is an address. add [ebp+2], 7 - This means take the value stored in EBP (which should be an address like 0x579992), add two to it (0x579994), and then add 7 to the value stored at address 0x579994. Sometimes a register represents an address, and sometimes it @@ -124,23 +105,18 @@ Always remember that pointers are unsigned do not try to use them as a signed in [ptr]: https://research.swtch.com/godata #### What this means for OpenRCT2 - If you see a value like this in the code: - ``` mov bh, byte ptr word_F440AE ``` - This roughly says, get the value at 0x00F440AE as a byte (8 bits) instead of a word (16 bits) and copy the value into the register `bh`. -This converts into code as - +This converts into code as ``` int bh = RCT2_GLOBAL(0x00F440AE, uint8); ``` ### Nullsub - If you see an instruction like this: jz nullsub_65 @@ -148,83 +124,62 @@ If you see an instruction like this: This represents a call that actually existed in a version of the program but doesn't exist in the final version. 
In this case, if the zero flag is set, execution will jump to the end of the subroutine (e.g. a `return` in C); otherwise it will proceed to the next instruction in the code. #### imul exx, 260h - If you see a set of instructions in the code that looks like this: - ``` movzx edx, current_ride_index imul edx, 260h movzx edx, rides[edx] ``` - In C code this would be: - ```c edx = RCT2_ADDRESS(RCT2_ADDRESS_RIDE_LIST, rct_ride)[current_ride_index]; ``` - -This stores in edx the beginning of data from a ride instance. The ride instance data follows the layout described [here][sv6]. +This stores in edx the beginning of data from a ride instance. The ride instance data follows the layout described [here][sv6]. [sv6]: https://github.com/OpenRCT2/OpenRCT2/wiki/SV6-Ride-Structure -In general if you see something that is multiplied by 0x260 then it's quite likely that it is a ride that is being referenced, as rides are 0x260 bytes. Sprites are 0x100 bytes, and instead of being multiplied, the index is normally left shifted by 8 (<<8). This makes it very easy to work out what a loop is iterating over. +In general if you see something that is multiplied by 0x260 then it's quite likely that it is a ride that is being referenced, as rides are 0x260 bytes. Sprites are 0x100 bytes, and instead of being multiplied, the index is normally left shifted by 8 (<<8). This makes it very easy to work out what a loop is iterating over. #### Offset - If you see an instruction that looks like this: - ``` add ebx, offset sprites ``` - (where sprites is a named address in IDA, like 0x123456). This means, roughly, *add the register on the left to the value on the right, and store it in the register on the left*. In this case, this would mean - ```c ebx = ebx + RCT2_ADDRESS_SPRITE_LIST; ``` - where `RCT2_ADDRESS_SPRITE_LIST` is a value like `0x123456`. In the binary, `ebx` could be any register, and `offset` can refer to any address in the code. 
This will eventually end up like the following once we have the offset properly mapped to a C array - ```c ebx = RCT2_ADDRESS_SPRITE_LIST[ebx]; ``` ### Print debugging - Use `printf`, or, to print to the Visual Studio output after RCT2 begins, include the following header: - ``` #include "windows.h" ``` - Then use the command [`OutputDebugString`](https://msdn.microsoft.com/en-us/library/aa363362%28VS.85%29.aspx). - ``` OutputDebugString("Hello World!\n"); ``` ### IDA Tips - -- Use the spacebar to switch between the graphical layout and the line-by-line instructions. - -- Press semicolon to add a comment at the end of a line. - -- Press x to show all read / write / offset / jump references to an address. - -- Press n to rename an address. - -- If you are trying to read from the binary, note every address in the code is `0x400000` higher than its physical address in the binary. So if you have an address at `0x900123` in the code and you want to read from it in an external program, start reading at `0x500123` instead. +* Use the spacebar to switch between the graphical layout and the line-by-line instructions. +* Press semicolon to add a comment at the end of a line. +* Press x to show all read / write / offset / jump references to an address. +* Press n to rename an address. +* If you are trying to read from the binary, note every address in the code is `0x400000` higher than its physical address in the binary. So if you have an address at `0x900123` in the code and you want to read from it in an external program, start reading at `0x500123` instead. ### Finding that section of code you want to edit - This is tricky, and note that the addresses in the [Tycoon Technical Depot][ttd] are only valid for RCT1. I would try starting with the work that's already been done and trying to branch from there to find sections of the code that are useful for you. 
You can also read through the code in the OpenRCT2 project, especially the addresses in [src/addresses.h][addresses], which contains a very useful list of important addresses in the game. Most of the functions in the OpenRCT2 C code list the address of the corresponding subroutine in the docstring. [addresses]: https://github.com/OpenRCT2/OpenRCT2/blob/master/src/addresses.h -Another approach is to work backwards from the strings or windows that exist in the game to the subroutines that you want to change. That is, find a string like "Too high for supports!" and try to figure out where it is used in the game, by searching for the hex representation of its ID. \ No newline at end of file +Another approach is to work backwards from the strings or windows that exist in the game to the subroutines that you want to change. That is, find a string like "Too high for supports!" and try to figure out where it is used in the game, by searching for the hex representation of its ID.
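
The string-ID search described in the last paragraph can be prepared in the same Python REPL the page already recommends. The following is only a sketch under stated assumptions: the helper names (`ida_hex`, `little_endian_bytes`, `file_offset`) and the string ID `1234` are hypothetical and not taken from the game or from OpenRCT2, and a 16-bit width for the ID is assumed. It shows the number as IDA would display it, the byte order to look for in a hex editor given the little-endian encoding, and the `0x400000` address adjustment from the IDA tips.

```python
IMAGE_BASE = 0x400000  # difference between code addresses and file offsets (from the IDA tips)


def ida_hex(value: int) -> str:
    """Format a number in IDA's style: upper-case hex digits with an 'h' suffix."""
    digits = f"{value:X}"
    # MASM-style notation puts a leading 0 in front of a letter digit (e.g. 0FFh).
    if digits[0].isalpha():
        digits = "0" + digits
    return digits + "h"


def little_endian_bytes(value: int, width: int = 2) -> bytes:
    """Return the bytes of `value` in memory order (least significant byte first)."""
    return value.to_bytes(width, "little")


def file_offset(code_address: int) -> int:
    """Convert an address seen in IDA to an offset inside the binary file."""
    return code_address - IMAGE_BASE


STRING_ID = 1234  # hypothetical ID, for illustration only

print(ida_hex(STRING_ID))                       # 4D2h
print(little_endian_bytes(STRING_ID).hex(" "))  # d2 04
print(hex(file_offset(0x900123)))               # 0x500123
```

Searching IDA for the `h`-suffixed form finds immediates in the disassembly, while the little-endian byte sequence is what a raw search of the binary would need.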