Monday, May 28, 2007

MSIL to C# -> The theory development begins

Welcome to my journey of writing a .NET assembly decompiler.

First of all, I am trying to develop a theory for decompiling MSIL. I simply do whatever each MSIL instruction asks me to do, but I keep in mind that I am decompiling the code rather than executing it. So when an instruction asks me to push the value of a variable, I push the name of that variable onto the stack instead.

To follow along, you need to know, or have a reference for, what each MSIL instruction actually does.

Here is a sample program to test whether the concept works:

namespace DisasmIL
{
    class Math
    {
        public int add(int x, int y)
        {
            return x + y;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Math m;
            int a, b;
            m = new Math();
            a = 20;
            b = 50;
            int p = m.add(a, b);
        }
    }
}

We only check one method: Main. When the Main method is compiled, it takes the following form:
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size 23 (0x17)
.maxstack 3
.locals init (
[0] class DisasmIL.Math m,
[1] int32 a,
[2] int32 b,
[3] int32 p
)
IL_0000: nop
IL_0001: newobj instance void DisasmIL.Math::.ctor()
IL_0006: stloc.0
IL_0007: ldc.i4.s 20
IL_0009: stloc.1
IL_000a: ldc.i4.s 50
IL_000c: stloc.2
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: ldloc.2
IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
IL_0015: stloc.3
IL_0016: ret
} // end of method Program::Main
=============================================================

We parse line by line:
------------------------------------------------------------------------
.method private hidebysig static void Main(string[] args) cil managed

This is the method declaration. We emit the C# signature followed by the opening curly brace.
Output code:
static void Main(string[] args)
{
Stack: [empty]
------------------------------------------------------------------------
.entrypoint
// Code size 23 (0x17)
.maxstack 3
.locals init (
[0] class DisasmIL.Math m,
[1] int32 a,
[2] int32 b,
[3] int32 p
)

These directives are self-explanatory. The only part that produces code is .locals, which gives us the local variables to declare.
Output code:
DisasmIL.Math m;
int a;
int b;
int p;
Stack: [empty]
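
As a rough sketch of this step, the snippet below turns each .locals entry into a C# declaration. The class LocalsDemo and the helpers ToCSharpType and DeclareLocal are hypothetical names made up for illustration, and the type mapping covers only a few IL types, so treat it as a starting point rather than a complete parser.

```csharp
using System;

static class LocalsDemo
{
    // Hypothetical helper: maps a few IL type names to C# keywords.
    static string ToCSharpType(string ilType)
    {
        switch (ilType)
        {
            case "int32": return "int";
            case "string": return "string";
            case "bool": return "bool";
            default:
                // "class DisasmIL.Math" -> "DisasmIL.Math"
                return ilType.StartsWith("class ", StringComparison.Ordinal)
                    ? ilType.Substring(6)
                    : ilType;
        }
    }

    // Turns one entry such as "[1] int32 a," into "int a;".
    public static string DeclareLocal(string entry)
    {
        string text = entry.Trim().TrimEnd(',');
        text = text.Substring(text.IndexOf(']') + 1).Trim(); // drop the "[n]" index
        int lastSpace = text.LastIndexOf(' ');
        string ilType = text.Substring(0, lastSpace);
        string name = text.Substring(lastSpace + 1);
        return ToCSharpType(ilType) + " " + name + ";";
    }

    static void Main()
    {
        string[] locals =
        {
            "[0] class DisasmIL.Math m,",
            "[1] int32 a,",
            "[2] int32 b,",
            "[3] int32 p"
        };
        foreach (string entry in locals)
            Console.WriteLine(DeclareLocal(entry));
    }
}
```

Run against the four entries above, this should print the same four declarations shown in the output code for this step.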
------------------------------------------------------------------------
IL_0000: nop

Does nothing (nop).
Output code: [none]
Stack: [empty]
------------------------------------------------------------------------
IL_0001: newobj instance void DisasmIL.Math::.ctor()

This creates a new instance of DisasmIL.Math using the default constructor,
so we push "new DisasmIL.Math()" onto our stack.
Output code: [none]
Stack: new DisasmIL.Math()
------------------------------------------------------------------------
IL_0006: stloc.0

We pop the top of the stack and assign it to local variable 0 (m).
Output code:
m = new DisasmIL.Math();
Stack: [empty]
------------------------------------------------------------------------
IL_0007: ldc.i4.s 20

We push the constant 20 onto the stack.
Output code: [none]
Stack: 20
------------------------------------------------------------------------
IL_0009: stloc.1

We pop the top value and assign it to local variable 1 (a).
Output code:
a = 20;
Stack: [empty]
------------------------------------------------------------------------
IL_000a: ldc.i4.s 50

We push the constant 50 onto the stack.
Output code: [none]
Stack: 50
------------------------------------------------------------------------
IL_000c: stloc.2

We pop the top value and assign it to local variable 2 (b).
Output code:
b = 50;
Stack: [empty]
------------------------------------------------------------------------
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: ldloc.2

We push the names of local variables 0, 1 and 2 onto the stack.
Output code: [none]
Stack: m, a, b
-------------------------------------------------------------------------
IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
IL_0015: stloc.3

We call the add method. Its two arguments are at top-1 and top of the
stack, and the instance it is called on is at top-2. We pop all three.
If a method returns a value, that value is pushed back onto the stack,
so we look at the next instruction. Here it is a stloc, which means the
return value is assigned to local variable 3 (p).

Output code: p = m.add(a, b);
Stack: [empty]
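
A minimal sketch of this step, assuming the callvirt operand is available as the text ilasm-style listings show it. The names CallDemo and BuildCall are hypothetical. The argument count is taken by splitting the parameter list on commas, which is an assumption that breaks for generic parameter types containing commas.

```csharp
using System;
using System.Collections.Generic;

static class CallDemo
{
    // Builds "instance.method(args)" from a callvirt operand and the symbolic stack.
    public static string BuildCall(string operand, Stack<string> stack)
    {
        // operand looks like: instance int32 DisasmIL.Math::'add'(int32,int32)
        int open = operand.IndexOf('(');
        int close = operand.LastIndexOf(')');
        string paramList = operand.Substring(open + 1, close - open - 1).Trim();
        int argCount = paramList.Length == 0 ? 0 : paramList.Split(',').Length;

        // The method name sits between "::" and "(", possibly quoted.
        int nameStart = operand.IndexOf("::", StringComparison.Ordinal) + 2;
        string name = operand.Substring(nameStart, open - nameStart).Trim('\'');

        // The last argument is on top of the stack, so fill the array backwards.
        string[] args = new string[argCount];
        for (int i = argCount - 1; i >= 0; i--)
            args[i] = stack.Pop();

        // Below the arguments is the instance the method is called on.
        string instance = stack.Pop();
        return instance + "." + name + "(" + string.Join(", ", args) + ")";
    }

    static void Main()
    {
        var stack = new Stack<string>();
        stack.Push("m");
        stack.Push("a");
        stack.Push("b");

        string call = BuildCall("instance int32 DisasmIL.Math::'add'(int32,int32)", stack);
        stack.Push(call);                               // non-void result goes back on the stack
        Console.WriteLine("p = " + stack.Pop() + ";");  // the following stloc.3 pops it
    }
}
```

Pushing the call expression back onto the stack, instead of peeking at the next instruction, lets the ordinary stloc handling produce the assignment, so the call step does not need a special case.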
--------------------------------------------------------------------------
IL_0016: ret

The method returns void, so we emit no code except the closing curly brace.
Output code:
}

Stack: [empty]

Now if you put the output code together, you get C# that matches the original program, apart from the declarations all being placed at the top of the method. This works for simple cases. I still need to test whether it holds up in more complex situations. Miles to go....
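
To tie the walkthrough together, here is a small sketch that replays the instructions of Main against a stack of expression strings. MiniDecompiler and Decompile are hypothetical names, and this is not the finished decompiler: it only understands the handful of opcodes used above, assumes a parameterless constructor for newobj, expects ldloc/stloc in their numbered short forms, and emits statements only (no declarations or braces).

```csharp
using System;
using System.Collections.Generic;

static class MiniDecompiler
{
    // Replays IL against a stack of C# expression strings instead of values.
    public static List<string> Decompile(string[] locals, string[] il)
    {
        var stack = new Stack<string>();
        var output = new List<string>();

        foreach (string line in il)
        {
            string[] parts = line.Split(new[] { ' ' }, 2);
            string op = parts[0];
            string operand = parts.Length > 1 ? parts[1] : "";

            if (op == "nop" || op == "ret")
            {
                continue; // nothing to emit for a void method
            }
            else if (op.StartsWith("ldc.i4", StringComparison.Ordinal))
            {
                stack.Push(operand); // only the forms with an explicit operand
            }
            else if (op.StartsWith("ldloc.", StringComparison.Ordinal))
            {
                stack.Push(locals[int.Parse(op.Substring(6))]); // push the name
            }
            else if (op.StartsWith("stloc.", StringComparison.Ordinal))
            {
                output.Add(locals[int.Parse(op.Substring(6))] + " = " + stack.Pop() + ";");
            }
            else if (op == "newobj")
            {
                // "instance void DisasmIL.Math::.ctor()" -> "new DisasmIL.Math()"
                int sep = operand.IndexOf("::", StringComparison.Ordinal);
                int typeStart = operand.LastIndexOf(' ', sep) + 1;
                stack.Push("new " + operand.Substring(typeStart, sep - typeStart) + "()");
            }
            else if (op == "callvirt")
            {
                int open = operand.IndexOf('(');
                int close = operand.LastIndexOf(')');
                string paramList = operand.Substring(open + 1, close - open - 1).Trim();
                string[] args = new string[paramList.Length == 0 ? 0 : paramList.Split(',').Length];
                for (int i = args.Length - 1; i >= 0; i--)
                    args[i] = stack.Pop();

                int nameStart = operand.IndexOf("::", StringComparison.Ordinal) + 2;
                string name = operand.Substring(nameStart, open - nameStart).Trim('\'');
                string call = stack.Pop() + "." + name + "(" + string.Join(", ", args) + ")";

                // operand starts with "instance <return type> ..."
                if (operand.Split(' ')[1] == "void")
                    output.Add(call + ";");
                else
                    stack.Push(call);
            }
            else
            {
                throw new NotSupportedException(op);
            }
        }
        return output;
    }

    static void Main()
    {
        string[] locals = { "m", "a", "b", "p" };
        string[] il =
        {
            "nop",
            "newobj instance void DisasmIL.Math::.ctor()",
            "stloc.0",
            "ldc.i4.s 20",
            "stloc.1",
            "ldc.i4.s 50",
            "stloc.2",
            "ldloc.0", "ldloc.1", "ldloc.2",
            "callvirt instance int32 DisasmIL.Math::'add'(int32,int32)",
            "stloc.3",
            "ret"
        };
        foreach (string statement in Decompile(locals, il))
            Console.WriteLine(statement);
    }
}
```

For the instruction list above this should produce the four assignment statements from the walkthrough, in the same order.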