Monday, May 28, 2007

MSIL to C# -> The theory development begins

Welcome to my journey of writing a .NET assembly decompiler.

First of all, I am trying to develop a theory for decompiling MSIL. I simply do whatever each MSIL instruction asks me to do, but I do it keeping in mind that I am decompiling, not executing. So when an instruction asks me to push the value of a variable onto the stack, I push the name of that variable instead.

To follow the code, you need to know (or have a reference for) what each MSIL instruction actually does.
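To make the idea concrete, here is a small Python sketch of my own (the local names and runtime values are hypothetical). It runs the same two ldloc instructions and an add twice: once pushing values, as execution would, and once pushing names, as the decompiler does.

```python
# Hypothetical locals for illustration: names as declared, values at runtime.
LOCAL_NAMES = ["m", "a", "b", "p"]
LOCAL_VALUES = [None, 20, 50, 0]


def run_ldloc(stack, index):
    """Executing ldloc: push the current value of the local."""
    stack.append(LOCAL_VALUES[index])


def decompile_ldloc(stack, index):
    """Decompiling ldloc: push the local's name as source text."""
    stack.append(LOCAL_NAMES[index])


def run_add(stack):
    right, left = stack.pop(), stack.pop()
    stack.append(left + right)


def decompile_add(stack):
    right, left = stack.pop(), stack.pop()
    stack.append(f"{left} + {right}")


runtime, symbolic = [], []
for index in (1, 2):  # ldloc.1, ldloc.2
    run_ldloc(runtime, index)
    decompile_ldloc(symbolic, index)
run_add(runtime)      # add
decompile_add(symbolic)
print(runtime)   # [70]
print(symbolic)  # ['a + b']
```

The symbolic stack ends up holding source text instead of a number. Nested expressions would also need parentheses to preserve precedence; this sketch ignores that.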

Here is a sample program to test whether the concept works:

namespace DisasmIL
{
    class Math
    {
        public int add(int x, int y)
        {
            return x + y;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Math m;
            int a, b;
            m = new Math();
            a = 20;
            b = 50;
            int p = m.add(a, b);
        }
    }
}

We only examine one method, Main. When Main is compiled, it takes the following form:
.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 23 (0x17)
  .maxstack 3
  .locals init (
    [0] class DisasmIL.Math m,
    [1] int32 a,
    [2] int32 b,
    [3] int32 p
  )
  IL_0000: nop
  IL_0001: newobj instance void DisasmIL.Math::.ctor()
  IL_0006: stloc.0
  IL_0007: ldc.i4.s 20
  IL_0009: stloc.1
  IL_000a: ldc.i4.s 50
  IL_000c: stloc.2
  IL_000d: ldloc.0
  IL_000e: ldloc.1
  IL_000f: ldloc.2
  IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
  IL_0015: stloc.3
  IL_0016: ret
} // end of method Program::Main
=============================================================
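Before walking through the listing, the instruction lines have to be pulled out of the ildasm text. A minimal Python sketch, assuming every instruction line has the form "IL_xxxx: opcode operand" as above; the regex and parse_instructions are my own illustration:

```python
import re

# One ildasm instruction line: "IL_0001: newobj instance void DisasmIL.Math::.ctor()"
INSTRUCTION = re.compile(r"^\s*(IL_[0-9a-fA-F]+):\s*(\S+)\s*(.*?)\s*$")


def parse_instructions(listing):
    """Return (label, opcode, operand) tuples; directives and comments are skipped."""
    instructions = []
    for line in listing.splitlines():
        match = INSTRUCTION.match(line)
        if match:
            instructions.append(match.groups())
    return instructions


MAIN_START = """
  .maxstack 3
  IL_0000: nop
  IL_0001: newobj instance void DisasmIL.Math::.ctor()
  IL_0006: stloc.0
  IL_0007: ldc.i4.s 20
"""

for label, opcode, operand in parse_instructions(MAIN_START):
    print(label, opcode, repr(operand))
```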

We parse line by line:
------------------------------------------------------------------------
.method private hidebysig static void Main(string[] args) cil managed

This is the method declaration. We emit the C# signature followed by the opening curly brace.
Output code:
static void Main(string[] args)
{
Stack: [empty]
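The signature can be produced mechanically. A minimal sketch, assuming the directive fits on one line and that dropping hidebysig and private is enough (private is the default accessibility for C# class members). Parameter types are not translated here, and method_header is a hypothetical helper name:

```python
# IL attributes with no C# keyword. "private" is dropped too, because it is
# the default accessibility for C# class members.
DROPPED = {"hidebysig", "private"}


def method_header(directive):
    """Turn a one-line .method directive into a C# signature plus opening brace."""
    text = directive.strip()
    if not text.startswith(".method"):
        raise ValueError("not a .method directive")
    text = text[len(".method"):]
    # Cut off implementation flags such as "cil managed" after the parameter list.
    text = text[: text.rindex(")") + 1]
    words = [word for word in text.split() if word not in DROPPED]
    return " ".join(words) + "\n{"


print(method_header(".method private hidebysig static void Main(string[] args) cil managed"))
```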
------------------------------------------------------------------------
.entrypoint
// Code size 23 (0x17)
.maxstack 3
.locals init (
[0] class DisasmIL.Math m,
[1] int32 a,
[2] int32 b,
[3] int32 p
)

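These lines are mostly self-explanatory: .entrypoint marks Main as the program's entry point, the comment gives the code size in bytes, .maxstack 3 says the evaluation stack never holds more than three items, and .locals init declares the local variables (zero-initialized). Only the locals produce output: we declare the variables.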
Output code:
DisasmIL.Math m;
int a;
int b;
int p;
Stack: [empty]
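Turning .locals init into declarations is mostly a type-name mapping. In this sketch the alias table is deliberately partial and declare_locals is a hypothetical helper; it assumes one local per line, as in the ildasm output above:

```python
import re

# Partial map from ILAsm built-in type names to C# keywords.
TYPE_ALIASES = {
    "bool": "bool", "char": "char", "string": "string", "object": "object",
    "int8": "sbyte", "uint8": "byte", "int16": "short", "int32": "int",
    "int64": "long", "float32": "float", "float64": "double",
}

# "[0] class DisasmIL.Math m," -> type "class DisasmIL.Math", name "m"
LOCAL = re.compile(r"\[\d+\]\s+(.+?)\s+(\w+)\s*,?\s*$")


def declare_locals(directive):
    """Return (names, declarations) for a .locals init block, one local per line."""
    names, declarations = [], []
    for line in directive.splitlines():
        match = LOCAL.search(line)
        if not match:
            continue
        il_type, name = match.groups()
        for prefix in ("class ", "valuetype "):
            if il_type.startswith(prefix):
                il_type = il_type[len(prefix):]
        names.append(name)
        declarations.append(f"{TYPE_ALIASES.get(il_type, il_type)} {name};")
    return names, declarations


LOCALS = """.locals init (
    [0] class DisasmIL.Math m,
    [1] int32 a,
    [2] int32 b,
    [3] int32 p
)"""

names, declarations = declare_locals(LOCALS)
print("\n".join(declarations))
```

The names list is kept because later steps (ldloc, stloc) refer to locals by index only.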
------------------------------------------------------------------------
IL_0000: nop

No operation (nop), so there is nothing to do.
Output code: [none]
Stack: [empty]
------------------------------------------------------------------------
IL_0001: newobj instance void DisasmIL.Math::.ctor()

Creates a new instance of DisasmIL.Math using its default constructor,
so we push the expression "new DisasmIL.Math()" onto our stack.
Output code: [none]
Stack: new DisasmIL.Math()
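The newobj step, sketched for parameterless constructors only. A constructor with arguments would first pop them off the stack, which this hypothetical newobj helper does not attempt:

```python
def newobj(stack, operand):
    """Push 'new T()' for a parameterless constructor call (stack holds C# text)."""
    # operand example: "instance void DisasmIL.Math::.ctor()"
    left, ctor = operand.rsplit("::", 1)   # "instance void DisasmIL.Math", ".ctor()"
    type_name = left.split()[-1]           # "DisasmIL.Math"
    if ctor != ".ctor()":
        raise NotImplementedError("constructors with parameters are not handled here")
    stack.append(f"new {type_name}()")


stack = []
newobj(stack, "instance void DisasmIL.Math::.ctor()")
print(stack)  # ['new DisasmIL.Math()']
```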
------------------------------------------------------------------------
IL_0006: stloc.0

Pop the top of the stack and assign it to local variable 0 (m).
Output code:
m = new DisasmIL.Math();
Stack: [empty]
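Every stloc follows the same pattern, so one handler covers all of them. This sketch handles only the short forms stloc.0 to stloc.3 that appear in Main; stloc.s and stloc carry the index as an operand and are left out. The helper name and signature are my own:

```python
# Short forms only: the local's index is the last character of the opcode.
SHORT_STLOC = {f"stloc.{k}" for k in range(4)}


def stloc(stack, output, local_names, opcode):
    """Pop an expression off the symbolic stack and emit an assignment statement."""
    if opcode not in SHORT_STLOC:
        raise NotImplementedError(f"{opcode} is not handled in this sketch")
    target = local_names[int(opcode[-1])]
    output.append(f"{target} = {stack.pop()};")


stack, output = ["new DisasmIL.Math()"], []
stloc(stack, output, ["m", "a", "b", "p"], "stloc.0")
print(output)  # ['m = new DisasmIL.Math();']
```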
------------------------------------------------------------------------
IL_0007: ldc.i4.s 20

Push the constant 20 onto the stack.
Output code: [none]
Stack: 20
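ldc.i4.s (which takes a one-byte operand) is one member of a small family of int32 constant loads; ldc.i4.0 to ldc.i4.8 and ldc.i4.m1 encode the value in the opcode itself. A sketch covering them, assuming the symbolic stack holds C# source text (ldc_i4 is a hypothetical name):

```python
SHORT_LDC = {f"ldc.i4.{k}" for k in range(9)}  # ldc.i4.0 .. ldc.i4.8


def ldc_i4(stack, opcode, operand=""):
    """Push an int32 constant onto the symbolic stack as C# literal text."""
    if opcode in ("ldc.i4", "ldc.i4.s"):
        value = int(operand, 0)   # accepts decimal or 0x-prefixed hex text
    elif opcode == "ldc.i4.m1":
        value = -1
    elif opcode in SHORT_LDC:
        value = int(opcode[-1])   # the constant is encoded in the opcode
    else:
        raise NotImplementedError(opcode)
    stack.append(str(value))


stack = []
ldc_i4(stack, "ldc.i4.s", "20")
ldc_i4(stack, "ldc.i4.3")
ldc_i4(stack, "ldc.i4.m1")
print(stack)  # ['20', '3', '-1']
```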
------------------------------------------------------------------------
IL_0009: stloc.1

Pop the top value and assign it to local variable 1 (a).
Output code:
a = 20;
Stack: [empty]
------------------------------------------------------------------------
IL_000a: ldc.i4.s 50

Push the constant 50 onto the stack.
Output code: [none]
Stack: 50
------------------------------------------------------------------------
IL_000c: stloc.2

Pop the top value and assign it to local variable 2 (b).
Output code:
b = 50;
Stack: [empty]
------------------------------------------------------------------------
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: ldloc.2

Push local variables 0, 1 and 2 (that is, their names m, a and b) onto the stack.
Output code: [none]
Stack: m, a, b (b is on top)
-------------------------------------------------------------------------
IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
IL_0015: stloc.3

callvirt calls add on the instance at top-2 (m), passing top-1 (a) and
top (b) as its arguments; all three are popped. When a method returns a
value, that value is pushed onto the stack, so we check the next
instruction: if it is a stloc, we emit an assignment of the call's
result. Here stloc.3 assigns the return value to local variable 3 (p).

Output code:
p = m.add(a, b);
Stack: [empty]
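The call step, including the look-ahead for stloc, might look like this. The regex only understands operands shaped like the one above (no generics, no class return types), and call_expression and decompile_callvirt are hypothetical names. When the next instruction is not a stloc, the sketch pushes the call expression back so a later instruction can consume it:

```python
import re

# "instance int32 DisasmIL.Math::'add'(int32,int32)" -> ("int32", "add", "int32,int32")
CALL = re.compile(r"^(?:instance\s+)?(\S+)\s+[\w.]+::'?(\w+)'?\((.*)\)$")
SHORT_STLOC = {f"stloc.{k}" for k in range(4)}


def call_expression(stack, operand):
    """Pop the arguments (last one on top), then the instance; build the call text."""
    match = CALL.match(operand)
    if not match:
        raise NotImplementedError(f"unsupported call operand: {operand}")
    return_type, method, params = match.groups()
    arg_count = len([p for p in params.split(",") if p.strip()])
    args = [stack.pop() for _ in range(arg_count)][::-1]
    instance = stack.pop()
    return f"{instance}.{method}({', '.join(args)})", return_type != "void"


def decompile_callvirt(stack, output, local_names, operand, next_opcode):
    """Handle callvirt; return how many instructions were consumed (1 or 2)."""
    expr, returns_value = call_expression(stack, operand)
    if not returns_value:
        output.append(f"{expr};")                 # a void call is a statement
        return 1
    if next_opcode in SHORT_STLOC:                # look ahead: result stored in a local
        output.append(f"{local_names[int(next_opcode[-1])]} = {expr};")
        return 2
    stack.append(expr)                            # result used by a later instruction
    return 1


stack, output = ["m", "a", "b"], []
consumed = decompile_callvirt(stack, output, ["m", "a", "b", "p"],
                              "instance int32 DisasmIL.Math::'add'(int32,int32)", "stloc.3")
print(output, consumed)  # ['p = m.add(a, b);'] 2
```

Pushing the result unconditionally and letting the ordinary stloc handler emit the assignment would give the same output for this example; the look-ahead mirrors the approach described above.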
--------------------------------------------------------------------------
IL_0016: ret

Main returns void, so ret produces no code except the closing curly brace.
Output code:
}

Stack: [empty]
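A sketch of ret, assuming it is the last instruction of the method, as it is in Main; an early ret in the middle of a method would need more work. For a non-void method the value on top of the stack becomes the return expression. The second call below shows what a method like add would produce if x + y were on the stack:

```python
def ret(stack, output, returns_void):
    """Handle a ret that ends the method body (the only case this sketch supports)."""
    if returns_void:
        if stack:
            raise ValueError(f"void method returns with values on the stack: {stack}")
    else:
        output.append(f"return {stack.pop()};")
    output.append("}")


main_output = []
ret([], main_output, returns_void=True)
add_output = []
ret(["x + y"], add_output, returns_void=False)
print(main_output)  # ['}']
print(add_output)   # ['return x + y;', '}']
```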

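If you put the output code from all the steps together, you get C# code equivalent to the original: the declarations are split out and the type name is fully qualified, but the statements are the same. This works for simple cases; I still need to test whether it holds up in more complex situations. Miles to go...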
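Putting the steps together, here is a compact Python sketch of the whole walk-through, hard-wired to the Main listing above. It handles only the opcodes this example uses, takes the signature and local declarations as inputs instead of parsing them, and raises NotImplementedError for any other opcode. The decompile function is my own illustration, not a finished decompiler:

```python
import re

MAIN_IL = """
IL_0000: nop
IL_0001: newobj instance void DisasmIL.Math::.ctor()
IL_0006: stloc.0
IL_0007: ldc.i4.s 20
IL_0009: stloc.1
IL_000a: ldc.i4.s 50
IL_000c: stloc.2
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: ldloc.2
IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
IL_0015: stloc.3
IL_0016: ret
"""

INSTRUCTION = re.compile(r"^\s*IL_[0-9a-fA-F]+:\s*(\S+)\s*(.*?)\s*$")
CALL = re.compile(r"^(?:instance\s+)?(\S+)\s+[\w.]+::'?(\w+)'?\((.*)\)$")
LDLOC = {f"ldloc.{k}": k for k in range(4)}   # short forms only
STLOC = {f"stloc.{k}": k for k in range(4)}


def decompile(listing, signature, local_types, local_names):
    """Symbolically execute a straight-line IL listing and return C# source text."""
    code = [signature, "{"]
    code += [f"{t} {n};" for t, n in zip(local_types, local_names)]
    instructions = [m.groups() for m in map(INSTRUCTION.match, listing.splitlines()) if m]
    stack = []  # holds C# expression text, not runtime values
    i = 0
    while i < len(instructions):
        opcode, operand = instructions[i]
        if opcode == "nop":
            pass
        elif opcode == "newobj":
            left, ctor = operand.rsplit("::", 1)
            if ctor != ".ctor()":
                raise NotImplementedError(operand)
            stack.append(f"new {left.split()[-1]}()")
        elif opcode == "ldc.i4.s":
            stack.append(str(int(operand)))
        elif opcode in LDLOC:
            stack.append(local_names[LDLOC[opcode]])
        elif opcode in STLOC:
            code.append(f"{local_names[STLOC[opcode]]} = {stack.pop()};")
        elif opcode == "callvirt":
            return_type, method, params = CALL.match(operand).groups()
            arg_count = len([p for p in params.split(",") if p.strip()])
            args = [stack.pop() for _ in range(arg_count)][::-1]
            expr = f"{stack.pop()}.{method}({', '.join(args)})"
            next_opcode = instructions[i + 1][0] if i + 1 < len(instructions) else ""
            if return_type == "void":
                code.append(f"{expr};")
            elif next_opcode in STLOC:
                code.append(f"{local_names[STLOC[next_opcode]]} = {expr};")
                i += 1  # the stloc was consumed by the look-ahead
            else:
                stack.append(expr)
        elif opcode == "ret":
            code.append("}")  # assumes ret ends a void method
        else:
            raise NotImplementedError(opcode)
        i += 1
    return "\n".join(code)


print(decompile(MAIN_IL, "static void Main(string[] args)",
                ["DisasmIL.Math", "int", "int", "int"], ["m", "a", "b", "p"]))
```

Running it prints the same lines as the "Output code" parts of the walk-through, in order.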

2 comments:

  1. Did you think about decompiling obfuscated code?

  2. You need guts to think yourself capable of learning from this legend.

    I was his junior in the undergrad school. To many of us, he was the sky which was our limit.

    He died (married) a couple of months ago. Peace be upon him.
