Wednesday, June 6, 2007

MISL to C# (Sharp) -> Loops - the simple cases

I wish, as a decompiler writer, there would be no loops. Programmers use thousands of goto statements with if statements. But as it is not the case I must understand how to parse MSIL instructions that were generated from the loops.

Of the three types of most common loops (for, while, do-while) the while loop is the basic one. The block that is generated from any type of these loops usually has a conditional jump (usually a brtrue.s

Monday, June 4, 2007

MISL to C# (Sharp) -> Iff (if and only if)

Today I’ll talk about the most basic and useful if-then structure. The compiler generates a conditional branch. We create a block of instructions for the structure. The block is separated by the branch instruction (like brtrue) and the branch label (like IL_0019 - where the code jumps). And our condition is on the stack. If we find a true condition branch we negate it and put it as a if statement condition. The block is initially a block of MSIL that we will convert to C# code later (
may need recursion here??). Please note that the labels are not stored in MSIL. It is just the byte offset of the MSIL in a method.

If you have not already read you are requested to read my previous post.


We take a simple method to test our theory.

public int IfStructure(int a, int b)
{
bool CS_4_0001;
if (a < b)
{
System.Console.Write("Condition is true");
}
return b;
}

Here is the MSIL code generated by Visual Studio 2005 compiler.

.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
// Code size 31 (0x1f)
.maxstack 2
.locals init ([0] int32 CS$1$0000,
[1] bool CS$4$0001)
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
IL_0003: clt
IL_0005: ldc.i4.0
IL_0006: ceq
IL_0008: stloc.1
IL_0009: ldloc.1
IL_000a: brtrue.s IL_0019

IL_000c: nop
IL_000d: ldstr "Condition is true"
IL_0012: call void [mscorlib]System.Console::Write(string)
IL_0017: nop
IL_0018: nop

IL_0019: ldarg.2
IL_001a: stloc.0
IL_001b: br.s IL_001d
IL_001d: ldloc.0
IL_001e: ret
} // end of method ControlStructures::IfStructure

We now start parsing:
-------------------------------------------------------------------------------------
.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
// Code size 31 (0x1f)
.maxstack 2
.locals init ([0] int32 CS$1$0000,
[1] bool CS$4$0001)
--

These lines generate output that we do without any processing of MSIL. There is method definition and local variables. We change the local variable names to C# current names without conflict. For simplicity here we just replace '$' with '_'. One thing to evaluate the MSIL instructions we must keep a map of local variables with variable number. In this example CS$1$0000 is local variable 0 of type int. For clarity we do not show the map here. A simple STL map should work.

Output:
public int IfStructure(int a, int b)
{
int32 CS_1_0000;
bool CS_4_0001;

Stack: [Empty]

-------------------------------------------------------------------------------------
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
--
So we push method argument 1 and 2 to stack:

Output: [None]

Stack:
a,b

-------------------------------------------------------------------------------------
IL_0003: clt
--

This instructs us if stack top-1 is less than stack top. The two elements are popped from stack and result goes to stack. No output of course.

Output: [None]

Stack:
a<b

-------------------------------------------------------------------------------------
IL_0005: ldc.i4.0
--

Load (aka push) constant integer of value 0 on stack.

Output: [None]
Stack: a<b,0
-------------------------------------------------------------------------------------
IL_0006: ceq
--

Check if stack top-1 equals stack top. Result goes to stack.
Output: [None]
Stack: a < b == 0

-------------------------------------------------------------------------------------
IL_0008: stloc.1
IL_0009: ldloc.1
--

What else? Store stack top in local variable 1 and load that on tack again. We decided previously when we store some value in a local variable we use assignment to that variable and output that code. For clarity I added parentheses:

Output:CS_4_0001 = (a < b == 0)
Stack:CS_4_0001

-------------------------------------------------------------------------------------
IL_000a: brtrue.s IL_0019
--

We have got a conditional branch. We create a block starting from here to IL_0019. And put them in curly braces. And our condition is on the stack. We find a true condition branch so we negate it and put it as if structure as I said at the beginning.

Output:
if(!CS_4_0001)
{
IL_000c: nop
IL_000d: ldstr "Condition is true"
IL_0012: call void [mscorlib]System.Console::Write(string)
IL_0017: nop
IL_0018: nop
}

Stack: [Empty]

-------------------------------------------------------------------------------------
IL_0019: ldarg.2
IL_001a: stloc.0
IL_001b: br.s IL_001d
IL_001d: ldloc.0
IL_001e: ret

We do not parse them here. They are very simple to understand.
===========================================================

Ok we now can work on little more complex code3s than easiest. This will also produce codes that was generated by "for structure" but in a funny way. If we add a little more intelligence to produce "goto" output code for special branching that we cannot handle with if, we get following result.

The code like this:


for(int i=0;i<10;i++)
{
...
}
...

will be converted to:

int i;
i=0;
label_1:

if(i<10)
{
---
i++;
goto label_1;
}
...

For is not our today’s material. We'll look at loops next time.

=====================================================================================
Now you may find that our theory generates a funny code block like:

CS$4$0001=((a<b)==0);

if(!CS$4$0001)
{

---

}


Here the optimization comes to scene. But we skip them for future.