As a decompiler writer, I wish there were no loops and that programmers simply wrote thousands of goto statements combined with if statements. Since that is not the case, I have to understand how to parse the MSIL instructions that are generated from loops.
Of the three most common loop types (for, while, do-while), the while loop is the basic one. The block generated from any of these loops usually has a conditional jump (usually a brtrue.s) as its last instruction. The difference from the if structure is that the instruction jumps backward, to an offset lower than its own. The for and while loops also have an unconditional branch (br.s) just before the block, to an offset between the start and end of the block; its target is usually the beginning of the condition-checking instructions. The do-while loop lacks this branch because it does not test the condition until it reaches the end of the block. So we get an instruction block like the following MSIL block:
-------------------------------------------------------------------------------------
IL_0010: br.s IL_005a ;do-while loop does not have this line
IL_0012: nop
[any type and number of instructions]
IL_0059: nop
[condition-check instructions, leaving a boolean value on the stack]
IL_0060: brtrue.s IL_0012
-------------------------------------------------------------------------------------
In my previous post on the if structure I showed how to build the boolean condition for an if. Things are similar for loops: take the top stack element when the conditional jump is found and use it as the loop statement's condition. Unlike the if case, a backward brtrue.s needs no reversal, because the jump is taken (and the loop continues) while the value is true; reverse it (just add a !) only for a brfalse.s jump. Please note that the conditional jump targets the instruction at or just after the starting instruction of the block.
This shows that we cannot have a single-pass decompiler: we must identify the code blocks in a pass that runs before the final pass. So far we can identify blocks of if, for, while, and do-while structures from the conditional jump instructions and their destinations. An if has its destination offset after the current offset; the others have it before the current offset. The for and while loops cannot be distinguished very clearly from each other, but the do-while does not have a jump at the beginning. And of course there can be nested blocks, generated from nested loops.
There are also some more complex variations of loops, such as infinite loops and the foreach loop.
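The block-identification rules described above can be sketched as a small first pass in Python. This is only an illustration under stated assumptions, not the decompiler's actual code: the names `Instr`, `classify_branch`, and `find_blocks` are hypothetical, offsets are plain integers, the entry br.s is assumed to sit immediately before the jump target as in the listing, and `ldloc.0` is just a stand-in for the condition-check instructions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Instr:
    offset: int                   # IL offset of the instruction
    opcode: str                   # e.g. "br.s", "brtrue.s", "nop"
    target: Optional[int] = None  # branch target offset, if the opcode branches

CONDITIONAL = {"brtrue", "brtrue.s", "brfalse", "brfalse.s"}
UNCONDITIONAL = {"br", "br.s"}

def classify_branch(instrs: List[Instr], index: int) -> str:
    """Classify the conditional branch at instrs[index] by its jump direction."""
    branch = instrs[index]
    if branch.opcode not in CONDITIONAL or branch.target is None:
        raise ValueError("not a conditional branch")
    # Forward jump: the branch skips a block, so it belongs to an if.
    if branch.target > branch.offset:
        return "if"
    # Backward jump: some kind of loop. Find the instruction it jumps to.
    offsets = [ins.offset for ins in instrs]
    if branch.target not in offsets:
        raise ValueError("branch target is not an instruction offset")
    target_index = offsets.index(branch.target)
    # for/while: an unconditional branch right before the block start
    # that jumps into the block (to the condition check).
    if target_index > 0:
        entry = instrs[target_index - 1]
        if (entry.opcode in UNCONDITIONAL and entry.target is not None
                and branch.target < entry.target <= branch.offset):
            return "while-or-for"
    # No entry branch: the body runs once before the condition is tested.
    return "do-while"

def find_blocks(instrs: List[Instr]) -> List[Tuple[str, int, int]]:
    """First pass: one (kind, start_offset, end_offset) entry per conditional branch."""
    blocks = []
    for i, ins in enumerate(instrs):
        if ins.opcode in CONDITIONAL and ins.target is not None:
            start, end = sorted((ins.offset, ins.target))
            blocks.append((classify_branch(instrs, i), start, end))
    return blocks

# The listing from this post, with the bracketed placeholders left out.
listing = [
    Instr(0x10, "br.s", 0x5A),     # a do-while loop does not have this line
    Instr(0x12, "nop"),
    Instr(0x59, "nop"),
    Instr(0x5A, "ldloc.0"),        # stand-in for the condition check
    Instr(0x60, "brtrue.s", 0x12),
]

if __name__ == "__main__":
    print(find_blocks(listing))
```

Dropping the first instruction from `listing` makes the same branch classify as a do-while. Nested loops simply produce several entries whose offset ranges contain one another, which a later pass can use to build the nesting.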
Those variations are not much different from the basic loops, but I want to discuss control structures in more detail later. This post is a little more theoretical; see you again very soon with more interesting things.