Saturday, March 18, 2006

Life through ELF loaders.....

Hey its a long time I have been blogging...... yes its really a tough last two weeks. From last two weeks I have been trying to get a work around to the 'exec-shield' problem we were facing. I have got a lot of new ideas on implementing this. The following are some of the them pretty generic though
o My current problem boils down to create a executable from the running program itsself. To get this working I have been studying the kernels code in 'fs/binfmt_elf.c' especially code around 'load_elf_binary', The following are my finding might find it useful (for myself to look after some time).
-----1.)The kernels loader does not do any great , it basically gets all the metadata from the elf headers and just does the dirty work of just mapping and transferring the control. The summary of what excatly the kernel does is
a.) set the entry point from the (Elf32_Ehdr *).e_entry as the start jump to the program
b.) Load all the segments (PHDRS) the loader just deals with the program headers, it does not use any section headers. It loads all the segments with type (Elf32_Phdr *).type == PT_LOAD. If you do a 'readelf --segments a.out' you can see the segments

(gdb) shell readelf --segments a.out

Elf file type is EXEC (Executable file)
Entry point 0x80482a0
There are 7 program headers, starting at offset 52

Program Headers:
Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
PHDR           0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4
INTERP         0x000114 0x08048114 0x08048114 0x00013 0x00013 R   0x1
    [Requesting program interpreter: /lib/]
LOAD           0x000000 0x08048000 0x08048000 0x004cc 0x004cc R E 0x1000
LOAD           0x0004cc 0x080494cc 0x080494cc 0x00104 0x00198 RW  0x1000

DYNAMIC        0x0004dc 0x080494dc 0x080494dc 0x000c8 0x000c8 RW  0x4
NOTE           0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4
STACK          0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x4

Section to Segment mapping:
Segment Sections...
 01     .interp
02 .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame
 03     .data .dynamic .ctors .dtors .jcr .got .bss
 04     .dynamic
 05     .note.ABI-tag

c.) One more important thing about how excatly it sets the 'brk' base I guess to set up the 'brk' base the kernel only (also see the copy of the email posted on linux-kernel mailing list
Hello All,

I have been working on an idea of creating an executable from a
running process image.

Process migration among the nodes in distributed computing,
checkpointing process state.


The basis of my idea would be update the existing executable with
extra PHDRS (Program Headers) with type PT_LOAD and each of these
headers corresponding the vaddr mapping from /proc//maps.

I have done some basic study of kernels loders code in
'fs/binfmt_elf.c' especially code in 'load_elf_binary' function, the
following is my understanding.

foreach (phdr in elf_header){

if(phdr->type == PT_LOAD){
if( phdr->filesize <>memsize){
/* Segment with .bss, so update brk and bss*/
else {
/* Just map it*/
/*Update brk bss*/

from the above the kernel is updating brk, thus creating the start of
sbrk(0) only when it sees a PT_LOAD segment with filesize less than memsize. 
The kernel will set brk base i.e sbrk(0) to the value phdr.vaddr+phdr.memsize 
of the last PT_LOAD
segment its mapping? so do I need to reoder my PT_LOAD segments so
that the heap goes as the last PT_LOAD segment?

Is there any way we can tell the elf loader to force the vaddr for
sbrk(0) i.e brk base ?

Let me know your suggestion on this idea?

Really appreciate your valuable comments.


[PS: I dont know if some one has already implemented this idea??]

-----2.) Also found out that the virtual address's for the sections in the segments are the excat virtual address if they are within the range of corresponding phdr. that is I found that if .bss section has a vaddr of 0x00001000 and .data has 0x00000010 and there is a corresponding mapping (rw-p) in /proc//maps as 0x00000004-0x00010000 which includes segment to section mapping in the order '.data ...... .bss' note that .data will not start at 0x00000004 it infact still starts at 0x00000010 same with .bss. This is very logical since if the kernel's loader changes the mapping of the .data section the all the code referencing the virtual address's has to be changes. So the segments in /proc//maps file are not the segments excatly corresponding to 'readelf --segments a.out' infact they are bigger carousels with wrap around the the excat segment address's for page alignment.
More ideas next time..........