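I've noticed that many people here regularly do development around the PE format, such as parsing PE files and extracting their contents.
Typical uses include building static heuristic virus-detection models, classifying malware samples, and detecting or removing packers.
A search turned up nothing on the forum about what I want to cover, so I'd like to recommend pefile, a Python library.
It is an open-source project released under the MIT licence, so you are free to build on top of it.
Download: http://code.google.com/p/pefile/
Here is what you need to get started:
Without further ado, let me show you how to use it. Once you have read through, you will see how capable pefile is.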
1. Install Python first, of course.
2. Download pefile, extract it, and create a new file petest.py with the following content (Experiment 1):
import pefile  # remember to import pefile

PEfile_Path = r"C:\temp\test.exe"
pe = pefile.PE(PEfile_Path)  # parse the whole file

print(PEfile_Path)
print(pe)  # printing the PE object dumps every parsed structure
Experiment 1 output:
C:\temp\test.exe

----------DOS_HEADER----------
[IMAGE_DOS_HEADER]
e_magic: 0x5A4D
e_cblp: 0x90
e_cp: 0x3
e_crlc: 0x0
e_cparhdr: 0x4
e_minalloc: 0x0
e_maxalloc: 0xFFFF
e_ss: 0x0
e_sp: 0xB8
e_csum: 0x0
e_ip: 0x0
e_cs: 0x0
e_lfarlc: 0x40
e_ovno: 0x0
e_res:
e_oemid: 0x0
e_oeminfo: 0x0
e_res2:
e_lfanew: 0xD0

----------NT_HEADERS----------
[IMAGE_NT_HEADERS]
Signature: 0x4550

----------FILE_HEADER----------
[IMAGE_FILE_HEADER]
Machine: 0x14C
NumberOfSections: 0x2
TimeDateStamp: 0x46A8C07C [Thu Jul 26 15:40:44 2007 UTC]
PointerToSymbolTable: 0x0
NumberOfSymbols: 0x0
SizeOfOptionalHeader: 0xE0
Characteristics: 0x10F
Flags: IMAGE_FILE_LOCAL_SYMS_STRIPPED, IMAGE_FILE_32BIT_MACHINE, IMAGE_FILE_EXECUTABLE_IMAGE, IMAGE_FILE_LINE_NUMS_STRIPPED, IMAGE_FILE_RELOCS_STRIPPED

----------OPTIONAL_HEADER----------
[IMAGE_OPTIONAL_HEADER]
Magic: 0x10B
MajorLinkerVersion: 0x6
MinorLinkerVersion: 0x0
SizeOfCode: 0x420
SizeOfInitializedData: 0x130
SizeOfUninitializedData: 0x0
AddressOfEntryPoint: 0x522
BaseOfCode: 0x220
BaseOfData: 0x640
ImageBase: 0x400000
SectionAlignment: 0x10
FileAlignment: 0x10
MajorOperatingSystemVersion: 0x4
MinorOperatingSystemVersion: 0x0
MajorImageVersion: 0x0
MinorImageVersion: 0x0
MajorSubsystemVersion: 0x4
MinorSubsystemVersion: 0x0
Reserved1: 0x0
SizeOfImage: 0x768
SizeOfHeaders: 0x420
CheckSum: 0x0
Subsystem: 0x2
DllCharacteristics: 0x0
SizeOfStackReserve: 0x100000
SizeOfStackCommit: 0x1000
SizeOfHeapReserve: 0x100000
SizeOfHeapCommit: 0x1000
LoaderFlags: 0x0
NumberOfRvaAndSizes: 0x10
DllCharacteristics:

----------PE Sections----------
[IMAGE_SECTION_HEADER]
Name: .text
Misc: 0x418
Misc_PhysicalAddress: 0x418
Misc_VirtualSize: 0x418
VirtualAddress: 0x220
SizeOfRawData: 0x420
PointerToRawData: 0x420
PointerToRelocations: 0x0
PointerToLinenumbers: 0x0
NumberOfRelocations: 0x0
NumberOfLinenumbers: 0x0
Characteristics: 0x60000020
Flags: IMAGE_SCN_CNT_CODE, IMAGE_SCN_MEM_EXECUTE, IMAGE_SCN_MEM_READ
Entropy: 6.385628 (Min=0.0, Max=8.0)
MD5 hash: 37ae973124ba5655ce156536f4018759
SHA-1 hash: 6354d772105b66ac33fb8950b76a289edafa230f
SHA-256 hash: f6dfe337c6c6278e60a687552d8fc3be2a2ed41a4278713cfd0dc631296befdc
SHA-512 hash: 9d22cdd011d7276f47e3b1844804d58be2e73eef826ad285769d449f03dbfcde743303b31a9172e513be571432b7b2080afe571e5819ec7968acd76c0d82207a

[IMAGE_SECTION_HEADER]
Name: .rsrc
Misc: 0x128
Misc_PhysicalAddress: 0x128
Misc_VirtualSize: 0x128
VirtualAddress: 0x640
SizeOfRawData: 0x130
PointerToRawData: 0x840
PointerToRelocations: 0x0
PointerToLinenumbers: 0x0
NumberOfRelocations: 0x0
NumberOfLinenumbers: 0x0
Characteristics: 0x40000040
Flags: IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ
Entropy: 2.905524 (Min=0.0, Max=8.0)
MD5 hash: cfd4f1a98445485c616ea2ff9390278e
SHA-1 hash: 7480ffe5427a540e17353df9c490dbba86fd0c3b
SHA-256 hash: 93f9ad56e464614b6aa9521f2b80f3f7f2fd5e2b6d8d6fd6489a0b1cdb1f948e
SHA-512 hash: b054ba77825a4bb92d9beecb606d04f7a4bf4d16529d909e03e6b882175e23fb495c1c3dc9d921c3124210a6567bf68e70879d3163ece1a1cbb786f3ec94af43

----------Directories----------
[IMAGE_DIRECTORY_ENTRY_EXPORT]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_IMPORT]
VirtualAddress: 0x574
Size: 0x3C
[IMAGE_DIRECTORY_ENTRY_RESOURCE]
VirtualAddress: 0x640
Size: 0x128
[IMAGE_DIRECTORY_ENTRY_EXCEPTION]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_SECURITY]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_BASERELOC]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_DEBUG]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_COPYRIGHT]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_GLOBALPTR]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_TLS]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_IAT]
VirtualAddress: 0x220
Size: 0x1C
[IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR]
VirtualAddress: 0x0
Size: 0x0
[IMAGE_DIRECTORY_ENTRY_RESERVED]
VirtualAddress: 0x0
Size: 0x0

----------Imported symbols----------
[IMAGE_IMPORT_DESCRIPTOR]
OriginalFirstThunk: 0x5B0
Characteristics: 0x5B0
TimeDateStamp: 0x0 [Thu Jan 01 00:00:00 1970 UTC]
ForwarderChain: 0x0
Name: 0x5E0
FirstThunk: 0x220
KERNEL32.dll.GetModuleHandleA Hint[294]

[IMAGE_IMPORT_DESCRIPTOR]
OriginalFirstThunk: 0x5B8
Characteristics: 0x5B8
TimeDateStamp: 0x0 [Thu Jan 01 00:00:00 1970 UTC]
ForwarderChain: 0x0
Name: 0x62C
FirstThunk: 0x228
USER32.dll.EndDialog Hint[185]
USER32.dll.GetDlgItemTextA Hint[260]
USER32.dll.DialogBoxParamA Hint[147]
USER32.dll.MessageBoxA Hint[446]

----------Resource directory----------
[IMAGE_RESOURCE_DIRECTORY]
Characteristics: 0x0
TimeDateStamp: 0x0 [Thu Jan 01 00:00:00 1970 UTC]
MajorVersion: 0x0
MinorVersion: 0x0
NumberOfNamedEntries: 0x0
NumberOfIdEntries: 0x1
Id: [0x5] (RT_DIALOG)
[IMAGE_RESOURCE_DIRECTORY_ENTRY]
Name: 0x5
OffsetToData: 0x80000018
[IMAGE_RESOURCE_DIRECTORY]
Characteristics: 0x0
TimeDateStamp: 0x0 [Thu Jan 01 00:00:00 1970 UTC]
MajorVersion: 0x0
MinorVersion: 0x0
NumberOfNamedEntries: 0x0
NumberOfIdEntries: 0x1
Id: [0x65]
[IMAGE_RESOURCE_DIRECTORY_ENTRY]
Name: 0x65
OffsetToData: 0x80000030
[IMAGE_RESOURCE_DIRECTORY]
Characteristics: 0x0
TimeDateStamp: 0x0 [Thu Jan 01 00:00:00 1970 UTC]
MajorVersion: 0x0
MinorVersion: 0x0
NumberOfNamedEntries: 0x0
NumberOfIdEntries: 0x1
[IMAGE_RESOURCE_DIRECTORY_ENTRY]
Name: 0x804
OffsetToData: 0x48
[IMAGE_RESOURCE_DATA_ENTRY]
OffsetToData: 0x6A0
Size: 0xC8
CodePage: 0x0
Reserved: 0x0
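Experiment 1 does nothing more than a plain print, yet it shows that pefile parses test.exe completely, from the DOS header through the OPTIONAL_HEADER to the PE sections, and every structure is fully available. Attentive readers will also notice that it computes the entropy and several hashes (MD5, SHA-1, SHA-256, SHA-512) for each section, and that it lists the imported and exported functions (this sample has no exports, so only imports appear).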
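To make the Entropy and hash lines in the dump less of a black box, here is a minimal sketch of how such per-section figures can be computed from raw bytes with the standard library alone. It is not pefile's own code. The helper names `shannon_entropy` and `describe` are mine, the sample buffer is a made-up stand-in for real section data, and the sketch assumes Python 3.

```python
import hashlib
import math
from collections import Counter


def shannon_entropy(data):
    """Return the Shannon entropy of a bytes object in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # byte value -> number of occurrences
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def describe(data):
    """Collect the same kind of figures the dump prints for each section."""
    return {
        "entropy": shannon_entropy(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for the bytes of a section
    sample = b"MZ" + bytes(range(256)) * 4
    for key, value in describe(sample).items():
        print(key, value)
```

With a real file you would feed it the bytes of one section, for example `describe(pe.sections[0].get_data())`. High entropy (close to 8.0) often hints at packed or encrypted content, which is why the value is useful for packer detection.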
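Of course, you may ask: are we really supposed to print everything and then parse and match strings to pull out the information we want? pefile is not that crude; it exposes the parsed fields as attributes. For example, to get the entry point:
Code:
print(hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint))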
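AddressOfEntryPoint is an RVA (an offset relative to ImageBase), not a file offset. The following sketch shows the usual arithmetic for turning it into a virtual address and a file offset, using the numbers from the Experiment 1 dump. The `Section` tuple and the two helper functions are illustrative names of my own, and the lookup ignores alignment rounding and RVAs that fall inside the headers.

```python
from collections import namedtuple

# Illustrative container; the field values below are copied from the dump
Section = namedtuple("Section", "name virtual_address virtual_size raw_pointer")

IMAGE_BASE = 0x400000
ENTRY_POINT_RVA = 0x522
SECTIONS = [
    Section(".text", 0x220, 0x418, 0x420),
    Section(".rsrc", 0x640, 0x128, 0x840),
]


def rva_to_va(rva, image_base):
    """Virtual address the RVA would have at the preferred load address."""
    return image_base + rva


def rva_to_offset(rva, sections):
    """File offset of an RVA, or None if no section contains it."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return rva - s.virtual_address + s.raw_pointer
    return None


if __name__ == "__main__":
    print(hex(rva_to_va(ENTRY_POINT_RVA, IMAGE_BASE)))   # 0x400522
    print(hex(rva_to_offset(ENTRY_POINT_RVA, SECTIONS)))  # 0x722
```

On a parsed file, pefile can do this lookup for you with `pe.get_offset_from_rva(rva)`, so the sketch is only meant to show what happens underneath.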
Experiment 2 code:
import pefile  # remember to import pefile

PEfile_Path = r"C:\temp\test.exe"
pe = pefile.PE(PEfile_Path)

print(PEfile_Path)
for section in pe.sections:
    print(section)  # dumps one IMAGE_SECTION_HEADER per section
Experiment 2 output:
C:\temp\test.exe
[IMAGE_SECTION_HEADER]
Name: .text
Misc: 0x418
Misc_PhysicalAddress: 0x418
Misc_VirtualSize: 0x418
VirtualAddress: 0x220
SizeOfRawData: 0x420
PointerToRawData: 0x420
PointerToRelocations: 0x0
PointerToLinenumbers: 0x0
NumberOfRelocations: 0x0
NumberOfLinenumbers: 0x0
Characteristics: 0x60000020
[IMAGE_SECTION_HEADER]
Name: .rsrc
Misc: 0x128
Misc_PhysicalAddress: 0x128
Misc_VirtualSize: 0x128
VirtualAddress: 0x640
SizeOfRawData: 0x130
PointerToRawData: 0x840
PointerToRelocations: 0x0
PointerToLinenumbers: 0x0
NumberOfRelocations: 0x0
NumberOfLinenumbers: 0x0
Characteristics: 0x40000040
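As you can see, the file has two sections, .text and .rsrc, and the header fields of each are listed. If you only need one particular field of one section, such as Characteristics, index into pe.sections (i is the zero-based section index):
print(hex(pe.sections[i].Characteristics))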
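The Characteristics value is a bit field, so it usually needs decoding before it tells you anything. Below is a small sketch that maps a value to flag names. The mask table is a partial list taken from the PE specification, and `decode_section_flags` is a name I made up for illustration.

```python
# Partial table of IMAGE_SCN_* masks from the PE specification
SECTION_FLAGS = [
    (0x00000020, "IMAGE_SCN_CNT_CODE"),
    (0x00000040, "IMAGE_SCN_CNT_INITIALIZED_DATA"),
    (0x00000080, "IMAGE_SCN_CNT_UNINITIALIZED_DATA"),
    (0x20000000, "IMAGE_SCN_MEM_EXECUTE"),
    (0x40000000, "IMAGE_SCN_MEM_READ"),
    (0x80000000, "IMAGE_SCN_MEM_WRITE"),
]


def decode_section_flags(characteristics):
    """Return the names of the known flags set in a Characteristics value."""
    return [name for mask, name in SECTION_FLAGS if characteristics & mask]


if __name__ == "__main__":
    # The two values from the Experiment 2 output
    for value in (0x60000020, 0x40000040):
        print(hex(value), ", ".join(decode_section_flags(value)))
```

For the two sections of test.exe this gives the same flag names as the Flags lines in the Experiment 1 dump: .text is readable, executable code, and .rsrc is read-only initialized data. A section that is both writable and executable is a common heuristic signal worth checking with the same table.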
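Experiment 3 code:
import pefile  # remember to import pefile

PEfile_Path = r"C:\temp\test.exe"
pe = pefile.PE(PEfile_Path)

print(PEfile_Path)
for importeddll in pe.DIRECTORY_ENTRY_IMPORT:
    # dll and name are bytes under Python 3, hence the decode()
    print(importeddll.dll.decode())
    # or use: print(pe.DIRECTORY_ENTRY_IMPORT[0].dll.decode())
    for importedapi in importeddll.imports:
        # name is None when a function is imported by ordinal only
        print(importedapi.name.decode() if importedapi.name else importedapi.ordinal)
        # or use: print(pe.DIRECTORY_ENTRY_IMPORT[0].imports[0].name.decode())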
Experiment 3 output:
C:\temp\test.exe
KERNEL32.dll
GetModuleHandleA
USER32.dll
EndDialog
GetDlgItemTextA
DialogBoxParamA
MessageBoxA
Experiment 3 shows that test.exe imports kernel32.dll and user32.dll, pulling in one API function from the former and four from the latter.
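For sample classification you will usually want the imports as data, not as printed lines. Here is a rough sketch that collects them into a dictionary keyed by DLL name. `imports_by_dll` and `DEMO_PE` are hypothetical names: the function only relies on the `DIRECTORY_ENTRY_IMPORT`, `dll`, `imports`, `name` and `ordinal` attributes used above, and `DEMO_PE` is a hand-built stand-in mirroring the Experiment 3 output so the example runs without a real file.

```python
from types import SimpleNamespace


def _text(value):
    """Decode bytes (as returned under Python 3) to str; pass str through."""
    return value.decode("ascii", "replace") if isinstance(value, bytes) else value


def imports_by_dll(pe):
    """Map each imported DLL name to the list of function names it provides."""
    result = {}
    # The attribute is absent when the file has no import directory
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        names = result.setdefault(_text(entry.dll), [])
        for imp in entry.imports:
            # Imports by ordinal have no name
            names.append(_text(imp.name) if imp.name else "ordinal %d" % imp.ordinal)
    return result


def _entry(dll, *names):
    """Build a stand-in import entry (illustration only)."""
    return SimpleNamespace(
        dll=dll,
        imports=[SimpleNamespace(name=n, ordinal=None) for n in names],
    )


# Stand-in object mirroring the Experiment 3 output
DEMO_PE = SimpleNamespace(DIRECTORY_ENTRY_IMPORT=[
    _entry(b"KERNEL32.dll", b"GetModuleHandleA"),
    _entry(b"USER32.dll", b"EndDialog", b"GetDlgItemTextA",
           b"DialogBoxParamA", b"MessageBoxA"),
])

if __name__ == "__main__":
    for dll, functions in imports_by_dll(DEMO_PE).items():
        print(dll, len(functions), functions)
```

With a real file the call would be `imports_by_dll(pefile.PE(r"C:\temp\test.exe"))`. The resulting dictionary is easy to turn into features, for instance by checking for a fixed list of suspicious API names.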
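By now you should have a feel for how pefile is used and how capable it is. It has many other features as well: it can modify PE structures, and with a PEiD signature database loaded it can identify packers. Give it a try.
I hope the power of pefile and the simplicity of Python prove useful to you.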