Environment

Linux-4.14
Aarch64

Main text

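In the earlier analysis, calling print_symbol("PC is at %s\n", instruction_pointer(regs)) to print the current PC address produced this output: PC is at demo_init+0xc/0x1000 [demo]
This post walks through print_symbol to see how an address becomes that string.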
static __printf(1, 2)
void __check_printsym_format(const char *fmt, ...)
{
}

static inline void print_symbol(const char *fmt, unsigned long addr)
{
    __check_printsym_format(fmt, "");
    __print_symbol(fmt, (unsigned long)
                   __builtin_extract_return_addr((void *)addr));
}
 
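The call to __check_printsym_format is only a format check: the function body is empty, and the __printf attribute lets the compiler type-check fmt.
__builtin_extract_return_addr((void *)addr) returns the actual address encoded in addr. Here the result is still addr. See the GCC documentation for a description of this builtin.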
Next, __print_symbol:
/* Look up a kernel symbol and print it to the kernel messages. */
void __print_symbol(const char *fmt, unsigned long address)
{
    char buffer[KSYM_SYMBOL_LEN];

    sprint_symbol(buffer, address);

    printk(fmt, buffer);
}
 
The sprint_symbol call is the core: it converts address into the corresponding kernel symbol string and stores that string in buffer.

Next, sprint_symbol:
/**
 * sprint_symbol - Look up a kernel symbol and return it in a text buffer
 * @buffer: buffer to be stored
 * @address: address to lookup
 *
 * This function looks up a kernel symbol with @address and stores its name,
 * offset, size and module name to @buffer if possible. If no symbol was found,
 * just saves its @address as is.
 *
 * This function returns the number of bytes stored in @buffer.
 */
int sprint_symbol(char *buffer, unsigned long address)
{
    return __sprint_symbol(buffer, address, 0, 1);
}
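According to the comment, this function looks up the kernel symbol that contains address and stores the symbol name, offset, size and module name in buffer. If no symbol is found, it just stores address in buffer as a string.
Taking demo_init+0xc/0x1000 [demo] as an example:
Symbol name: demo_init
Offset: 0xc
Size: 0x1000
Module name: demo
In other words, the address passed in lies inside the function demo_init, 0xc bytes past the start of demo_init; demo_init occupies 0x1000 bytes of code space; and it belongs to the kernel module demo.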
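
The layout of that string can be reproduced in user space. The following is a minimal sketch, not kernel code: format_symbol is a hypothetical helper, and it uses snprintf with an explicit buffer length where the kernel uses sprintf on a KSYM_SYMBOL_LEN buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space helper that builds "name+offset/size [module]".
 * modname may be NULL for a symbol that lives in the kernel image itself.
 * Returns the number of bytes actually stored in buf (excluding the NUL).
 */
int format_symbol(char *buf, size_t buflen, const char *name,
                  unsigned long offset, unsigned long size,
                  const char *modname)
{
    assert(buf != NULL && name != NULL && buflen > 0);

    /* "%#lx" prints the value in hex with a 0x prefix. */
    snprintf(buf, buflen, "%s+%#lx/%#lx", name, offset, size);

    if (modname) {
        size_t len = strlen(buf);

        snprintf(buf + len, buflen - len, " [%s]", modname);
    }
    return (int)strlen(buf);
}
```

Calling format_symbol(buf, sizeof(buf), "demo_init", 0xc, 0x1000, "demo") fills buf with the same text as the example line above.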
 
Next, __sprint_symbol:
/* Look up a kernel symbol and return it in a text buffer. */
static int __sprint_symbol(char *buffer, unsigned long address,
                           int symbol_offset, int add_offset)
{
    char *modname;
    const char *name;
    unsigned long offset, size;
    int len;

    address += symbol_offset;
    name = kallsyms_lookup(address, &size, &offset, &modname, buffer);
    if (!name)
        return sprintf(buffer, "0x%lx", address - symbol_offset);

    if (name != buffer)
        strcpy(buffer, name);
    len = strlen(buffer);
    offset -= symbol_offset;

    if (add_offset)
        len += sprintf(buffer + len, "+%#lx/%#lx", offset, size);

    if (modname)
        len += sprintf(buffer + len, " [%s]", modname);

    return len;
}

The kallsyms_lookup call above takes address and returns the symbol name, filling in size, offset and modname along the way.

kallsyms_lookup:
/*
 * Lookup an address
 * - modname is set to NULL if it's in the kernel.
 * - We guarantee that the returned name is valid until we reschedule even if.
 *   It resides in a module.
 * - We also guarantee that modname will be valid until rescheduled.
 */
const char *kallsyms_lookup(unsigned long addr,
                            unsigned long *symbolsize,
                            unsigned long *offset,
                            char **modname, char *namebuf)
{
    const char *ret;

    namebuf[KSYM_NAME_LEN - 1] = 0;
    namebuf[0] = 0;

    if (is_ksym_addr(addr)) {
        unsigned long pos;

        pos = get_symbol_pos(addr, symbolsize, offset);
        /* Grab name */
        kallsyms_expand_symbol(get_symbol_offset(pos),
                               namebuf, KSYM_NAME_LEN);
        if (modname)
            *modname = NULL;

        ret = namebuf;
        goto found;
    }

    /* See if it's in a module or a BPF JITed image. */
    ret = module_address_lookup(addr, symbolsize, offset,
                                modname, namebuf);
    if (!ret)
        ret = bpf_address_lookup(addr, symbolsize,
                                 offset, modname, namebuf);

found:
    cleanup_symbol_name(namebuf);
    return ret;
}
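The function looks for the symbol in three places: first in the kernel image; if the address is not there, in the loaded kernel modules; and if that also fails, in BPF JITed images.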
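
The ordering can be pictured as a chain of lookups tried one after another. This is a simplified user-space sketch: lookup_fn, lookup_in_order and the three toy_* sources with their address ranges are all hypothetical. In the real function the kernel-image branch is selected by is_ksym_addr(addr) and not by a NULL return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns a name, or NULL if addr is not covered. */
typedef const char *(*lookup_fn)(unsigned long addr);

/* Try each source in order and return the first hit. */
const char *lookup_in_order(const lookup_fn *sources, size_t n,
                            unsigned long addr)
{
    size_t i;

    assert(sources != NULL);
    for (i = 0; i < n; i++) {
        const char *name = sources[i](addr);

        if (name)
            return name;
    }
    return NULL;
}

/* Toy sources with made-up address ranges, for illustration only. */
const char *toy_kernel_lookup(unsigned long addr)
{
    return (addr >= 0x1000 && addr < 0x2000) ? "kernel_sym" : NULL;
}

const char *toy_module_lookup(unsigned long addr)
{
    return (addr >= 0x8000 && addr < 0x9000) ? "module_sym" : NULL;
}

const char *toy_bpf_lookup(unsigned long addr)
{
    return (addr >= 0xa000 && addr < 0xb000) ? "bpf_prog" : NULL;
}
```

With the sources listed in the order kernel, module, BPF, an address in the module range is resolved by the second source, and an address outside all three ranges yields NULL.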
 
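The rest of this post covers only the first branch, the if (is_ksym_addr(addr)) block that looks the address up in the kernel image. The module and BPF paths are left for later.
is_ksym_addr(addr) checks whether addr lies in the kernel's code sections (with CONFIG_KALLSYMS_ALL enabled, anywhere in the kernel image).
To follow get_symbol_pos, you need .tmp_kallsyms2.S, which is generated while the kernel is built and holds the symbol information.
A brief description of this file:
It is generated at build time by the tool scripts/kallsyms.c. The variables it defines are described below.

kallsyms_offsets: an array holding each symbol's offset from the address of _text. Comparing it with System.map shows that each entry equals the symbol's address in System.map minus the address of _text.
kallsyms_relative_base: the base address of the symbols. Adding an offset from the kallsyms_offsets array to this value gives the symbol's actual address.
kallsyms_num_syms: the number of kernel symbols.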
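
The relationship between the base and the offsets can be sketched as follows. The toy_* names and all values are made up for illustration. This covers only the plain base-plus-offset case; the kernel's kallsyms_sym_address also has a branch for CONFIG_KALLSYMS_ABSOLUTE_PERCPU that is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the generated tables; the values are made up. */
static const uint64_t toy_relative_base = 0xffff000008080000ULL;
static const uint32_t toy_offsets[] = { 0x0, 0x40, 0x1000, 0x1000, 0x1800 };
static const size_t toy_num_syms = sizeof(toy_offsets) / sizeof(toy_offsets[0]);

/* Address of symbol idx = base address + that symbol's offset. */
uint64_t toy_sym_address(size_t idx)
{
    assert(idx < toy_num_syms);
    return toy_relative_base + toy_offsets[idx];
}
```

Storing 32-bit offsets plus one 64-bit base takes roughly half the space of a table of full 64-bit addresses, which is the motivation for the base-relative layout.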
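kallsyms_names: the name of every symbol, one entry per line in the .S file. To compress the strings, the first byte of each entry is the number of bytes that follow, and each following byte is an index into the kallsyms_token_index array. The elements of kallsyms_token_index are themselves indexes: each one is an offset into kallsyms_token_table.

kallsyms_token_index: one offset into kallsyms_token_table per token.

kallsyms_token_table: the token strings themselves, stored back to back, each terminated by a NUL byte.

kallsyms_markers: introduced to speed up walking kallsyms_names. Each element is the offset within kallsyms_names of every 256th entry. Once the index of the symbol that contains address is known, dividing that index by 256 selects a marker, and only the entries inside that group of 256 have to be walked to reach the right one, which is much faster.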
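
To make the marker idea concrete, here is a small user-space sketch. Everything in it is hypothetical toy data: the stride is 4, not 256, so the tables stay readable, and the data bytes are plain letters where the real stream holds token numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STRIDE 4   /* the kernel uses 256; 4 keeps the toy data small */
#define TOY_ENTRIES 6

/* Six length-prefixed entries: [len][len bytes of data] ... */
static const uint8_t toy_names[] = {
    2, 'a', 'b',        /* entry 0 at offset 0  */
    1, 'c',             /* entry 1 at offset 3  */
    3, 'd', 'e', 'f',   /* entry 2 at offset 5  */
    1, 'g',             /* entry 3 at offset 9  */
    2, 'h', 'i',        /* entry 4 at offset 11 */
    1, 'j',             /* entry 5 at offset 14 */
};

/* toy_markers[k] = offset of entry k * TOY_STRIDE inside toy_names. */
static const unsigned int toy_markers[] = { 0, 11 };

/* Offset of entry pos inside toy_names. */
size_t toy_symbol_offset(size_t pos)
{
    const uint8_t *name;
    size_t i;

    assert(pos < TOY_ENTRIES);

    /* Jump to the nearest marker at or before pos. */
    name = &toy_names[toy_markers[pos / TOY_STRIDE]];

    /* Skip the remaining entries, one length byte at a time. */
    for (i = 0; i < pos % TOY_STRIDE; i++)
        name += *name + 1;

    return (size_t)(name - toy_names);
}
```

Without markers, reaching entry 5 would mean skipping five entries from the start of the stream; with them, the walk starts at entry 4 and skips one.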
 
Next, get_symbol_pos:
static unsigned long get_symbol_pos(unsigned long addr,
                                    unsigned long *symbolsize,
                                    unsigned long *offset)
{
    unsigned long symbol_start = 0, symbol_end = 0;
    unsigned long i, low, high, mid;

    /* This kernel should never had been booted. */
    if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
        BUG_ON(!kallsyms_addresses);
    else
        BUG_ON(!kallsyms_offsets);

    /* Do a binary search on the sorted kallsyms_addresses array. */
    low = 0;
    high = kallsyms_num_syms;

    while (high - low > 1) {
        mid = low + (high - low) / 2;
        if (kallsyms_sym_address(mid) <= addr)
            low = mid;
        else
            high = mid;
    }

    /*
     * Search for the first aliased symbol. Aliased
     * symbols are symbols with the same address.
     */
    while (low && kallsyms_sym_address(low-1) == kallsyms_sym_address(low))
        --low;

    symbol_start = kallsyms_sym_address(low);

    /* Search for next non-aliased symbol. */
    for (i = low + 1; i < kallsyms_num_syms; i++) {
        if (kallsyms_sym_address(i) > symbol_start) {
            symbol_end = kallsyms_sym_address(i);
            break;
        }
    }

    /* If we found no next symbol, we use the end of the section. */
    if (!symbol_end) {
        if (is_kernel_inittext(addr))
            symbol_end = (unsigned long)_einittext;
        else if (IS_ENABLED(CONFIG_KALLSYMS_ALL))
            symbol_end = (unsigned long)_end;
        else
            symbol_end = (unsigned long)_etext;
    }

    if (symbolsize)
        *symbolsize = symbol_end - symbol_start;
    if (offset)
        *offset = addr - symbol_start;

    return low;
}
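The binary search loop looks addr up in kallsyms_offsets to find the two neighbouring symbols that bracket it. When the loop ends, addr lies between the symbols at index low and index high, which means it lies inside the function at index low.
The while loop that follows handles aliases: kallsyms_offsets contains many symbols with the same address, and this loop moves low back to the first symbol that shares that address.
symbol_start is then set to the address of the symbol at index low.
The for loop finds symbol_end, the first symbol address that is strictly greater than symbol_start. If there is none, the end of the section is used.
*symbolsize = symbol_end - symbol_start is the amount of space occupied by the kernel function that starts at symbol_start.
*offset = addr - symbol_start is the offset of address from symbol_start.
The return value low is the index of the symbol, that is, of the start of the kernel function that contains address.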
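
The same steps can be tried in user space on a small sorted table. This is a sketch under stated assumptions: toy_addrs, toy_section_end and toy_symbol_pos are hypothetical, the addresses are made up, and a plain array replaces kallsyms_sym_address().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sorted toy symbol addresses; indices 2 and 3 are aliases. */
static const uint64_t toy_addrs[] = { 0x1000, 0x1040, 0x2000, 0x2000, 0x2800 };
static const size_t toy_count = sizeof(toy_addrs) / sizeof(toy_addrs[0]);
static const uint64_t toy_section_end = 0x3000;   /* stand-in for _etext */

size_t toy_symbol_pos(uint64_t addr, uint64_t *size, uint64_t *offset)
{
    size_t low = 0, high = toy_count, mid, i;
    uint64_t start, end = 0;

    assert(addr >= toy_addrs[0] && addr < toy_section_end);

    /* Binary search: toy_addrs[low] <= addr holds throughout. */
    while (high - low > 1) {
        mid = low + (high - low) / 2;
        if (toy_addrs[mid] <= addr)
            low = mid;
        else
            high = mid;
    }

    /* Step back to the first of any aliased symbols. */
    while (low && toy_addrs[low - 1] == toy_addrs[low])
        --low;
    start = toy_addrs[low];

    /* The next strictly larger address marks the end of the symbol. */
    for (i = low + 1; i < toy_count; i++) {
        if (toy_addrs[i] > start) {
            end = toy_addrs[i];
            break;
        }
    }
    if (!end)
        end = toy_section_end;

    if (size)
        *size = end - start;
    if (offset)
        *offset = addr - start;
    return low;
}
```

For addr = 0x200c the search first settles on index 3, the alias loop moves it back to index 2, and the result is size 0x800 with offset 0xc.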
 
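Back to kallsyms_lookup:
The get_symbol_pos call returns pos, the index of the symbol (the start of the kernel function) that contains address.
get_symbol_offset then converts pos into the offset, relative to kallsyms_names, of that symbol's compressed name string. This is easier to follow alongside the earlier description of .tmp_kallsyms2.S.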
/*
 * Find the offset on the compressed stream given and index in the
 * kallsyms array.
 */
static unsigned int get_symbol_offset(unsigned long pos)
{
    const u8 *name;
    int i;

    /*
     * Use the closest marker we have. We have markers every 256 positions,
     * so that should be close enough.
     */
    name = &kallsyms_names[kallsyms_markers[pos >> 8]];

    /*
     * Sequentially scan all the symbols up to the point we're searching
     * for. Every symbol is stored in a [<len>][<len> bytes of data] format,
     * so we just need to add the len to the current pointer for every
     * symbol we wish to skip.
     */
    for (i = 0; i < (pos & 0xFF); i++)
        name = name + (*name) + 1;

    return name - kallsyms_names;
}
kallsyms_expand_symbol:
/*
 * Expand a compressed symbol data into the resulting uncompressed string,
 * if uncompressed string is too long (>= maxlen), it will be truncated,
 * given the offset to where the symbol is in the compressed stream.
 */
static unsigned int kallsyms_expand_symbol(unsigned int off,
                                           char *result, size_t maxlen)
{
    int len, skipped_first = 0;
    const u8 *tptr, *data;

    /* Get the compressed symbol length from the first symbol byte. */
    data = &kallsyms_names[off];
    len = *data;
    data++;

    /*
     * Update the offset to return the offset for the next symbol on
     * the compressed stream.
     */
    off += len + 1;

    /*
     * For every byte on the compressed symbol data, copy the table
     * entry for that byte.
     */
    while (len) {
        tptr = &kallsyms_token_table[kallsyms_token_index[*data]];
        data++;
        len--;

        while (*tptr) {
            if (skipped_first) {
                if (maxlen <= 1)
                    goto tail;
                *result = *tptr;
                result++;
                maxlen--;
            } else
                skipped_first = 1;
            tptr++;
        }
    }

tail:
    if (maxlen)
        *result = '\0';

    /* Return to offset to the next symbol. */
    return off;
}
 
 
In the end, the expanded string name of the kernel symbol is copied into namebuf.
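
The token expansion can be exercised on toy data. This is a user-space sketch with hypothetical tables: three made-up tokens that together spell "tdemo_init", where the leading 't' plays the role of the symbol type character that the skipped_first logic drops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy token table: NUL-terminated tokens stored back to back. */
static const char toy_token_table[] = "tdemo\0_in\0it";
/* toy_token_index[t] = offset of token t inside toy_token_table. */
static const uint16_t toy_token_index[] = { 0, 6, 10 };
/* One compressed symbol: a length byte, then that many token numbers. */
static const uint8_t toy_names[] = { 3, 0, 1, 2 };

/*
 * Expand the compressed symbol at offset off into result, dropping the
 * first expanded character. Returns the offset of the next symbol.
 */
unsigned int toy_expand_symbol(unsigned int off, char *result, size_t maxlen)
{
    const uint8_t *data = &toy_names[off];
    int len = *data++;
    int skipped_first = 0;

    assert(result != NULL);
    off += len + 1;

    while (len--) {
        const char *tptr = &toy_token_table[toy_token_index[*data++]];

        for (; *tptr; tptr++) {
            if (!skipped_first) {
                skipped_first = 1;   /* drop the type character */
                continue;
            }
            if (maxlen <= 1)
                goto tail;           /* keep room for the NUL */
            *result++ = *tptr;
            maxlen--;
        }
    }
tail:
    if (maxlen)
        *result = '\0';
    return off;
}
```

Expanding offset 0 into a large enough buffer yields demo_init, and a 5-byte buffer receives the truncated string demo.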
 
The end.
 
 
 
 