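Original article; reposting is welcome. When reposting, please credit the 淘宝核心系统团队博客 (Taobao Core Systems Team blog). Thank you!
Original link: Tips for debugging a running program with gdb (使用gdb调试运行时的程序小技巧)

Below are three problems I often run into when debugging. If you have met similar problems, let's compare notes on how to solve them: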
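Scenario 1: how to debug a running program without stopping its service
Scenario 2: what to do when you need to look at several variables at once, or view stack information from many core files in a batch
Scenario 3: what to do when you need to look at variables inside data structures such as queues, linked lists, trees, and heaps

1. Scenario 1: how to debug a running program without stopping its service

In production or test environments we sometimes hit anomalies and need the values of variables or memory in the program to determine its running state. I once heard @淘宝褚霸 explain that SystemTap can do this, but SystemTap scripts are more complex to write, and on operating systems with older kernel versions, running stap can sometimes kill the program or the operating system. I have also watched 多隆 debug a program with pstack (a modified pstack implemented on top of gdb; see http://blog.yufeng.info/archives/873 for details) to view and modify a global variable of a running program. It looked like magic, so I tried to get the same result with gdb. Save the code below to a file named runstack.sh: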
    #!/bin/sh

    if test $# -ne 2; then
        echo "Usage: `basename $0 .sh` <process-id> cmd" 1>&2
        echo "For example: `basename $0 .sh` 1000 bt" 1>&2
        exit 1
    fi

    if test ! -r /proc/$1; then
        echo "Process $1 not found." 1>&2
        exit 1
    fi

    result=""
    GDB=${GDB:-/usr/bin/gdb}
    # Run GDB, strip out unwanted noise.
    result=`$GDB --quiet -nx /proc/$1/exe $1 <<EOF 2>&1
    $2
    EOF`

    echo "$result" | egrep -A 1000 -e "^\(gdb\)" | egrep -B 1000 -e "^\(gdb\)"
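As a variation on runstack.sh, here is a minimal sketch that passes the command on the gdb command line instead of through a here-document. It assumes the installed gdb accepts the --batch, -nx, -p and -ex options (older builds may not), and gdb_batch is a hypothetical helper name, not something gdb provides. Because batch mode prints no (gdb) prompt, the egrep filtering step is not needed here.

```shell
#!/bin/sh
# Sketch: run one gdb command against a live process using batch mode.
# Assumption: the installed gdb supports --batch, -nx, -p and -ex.
# gdb_batch is a hypothetical helper name.

GDB=${GDB:-/usr/bin/gdb}

gdb_batch() {
    # $1 = process id, $2 = gdb command to execute
    if [ "$#" -ne 2 ]; then
        echo "Usage: gdb_batch <process-id> <cmd>" 1>&2
        return 1
    fi
    if [ ! -r "/proc/$1" ]; then
        echo "Process $1 not found." 1>&2
        return 1
    fi
    # --batch makes gdb exit (and detach) once the commands have run;
    # -nx skips .gdbinit files, as runstack.sh does.
    "$GDB" --batch -nx -p "$1" -ex "$2" 2>&1
}

# Act as a script only when arguments are given, e.g.: ./gdb_batch.sh 1000 bt
if [ "$#" -gt 0 ]; then
    gdb_batch "$@"
fi
```

Setting the GDB environment variable selects a different gdb binary, in the same way as in runstack.sh.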
C code used to test debugging with runstack.sh:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    typedef struct slist {
        struct slist *next;
        char         data[4096];
    } slist;

    slist input_list = {NULL, {'\0'}};  /* list head; its own data stays empty */
    int count = 0;                      /* number of strings read so far */

    static void stdin_read (int fd)
    {
        char buf[4096];
        int  ret;  /* bytes returned by read(); 0 means EOF, errors are not handled in this test program */

        memset(buf, 0, 4096);

        fprintf(stderr, "please input string:");

        while (ret = read(fd, buf, 4096)) {
            slist *node = calloc(1, sizeof(slist));
            memcpy(node->data, buf, ret);
            node->next = input_list.next;  /* push the new node at the front */
            input_list.next = node;

            count ++;

            if (memcmp(buf, "quit", 4) == 0) {
                fprintf(stdout, "input quit:\n");
                return;
            }
            fprintf(stderr, "ret: %d, there is %d strings, current is %s\nplease input string:", ret, count, buf);
        }
    }

    int main()
    {
        fprintf(stderr, "main run!\n");

        stdin_read(STDIN_FILENO);

        slist *nlist;
        slist *list = input_list.next;
        while (list) {
            fprintf(stderr, "%s\n", list->data);
            nlist = list->next;
            free(list);
            list = nlist;
        }

        return 0;
    }
Compile the C code: gcc -g -o read_input read_input.c

Run ./read_input, and then we can start debugging it with runstack.sh. Usage: sh ./runstack.sh pid "command". Let's try it out:
    [shihao@xxx]$ ps aux |grep read_input|grep -v grep
    shihao 10933 0.0 0.0 3668 332 pts/4 S+ 09:41 0:00 ./read_input

10933 is the process ID of the read_input program.

1) Print the source code

    sudo sh ./runstack.sh 10933 "list main"

Result:

    (gdb) 35            fprintf(stderr, "ret: %d, there is %d strings, current is %s\nplease input string:", ret, count, buf);
    36        }
    37    }
    38
    39    int main()
    40    {
    41        fprintf(stderr, "main run!\n");
    42
    43        stdin_read(STDIN_FILENO);
    44
    (gdb) quit

2) Show the value of a global variable

    ./runstack.sh 10933 "p count"
    (gdb) $1 = 1
    (gdb) quit

3) Modify the value of a variable
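Run the following command:

    [shihao@tfs036097 gdb]$ runstack.sh 11190 "set count=100"

Result:

    (gdb) (gdb) quit

We can use the earlier print command to check whether the modification succeeded:

    [shihao@tfs036097 gdb]$ runstack.sh 11190 "p count"
    (gdb) $1 = 100
    (gdb) quit

The global variable count is now 100.

Notes:

1) Some programs have been optimized by the operating system, and with the method above gdb may fail to find the symbol table. The relevant part of runstack.sh is: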
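    result=`$GDB --quiet -nx /proc/$1/exe $1 <<EOF 2>&1
    $2
    EOF`

If that happens, try changing the code above to the following. If it still does not work, the cause is probably something else:

    BIN=`readlink -f /proc/$1/exe`
    result=`$GDB --quiet -nx $BIN $1 <<EOF 2>&1
    $2
    EOF`

2) You need permission to inspect and modify the running process.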
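The permission requirement in note 2 can be checked before gdb is started. The sketch below is only a rough guess at whether an attach is likely to be allowed: it assumes a Linux /proc filesystem, GNU-style stat -c, and (on kernels that have the Yama security module) the /proc/sys/kernel/yama/ptrace_scope file. check_attach and SCOPE_FILE are hypothetical names introduced for illustration, and other restrictions (for example containers without the ptrace capability) are not covered.

```shell
#!/bin/sh
# Sketch: guess whether gdb is likely to be allowed to attach to a process.
# check_attach is a hypothetical helper name; SCOPE_FILE may be overridden.

SCOPE_FILE=${SCOPE_FILE:-/proc/sys/kernel/yama/ptrace_scope}

check_attach() {
    # $1 = process id
    pid=$1
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid"
        return 1
    fi

    # Yama ptrace_scope: 0 = classic rules, 1 = restricted, 2 = admin only,
    # 3 = attaching disabled. A missing file is treated as 0.
    scope=0
    if [ -r "$SCOPE_FILE" ]; then
        scope=$(cat "$SCOPE_FILE")
    fi
    if [ "$scope" -eq 3 ]; then
        echo "denied: ptrace attach is disabled (ptrace_scope=3)"
        return 1
    fi

    me=$(id -u)
    if [ "$me" -eq 0 ]; then
        echo "ok: running as root"
        return 0
    fi

    # The owner of /proc/<pid> is the user the process runs as.
    owner=$(stat -c %u "/proc/$pid")
    if [ "$owner" -ne "$me" ]; then
        echo "denied: process belongs to uid $owner, try sudo"
        return 1
    fi
    if [ "$scope" -ne 0 ]; then
        echo "denied: ptrace_scope=$scope restricts attaching, try sudo"
        return 1
    fi
    echo "ok: same user and ptrace_scope=0"
    return 0
}
```

A possible use is `check_attach 10933 && ./runstack.sh 10933 "p count"`, so that gdb is only started when the attach has a chance of succeeding.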
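2. Scenario 2: what to do when you need to look at several variables at once, or view information from many core files in a batch

1) Several variables at once

To look at the values of count and input_list together with the stack, we can write a script.gdb:

    $ cat script.gdb
    p input_list
    p count
    bt
    f 1
    p buf

Run it: runstack.sh 10933 "source script.gdb"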
    (gdb) $1 = {next = 0x597c020, data = ” }
    $2 = 2
    #0 0x0000003fa4ec5f00 in __read_nocancel () from /lib64/libc.so.6
    #1 0x00000000004007c7 in stdin_read (fd=0) at read_input.c:23
    #2 0x0000000000400803 in main () at read_input.c:43
    #1 0x00000000004007c7 in stdin_read (fd=0) at read_input.c:23
    23 while (ret = read(fd, buf, 4096)) {
    $3 = "12345\n", ”
    (gdb) quit

This way several operations can be done in one go.

2) Viewing core files in a batch

Sometimes a lot of core files pile up, and we want to know which of them were caused by the same problem and which were not. With only one or two files that is easy enough to check by hand.

    $ ls core.*
    core.12281 core.12282 core.12283 core.12284 core.12286 core.12287 core.12288 core.12311 core.12313 core.12314

With as many core files as above, running bt in gdb on each one to see where it crashed is tedious. We want to print the stack and variable information of all the core files. A small modification to runstack does the job. We call it corestack.sh:

    #!/bin/sh

    if test $# -ne 3; then
        echo "Usage: `basename $0 .sh` program core cmd" 1>&2
        echo "For example: `basename $0 .sh` ./main core.1111 bt" 1>&2
        exit 1
    fi

    if test ! -r $1; then
        echo "Program $1 not found." 1>&2
        exit 1
    fi

    result=""
    GDB=${GDB:-/usr/bin/gdb}
    # Run GDB, strip out unwanted noise.
    result=`$GDB --quiet -nx $1 $2 <<EOF 2>&1
    $3
    EOF`

    echo "$result" | egrep -A 1000 -e "^\(gdb\)" | egrep -B 1000 -e "^\(gdb\)"
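For the question raised above of which core files share the same cause, one possible approach is to reduce each backtrace to a fingerprint and sort by it. The sketch below is an illustration under assumptions, not part of the original scripts: bt_fingerprint and group_cores are hypothetical helper names, the frame-line format is assumed to look like the gdb output shown in this article, and group_cores assumes a gdb that accepts --batch and -ex.

```shell
#!/bin/sh
# Sketch: group core files by the shape of their backtrace.
# bt_fingerprint and group_cores are hypothetical helper names.

GDB=${GDB:-/usr/bin/gdb}

bt_fingerprint() {
    # Read gdb "bt" output on stdin. Strip a leading "(gdb) " prompt, keep
    # only frame lines ("#0 ...", "#1 ..."), remove the hex addresses and the
    # argument lists, then checksum what is left (frame number, function
    # name, source location).
    sed -e 's/^(gdb) //' |
        grep '^#[0-9]' |
        sed -e 's/ 0x[0-9a-fA-F]* in / /' -e 's/ (.*)/ ()/' |
        cksum | cut -d ' ' -f 1
}

group_cores() {
    # $1 = program, remaining arguments = core files.
    # Prints "<fingerprint> <core file>" sorted so equal stacks are adjacent.
    prog=$1
    shift
    for core in "$@"; do
        fp=$("$GDB" --batch -nx -ex bt "$prog" "$core" 2>/dev/null | bt_fingerprint)
        echo "$fp $core"
    done | sort
}
```

For example, `group_cores ./read_input core.*` would list the cores with identical fingerprints next to each other. Since argument values are dropped, two crashes in the same call chain count as the same cause even if the arguments differ; that is a design choice which may or may not suit a given investigation.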
corestack.sh can be run like this:
    ./corestack.sh ./read_input core.12281 "bt"

Result:

    (gdb) #0 0x0000003fa4e30265 in raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #1 0x0000003fa4e31d10 in abort () at abort.c:88
    #2 0x0000003fa4e296e6 in __assert_fail (assertion=, file=, line=, function=) at assert.c:78
    #3 0x00000000004008ba in main () at read_input.c:55
    (gdb) quit

That is most of the preparation needed for viewing the stacks of many core files. A small loop now prints the stack of every core file. Run the following:

    for i in core.*; do ./corestack.sh ./read_input "$i" "bt"; done

    (gdb) #0 0x0000003fa4e30265 in raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #1 0x0000003fa4e31d10 in abort () at abort.c:88
    #2 0x0000003fa4e296e6 in __assert_fail (assertion=, file=, line=, function=) at assert.c:78
    #3 0x00000000004008ba in main () at read_input.c:55
    (gdb) quit
    ……
    (gdb) #0 0x0000003fa4e30265 in raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #1 0x0000003fa4e31d10 in abort () at abort.c:88
    #2 0x0000003fa4e296e6 in __assert_fail (assertion=, file=, line=, function=) at assert.c:78
    #3 0x00000000004008ba in main () at read_input.c:55
    (gdb) quit

OK, now we can see the stacks of all the core files.

3. Scenario 3: what to do when you need to look at variables inside data structures such as queues, linked lists, trees, and heaps?
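The following shows how to handle a linked list. Readers interested in other data structures can try writing their own gdb scripts (please @周哲士豪 me so I can learn from them too). I hope we can put together a gdb debugging toolbox. gdb supports writing scripts: http://sourceware.org/gdb/onlinedocs/gdb/Command-Files.html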
We write a plist.gdb that walks the linked list with a while loop:

    $ cat plist.gdb
    set $list = &input_list
    while ($list)
        p *$list
        set $list = $list->next
    end

Let's run it: runstack.sh 13434 "source plist.gdb"
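    (gdb) $1 = {next = 0x3d61040, data = ” }
    $2 = {next = 0x3d60030, data = "123456\n", ” }
    $3 = {next = 0x3d5f020, data = "12345\n", ” }
    $4 = {next = 0x3d5e010, data = "1234\n", ” }
    $5 = {next = 0x0, data = "123\n", ” }
    (gdb) quit

In fact, plist can be written as a user-defined command. When gdb starts, it looks for a .gdbinit file in the current directory and loads it. Note that runstack.sh as listed passes -nx, which tells gdb not to read any .gdbinit file, so that option has to be removed for the definition to be loaded: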
    $ cat .gdbinit
    define plist
        set $list = $arg0
        while ($list)
            p *$list
            set $list = $list->next
        end
    end
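As a variation on the plist definition, a traversal on a corrupted or cyclic list would never stop, so a node limit can help. The sketch below writes a gdb command file defining such a command from a shell function. plist_n, write_plist_n and the file name are hypothetical names chosen for illustration; the gdb constructs used (define, document, while, $arg0, $arg1) are standard gdb command-file features.

```shell
#!/bin/sh
# Sketch: write a gdb command file that defines plist_n, a variant of plist
# that stops after a maximum number of nodes.
# plist_n and write_plist_n are hypothetical names.

write_plist_n() {
    # $1 = path of the gdb command file to create
    cat > "$1" <<'EOF'
define plist_n
    set $node = $arg0
    set $left = $arg1
    while ($node && $left > 0)
        p *$node
        set $node = $node->next
        set $left = $left - 1
    end
end
document plist_n
Usage: plist_n LIST_HEAD_POINTER MAX_NODES
end
EOF
}

# Act as a script only when a file name is given: ./write_plist_n.sh plist_n.gdb
if [ "$#" -gt 0 ]; then
    write_plist_n "$1"
fi
```

After `write_plist_n plist_n.gdb`, the command could be used inside gdb with `source plist_n.gdb` followed by `plist_n &input_list 100`, which prints at most 100 nodes.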
With plist defined in .gdbinit, the plist command can be used to walk the values in the list:
    $ runstack.sh 13434 "plist &input_list"
    (gdb) $1 = {next = 0x3d61040, data = ” }
    $2 = {next = 0x3d60030, data = "123456\n", ” }
    $3 = {next = 0x3d5f020, data = "12345\n", ” }
    $4 = {next = 0x3d5e010, data = "1234\n", ” }
    $5 = {next = 0x0, data = "123\n", ” }
    (gdb) quit

References:

霸爷's blog: http://blog.yufeng.info/archives/873
Loading gdb commands from a script: http://blog.lifeibo.com/?p=380
Official gdb documentation: http://sourceware.org/gdb/onlinedocs/gdb/Command-Files.html
[...] Tips for debugging a running program with gdb (使用gdb调试运行时的程序小技巧). Posted on October 15, 2012 by flychen. [...]
Great post!
I ran into a problem and searched the web a lot without solving it. Could the experts here take a look at how to fix this? The same experiment works fine on my CentOS 6.2.

    ~ # ./runstack.sh 18641 "p count"
    (gdb) Hangup detected on fd 0
    ~ # gdb -v
    GNU gdb Fedora (6.8-37.el5)
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".

Change the last line of runstack.sh from

    echo "$result" | egrep -A 1000 -e "^\(gdb\)" | egrep -B 1000 -e "^\(gdb\)"

to echo "$result" and see what it prints!

I tried that, right at the time. The main point is that it works on my CentOS 6.2 but does not run properly on my 2.6.24 kernel. The two machines do not have the same gdb version.
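Change the last line of runstack.sh from

    echo "$result" | egrep -A 1000 -e "^\(gdb\)" | egrep -B 1000 -e "^\(gdb\)"

to echo "$result" and see what it prints! With that change gdb shows some diagnostic messages, which you can paste here. For example, messages like the following:

    Attaching to program: /proc/31881/exe, process 31881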
    Reading symbols from /lib64/libcrypto.so.6...done.
    Loaded symbols for /lib64/libcrypto.so.6
    Reading symbols from /lib64/libpcre.so.0...done.
    Loaded symbols for /lib64/libpcre.so.0
    Reading symbols from /lib64/libresolv.so.2...Reading symbols from /usr/lib/debug/lib64/libresolv-2.5.so.debug...done.
    done.
    Loaded symbols for /lib64/libresolv.so.2
    Reading symbols from /lib64/libpthread.so.0...Reading symbols from /usr/lib/debug/lib64/libpthread-2.5.so.debug...done.
    [Thread debugging using libthread_db enabled]
    [New Thread 0x2ae9b41bfea0 (LWP 31881)]
    [New Thread 0x48cff940 (LWP 31893)]
    [New Thread 0x482fe940 (LWP 31892)]
    [New Thread 0x478fd940 (LWP 31891)]
    [New Thread 0x46efc940 (LWP 31890)]
    [New Thread 0x464fb940 (LWP 31889)]
    [New Thread 0x45afa940 (LWP 31888)]
    [New Thread 0x450f9940 (LWP 31887)]
    [New Thread 0x446f8940 (LWP 31886)]
    [New Thread 0x43cf7940 (LWP 31885)]
    [New Thread 0x432f6940 (LWP 31884)]
    [New Thread 0x428f5940 (LWP 31883)]
    [New Thread 0x41220940 (LWP 31882)]
    done.
    Loaded symbols for /lib64/libpthread.so.0
    Reading symbols from /lib64/libm.so.6...Reading symbols from /usr/lib/debug/lib64/libm-2.5.so.debug...done.
    done.
    Loaded symbols for /lib64/libm.so.6
    Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/lib64/libc-2.5.so.debug...done.
    done.
    Loaded symbols for /lib64/libc.so.6
    Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.5.so.debug...done.
    done.
    Loaded symbols for /lib64/ld-linux-x86-64.so.2
    Reading symbols from /lib64/libdl.so.2...Reading symbols from /usr/lib/debug/lib64/libdl-2.5.so.debug...done.
    done.
    Loaded symbols for /lib64/libdl.so.2
    Reading symbols from /usr/lib64/libz.so.1...done.
    Loaded symbols for /usr/lib64/libz.so.1
    Reading symbols from /lib64/libnss_files.so.2...Reading symbols from /usr/lib/debug/lib64/libnss_files-2.5.so.debug...done.
    done.
    Loaded symbols for /lib64/libnss_files.so.2
    0x0000003fa56077e5 in pthread_join (threadid=, thread_return=) at pthread_join.c:89
    89 lll_wait_tid (pd->tid);
    (gdb (gdb) quit
    The program is running. Quit anyway (and detach it)? (y or n) [answered Y; input not from terminal]
    Detaching from program: /proc/31881/exe, process 31881
    ~ # gdb -v
    GNU gdb Fedora (6.8-37.el5)
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".

Looks like everyone here is a tech heavyweight!

Here is a script for you, written by someone else, for debugging C++.

Nice, good news for those who use the STL!