一、背景

  近期有一个项目在运行当中出现一些问题,程序顺利启动,但是观察一阵子后发现内存使用总量在很缓慢地升高,

虽然偶尔还会往下降一些,但是总体还是不断上升;内存运行6个小时候从33M上升到80M;

  程序存在内存泄漏是确定无疑的了,大概出问题的方向也知道,就是程序新加入一个采集协议(BACnet协议,MSTP_DLL),

但是怎么把具体泄漏位置找出来却非常麻烦,因为这个协议是封装在一个C语言写的动态库中,想要单步调试好像不太可能,

况且源码也不再我这里;

  如果到此为止,推脱给其他同事找问题,那联合调试费时不说。其他同事也身兼数职,不大可能有时间调试,

那项目推进肯定停滞;那没办法了,只能硬着头皮上;网上了解一番,对于这种内存泄漏问题,比较好的处理方式就是

抓取内存快照,然后分析数据提交记录,使用查看使用堆栈等信息;所以基于以上原因,选择了windbg内核调试工具;

先分析一下看看,说不定可以发现问题;

二、windbg注意事项

1、首先要安装对版本,即你的程序是32位还是64位,对于的windbg版本也要一致,否则会报错;详情了解:点击这里

2、需要用64位的任务管理器抓32位的dump文件,那不能直接在任务管理器右键“创建转储文件“,需要运行(C:\Windows\SysWOW64\taskmgr.exe)

3、或者直接在windbg上使用命令存储,先附加到进程,然后使用命令:(.dump /ma c:\xxx.dmp),这样就将快照保存在C盘了;

4、最重要的,要确保你的机器能连接外网;由于windbg的使用需要在线更新符号文件,但是这个地址刚好被国家防火墙屏蔽;

三、windbg必要设置

1、首先我先抓取2个内存快照文件(中间相隔一段时间),如下

记一次使用windbg排查内存泄漏的过程-LMLPHP

 记一次使用windbg排查内存泄漏的过程-LMLPHP

 记一次使用windbg排查内存泄漏的过程-LMLPHP

2、打开windbg,设置符号下载路径

将33.dmp直接拖进工作区即可,然后打开菜单File -> Symbol File Path

输入地址:SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

记一次使用windbg排查内存泄漏的过程-LMLPHP

四、分析文件

1、分别打开两个dmp文件,输入命令!dumpheap -stat查看各种类型的内存分配情况

33.dmp
>.load C:\Windows\Microsoft.NET\Framework\v4.0.30319\SOS.dll >!dumpheap -stat ..... 61f87928 2292 34012 System.RuntimeType[] 5d2dbe74 267 34176 System.Data.DataColumn 61fd75e0 668 37408 System.Reflection.RuntimePropertyInfo 61f8426c 702 48976 System.Int32[] 5d2dcc24 70 72520 System.Data.RBTree`1+Node[[System.Data.DataRow, System.Data]][] 61f883e4 1242 84456 System.Reflection.RuntimeParameterInfo 61f8839c 2045 89980 System.Signature 0a7566bc 596 92976 HG.MacamUnit.Entity.TSubSysNodes 61f82788 723 117736 System.Object[] 61f89850 8 131696 System.Int64[] 61fd8938 2792 167520 System.Reflection.RuntimeMethodInfo 007988d0 220 434392 Free 61f824e4 12187 738904 System.String 61f85c40 2138 743067 System.Byte[] 61f82c60 294 6629796 System.Char[] Total 55014 objects
80.dmp
>.load C:\Windows\Microsoft.NET\Framework\v4.0.30319\SOS.dll
>!dumpheap -stat
.....
61f83698      876        24528 System.RuntimeType
61f84ec0      159        26472 System.Collections.Hashtable+bucket[]
61fc9020      631        27764 System.Reflection.RtFieldInfo
61f95be8       46        28392 System.Reflection.Emit.__FixupData[]
61f87928     2292        34012 System.RuntimeType[]
61fd75e0      668        37408 System.Reflection.RuntimePropertyInfo
5d2dcc24       42        43512 System.Data.RBTree`1+Node[[System.Data.DataRow, System.Data]][]
61f8426c      595        45868 System.Int32[]
61f883e4     1242        84456 System.Reflection.RuntimeParameterInfo
61f8839c     2045        89980 System.Signature
61f82788      622       113684 System.Object[]
61f89850        8       131696 System.Int64[]
61fd8938     2769       166140 System.Reflection.RuntimeMethodInfo
61f824e4     9800       676596 System.String
61f85c40     2064       705655 System.Byte[]
61f82c60      195      2369402 System.Char[]
007988d0      114      3338792      Free
Total 47306 objects  

着重分析(红色部分)这两个文件的内存分配情况,似乎差别不大,完全看不出来80-33=近50M的内存消耗在哪里;

但认真思考一下,这样好像也没有问题,因为System.***这种类型是C#环境独有的,已知C#没有内存泄漏,所以这里没有体现应该是正常的;

那C语言接口文件里边的问题该如何找出来呢?

2、再来试试!heap -s,查看各种堆的内存提交数据量

33.dmp

0:047> !heap -s
LFH Key                   : 0x343fce0b
Termination on corruption : ENABLED
  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast
                    (k)     (k)    (k)     (k) length      blocks cont. heap
-----------------------------------------------------------------------------
00780000 00000002    8192   4636   8192    209  2484     4    0      e   LFH
002e0000 00001002     256      4    256      2     1     1    0      0
00280000 00001002    1088     72   1088      5     2     2    0      0
00c70000 00041002     256      4    256      2     1     1    0      0
002d0000 00001002    1088    132   1088      8    23     2    0      0
00450000 00001002     256      4    256      0     1     1    0      0
07230000 00041002     256      4    256      2     1     1    0      0
00c10000 00001002     256    216    256      3    39     1    0      0   LFH
09b50000 00001002     256     80    256     39    28     1    0      0
09d00000 00001002      64      4     64      2     1     1    0      0
09ef0000 00001002    1088     72   1088      6     2     2    0      0
004c0000 00001002    1088    192   1088     15   140     2    0      0
09760000 00041002     256     28    256      4     4     1    0      0
09ed0000 00001002      64     12     64      1     1     1    0      0
0b210000 00001002    3136   1456   3136     52    84     3    0      0   LFH
0a700000 00001002     256    212    256      2     1     1    0      0
0e1e0000 00011002     256      4    256      0     1     1    0      0
0d030000 00001002     256     16    256      3     1     1    0      0
11b30000 00001002    1088    388   1088      0     1     2    0      0
-----------------------------------------------------------------------------
80.dmp

0:051> !heap -s LFH Key : 0x343fce0b Termination on corruption : ENABLED Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast (k) (k) (k) (k) length blocks cont. heap ----------------------------------------------------------------------------- 00780000 00000002 8192 4808 8192 225 2505 4 0 f1 LFH 002e0000 00001002 256 4 256 2 1 1 0 0 00280000 00001002 1088 132 1088 4 6 2 0 0 00c70000 00041002 256 4 256 2 1 1 0 0 002d0000 00001002 1088 168 1088 12 26 2 0 0 00450000 00001002 256 4 256 0 1 1 0 0 07230000 00041002 256 4 256 2 1 1 0 0 00c10000 00001002 256 228 256 26 69 1 0 0 LFH 09b50000 00001002 256 80 256 39 25 1 0 0 09d00000 00001002 64 4 64 2 1 1 0 0 09ef0000 00001002 1088 132 1088 6 5 2 0 0 004c0000 00001002 1088 220 1088 26 173 2 0 0 09760000 00041002 256 28 256 4 8 1 0 0 09ed0000 00001002 64 12 64 1 1 1 0 0 0b210000 00001002 3136 1456 3136 74 71 3 0 0 LFH 0a700000 00001002 256 212 256 2 1 1 0 0 0e1e0000 00011002 256 4 256 0 1 1 0 0 0d030000 00001002 256 16 256 1 1 1 0 0 11b30000 00001002 47808 46068 47808 396 6836 7 0 0 -----------------------------------------------------------------------------

这次有异常了,可以看到11b30000这一行内存提交变化很大 47808 - 1088 = 46720;

这次可以肯定问题就在这个堆里边;

3、进去看看11b30000,使用命令:!heap -stat -h 11b30000

80.dmp


0:051> !heap -stat -h 11b30000
 heap @ 11b30000
group-by: TOTSIZE max-display: 20
    size     #blocks     total     ( %) (percent of total busy bytes)
    1f0 102d9 - 1f58470  (92.48)
    18 102b0 - 184080  (4.47)
    10 102ae - 102ae0  (2.98)
    214 13 - 277c  (0.03)
    1000 2 - 2000  (0.02)
    800 2 - 1000  (0.01)
    220 1 - 220  (0.00)
    1d7 1 - 1d7  (0.00)
    80 3 - 180  (0.00)
    a4 1 - a4  (0.00)
    24 4 - 90  (0.00)
    14 4 - 50  (0.00)
    4a 1 - 4a  (0.00)
    25 2 - 4a  (0.00)
    48 1 - 48  (0.00)
    46 1 - 46  (0.00)
    41 1 - 41  (0.00)
    3e 1 - 3e  (0.00)
    3c 1 - 3c  (0.00)
    37 1 - 37  (0.00)

可以看到前面3项几乎占据99%的内存提交记录;尤其以内存块大小为1f0的数据块使用最多内存;

到目前为止,我们知道了几项有效信息,有大小分别为1f0、18、10的三种数据块,不断申请出新空间;

但是这样还不够,根据一个内存块的大小并不能准确定位是哪里出了问题,这是一个结构体?还是字符串?还是数组?

都不知道,所以有必要进去看看,有哪些地方使用到了这些数据块

4、查看使用了1f0数据块大小的位置列表,使用命令:!heap -flt s [size]

80.dmp

0:051> !heap -flt s 1f0
    _DPH_HEAP_ROOT @ 5a1000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr  UserSize - VirtAddr VirtSize
    _HEAP @ 780000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        0078e5b8 0045 0000  [00]   0078e5e0    001f0 - (busy)
    _DPH_HEAP_ROOT @ 9d11000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr  UserSize - VirtAddr VirtSize
    _HEAP @ 4c0000
    _DPH_HEAP_ROOT @ af41000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr  UserSize - VirtAddr VirtSize
    _HEAP @ b210000
        0cf61680 0045 0045  [00]   0cf616a8    001f0 - (busy)
    _DPH_HEAP_ROOT @ d871000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr  UserSize - VirtAddr VirtSize
    _HEAP @ d030000
    _DPH_HEAP_ROOT @ 11631000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr  UserSize - VirtAddr VirtSize
    _HEAP @ 11b30000
        11b312e8 0045 0045  [00]   11b31310    001f0 - (busy)
        11b315a8 0045 0045  [00]   11b315d0    001f0 - (busy)
        11b356f8 0045 0045  [00]   11b35720    001f0 - (busy)
        11b35920 0045 0045  [00]   11b35948    001f0 - (busy)
        11b36f30 0045 0045  [00]   11b36f58    001f0 - (busy)
        11b37b58 0045 0045  [00]   11b37b80    001f0 - (busy)
        11b37e18 0045 0045  [00]   11b37e40    001f0 - (busy)
        11b3e4f0 0045 0045  [00]   11b3e518    001f0 - (busy)
        11b3f570 0045 0045  [00]   11b3f598    001f0 - (busy)
        11b3f830 0045 0045  [00]   11b3f858    001f0 - (busy)
        11b3faf0 0045 0045  [00]   11b3fb18    001f0 - (busy)
        11b3fdb0 0046 0045  [00]   11b3fdd8    001f0 - (busy)
        12890578 0045 0046  [00]   128905a0    001f0 - (busy)
        ......  

可以看到有很多堆都有使用到1f0大小的内存块,但是只有最后一个堆 _DPH_HEAP_ROOT @ 11631000

是记录最多的,满屏都是,这里只能截断,选取一部分看看  

5、查看调用堆栈,使用命令:!heap -p -a [address]

80.dmp

0:051> !heap -p -a 11b3fdd8
    address 11b3fdd8 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        11b3fdb0 0046 0000  [00]   11b3fdd8    001f0 - (busy)
        Trace: 083a
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a


0:051> !heap -p -a 11b3fdd8
    address 11b3fdd8 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        11b3fdb0 0046 0000  [00]   11b3fdd8    001f0 - (busy)
        Trace: 083a
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a


0:051> !heap -p -a 11b3fb18
    address 11b3fb18 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        11b3faf0 0045 0000  [00]   11b3fb18    001f0 - (busy)
        Trace: 083a
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a  

  随意挑选几个查看调用堆栈,似乎没有有用的特征信息,verifier、ntdll、msvcr90这些都是操作系统内核级别的函数;

并不能暴露出使用1f0大小的数据块大概位置,这就有点难办了,难道此路不通?如果不找到有效堆栈信息,想定位

内心泄漏点,靠单步调试会相当麻烦。。。

  不急,先看看,这些地方内存块内容是什么,说不定能找到一些有效特征信息;

使用命令:db [UserPtr]

80.dmp

0:051> db 11b3fb18
11b3fb18  00 00 04 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb28  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb38  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb48  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb58  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb68  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb78  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb88  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
0:051> db 11b3fdd8
11b3fdd8  00 00 04 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fde8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fdf8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe08  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe18  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe28  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe38  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe48  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
0:051> db 11b3fdd8
11b3fdd8  00 00 04 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fde8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fdf8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe08  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe18  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe28  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe38  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe48  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................  

结果是令人失望的;

显示这些基本都是空白内存,里边已经没有任何有效信息,,

陷入死胡同里了,难道到此为止?

还不死心,我们再看看这些地址有没有引用跟,如果有引用跟,也可以打印堆栈信息

使用命令:!gcroot [UserPtr]

80.dmp

0:051> !gcroot 11b3fb18
Found 0 unique roots (run '!GCRoot -all' to see all roots).
0:051> !gcroot 11b3fdd8
Found 0 unique roots (run '!GCRoot -all' to see all roots).
0:051> !gcroot 11b3fdd8
Found 0 unique roots (run '!GCRoot -all' to see all roots).

 愿望是美好的,这个大小位1f0的数据块被申请了0x102d9次,使用!gcroot命令查看得到貌似都是无引用的野数据

我们再来看看,这个 _DPH_HEAP_ROOT @ 11631000堆的创建堆栈

80.dmp

0:051> dt ntdll!_DPH_HEAP_ROOT CreateStackTrace 11631000
   +0x0b8 CreateStackTrace : 0x04d54f8c _RTL_TRACE_BLOCK
0:051> dds 0x04d54f8c
04d54f8c  04d1b714
04d54f90  0000f801
04d54f94  000f0000
04d54f98  74058969 verifier!AVrfDebugPageHeapCreate+0x439
04d54f9c  77cbcea2 ntdll!RtlCreateHeap+0x41
04d54fa0  757356bc KERNELBASE!HeapCreate+0x50
04d54fa4  66463a4a msvcr90!_heap_init+0x1b
04d54fa8  66422bb4 msvcr90!__p__tzname+0x2a
04d54fac  66422d5e msvcr90!_CRTDLL_INIT+0x1e
04d54fb0  77c79264 ntdll!LdrpCallInitRoutine+0x14
04d54fb4  77c7fe97 ntdll!LdrpRunInitializeRoutines+0x26f
04d54fb8  77c7ea4e ntdll!LdrpLoadDll+0x472
04d54fbc  77cbd3df ntdll!LdrLoadDll+0xc7
04d54fc0  75732e6a KERNELBASE!LoadLibraryExW+0x233
04d54fc4  7562483c kernel32!LoadLibraryW+0x11
04d54fc8  6d3d18de*** WARNING: Unable to verify checksum for Win32Project1.dll
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for Win32Project1.dll -
 Win32Project1+0x18de
04d54fcc  6d3d28fc Win32Project1!BACNet::Init+0x5c
04d54fd0  6d3d5925 Win32Project1!Init+0x25
04d54fd4  66639972*** WARNING: Unable to verify checksum for SMDB.dll
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for SMDB.dll -
 SMDB!LogPop+0x12
04d54fd8  66639452 SMDB!CreateSharedMemory+0x12
04d54fdc  6d8e47bd clrjit!Compiler::impImportBlockCode+0x2aac [f:\dd\ndp\clr\src\jit32\importer.cpp @ 10258]
04d54fe0  6d8c2e6b clrjit!Compiler::impImportBlock+0x5f [f:\dd\ndp\clr\src\jit32\importer.cpp @ 13246]
04d54fe4  6d8c306a clrjit!Compiler::impImport+0x235 [f:\dd\ndp\clr\src\jit32\importer.cpp @ 14195]
04d54fe8  6d8c364f clrjit!Compiler::compCompile+0x62 [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 2491]
04d54fec  6d8c4276 clrjit!Compiler::compCompileHelper+0x32f [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 3615]
04d54ff0  6d8c43fc clrjit!Compiler::compCompile+0x2ab [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 3086]
04d54ff4  6d8c45c8 clrjit!jitNativeCode+0x1f6 [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 4057]
04d54ff8  6d8c377d clrjit!CILJit::compileMethod+0x7d [f:\dd\ndp\clr\src\jit32\ee_il_dll.cpp @ 180]
04d54ffc  633b39b3 clr!invokeCompileMethodHelper+0x10b
04d55000  633b3a8b clr!invokeCompileMethod+0x3d
04d55004  633b3ae8 clr!CallCompileMethodWithSEHWrapper+0x39
04d55008  633b3d97 clr!UnsafeJitFunction+0x431

动态库Win32Project1.dll是对MSTP_DLL动态库的再次封装可以确定不存在内存泄漏问题;

看到这个堆是在于硬件设备通信的时候,初始化时CLR创建的线程;

不过知道这个好像也没有什么用,因为我们本来就知道是BACnet协议通信的动态库有问题;

只能说明是初始化之后产生的内存泄漏;

但是为什么这些无跟指针没有被垃圾回收? 

但是仔细一想,好像也是正常,因为这些是可以明确的在C语言编写的动态库里申请的内存,属于不受托管的内存;

C#垃圾回收也只能回收托管内存,所以这部分数据不主动释放,那就会永远在那里;

但是现在,好像陷入死胡同了,找不到思路,既然如此就先放放,先看看其他两个数据块的调用情况;

6、!heap -flt s 18

80.dmp

> !heap -flt s 18
...
        16f45098 000a 000a  [00]   16f450c0    00018 - (busy)
        16f45358 000a 000a  [00]   16f45380    00018 - (busy)
        16f45618 000a 000a  [00]   16f45640    00018 - (busy)
        16f458d8 000a 000a  [00]   16f45900    00018 - (busy)
        16f45b98 000a 000a  [00]   16f45bc0    00018 - (busy)
        16f46080 000a 000a  [00]   16f460a8    00018 - (busy)
        16f46118 000a 000a  [00]   16f46140    00018 - (busy)
        16f461b0 000a 000a  [00]   16f461d8    00018 - (busy)
        16f46248 000a 000a  [00]   16f46270    00018 - (busy)
        16f462e0 000a 000a  [00]   16f46308    00018 - (busy)
        16f46378 000a 000a  [00]   16f463a0    00018 - (busy)
        16f46410 000a 000a  [00]   16f46438    00018 - (busy)
        16f464a8 000b 000a  [00]   16f464d0    00018 - (busy)
        16f46548 000a 000b  [00]   16f46570    00018 - (busy)
        16f46808 000a 000a  [00]   16f46830    00018 - (busy)
        16f46ac8 000a 000a  [00]   16f46af0    00018 - (busy)
        16f46d88 000a 000a  [00]   16f46db0    00018 - (busy)
        16f47048 000a 000a  [00]   16f47070    00018 - (busy)
        16f47308 000a 000a  [00]   16f47330    00018 - (busy)
...

7、随意挑几个看看,命令:!heap -p -a [UserPtr]  

80.dmp

0:051> !heap -p -a
invalid address  passed to `-p -a'0:051> !heap -p -a 16f460a8
    address 16f460a8 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        16f46080 000a 0000  [00]   16f460a8    00018 - (busy)
        Trace: 074b
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for MSTP_DLL.dll -
        669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091


0:051> !heap -p -a 16f46570
    address 16f46570 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        16f46548 000a 0000  [00]   16f46570    00018 - (busy)
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091


0:051> !heap -p -a 16f46308
    address 16f46308 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        16f462e0 000a 0000  [00]   16f46308    00018 - (busy)
        Trace: 074b
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091  

这次很顺利,这个内存使用的地方实在MSTP_DLL的 MSTP_Get_RPM_ACK_Data里边;这个就是我们要找的最终的内存泄漏点信息;

同样操作堆10大小的数据块操作一遍

80.dmp

> !heap -flt s 10
...
        15359fa0 0009 0009  [00]   15359fc8    00010 - (busy)
        1535a2a0 0009 0009  [00]   1535a2c8    00010 - (busy)
        1535a560 0009 0009  [00]   1535a588    00010 - (busy)
        1535aee8 0009 0009  [00]   1535af10    00010 - (busy)
        1535af80 0009 0009  [00]   1535afa8    00010 - (busy)
        1535b018 0009 0009  [00]   1535b040    00010 - (busy)
        1535b360 0009 0009  [00]   1535b388    00010 - (busy)
        1535b620 0009 0009  [00]   1535b648    00010 - (busy)
        1535c420 0009 0009  [00]   1535c448    00010 - (busy)
        1535d220 0009 0009  [00]   1535d248    00010 - (busy)
        1535d4e0 0009 0009  [00]   1535d508    00010 - (busy)
        1535d7a0 0009 0009  [00]   1535d7c8    00010 - (busy)
        1535da60 0009 0009  [00]   1535da88    00010 - (busy)
        1535dd20 0009 0009  [00]   1535dd48    00010 - (busy)
        1535dfe0 0009 0009  [00]   1535e008    00010 - (busy)
        1535e2a0 0009 0009  [00]   1535e2c8    00010 - (busy)
        1535e560 0009 0009  [00]   1535e588    00010 - (busy)
        1535e820 0009 0009  [00]   1535e848    00010 - (busy)
        1535eae0 0009 0009  [00]   1535eb08    00010 - (busy)
        1535eda0 0009 0009  [00]   1535edc8    00010 - (busy)
        1535f060 0009 0009  [00]   1535f088    00010 - (busy)
        1535f320 0009 0009  [00]   1535f348    00010 - (busy)
        1535f5e0 0009 0009  [00]   1535f608    00010 - (busy)
...
80.dmp

0:051> !heap -p -a 1535eb08
    address 1535eb08 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        1535eae0 0009 0000  [00]   1535eb08    00010 - (busy)
        Trace: 0817
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b


0:051> !heap -p -a 1535f088
    address 1535f088 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        1535f060 0009 0000  [00]   1535f088    00010 - (busy)
        Trace: 0817
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b


0:051> !heap -p -a 1535f348
    address 1535f348 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        1535f320 0009 0000  [00]   1535f348    00010 - (busy)
        Trace: 0817
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b  

这次也顺利拿到另一个内存泄漏的位置信息在MSTP_DLL的 MSTP_Get_RP_ACK_Data里边;  

MSTP_Get_RP_ACK_Data

MSTP_Get_RPM_ACK_Data

这两个方法其实是读取模块点数值或者收集模块信息的时候返回的一个数据指针;

现在很明显这两个方法返回的指针可能是有问题的,里边非常大的可能存在内存泄漏;

7、验证

跟同事找到原来的MSTP_DLL的源码,找到以上两个方法体

记一次使用windbg排查内存泄漏的过程-LMLPHP

 可以看到当初那位同事设计这个方法的时候,很明显有2个错误;

1)返回的指针只见声明内存空间,不见释放;

2)返回数据的指针不应该在方法体中的返回值中传出来,应该写在方法参数中,外部声明,传进去赋值,然后外部使用,再外部释放

3)两个方法体都一样的问题

五、整理

1)我们知道有三处内存泄漏,分别大小是1f0、18、10 

2)三者占据99%的新增不释放的内存消耗

3)我们已经找到其中两个泄漏位置,还剩下一个

4)1f0是重中之重,占据内存消耗92%,不解决这个BUG,问题基本就相当于没解决

5)无法找到1f0的调用堆栈信息,无明显特征信息,无引用跟;

5)emmmmm? (第二声)

好像被我们错过了一个信息, 

是否还记得最开始那一段?

80.dmp

0:051> !heap -stat -h 11b30000
 heap @ 11b30000
group-by: TOTSIZE max-display: 20
    size     #blocks     total     ( %) (percent of total busy bytes)
    1f0 102d9 - 1f58470  (92.48)
    18 102b0 - 184080  (4.47)
    10 102ae - 102ae0  (2.98)  

 这几个数据很接近,都是申请次数大小,也就是说着三个数据块被申请的次数差不多。。

鉴于此,我们再去看看33M内存的时候这几个次数的值是多少

33.dmp

0:047> !heap -s
LFH Key                   : 0x343fce0b
Termination on corruption : ENABLED
  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast
                    (k)     (k)    (k)     (k) length      blocks cont. heap
-----------------------------------------------------------------------------
00780000 00000002    8192   4636   8192    209  2484     4    0      e   LFH
002e0000 00001002     256      4    256      2     1     1    0      0
00280000 00001002    1088     72   1088      5     2     2    0      0
00c70000 00041002     256      4    256      2     1     1    0      0
002d0000 00001002    1088    132   1088      8    23     2    0      0
00450000 00001002     256      4    256      0     1     1    0      0
07230000 00041002     256      4    256      2     1     1    0      0
00c10000 00001002     256    216    256      3    39     1    0      0   LFH
09b50000 00001002     256     80    256     39    28     1    0      0
09d00000 00001002      64      4     64      2     1     1    0      0
09ef0000 00001002    1088     72   1088      6     2     2    0      0
004c0000 00001002    1088    192   1088     15   140     2    0      0
09760000 00041002     256     28    256      4     4     1    0      0
09ed0000 00001002      64     12     64      1     1     1    0      0
0b210000 00001002    3136   1456   3136     52    84     3    0      0   LFH
0a700000 00001002     256    212    256      2     1     1    0      0
0e1e0000 00011002     256      4    256      0     1     1    0      0
0d030000 00001002     256     16    256      3     1     1    0      0
11b30000 00001002    1088    388   1088      0     1     2    0      0
-----------------------------------------------------------------------------
0:047> !heap -stat -h 11b30000
 heap @ 11b30000
group-by: TOTSIZE max-display: 20
    size     #blocks     total     ( %) (percent of total busy bytes)
    1f0 1f2 - 3c4e0  (86.13)
    18 1c9 - 2ad8  (3.82)
    1000 2 - 2000  (2.86)
    10 1c7 - 1c70  (2.54)
    214 c - 18f0  (2.23)
    800 2 - 1000  (1.43)
    220 1 - 220  (0.19)
    1d7 1 - 1d7  (0.16)
    80 3 - 180  (0.13)
    a4 1 - a4  (0.06)
    24 4 - 90  (0.05)
    14 4 - 50  (0.03)
    4a 1 - 4a  (0.03)
    25 2 - 4a  (0.03)
    48 1 - 48  (0.03)
    46 1 - 46  (0.02)
    41 1 - 41  (0.02)
    3e 1 - 3e  (0.02)
    3c 1 - 3c  (0.02)
    37 1 - 37  (0.02)  

 分别是1f2、1c9、1c7;

1f0:102d9 - 1f2 = 65767

18:102b0 - 1c9 = 65767

10:102ae - 1c7 = 65767

居然申请的次数一模一样!

稳了!这个1f0可以断定与其他两个紧密相关;首先怀疑的就是

MSTP_Get_RP_ACK_Data

MSTP_Get_RPM_ACK_Data

1)这两个方法体中使用到的所有子方法体有没有申请空间的语句;

2)申请的空间大小是不是就是1f0;

依据上面的推测,再次阅读那2个方法体;

记一次使用windbg排查内存泄漏的过程-LMLPHP

 记一次使用windbg排查内存泄漏的过程-LMLPHP

经过分析BACNET_APPLICATION_DATA_VALUE结构体大小刚好就是1f0 

好了,搞定

如果对你有帮助,请点赞、评论;  

 

06-02 01:28