Windbg内核调试之五: 一次利用Dump文件调试Deadlock的实战
时间:2010-09-01 来源:Da Vinci
最近遇到项目的一个bug,安装产品之后系统hang住。大致的现象是系统logon之后,Taskbar显示,然后hang住,Ctrl+Alt+Del不能调出TaskManager,Mouse不能移动,硬盘LED不闪烁。由于机器不在手边,远程调试又极其慢(实际上是国外的机器),所以只好请求对方手动触发一个蓝屏(有关手动触发蓝屏的方式,请参考:http://msdn.microsoft.com/en-us/library/cc266483.aspx),得到一个完全dump文件(有关完全dump文件的设置,请参考:Windbg内核调试之四: Dump文件分析),利用该dump文件分析系统hang住时刻的状态。
注:这个事例中,利用Ctrl+Scroll Lock+Scroll Lock手动触发蓝屏,此键盘消息可以响应。
祭起windbg,Open Crash Dump:
Symbol search path is: srv*c:\symbols*http://msdl.microsoft.com/download/symbols;E:\Win7.Symbols
Executable search path is:
Windows 7 Kernel Version 7201 MP (2 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS Personal
Built by: 7201.0.x86fre.winmain_win7ids.090601-1516
Kernel base = 0x8363e000 PsLoadedModuleList = 0x83786810
Debug session time: Thu Aug 6 17:04:40.745 2009 (GMT+8)
System Uptime: 0 days 0:09:38.352
Loading Kernel Symbols............................................................................................................
Loading User Symbols
Loading unloaded module list.....................................................................................................
系统是Window 7的RC版本7201,运行在x86系统上。先analyze –v一下,看看调试器给定的信息是否可用:
0: kd> !analyze -v
******************************************************************************
*
* Bugcheck Analysis
*
******************************************************************************
MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000
Debugging Details:
------------------
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
BUGCHECK_STR: 0xE2
PROCESS_NAME: System
CURRENT_IRQL: 2
LAST_CONTROL_TRANSFER: from 9066a1aa to 8371ac88
STACK_TEXT:
837649e4 9066a1aa 000000e2 00000000 00000000 nt!KeBugCheckEx+0x1e
83764a0c 9066a2ba 88a60b48 000000c6 00000000 kbdhid!KbdHidProcessCrashDump+0x1e2
83764a30 8de20d50 00000001 83764a4c 000000c6 kbdhid!KbdHid_InsertCodesIntoQueue+0x88
83764a44 8de20edd 000000c6 00000000 9066a232 HIDPARSE!HidP_KbdPutKey+0x30
83764a68 8de20dbd 8de216a8 00000009 9066a232 HIDPARSE!HidP_ModifierCode+0x81
83764a8c 8de210c0 00000047 00000000 88a56890 HIDPARSE!HidP_TranslateUsage+0x61
83764abc 9066a50f 88a5692d 0000000e 00000000 HIDPARSE!HidP_TranslateUsageAndPagesToI8042ScanCodes+0x66
83764aec 836a6ab3 00000000 014b49d0 00000000 kbdhid!KbdHid_ReadComplete+0x18d
83764b34 8de0e424 83764b3c 83764b3c 88883020 nt!IopfCompleteRequest+0x128
83764b50 8de0e5df 02883008 88a28508 00000009 HIDCLASS!HidpDistributeInterruptReport+0xce
83764b8c 836a6ab3 00000000 8838c060 88a56de4 HIDCLASS!HidpInterruptReadComplete+0x1a7
83764bd0 92753868 8367e474 8902a6e0 00000000 nt!IopfCompleteRequest+0x128
83764c00 92754178 89147d70 8838c060 88c82b28 USBPORT!USBPORT_Core_iCompleteDoneTransfer+0x6e0
83764c2c 927579af 8683c028 8683c0f0 8683ca98 USBPORT!USBPORT_Core_iIrpCsqCompleteDoneTransfer+0x33b
83764c54 92751d18 8683c028 8683ca98 8683c002 USBPORT!USBPORT_Core_UsbIocDpc_Worker+0xbc
83764c78 836a6335 8683caa4 8683c002 00000000 USBPORT!USBPORT_Xdpc_Worker+0x173
83764cd4 836a6198 83767d20 83771280 00000000 nt!KiExecuteAllDpcs+0xf9
83764d20 836a5fb8 00000000 0000000e 00000000 nt!KiRetireDpcList+0xd5
83764d24 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x38
STACK_COMMAND: kb
FOLLOWUP_IP:
kbdhid!KbdHidProcessCrashDump+1e2
9066a1aa 837d1000 cmp dword ptr [ebp+10h],0
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: kbdhid!KbdHidProcessCrashDump+1e2
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: kbdhid
IMAGE_NAME: kbdhid.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 4a2487af
FAILURE_BUCKET_ID: 0xE2_kbdhid!KbdHidProcessCrashDump+1e2
BUCKET_ID: 0xE2_kbdhid!KbdHidProcessCrashDump+1e2
Followup: MachineOwner
---------
给出的信息似乎并不充分,只是针对目标系统蓝屏的一些信息,并不是想要的系统如果hang住的分析。看看进程列表:
0: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 85f22920 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
DirBase: 00185000 ObjectTable: 8cc01c58 HandleCount: 2369.
Image: System
PROCESS 86c568a0 SessionId: none Cid: 0178 Peb: 7ffdf000 ParentCid: 0004
DirBase: b9c6d020 ObjectTable: 8cd97f30 HandleCount: 31.
Image: smss.exe
PROCESS 884de8f0 SessionId: 0 Cid: 0210 Peb: 7ffdd000 ParentCid: 01c4
DirBase: b9c6d060 ObjectTable: 8cc0ab68 HandleCount: 721.
Image: csrss.exe
............
进程太多就没全列出来。死锁有两种情况,一种是系统死锁,像本例一样,系统没有任何响应,这种死锁多由内核驱动引起,调试相对困难,也不易找到到底是哪个进程中的线程引发死锁,哪个进程中的线程持有资源;另一种是应用程序引发的死锁,这种情况系统多数是可以响应的,鼠标或桌面,只是某个应用程序hang住,也许是TaskManager,也许是其它应用程序。这种问题相对系统级死锁要简单的多,只要attach到hang住的应用程序上,一般都可排查出问题所在,多数情况下都是应用程序进程中的某个线程持有Critical Section,引起其它线程等待资源;另一种情况是并不是多个线程的相互等待,而是线程自旋无法继续执行。所以,判断应用程序的挂死,最有效的方法就是观察CPU的使用率(Ctrl+Alt+Del),看是否达到峰值或为0。
系统级死锁不容易找到引起问题的进程,需要关注的System、Csrss.exe、Service.exe这样比较重要的进程;应用程序死锁则重点关注applicaton本身就可以了。
本例明显是系统级的死锁,基本可以肯定是第三方驱动程序导致。进程列表是看不出什么可疑的原因的。既然怀疑是死锁的问题,运行一下!locks(!kdext*.locks,Kernal级)看看当前系统持有的锁:
0: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks.
Resource @ nt!PnpRegistryDeviceResource (0x837a3e80) Exclusively owned
Contention Count = 48
NumberOfExclusiveWaiters = 3
Threads: 85f55688-01<*>
Threads Waiting On Exclusive Access:
85f59798 85f59a70 8893cd48
KD: Scanning for held locks..................
Resource @ 0x87b286dc Shared 12 owning threads
Threads: 89484568-02<*> 88c5d030-03<*> 88c5a7e0-02<*> 88c5b030-02<*>
88c5cd48-02<*> 88c5c2f8-02<*> 88f858f8-02<*> 88df7030-02<*>
8932ab08-02<*> 889e4d48-02<*> 88afa3e8-02<*> 8936bd48-02<*>
KD: Scanning for held locks........................................................................................................
17099 total locks, 2 locks currently held
当前两个锁被持有,其中一个为Exclusively owned,另一个是共享锁。第一个锁为Exclusively owned访问,即专有访问,这种锁往往是引起死锁问题的原因。死锁的一种情况是由于系统正在执行的一个线程需要一个锁,而这个锁被当前系统未执行的某个线程所持有。系统中大部分是共享锁。
注意到第一个锁当前持有的线程为85f55688-01<*> (-01<*>代表当前持有的进程),等待队列中有三个线程在等待这个锁;第二个锁为共享锁,所有线程列表中的thread都持有它。
这两个锁实际上是两个resource的形式给出,通过命令看一下这两个resouce的结构:
0: kd> dt nt!_ERESOURCE -b 0x837a3e80
+0x000 SystemResourcesList : _LIST_ENTRY [ 0x8377f3c0 - 0x8377eac0 ]
+0x000 Flink : 0x8377f3c0
+0x004 Blink : 0x8377eac0
+0x008 OwnerTable : 0x87bfb068
+0x00c ActiveCount : 1
+0x00e Flag : 0x84
+0x010 SharedWaiters : 0x87db2088
+0x014 ExclusiveWaiters : 0x87db20c0
+0x018 OwnerEntry : _OWNER_ENTRY
+0x000 OwnerThread : 0x85f55688
+0x004 IoPriorityBoosted : 0y0
+0x004 OwnerReferenced : 0y0
+0x004 OwnerCount : 0y000000000000000000000000000001 (0x1)
+0x004 TableSize : 4
+0x020 ActiveEntries : 1
+0x024 ContentionCount : 0x30
+0x028 NumberOfSharedWaiters : 0
+0x02c NumberOfExclusiveWaiters : 3
+0x030 Address : (null)
+0x030 CreatorBackTraceIndex : 0
+0x034 SpinLock : 0
0: kd> dt nt!_ERESOURCE -b 0x87b286dc
+0x000 SystemResourcesList : _LIST_ENTRY [ 0x87b286a4 - 0x87b11234 ]
+0x000 Flink : 0x87b286a4
+0x004 Blink : 0x87b11234
+0x008 OwnerTable : 0x892d0130
+0x00c ActiveCount : 1
+0x00e Flag : 0
+0x010 SharedWaiters : (null)
+0x014 ExclusiveWaiters : (null)
+0x018 OwnerEntry : _OWNER_ENTRY
+0x000 OwnerThread : 0x89484568
+0x004 IoPriorityBoosted : 0y0
+0x004 OwnerReferenced : 0y0
+0x004 OwnerCount : 0y000000000000000000000000000010 (0x2)
+0x004 TableSize : 8
+0x020 ActiveEntries : 0xc
+0x024 ContentionCount : 0
+0x028 NumberOfSharedWaiters : 0
+0x02c NumberOfExclusiveWaiters : 0
+0x030 Address : (null)
+0x030 CreatorBackTraceIndex : 0
+0x034 SpinLock : 0
能看到ERESOURCE结构的OwnerThread域正是上述分析的持有锁的线程。ERESOURCE结构定义大致如下:
typedef struct _ERESOURCE {
LIST_ENTRY SystemResourcesList;
POWNER_ENTRY OwnerTable;
SHORT ActiveCount;
USHORT Flag;
PKSEMAPHORE SharedWaiters;
PKEVENT ExclusiveWaiters;
OWNER_ENTRY OwnerThreads[2];
ULONG ContentionCount;
USHORT NumberOfSharedWaiters;
USHORT NumberOfExclusiveWaiters;
union{
PVOID Address;
ULONG_PTR CreatorBackTraceIndex;
};
KSPIN_LOCK SpinLock;
} ERESOURCE, *PERESOURCE;
现在比较可疑的是线程0x85f55688所持有的资源锁,列出这个线程的stack trace看看有没有hot spot:
0: kd> !thread 0x85f55688
THREAD 85f55688 Cid 0004.0024 Teb: 00000000 Win32Thread: 00000000 WAIT: (WrLpcReply) KernelMode Non-Alertable
85f558bc Semaphore Limit 0x1
Waiting for reply to ALPC Message 8e4a46b0 : queued at port 88c5cc18 : owned by process 88c09400
Not impersonating
DeviceMap 8cc08a00
Owning Process 0 Image: <Unknown>
Attached Process 85f22920 Image: System
Wait Start TickCount 6525 Ticks: 30548 (0:00:07:56.551)
Context Switch Count 9141
UserTime 00:00:00.000
KernelTime 00:00:00.546
Win32 Start Address nt!ExpWorkerThread (0x836abb1a)
Stack Init 8e907fd0 Current 8e907220 Base 8e908000 Limit 8e905000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5
ChildEBP RetAddr Args to Child
8e907238 836aca58 85f55688 00000000 83767d20 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
8e907270 836ab113 85f55748 85f55688 85f558bc nt!KiSwapThread+0x266
8e907298 836a524f 85f55688 85f55748 00000000 nt!KiCommitThreadWait+0x1df
8e907310 836f4ad7 85f558bc 00000011 00000000 nt!KeWaitForSingleObject+0x393
8e907338 838b3bae 85f558bc 00000000 00000000 nt!AlpcpSignalAndWait+0x7b
8e90735c 838a9c2f 00000000 8e9073c8 00000000 nt!AlpcpReceiveSynchronousReply+0x27
8e9073ec 8389d28f 88c5a6b0 00020000 8e907628 nt!AlpcpProcessSynchronousRequest+0x276
8e907448 838b4ad7 88c5a6b0 8e907628 00000000 nt!LpcpRequestWaitReplyPort+0x6a
8e907470 8368157a 80000c54 8e907628 8e907528 nt!NtRequestWaitReplyPort+0x4c
8e907470 8368024d 80000c54 8e907628 8e907528 nt!KiFastCallEntry+0x12a (FPO: [0,3] TrapFrame @ 8e907484)
8e9074f4 98b63434 80000c54 8e907628 8e907528 nt!ZwRequestWaitReplyPort+0x11 (FPO: [3,0,0])
WARNING: Stack unwind information not available. Following frames may be wrong.
8e90772c 98b638a1 00000000 00000000 98b7d6ec tmcomm!CWorkerThreadPool::CWorkerThreadPool+0xa54
8e907744 98b9d97e 8e90775c 00000000 98ba6238 tmcomm!KmCallUm+0x75
8e907784 98b9ee09 9530af7c 162a554d 98ba6238 tmactmon+0x397e
8e9077c0 98b9ef4d 0000000a 80000002 99ccc490 tmactmon+0x4e09
8e9077ec 98b9efbc 8e907844 00000000 0000000a tmactmon+0x4f4d
8e907800 98b8c450 8e907844 00000000 8e907950 tmactmon+0x4fbc
8e907828 98b91f97 8e907802 8e9078a4 8e907950 tmevtmgr+0x1450
8e90789c 98b921d7 00000000 88be10a4 8e907950 tmevtmgr!TMEvtCommunicateRoutine+0x2569
8e9078b4 98b8c450 8e907900 88be0edc 8e9079c8 tmevtmgr!TMEvtCommunicateRoutine+0x27a9
8e9078dc 98b90d0d 8e907903 8e9079a0 8e907a0c tmevtmgr+0x1450
8e9079ac 98b8e7b0 8e9079c8 8e9079e4 98b8e944 tmevtmgr!TMEvtCommunicateRoutine+0x12df
8e9079b8 98b8e944 88be0f5c 8e9079c8 8e907acc tmevtmgr+0x37b0
8e9079e4 8368157a 88be0f5c 8e907acc 82000000 tmevtmgr+0x3944
8e9079e4 8367f069 88be0f5c 8e907acc 82000000 nt!KiFastCallEntry+0x12a (FPO: [0,3] TrapFrame @ 8e907a0c)
8e907a7c 838266b5 8e907acc 82000000 8e907ad8 nt!ZwCreateKey+0x11 (FPO: [7,0,0])
8e907af0 8380e477 8e907cb8 82000000 00000000 nt!IopCreateRegistryKeyEx+0x61
8e907c18 8380e0bd 85f4d530 00000002 82000000 nt!IopOpenOrCreateDeviceRegistryKey+0x3b1
8e907c30 8c07e594 85f4d530 00000002 82000000 nt!IoOpenDeviceRegistryKey+0x18
8e907cc8 8c016489 87dbb0e0 00000000 00000000 ndis!ndisCheckAdapterBindings+0xa7 (FPO: [Non-Fpo])
8e907cf0 8c07d013 87dbbc70 87dbb0e0 8e907d50 ndis!ndisQueuedRestoreMiniportstack+0x36 (FPO: [Non-Fpo])
8e907d00 836abc27 87dbbc70 00000000 85f55688 ndis!ndisWorkItemHandler+0xe (FPO: [Non-Fpo])
8e907d50 8384c643 00000000 a63db72a 00000000 nt!ExpWorkerThread+0x10d
8e907d90 836fe059 836abb1a 00000000 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19
上述stack trace没有看出什么可疑的状态。再看看另外一个线程0x89484568(持有共享锁):
0: kd> !thread 0x89484568
THREAD 89484568 Cid 0cd0.0f50 Teb: 7ffa6000 Win32Thread: ffb39180 WAIT: (Executive) KernelMode Non-Alertable
b44c6680 NotificationEvent
IRP List:
89493980: (0006,0244) Flags: 00000884 Mdl: 00000000
894d6db8: (0006,0244) Flags: 00000884 Mdl: 00000000
88c4e9a0: (0006,0244) Flags: 00000884 Mdl: 00000000
894d39a0: (0006,0244) Flags: 00000884 Mdl: 00000000
Not impersonating
DeviceMap 8e12ea70
Owning Process 89020508 Image: IMJPCMNT.EXE
Wait Start TickCount 4518 Ticks: 32555 (0:00:08:27.861)
Context Switch Count 330
UserTime 00:00:00.000
KernelTime 00:00:00.031
Win32 Start Address 0x697ee751
Stack Init b44c7fd0 Current b44c6550 Base b44c8000 Limit b44c5000 Call 6a0
Priority 10 BasePriority 8 PriorityDecrement 16 IoPriority 2 PagePriority 5
*** ERROR: Module load completed but symbols could not be loaded for flyfs.sys
ChildEBP RetAddr Args to Child
b44c6568 836aca58 89484568 00000000 807c2120 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
b44c65a0 836ab113 89484628 89484568 b44c6680 nt!KiSwapThread+0x266
b44c65c8 836a524f 89484568 89484628 00000000 nt!KiCommitThreadWait+0x1df
b44c6640 8bd6054f b44c6680 00000000 00000000 nt!KeWaitForSingleObject+0x393
b44c66a8 8bd5c54a 8bd5c4aa 894d5c90 00000003 fltmgr!FltpPostSyncOperation+0x8d (FPO: [Non-Fpo])
b44c66cc 8bd59765 894d5c90 89435060 894d5c90 fltmgr!FltpGetNormalizedFileName+0x5e (FPO: [Non-Fpo])
b44c66e4 8bd43773 894d5c90 00000000 894d5c90 fltmgr!FltpCreateFileNameInformation+0x81 (FPO: [Non-Fpo])
b44c6704 8bd438c7 877273c0 00000000 00000000 fltmgr!HandleStreamListNotSupported+0x125 (FPO: [Non-Fpo])
b44c6734 8bd43fa3 c00000bb 89435068 87b287d8 fltmgr!FltpGetFileNameInformation+0xc7 (FPO: [Non-Fpo])
b44c675c 8bd97a1d 00435068 00000101 b3187e2c fltmgr!FltGetFileNameInformation+0x12b (FPO: [Non-Fpo])
b44c67d4 8bd892be 89435068 00000101 b3187e2c syscow32v!SysCowGetFileNameInformation+0xcd (FPO: [Non-Fpo]) (CONV: stdcall) [d:\code\engineering\products\systemok\restore\driver\support.cpp @ 2397]
b44c6868 8bd3daeb 89435068 b44c6888 b44c68b4 syscow32v!SysCowPreCreate+0x31e (FPO: [Non-Fpo]) (CONV: stdcall) [d:\code\engineering\products\systemok\restore\driver\create.cpp @ 299]
b44c68d4 8bd409f0 b44c6918 89493980 00000000 fltmgr!FltpPerformPreCallbacks+0x34d (FPO: [Non-Fpo])
b44c68ec 8bd541fe b44c6918 8bd57f3c 00000000 fltmgr!FltpPassThroughInternal+0x40 (FPO: [Non-Fpo])
b44c6900 8bd548b7 b44c6918 89493980 89493b98 fltmgr!FltpCreateInternal+0x24 (FPO: [Non-Fpo])
b44c6944 8367a567 87e5d668 87727008 89493bbc fltmgr!FltpCreate+0x2c9 (FPO: [Non-Fpo])
b44c695c 8bda7bd1 89493ba0 89493980 892be848 nt!IofCallDriver+0x63
WARNING: Stack unwind information not available. Following frames may be wrong.
b44c6980 8bda91ec 87e5d7b0 89493980 b44c6a30 flyfs+0x1bd1
b44c699c 8bda94f3 87e5d7b0 0000001b 894dab28 flyfs+0x31ec
b44c6a50 8bda9f60 87e5d7b0 89493980 b44c6a70 flyfs+0x34f3
b44c6a60 8bda9f8a 87e5d7b0 89493980 b44c6a88 flyfs+0x3f60
b44c6a70 8367a567 87e5d7b0 89493980 894dab84 flyfs+0x3f8a
b44c6a88 8387e5f5 9ce1a1da b44c6c30 00000000 nt!IofCallDriver+0x63
b44c6b60 8385f1a7 876f0678 87f6dde8 89403d20 nt!IopParseDevice+0xed7
b44c6bdc 83885215 00000000 b44c6c30 00000240 nt!ObpLookupObjectName+0x4fa
b44c6c3c 8387d573 b44c6dc0 85f6dde8 89484500 nt!ObOpenObjectByName+0x159
b44c6cb8 838b4d59 b44c6e04 00120089 b44c6dc0 nt!IopCreateFile+0x673
b44c6d14 83806c27 b44c6e04 00120089 b44c6dc0 nt!IoCreateFileEx+0x9e
b44c6d70 8bdac7c7 b44c6e04 00120089 b44c6dc0 nt!IoCreateFileSpecifyDeviceObjectHint+0x59
b44c6e10 8bda9593 0000004d b44c6ec0 87e5d7b0 flyfs+0x67c7
b44c6ec8 8bda9f60 87e5d7b0 894d6db8 b44c6ee8 flyfs+0x3593
b44c6ed8 8bda9f8a 87e5d7b0 894d6db8 b44c6f00 flyfs+0x3f60
b44c6ee8 8367a567 87e5d7b0 894d6db8 892e76ac flyfs+0x3f8a
b44c6f00 8387e5f5 9ce1a562 b44c70a8 00000000 nt!IofCallDriver+0x63
b44c6fd8 8385f1a7 876f0678 85f6dde8 894d8698 nt!IopParseDevice+0xed7
b44c7054 83885215 00000000 b44c70a8 00000240 nt!ObpLookupObjectName+0x4fa
b44c70b0 8387d573 b44c71d8 85f6dde8 00010000 nt!ObOpenObjectByName+0x159
b44c712c 838b4d59 894d5490 00100001 b44c71d8 nt!IopCreateFile+0x673
b44c7188 8bd5b8e5 894d5490 00100001 b44c71d8 nt!IoCreateFileEx+0x9e
b44c7220 8bd5bc59 00000000 00000000 0000fffe fltmgr!FltpExpandFilePathWorker+0x167 (FPO: [Non-Fpo])
IMJPCMNT.EXE是Microsoft输入法进程。
注:这个事例中,利用Ctrl+Scroll Lock+Scroll Lock手动触发蓝屏,此键盘消息可以响应。
祭起windbg,Open Crash Dump:
Symbol search path is: srv*c:\symbols*http://msdl.microsoft.com/download/symbols;E:\Win7.Symbols
Executable search path is:
Windows 7 Kernel Version 7201 MP (2 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS Personal
Built by: 7201.0.x86fre.winmain_win7ids.090601-1516
Kernel base = 0x8363e000 PsLoadedModuleList = 0x83786810
Debug session time: Thu Aug 6 17:04:40.745 2009 (GMT+8)
System Uptime: 0 days 0:09:38.352
Loading Kernel Symbols............................................................................................................
Loading User Symbols
Loading unloaded module list.....................................................................................................
系统是Window 7的RC版本7201,运行在x86系统上。先analyze –v一下,看看调试器给定的信息是否可用:
0: kd> !analyze -v
******************************************************************************
*
* Bugcheck Analysis
*
******************************************************************************
MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000
Debugging Details:
------------------
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
BUGCHECK_STR: 0xE2
PROCESS_NAME: System
CURRENT_IRQL: 2
LAST_CONTROL_TRANSFER: from 9066a1aa to 8371ac88
STACK_TEXT:
837649e4 9066a1aa 000000e2 00000000 00000000 nt!KeBugCheckEx+0x1e
83764a0c 9066a2ba 88a60b48 000000c6 00000000 kbdhid!KbdHidProcessCrashDump+0x1e2
83764a30 8de20d50 00000001 83764a4c 000000c6 kbdhid!KbdHid_InsertCodesIntoQueue+0x88
83764a44 8de20edd 000000c6 00000000 9066a232 HIDPARSE!HidP_KbdPutKey+0x30
83764a68 8de20dbd 8de216a8 00000009 9066a232 HIDPARSE!HidP_ModifierCode+0x81
83764a8c 8de210c0 00000047 00000000 88a56890 HIDPARSE!HidP_TranslateUsage+0x61
83764abc 9066a50f 88a5692d 0000000e 00000000 HIDPARSE!HidP_TranslateUsageAndPagesToI8042ScanCodes+0x66
83764aec 836a6ab3 00000000 014b49d0 00000000 kbdhid!KbdHid_ReadComplete+0x18d
83764b34 8de0e424 83764b3c 83764b3c 88883020 nt!IopfCompleteRequest+0x128
83764b50 8de0e5df 02883008 88a28508 00000009 HIDCLASS!HidpDistributeInterruptReport+0xce
83764b8c 836a6ab3 00000000 8838c060 88a56de4 HIDCLASS!HidpInterruptReadComplete+0x1a7
83764bd0 92753868 8367e474 8902a6e0 00000000 nt!IopfCompleteRequest+0x128
83764c00 92754178 89147d70 8838c060 88c82b28 USBPORT!USBPORT_Core_iCompleteDoneTransfer+0x6e0
83764c2c 927579af 8683c028 8683c0f0 8683ca98 USBPORT!USBPORT_Core_iIrpCsqCompleteDoneTransfer+0x33b
83764c54 92751d18 8683c028 8683ca98 8683c002 USBPORT!USBPORT_Core_UsbIocDpc_Worker+0xbc
83764c78 836a6335 8683caa4 8683c002 00000000 USBPORT!USBPORT_Xdpc_Worker+0x173
83764cd4 836a6198 83767d20 83771280 00000000 nt!KiExecuteAllDpcs+0xf9
83764d20 836a5fb8 00000000 0000000e 00000000 nt!KiRetireDpcList+0xd5
83764d24 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x38
STACK_COMMAND: kb
FOLLOWUP_IP:
kbdhid!KbdHidProcessCrashDump+1e2
9066a1aa 837d1000 cmp dword ptr [ebp+10h],0
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: kbdhid!KbdHidProcessCrashDump+1e2
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: kbdhid
IMAGE_NAME: kbdhid.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 4a2487af
FAILURE_BUCKET_ID: 0xE2_kbdhid!KbdHidProcessCrashDump+1e2
BUCKET_ID: 0xE2_kbdhid!KbdHidProcessCrashDump+1e2
Followup: MachineOwner
---------
给出的信息似乎并不充分,只是针对目标系统蓝屏的一些信息,并不是想要的系统如果hang住的分析。看看进程列表:
0: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 85f22920 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
DirBase: 00185000 ObjectTable: 8cc01c58 HandleCount: 2369.
Image: System
PROCESS 86c568a0 SessionId: none Cid: 0178 Peb: 7ffdf000 ParentCid: 0004
DirBase: b9c6d020 ObjectTable: 8cd97f30 HandleCount: 31.
Image: smss.exe
PROCESS 884de8f0 SessionId: 0 Cid: 0210 Peb: 7ffdd000 ParentCid: 01c4
DirBase: b9c6d060 ObjectTable: 8cc0ab68 HandleCount: 721.
Image: csrss.exe
............
进程太多就没全列出来。死锁有两种情况,一种是系统死锁,像本例一样,系统没有任何响应,这种死锁多由内核驱动引起,调试相对困难,也不易找到到底是哪个进程中的线程引发死锁,哪个进程中的线程持有资源;另一种是应用程序引发的死锁,这种情况系统多数是可以响应的,鼠标或桌面,只是某个应用程序hang住,也许是TaskManager,也许是其它应用程序。这种问题相对系统级死锁要简单的多,只要attach到hang住的应用程序上,一般都可排查出问题所在,多数情况下都是应用程序进程中的某个线程持有Critical Section,引起其它线程等待资源;另一种情况是并不是多个线程的相互等待,而是线程自旋无法继续执行。所以,判断应用程序的挂死,最有效的方法就是观察CPU的使用率(Ctrl+Alt+Del),看是否达到峰值或为0。
系统级死锁不容易找到引起问题的进程,需要关注的System、Csrss.exe、Service.exe这样比较重要的进程;应用程序死锁则重点关注applicaton本身就可以了。
本例明显是系统级的死锁,基本可以肯定是第三方驱动程序导致。进程列表是看不出什么可疑的原因的。既然怀疑是死锁的问题,运行一下!locks(!kdext*.locks,Kernal级)看看当前系统持有的锁:
0: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks.
Resource @ nt!PnpRegistryDeviceResource (0x837a3e80) Exclusively owned
Contention Count = 48
NumberOfExclusiveWaiters = 3
Threads: 85f55688-01<*>
Threads Waiting On Exclusive Access:
85f59798 85f59a70 8893cd48
KD: Scanning for held locks..................
Resource @ 0x87b286dc Shared 12 owning threads
Threads: 89484568-02<*> 88c5d030-03<*> 88c5a7e0-02<*> 88c5b030-02<*>
88c5cd48-02<*> 88c5c2f8-02<*> 88f858f8-02<*> 88df7030-02<*>
8932ab08-02<*> 889e4d48-02<*> 88afa3e8-02<*> 8936bd48-02<*>
KD: Scanning for held locks........................................................................................................
17099 total locks, 2 locks currently held
当前两个锁被持有,其中一个为Exclusively owned,另一个是共享锁。第一个锁为Exclusively owned访问,即专有访问,这种锁往往是引起死锁问题的原因。死锁的一种情况是由于系统正在执行的一个线程需要一个锁,而这个锁被当前系统未执行的某个线程所持有。系统中大部分是共享锁。
注意到第一个锁当前持有的线程为85f55688-01<*> (-01<*>代表当前持有的进程),等待队列中有三个线程在等待这个锁;第二个锁为共享锁,所有线程列表中的thread都持有它。
这两个锁实际上是两个resource的形式给出,通过命令看一下这两个resouce的结构:
0: kd> dt nt!_ERESOURCE -b 0x837a3e80
+0x000 SystemResourcesList : _LIST_ENTRY [ 0x8377f3c0 - 0x8377eac0 ]
+0x000 Flink : 0x8377f3c0
+0x004 Blink : 0x8377eac0
+0x008 OwnerTable : 0x87bfb068
+0x00c ActiveCount : 1
+0x00e Flag : 0x84
+0x010 SharedWaiters : 0x87db2088
+0x014 ExclusiveWaiters : 0x87db20c0
+0x018 OwnerEntry : _OWNER_ENTRY
+0x000 OwnerThread : 0x85f55688
+0x004 IoPriorityBoosted : 0y0
+0x004 OwnerReferenced : 0y0
+0x004 OwnerCount : 0y000000000000000000000000000001 (0x1)
+0x004 TableSize : 4
+0x020 ActiveEntries : 1
+0x024 ContentionCount : 0x30
+0x028 NumberOfSharedWaiters : 0
+0x02c NumberOfExclusiveWaiters : 3
+0x030 Address : (null)
+0x030 CreatorBackTraceIndex : 0
+0x034 SpinLock : 0
0: kd> dt nt!_ERESOURCE -b 0x87b286dc
+0x000 SystemResourcesList : _LIST_ENTRY [ 0x87b286a4 - 0x87b11234 ]
+0x000 Flink : 0x87b286a4
+0x004 Blink : 0x87b11234
+0x008 OwnerTable : 0x892d0130
+0x00c ActiveCount : 1
+0x00e Flag : 0
+0x010 SharedWaiters : (null)
+0x014 ExclusiveWaiters : (null)
+0x018 OwnerEntry : _OWNER_ENTRY
+0x000 OwnerThread : 0x89484568
+0x004 IoPriorityBoosted : 0y0
+0x004 OwnerReferenced : 0y0
+0x004 OwnerCount : 0y000000000000000000000000000010 (0x2)
+0x004 TableSize : 8
+0x020 ActiveEntries : 0xc
+0x024 ContentionCount : 0
+0x028 NumberOfSharedWaiters : 0
+0x02c NumberOfExclusiveWaiters : 0
+0x030 Address : (null)
+0x030 CreatorBackTraceIndex : 0
+0x034 SpinLock : 0
能看到ERESOURCE结构的OwnerThread域正是上述分析的持有锁的线程。ERESOURCE结构定义大致如下:
typedef struct _ERESOURCE {
LIST_ENTRY SystemResourcesList;
POWNER_ENTRY OwnerTable;
SHORT ActiveCount;
USHORT Flag;
PKSEMAPHORE SharedWaiters;
PKEVENT ExclusiveWaiters;
OWNER_ENTRY OwnerThreads[2];
ULONG ContentionCount;
USHORT NumberOfSharedWaiters;
USHORT NumberOfExclusiveWaiters;
union{
PVOID Address;
ULONG_PTR CreatorBackTraceIndex;
};
KSPIN_LOCK SpinLock;
} ERESOURCE, *PERESOURCE;
现在比较可疑的是线程0x85f55688所持有的资源锁,列出这个线程的stack trace看看有没有hot spot:
0: kd> !thread 0x85f55688
THREAD 85f55688 Cid 0004.0024 Teb: 00000000 Win32Thread: 00000000 WAIT: (WrLpcReply) KernelMode Non-Alertable
85f558bc Semaphore Limit 0x1
Waiting for reply to ALPC Message 8e4a46b0 : queued at port 88c5cc18 : owned by process 88c09400
Not impersonating
DeviceMap 8cc08a00
Owning Process 0 Image: <Unknown>
Attached Process 85f22920 Image: System
Wait Start TickCount 6525 Ticks: 30548 (0:00:07:56.551)
Context Switch Count 9141
UserTime 00:00:00.000
KernelTime 00:00:00.546
Win32 Start Address nt!ExpWorkerThread (0x836abb1a)
Stack Init 8e907fd0 Current 8e907220 Base 8e908000 Limit 8e905000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5
ChildEBP RetAddr Args to Child
8e907238 836aca58 85f55688 00000000 83767d20 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
8e907270 836ab113 85f55748 85f55688 85f558bc nt!KiSwapThread+0x266
8e907298 836a524f 85f55688 85f55748 00000000 nt!KiCommitThreadWait+0x1df
8e907310 836f4ad7 85f558bc 00000011 00000000 nt!KeWaitForSingleObject+0x393
8e907338 838b3bae 85f558bc 00000000 00000000 nt!AlpcpSignalAndWait+0x7b
8e90735c 838a9c2f 00000000 8e9073c8 00000000 nt!AlpcpReceiveSynchronousReply+0x27
8e9073ec 8389d28f 88c5a6b0 00020000 8e907628 nt!AlpcpProcessSynchronousRequest+0x276
8e907448 838b4ad7 88c5a6b0 8e907628 00000000 nt!LpcpRequestWaitReplyPort+0x6a
8e907470 8368157a 80000c54 8e907628 8e907528 nt!NtRequestWaitReplyPort+0x4c
8e907470 8368024d 80000c54 8e907628 8e907528 nt!KiFastCallEntry+0x12a (FPO: [0,3] TrapFrame @ 8e907484)
8e9074f4 98b63434 80000c54 8e907628 8e907528 nt!ZwRequestWaitReplyPort+0x11 (FPO: [3,0,0])
WARNING: Stack unwind information not available. Following frames may be wrong.
8e90772c 98b638a1 00000000 00000000 98b7d6ec tmcomm!CWorkerThreadPool::CWorkerThreadPool+0xa54
8e907744 98b9d97e 8e90775c 00000000 98ba6238 tmcomm!KmCallUm+0x75
8e907784 98b9ee09 9530af7c 162a554d 98ba6238 tmactmon+0x397e
8e9077c0 98b9ef4d 0000000a 80000002 99ccc490 tmactmon+0x4e09
8e9077ec 98b9efbc 8e907844 00000000 0000000a tmactmon+0x4f4d
8e907800 98b8c450 8e907844 00000000 8e907950 tmactmon+0x4fbc
8e907828 98b91f97 8e907802 8e9078a4 8e907950 tmevtmgr+0x1450
8e90789c 98b921d7 00000000 88be10a4 8e907950 tmevtmgr!TMEvtCommunicateRoutine+0x2569
8e9078b4 98b8c450 8e907900 88be0edc 8e9079c8 tmevtmgr!TMEvtCommunicateRoutine+0x27a9
8e9078dc 98b90d0d 8e907903 8e9079a0 8e907a0c tmevtmgr+0x1450
8e9079ac 98b8e7b0 8e9079c8 8e9079e4 98b8e944 tmevtmgr!TMEvtCommunicateRoutine+0x12df
8e9079b8 98b8e944 88be0f5c 8e9079c8 8e907acc tmevtmgr+0x37b0
8e9079e4 8368157a 88be0f5c 8e907acc 82000000 tmevtmgr+0x3944
8e9079e4 8367f069 88be0f5c 8e907acc 82000000 nt!KiFastCallEntry+0x12a (FPO: [0,3] TrapFrame @ 8e907a0c)
8e907a7c 838266b5 8e907acc 82000000 8e907ad8 nt!ZwCreateKey+0x11 (FPO: [7,0,0])
8e907af0 8380e477 8e907cb8 82000000 00000000 nt!IopCreateRegistryKeyEx+0x61
8e907c18 8380e0bd 85f4d530 00000002 82000000 nt!IopOpenOrCreateDeviceRegistryKey+0x3b1
8e907c30 8c07e594 85f4d530 00000002 82000000 nt!IoOpenDeviceRegistryKey+0x18
8e907cc8 8c016489 87dbb0e0 00000000 00000000 ndis!ndisCheckAdapterBindings+0xa7 (FPO: [Non-Fpo])
8e907cf0 8c07d013 87dbbc70 87dbb0e0 8e907d50 ndis!ndisQueuedRestoreMiniportstack+0x36 (FPO: [Non-Fpo])
8e907d00 836abc27 87dbbc70 00000000 85f55688 ndis!ndisWorkItemHandler+0xe (FPO: [Non-Fpo])
8e907d50 8384c643 00000000 a63db72a 00000000 nt!ExpWorkerThread+0x10d
8e907d90 836fe059 836abb1a 00000000 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19
上述stack trace没有看出什么可疑的状态。再看看另外一个线程0x89484568(持有共享锁):
0: kd> !thread 0x89484568
THREAD 89484568 Cid 0cd0.0f50 Teb: 7ffa6000 Win32Thread: ffb39180 WAIT: (Executive) KernelMode Non-Alertable
b44c6680 NotificationEvent
IRP List:
89493980: (0006,0244) Flags: 00000884 Mdl: 00000000
894d6db8: (0006,0244) Flags: 00000884 Mdl: 00000000
88c4e9a0: (0006,0244) Flags: 00000884 Mdl: 00000000
894d39a0: (0006,0244) Flags: 00000884 Mdl: 00000000
Not impersonating
DeviceMap 8e12ea70
Owning Process 89020508 Image: IMJPCMNT.EXE
Wait Start TickCount 4518 Ticks: 32555 (0:00:08:27.861)
Context Switch Count 330
UserTime 00:00:00.000
KernelTime 00:00:00.031
Win32 Start Address 0x697ee751
Stack Init b44c7fd0 Current b44c6550 Base b44c8000 Limit b44c5000 Call 6a0
Priority 10 BasePriority 8 PriorityDecrement 16 IoPriority 2 PagePriority 5
*** ERROR: Module load completed but symbols could not be loaded for flyfs.sys
ChildEBP RetAddr Args to Child
b44c6568 836aca58 89484568 00000000 807c2120 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
b44c65a0 836ab113 89484628 89484568 b44c6680 nt!KiSwapThread+0x266
b44c65c8 836a524f 89484568 89484628 00000000 nt!KiCommitThreadWait+0x1df
b44c6640 8bd6054f b44c6680 00000000 00000000 nt!KeWaitForSingleObject+0x393
b44c66a8 8bd5c54a 8bd5c4aa 894d5c90 00000003 fltmgr!FltpPostSyncOperation+0x8d (FPO: [Non-Fpo])
b44c66cc 8bd59765 894d5c90 89435060 894d5c90 fltmgr!FltpGetNormalizedFileName+0x5e (FPO: [Non-Fpo])
b44c66e4 8bd43773 894d5c90 00000000 894d5c90 fltmgr!FltpCreateFileNameInformation+0x81 (FPO: [Non-Fpo])
b44c6704 8bd438c7 877273c0 00000000 00000000 fltmgr!HandleStreamListNotSupported+0x125 (FPO: [Non-Fpo])
b44c6734 8bd43fa3 c00000bb 89435068 87b287d8 fltmgr!FltpGetFileNameInformation+0xc7 (FPO: [Non-Fpo])
b44c675c 8bd97a1d 00435068 00000101 b3187e2c fltmgr!FltGetFileNameInformation+0x12b (FPO: [Non-Fpo])
b44c67d4 8bd892be 89435068 00000101 b3187e2c syscow32v!SysCowGetFileNameInformation+0xcd (FPO: [Non-Fpo]) (CONV: stdcall) [d:\code\engineering\products\systemok\restore\driver\support.cpp @ 2397]
b44c6868 8bd3daeb 89435068 b44c6888 b44c68b4 syscow32v!SysCowPreCreate+0x31e (FPO: [Non-Fpo]) (CONV: stdcall) [d:\code\engineering\products\systemok\restore\driver\create.cpp @ 299]
b44c68d4 8bd409f0 b44c6918 89493980 00000000 fltmgr!FltpPerformPreCallbacks+0x34d (FPO: [Non-Fpo])
b44c68ec 8bd541fe b44c6918 8bd57f3c 00000000 fltmgr!FltpPassThroughInternal+0x40 (FPO: [Non-Fpo])
b44c6900 8bd548b7 b44c6918 89493980 89493b98 fltmgr!FltpCreateInternal+0x24 (FPO: [Non-Fpo])
b44c6944 8367a567 87e5d668 87727008 89493bbc fltmgr!FltpCreate+0x2c9 (FPO: [Non-Fpo])
b44c695c 8bda7bd1 89493ba0 89493980 892be848 nt!IofCallDriver+0x63
WARNING: Stack unwind information not available. Following frames may be wrong.
b44c6980 8bda91ec 87e5d7b0 89493980 b44c6a30 flyfs+0x1bd1
b44c699c 8bda94f3 87e5d7b0 0000001b 894dab28 flyfs+0x31ec
b44c6a50 8bda9f60 87e5d7b0 89493980 b44c6a70 flyfs+0x34f3
b44c6a60 8bda9f8a 87e5d7b0 89493980 b44c6a88 flyfs+0x3f60
b44c6a70 8367a567 87e5d7b0 89493980 894dab84 flyfs+0x3f8a
b44c6a88 8387e5f5 9ce1a1da b44c6c30 00000000 nt!IofCallDriver+0x63
b44c6b60 8385f1a7 876f0678 87f6dde8 89403d20 nt!IopParseDevice+0xed7
b44c6bdc 83885215 00000000 b44c6c30 00000240 nt!ObpLookupObjectName+0x4fa
b44c6c3c 8387d573 b44c6dc0 85f6dde8 89484500 nt!ObOpenObjectByName+0x159
b44c6cb8 838b4d59 b44c6e04 00120089 b44c6dc0 nt!IopCreateFile+0x673
b44c6d14 83806c27 b44c6e04 00120089 b44c6dc0 nt!IoCreateFileEx+0x9e
b44c6d70 8bdac7c7 b44c6e04 00120089 b44c6dc0 nt!IoCreateFileSpecifyDeviceObjectHint+0x59
b44c6e10 8bda9593 0000004d b44c6ec0 87e5d7b0 flyfs+0x67c7
b44c6ec8 8bda9f60 87e5d7b0 894d6db8 b44c6ee8 flyfs+0x3593
b44c6ed8 8bda9f8a 87e5d7b0 894d6db8 b44c6f00 flyfs+0x3f60
b44c6ee8 8367a567 87e5d7b0 894d6db8 892e76ac flyfs+0x3f8a
b44c6f00 8387e5f5 9ce1a562 b44c70a8 00000000 nt!IofCallDriver+0x63
b44c6fd8 8385f1a7 876f0678 85f6dde8 894d8698 nt!IopParseDevice+0xed7
b44c7054 83885215 00000000 b44c70a8 00000240 nt!ObpLookupObjectName+0x4fa
b44c70b0 8387d573 b44c71d8 85f6dde8 00010000 nt!ObOpenObjectByName+0x159
b44c712c 838b4d59 894d5490 00100001 b44c71d8 nt!IopCreateFile+0x673
b44c7188 8bd5b8e5 894d5490 00100001 b44c71d8 nt!IoCreateFileEx+0x9e
b44c7220 8bd5bc59 00000000 00000000 0000fffe fltmgr!FltpExpandFilePathWorker+0x167 (FPO: [Non-Fpo])
IMJPCMNT.EXE是Microsoft输入法进程。
相关阅读 更多 +