This month’s random crash dump came from a developer’s system here at OSR. He was, innocently enough, programming along and listening to a CD (on his computer, of course) when his system crashed. Of course, being kernel developers we are always looking for decent crash dumps, so his system produced one (a kernel summary dump) for him. In this article we’ll analyze this, talk about the bug we found, and our resolution of it.
Of course, by now we all know the first step in analyzing any crash dump is to use the "!analyze" command. While it is not always right, it generally provides us with the right place to start (See below).
1: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
An exception code of 0x80000002 (STATUS_DATATYPE_MISALIGNMENT) indicates
that an unaligned data reference was encountered. The trap frame will
supply additional information.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: f6c32190, The address that the exception occurred at
Arg3: ba9be69c, Trap Frame
Arg4: 00000000
Debugging Details:
------------------
Debugger Dbgportaldb Connection::Open failed 80004005
Database Dbgportaldb not connected
ADO ERROR 80004005,11: [DBNETLIB][ConnectionOpen (Connect()).]SQL Server does not exist or
access denied.
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at
"0x%08lx". The memory could not be "%s".
FAULTING_IP:
smwdm+b190
f6c32190 ff37 push dword ptr [edi]
TRAP_FRAME: ba9be69c -- (.trap ffffffffba9be69c)
ErrCode = 00000000
eax=8675e2ec ebx=ba9be73f ecx=00000000 edx=00000000 esi=85d1b3c8 edi=00000000
eip=f6c32190 esp=ba9be710 ebp=ba9be740 iopl=0 nv up ei pl zr na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
smwdm+0xb190:
f6c32190 ff37 push dword ptr [edi] ds:0023:00000000=????????
Resetting default scope
DEFAULT_BUCKET_ID: DRIVER_FAULT
BUGCHECK_STR: 0x8E
LAST_CONTROL_TRANSFER: from f6c3164c to f6c32190
STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
ba9be740 f6c3164c 00000002 8675e2f8 ba9be7a4 smwdm+0xb190
ba9be760 f6c0d844 85d1b3c8 00000002 00000002 smwdm+0xa64c
ba9be780 f6c21760 8675e2f8 00000002 00000003 portcls!CPortPinWavePci::SetDeviceState+0x41
ba9be7b8 f6c21e8e 85d93c08 00000003 86470e48
portcls!CPortPinWavePci::DistributeDeviceState+0x4d
ba9be7d4 f6cdf812 86c82f48 85c3d380 85c3d378 portcls!CPortPinWavePci::DeviceIoControl+0x27f
ba9be840 f6cdf7fd 86470e48 00000004 e197f05c ks!KspPropertyHandler+0x625
ba9be860 f6c14f20 86470e48 00000004 e197f048 ks!KsPropertyHandler+0x17
ba9be870 f6c21def 86470e48 00000004 e197f048 portcls!PcHandlePropertyWithTable+0x1a
ba9be8a8 f6c14ec8 8675e2e8 86c86030 86470e48 portcls!CPortPinWavePci::DeviceIoControl+0x1e0
ba9be8c0 f6ce5633 86c86030 86470e48 ba9be90c portcls!DispatchDeviceIoControl+0x44
ba9be8d0 f6c1c7c9 86c86030 86470e48 86432d38 ks!KsDispatchIrp+0xa3
ba9be8e4 f6c1ca2c 86c86030 86470e48 f6c88940 portcls!KsoDispatchIrp+0x40
ba9be8f0 f6c88940 86c86030 86470e48 86432d38 portcls!PcDispatchIrp+0x2a
ba9be90c 804eb3c1 86c86030 86470e48 86c86e90 smwdm+0x61940
ba9be91c f7b033f3 86432d38 86c86dd8 86c87c88 nt!IopfCallDriver+0x31
ba9be968 f77a27e7 86432d38 00000000 002f0003 aeaudio+0x3f3
ba9be9b0 f77a317e 86432d38 00000000 00000002 sysaudio!PinConnectionProperty+0x40
ba9be9dc f77a3d04 00000002 00000002 00000006 sysaudio!CPinNodeInstance::SetState+0xff
ba9be9f8 f77a3df7 00000002 00000003 00000006
sysaudio!CConnectNodeInstance::SetStateBottomUp+0x24
ba9bea18 f77a3e43 00000002 00000003 00000006
sysaudio!CStartNodeInstance::SetStateBottomUp+0x27
ba9bea38 f77a224f 00000002 00000006 8682b8d0 sysaudio!CStartNodeInstance::SetState+0x69
ba9bea4c f6cdf812 e2d6b4f0 8682b8d0 8682b8c8 sysaudio!CPinInstance::PinStateHandler+0x5b
ba9beab8 f6cdf7fd 85f7ee48 00000003 f77a0098 ks!KspPropertyHandler+0x625
ba9bead8 f77a21d8 85f7ee48 00000003 f77a0098 ks!KsPropertyHandler+0x17
ba9beb2c f6cdedfe 86a479c8 85f7ee48 804eb3c1
sysaudio!CPinInstance::PinDispatchIoControl+0x115
ba9beb38 804eb3c1 86a479c8 85f7ee48 85f7ee48 ks!DispatchDeviceIoControl+0x25
ba9beb48 f6cdf002 ba901350 ba9bebb4 00000000 nt!IopfCallDriver+0x31
ba9beb74 ba8f3fb5 86e53e70 00000000 002f0003 ks!KsSynchronousIoControlDevice+0xbb
ba9bebbc ba8f356b 86e53e70 ba901340 00000000 wdmaud!PinProperty+0x49
ba9bebd8 ba8fc56f 86e53e70 00000002 86cd0efc wdmaud!StatePin+0x19
ba9bebf8 ba8f40e9 85c3688c 86cd0ee8 00000002 wdmaud!StateWavePin+0x65
ba9bec14 ba8f4045 8608ae48 85c36000 00000000 wdmaud!Dispatch_State+0x152
ba9bec40 804eb3c1 00000000 85c36000 806bb2cc wdmaud!SoundDispatch+0x2e6
ba9bec50 805644d2 8608aedc 864b0728 8608ae48 nt!IopfCallDriver+0x31
ba9bec64 805651f6 86a55978 8608ae48 864b0728 nt!IopSynchronousServiceTail+0x5e
ba9bed00 8055e288 000001cc 000004bc 00000000 nt!IopXxxControlFile+0x5a6
ba9bed34 805306a4 000001cc 000004bc 00000000 nt!NtDeviceIoControlFile+0x28
ba9bed34 7ffe0304 000001cc 000004bc 00000000 nt!KiSystemService+0xc9
0106ff48 00000000 00000000 00000000 00000000 SharedUserData!SystemCallStub+0x4
FOLLOWUP_IP:
smwdm+b190
f6c32190 ff37 push dword ptr [edi]
SYMBOL_STACK_INDEX: 0
FOLLOWUP_NAME: waddext
SYMBOL_NAME: smwdm+b190
MODULE_NAME: smwdm
IMAGE_NAME: smwdm.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 3cf3d814
STACK_COMMAND: .trap ffffffffba9be69c ; kb
BUCKET_ID: 0x8E_smwdm+b190
Followup: waddext
---------
As is so often the case for those of us out in the real world, we don’t have symbols for this third party driver. Thus, we have to work through this dump, doing a bit of careful analysis in order to figure out what actually happened.
Generally, when I approach a dump like this I focus on the trap frame, since that is the point at which the original exception arose. In this case the trap frame points to our faulting instruction (ah, but we got that from the !analyze in the first place):
f6c32190 ff37 push dword ptr [edi] ds:0023:00000000=????????
Thus, the data contents of EDI do not point to a valid memory location. Of course, by itself this is not enough to tell us why this pointer was invalid in the first place. To do that, we want to see the sequence of instructions leading up to this point. Of course, without symbols, we don’t know where this function even started, so we can either back up a random amount or we can figure out where the function started.
To determine where the current function started, we can look at the address called by the previous frame. Eventually, that will lead to a call into our current function - while often it is the direct call, because of optimization we might have missed some call frames.
In this case the previous call frame indicates the call is from f6c31647:
f6c31647 e8f80b0000 call smwdm+0xb244 (f6c32244)
f6c3164c 83be1006000001 cmp dword ptr [esi+0x610],0x1
We found this by using the Calls window in the debugger to set the focus to the previous call frame. From this we can begin tracing from this point. Notice, however, that this address is above the address of the fault - suggesting that in fact there is some subsequent call to the appropriate function. To confirm this we need to look through the code flow in order to either find a jump or call to the lower address.
For this particular case it takes a bit of time because there are several calls, each of which must be further traced down. But we must start by disassembling from the address we discovered:
1: kd> u f6c32244
smwdm+0xb244:
f6c32244 55 push ebp
f6c32245 8bec mov ebp,esp
f6c32247 83ec0c sub esp,0xc
f6c3224a 56 push esi
f6c3224b 57 push edi
f6c3224c 8bf1 mov esi,ecx
f6c3224e 33ff xor edi,edi
f6c32250 6639bea4000000 cmp [esi+0xa4],di
1: kd> u
smwdm+0xb257:
f6c32257 0f848a010000 je smwdm+0xb3e7 (f6c323e7)
f6c3225d 8b4618 mov eax,[esi+0x18]
f6c32260 83be1006000001 cmp dword ptr [esi+0x610],0x1
f6c32267 8b4024 mov eax,[eax+0x24]
f6c3226a 8945f4 mov [ebp-0xc],eax
f6c3226d 740f jz smwdm+0xb27e (f6c3227e)
f6c3226f 8d8ebc000000 lea ecx,[esi+0xbc]
f6c32275 ff150cdbc6f6 call dword ptr [smwdm+0x46b0c (f6c6db0c)]
The call here is a call into an OS function (calls through a table generally are using the import table for the driver). This is easily seen using the debugger:
1: kd> dd f6c6db0c l1
f6c6db0c 806bb6b0
1: kd> u @$p
hal!KfAcquireSpinLock:
Thus, this is a spin lock acquisition. We resume disassembly of our original function:
1: kd> u f6c32275
smwdm+0xb275:
f6c32275 ff150cdbc6f6 call dword ptr [smwdm+0x46b0c (f6c6db0c)]
f6c3227b 8845ff mov [ebp-0x1],al
f6c3227e 53 push ebx
f6c3227f 0fb69ef8000000 movzx ebx,byte ptr [esi+0xf8]
f6c32286 8b466c mov eax,[esi+0x6c]
f6c32289 83c004 add eax,0x4
f6c3228c 50 push eax
f6c3228d ff75f4 push dword ptr [ebp-0xc]
1: kd> u
smwdm+0xb290:
f6c32290 e8c5deffff call smwdm+0x915a (f6c3015a)
f6c32295 0fb6c0 movzx eax,al
f6c32298 3bd8 cmp ebx,eax
f6c3229a 8945f8 mov [ebp-0x8],eax
f6c3229d 0f8490000000 je smwdm+0xb333 (f6c32333)
f6c322a3 6639bea4000000 cmp [esi+0xa4],di
f6c322aa 767e jbe smwdm+0xb32a (f6c3232a)
f6c322ac 80beb800000000 cmp byte ptr [esi+0xb8],0x0
Notice the call to f6c3015a. This call should be further traced (which we actually did in our analysis) but we’ll omit this for brevity. Thus, we continue our disassembly from this point:
1: kd> u f6c322ac
smwdm+0xb2ac:
f6c322ac 80beb800000000 cmp byte ptr [esi+0xb8],0x0
f6c322b3 7445 jz smwdm+0xb2fa (f6c322fa)
f6c322b5 6a01 push 0x1
f6c322b7 8bcb mov ecx,ebx
f6c322b9 58 pop eax
f6c322ba d3e0 shl eax,cl
f6c322bc 8586b4000000 test [esi+0xb4],eax
f6c322c2 7436 jz smwdm+0xb2fa (f6c322fa)
1: kd> u
smwdm+0xb2c4:
f6c322c4 66ff8ea6000000 dec word ptr [esi+0xa6]
f6c322cb 8d8ee0000000 lea ecx,[esi+0xe0]
f6c322d1 8b11 mov edx,[ecx]
f6c322d3 897cda04 mov [edx+ebx*8+0x4],edi
f6c322d7 8b09 mov ecx,[ecx]
f6c322d9 893cd9 mov [ecx+ebx*8],edi
f6c322dc f7d0 not eax
f6c322de 2186b4000000 and [esi+0xb4],eax
1: kd> u
smwdm+0xb2e4:
f6c322e4 43 inc ebx
f6c322e5 83e31f and ebx,0x1f
f6c322e8 6639bea4000000 cmp [esi+0xa4],di
f6c322ef 7530 jnz smwdm+0xb321 (f6c32321)
f6c322f1 80a6b900000000 and byte ptr [esi+0xb9],0x0
f6c322f8 eb27 jmp smwdm+0xb321 (f6c32321)
f6c322fa 83be1006000001 cmp dword ptr [esi+0x610],0x1
f6c32301 7411 jz smwdm+0xb314 (f6c32314)
1: kd> u
smwdm+0xb303:
f6c32303 8d45ff lea eax,[ebp-0x1]
f6c32306 8bce mov ecx,esi
f6c32308 50 push eax
f6c32309 0fb7c3 movzx eax,bx
f6c3230c 50 push eax
f6c3230d e83afeffff call smwdm+0xb14c (f6c3214c)
f6c32312 eb0b jmp smwdm+0xb31f (f6c3231f)
f6c32314 0fb7c3 movzx eax,bx
And now we see the call back to the address range we were seeking - in this case f6c3214c. We could have also obtained this information by manually examining the stack:
1: kd> dd ba9be710
ba9be710 00000000 85d1b3c8 00000001 f6c32312
ba9be720 00000001 ba9be73f 00000000 00000000
ba9be730 85d1b3c8 86c16000 00000003 00d1b314
ba9be740 ba9be760 f6c3164c 00000002 8675e2f8
ba9be750 ba9be7a4 85d1b448 86c16000 f6c15264
ba9be760 ba9be780 f6c0d844 85d1b3c8 00000002
ba9be770 00000002 8675e2f8 00000000 00000000
ba9be780 ba9be7b8 f6c21760 8675e2f8 00000002
What we were looking for is that return address (f6c32312). From this we can back up in the code stream to find the jump address. While not strictly required, we generally observe that call instructions are generated as five byte opcodes (four bytes for the relative jump offset, one byte for the opcode) and thus we can find the call by looking five bytes prior to the return:
1: kd> u f6c3230d
smwdm+0xb30d:
f6c3230d e83afeffff call smwdm+0xb14c (f6c3214c)
f6c32312 eb0b jmp smwdm+0xb31f (f6c3231f)
This latter approach is generally faster. Regardless of the specific approach, we can now display the entire disassembly range:
1: kd> u f6c3214c f6c32192
smwdm+0xb14c:
f6c3214c 53 push ebx
f6c3214d 56 push esi
f6c3214e 57 push edi
f6c3214f 8bf1 mov esi,ecx
f6c32151 e8d4ffffff call smwdm+0xb12a (f6c3212a)
f6c32156 8bf8 mov edi,eax
f6c32158 85ff test edi,edi
f6c3215a 741f jz smwdm+0xb17b (f6c3217b)
f6c3215c 8b4f10 mov ecx,[edi+0x10]
f6c3215f 8d86c8000000 lea eax,[esi+0xc8]
f6c32165 33d2 xor edx,edx
f6c32167 8b18 mov ebx,[eax]
f6c32169 8b4004 mov eax,[eax+0x4]
f6c3216c 03cb add ecx,ebx
f6c3216e 13d0 adc edx,eax
f6c32170 8d86c8000000 lea eax,[esi+0xc8]
f6c32176 8908 mov [eax],ecx
f6c32178 895004 mov [eax+0x4],edx
f6c3217b 8b5c2414 mov ebx,[esp+0x14]
f6c3217f 8d8ebc000000 lea ecx,[esi+0xbc]
f6c32185 8a13 mov dl,[ebx]
f6c32187 ff1510dbc6f6 call dword ptr [smwdm+0x46b10 (f6c6db10)]
f6c3218d 8b461c mov eax,[esi+0x1c]
f6c32190 ff37 push dword ptr [edi]
It was this final push that led to the invalid memory reference. Any time we see an invalid memory reference like this, we try to find out from whence the address came from. In this case we can see that EDI was loaded earlier (f6c32156). Since this came from EAX immediately following a call, we know that this is the return value from this function. The call is another call into the driver:
smwdm+0xb12a:
f6c3212a 8b8104010000 mov eax,[ecx+0x104]
f6c32130 81c104010000 add ecx,0x104
f6c32136 3bc1 cmp eax,ecx
f6c32138 7503 jnz smwdm+0xb13d (f6c3213d)
f6c3213a 33c0 xor eax,eax
f6c3213c c3 ret
From this code snippet, it would appear they are checking some sort of list. ECX contains the address of a data structure. It looks at the contents of offset 0x104 and if this is equal to the memory address of offset 0x104 then the xor will be executed (setting EAX to zero) and return; thus clearly this is the case.
Finding the actual data structure took a bit of digging because of the use of the fastcall optimization here - ECX contains the address of the data structure. The function that called this has damaged ECX prior to the trap frame being recorded, so we walked up to the previous call - and this gave us a hint:
f6c0d83d 8b08 mov ecx,[eax]
f6c0d83f 57 push edi
f6c0d840 50 push eax
f6c0d841 ff5110 call dword ptr [ecx+0x10]
f6c0d844 85c0 test eax,eax
The call is into this driver (presumably this is the entry point table of the driver). By looking at the stack we can find the value of EAX at the point of the call (since it was pushed onto the stack - assuming it has not been modified since then). The address we found on the stack (you can see it in the earlier stack dump) was 85d1b3c8 (later, we found that this value was preserved in ESI as well). Using the "!pool" command we can see the value of this pool tag:
1: kd> !pool 85d1b3c8
Pool page 85d1b3c8 region is Nonpaged pool
85d1b000 size: 350 previous size: 0 (Free) Io
85d1b350 size: 68 previous size: 350 (Allocated) MmCa
85d1b3b8 size: 8 previous size: 68 (Free) Ntfr
*85d1b3c0 size: 620 previous size: 8 (Allocated) *PcNw
Pooltag PcNw : WDM audio stuff
85d1b9e0 size: 18 previous size: 620 (Allocated) ReEv
85d1b9f8 size: 608 previous size: 18 (Free) Ntfr
Of course, this merely confirms our suspicion that this relates to the WDM audio stack (someone more familiar with this stack probably will recognize this data structure). The value in ECX was the location at that memory address, however:
1: kd> dd 85d1b3c8 l1
85d1b3c8 f6c6e7ac
Using this with the pool command again, leads us into never-never land:
1: kd> !pool f6c6e7ac
Pool page f6c6e7ac region is Nonpaged pool
f6c6e000 is not a valid small pool allocation, checking large pool...
f6c6e000 is freed (or corrupt) pool
Bad allocation size @f6c6e000, zero is invalid
***
*** An error (or corruption) in the pool was detected;
*** Attempting to diagnose the problem.
***
*** Use !poolval f6c6e000 for more details.
***
Pool page [ f6c6e000 ] is INVALID.
Analyzing linked list...
[ f6c6e000 ]: invalid previous size [ 0x195 ] should be [ 0x0 ]
[ f6c6e000 --> f6c6e288 (size = 0x288 bytes)]: Corrupt region
[ f6c6e4a0 --> f6c6e9e8 (size = 0x548 bytes)]: Corrupt region
Scanning for single bit errors...
None found
By using the lm (list modules) command we are able to find this memory block - it falls inside of the driver itself:
f6c27000 f6c9b680 smwdm (no symbols)
Thus, our suspicion is that this is a data structure within the driver itself (perhaps a global structure). Unfortunately, without knowing more about the structure of these drivers we cannot say what this data structure should (or would) represent. Further digging might be necessary, but let’s return back to our original analysis. This function return 0x0 because this list was empty. Thus, we are going to skip to the requisite code:
f6c32156 8bf8 mov edi,eax
f6c32158 85ff test edi,edi
f6c3215a 741f jz smwdm+0xb17b (f6c3217b)
This would correspond to an if block (hence if ( return value != 0) { … }) without a corresponding else block (otherwise we’d see a couple of immediate jmp instructions.) Once outside the if block, we see the cleanup code:
f6c3217b 8b5c2414 mov ebx,[esp+0x14]
f6c3217f 8d8ebc000000 lea ecx,[esi+0xbc]
f6c32185 8a13 mov dl,[ebx]
f6c32187 ff1510dbc6f6 call dword ptr [smwdm+0x46b10 (f6c6db10)]
In this code we load the first parameter (on the stack, so third parameter to this function since it uses fastcall) into EBX, load the value of ESI+0xbc into ECX and then call the function pointed to by f6c6db10, which is:
1: kd> dd f6c6db10 l1
f6c6db10 806bb780
1: kd> u @$p
hal!KfReleaseSpinLock:
Thus, this suggests that we’re releasing the spin lock. Recall that earlier we saw a spin lock acquisition - checking back, we note that this was at the same byte offset (and hence likely to be the same spin lock). This then leads us into the terminal push operation. Thus, even without symbols for this driver, we can see the fundamental flaw in this driver - and presumably in its testing.
The driver code in this case returns a NULL pointer (because the list is empty) and yet after releasing its own spin lock dereferences this NULL pointer. With this level of information it would seem that a problem of this type would be straight-forward to resolve and fix with the developers.
Of course, we did report it to the developers of this audio driver. While they acknowledged that this is a bug in their driver, we were told that they would not commit to fixing it. Guess it is time to find a new sound card!