Bugchecks Explained: NO_MORE_IRP_STACK_LOCATIONS
OSR Staff | Published: 24-Aug-04 | Modified: 24-Aug-04
All I/O operations in Windows are described by an I/O Request Packet (IRP). If you’re familiar with IRP handling, you already know that an IRP has a fixed part and a variable-length part. The variable-length part contains all of the IRP’s I/O stack locations.
When an I/O request is issued, the I/O Manager (or a kernel-mode driver) allocates an IRP with at least as many I/O stack locations as there are drivers that it thinks will need to handle the request. The number of stack locations in the IRP defines the maximum number of devices that will be allowed to handle the request.
When an IRP is allocated, whether by the I/O Manager or by a driver, the StackSize member of the target device object determines how many stack locations the IRP needs. In other words, StackSize reflects the number of devices that are logically in the stack that will process the request. If an IRP is allocated with too few I/O stack locations, then when a driver tries to forward it to a lower device after every stack location has been used, the system bugchecks with NO_MORE_IRP_STACK_LOCATIONS.
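The bookkeeping described above can be sketched as a small user-mode model. This is not kernel code: MODEL_IRP and the model_* functions are hypothetical names, and the model assumes the current-location counter starts one past the top stack location and is decremented once per forwarded call, with zero meaning no locations are left.

```c
#include <assert.h>
#include <stddef.h>

/* User-mode model only: this is NOT the real kernel IRP structure. */
typedef struct MODEL_IRP {
    int StackCount;       /* stack locations allocated with the IRP */
    int CurrentLocation;  /* starts one past the top; 0 means none are left */
} MODEL_IRP;

/* Model of allocating an IRP sized from the target device's StackSize. */
MODEL_IRP model_allocate_irp(int target_stack_size)
{
    MODEL_IRP irp;
    assert(target_stack_size > 0);
    irp.StackCount = target_stack_size;
    irp.CurrentLocation = target_stack_size + 1;
    return irp;
}

/* Model of handing the IRP to the next driver: consume one stack location.
 * Returns 1 on success, 0 where the real system would bugcheck. */
int model_call_driver(MODEL_IRP *irp)
{
    assert(irp != NULL);
    irp->CurrentLocation--;
    return irp->CurrentLocation > 0;
}

/* Send an IRP through devices_in_path devices. Returns how many devices
 * were reached before the stack locations ran out. */
int model_send_irp(int target_stack_size, int devices_in_path)
{
    MODEL_IRP irp = model_allocate_irp(target_stack_size);
    int reached = 0;
    while (reached < devices_in_path && model_call_driver(&irp)) {
        reached++;
    }
    return reached;
}
```

In this model, an IRP sized for four devices reaches all four, but if the path really contains six devices the fifth call is the one that fails.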
Who Did It?
Every driver in the system is responsible for maintaining the StackSize member of the device objects it creates. Whenever a driver attaches one of its devices above another device, it must set its device object’s StackSize to the lower device object’s StackSize plus one. If any driver in the chain breaks this rule, the error bubbles up the stack and the StackSize of every higher device will be off. Note that the standard DDI used to attach one device to another, IoAttachDeviceToDeviceStack, takes care of updating this member for the attaching device. Therefore, one of these bugchecks usually indicates one of three things:
1) There is a driver in the system that tried to do something clever to add its device to the device stack.
2) A driver allocated an IRP with too few stack locations and passed it on to another driver.
3) A driver took an IRP it received and "jumped stacks." In other words, it did not pass it directly to the device it attached to but to some other device.
The first parameter to the bugcheck code is a pointer to the IRP which has run out of stack locations. Supplying the IRP’s address to the !irp command will show all of the devices that have so far processed the IRP. Using this information, you will need to examine all of the device objects in the stack, paying attention to their StackSize members. As you examine the StackSize values, look for a device whose StackSize value is equal to or less than the StackSize of the device below it. If you find such a device, you will potentially have found the portion of the stack where the offending driver exists. Note that because a driver relies on the driver below it following the rules and doing the right thing, the broken driver is not always obvious.
If the entire stack seems to be in order in terms of its StackSize values, the culprit may be the driver that allocated the IRP. If that driver ignored the StackSize member of the target device object and did not allocate enough stack locations for the entire stack, the result is this bugcheck even though every driver in the stack followed the rules.
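The check described above can be sketched as a small helper. This is a user-mode illustration, not a debugger extension: find_stacksize_violation is a hypothetical name, and it assumes you have already copied the StackSize values out of the debugger in top-to-bottom order, the same order !devstack prints them.

```c
#include <assert.h>
#include <stddef.h>

/* sizes[0] is the top device and sizes[count - 1] the bottom one.
 * Scans from the bottom up and returns the index of the first device whose
 * StackSize is equal to or less than that of the device below it, or -1 if
 * every device is larger than the one beneath it. */
int find_stacksize_violation(const int *sizes, size_t count)
{
    size_t i;
    assert(sizes != NULL || count == 0);
    if (count < 2) {
        return -1;
    }
    for (i = count - 1; i > 0; i--) {
        /* sizes[i - 1] sits directly above sizes[i] */
        if (sizes[i - 1] <= sizes[i]) {
            return (int)(i - 1);
        }
    }
    return -1;
}
```

Scanning from the bottom up reports the violation closest to the root of the problem, since errors propagate upward. A return of -1 corresponds to the case where the stack looks in order, which points back at whoever allocated the IRP.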
How Should I Fix It?
If the offending driver is your driver, then fix it! However, if you are, for example, a filter driver and it is a driver above you that is broken, then you really have two choices:
1) Harass the developer of the driver until they fix it.
2) Allocate a new IRP with the correct number of stack locations and pass that on to the lower driver. When that IRP completes, you would then need to adjust the original IRP to reflect the result of the duplicate IRP and complete the original IRP.
Of course there may be some other solutions based on your particular situation, but the above two are globally applicable.
Here’s an example of the steps you can take to try to find the driver in the stack that has broken the StackSize rules.
A higher level driver has attempted to call a lower level driver through
the IoCallDriver() interface, but there are no more stack locations in the
packet, hence, the lower level driver would not be able to access its
parameters, as there are no parameters for it. This is a disasterous
situation, since the higher level driver "thinks" it has filled in the
parameters for the lower level driver (something it MUST do before it calls
it), but since there is no stack location for the latter driver, the former
has written off of the end of the packet. This means that some other memory
has probably been trashed at this point.
Arg1: ffabd1b0, Address of the IRP
f4a0d7b0 80515a29 00000035 ffabd1b0 00000000 nt!KeBugCheckEx+0x19
f4a0d7c8 f92b3c62 80dbbee0 80dbbea0 ffabd1b0 nt!IopfCallDriver+0x18
f4a0d7e0 f92b2f7d 80e77390 80dbbea0 ffabd1b0 CLASSPNP!ClassSendSrbAsynchronous+0x10f
f4a0d870 f92a223e 80e77390 ffabd220 ffabd1b0 CLASSPNP!ClassDeviceControl+0x6ff
f4a0d8f8 f92b2403 80e77390 00abd1b0 ffabd244 disk!DiskDeviceControl+0xb66
f4a0d910 804eca36 80e77390 ffabd1b0 ffabd1b0 CLASSPNP!ClassDeviceControlDispatch+0x45
f4a0d920 f94eb667 ffabd23c 80f31950 ffabd1b0 nt!IopfCallDriver+0x31
f4a0d94c 804eca36 80dc2850 ffabd1b0 ffabd260 PartMgr!PmDeviceControl+0x8c
f4a0d95c f92b2e6a 0007c0c8 ffae0388 ffabd1b0 nt!IopfCallDriver+0x31
f4a0d9d4 f92a223e ffae02d0 ffabd244 ffabd1b0 CLASSPNP!ClassDeviceControl+0x848
f4a0da5c f92b2403 ffae02d0 00abd1b0 00000000 disk!DiskDeviceControl+0xb66
f4a0da74 804eca36 ffae02d0 ffabd1b0 ffabd244 CLASSPNP!ClassDeviceControlDispatch+0x45
f4a0da84 f43296d8 80e74fb8 ffabd1b0 ffabd1b0 nt!IopfCallDriver+0x31
f4a0dab8 f4329564 80da8c88 80dd7d30 80e74fb8 Fastfat!FatCommonDeviceControl+0xe6
f4a0dafc 804eca36 80dd7c38 ffabd1b0 80edbd60 Fastfat!FatFsdDeviceControl+0x3e
f4a0db0c f976933f 804eca36 80dd6410 ffabd1b0 nt!IopfCallDriver+0x31
WARNING: Stack unwind information not available. Following frames may be wrong.
f4a0db64 804eca36 80dbc868 ffabd1b0 80edb130 +0x33f
f4a0db74 f92d2ea0 80e74fb8 80edb130 ffabd1b0 nt!IopfCallDriver+0x31
f4a0dbb0 f92d33b6 80d8dfa0 ffabd1b0 000009c4 +0x1ea0
f4a0dc34 804eca36 80d8dee8 ffabd1b0 806c7fe0 +0x23b6
f4a0dc44 8058b076 ffabd28c ffac7028 ffabd1b0 nt!IopfCallDriver+0x31
f4a0dc58 8058bc62 80d8dee8 ffabd1b0 ffac7028 nt!IopSynchronousServiceTail+0x5e
f4a0dd00 805987ec 00000be0 00000000 00000000 nt!IopXxxControlFile+0x5ec
f4a0dd34 804da140 00000be0 00000000 00000000 nt!NtDeviceIoControlFile+0x28
f4a0dd34 7ffe0304 00000be0 00000000 00000000 nt!KiSystemService+0xc4
006efc9c 00000000 00000000 00000000 00000000 SharedUserData!SystemCallStub+0x4
The first step is to examine the IRP in question with the !irp command.
kd> !irp ffabd1b0
Irp is active with 4 stacks 0 is current (= 0xffabd220)
No Mdl Thread 80dbbea0: Irp stack trace.
cmd flg cl Device File Completion-Context
[ e, 0] 1 1 80e77390 ffac7028 00000000-00000000 pending
Args: 00000000 00000000 002d4800 00000004
[ e, 0] 1 0 ffae02d0 ffac7028 00000000-00000000
Args: 00000000 00000000 002d4800 00000000
[ e, 0] 1 e0 80dbc868 ffac7028 f92d4ef4-ffb0e728 Success Error Cancel
Args: 00000000 00000000 002d4800 00000000
[ e, 0] 1 1 80d8dee8 ffac7028 00000000-00000000 pending
Args: 00000000 00000000 002d4800 00000000
Using this info, we’ll first assume that this IRP hasn’t "jumped stacks." Taking the top device listed in the stack, we’ll use the !devstack command to get a list of all of the drivers in the stack, not just the ones that consumed stack locations in this IRP.
kd> !devstack 80d8dee8
!DevObj !DrvObj !DevExt ObjectName
> 80d8dee8 \FileSystem\ 80d8dfa0
80dbc868 \FileSystem\ 80dbc920
80dd6410 \Driver\ 80dd64c8
80dd7c38 \FileSystem\Fastfat 80dd7cf0
This is certainly an odd stack, since it contains a device object from a driver in the \Driver\ namespace in the middle of a file system stack, but we’ll now examine each of the device objects.
kd> dt nt!_DEVICE_OBJECT 80d8dee8
+0x030 StackSize : 4 ''
kd> dt nt!_DEVICE_OBJECT 80dbc868
+0x030 StackSize : 3 ''
kd> dt nt!_DEVICE_OBJECT 80dd6410
+0x030 StackSize : 9 ''
kd> dt nt!_DEVICE_OBJECT 80dd7c38
+0x030 StackSize : 8 ''
Well, THAT certainly doesn’t look right. It would appear that the FSRecognizer device (80dbc868, with a StackSize of 3) is the culprit, because it is the lowest device in the stack whose StackSize is not that of the device below it plus one. But, c’mon, debugging is never that easy!
It turned out in this case that FilterA (the \Driver\ device at 80dd6410) was actually at fault, because it had reused its device object. Earlier, the filter had attached itself to a stack containing a single device object, which left FilterA’s StackSize at two. The FSRecognizer driver then came along and attached itself to FilterA, leaving FSRecognizer’s StackSize at three. FilterA then detached its device object from the original stack (IoDetachDevice) and reattached it to another stack with IoAttachDeviceToDeviceStack. As we learned earlier, this routine sets the attaching device’s StackSize member to that of the lower device plus one. Note that it does nothing to the StackSize values of device objects attached above the attaching device.
On the new stack, the lower device’s StackSize was eight, making FilterA’s StackSize nine. But, because the filter played this little trick, the FSRecognizer never realized that it had been moved from one stack to another and so it still had the old StackSize value. When FilterB came along, it did the right thing and set its StackSize to the lower driver’s plus one. Unfortunately, the stack was already horribly broken at that point and was doomed to blue screen.
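The sequence of events can be replayed in a small user-mode model to confirm that it produces the StackSize values seen in the debugger. This is only a sketch: MODEL_DEVICE and the model_* functions are hypothetical stand-ins, and the only behaviour assumed is the one described above, namely that attaching updates the attaching device's StackSize and nothing else.

```c
#include <assert.h>
#include <stddef.h>

/* User-mode model only: this is NOT the real DEVICE_OBJECT. */
typedef struct MODEL_DEVICE {
    int StackSize;
    struct MODEL_DEVICE *AttachedTo;  /* device directly below, or NULL */
} MODEL_DEVICE;

/* Model of the attach rule: only the attaching device is updated. */
void model_attach(MODEL_DEVICE *upper, MODEL_DEVICE *lower)
{
    assert(upper != NULL && lower != NULL);
    upper->AttachedTo = lower;
    upper->StackSize = lower->StackSize + 1;
}

/* Model of detaching: the link is cleared, devices above are untouched. */
void model_detach(MODEL_DEVICE *upper)
{
    assert(upper != NULL);
    upper->AttachedTo = NULL;
}

/* Replays the scenario and writes the resulting StackSize values,
 * top to bottom: FilterB, FSRecognizer, FilterA, Fastfat. */
void model_replay_scenario(int out_sizes[4])
{
    MODEL_DEVICE original     = { 1, NULL };  /* single-device stack */
    MODEL_DEVICE fastfat      = { 8, NULL };  /* top of the new stack */
    MODEL_DEVICE filterA      = { 1, NULL };
    MODEL_DEVICE fsRecognizer = { 1, NULL };
    MODEL_DEVICE filterB      = { 1, NULL };

    model_attach(&filterA, &original);      /* FilterA becomes 2 */
    model_attach(&fsRecognizer, &filterA);  /* FSRecognizer becomes 3 */
    model_detach(&filterA);                 /* FilterA leaves the old stack */
    model_attach(&filterA, &fastfat);       /* FilterA becomes 9 */
    model_attach(&filterB, &fsRecognizer);  /* FilterB becomes 3 + 1 = 4 */

    out_sizes[0] = filterB.StackSize;
    out_sizes[1] = fsRecognizer.StackSize;
    out_sizes[2] = filterA.StackSize;
    out_sizes[3] = fastfat.StackSize;
}
```

FSRecognizer keeps its stale value because nothing in the attach step walks upward, so FilterB, which follows the rule correctly, still ends up with a StackSize that is far too small for the real stack.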
"MUP fix for DFS"
If you receive this bugcheck with DFS, don't forget to obtain the hotfix from Microsoft: http://support.microsoft.com/kb/906866/en-us.
Anyway, this article helped me find the buggy driver.
16-Nov-07, Eugene Lomovsky
I found it to be very useful. The Explanation is clear with an example.
30-Aug-04, richard bravo
"Missing wrap-up supporting debug output"
In the final conclusion of the analysis, it is stated that FilterA reused its device object, detaching it and reattaching it at a later stage, which disrupted the system.
It would be nice to prove this, if possible, with supporting debugger output.
29-Aug-04, Erwin Zoer
(1) Nice article to read. (2) Would love to see the source code that demonstrated this bug.
25-Aug-04, William Jones