Paging I/O never completes and Special APC disabled

Hi all,

I have a driver reading files from a file system (here called somefs.sys), but when I open these files for asynchronous I/O, that is, without the FILE_SYNCHRONOUS_IO_ALERT flag on IoCreateFile, I’m seeing a hang. I ended up finding eight threads stuck at the same point, as shown below:

0: kd> kc
*** Stack trace for last set context - .thread/.cxr resets it

Call Site

00 nt!KiSwapContext
01 nt!KiCommitThreadWait
02 nt!KeWaitForSingleObject
03 nt!MiWaitForInPageComplete
04 nt!CcFetchDataForRead
05 nt!CcCopyRead
06 somefs+0x4d49c
07 somefs+0x29c6a
08 nt!PspSystemThreadStartup
09 nt!KiStartSystemThread

All these eight threads have their special APC disabled as follows:

0: kd> dt nt!_KTHREAD @$thread SpecialApcDisable
+0x1c6 SpecialApcDisable : 0n-1

After some investigation, I saw that at the time CcCopyRead was called, special APCs were still enabled. During its execution, the same driver received a paging I/O read request (as expected), but this time special APCs were disabled.

This driver then creates new IRPs using IoMakeAssociatedIrp and calls IoCallDriver with special APCs still disabled.

0: kd> kc
*** Stack trace for last set context - .thread/.cxr resets it

Call Site

00 nt!IoMakeAssociatedIrp
01 somefs+0x4ca6b
02 somefs!DispatchRead+0x60
03 mup!MupiCallUncProvider+0x169
04 mup!MupStateMachine+0x165
05 mup!MupFsdIrpPassThrough+0x12d
06 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
07 fltmgr!FltpDispatch+0xcf
08 nt!IoPageRead+0x2a9
09 nt!MiPfExecuteReadList+0xff
10 nt!MmPrefetchForCacheManager+0xb5
11 nt!CcFetchDataForRead+0x17d
12 nt!CcCopyRead+0x180
13 somefs+0x4d49c
14 somefs+0x29c6a
15 nt!PspSystemThreadStartup+0x5a
16 nt!KxStartSystemThread+0x16

These threads then return to the Cache Manager routines shown in the first stack trace and keep waiting for the paging I/O to complete, but it never does.

I read in the documentation that special APCs are used for I/O completion. When handling paging I/O, a driver should check for APCs using the KeAreAllApcsDisabled routine before creating new IRPs with IoBuildSynchronousFsdRequest, but I’m not sure about using IoMakeAssociatedIrp.

Just to mention, the hang never happens when I open files for synchronous I/O. I don’t think there are special requirements for opening files for asynchronous I/O, right?

Any idea is welcome.

Thanks in advance,
Fernando Roberto da Silva
DriverEntry Kernel Development
http://www.DriverEntry.com.br

What do you see when you type “!apc” into the kernel debugger? That should show you the thread where the special kernel APC is pending.

Tony
OSR

These are results for the eight threads I mentioned.

0: kd> !apc thre fffffa80070b97e0
Thread fffffa80070b97e0 ApcStateIndex 0 ApcListHead fffffa80070b9830 [KERNEL]
Thread fffffa80070b97e0 ApcStateIndex 0 ApcListHead fffffa80070b9840 [USER]

0: kd> !apc thre fffffa80070ba040
Thread fffffa80070ba040 ApcStateIndex 0 ApcListHead fffffa80070ba090 [KERNEL]
Thread fffffa80070ba040 ApcStateIndex 0 ApcListHead fffffa80070ba0a0 [USER]

0: kd> !apc thre fffffa80070ba7a0
Thread fffffa80070ba7a0 ApcStateIndex 0 ApcListHead fffffa80070ba7f0 [KERNEL]
Thread fffffa80070ba7a0 ApcStateIndex 0 ApcListHead fffffa80070ba800 [USER]

0: kd> !apc thre fffffa80070bbb50
Thread fffffa80070bbb50 ApcStateIndex 0 ApcListHead fffffa80070bbba0 [KERNEL]
Thread fffffa80070bbb50 ApcStateIndex 0 ApcListHead fffffa80070bbbb0 [USER]

0: kd> !apc thre fffffa80070bc040
Thread fffffa80070bc040 ApcStateIndex 0 ApcListHead fffffa80070bc090 [KERNEL]
Thread fffffa80070bc040 ApcStateIndex 0 ApcListHead fffffa80070bc0a0 [USER]

0: kd> !apc thre fffffa80070bc660
Thread fffffa80070bc660 ApcStateIndex 0 ApcListHead fffffa80070bc6b0 [KERNEL]
Thread fffffa80070bc660 ApcStateIndex 0 ApcListHead fffffa80070bc6c0 [USER]

0: kd> !apc thre fffffa80070bdb50
Thread fffffa80070bdb50 ApcStateIndex 0 ApcListHead fffffa80070bdba0 [KERNEL]
Thread fffffa80070bdb50 ApcStateIndex 0 ApcListHead fffffa80070bdbb0 [USER]

0: kd> !apc thre fffffa80070be040
Thread fffffa80070be040 ApcStateIndex 0 ApcListHead fffffa80070be090 [KERNEL]
Thread fffffa80070be040 ApcStateIndex 0 ApcListHead fffffa80070be0a0 [USER]

Thanks,
Fernando.

I meant “without arguments”. That would show the thread WITH APCs in the queue - you don’t have any APCs in the threads you looked at.

Tony

So, whatever the reason is for not completing paging I/O, it has nothing to do with APCs, right?

Any special requirement for using files opened for asynchronous I/O that could lead us to a hang?

Thanks Tony!

It might, but if there’s no APC that hasn’t been delivered yet, it won’t involve them. Since you didn’t look at APCs through the entire system, I can’t tell you that it’s NOT an issue.

Paging I/O *does not* use APCs to indicate completion. When the I/O Manager is finishing an Irp with the IRP_PAGING_IO bit set, it explicitly sets the event object the Irp points to, which wakes up any waiting threads. Indeed, the reason for this is that paging I/O has ALWAYS been done in a context where special kernel APCs are blocked (IRQL APC_LEVEL prior to Server 2003, and the special kernel APC disable field non-zero since then).
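
The distinction can be sketched as a small user-mode C model. This is not kernel code: `model_thread`, `model_irp`, `MODEL_PAGING_IO` and `model_complete` are hypothetical names made up for the illustration. The model only captures the rule described above: paging I/O completion signals the event directly, while ordinary completion is modeled as a special kernel APC that is delivered only when the thread's disable count is zero.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not real kernel types. */
enum { MODEL_PAGING_IO = 0x1 };

struct model_thread {
    int special_apc_disable;  /* non-zero: special kernel APCs are blocked */
    int pending_apcs;         /* APCs queued but not yet delivered */
};

struct model_irp {
    unsigned flags;
    bool event_signaled;      /* the event the waiting thread sleeps on */
    struct model_thread *thread;
};

/* Model of the final stage of I/O completion. */
static void model_complete(struct model_irp *irp)
{
    if (irp->flags & MODEL_PAGING_IO) {
        /* Paging I/O: the event is set directly, no APC involved. */
        irp->event_signaled = true;
        return;
    }

    /* Ordinary I/O: completion is queued as a special kernel APC. */
    irp->thread->pending_apcs++;
    if (irp->thread->special_apc_disable == 0) {
        /* APC can run: it finishes the request and signals the event. */
        irp->thread->pending_apcs--;
        irp->event_signaled = true;
    }
}
```

In this model a thread whose disable count is -1, as in the dump above, is still woken by a completed paging IRP, whereas an ordinary IRP would leave one APC pending.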

What you need to do is figure out why the paging operation is not finishing.

Tony
OSR

I tried using !apc with no arguments, but it produced a huge amount of output in what looks like an infinite loop.
It seems I’m facing memory or dump corruption here.

I’ll try to track down the associated IRPs and see why they are not completing.

Many thanks for your help, Tony!

Regards,
Fernando.

CcFetchDataForRead first calls MmPrefetchForCacheManager (which internally calls IoPageRead) and then MiWaitForInPageComplete, which waits on an event. That event is set when the IRP created in IoPageRead completes. This is a special type of IRP with the flags (IRP_SYNCHRONOUS_PAGING_IO|IRP_SET_USER_EVENT). It never uses an APC; completion simply does KeSetEvent + IoFreeIrp and exits from IopfCompleteRequest. So the error is this: the IRP created in IoPageRead is never completed. As a result the event is never set and MiWaitForInPageComplete waits forever. The APC state is entirely unrelated here. Try to find this IRP (it must be in the hung thread's IRP list) and look at who owns it. Usually this IRP goes down to the storage stack and is completed in some DPC.

I would also look at the conditions under which you issue the file read. For example, does IoGetTopLevelIrp() return 0?

> waiting threads. Indeed, the reason for this is that paging I/O has ALWAYS been done in a context where special kernel APCs are blocked

Just to add to this:

  • also, paging I/O uses a specially built MDL, which is not created by IoAllocateMdl, but (at least in NT4 when I was deep into this for the last time) instead is a part of a larger structure called “inpage support block” (it was freed by MiFreeInpageSupportBlock or a similarly named function).
  • also, the completion event used for paging I/O is in the same structure.

What makes me wonder is why people are so concerned about APCs, and about whether special kernel APCs are disabled or not. The only valid reason for this is locking, and the docs are very clear that KeEnterCriticalRegion is to be called around ERESOURCE acquisition, while FAST_MUTEX raises to APC_LEVEL, where this stuff is irrelevant.

So, just do as the docs say and forget about APCs completely.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Hi Harald, thanks for your help.

My driver is a layered file system driver, which uses the services provided by a different file system to service the requests my driver receives. I already had problems with the top level IRP, and since then I’m taking care to reset it to NULL before requesting any task from the underlying file system.

About finding the IRP: the paging I/O IRP is linked to the thread, because it is shown when I do !thread in WinDbg. But when this driver receives this IRP, it creates some associated IRPs from the paging I/O IRP (which are not linked to the thread) and sends them to itself via IoCallDriver.
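
For reference while tracking those IRPs: the I/O manager completes a master IRP only after every associated IRP created from it has completed, so a single associated IRP that is never completed keeps the master (and the event behind MiWaitForInPageComplete) pending forever. A minimal user-mode C model of that bookkeeping follows. `model_master` and the two functions are hypothetical names, and the counter only mimics the role of the master's AssociatedIrp.IrpCount field; it says nothing about what SOMEFS actually does.

```c
#include <stdbool.h>

/* Hypothetical model, not a kernel structure. */
struct model_master {
    long outstanding;  /* plays the role of the master's AssociatedIrp.IrpCount */
    bool completed;    /* true once the master IRP itself has been completed */
};

/* Record how many associated IRPs were created from the master. */
static void model_start(struct model_master *m, long associated_count)
{
    m->outstanding = associated_count;
    m->completed = false;
}

/* Called once for each associated IRP that completes.
   The master completes only when the last one is done. */
static void model_associated_done(struct model_master *m)
{
    if (m->outstanding > 0 && --m->outstanding == 0) {
        m->completed = true;
    }
}
```

With three associated IRPs, two calls to `model_associated_done` leave `completed` false; only the third sets it.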

Now I’m going to track the associated IRPs and see what happens.

Hi Maxim, thanks for your help too.

The only reason I started looking at the special APC state is that, among about 1050 threads in the system, only 8 are stuck inside CcCopyRead, and they are the only ones that have their special APCs disabled. I did some research and found this thread on NTFSD, also about paging I/O not completing. It led me to think about how a file system driver should handle IRPs with special APCs disabled, and under what conditions the KeAreAllApcsDisabled() routine should be used.

https://www.osronline.com/showthread.cfm?link=153032

Thank you people for all the help!

Fernando Roberto da Silva
DriverEntry Kernel Development
http://www.driverentry.com.br

> I already had problems with top level IRP, and since then I’m taking care of resetting it to NULL before requesting any task to the underlying file system.

If the top level IRP != 0, you have already been called from a file system, and you are then making another, recursive call into a file system. Resetting it to 0 (and then restoring the original value?) may not be a fully correct solution. It is better to hand the IRP to a worker thread for processing. Most likely the problem is recursive entry into the file system.

>This driver then creates new IRPs using IoMakeAssociatedIrp and call IoCallDriver

However, there can be another solution here: in your own device, set DeviceObject->StackSize = TargetDevice->StackSize + 1 (TargetDevice is the device to which you send the associated IRP). With this you do not need to allocate an additional associated IRP; you can simply pass the original IRP down the stack.

If I understand you correctly, I have a similar driver. It has a virtual disk device object that receives an IRP (IRP_MJ_SCSI with SCSIOP_WRITE|SCSIOP_READ in my stack location) from the file system and then sends this IRP back to the file system (with IRP_MJ_WRITE or IRP_MJ_READ in the next stack location) to read the 'image' file on which the virtual disk is based. My first thought was also to use IoMakeAssociatedIrp, but then I understood that I can simply use the original IRP, given enough stack locations. I also had problems with hangs, and my first fix was to zero/restore TopLevelIrp, but this only partially helped. After I began always queuing this IRP to a work item (and calling the file system from it), everything began to work without any problem.

OK, let me make sure you have the correct picture here.

My driver is a full file system which has its own name space. Let’s call it MYFS. It is not a filter that uses shadow file objects for some file objects; it always owns the FCB. Instead of using a disk as its medium, it uses another file system for storage. Let’s call this file system driver SOMEFS; it is a third-party driver for which I don’t have the sources.

MYFS behaves like an application, I mean, by just consuming the services provided by SOMEFS without being attached to its device stack. I always save the current top level IRP into a local variable before resetting it to NULL, then I call IoCallDriver and restore the saved value when the call returns. So when SOMEFS gets called by the Cache Manager to service a page fault, due to a CcCopyRead call for example, MYFS is not called recursively and, consequently, it doesn’t change the top level IRP.
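
The save/reset/restore sequence described above can be sketched as a single-threaded user-mode C model. All names here (`model_top_level_irp`, `model_call_lower_fs`, `model_lower_sees_null`) are hypothetical; in the kernel the slot is per-thread state accessed through IoGetTopLevelIrp and IoSetTopLevelIrp, and the call down would be IoCallDriver. This only illustrates the ordering, not the actual MYFS code.

```c
#include <stddef.h>

/* Single-threaded model of the per-thread top-level IRP slot.
   Hypothetical: in the kernel this is per-thread, not a global. */
static void *model_top_level_irp = NULL;

/* Stand-in for the call into the lower file system. */
typedef int (*model_lower_call)(void *context);

/* Save the current value, clear it, call down, then restore it. */
static int model_call_lower_fs(model_lower_call call, void *context)
{
    void *saved = model_top_level_irp;
    int status;

    model_top_level_irp = NULL;   /* lower FS sees itself as top level */
    status = call(context);
    model_top_level_irp = saved;  /* put back what the caller had */
    return status;
}

/* Example lower-level routine: returns 1 if it observed a NULL slot. */
static int model_lower_sees_null(void *context)
{
    (void)context;
    return model_top_level_irp == NULL;
}
```

Note that the restore happens only after the call returns; if the lower driver returns a pending status and continues on another thread, that thread has its own slot, which this single-threaded model does not capture.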

In this specific case, MYFS sends an IRP to SOMEFS, which calls CcCopyRead to service the request, but the data is not available in the cache this time, so SOMEFS posts the IRP to be executed on a work item. Eventually the IRP is processed by a system thread that calls CcCopyRead again.

0: kd> kc
*** Stack trace for last set context - .thread/.cxr resets it

Call Site

00 nt!CcCopyRead
01 somefs+0x4d49c
02 somefs+0x29c6a
03 nt!PspSystemThreadStartup
04 nt!KiStartSystemThread

This causes a page fault and SOMEFS receives the paging I/O IRP. To service the paging I/O IRP, SOMEFS creates a set of associated IRPs, as we can see below.

0: kd> kc
*** Stack trace for last set context - .thread/.cxr resets it

Call Site

00 nt!IoMakeAssociatedIrp
01 somefs+0x4ca6b
02 somefs!DispatchRead+0x60
03 mup!MupiCallUncProvider+0x169
04 mup!MupStateMachine+0x165
05 mup!MupFsdIrpPassThrough+0x12d
06 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
07 fltmgr!FltpDispatch+0xcf
08 nt!IoPageRead+0x2a9
09 nt!MiPfExecuteReadList+0xff
10 nt!MmPrefetchForCacheManager+0xb5
11 nt!CcFetchDataForRead+0x17d
12 nt!CcCopyRead+0x180
13 somefs+0x4d49c
14 somefs+0x29c6a
15 nt!PspSystemThreadStartup+0x5a
16 nt!KxStartSystemThread+0x16

Later SOMEFS calls IoCallDriver to send the IRPs it created to its own device. This call returns STATUS_PENDING, the thread goes back to the Cache Manager, and it keeps waiting for the paging I/O to complete.

0: kd> kc
*** Stack trace for last set context - .thread/.cxr resets it

Call Site

00 nt!KiSwapContext
01 nt!KiCommitThreadWait
02 nt!KeWaitForSingleObject
03 nt!MiWaitForInPageComplete
04 nt!CcFetchDataForRead
05 nt!CcCopyRead
06 somefs+0x4d49c
07 somefs+0x29c6a
08 nt!PspSystemThreadStartup
09 nt!KiStartSystemThread

As I mentioned, I don’t have the SOMEFS sources. I haven’t checked how it deals with the top level IRP yet, but !locks doesn’t show any thread waiting for an owned resource.

Now I’m working to track down the associated IRPs, see how they’re processed by SOMEFS, and look at how it deals with the top level IRP, but that obviously takes some time.

Thanks in advance,
Fernando Roberto da Silva
DriverEntry Kernel Development
http://www.driverentry.com.br

On Tue, 22 Dec 2015, xxxxx@driverentry.com.br wrote:

> My driver is a full file system which has its own name space. Let’s call it MYFS. It is not a filter that uses shadow file objects for some file objects, but it always owns the FCB. Instead of using disk as a media, it uses another file system for storage. Let’s call this file system driver SOMEFS, which is a third-party driver that I don’t have the sources.

That other file must be non cached. This is a well known limitation/bug in
Windows that also has hit Virtual Disks.

Bo Branten

Hi Bo,

Are you saying the file on SOMEFS should not use the cache? I mean, should MYFS open the target file using the FILE_NO_INTERMEDIATE_BUFFERING flag on IoCreateFile?

That’s an interesting point. This is the first time I’ve heard about this well-known limitation/bug. Do you know of any documentation or article about it that I can take a look at?

Just to let you know, I’m not using the target file as a virtual disk, but as a regular file. My driver is not using the cache, and since it resets the top level IRP to NULL, what would be the difference between MYFS and a regular driver reading a file?

Thanks for your help,

Fernando Roberto da Silva
DriverEntry Kernel Development
http://www.driverentry.com.br

On Tue, 22 Dec 2015, xxxxx@driverentry.com.br wrote:

> Are you saying the file at SOMEFS should not use the cache? I mean, should MYFS open the target file using FILE_NO_INTERMEDIATE_BUFFERING flag on IoCreateFile?

Yes.

> Do you know any documentation or article about this I can take a look at?

A quick search found this thread:
http://www.osronline.com/showThread.cfm?link=160524

I’ll let you know if I find something more.

Bo Branten

Hi Bo,

I was expecting to see something more specific that would explain the reason Cache Manager deadlocks when using cache on both file systems. By the way, my file system is not calling CcInitializeCacheMap to use the cache on the FCB it owns. So only SOMEFS is using cache. Also, we are facing a hang during read operations, and not on write. This should be OK as you mentioned on the thread you sent.

I’ll keep looking into this, and it doesn’t hurt to change MYFS to start using this flag. That is not so easy, though, because using it would force I/O requests to be rounded to some size I don’t remember, maybe 512 bytes.
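
The rounding mentioned above can be sketched as follows. Non-cached I/O generally requires offsets and lengths to be multiples of the underlying sector size; the 512 used in the usage note is only the guess from the text, and a real driver would query the actual value from the volume. `model_range` and `model_align_range` are hypothetical names for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper type: an aligned byte range. */
struct model_range {
    uint64_t offset;  /* aligned start of the widened request */
    uint64_t length;  /* aligned length covering the original request */
};

/* Widen [offset, offset + length) outward to sector boundaries.
   sector_size must be a non-zero power of two. */
static struct model_range model_align_range(uint64_t offset, uint64_t length,
                                            uint64_t sector_size)
{
    uint64_t mask  = sector_size - 1;
    uint64_t start = offset & ~mask;                   /* round start down */
    uint64_t end   = (offset + length + mask) & ~mask; /* round end up */
    struct model_range r = { start, end - start };
    return r;
}
```

For example, a 100-byte read at offset 700 with a 512-byte sector becomes a 512-byte read at offset 512; the caller then copies the requested bytes out of the larger buffer.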

I’ll post whatever result I get.

Thanks Bo,

Fernando Roberto da Silva
DriverEntry Kernel Development
http://www.driverentry.com.br

This isn’t strictly true. What you cannot have is two caches for the same file object (or technically the same SOP). In our own work, we routinely find it necessary to switch between cached and non-cached I/O in layered file systems. We do this via shadow file objects, though, because you can only have one view of the file associated with a given file object.

Once decoupled in this way, you can cache at multiple levels. There’s a separate question of whether or not this is a good idea - after all, it can effectively decrease your cache efficiency by 50% - but sometimes it’s required. For example, when running over the network, you’ll find that non-cached I/O behaves differently. In fact, these differences can vary by redirector implementation.

Tony
OSR