User-Kernel Shared Events Vs Async IOCTL

Hi Guys,

I have a design related question.
Suppose I have a KMDF driver which has shared memory (mapped via MDL or whatever) with user mode. Memory sharing is not the issue here.
Also suppose that the driver moves fixed-size chunks of data to the shared buffer in a DPC routine scheduled by the ISR.
This driver has three channels, each holding a separate shared buffer and each producing around 30-40 chunks per second. The chunks are quite large - several MBytes each.

Also, the project consists of both user- and kernel-mode SW (driver + SDK), so I can do whatever I want in both.

It looks like I have two options to implement the “get_memory_chunk” flow from user to kernel mode:

  1. The usual mode - using ASYNC IOCTLs.
    For each channel :
    a. Implement a parallel queue and some collection object for get_memory_chunk requests. Upon receiving the IOCTL in the driver, push the received request (if no chunks are ready) onto the collection of requests.
    b. When the ISR-scheduled DPC runs, another chunk is ready. Copy the chunk to the shared buffer, pop (head or tail) the list of requests and complete the request, hence notifying user mode that the chunk is ready.

  2. Using shared event
    For each channel :
    a. Implement a shared event between user and kernel. In user mode there will be a thread (for each channel) waiting on the event.
    b. Don’t send any get_memory_chunk IOCTLs from user to kernel.
    c. In the kernel, during the ISR-scheduled DPC, copy the chunk to the shared buffer and then signal user mode via the event that the chunk is ready.

The funny thing is that option 2 actually looks much better - you don’t have to worry about requests, timeouts, holding queues or power state changes.
The driver actually becomes a simple copy+notify machine.
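Whatever notification is chosen, option 2 still needs explicit bookkeeping inside the shared buffer so both sides agree on which chunks are available and which slots are reusable. A minimal sketch in plain C (a hypothetical slot layout, not driver code - a real version would need interlocked operations, since the DPC and the user thread touch these counters concurrently):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for option 2: a ring of fixed-size chunk
 * slots in the shared buffer. The event only says "something changed";
 * these counters say *what* is available and what is reusable. */
#define SLOT_COUNT 8

typedef struct {
    uint32_t produced; /* total chunks written by the driver (DPC side) */
    uint32_t consumed; /* total chunks released by the user-mode client */
} ring_state;

/* Driver side: claim a slot for the next chunk, or drop it if full. */
static bool ring_try_produce(ring_state *r, uint32_t *slot) {
    if (r->produced - r->consumed >= SLOT_COUNT)
        return false;               /* client stalled: drop this chunk */
    *slot = r->produced % SLOT_COUNT;
    r->produced++;                  /* then: signal the event          */
    return true;
}

/* Client side: take the oldest unread chunk, if any. */
static bool ring_try_consume(ring_state *r, uint32_t *slot) {
    if (r->consumed == r->produced)
        return false;               /* spurious wakeup: nothing new    */
    *slot = r->consumed % SLOT_COUNT;
    r->consumed++;
    return true;
}
```

Even this tiny sketch already answers "how does user mode know how many chunks are ready" and "when is a slot reusable" - which is exactly the bookkeeping the event by itself does not provide.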

But, isn’t it too good to be true ?

What do you guys think ?

10x
Sagi Bar

Why does it always have to be some perverted IO model with shared memory and events? What’s wrong with just posting IOCTL buffers of DIRECT type, and completing them as the transfer goes? Do you have proper code to unmap the shared buffer if your client application gets forcibly terminated? It’s more difficult to implement than IRP cancellation.

“xxxxx@walla.co.il” wrote in message
news:xxxxx@ntdev:

> The funny thing is that option 2 actually looks much better - you don’t have to worry about requests, timeouts, holding queues or power state changes.
> The driver actually becomes a simply copy+notify machine.

Well, you do have to worry about a number of other things. How does the
user mode know how many chunks are available? How does the driver know
which chunks are processed, so their buffers are reusable? How does the
driver know that the process is dying, so the operations should be
stopped, and the memory deallocated?

You have walked into the trap that catches most developers who try to
use events: there is a lot of bookkeeping that the event handling does
not take care of. By the time you put in all the support for this type
of stuff, there is a lot less code in an inverted call even if the
driver were WDM, let alone KMDF.

Don Burn
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr

Sharing events between user and kernel seems a very strange way to handle
this. The inverted call mechanism already discussed within the last day
or two of this newsgroup would be a better choice.

Essentially, when I hear the phrase “shared event”, particularly with the
description you have given, my first response is “the design is wrong”,
and my second response is “What RTOS did this person last use?”

There is somehow an illusion that using events is more elegant,
more efficient, more something than inverted call. I find this hard to
believe.

The phrase “shared memory” has many meanings, but is inappropriate here.
What you have is a user buffer mapped to the kernel address space. I have
not heard this ever referred to as “shared memory”, which has a somewhat
different meaning in Windows (memory shared between user-level processes).

And it is generally agreed that making kernel memory be shared with user
space is a very dangerous technique. I have heard (and perhaps someone
can verify) that such a driver will not pass WHQL certification.

You are trying to build a complex, convoluted, fragile, expensive, most
likely unmaintainable, and quite possibly not WHQL-certifiable solution to
what is already known to be a simple problem.
joe

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Hi Guys,

Thank you for your quick answers.
I myself prefer not to use shared events; however, my management seems to like the idea too much.
This is why I am trying to come up with a list of points proving it’s a very bad solution.
However, I can’t use anything except a user-space mapped buffer (it’s legal, btw, if you use section mappings) due to HW problems.

So far I have the list of points that Don has pointed out :

How does the user mode know how many chunks are available?

How does the driver know which chunks are processed, so their buffers are reusable?

How does the driver know that the process is dying, so the operations should be stopped, and the memory deallocated?

Can anyone think of any more such points ?

10x
Sagi Bar

The most serious point is “Windows is not designed to work that way, and
any attempt to make it work that way will cost considerably more and take
considerably longer than a Windows-compliant solution. The solution may
also take considerable support effort after first deployment.”
joe


wrote in message news:xxxxx@ntdev…
>
> 2. Using shared event …
> c. In kernel, during ISR scheduled DPC, copy the chunk to shared buffer
> and then signal user mode using the event that the chunk is ready.
>

Note that there is a race here. Your driver signals user mode that the chunk
is ready, but your driver does not know when user mode has finished processing
it. So it does not know when it can copy a new chunk to the buffer. That is
a first complication, so you will need separate buffers or a second event to
signal your driver that user mode is ready. If you continue thinking from
there, new complications are going to arise. For instance: your ISR and DPC
cannot wait for an event signalled from user mode because they run at
elevated IRQL. Etc., etc.
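The lost-wakeup half of this race can be demonstrated without any threads at all: an event carries no count, so two chunks produced before the client runs collapse into a single wakeup. A plain-C simulation (the fake_event/fake_semaphore types are illustrative stand-ins, not Windows APIs):

```c
#include <stdbool.h>

/* A bare event carries no count: setting it twice is the same as once. */
typedef struct { bool signaled; } fake_event;

static void event_set(fake_event *e)  { e->signaled = true; }
static bool event_wait(fake_event *e) {      /* auto-reset style wait */
    bool was = e->signaled;
    e->signaled = false;
    return was;
}

/* A semaphore-style counter remembers every produced chunk. */
typedef struct { int count; } fake_semaphore;

static void sem_post_(fake_semaphore *s) { s->count++; }
static bool sem_wait_(fake_semaphore *s) {
    if (s->count == 0) return false;
    s->count--;
    return true;
}
```

Two event_set calls before the client runs yield one wakeup; two sem_post_ calls yield two. That difference is precisely the bookkeeping the shared-event design has to rebuild by hand.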

I am not against sharing events between usermode and kernelmode, in some
cases they can be great for simple notifications but there is no way you can
pass data along with them. I’m also not at all against sharing memory
between usermode and kernel mode but it is hard to get right. This can be a
great solution sometimes but you do have to know the
implications and the complications.

The good thing about pended IOCTL is that you pass the notification + the
data without the need of any complicated synchronization schemes.

Regards,

Daniel Terhell
Resplendence Software Projects Sp
http://www.resplendence.com

Of course, one of the most common beginner mistakes is to confuse events
with other objects, like mutex and semaphore. The number of times I have
had to rewrite code because of some bizarre notion that events can
substitute for either mutex or semaphore is far too high. I’ve even had
students argue with me in class that you can implement a mutex or
semaphore just by using events. So for “homework” I tell them “prove it”
and the next day they will have a solution I can demolish in less than
five minutes. The classic excuse is “Mutex (or semaphore) is just too
complicated”. Apparently the fact that they actually *work correctly* to
solve the programmer’s problem doesn’t get consideration.

Perhaps the fundamental error here is that a semaphore would be a more
meaningful object to use. We are, after all, looking at a classic
producer/consumer queue model, first described clearly by Edsger Dijkstra
around 1968 or 1969 (in either case, over 40 years ago!)
joe
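The bounded-buffer pattern joe refers to looks roughly like this with POSIX primitives (an illustrative user-mode sketch; the Windows analogues would be CreateSemaphore and CreateMutex, and note that a real DPC could not block in a wait, as Daniel pointed out - a driver would drop instead):

```c
#include <pthread.h>
#include <semaphore.h>

/* Dijkstra's bounded-buffer producer/consumer: two counting
 * semaphores track free slots and ready chunks; a mutex guards the
 * indices. Illustration only - not driver code. */
#define CAP   4
#define ITEMS 100

static int buf[CAP];
static int head, tail;
static sem_t empty_slots;  /* counts free slots   */
static sem_t full_slots;   /* counts ready chunks */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Producer side (the "DPC" analogue): blocks while the buffer is full. */
static void put(int v) {
    sem_wait(&empty_slots);
    pthread_mutex_lock(&lock);
    buf[head++ % CAP] = v;
    pthread_mutex_unlock(&lock);
    sem_post(&full_slots);
}

/* Consumer side (the client): blocks while the buffer is empty. */
static int get(void) {
    sem_wait(&full_slots);
    pthread_mutex_lock(&lock);
    int v = buf[tail++ % CAP];
    pthread_mutex_unlock(&lock);
    sem_post(&empty_slots);
    return v;
}

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= ITEMS; i++)
        put(i);
    return NULL;
}
```

Because the semaphores count, no wakeup is ever lost and no slot is ever overwritten - which is the property a single shared event cannot give you.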


“bizarre notion that events can substitute for either mutex or semaphore is far too high”

The actual reason is that the usability of Windows semaphores is asymptotically close to 0, and usability of mutexes over a CRITICAL_SECTION is very close to zero, too.

taken to nttalk


Designs with a “shared buffer” in Windows have one clear advantage: they
are easier to debug in the hardware bring-up phase.
How? With a shared buffer, debugging the interaction of hardware
and the driver can be done separately from the interaction of the driver
and user mode.
So you can make progress with the hardware as soon as you get the driver
started, even before the user-mode side is ready, and defer messing with
IRPs, ioctls, events, timeouts and all that nuisance until later.

/* You’ve already named another advantage: easier for management to
understand - which should not be underestimated… but it is not technical. */

However, delaying development of the usermode side is risky, as the same
management, in their wisdom, can let someone else own it. You don’t want
this to happen!
– pa

xxxxx@flounder.com wrote:

Sharing events between user and kernel seems a very strange way to handle
this. The inverted call mechanism already discussed within the last day
or two of this newsgroup would be a better choice.

Your first sentence is a bit harsh. The reality is that the inverted
call mechanism **IS** a shared event scheme. It is merely the case that
the event is hidden in the OVERLAPPED structure and handled by the I/O
manager, rather than managed explicitly by the driver.

That’s a key insight that’s easy to miss in this debate. The inverted
call mechanism and the shared event mechanism are, at an architectural
level, fundamentally the same. They simply use different API
spellings. Because the inverted call mechanism is managed by I/O
manager, there are fewer opportunities for the driver writer to screw it up.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

As Don points out, if you have 30-40 chunks coming in per second, either you’re putting these into a really large shared buffer (which you have to have mapped in kernel mode, taking up precious kernel virtual address space), somehow telling the client where in the shared buffer to look, and hoping not to overrun if the client stalls for some reason; or you’re waiting for the client to tell you it’s done with a chunk, in which case (a) you’re depending on your client to behave correctly, which is always a mistake, (b) forcing the client to copy the chunk out of the shared buffer into its own memory anyway, and/or (c) limiting the ability to scale out your clients.

Any time you have a design where you take data from the device, copy it to a shared region, then tell the client to copy it out of the shared region, you’ve made a mistake (at least in Windows - this might be a great design in some other OS). This is the perfect use for I/O requests - they let your clients queue multiple buffers in your driver at a time, and you can copy directly into the client buffers. The client can decide:
* how many buffers it needs to queue, based on how much data it’s willing to lose
* how to allocate those buffers (one big slab, dynamically)
* whether to use synchronous I/O and multiple threads to scale out, or asynchronous I/O and a small thread pool

Since your chunks are big you can use direct I/O. You map only one chunk buffer per client/channel/etc… at a time (saving VA space) and copy your data to exactly where the client wants it.

Better, if your device can do scatter-gather DMA, you can program it to put the data directly into the client buffers and keep the CPU entirely out of the data transfer path. At a chunk every 25 ms you should be able to turn around and program the next DMA operation in time for the next chunk.
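The arithmetic behind that 25 ms figure is easy to check; assuming 4 MB chunks (a made-up number - the original post only says “several MByte”), the three channels together also move roughly 480 MB/s, which is why avoiding an extra copy matters:

```c
/* Back-of-envelope budget for the figures in this thread. The 4 MB
 * chunk size is an assumption; the post only says "several MByte". */
enum {
    CHANNELS       = 3,
    CHUNKS_PER_SEC = 40,                 /* upper end of "30-40/s" */
    CHUNK_BYTES    = 4 * 1024 * 1024,    /* assumed chunk size     */
};

/* Time between chunks on one channel, in milliseconds. */
static int period_ms(void) { return 1000 / CHUNKS_PER_SEC; }

/* Aggregate data rate across all channels, in bytes per second. */
static long long total_bytes_per_sec(void) {
    return (long long)CHANNELS * CHUNKS_PER_SEC * CHUNK_BYTES;
}
```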

-p


Just to totally derail this discussion: your management should not be
managing details like this - it is an implementation detail that
should be decided by you and your peers working on this project.
Management should be involved in overall project architecture, in
scheduling the project, in defining product requirements, in tracking
progress, etc. - not in writing the bits of code that implement the
product.

That said, there is simply no justification for the event model given
the data transfer requirements - as pointed out, the direct IO async
transfer model is superior in all respects.

Mark Roddy


Hi All,

Thanks for all the answers - I really appreciate the help.
However, please let’s forget about the buffer mapped to user space.
Let’s assume it has a smart management mechanism, where chunks that are not requested are dropped in time, etc.
The thing is, this client’s HW writes to several layers of predefined addresses - like, the first layer holds pointers to another layer, which in turn has pointers to the actual buffer chunks.
While the third level can be changed (by replacing a second-level pointer), the first level can’t be without stopping the HW.
This means I can’t work with user-supplied buffers unless I copy the data from the HW buffers (third layer) to them, which proved to be very expensive.
I can, however, manage a smart queue over the shared buffer, dropping chunks that are not requested.
I know it sounds weird, but I have no influence over it.

So the actual question is about notification.
Looking at the memory management mechanism as a black box, what would you suggest - a shared event notification scheme or async IOCTLs?

10x again
Sagi Bar

Anyway, I would still recommend async IOCTLs. You know when someone is waiting, so you can decide whether to drop data. And you let the client decide how to receive the notification - via a thread wakeup (synchronous I/O) or a completion port (asynchronous I/O).

-p


I’ve used semaphores a lot, and I am unaware of any problems.

Mutexes are useful if you need to do cross-process synchronization, which
is fairly rare, or need to handle timeouts as part of deadlock detection.
Or, if you want to block on a lock but have some global “get out of wait”
notification, as might be required to force a thread which has been
blocked too long to terminate. However, WAIT_ABANDONED is a singularly
unpleasant place to find yourself; the CRITICAL_SECTION behavior of merely
hanging eternally is often less complex to deal with, unless, of course,
you happen to like robust code that doesn’t hang. I once had to write code
that could deal with an abandoned mutex, and It Was Not A Pretty Sight,
trying to guarantee that data structure corruption was detectable and
fixable! (I think there are a couple of PhD dissertations lurking behind
that problem!)

I remember one app programmer who thought exit() was a valid response to an
error, and if he found some inconsistency in his driver, or an unexpected
value in a status register in his hardware, the corresponding correct
action was BugCheck[Ex]. I finally explained it as “Your driver is a
guest in the operating system. It is expected to be a polite guest.
Imagine having a guest in your home whose response to running out of
toilet paper was to kill your family and burn your house down!”
joe


As I said, why develop a needlessly complex solution that requires exotic
coding and complex protocols when the correct answer is simple, obvious,
and intrinsically well-supported by the system?
joe


Your first sentence is a bit harsh. The reality is that the inverted
call mechanism **IS** a shared event scheme. It is merely the case that
the event is hidden in the OVERLAPPED structure and handled by the I/O
manager, rather than managed explicitly by the driver.
*************
Yes, but this is a technique that is well-understood and supported by the
OS. And if you don’t supply an event (provide a NULL handle) then you
don’t have an event model at all; you can use IOCPs. Yet the tendency to
re-invent the wheel by simulating in a complex fashion what is already
there makes me wonder if the driver writers have any clue about how
application I/O works. In my system programming course, I devoted almost
two hours of lecture to this topic. One irate student stated, at the end,
“You have just wasted two hours telling us about techniques I’ve never
seen anyone use!”. In his entire programming career (which I remember as
having been more than ten years), he had never done anything other than
fully synchronous I/O. I pointed out to him that you rarely built
LARGE-scale server-style apps that could deliver the required I/O
performance using only synchronous I/O, and that this was, after all, a
course on effective Windows system programming, and at the start I said
“it’s all about the concurrency”.

This is why I prefer to use built-in facilities, like using IOCPs as
interthread queues. “They have the same complexity [as the
mutex-semaphore lab we just did] but YOU don’t have to reason about the
correctness.”
joe


> It looks like I have two options to implement the “get_memory_chunk” flow from user to kernel mode

> 1. The usual mode - using ASYNC IOCTLs.

Throw away the shared buffer and use the async “get next chunk” IOCTL (inverted call).


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com