
A New Way to DMA - Busmaster DMA in Windows 2000

 

Those of us who’ve been watching NT morph into Windows 2000 will no doubt be aware that there are some changes to how busmaster DMA is handled.  Most of these changes are small, but some are significant.  Better yet, some changes have really interesting undocumented side-effects.

 

DMA Operations

First of all, let’s be clear: You can take an NT V4 driver that does DMA the NT V4 way, and build it with the Windows 2000 DDK and get a relatively well-behaved Windows 2000 driver.  It’ll call all the right functions automatically, due to the USE_DMA_MACROS conditional in NTDDK.H.  Of course, your driver won’t be PnP compliant, and it won’t participate in power management.  But at least it’ll work.  Hey, you can’t expect too much if all you’re willing to do is just recompile and rebuild.

 

However, the DMA model in Windows 2000 has changed a bit due to the restructuring of the HAL and bus support.  The biggest change is that individual bus drivers are now implemented as standard kernel mode drivers that use a specific set of APIs.  Via these APIs the bus drivers can export bus-specific implementations of many of the common DMA support functions, including functions to allocate a common buffer, allocate an adapter channel, and map transfers.

 

The key to this change is that the old ADAPTER_OBJECT is now obsolete.  Taking its place is the new DMA_ADAPTER data structure (which is apparently no longer an Object Manager object), the definition for which is shown in Figure 1.

 

typedef struct _DMA_ADAPTER {
    USHORT Version;
    USHORT Size;
    PDMA_OPERATIONS DmaOperations;
    // Private Bus Device Driver data follows
} DMA_ADAPTER, *PDMA_ADAPTER;

 

Figure 1 -- DMA_ADAPTER Data Structure

 

Not a very interesting definition, is it?  Well, we’ll get to the useful part of the DMA_ADAPTER structure in a minute.  To get a pointer to the DMA_ADAPTER for the appropriate bus, you now call the I/O Manager, instead of calling the HAL as you have in the past to get your Adapter Object pointer. You call the I/O Manager via the function IoGetDmaAdapter(), the prototype for which is shown in Figure 2.

 

struct _DMA_ADAPTER *
IoGetDmaAdapter(
    IN PDEVICE_OBJECT PhysicalDeviceObject OPTIONAL,  // required for PnP drivers
    IN struct _DEVICE_DESCRIPTION *DeviceDescription,
    IN OUT PULONG NumberOfMapRegisters
    );

 

Figure 2 -- Prototype for IoGetDmaAdapter()

 

Just like its predecessor HalGetAdapter(), IoGetDmaAdapter() takes as input a pointer to a DEVICE_DESCRIPTION data structure and a pointer to a ULONG in which it returns the maximum number of mapping registers your driver is allowed to request for one transfer.  You get back, as the return value from IoGetDmaAdapter(), a pointer to your DMA Adapter.  Note that unlike its obsolete predecessor, IoGetDmaAdapter() also takes a pointer to a Device Object (the Physical Device Object).  The I/O Manager uses this Device Object to send an IRP_MJ_PNP, IRP_MN_QUERY_INTERFACE request to the underlying bus driver.  From the returned BUS_INTERFACE_STANDARD, the I/O Manager calls the GetDmaAdapter() function and returns the DMA Adapter pointer.

 

While the structure of the DMA_ADAPTER is almost entirely bus-specific, the interesting and useful part of it is the pointer it provides to the DMA_OPERATIONS structure.  This structure contains pointers to the bus’s implementation of the various DMA functions.  The DMA_OPERATIONS structure is shown in Figure 3.

 

 

typedef struct _DMA_OPERATIONS {
    ULONG Size;
    PPUT_DMA_ADAPTER PutDmaAdapter;
    PALLOCATE_COMMON_BUFFER AllocateCommonBuffer;
    PFREE_COMMON_BUFFER FreeCommonBuffer;
    PALLOCATE_ADAPTER_CHANNEL AllocateAdapterChannel;
    PFLUSH_ADAPTER_BUFFERS FlushAdapterBuffers;
    PFREE_ADAPTER_CHANNEL FreeAdapterChannel;
    PFREE_MAP_REGISTERS FreeMapRegisters;
    PMAP_TRANSFER MapTransfer;
    PGET_DMA_ALIGNMENT GetDmaAlignment;
    PREAD_DMA_COUNTER ReadDmaCounter;
    PGET_SCATTER_GATHER_LIST GetScatterGatherList;
    PPUT_SCATTER_GATHER_LIST PutScatterGatherList;
} DMA_OPERATIONS;

 

Figure 3 -- The DMA_OPERATIONS Structure

 

With a pointer to the appropriate DMA Adapter, and hence a pointer to the correct DMA_OPERATIONS structure, a driver is now ready to call the bus-specific DMA support routines.  By looking at the DMA_OPERATIONS structure, you’ll see that most of the familiar DMA-related functions are now contained here.  A busmaster DMA driver calls these functions using the supplied pointers in the DMA_OPERATIONS structure, instead of calling the traditional NT V4-defined function names.

 

Thus, instead of calling IoAllocateAdapterChannel() as you did in NT V4, you now call AllocateAdapterChannel() through the pointer in the DMA_OPERATIONS structure.  Your code to do this would look like the following:

 

    DmaOps = DmaAdapter->DmaOperations;

    Status = DmaOps->AllocateAdapterChannel(DmaAdapter,
                                            DeviceObject,
                                            NumberOfMapRegisters,
                                            ExecutionRoutine,
                                            Context);

 

Similarly, you call the appropriate function in the DMA_OPERATIONS structure in place of your former favorites HalAllocateCommonBuffer(), HalFreeCommonBuffer(), IoFlushAdapterBuffers(), IoFreeAdapterChannel(), IoFreeMapRegisters(), IoMapTransfer(), HalGetDmaAlignmentRequirement(), and HalReadDmaCounter().
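
If the indirection seems abstract, here is a minimal user-mode sketch of the pattern in plain C.  To be clear, this is not DDK code: every mock_ name, the trimmed two-entry operations table, and the "bus" implementation behind it are assumptions made up for illustration.  All it shows is the shape of the thing: the adapter starts with a public header that points at a table of function pointers, the bus driver keeps its private data after that header, and the caller reaches the bus-specific routines only through the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for DMA_ADAPTER / DMA_OPERATIONS.
   The real structures come from the DDK headers and have more entries. */
struct mock_dma_adapter;

typedef void (*mock_put_dma_adapter_fn)(struct mock_dma_adapter *adapter);
typedef unsigned long (*mock_get_dma_alignment_fn)(struct mock_dma_adapter *adapter);

struct mock_dma_operations {
    unsigned long size;
    mock_put_dma_adapter_fn put_dma_adapter;
    mock_get_dma_alignment_fn get_dma_alignment;
};

/* Public header, visible to the calling driver. */
struct mock_dma_adapter {
    unsigned short version;
    unsigned short size;
    const struct mock_dma_operations *dma_operations;
};

/* What a (made-up) bus driver might really allocate: the public header
   first, followed by its own private data. */
struct mock_bus_adapter {
    struct mock_dma_adapter header;  /* must be first so the casts below are valid */
    unsigned long alignment;
    int released;
};

static void mock_bus_put_dma_adapter(struct mock_dma_adapter *adapter)
{
    struct mock_bus_adapter *bus = (struct mock_bus_adapter *)adapter;
    bus->released = 1;
}

static unsigned long mock_bus_get_dma_alignment(struct mock_dma_adapter *adapter)
{
    struct mock_bus_adapter *bus = (struct mock_bus_adapter *)adapter;
    return bus->alignment;
}

static const struct mock_dma_operations mock_bus_ops = {
    sizeof(struct mock_dma_operations),
    mock_bus_put_dma_adapter,
    mock_bus_get_dma_alignment
};

/* Stand-in for the step where the driver is handed its adapter pointer. */
struct mock_dma_adapter *mock_get_dma_adapter(struct mock_bus_adapter *storage,
                                              unsigned long alignment)
{
    assert(storage != NULL);
    storage->header.version = 1;
    storage->header.size = (unsigned short)sizeof(*storage);
    storage->header.dma_operations = &mock_bus_ops;
    storage->alignment = alignment;
    storage->released = 0;
    return &storage->header;
}
```

A caller would then write adapter->dma_operations->get_dma_alignment(adapter), which has the same shape as the DmaOps->AllocateAdapterChannel(DmaAdapter, ...) call shown earlier.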

 

And Now For Something Completely Different

But now the real fun begins.  You’ll notice two entries in the DMA_OPERATIONS structure that don’t correspond directly to any of the old DMA functions.  These two functions are DmaOps->GetScatterGatherList() and DmaOps->PutScatterGatherList().  And it’s these two functions that implement an “all new and improved” Windows 2000 busmaster DMA model.

 

In the standard Windows NT DMA model, which is still perfectly valid and useable in Windows 2000, drivers perform the following steps to initiate a busmaster DMA transfer:

  1. Call KeFlushIoBuffers(), passing the MDL and the direction of the transfer.
  2. Determine the number of mapping registers required for this transfer, which is limited to the number returned by IoGetDmaAdapter().  Then call AllocateAdapterChannel() to allocate the Adapter and any map registers needed for the transfer.   When the required resources are available, the I/O Manager calls the driver back at its AdapterControl() routine.
  3. Call MmGetMdlVirtualAddress() to get an index into the MDL, followed by iterative calls to MapTransfer() (or one call if the driver doesn’t implement scatter/gather).  Each call to MapTransfer() returns a base logical address and length to be used for the DMA transfer.  Drivers that implement scatter/gather call MapTransfer() until they have used all their map registers, described the maximum number of scatter/gather fragments possible for one transfer on their device, or described the entire buffer to be DMA’ed, whichever comes first.  When the driver has programmed its device and the DMA operation has been requested, the driver returns from its AdapterControl() routine with the lovely intuitive status DeallocateObjectKeepRegisters.
  4. When the DMA Operation is complete, call FlushAdapterBuffers().  This is typically done in the DpcForIsr().
  5. When the DMA operation is complete, call FreeMapRegisters().

 

While we’ve all gotten used to it, calling MapTransfer()  (or, formerly, IoMapTransfer()) for each scatter/gather fragment can be mighty annoying.  The mechanics of the function, plus the handling of one or two difficult edge conditions, can make this loop a lot more complex than it could be.  In addition, trying to queue up multiple requests for the adapter, by calling AllocateAdapterChannel() again before a prior DMA operation is finished, is fraught with unexpected danger.
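
To show why that loop gets fiddly, here is a user-mode simulation of it.  Nothing here is the real API: the mock_mdl structure (a list of physical page frame numbers plus a starting byte offset), mock_map_transfer(), and the fixed 4 KB page size are all assumptions for the sake of the sketch.  The one thing it tries to imitate faithfully is the in/out length convention: you pass in how many bytes you still want mapped, and you get back the size of the run that is actually contiguous, which may be less.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE 4096u

/* Hypothetical stand-in for an MDL: physical page frame numbers plus the
   byte offset of the buffer within the first page. */
struct mock_mdl {
    const uint64_t *page_frames;
    size_t page_count;
    uint32_t byte_offset;
};

struct mock_fragment {
    uint64_t address;
    uint32_t length;
};

/* On entry *length is the number of bytes still to map, starting 'offset'
   bytes into the buffer.  On return it has been trimmed to the size of the
   physically contiguous run that starts at the returned address. */
static uint64_t mock_map_transfer(const struct mock_mdl *mdl, uint32_t offset,
                                  uint32_t *length)
{
    uint32_t pos = mdl->byte_offset + offset;
    size_t page = pos / MOCK_PAGE_SIZE;
    uint32_t in_page = pos % MOCK_PAGE_SIZE;
    uint32_t run = MOCK_PAGE_SIZE - in_page;
    uint64_t address;

    assert(page < mdl->page_count);
    address = mdl->page_frames[page] * MOCK_PAGE_SIZE + in_page;

    /* Extend the run while the next page is physically adjacent. */
    while (run < *length && page + 1 < mdl->page_count &&
           mdl->page_frames[page + 1] == mdl->page_frames[page] + 1) {
        run += MOCK_PAGE_SIZE;
        page++;
    }
    if (run < *length)
        *length = run;
    return address;
}

/* The driver-side loop: one call per fragment until the buffer is fully
   described or the device runs out of fragment slots.  Returns the number
   of fragments written; *mapped receives the bytes described so far. */
size_t mock_build_fragments(const struct mock_mdl *mdl, uint32_t total_length,
                            struct mock_fragment *frags, size_t max_frags,
                            uint32_t *mapped)
{
    size_t count = 0;
    uint32_t offset = 0;

    assert((uint64_t)mdl->byte_offset + total_length <=
           (uint64_t)mdl->page_count * MOCK_PAGE_SIZE);

    while (offset < total_length && count < max_frags) {
        uint32_t length = total_length - offset;
        frags[count].address = mock_map_transfer(mdl, offset, &length);
        frags[count].length = length;
        offset += length;
        count++;
    }
    *mapped = offset;
    return count;
}
```

Note that the caller has to cope with the case where *mapped comes back smaller than the requested length, because the device ran out of fragment slots first.  That is one of the edge conditions a real driver has to carry around in the old model.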

 

Now, we have DmaOps->GetScatterGatherList() to solve these problems.  Using DmaOps->GetScatterGatherList(), drivers will be called back with a pointer to a SCATTER_GATHER_LIST structure that contains a vector of logical base address and length entries. These are shown in Figure 4.  Using the SCATTER_GATHER_LIST structure, the driver can easily program its device, start the transfer, and return.  This significantly streamlines bus master DMA support.

 

typedef struct _SCATTER_GATHER_ELEMENT {
    PHYSICAL_ADDRESS Address;
    ULONG Length;
    ULONG_PTR Reserved;
} SCATTER_GATHER_ELEMENT, *PSCATTER_GATHER_ELEMENT;

typedef struct _SCATTER_GATHER_LIST {
    ULONG NumberOfElements;
    ULONG_PTR Reserved;
    SCATTER_GATHER_ELEMENT Elements[];
} SCATTER_GATHER_LIST, *PSCATTER_GATHER_LIST;

 

Figure 4 -- GetScatterGatherList structures
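
To give a feel for how little work is left once you have such a list, here is a user-mode sketch that walks one and fills in a device descriptor table.  The mock_ structures mirror the layout in Figure 4 using standard C types, and the mock_descriptor format (address, byte count, "last" flag) is an invented example; a real device defines its own descriptor layout, so treat that part purely as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* User-mode stand-ins that mirror the layout in Figure 4. */
struct mock_sg_element {
    uint64_t address;       /* PHYSICAL_ADDRESS in the real structure */
    uint32_t length;
    uintptr_t reserved;
};

struct mock_sg_list {
    uint32_t number_of_elements;
    uintptr_t reserved;
    struct mock_sg_element elements[];  /* C99 flexible array member */
};

/* Hypothetical hardware descriptor; real devices define their own format. */
struct mock_descriptor {
    uint64_t bus_address;
    uint32_t byte_count;
    uint32_t last;
};

/* Allocates a zeroed list with room for 'count' elements (free() it later). */
struct mock_sg_list *mock_sg_list_alloc(uint32_t count)
{
    struct mock_sg_list *list =
        calloc(1, sizeof(*list) + (size_t)count * sizeof(list->elements[0]));
    if (list != NULL)
        list->number_of_elements = count;
    return list;
}

/* What a driver might do in its callback: one descriptor per element.
   Returns the total bytes described, or 0 if the table is too small. */
uint64_t mock_program_device(const struct mock_sg_list *list,
                             struct mock_descriptor *table, size_t table_len)
{
    uint64_t total = 0;
    uint32_t i;

    assert(list != NULL && table != NULL);
    if (list->number_of_elements > table_len)
        return 0;

    for (i = 0; i < list->number_of_elements; i++) {
        table[i].bus_address = list->elements[i].address;
        table[i].byte_count = list->elements[i].length;
        table[i].last = (i + 1 == list->number_of_elements);
        total += list->elements[i].length;
    }
    return total;
}
```

The whole job is a single straight-line loop over the elements: no per-fragment calls and no returned length to adjust.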

 

The general steps a driver undertakes for calling DmaOps->GetScatterGatherList() are as follows:

 

1.      The driver calls KeFlushIoBuffers(), passing the MDL and the direction of the transfer.

 

2.      The driver determines the length of the current DMA operation.  While this will typically be the entire length of the buffer as described by the MDL, the transfer length requested when calling DmaOps->GetScatterGatherList() must not utilize more than the number of mapping registers returned by IoGetDmaAdapter().  Thus, if ADDRESS_AND_SIZE_TO_SPAN_PAGES()  indicates that more map registers are required to completely map a transfer than were returned by IoGetDmaAdapter(), the driver must limit the length of the current DMA operation to PAGE_SIZE * (MaximumMapRegisters-1). 

 

3.      The driver calls MmGetMdlVirtualAddress()  to get an index into the MDL at which to start the transfer.

 

4.      The driver calls DmaOps->GetScatterGatherList(), passing:

 

·        A Pointer to the DMA_ADAPTER on which the operation is to be performed;

·        A Pointer to the Device Object for the DMA operation;

·        A Pointer to the MDL that describes the buffer being transferred;

·        An index into the buffer (the “CurrentVa” argument) that indicates the starting location of the DMA transfer in the buffer.  Typically, this will be the value returned to the driver when it called MmGetMdlVirtualAddress() (Editors: It would have been much easier if this function took an offset into the buffer!  Obviously, the NT developers never ask us our opinion on these things.);

·        The length of the transfer;

·        A pointer to the driver’s AdapterListControl() routine, to which the Bus Driver will pass the SCATTER_GATHER_LIST and in which the driver will program the device and start the DMA transfer;

·        A context value of the driver’s choosing;

·        A BOOLEAN value indicating the direction of the transfer.

 

5.      The bus driver allocates the necessary resources, and calls the driver back at its AdapterListControl() routine, at IRQL DISPATCH_LEVEL.  In this function, the driver programs the device, starts the transfer, and returns.  The description of AdapterListControl() is shown in Figure 5.

 

6.      The driver calls DmaOps->PutScatterGatherList() when the transfer is complete.

 

VOID
AdapterListControl(IN PDEVICE_OBJECT DeviceObject,
                   IN PIRP Irp,
                   IN PSCATTER_GATHER_LIST ScatterGather,
                   IN PVOID Context);

 

Figure 5 -- Prototype for Driver's AdapterListControl Function

 

OK, so it might look to you at first glance like there is more work involved in calling DmaOps->GetScatterGatherList() than in doing things the “old way”, but it really is considerably cleaner and easier.  All the nastiness of calling IoMapTransfer() is gone.  All that ugly handling of the returned length value: gone!  Now, all you have left to do is program your device.

 

The only gotcha that I’m aware of in using this function (well, perhaps there’s another one, too, but I’ll call that one a feature – see below) is that you can’t just blindly call DmaOps->GetScatterGatherList() with any size buffer described by an MDL.  Why they didn’t add the extra ten lines of code necessary to handle this, I don’t know.  But you have to limit the length of your request to that which can be mapped in the number of map registers previously returned by IoGetDmaAdapter().   If you have a transfer that needs more map registers than are available, you’ll need to break the request up into multiple calls to DmaOps->GetScatterGatherList().  The first call will specify the return value of MmGetMdlVirtualAddress() as the starting index (“CurrentVa” argument) in the call to DmaOps->GetScatterGatherList(), and PAGE_SIZE * (MaximumMapRegisters-1) as the length of the transfer.  Subsequent calls will add the previous transfer length to “CurrentVa” and specify a length that, again, does not require more mapping registers than are available.
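
The splitting rule is just arithmetic, so it can be tried out in user mode.  The sketch below is a plan calculator only, under stated assumptions: a fixed 4 KB page, a mock_span_pages() function that does the same calculation as the DDK's ADDRESS_AND_SIZE_TO_SPAN_PAGES() macro, and a made-up mock_chunk structure holding the “CurrentVa” and length you would hand to each DmaOps->GetScatterGatherList() call.  It makes no DMA calls at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE 4096u

/* Number of pages touched by 'size' bytes starting at 'va'; the same
   arithmetic as ADDRESS_AND_SIZE_TO_SPAN_PAGES, for an assumed 4 KB page. */
uint32_t mock_span_pages(uintptr_t va, uint32_t size)
{
    return (uint32_t)(((va & (MOCK_PAGE_SIZE - 1)) + size + (MOCK_PAGE_SIZE - 1))
                      / MOCK_PAGE_SIZE);
}

/* One planned call: the starting address and the length to request. */
struct mock_chunk {
    uintptr_t current_va;
    uint32_t length;
};

/* Splits [va, va + total) into chunks that each need no more than
   max_map_registers pages.  Any chunk that would need more is cut to
   PAGE_SIZE * (max_map_registers - 1), as described in the text.
   Returns the number of chunks written, or 0 if the plan does not fit in
   'chunks' or the register limit is too small to make progress. */
size_t mock_plan_chunks(uintptr_t va, uint32_t total, uint32_t max_map_registers,
                        struct mock_chunk *chunks, size_t max_chunks)
{
    size_t count = 0;

    assert(chunks != NULL);
    while (total > 0) {
        uint32_t length = total;

        if (mock_span_pages(va, length) > max_map_registers) {
            if (max_map_registers < 2)
                return 0;   /* PAGE_SIZE * (n - 1) would be zero */
            length = MOCK_PAGE_SIZE * (max_map_registers - 1);
        }
        if (count == max_chunks)
            return 0;

        chunks[count].current_va = va;
        chunks[count].length = length;
        count++;

        va += length;       /* next call starts where this one ended */
        total -= length;
    }
    return count;
}
```

The reason for (MaximumMapRegisters-1) rather than MaximumMapRegisters pages is the starting offset: a chunk of n-1 full pages that begins partway into a page still touches at most n pages, so it always fits.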

 

Will this limit be a problem for most devices?  No.  On more than 99.9% of x86 architecture systems (which are all that are really left, anyhow) there are no hardware map registers used.  And since there are no hardware map registers in this case, the driver will never be constrained by the number of map registers available.  Even on Alpha architecture systems that use map registers, map register availability is rarely a problem.

 

Good Stuff

In addition to a just plain cleaner interface, there are a couple of other interesting advantages that you get by calling DmaOps->GetScatterGatherList().  The first is a probable increase in performance for some drivers.  With the original IoAllocateAdapterChannel() model, requests that cannot be immediately satisfied (due to map registers not being available, for example) are queued via the wait block in the Device Object.  Because this block is in the Device Object, only one request may be queued awaiting the Adapter Channel per device.  This limitation has been the source of much woe among driver writers new to the world of NT drivers.

 

The DmaOps->GetScatterGatherList() function actually allocates its own wait block.  Therefore you don’t have to worry about how many requests for DMA are pending.  Hurrah!  Score one for the good guys!

 

In addition, there appears to be another very subtle issue with how DmaOps->GetScatterGatherList() is implemented.  This function appears to handle chained MDLs.  This means that, for the first time in NT history outside the network stack, buffers comprising multiple non-contiguous virtual address ranges will be supported by the operating system.  Of course, this may not be good news for everybody.  I know a few folks who – contrary to my advice – stuff something in Mdl->Next and expect it to be there when they get the MDL back.  Of course, if you attempt to call DmaOps->GetScatterGatherList() on such an MDL… Bang!  Welcome to the new Blue Screen!!  Anyhow, support for chained MDLs being built into Windows 2000 would signal a big shift in the architecture of the storage stack.  Any of our colleagues in Redmond willing to comment on this?

 

Have At It

So there you have it: The new DMA model.  It’s easier to use, cleaner, and less problem-prone in its implementation.  We recommend most heartily that you use it for your DMA drivers!

 

 
