In the last issue our file systems article focused on explaining how oplocks work. Looking at oplocks raised some interesting complications in how network file systems can have a profound impact on the local, physical file systems. Further study reveals that oplocks are only the "tip of the iceberg" with respect to the impact the LanManager file server (SRV) has on Windows NT file systems.
Further, SRV has been the source of significant changes in the NT file systems interface for the 4.0 release and appears likely to be the single most significant source of changes for the next release as well.
In this article we describe some of the key tricks played by SRV, mostly for performance reasons, and the requirements on the file systems that are exported by SRV to provide the best possible performance. Keep in mind that many of these tricks are only available to kernel-resident file servers such as SRV. The implementation of many of these tricks is entirely voluntary (at least with respect to the file system) although experience indicates that if an FSD does deviate from the normal NT FSD implementation, it will trigger latent bugs present in the system. Naturally, customers will not look kindly upon your product if you advise them, "the bug is Microsoft's, not ours" when they cannot see this problem when using NTFS or FAT.
The basic LanManager model is quite simple. Remote clients read and write to files across the network. The LanManager redirector, RDR, implements the client side by taking Windows NT file system requests, such as reading and writing to a particular file, and converting them into SMB protocol requests. These requests are then sent to the particular LanManager file server to retrieve that information. For Windows NT, Microsoft provides SRV to implement the server side of this interface.
As such, SRV is not a file system, but rather a file server. Keep in mind, then, that a workstation has no more file system access when it is running SRV than when it is not. The key difference is the access being provided to remote users. A Windows NT system running SRV can allow remote users to access some of the file system data stored on that particular system. Of course, LanManager is not the only file serving protocol, but SRV is the only file server provided as a matter of course with all versions of Windows NT, be it Server or Workstation.
LanManager file service is implemented as two components: a user mode component (the "server" service) and a kernel mode component ("SRV"). The kernel mode component is simply an installable kernel mode driver. One key difference is that SRV is not an extension of the OS via the I/O Manager which is fundamentally what a device driver or file system is. Instead, SRV logically resides above the I/O Manager interface, implementing its functionality by calling I/O Manager routines and the NT Executive API directly.
This turns out to have some interesting ramifications. First, it eliminates user/kernel mode transitions, which can be an impediment to peak performance. Second, because SRV is a kernel driver, it can take advantage of interfaces and implementation tricks which are simply not available to those file servers running in user mode.
Thus between them, SRV and RDR provide the fundamental kernel mode components of LanManager on Windows NT. Of course, there are other auxiliary components, such as the Network Provider which communicates between RDR and MPR to create "network drives" (i.e., associations between some drive letter and a specific LanManager name).
Microsoft has been very concerned about the performance of SRV on Windows NT. This concern never seems to go away, and each new release of Windows NT seems to have even more "optimizations" for use with SRV. NT 4.0 is, of course, no exception. In fact, there are some very profound changes in NT 4.0 that exist for nothing but SRV.
From the perspective of someone who is forever engrossed in NT file systems work, SRV is a significant source of frustration. Because there is such an emphasis on performance, SRV utilizes an amazing array of tricks that are not used by any other component within the system. This is despite the fact that there is no reason other kernel mode file servers could not use these tricks as well.
FSD Requirements for SRV
Last issue we described how oplocks work to provide cache consistency. Unlike many of the tricks we describe in this article, the use of oplocks is possible from a non-kernel resident file server. For those who might not have read last issue's oplock article, SRV and RDR use a simple scheme that allows data to be cached by RDR in order to minimize network traffic. The use of oplocks ensures that this cached data does not become "stale" on the remote client. Unfortunately, the FSDs get involved in this process because users expect access to be synchronized across both the local and remote cases.
Another interesting set of operations used by SRV are those for retrieving MDLs directly from the FSD. This is the case for both read and write. The minor codes for read and write are composed of a bit-mask, where the setting of a particular bit indicates whether or not that operation is being requested. These bits are IRP_MN_DPC, IRP_MN_MDL, IRP_MN_COMPLETE, and IRP_MN_COMPRESSED; IRP_MN_NORMAL (zero) means that none of them is set.
These bits may be combined to form distinct combinations of how information is to be retrieved from the underlying FSD. For instance, IRP_MN_MDL and IRP_MN_COMPRESSED are combined to indicate that the caller (SRV) wishes to have the data returned to it in an MDL and that the data should be in compressed format.
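To make the bit-mask concrete, here is a minimal user-mode sketch in C. The constant values are the ones we recall from the public NT driver headers, reproduced locally so that the fragment compiles outside the kernel; check them against the headers for your build before relying on them. The helper functions (wants_mdl, wants_compressed, is_mdl_completion) are our own names, invented purely for illustration.

```c
#include <assert.h>

/* Minor function codes for read and write requests. Values reproduced
   from memory of the public NT headers; treat them as an assumption. */
#define IRP_MN_NORMAL     0x00  /* no special handling requested */
#define IRP_MN_DPC        0x01  /* caller may be running at DPC level */
#define IRP_MN_MDL        0x02  /* return the data as an MDL */
#define IRP_MN_COMPLETE   0x04  /* finish (release) an earlier MDL request */
#define IRP_MN_COMPRESSED 0x08  /* leave the data in compressed form */

/* Hypothetical helpers: each one tests a single bit of the minor code. */
int wants_mdl(unsigned char minor)
{
    return (minor & IRP_MN_MDL) != 0;
}

int wants_compressed(unsigned char minor)
{
    return (minor & IRP_MN_COMPRESSED) != 0;
}

int is_mdl_completion(unsigned char minor)
{
    return (minor & IRP_MN_COMPLETE) != 0;
}

/* Usage: the combination described in the text, an MDL read of
   compressed data, is simply the OR of the two bits. */
void demo_minor_codes(void)
{
    unsigned char minor = IRP_MN_MDL | IRP_MN_COMPRESSED;

    assert(minor == 0x0A);
    assert(wants_mdl(minor));
    assert(wants_compressed(minor));
    assert(!is_mdl_completion(minor));
}
```

An FSD's read or write dispatch routine would make the same kind of test on the minor code in its I/O stack location before deciding which path to take.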
Requesting the data in MDL form turns out to be extremely efficient on Windows NT because of the integration model between the FSDs and the VM system. By returning an MDL that describes the FSD's cached image of the file, a data copy is avoided between the FSD and SRV. This offers a significant performance improvement.
Interestingly enough, though, Microsoft's implementation of this is a bit odd. It turns out that SRV calls the FSD to create the MDL but calls the Cache Manager to free the MDL (for those of you who attended the IFS Conference in October, 1994, look at the definitions of FsRtlMdlReadComplete and FsRtlMdlWriteComplete for a hint as to why it works this way).
Thus, it isn't possible for a specialized file system to provide its own MDL to SRV; the MDL must have come from the Cache Manager. We determined this while building a "temporary" file system backed by the paging file. We built an MDL for the paged pool we had allocated, and the result was a system crash. Of course, we cannot call it a bug since the whole interface isn't documented to begin with.
The IRP_MN_COMPRESSED option was added in NT 4.0 and should lead to substantially improved performance for NT 4.0 servers and workstations. Rather than reading the data off the disk, decompressing it, and then sending it to the remote client, the data is read off the disk and passed to the remote client in compressed format!
Of course, this simple procedure yields tremendous benefit because not only does it minimize the bandwidth consumed on the network (a significant benefit by itself), but it also alleviates any need for the server to decompress the data which frees up CPU bandwidth on the server. This should yield a substantial improvement in overall network performance.
Unfortunately, this can only work between two 4.0 systems. This modification will do no good when the remote client is NT 3.51, Win 95, or Windows for Workgroups. Still, SRV handles these legacy clients without any problems. Further, since NTFS is the only file system that supports file-level data compression, this enhancement only works with the NTFS file system, or perhaps another file system which supports the same compression method used by NTFS.
Another significant trick SRV plays is to create its own IRP pool, allowing it to optimize the network/FSD communications path. Thus, in the ideal case, SRV builds an IRP and sends it to the FSD. The FSD fills it in, providing an MDL for the cached section no doubt, and completes the IRP. SRV, however, has registered a completion routine for the IRP. Thus, the I/O Manager transfers control of the IRP to the SRV completion routine. In that routine, SRV then formats the IRP for sending down the TDI transmission path and returns STATUS_MORE_PROCESSING_REQUIRED to the I/O Manager. Again, when the TDI completes its processing, SRV's completion routine short-circuits the completion processing, this time calling back to the FSD to release the MDL. SRV can continue this process, re-using the IRP repeatedly until the entire I/O operation has been completed.
In fact, this process can continue on until SRV has exhausted its need for the IRP. It can then be returned to its pool, all without having ever been completed by the I/O Manager. Again, because the overhead associated with queuing an APC and indicating completion can be avoided, this approach provides an additional benefit versus using the normal completion path.
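The control flow above is easier to follow in code. What follows is a user-mode simulation in C, not kernel code and not SRV's actual implementation: every type and function in it (sim_irp, sim_server, sim_complete_request, and so on) is a stand-in we invented. It models only the one rule that matters here: when a completion routine returns the "more processing required" status, the normal completion path never runs and the owner keeps the IRP.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for kernel status codes; values are local to this sketch. */
typedef enum { SIM_SUCCESS, SIM_MORE_PROCESSING_REQUIRED } sim_status;

typedef struct sim_irp sim_irp;
typedef sim_status (*sim_completion_fn)(sim_irp *irp, void *context);

struct sim_irp {
    int normal_completions;        /* times the normal path (APC etc.) ran */
    sim_completion_fn completion;  /* routine registered by the IRP's owner */
    void *context;
};

typedef struct {
    int chunks_left;    /* pieces of the I/O operation still to transfer */
    int awaiting_send;  /* 0: waiting on the FSD, 1: waiting on transport */
    int fsd_reads;
    int sends;
    int mdl_releases;
} sim_server;

/* Models the completion step: the registered routine gets first say.
   Only if it does not claim the IRP does normal completion run. */
void sim_complete_request(sim_irp *irp)
{
    if (irp->completion != NULL &&
        irp->completion(irp, irp->context) == SIM_MORE_PROCESSING_REQUIRED) {
        return;  /* owner kept the IRP; no normal completion */
    }
    irp->normal_completions++;
}

/* The server's completion routine: alternates between "read finished,
   hand the data to the transport" and "send finished, release the MDL". */
sim_status server_completion(sim_irp *irp, void *context)
{
    sim_server *srv = (sim_server *)context;
    (void)irp;

    if (!srv->awaiting_send) {
        srv->fsd_reads++;        /* FSD filled in the request */
        srv->awaiting_send = 1;  /* same IRP now goes to the transport */
    } else {
        srv->sends++;
        srv->mdl_releases++;     /* give the MDL back */
        srv->chunks_left--;
        srv->awaiting_send = 0;  /* same IRP goes back to the FSD */
    }
    return SIM_MORE_PROCESSING_REQUIRED;
}

/* Drives one IRP through 'chunks' read/send cycles. The two calls in the
   loop stand in for the FSD and the transport completing the request. */
void serve_file(sim_irp *irp, sim_server *srv, int chunks)
{
    assert(chunks >= 0);
    srv->chunks_left = chunks;
    srv->awaiting_send = 0;
    irp->completion = server_completion;
    irp->context = srv;

    while (srv->chunks_left > 0) {
        sim_complete_request(irp);  /* file system finishes the read */
        sim_complete_request(irp);  /* transport finishes the send */
    }
}
```

Serving a three-chunk operation with a zero-initialized sim_irp and sim_server leaves fsd_reads, sends, and mdl_releases at 3 and normal_completions at 0: the single IRP made six trips without ever being completed in the normal way.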
This trick is an excellent one to remember as it turns out to be useful when writing your own high-performance drivers. Even a simple exercise, such as intercepting the I/O request in a completion routine rather than waiting for the normal I/O completion, turns out to have significant benefits.
SRV itself continues to change significantly. The core technology is clearly very important to Microsoft, which continues to build new technology on top of it. For instance, in NT 4.0 there are initial signs of Microsoft's extensions of LanManager to include new distributed file system (DFS) capabilities. Of course, these new capabilities are additions to the existing LanManager protocols to provide further enhancements that should prove to be quite useful. For example, a key piece of the DFS technology will be the ability to use logical naming, which will eliminate the tight coupling between a file's path name and its current location. By eliminating this coupling, enhanced capabilities such as data replication or data migration can be added without the user even noticing the changes.
An example of this capability would involve a single file being stored on multiple servers. If one server fails, a second server could then be used to provide access to the contents of that same file, without any impact on the user accessing that file.
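The idea can be sketched in a few lines of C. This is our own illustration of logical naming with replicas, not a description of how DFS is actually implemented; the table layout, the function resolve_logical_name, and the server and path names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REPLICAS 4

/* One physical copy of a file: the server holding it and whether that
   server is currently reachable. */
typedef struct {
    const char *server;
    int online;
} replica;

/* One logical name and the servers that hold a copy of it. */
typedef struct {
    const char *logical_name;
    replica replicas[MAX_REPLICAS];
    size_t count;
} logical_entry;

/* Returns the first online server holding the named file, or NULL if
   the name is unknown or every replica is offline. */
const char *resolve_logical_name(const logical_entry *table, size_t entries,
                                 const char *logical_name)
{
    size_t i, r;

    for (i = 0; i < entries; i++) {
        if (strcmp(table[i].logical_name, logical_name) != 0)
            continue;
        for (r = 0; r < table[i].count; r++) {
            if (table[i].replicas[r].online)
                return table[i].replicas[r].server;
        }
        return NULL;  /* name found, but no replica is reachable */
    }
    return NULL;      /* unknown logical name */
}

/* Usage: the user keeps asking for the same logical name; only the
   server behind it changes when the first one fails. */
void demo_failover(void)
{
    const char *name = "\\\\corp\\public\\report.doc";
    logical_entry table[1] = {
        { "\\\\corp\\public\\report.doc",
          { { "SERVER_A", 1 }, { "SERVER_B", 1 } }, 2 }
    };

    assert(strcmp(resolve_logical_name(table, 1, name), "SERVER_A") == 0);
    table[0].replicas[0].online = 0;  /* SERVER_A fails */
    assert(strcmp(resolve_logical_name(table, 1, name), "SERVER_B") == 0);
}
```

Because the caller only ever names the logical path, moving or replicating the file changes the table and nothing else.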
A second area where LanManager technology is being extended can be found in Microsoft's Internet File System strategy. A portion of this strategy involves the use of the LanManager protocols as an extension to the current internet server technologies to enhance the file sharing protocols. Thus, SRV could be used instead of FTP in order to take advantage of the additional features present in a file sharing protocol, rather than a file transfer protocol.
There is much to be learned by observing SRV. Many of the tricks SRV uses to boost performance can be used by other kernel mode drivers to similarly boost their performance. Companies building their own file servers (perhaps for non-LanManager file sharing protocols, such as NFS) would do well to pay particularly close attention to how SRV works because Microsoft has implemented these "tricks" to yield a very high performance kernel resident file service. These tricks can naturally be used in other file servers as well.
The goal for all of these tricks is the same: provide the very best possible performance for the critical Windows NT service. So, are you being SRVed?