To many new Windows NT developers the virtual memory system causes considerable confusion. For systems programmers on NT this is even worse, because many of the mechanisms they must use to access memory revolve around the need to manipulate these virtual addresses. In this article, we’ll try to provide a basic rationale for virtual memory and then explain how the Windows NT virtual memory system actually works.
First, let’s describe why virtual memory is useful. It certainly isn’t necessary, since there are plenty of operating systems that don’t use virtual memory and still manage to work correctly. For example, MS-DOS doesn’t support virtual memory, yet it is still a viable operating system. A constant problem for developers has always been the limitations of physical memory, and a number of techniques have been created over the years to cope with them. One such mechanism was the "overlay", in which a program was divided into discrete chunks, only one of which could be actively accessed at a time. Whenever a different chunk was needed, it was read into physical memory, overwriting the previous contents of that memory.
Virtual memory takes that "trick" a step further. It divides all of memory into a set of pages. In general, these pages are restricted to one, or perhaps a few, possible page sizes. Thus, instead of having only one overlay region, memory can conceivably have many overlay regions. Of course, when overlays were first invented, the application programmer was responsible for determining the layout of data. Over time, standard packages became available to handle this, and eventually this functionality migrated into the operating system.
Another feature was also added over time to yield the current virtual memory system we find in Windows NT – namely, the concept of address spaces. Some systems had very limited memory addressing capabilities. For example, the venerable PDP-11 had a 16-bit address space – 64KB. Some versions of the PDP-11 doubled this address space by separating code from data, so that the combined address space was 128KB. Still, applications could be no larger than 64KB, and that limit included the operating system code!
This problem was resolved by using a standard "programming trick". Namely, by adding a level of indirection it was possible for the operating system to maintain multiple different address spaces. Of course, only one was active at any point in time. The use of separate address spaces had another advantage – namely that memory owned by one address space couldn’t be modified by another address space. On the PDP-11 the physical address space of the system was quite a bit larger than the virtual address space, so it was possible to have many processes resident in physical memory at one time.
The final problem that had a profound impact on modern virtual memory systems was the scarcity of physical memory and the growing abundance of much less expensive, but much slower data storage. In order to simulate a computer system with more memory, data could be copied from memory onto a disk drive, and fetched from disk back into memory, depending upon the actual needs of the system.
Digital Equipment Corporation built an entirely new architecture around these concepts of virtual memory. The Virtual Address Extension system (or VAX) was designed to support very large virtual addresses: a single address space could contain as much as 4GB of virtual memory. The Windows NT virtual memory system still bears a strong resemblance to the virtual memory architecture of those VAX systems.
The Windows NT VM system had some other goals to achieve that would not have been requirements for the original VAX developers. Since Windows NT was to be a portable operating system, its virtual memory architecture couldn’t rely upon features specific to any particular hardware. Of course, the precise way that virtual memory management is handled by a particular modern computer is a function of the CPU. For example, the Intel CPU has a hybrid MMU that supports both segmented and paged memory models. The Digital AXP CPU supports a very large (2^64-byte) virtual address space, even though its physical address space is limited to 48 bits. Windows NT’s challenge was to support these, and other, memory architectures in a platform-independent fashion.
At this point you might still be asking yourself "why support virtual memory at all?" After all, it seems like a lot of complexity – and it is. Still, the benefits far outweigh the costs. Benefits to using virtual memory include:
- Process isolation – two processes have separate address spaces and hence do not interfere with one another.
- Memory protection – the CPU can run in one of at least two "modes". Addresses can be marked as accessible only from a particular mode.
- No physical memory limitation – the computer system can simulate more physical memory than is actually installed, by keeping data that is not currently needed on disk.
- Support for sparse address spaces – the computer system need not allocate physical memory for regions in an address space that are not currently in use.
- Support for data sharing between address spaces – since virtual addresses are implemented via indirection tables to the actual physical memory, data can be shared.
- Support for copy-on-write memory – the operating system can let address spaces share data until one of them modifies it; at that point the modifier receives its own private copy, and the sharing of that data is broken.
Windows NT supports all of these features, and more. With a basic understanding of why virtual memory is useful, we can now turn our attention to a description of how virtual memory works. Once that’s accomplished, we can discuss some of the details that are so important when writing systems software for Windows NT.
Modern CPUs incorporate "memory management logic" as part of the way they work. Windows NT is designed assuming the presence of hardware support for virtual memory. While it is theoretically possible to implement virtual memory without hardware support, such systems turn out to be far too slow to be of much use.
Whenever a CPU is running in "virtual memory" mode, addresses presented to the CPU are normally assumed to be virtual addresses. That is, the CPU attempts to translate the address into a physical address. For example, a CPU has some concept of the instruction pointer (ip). This address corresponds to the currently executing instruction. The value within this instruction pointer is not, in fact, the address of the physical memory containing the instruction being executed, but is rather a virtual address. Before fetching the instructions to be executed, the CPU must translate the virtual address to a physical address, and then fetch the contents of the physical memory.
The precise mechanism used to translate a virtual address to a physical address depends entirely upon the CPU architecture. For example, on the Intel Pentium Pro, this process can actually be quite complicated because the Pentium Pro supports five possible addressing schemes. This is possible because Intel provides support for both segment and offset registers in a variety of combinations, including what they refer to as paging. In general, Windows NT uses the simplest segmentation scheme (the "flat model") with paging. We say "in general" because there is some limited use of segmentation, notably in the virtual DOS machine. This is very different from the model used by Digital’s AXP (or Alpha) system. For additional information on either of these systems, we suggest reading the Pentium Pro Family Developer’s Manual, Volume 3: Operating System Writer’s Guide, or the Alpha Architecture Reference Manual.
Regardless of the actual hardware platform, Windows NT implements a virtual memory system that provides a clean abstraction for both applications and systems programmers. Thus, when discussing the Windows NT VM system, it is not usually necessary to refer to the hardware reference manuals.
Returning to our instruction pointer example, we will start our discussion of virtual memory by describing the steps taken to convert that virtual address to a physical address.
First, the CPU keeps a cache of the virtual-to-physical translations it has performed recently (commonly called the translation lookaside buffer, or TLB). Because memory is divided into a series of pages, the CPU knows that any two virtual addresses on the same virtual page are also on the same physical page, so a single cached translation serves every access to that page. As a result, the needed translation is almost always found in the cache inside the CPU itself.
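To make the caching idea concrete, here is a minimal software model of such a translation cache, assuming a simple direct-mapped table indexed by virtual page number. The names (`tlb_lookup`, `tlb_insert`, `TLB_SLOTS`) and the 64-entry size are hypothetical; real CPUs implement this cache in hardware, usually with set-associative designs whose sizes vary from processor to processor.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u   /* 4KB pages, as on the Intel platform */
#define TLB_SLOTS  64u   /* hypothetical size; real TLBs vary */

typedef struct {
    bool     valid;
    uint32_t vpn;   /* virtual page number (address >> PAGE_SHIFT) */
    uint32_t pfn;   /* physical frame number it maps to */
} tlb_entry;

static tlb_entry tlb[TLB_SLOTS];   /* zero-initialized: every slot starts empty */

/* Look up a virtual address; on a hit, store the physical address in *pa. */
bool tlb_lookup(uint32_t va, uint32_t *pa)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    const tlb_entry *e = &tlb[vpn % TLB_SLOTS];
    if (e->valid && e->vpn == vpn) {
        /* Same virtual page => same physical page; only the offset differs. */
        *pa = (e->pfn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1u));
        return true;
    }
    return false;   /* miss: the page tables must be consulted */
}

/* Remember a translation after the page tables have been consulted. */
void tlb_insert(uint32_t va, uint32_t pfn)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    tlb_entry *e = &tlb[vpn % TLB_SLOTS];
    e->valid = true;
    e->vpn = vpn;
    e->pfn = pfn;
}
```

Once the translation for page 0x00401000 has been inserted, a lookup of any address from 0x00401000 through 0x00401FFF hits the same entry, which is why such a small cache is so effective.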
If the internal CPU cache does not contain the necessary information, then the memory management logic portion of the CPU takes over. If the CPU provides hardware page table support (as do both the Intel and AXP), then the CPU has a register (CR3 on Intel) that contains the address of the page directory. The page directory consists of a set of entries, each of which points to a page table. Each page table in turn consists of entries, each of which points to a page of physical memory.
The CPU looks at an address as if it were made up of three parts:
- A directory offset
- A page table offset
- A page offset
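As a sketch, assuming the 32-bit Intel layout with 4KB pages (10 bits of directory offset, 10 bits of page table offset, 12 bits of page offset), the three parts can be extracted with shifts and masks. The helper names are hypothetical.

```c
#include <stdint.h>

/* Intel 32-bit virtual address with 4KB pages:
 *   bits 31-22  directory offset (index into the page directory)
 *   bits 21-12  page table offset (index into one page table)
 *   bits 11-0   page offset (byte within the 4KB page)
 * Each 10-bit index selects one of 1024 entries. */
static inline uint32_t dir_index(uint32_t va)   { return va >> 22; }
static inline uint32_t table_index(uint32_t va) { return (va >> 12) & 0x3FFu; }
static inline uint32_t page_offset(uint32_t va) { return va & 0xFFFu; }
```

For example, the address 0x12345678 splits into directory index 0x48, page table index 0x345, and page offset 0x678.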
For the Intel platform, a page is 4KB in size, so the page offset consists of 12 bits of addressing information (2^12 = 4096). Since a page table entry points only to whole pages, the low 12 bits of each entry are not needed for the address and can be used by the hardware and the operating system to keep track of attributes of that particular page. A page table fits within a single page – again 4KB. Since each page table entry on Windows NT is 32 bits (4 bytes), a single page table holds 1024 entries and so describes 1024 pages. Each page table therefore describes 1024 * 4KB, or 4MB, of virtual memory.
A single page directory consists of 1024 entries as well, with each entry pointing to a page table. Thus, a page directory describes 1024 * 4MB or 4GB of virtual memory. Since Windows NT has a 4GB virtual address space, a single page directory is sufficient to describe the entire address space on the Intel platform.
Again returning to our instruction pointer example, the CPU processes the directory offset (the high order 10 bits of the address) to determine which entry to use to find the page table. Once that page table has been located, it uses the next 10 bits of the address to determine which physical page contains the actual data. Once this step is completed, the CPU stores away the virtual to physical translation for future use.
Regardless of how the CPU found the correct physical page (in its internal cache or by walking the page tables), once it has the page address it combines this with the remaining 12 bits of the address to retrieve the necessary data from the page. This translation is performed on every single memory access by the CPU, so making it fast is essential to performance.
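The complete two-level walk can be modeled in a few lines of C. This is a simplified sketch, not how NT or the hardware is actually implemented: physical memory is simulated as a small array of 4KB frames, only the Valid bit (bit 0) of each entry is checked (protection bits and other attributes are ignored), and the page directory's physical address is passed in as a parameter, playing the role of the CPU register that points at it. `translate`, `read_entry`, and `phys` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID  0x1u          /* bit 0: entry is valid (present) */
#define FRAME_MASK 0xFFFFF000u   /* bits 12-31: physical page address */
#define NUM_FRAMES 16u           /* size of the simulated physical memory */

/* Simulated physical memory: each 4KB frame holds 1024 32-bit entries. */
static uint32_t phys[NUM_FRAMES][1024];

/* Read entry `index` from the frame at physical address frame_pa.
 * Frames outside the simulation read as 0, i.e. not valid. */
static uint32_t read_entry(uint32_t frame_pa, uint32_t index)
{
    uint32_t frame = frame_pa >> 12;
    return frame < NUM_FRAMES ? phys[frame][index] : 0u;
}

/* Translate va using the page directory at dir_pa.
 * Returns false where the real CPU would raise a page fault. */
bool translate(uint32_t dir_pa, uint32_t va, uint32_t *pa)
{
    uint32_t pde = read_entry(dir_pa, va >> 22);                       /* high 10 bits */
    if (!(pde & PTE_VALID))
        return false;
    uint32_t pte = read_entry(pde & FRAME_MASK, (va >> 12) & 0x3FFu);  /* next 10 bits */
    if (!(pte & PTE_VALID))
        return false;
    *pa = (pte & FRAME_MASK) | (va & 0xFFFu);                          /* low 12 bits */
    return true;
}
```

If the directory lives in frame 1, its entry 0x48 points at a page table in frame 2, and that table's entry 0x345 points at frame 5, then virtual address 0x12345678 translates to physical address 0x00005678.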
If you’ve made it this far into the description of virtual memory you might be scratching your head, since we haven’t talked about the case when things go wrong – only when things go right. The most obvious thing that can "go wrong" is to find a page table entry that points to nonexistent memory. In such cases, the hardware hands off the problem to the operating system. On the Intel platform, this is accomplished by generating an exception. This exception (number 14) is known as a page fault. Inside the Intel version of NT, a page fault is handled by the page fault handler – KiTrap0E.
In Windows NT, page faults are only allowed when the IRQL of the processor is less than DISPATCH_LEVEL. If the IRQL is equal to or greater than DISPATCH_LEVEL, it is KiTrap0E that generates the KeBugCheck(…) call indicating IRQL_NOT_LESS_OR_EQUAL.
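The reason for this rule is that resolving a page fault may require waiting for disk I/O, and a thread running at DISPATCH_LEVEL or above cannot wait, because the scheduler cannot switch to another thread at that IRQL. The practical consequence for driver writers is that any code or data touched at DISPATCH_LEVEL or above must be nonpaged. The sketch below expresses only the decision itself; the IRQL values and the bug check code mirror those defined in the DDK, while the helper functions are hypothetical and are not how KiTrap0E is written.

```c
#include <stdbool.h>

/* IRQL values as defined in the Windows NT DDK. */
#define PASSIVE_LEVEL  0
#define APC_LEVEL      1
#define DISPATCH_LEVEL 2

#define IRQL_NOT_LESS_OR_EQUAL 0x0000000Au   /* bug check code */

/* Hypothetical helper: may a page fault taken at this IRQL be resolved? */
static inline bool page_fault_allowed(unsigned irql)
{
    return irql < DISPATCH_LEVEL;
}

/* Hypothetical helper: 0 if the fault can be handled, else the bug check code. */
static inline unsigned page_fault_outcome(unsigned irql)
{
    return page_fault_allowed(irql) ? 0u : IRQL_NOT_LESS_OR_EQUAL;
}
```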
One important point here: the page fault is handled in the context of the same thread that caused it. While the CPU might have changed its privilege mode from user mode to kernel mode, it is still the same thread. This turns out to be extremely important because Windows NT has the ability to create multiple address spaces. Every time Windows NT creates a new process it associates a new (empty) address space with that process. Normally, some code, typically from an executable image file, is then placed in that address space.
Threads are in turn associated with a particular process. Hence, there can be many threads within a single process, all sharing the same address space. Note that each of these threads is running independently, and hence can independently generate page faults within this common address space.
Once Windows NT intercepts the page fault and captures the necessary information (such as where the fault occurred and why), it transfers control to the Memory Manager. Specifically, the routine MmAccessFault(…) is called to handle the page fault.
Inside Windows NT, page faults occur for a number of reasons. One very common reason is that the virtual page is not currently resident in physical memory. In this case, the Memory Manager is responsible for allocating a physical page and retrieving the correct contents from wherever it is actually stored. Typically, this is on a disk somewhere, although it could be via some remote connection to a file server.
Whether a page is resident is indicated in its page table entry by the Valid bit. This bit’s location within a page table entry is defined by the underlying hardware. Recall that on the Intel platform only 20 bits of a 32-bit page table entry are needed to locate the physical page, since the low 12 bits of any address are the offset within the page. Thus, the other 12 bits of the entry can be used to keep track of details about the particular physical page (such as whether the page is valid).
When a page is not valid, the hardware ignores all of the other bits in its page table entry. The operating system is therefore free to use those bits to track the real location of the data the page contains, since it isn’t currently in memory.
Returning to our page fault example, the Memory Manager uses the other 31 bits of information to determine where the page is actually located. For example, if the data is in a paging file, this information identifies which paging file and where within that paging file. Given this information, the Memory Manager can retrieve the data from the disk.
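The sketch below illustrates the idea of reusing the bits of a non-valid entry. The layout is entirely hypothetical, chosen for illustration only (bit 0 Valid, bits 1-4 selecting one of up to 16 paging files, bits 12-31 giving a page number within that file); the software formats NT actually uses for non-valid entries differ and vary by version and platform.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL layout of a non-valid PTE, for illustration only:
 *   bit 0      Valid (always 0 here, so the hardware ignores the rest)
 *   bits 1-4   which paging file (0-15)
 *   bits 12-31 page number within that paging file */
typedef struct {
    unsigned paging_file;
    uint32_t page_in_file;
} pagefile_location;

static inline bool pte_is_valid(uint32_t pte) { return (pte & 0x1u) != 0; }

/* Encode "this page lives in paging file `file`, at page `page`". */
static inline uint32_t make_paged_out_pte(unsigned file, uint32_t page)
{
    return ((page & 0xFFFFFu) << 12) | ((file & 0xFu) << 1);   /* Valid = 0 */
}

/* Only meaningful when pte_is_valid(pte) is false. */
static inline pagefile_location decode_paged_out_pte(uint32_t pte)
{
    pagefile_location loc = { (pte >> 1) & 0xFu, pte >> 12 };
    return loc;
}
```

A fault handler working with this layout would decode the entry, read page `page_in_file` (byte offset `page_in_file * 4096`) from the indicated paging file into a free physical page, and then rewrite the entry in the hardware's valid format.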
This points to two of the vital roles of the Memory Manager: handling page faults and allocating physical memory. The Memory Manager is also responsible for managing address spaces, controlling the underlying memory management hardware, and even mapping files as if they were pieces of memory! It must do all of this very efficiently, because the speed of the VM system is critical to the overall performance of the operating system. This makes the actual implementation of the VM system both vital and complex. Indeed, of all the pieces of the operating system, the Memory Manager is the single most complex.
Virtual Address Spaces
At this point, with a basic description of why virtual memory is important, we can turn our attention to the Windows NT Memory Manager. In Windows NT 4.0, the virtual address space of the system is divided into two components, as shown in Figure 1.
Figure 1 – NTV4 System VA Space
One component is the range of addresses that are accessible when the processor is running in user mode, the least privileged mode of the CPU. This restriction is enforced by the underlying hardware using some of those precious "spare bits" in the page table entry. Specifically, each page has a single bit indicating whether it is accessible from user mode (strictly speaking, while the processor is running in user mode, but "from user mode" is much easier to write). All pages are accessible from kernel mode, although even in kernel mode a given address need not be in use. Not being in use is different from not being valid, since valid means "resident in physical memory".
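On the Intel platform the bit in question is the user/supervisor bit (bit 2) of the entry. The check can be sketched as follows; this simplified model examines a single entry and ignores write protection and the other attribute bits, and the type and function names are hypothetical.

```c
#include <stdint.h>

#define PTE_VALID 0x1u   /* bit 0: page is valid (resident) */
#define PTE_USER  0x4u   /* bit 2: user/supervisor; set = user mode may access */

typedef enum { MODE_KERNEL, MODE_USER } cpu_mode;
typedef enum { ACCESS_OK, FAULT_NOT_VALID, FAULT_PROTECTION } access_result;

/* Simplified: checks one entry only and ignores write protection. */
access_result check_access(uint32_t pte, cpu_mode mode)
{
    if (!(pte & PTE_VALID))
        return FAULT_NOT_VALID;     /* the OS must bring the page in (or reject the access) */
    if (mode == MODE_USER && !(pte & PTE_USER))
        return FAULT_PROTECTION;    /* kernel-only page touched from user mode */
    return ACCESS_OK;               /* kernel mode may access any valid page */
}
```

The same valid, kernel-only entry therefore succeeds when touched from kernel mode and faults when touched from user mode.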
Thus, the 4GB virtual address range of the computer system is divided into two pieces. One piece represents those addresses that can be accessed from user mode; the other piece represents everything else. Any time the operating system is about to run a thread in a different process, it must switch the set of mapping tables it uses to translate virtual to physical addresses. As a convenience for the operating system, only some of the virtual-to-physical mappings actually change. The rest remain the same as the operating system switches from one process (and hence one address space) to another process (another address space).
The benefit of this approach is that any virtual addresses within this constant range are valid in any process context. This feature proves to be very useful when writing system software.
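One simple way to obtain this behavior on the Intel platform can be sketched under stated assumptions: give every process its own page directory, but fill the entries covering system space from a single template, so that every process's system entries point at the same page tables. With the 2GB/2GB split, system space begins at 0x80000000, which is directory entry 512. The structure, template, and function names are hypothetical, and the real Memory Manager has more to do than this (for example, keeping the copies consistent as system page tables are added).

```c
#include <stdint.h>
#include <string.h>

#define PDE_COUNT        1024u
#define SYSTEM_FIRST_PDE 512u   /* 0x80000000 >> 22: first entry of the 2GB system half */

typedef struct {
    uint32_t pde[PDE_COUNT];     /* one page directory per address space */
} page_directory;

/* Hypothetical template holding the system-space entries every process shares. */
static page_directory system_template;

/* Create a new address space: an empty user half, plus system entries copied
 * from the template so that they point at the same (shared) system page tables. */
void init_process_directory(page_directory *pd)
{
    memset(pd->pde, 0, SYSTEM_FIRST_PDE * sizeof pd->pde[0]);
    memcpy(&pd->pde[SYSTEM_FIRST_PDE], &system_template.pde[SYSTEM_FIRST_PDE],
           (PDE_COUNT - SYSTEM_FIRST_PDE) * sizeof pd->pde[0]);
}
```

Because the copied entries reference the same page tables, a change made inside one of those shared system page tables is immediately visible in every process, while each process remains free to fill in its own user half.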
Very recently, Microsoft introduced Enterprise Server, a new version of Windows NT that allows you to "tune" the allocation of virtual addresses between user mode and kernel mode. Figure 2 graphically describes this new memory division.
Figure 2 – NT Enterprise Server System VA Space
The fundamental difference is that the range of addresses valid in arbitrary process context is now much smaller. Correspondingly, the range of addresses valid in a specific process context is now much larger. For some specialized applications, such as databases, the ability to have a larger address space allows them to operate much more efficiently.
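The effect of moving the boundary reduces to a single comparison against the first system-space address. In this simplified sketch the boundary is a parameter: 0x80000000 for the standard 2GB user space, and 0xC0000000 for the 3GB user space commonly configured on Enterprise Server. The constant and function names are hypothetical, and the fact that the boundary can move is itself a good reason not to hard-code it.

```c
#include <stdbool.h>
#include <stdint.h>

/* First system-space address; everything below it is user space. */
#define SYSTEM_BASE_STANDARD   0x80000000u  /* 2GB user / 2GB system */
#define SYSTEM_BASE_ENTERPRISE 0xC0000000u  /* 3GB user / 1GB system */

static inline bool is_user_address(uint32_t va, uint32_t system_base)
{
    return va < system_base;
}

static inline uint64_t user_space_bytes(uint32_t system_base)
{
    return (uint64_t)system_base;   /* user space runs from 0 up to system_base */
}
```

An address such as 0x90000000 is a system address under the standard split but a user address under the 3GB split.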
Additionally, perhaps you have read or heard a bit about some of the features that will be present in Windows NT Version 5.0. One of those features is the extension of the virtual memory address space so that it now allows up to 32 GB of virtual memory, rather than the older 4GB limitation.
While we’ve noted that the virtual address size on Windows NT is only 32 bits, the physical address size on Windows NT has always been 64 bits. In theory, it has always been possible to have more than 4GB of physical memory; such systems just haven’t been very common. During Windows NT’s life, the availability of memory has dramatically improved, to the point where even desktop systems routinely have over a hundred megabytes of memory. The future is likely to make this even more common.
Changing to a 64 bit virtual address from a 32 bit virtual address has a profound effect on the operating system. During the transition, the operating system will need to facilitate working with both applications and systems software to ensure backwards compatibility. Eventually, of course, the operating system will be fully 64 bit enabled and the backwards compatibility will not be necessary.
In NT 5.0 a partial step towards full 64-bit support has been made. The virtual address space has been extended to support 64-bit virtual addresses for "aware" application programs. Internally, the Memory Manager supports the use of 35 bits of the total 64-bit virtual address space (2^35 bytes = 32GB). To preserve compatibility with existing applications and systems software, the additional 28GB of virtual address space begins at 4GB and continues up to 32GB. This is shown graphically in Figure 3.
Of course, like anything written about Windows NT 5.0 at this point, this information is subject to change and might not even be correct at the point you read this! After all, Version 5.0 is still in active development.
With that said, we note that changing the VM system to support 64-bit addresses involves a bit more than simply allocating more storage for the virtual address pointers. Because each page table entry grows from 4 bytes to 8 bytes, a 4KB page table now holds only 512 entries and so describes only half as many pages (2MB rather than 4MB of virtual memory). Similarly, each page directory now represents only one-quarter as much memory (512 * 2MB = 1GB rather than 4GB). Thus, just representing 4GB now requires four page directory pages, and representing 32GB requires 32 directory pages. Full 64-bit support is likely to require an extra level of indirection: an index page of directory pages!
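The arithmetic behind these figures can be checked with a small helper that takes the page table entry size as a parameter, assuming 4KB pages. With 4-byte entries it reproduces the 4MB-per-page-table and 4GB-per-directory figures of the current Intel layout; with 8-byte entries it gives 2MB per page table and 1GB per directory page. The function names are hypothetical.

```c
#include <stdint.h>

#define VM_PAGE_BYTES 4096ull   /* 4KB pages */

/* Bytes of virtual memory described by one page table page. */
static inline uint64_t table_coverage(uint64_t pte_size)
{
    return (VM_PAGE_BYTES / pte_size) * VM_PAGE_BYTES;
}

/* Bytes of virtual memory described by one page directory page. */
static inline uint64_t directory_coverage(uint64_t pte_size)
{
    return (VM_PAGE_BYTES / pte_size) * table_coverage(pte_size);
}

/* Directory pages needed to describe `bytes` of virtual address space (rounded up). */
static inline uint64_t directory_pages_needed(uint64_t bytes, uint64_t pte_size)
{
    uint64_t per_dir = directory_coverage(pte_size);
    return (bytes + per_dir - 1) / per_dir;
}
```

With 8-byte entries, `directory_pages_needed(4ull << 30, 8)` yields 4, and the full 35-bit range (`1ull << 35`, 32GB) yields 32, matching the figures above.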
Indeed, it turns out that supporting a vastly larger address space is likely to involve quite a few changes to the way the VM system works on Windows NT and because of this an interim step was used. In this interim step the normal user addresses (between zero and two gigabytes) are still paged. System addresses (between two and four gigabytes) are divided up into a variety of pieces as well, most of which are paged. Addresses in the upper range (between four and 32 gigabytes) are not paged and are shared between address spaces. This allows the Memory Manager to use the same page directories to represent all of these upper pages. Since the Memory Manager doesn’t have to change nearly as much, this interim solution solves the immediate problem (the need for more virtual memory).
These new addresses between four and 32 GB are still subject to the same rules as addresses between zero and four GB. Thus, for example, some of these pages could be marked as inaccessible from user mode, although they are accessible from kernel mode. This might prove to be useful for certain operating system services, such as file caching, in addition to allowing larger application address spaces.
Since page fault handling can’t occur in this range (note that we indicated these are non-paged addresses) none of the page fault logic within the Memory Manager need change to support this new model, either. Finally, existing systems software, such as file systems and device drivers, can be supported by using virtual addresses in the two to four GB system space range. As we will discuss, this turns out to be very straightforward in the Windows NT Memory Manager scheme.
At this stage, it should be a bit clearer what the Windows NT DDK means when it says "arbitrary context." One very important attribute of an arbitrary context is that the precise translation of "user addresses" is, literally, arbitrary. For example, a device driver that writes into a user’s address space in "arbitrary context" is essentially writing into some random location within the current process’s memory. While this might work in some circumstances, in others it might cause a fatal application failure or even an operating system failure!
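The hazard can be illustrated with a toy model, a deliberate simplification: two processes each map the same user virtual page to their own physical storage, and a global pointer stands in for whichever process happens to be current when the driver code runs. Every name here (`process`, `current`, `USER_PAGE_VA`, `driver_write_user_byte`) is hypothetical and is not a DDK interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define USER_PAGE_VA 0x00400000u   /* hypothetical: the only user page mapped in this toy */

typedef struct {
    const char *name;
    uint8_t user_page[4096];       /* this process's backing storage for USER_PAGE_VA */
} process;

static process *current;           /* whatever process happens to be running */

/* Translate a user address in the *current* process; NULL if unmapped. */
static uint8_t *translate_user(uint32_t va)
{
    if ((va & ~0xFFFu) != USER_PAGE_VA)
        return NULL;               /* a real access here would fault */
    return &current->user_page[va & 0xFFFu];
}

/* A driver writing to a user address: the write lands in whichever
 * process is current, not necessarily the one that supplied the address. */
bool driver_write_user_byte(uint32_t va, uint8_t value)
{
    uint8_t *p = translate_user(va);
    if (p == NULL)
        return false;
    *p = value;
    return true;
}
```

If process A passed the address but process B happens to be current when the write occurs, B's memory is silently modified and A never sees the data.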
We’ll continue next time with some more discussion of virtual memory in NT, including interesting discussions on page tables.