Take a Break - Missed Breakpoints? Here's Why...
By Jen-Lung Chiu with Bryce Jonasson | The NT Insider, Volume 16, Issue 3, Sept-Oct 2009 | Published: 02-Oct-09 | Modified: 02-Oct-09

The way the Windows kernel debugger currently handles breakpoints has some fundamental flaws that can make breakpoint recognition unreliable, especially in time-critical scenarios. This article describes the problematic scenarios in the kernel debugger and proposes a workaround that developers can use to avoid the problem.

Breakpoints are locations within a program that, when they are accessed by a program, cause the program to halt execution and enter the debugger. The program being controlled via breakpoints could be an application program or the operating system. Breakpoint manipulation enables developers to halt program execution and, using the debugger, collect state information for further diagnosis. Clearly, breakpoints can play an important role in debugging.

Currently there are two documented types of breakpoints (relevant to this discussion) that are supported by the Windows debugger package.

Code breakpoints - Code breakpoints halt execution when the program attempts to execute the instruction at the address of the breakpoint. Code breakpoints are set through the "bp", "bm", or "bu" commands.

Data access breakpoints - Data access breakpoints are set through the "ba" command. This type of breakpoint halts execution when the breakpoint's memory location is accessed in any way by the program (read/write/execute).

In the x86 as well as the x64 architecture, data access breakpoints are implemented using hardware breakpoint support. The debugger uses the Debug Address Registers (DR0, DR1, DR2, DR3) to hold up to four virtual addresses as the data access breakpoints, and uses the Debug Control Register (DR7) to enable or disable data access breakpoints and to set breakpoint conditions (break on read/write/execute). When a program accesses an address specified in one of the Debug Address Registers, the hardware generates a debug exception trap. When the debugger receives a debug exception, it examines the content of the Debug Status Register (DR6) to determine which data access breakpoint was hit. The number of data access breakpoints that can be set is limited by the number of Debug Address Registers in the system (currently four, as noted above).
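The register layout described above can be sketched in a few lines of Python. This is only an illustration of the architectural bit fields (enable bits in the low byte of DR7, a four-bit R/W and LEN field per slot starting at bit 16, and hit bits B0-B3 in DR6). The helper names are hypothetical, and the choice of the local enable bit rather than the global one is an assumption, not a statement about what the debugger actually programs.

```python
# Sketch of the x86/x64 debug register bit fields; helper names are hypothetical.
RW_EXECUTE, RW_WRITE, RW_READ_WRITE = 0b00, 0b01, 0b11
LEN_BITS = {1: 0b00, 2: 0b01, 4: 0b11, 8: 0b10}  # the 8-byte length is x64 only


def dr7_enable(dr7, slot, rw, length):
    """Return dr7 with breakpoint `slot` (0-3) enabled for the given condition."""
    if slot not in range(4):
        raise ValueError("only DR0-DR3 exist")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("execute breakpoints must use length 1")
    field = (LEN_BITS[length] << 2) | rw   # LENn in the high two bits, R/Wn in the low two
    shift = 16 + 4 * slot                  # each slot owns four bits starting at bit 16
    dr7 &= ~(0xF << shift)                 # clear the slot's previous condition
    dr7 |= field << shift
    dr7 |= 1 << (2 * slot)                 # Ln, the local enable bit (assumed here)
    return dr7


def dr6_hits(dr6):
    """Return the slots whose B0-B3 status bit is set in DR6."""
    return [slot for slot in range(4) if dr6 & (1 << slot)]
```

For example, dr7_enable(0, 0, RW_EXECUTE, 1) arms slot 0 as an execute breakpoint, which is the condition that "ba e1" asks for.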

Code breakpoints are implemented by the debugger using a mix of both hardware and software means. When the user requests a code breakpoint at a given address, the debugger caches the original instruction at that address and replaces it with an "int 3" instruction. When the breakpoint is removed, the debugger restores the original instruction that was cached previously. In the case of a user-

The Missing Breakpoint Problem

One complaint we often hear from users is that breakpoints that they set using the debugger are "missed." That is, the debugger doesn't stop when a code breakpoint is encountered. This missing breakpoint problem has existed in the Windows kernel and the Windows debugger for a long time, but the root cause of the problem was only recently discovered.

Code breakpoints always work correctly if the breakpoint address is already in memory (not paged out). The problem happens when the virtual address of the code breakpoint is not resident in physical memory (either not loaded yet or paged out).

The two key O/S components involved in setting breakpoints are the kernel debugging stub portion of the O/S, referred to as KD, and the Memory Manager, referred to as Mm. Consider the typical sequence (with some implementation-specific information):

1. KD tries to set a code breakpoint on VA1.

2. The breakpoint is in paged-out memory; that is, VA1 is not resident.

3. As a result, MmDbgCopyMemory() fails to update the code stream with an artificial "int 3" instruction; KD then sets the KdpOweBreakpoint flag so that trap.asm will call KdSetOwedBreakpoints() after each successful MmAccessFault.

4. Thread1 faults on VA1 and the Memory Manager makes the page valid.

5. After MmAccessFault finishes, execution returns to trap.asm, at which point KdSetOwedBreakpoints() is called. It in turn calls MmDbgCopyMemory() to read the current instruction at VA1 and replace it with the "int 3" instruction, completing the code breakpoint insertion.

Between steps 4 and 5, any number of other threads could run the code at VA1 (since no page fault would occur now). If this happens, the code breakpoint is not hit because it has not yet been set into the code stream. This behavior is not limited to multi-processor systems, as Thread1 could be context-switched out for an indeterminate amount of time between step 4 and step 5.
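The window between the page becoming valid and the owed breakpoint being inserted can be illustrated with a small deterministic model. This is a toy sketch, not KD's implementation: the Page and ToyKd classes and the execute() helper are hypothetical, and thread scheduling is reduced to a single flag.

```python
INT3 = 0xCC  # opcode of the "int 3" instruction


class Page:
    """A toy code page that starts out non-resident."""
    def __init__(self, code):
        self.code = bytearray(code)
        self.resident = False


class ToyKd:
    """Hypothetical stand-in for KD's owed-breakpoint bookkeeping."""
    def __init__(self, page):
        self.page = page
        self.owed = set()    # offsets still waiting for their page
        self.saved = {}      # offset -> original byte

    def _patch(self, offset):
        self.saved[offset] = self.page.code[offset]
        self.page.code[offset] = INT3

    def set_breakpoint(self, offset):
        if self.page.resident:
            self._patch(offset)
        else:
            self.owed.add(offset)          # step 3: the breakpoint is owed

    def set_owed_breakpoints(self):        # step 5: runs after the fault returns
        for offset in sorted(self.owed):
            self._patch(offset)
        self.owed.clear()


def execute(page, offset):
    """Report whether a thread running this byte would stop in the debugger."""
    return "break" if page.code[offset] == INT3 else "ran"


def run_scenario(other_thread_runs_first):
    page = Page(b"\x8b\xff\x55")
    kd = ToyKd(page)
    kd.set_breakpoint(0)                   # steps 1-3
    page.resident = True                   # step 4: Thread1's fault makes the page valid
    events = []
    if other_thread_runs_first:
        events.append(execute(page, 0))    # another thread runs inside the window
    kd.set_owed_breakpoints()              # step 5
    events.append(execute(page, 0))
    return events
```

In this model, run_scenario(True) shows the first execution passing straight through the breakpoint address, while run_scenario(False) shows the breakpoint being hit as expected.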

More Problems

When we reviewed this issue further, we found additional breakpoint problems when the "int 3" instruction was inserted.

Problem 1) When setting user-mode code breakpoints with the kernel debugger.

All code breakpoints (user-mode and kernel-mode) inserted through the kernel debugger go through the same code path. When KD is notified that a code stream page has become available (step 5 above), it walks its internal code breakpoint list and tries to insert each breakpoint that falls within the newly available code page. For each such breakpoint, KD first checks whether it is a kernel-mode or a user-mode code breakpoint (by comparing the breakpoint's virtual address with the MmSystemRangeStart value), and proceeds as follows:

For kernel-mode code breakpoints, KD calls MmDbgCopyMemory() immediately (thus the "int 3" instruction is substituted for the existing instruction at the breakpoint location).

For user-mode code breakpoints, KD tests whether the current execution context matches the process context when the specific code breakpoint is added and then only injects the user-mode code breakpoint if they match. Once MmDbgCopyMemory() is called, the "int 3" instruction is substituted in the code stream pages and all processes that execute the code pages will hit the user-mode code breakpoint.

So, it is possible that once user-mode code breakpoints are inserted, other processes will be affected and hit the same code breakpoints even if this was not intended (this can easily be verified by attaching a user-mode debugger to a process and using the "u" command to unassemble the code around the user-mode code breakpoints). It is also possible that user-mode code breakpoints will not be added if the current process context (when KdSetOwedBreakpoints() is triggered after a successful MmAccessFault) does not match the process context in which the user-mode code breakpoint was added in the kernel debugger. In that case the user-mode code breakpoint will never be hit, even though it was properly set in the kernel debugger.
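The decision KD makes for each owed breakpoint, as described above, can be sketched as a single predicate. The function name and the process identifiers are hypothetical, and the MmSystemRangeStart value shown is only the usual default for 32-bit x86, assumed here for illustration.

```python
MM_SYSTEM_RANGE_START = 0x80000000  # assumed default for 32-bit x86, illustration only


def should_insert(bp_address, bp_process, current_process):
    """Decide whether an owed breakpoint gets written into the code stream now."""
    if bp_address >= MM_SYSTEM_RANGE_START:
        return True                        # kernel-mode breakpoint: always inserted
    # User-mode breakpoint: inserted only if the faulting context matches the
    # process context that was current when the breakpoint was added.
    return bp_process == current_process
```

A user-mode breakpoint added in the context of one process is therefore skipped whenever the page fault happens to be taken in another process, which is how the breakpoint can end up never being inserted.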

There are cases in which the user-mode debugger will not recover from an inserted "int 3" instruction if the kernel debugger were allowed to insert user-mode code breakpoints. Consider the following sequence:

The kernel debugger inserts user-mode code breakpoints with a specified process context. Once the code page is available with the artificial "int 3" instruction, it is available to all processes.

The user-mode debugger is attached to one process and adds the exact same code breakpoint. The user-mode debugger caches the original instruction before injecting the "int 3" instruction, but the original instruction has already been changed to "int 3" by the kernel debugger when it inserted the user-mode code breakpoint.

The kernel debugger removes the user-mode code breakpoint, and the original instruction is restored to the code stream.

Unfortunately the specific user-mode process would not recover as the cached original instruction is still an "int 3".

From this, we can see that setting user-mode breakpoints via the kernel debugger is not reliable, and that the unexpected effect on a user-mode process is not recoverable.
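The sequence above can be reproduced with a toy model in which two debuggers patch the same byte. The ToyDebugger class is hypothetical, and both debuggers share one bytearray for simplicity, which ignores the private-page behavior discussed in the next problem.

```python
INT3 = 0xCC  # opcode of the "int 3" instruction


class ToyDebugger:
    """Hypothetical debugger that caches a byte and replaces it with int 3."""
    def __init__(self, memory):
        self.memory = memory
        self.saved = {}

    def insert(self, addr):
        self.saved[addr] = self.memory[addr]   # caches whatever is there right now
        self.memory[addr] = INT3

    def remove(self, addr):
        self.memory[addr] = self.saved.pop(addr)


def double_patch():
    memory = bytearray(b"\x83\x7d\x08\x00")    # cmp dword ptr [ebp+8],0
    kd, ntsd = ToyDebugger(memory), ToyDebugger(memory)
    kd.insert(0)                               # kernel debugger patches first
    ntsd.insert(0)                             # user-mode debugger caches 0xCC
    cached_by_ntsd = ntsd.saved[0]
    kd.remove(0)                               # original byte is back in memory
    after_kd_remove = memory[0]
    ntsd.remove(0)                             # "restores" the stale int 3
    return cached_by_ntsd, after_kd_remove, memory[0]
```

The last element of the returned tuple is 0xCC: once the user-mode debugger removes its breakpoint, it writes the stale "int 3" back, and nothing remembers the real instruction any more.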

Problem 2) When global code pages are copied or written to private pages.

Once code breakpoints are successfully created (that is, the "int 3" instruction is substituted for the original instruction at the location of the breakpoint), they are available globally. This problem occurs when a code page is copied or written to private pages, which is difficult to track. It can occur in session drivers (for example, win32k.sys) as well as in user-mode applications (for user-mode code breakpoints).

Consider the following execution sequence:

A developer sets a code breakpoint through the kernel debugger. An "int 3" instruction is inserted into the code stream page.

A session driver or a user-mode application then copies or writes the code page to a private page. The private page now contains the "int 3" instruction that the debugger inserted for the code breakpoint.

The developer now removes the code breakpoint through the kernel debugger. The "int 3" instruction in the global code page is replaced by the cached original instruction.

Unfortunately, there is no way to notify and restore injected "int 3" instructions in private pages.

The reverse would also apply. That is, there is no way to track and insert code breakpoints in private pages after they are created. This is another reason why code breakpoint insertion/removal is not reliable, due to the timing issue (when private pages are created and when code breakpoints are inserted or removed).
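Both directions of this problem can be shown with two byte arrays standing in for the global code page and a private copy of it. This is a minimal sketch under that assumption; the function names are hypothetical and the copy is modeled as a plain byte-for-byte duplicate.

```python
INT3 = 0xCC  # opcode of the "int 3" instruction


def remove_after_private_copy():
    """Breakpoint set, page copied, breakpoint removed from the global page only."""
    global_page = bytearray(b"\x51\x51\x83")
    saved = global_page[2]
    global_page[2] = INT3                    # debugger inserts the breakpoint
    private_page = bytearray(global_page)    # private copy inherits the int 3
    global_page[2] = saved                   # debugger removes the breakpoint
    return global_page[2], private_page[2]


def insert_after_private_copy():
    """Page copied first, then the breakpoint is set in the global page only."""
    global_page = bytearray(b"\x51\x51\x83")
    private_page = bytearray(global_page)    # private copy made before insertion
    global_page[2] = INT3                    # debugger inserts the breakpoint
    return global_page[2], private_page[2]
```

In the first case the private page keeps an "int 3" that nobody will ever restore; in the second, code running from the private page never sees the breakpoint at all.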

To fix this, we could introduce a complicated internal data structure and notification scheme to track and notify all private pages of code breakpoint changes, and add an execution suspend/resume mechanism to suspend application or driver execution, add or remove code breakpoints, and then resume execution. Unfortunately, it is difficult to make all of these operations atomic, so similar race conditions could still surface.

The kernel debugger output listed below illustrates this problem.

kd> bp ntdll!EtwNotificationRegister+7
kd> g

... // start user-mode debugging session using the "ntsd -d ..." command

CommandLine: typeperf "\processor(_total)\% processor time"

(6ac.b14): Break instruction exception - code 80000003 (first chance)
eax=00000000 ebx=00000000 ecx=0022f690 edx=77930f34 esi=fffffffe edi=77995d14
eip=77912ea8 esp=0022f6a8 ebp=0022f6d8 iopl=0         nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000             efl=00000246
ntdll!DbgBreakPoint:
77912ea8 cc              int     3
0:000> u ntdll!EtwNotificationRegister
ntdll!EtwNotificationRegister:
778ffe3d 8bff            mov     edi,edi
778ffe3f 55              push    ebp
778ffe40 8bec            mov     ebp,esp
778ffe42 51              push    ecx
778ffe43 51              push    ecx
778ffe44 cc              int     3 // KD code breakpoint shows here when user-mode application uses global code stream.
778ffe45 7d08            jge     ntdll!EtwNotificationRegister+0x12 (778ffe4f)
778ffe47 005356          add     byte ptr [ebx+56h],dl
0:000> .breakin

Break instruction exception - code 80000003 (first chance)
nt!RtlpBreakWithStatusInstruction:
81c81760 cc int 3
kd> bl
0 e 778ffe44 0001 (0001) ntdll!EtwNotificationRegister+0x7

kd> bd 0 // disable KD code breakpoint
kd> bl
0 d 778ffe44 0001 (0001) ntdll!EtwNotificationRegister+0x7

kd> g

0:000> u ntdll!EtwNotificationRegister
ntdll!EtwNotificationRegister:
778ffe3d 8bff            mov     edi,edi
778ffe3f 55              push    ebp
778ffe40 8bec            mov     ebp,esp
778ffe42 51              push    ecx
778ffe43 51              push    ecx
778ffe44 837d0800        cmp     dword ptr [ebp+8],0 // everything back to normal
778ffe48 53              push    ebx
778ffe49 56              push    esi
0:000> .breakin

Break instruction exception - code 80000003 (first chance)
nt!RtlpBreakWithStatusInstruction:
81c81760 cc int 3
kd> be 0 // re-enable KD code breakpoint
kd> bl
0 e 778ffe44 0001 (0001) ntdll!EtwNotificationRegister+0x7

kd> g

0:000> u ntdll!EtwNotificationRegister
ntdll!EtwNotificationRegister:
778ffe3d 8bff            mov     edi,edi
778ffe3f 55              push    ebp
778ffe40 8bec            mov     ebp,esp
778ffe42 51              push    ecx
778ffe43 51              push    ecx
778ffe44 cc              int     3 // Again, KD code breakpoint shows as this is from global code stream.
778ffe45 7d08            jge     ntdll!EtwNotificationRegister+0x12 (778ffe4f)
778ffe47 005356          add     byte ptr [ebx+56h],dl

// Set a code breakpoint from ntsd on the same function, forcing a private page to be created.
// The private code page carries the "int 3" instruction inserted into the global code stream for the KD code breakpoint.

0:000> bp ntdll!EtwNotificationRegister
0:000> bp typeperf!ParseCmd
...
0:000> bl
0 e 778ffe3d     0001 (0001) 0:**** ntdll!EtwNotificationRegister
1 e 0050607f     0001 (0001) 0:**** typeperf!ParseCmd
0:000> g
Breakpoint 0 hit
eax=000e4bc8 ebx=00000000 ecx=77932447 edx=00000008 esi=000e4bc8 edi=00000000
eip=778ffe3d esp=0022ed9c ebp=0022edc0 iopl=0         nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000             efl=00000202
ntdll!EtwNotificationRegister:
778ffe3d 8bff            mov     edi,edi
0:000> bl
0 e 778ffe3d     0001 (0001) 0:**** ntdll!EtwNotificationRegister
1 e 0050607f     0001 (0001) 0:**** typeperf!ParseCmd
0:000> u ntdll!EtwNotificationRegister
ntdll!EtwNotificationRegister:
778ffe3d 8bff            mov     edi,edi
778ffe3f 55              push    ebp
778ffe40 8bec            mov     ebp,esp
778ffe42 51              push    ecx
778ffe43 51              push    ecx
778ffe44 cc              int     3
778ffe45 7d08            jge     ntdll!EtwNotificationRegister+0x12 (778ffe4f)
778ffe47 005356          add     byte ptr [ebx+56h],dl
0:000> .breakin

Break instruction exception - code 80000003 (first chance)
nt!RtlpBreakWithStatusInstruction:
81c81760 cc int 3
kd> bl
0 e 778ffe44 0001 (0001) ntdll!EtwNotificationRegister+0x7

kd> bd 0 // disable KD code breakpoint; this reverts the "int 3" instruction in the global code stream, but ...
kd> bl
0 d 778ffe44 0001 (0001) ntdll!EtwNotificationRegister+0x7

kd> g
0:000> u ntdll!EtwNotificationRegister
ntdll!EtwNotificationRegister:
778ffe3d 8bff            mov     edi,edi
778ffe3f 55              push    ebp
778ffe40 8bec            mov     ebp,esp
778ffe42 51              push    ecx
778ffe43 51              push    ecx
778ffe44 cc              int     3 // The code is in private memory now, reverting to the global code stream will not affect this.
778ffe45 7d08            jge     ntdll!EtwNotificationRegister+0x12 (778ffe4f)
778ffe47 005356          add     byte ptr [ebx+56h],dl
0:000>

Problem 3) When execution resumes after a code breakpoint is hit.

Another race condition happens when the debugger resumes execution after a code breakpoint is hit, whether the breakpoint is in pageable or non-pageable code.

Once the system or application halts execution due to the code breakpoint, the debugger suspends execution and removes or restores all code breakpoints before showing the prompt in the debugger. This is why the "u" (unassemble) command does not show the artificial "int 3" instruction at the code breakpoint location.

Once the developer instructs the debugger to resume execution, the following sequence of events happens. To simplify, we only consider here operations that are directly related to code breakpoints:

1. The debugger loops through all added breakpoints and re-introduces the artificial "int 3" instruction for all code breakpoints not matching the one that triggered the last break. For the code breakpoint that triggered the last break, the debugger will not insert the "int 3" instruction (otherwise execution will still stop at the same address as IP currently points to); instead the debugger will prepare to execute a single-step.

2. The debugger then resumes the system or application execution.

3. Once execution hits a single-step break, the debugger suspends execution and then caches or inserts the "int 3" instruction at the location of the code breakpoint it last hit.

4. The debugger then resumes execution.

Between step 2 and step 3, threads (and in the kernel debugging case, threads in other processes) could execute past the code breakpoint without hitting it.
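The resume sequence can be modeled the same way as the earlier races. This is a hypothetical, single-breakpoint sketch: the function name is made up, and the other thread is represented by a flag that decides whether it runs before the single-step trap is handled.

```python
INT3 = 0xCC  # opcode of the "int 3" instruction


def resume_after_hit(other_thread_runs_during_step):
    code = bytearray(b"\x8b\xff")          # mov edi,edi
    original = code[0]
    code[0] = INT3                         # breakpoint armed; a thread hits it
    code[0] = original                     # debugger restores the byte at the prompt
    events = []
    # Step 1: the breakpoint that just fired is deliberately left out and a
    # single-step is prepared instead.
    # Step 2: execution resumes on all threads.
    if other_thread_runs_during_step:
        events.append("break" if code[0] == INT3 else "missed")
    # Step 3: the single-step trap arrives and the int 3 is put back.
    code[0] = INT3
    # Step 4: execution resumes; later executions see the breakpoint again.
    events.append("break" if code[0] == INT3 else "missed")
    return events
```

Any thread that reaches the address while the original byte is in place runs straight through, which is the miss described above.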

In order to resolve this problem, the debugger needs a way to resume execution only on the thread that hit the code breakpoint (with all other processes and threads still suspended) in step 2, and to resume execution of all threads in step 4.

Because of these problems, we've concluded that there are fundamental flaws in the current design and implementation, and they cannot be easily fixed or patched in the Windows kernel or in the Windows debugger package.

At first we investigated whether we could fix the original problem by moving step 5 of the missing breakpoint sequence to an earlier point (for example, while the page is still in transition) so that KD could inject an "int 3" instruction before the code stream pages are resident and ready for use. This high-level proposed fix would need complicated code changes, might not totally fix the initial problem, and certainly would not fix all the other issues. After considerable review, it became apparent that a purely software-based solution is impractical.

Course of Action

As with any problem that doesn't have a ready remedy, the first step is transparency: document the findings and describe a workaround.

To that end, developers should be aware of this issue and use the "ba e1 <address>" command to set code breakpoints when hitting the breakpoint is critical and it cannot be missed. This instructs the debugger to insert a hardware access breakpoint for execution on that address, which is not subject to the limitations outlined in this article. For details on how to have the debugger automatically convert your software breakpoints to hardware breakpoints, see the accompanying Automatic Conversion of Breakpoints sidebar, below.
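As a small convenience, the workaround can be scripted. The sketch below is a hypothetical helper that only builds the "ba e1" command strings for a list of critical addresses and refuses more than four, reflecting the DR0-DR3 limit discussed earlier. It does not talk to the debugger.

```python
MAX_HARDWARE_BREAKPOINTS = 4  # DR0-DR3


def ba_commands(targets):
    """Build one "ba e1" command per target, respecting the four-register limit."""
    if len(targets) > MAX_HARDWARE_BREAKPOINTS:
        raise ValueError("only four hardware breakpoints are available")
    return ["ba e1 " + target for target in targets]
```

For example, ba_commands(["ntdll!EtwNotificationRegister+7"]) yields the execute breakpoint equivalent of the "bp" command used in the transcript above.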

Jen-Lung Chiu is a Senior Developer in the Debugging Tools for Windows group, where he works on all things related to debugging. Jen-Lung can be reached at windbgfb@microsoft.com or alternatively via the OSR WINDBG list where he spends time answering questions about the weather in Redmond.

Bryce Jonasson is a Developer/Manager of the Debugging Tools for Windows, Application Verifier, and Driver Verifier groups. Bryce can be reached at the windbgfb@microsoft.com alias, OSR's WINDBG list, and other Windows development forums.

Automatic Conversion of Breakpoints

In an attempt to address the issues described in Kernel Debugger Missing Breakpoints, a new command has been added to the debugger that automatically converts your bm/bp/bu breakpoints to ba breakpoints. As described in the article, the debugger is limited by the number of debug registers available on the platform, so not all of your breakpoints may be converted and you may still experience the issues described in the article.

If you'd like to try out the new command, you can execute .allow_bp_ba_convert in the command window. The command takes a single parameter, either 1 to allow the conversion or 0 to disallow the conversion. The default behavior is that conversion is not allowed.

0: kd> .allow_bp_ba_convert 1
Kernel debugger internally converts bm/bp/bu breakpoints to ba breakpoints: allowed

0: kd> .allow_bp_ba_convert 0
Kernel debugger internally converts bm/bp/bu breakpoints to ba breakpoints: disallowed
