Source code: virtualbox-ose

PGM - The Page Manager and Monitor

See also:
The Page Monitor / Manager API, PGM Shadow Page Pool, PGM Physical Guest Memory Management.

Paging Modes

There are three memory contexts: Host Context (HC), Guest Context (GC) and the intermediate context. When talking about paging, HC can also be referred to as "host paging", and GC as "shadow paging".

We define three basic paging modes: 32-bit, PAE and AMD64. The host paging mode is defined by the host operating system. The shadow paging mode depends on the host paging mode and on the mode the guest is currently in. The relation between the two is defined as follows:

   Guest \ Host |  32-bit  |  PAE   |  AMD64
   -------------+----------+--------+--------
   32-bit       |  32-bit  |  PAE   |  PAE
   PAE          |  PAE     |  PAE   |  PAE
   AMD64        |  AMD64   |  AMD64 |  AMD64

All configurations except those on the diagonal (running from the upper left) are expected to require special effort from the switcher (i.e. to be a bit slower).
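The table can be read as a lookup from (guest mode, host mode) to the shadow paging mode. The C sketch below encodes it directly; the `PagingMode` enum and both function names are hypothetical illustrations, not PGM's own API.

```c
#include <assert.h>

/* Hypothetical enum for the three basic paging modes named in the text. */
typedef enum { PAGING_32BIT, PAGING_PAE, PAGING_AMD64 } PagingMode;

/* Shadow paging mode for a host/guest combination (rows = guest mode,
 * columns = host mode), mirroring the relation table. */
static PagingMode shadow_mode(PagingMode host, PagingMode guest)
{
    static const PagingMode table[3][3] = {
        /*                 host: 32-bit        PAE           AMD64        */
        /* 32-bit guest */ { PAGING_32BIT, PAGING_PAE,   PAGING_PAE   },
        /* PAE guest    */ { PAGING_PAE,   PAGING_PAE,   PAGING_PAE   },
        /* AMD64 guest  */ { PAGING_AMD64, PAGING_AMD64, PAGING_AMD64 },
    };
    assert(host <= PAGING_AMD64 && guest <= PAGING_AMD64);
    return table[guest][host];
}

/* Only the diagonal (host mode == guest mode) is expected to be cheap for
 * the switcher; every other combination needs extra work. */
static int switcher_needs_extra_work(PagingMode host, PagingMode guest)
{
    return host != guest;
}
```

Keeping the relation in a table rather than in nested conditionals makes it easy to check line by line against the matrix.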

The Shadow Memory Context


Because guest context mappings require the PDPT and PML4 entries to allow writing on AMD64, the two upper levels will have fixed flags regardless of what the guest intends to use there. So, when shadowing the PD level, we will calculate the effective flags of the PD and all the higher levels. In legacy PAE mode this only applies to the PWT and PCD bits (the rest are ignored/reserved/MBZ). We will ignore those bits for the present.
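As a sketch of what "effective flags" means under the standard x86 long-mode rules (an assumption about how the calculation is done, not PGM's code): R/W and U/S must be set at every level for the access to be allowed, while NX set at any level makes the range non-executable. The helper name and the top-down ordering of the input array are hypothetical; PWT/PCD are left out, as the text says.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard x86 paging-entry bits (architectural, not VirtualBox-specific). */
#define X86_PTE_P   (UINT64_C(1) << 0)   /* present */
#define X86_PTE_RW  (UINT64_C(1) << 1)   /* writable */
#define X86_PTE_US  (UINT64_C(1) << 2)   /* user accessible */
#define X86_PTE_NX  (UINT64_C(1) << 63)  /* no-execute */
#define X86_PTE_AND_MASK (X86_PTE_P | X86_PTE_RW | X86_PTE_US)

/* Hypothetical helper: fold the entries from the top level (index 0, e.g.
 * the PML4E) down to the PDE into one set of effective permission bits.
 * P/RW/US are ANDed across levels; NX is ORed. */
static uint64_t effective_pde_flags(const uint64_t *entries, size_t levels)
{
    uint64_t eff = X86_PTE_AND_MASK;        /* start fully permissive */
    assert(entries != NULL && levels > 0);
    for (size_t i = 0; i < levels; i++) {
        eff &= entries[i] | ~X86_PTE_AND_MASK; /* AND only P/RW/US */
        eff |= entries[i] & X86_PTE_NX;        /* OR in NX */
    }
    return eff;
}
```

A shadow PDE built from this value carries the restrictions of the upper levels, which is what allows the shadow PDPT and PML4 entries themselves to keep fixed, permissive flags.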

The Intermediate Memory Context

The world switch goes through an intermediate memory context whose purpose is to provide different mappings of the switcher code. All guest mappings are also present in this context.

The switcher code is mapped at the same location as on the host, at an identity-mapped location (physical address equals virtual address), and at the hypervisor location. The identity-mapped location is used for world switches that involve disabling paging.
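The reason for the identity mapping can be shown with a small hypothetical record of the three switcher addresses. The struct, the phase names and any example addresses are illustrative assumptions; the point is only that, when paging is turned off, the next instruction is fetched from the same linear address, which is then interpreted as a physical one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of where the switcher code is mapped in the
 * intermediate context; field names are illustrative, not VirtualBox's. */
typedef struct {
    uint64_t host_va;  /* same address as in the host address space */
    uint64_t phys;     /* physical address of the switcher code */
    uint64_t hyper_va; /* address at the hypervisor location */
} SwitcherMappings;

typedef enum { PHASE_HOST_PAGING, PHASE_PAGING_OFF, PHASE_HYPERVISOR } SwitchPhase;

/* Linear address to execute the switcher from in each phase.  With paging
 * disabled, linear addresses are physical, so the code must run from an
 * identity-mapped address (virtual == physical) for the instruction
 * pointer to stay valid across the CR0.PG toggle. */
static uint64_t switcher_address(const SwitcherMappings *m, SwitchPhase phase)
{
    switch (phase) {
    case PHASE_HOST_PAGING: return m->host_va;
    case PHASE_PAGING_OFF:  return m->phys;   /* identity mapping */
    case PHASE_HYPERVISOR:  return m->hyper_va;
    }
    assert(!"invalid phase");
    return 0;
}
```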

PGM maintains page tables for the 32-bit, PAE and AMD64 paging modes. This simplifies switching the guest CPU mode and keeping the tables consistent, at the cost of more code to do the work. All memory used for those page tables is located below 4GB (this includes the page tables for guest context mappings).
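A minimal sketch of keeping one ready-made root table per paging mode, assuming a hypothetical `IntermediateRoots` structure: switching the guest CPU mode then reduces to picking another root, and the below-4GB rule becomes a simple check. The reason given in the comment for the 4GB limit is a plausible one, not a quote from PGM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: physical addresses of the root tables kept ready for each
 * paging mode, so a mode switch selects a root instead of rebuilding. */
typedef enum { ROOT_32BIT, ROOT_PAE, ROOT_AMD64, ROOT_COUNT } RootMode;

typedef struct {
    uint64_t root_phys[ROOT_COUNT]; /* PD, PDPT and PML4 respectively */
} IntermediateRoots;

#define FOUR_GB (UINT64_C(1) << 32)

/* All roots must lie below 4GB; plausibly because 32-bit and legacy PAE
 * CR3 can only hold 32-bit physical addresses. */
static int roots_below_4g(const IntermediateRoots *r)
{
    for (int i = 0; i < ROOT_COUNT; i++)
        if (r->root_phys[i] >= FOUR_GB)
            return 0;
    return 1;
}

/* Root table to load for a given mode. */
static uint64_t root_for_mode(const IntermediateRoots *r, RootMode mode)
{
    assert(mode < ROOT_COUNT);
    return r->root_phys[mode];
}
```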

Guest Context Mappings

During assignment and relocation of a guest context mapping, the intermediate memory context is used to verify the new location.

Guest context mappings are currently restricted to below 4GB, for reasons of simplicity. This may change when we implement AMD64 support.
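What "verify the new location" involves is not spelled out here; the sketch below assumes it at least means the new range fits below 4GB and does not overlap another guest context mapping. `GcMapping` and `gc_mapping_location_ok` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FOUR_GB (UINT64_C(1) << 32)

/* Hypothetical description of an existing guest context mapping. */
typedef struct {
    uint32_t start; /* first byte */
    uint32_t size;  /* length in bytes, > 0 */
} GcMapping;

/* Sketch of a relocation check: [new_start, new_start + new_size) must lie
 * below 4GB (the current restriction) and must not intersect any other
 * mapping. */
static int gc_mapping_location_ok(uint64_t new_start, uint64_t new_size,
                                  const GcMapping *others, size_t count)
{
    if (new_size == 0 || new_start >= FOUR_GB || new_size > FOUR_GB - new_start)
        return 0;                               /* does not fit below 4GB */
    for (size_t i = 0; i < count; i++) {
        uint64_t s = others[i].start;
        uint64_t e = s + others[i].size;        /* half-open [s, e) */
        assert(others[i].size > 0);
        if (new_start < e && s < new_start + new_size)
            return 0;                           /* ranges intersect */
    }
    return 1;
}
```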


Differences Between Legacy PAE and Long Mode PAE

The differences between legacy PAE and long mode PAE are:
  1. PDPE bits 1, 2, 5 and 6 are defined differently. In legacy mode they are all marked as must-be-zero, while in long mode bits 1, 2 and 5 have their usual meanings and bit 6 is ignored (AMD). This means that upon switching to legacy PAE mode we'll have to clear these bits, and when going to long mode they must be set. This applies to both the intermediate and shadow contexts; however, we don't need to do it for the intermediate one since we're executing with CR0.WP at that time.
  2. CR3 allows a 32-byte aligned address in legacy mode, while in long mode a page aligned one is required.
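Both differences reduce to bit and alignment checks. This is a sketch with hypothetical helper names; the bit numbers and alignment rules are the ones listed above. Since bit 6 is ignored in long mode on AMD, the sketch leaves it unchanged when going to long mode rather than setting it, which is a design choice of the sketch, not something the text prescribes.

```c
#include <assert.h>
#include <stdint.h>

#define PDPE_BIT(n)        (UINT64_C(1) << (n))
/* Bits that must be zero in a legacy PAE PDPE (per the list above). */
#define PDPE_LEGACY_MBZ    (PDPE_BIT(1) | PDPE_BIT(2) | PDPE_BIT(5) | PDPE_BIT(6))
/* Bits with their usual meanings in long mode: R/W, U/S and accessed. */
#define PDPE_LONG_REQUIRED (PDPE_BIT(1) | PDPE_BIT(2) | PDPE_BIT(5))

/* Hypothetical helper: adjust a shadow PDPE for the target mode.  Legacy
 * PAE clears the MBZ bits; long mode sets R/W, U/S and A so this level
 * never restricts access. */
static uint64_t pdpe_for_mode(uint64_t pdpe, int long_mode)
{
    uint64_t r = long_mode ? (pdpe | PDPE_LONG_REQUIRED)
                           : (pdpe & ~PDPE_LEGACY_MBZ);
    assert(long_mode || (r & PDPE_LEGACY_MBZ) == 0); /* legacy: MBZ clear */
    return r;
}

/* CR3 check: legacy PAE needs a 32-byte aligned PDPT below 4GB (legacy CR3
 * is 32 bits wide); long mode needs a page-aligned PML4. */
static int cr3_valid(uint64_t table_phys, int long_mode)
{
    if (long_mode)
        return (table_phys & 0xfff) == 0;
    return (table_phys & 0x1f) == 0 && table_phys < (UINT64_C(1) << 32);
}
```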

Access Handlers


Virtual Access Handlers

We currently implement three types of virtual access handlers: ALL, WRITE and HYPERVISOR (WRITE). See PGMVIRTHANDLERTYPE for some more details.

The HYPERVISOR access handlers are kept in a separate tree (PGMTREES::HyperVirtHandlers) since they don't apply to physical pages, and that tree only needs to be consulted in a special #PF case. The ALL and WRITE handlers are in the PGMTREES::VirtHandlers tree; the rest of this section is about these handlers.
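Registration (step 1 of the life cycle) inserts a node keyed by address range and rejects conflicting handlers. The sketch below shows that conflict check; it substitutes a plain unbalanced binary search tree for the AVL tree PGM uses, and `HandlerNode`, `handler_register` and `g_full_resync_pending` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tree node for a virtual handler covering [start, last]. */
typedef struct HandlerNode {
    uint64_t start, last;              /* inclusive range */
    struct HandlerNode *left, *right;
} HandlerNode;

static int g_full_resync_pending;      /* stand-in for flagging a full resync */

/* Insert a handler range; returns 0 on success, -1 on a conflict with an
 * existing handler or on allocation failure.  Stored ranges are disjoint
 * and ordered, so a subtree is only skipped when it cannot overlap. */
static int handler_register(HandlerNode **root, uint64_t start, uint64_t last)
{
    assert(start <= last);
    while (*root) {
        HandlerNode *n = *root;
        if (last < n->start)
            root = &n->left;           /* entirely below this node */
        else if (start > n->last)
            root = &n->right;          /* entirely above this node */
        else
            return -1;                 /* overlapping range: conflict */
    }
    HandlerNode *node = calloc(1, sizeof *node);
    if (!node)
        return -1;
    node->start = start;
    node->last = last;
    *root = node;
    g_full_resync_pending = 1;         /* clear pool, sync CR3, update bits */
    return 0;
}
```

Balancing is what the AVL tree adds; the overlap logic stays the same because stored ranges never overlap.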

We'll go through the life cycle of a handler and try to make sense of it all; we don't know how successful this is going to be...

1. A handler is registered through the PGMR3HandlerVirtualRegister and PGMHandlerVirtualRegisterEx APIs. We check for conflicting virtual handlers and create a new node that is inserted into the AVL tree (range key). Then a full PGM resync is flagged (clear pool, sync CR3, update virtual bit of PGMPAGE).

2. The following PGMSyncCR3/SyncCR3 operation will first invoke HandlerVirtualUpdate.

2a. HandlerVirtualUpdate will look up all the pages covered by virtual handlers via the current guest CR3 and update the physical page -> virtual handler translation. Needless to say, this doesn't scale very well. If any changes are detected, it will flag a virtual bit update just like we did on registration. PGMPHYS pages with changes will have their virtual handler state reset to NONE.

2b. The virtual bit update process will iterate over all the pages covered by all the virtual handlers and update the PGMPAGE virtual handler state to the max of all virtual handlers on that page.

2c. Back in SyncCR3 we will now flush the entire shadow page cache to make sure we don't miss any alias mappings of the monitored pages.

2d. SyncCR3 will then proceed with syncing the CR3 table.

3. #PF(np,read) on a page in the range. This will cause it to be synced read-only and resumed if it's a WRITE handler. If it's an ALL handler we will call the handlers like in the next step. If the physical mapping has changed we will - some time in the future - perform a handler callback (optional) and update the physical -> virtual handler cache.

4. #PF(,write) on a page in the range. This will cause the handler to be invoked.

5. The guest invalidates the page and changes the physical backing or unmaps it. This should cause the invalidation callback to be invoked (it might not yet be 100% perfect). Exactly what happens next... is this where we mess up and end up out of sync for a while?

6. The handler is deregistered by the client via PGMHandlerVirtualDeregister. We will then set all PGMPAGEs in the physical -> virtual handler cache for this handler to NONE and trigger a full PGM resync (basically the same as in step 1), which means step 2 is executed again.
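Steps 2b, 3 and 4 boil down to a per-page state plus a small decision on each fault. A sketch, assuming a hypothetical ordering NONE < WRITE < ALL so that "max" selects the most restrictive state; the enum values and function names are illustrative and not PGM's PGMPAGE encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page virtual handler states, ordered by restrictiveness. */
typedef enum { VIRT_STATE_NONE = 0, VIRT_STATE_WRITE = 1, VIRT_STATE_ALL = 2 } VirtState;

/* Step 2b: a page's state is the max over all handlers covering it. */
static VirtState page_virt_state(const VirtState *covering, size_t count)
{
    VirtState st = VIRT_STATE_NONE;
    for (size_t i = 0; i < count; i++)
        if (covering[i] > st)
            st = covering[i];
    return st;
}

typedef enum { PF_SYNC_READONLY_AND_RESUME, PF_CALL_HANDLER, PF_NOT_HANDLED } PfAction;

/* Steps 3 and 4: a not-present read on a WRITE page is synced read-only and
 * resumed; a write, or any access to an ALL page, invokes the handler.
 * PF_NOT_HANDLED means no virtual handler covers the page. */
static PfAction virt_handler_pf(VirtState st, int is_write)
{
    if (st == VIRT_STATE_NONE)
        return PF_NOT_HANDLED;
    if (is_write || st == VIRT_STATE_ALL)
        return PF_CALL_HANDLER;
    assert(st == VIRT_STATE_WRITE);
    return PF_SYNC_READONLY_AND_RESUME;
}
```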


There are a number of things that need to be done to make the virtual handlers work 100% correctly and more efficiently.

The first bit hasn't been implemented yet because it's going to slow the whole mess down even more, and besides it seems to be working reliably for our current uses. OTOH, some of the optimizations might end up more or less implementing the missing bits, so we'll see.

On the optimization side, the first thing to do is to try to avoid unnecessary cache flushing. Then try to team up with the shadowing code to track changes in mappings by means of access to them (shadow in), updates to shadow pages, invlpg, and shadow PT discarding (perhaps).

Some ideas that have popped up for optimizing current and new features:

Generated by Doxygen 1.6.0