Opened 13 years ago

Last modified 13 years ago

#32 started defect

Superpages are not used as often as desired

Reported by: john
Owned by: john
Priority: critical
Component: kernel
Version:
Keywords: vm
Cc:
Blocked By:
Blocking:
Parent Tickets:
P4 Branch:
GIT Branch:
FreeBSD PR:
Due Date:

Description

At work we've found that there are some use cases where superpages are not used as often as we'd like. Specifically, if we create several shared memory objects (via shm_open()) that are not exact multiples of a superpage and map them, they are assigned virtually contiguous addresses, meaning that they do not each start on a superpage boundary. This seems to be related to the fact that we only specify VMFS_ALIGNED_SPACE for device VM objects.
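To illustrate the alignment problem described above, here is a minimal sketch in plain C. It only models address assignment and is not kernel code: `place_mappings()` and its parameters are hypothetical names, and the 2M superpage size is taken from the test case output further down.

```c
#include <stddef.h>
#include <stdint.h>

#define SUPERPAGE_SIZE ((uint64_t)2 * 1024 * 1024) /* 2M, per the test output */

/*
 * Hypothetical model: assign start addresses to n mappings placed one
 * after another beginning at base.  If align is nonzero (it must be a
 * power of two), each start is first rounded up to a multiple of align.
 * Returns how many starts fall on a superpage boundary.
 */
static size_t
place_mappings(uint64_t base, const uint64_t *sizes, size_t n,
    uint64_t align, uint64_t *starts)
{
	uint64_t next = base;
	size_t aligned = 0;

	for (size_t i = 0; i < n; i++) {
		if (align != 0)
			next = (next + align - 1) & ~(align - 1);
		starts[i] = next;
		if ((next & (SUPERPAGE_SIZE - 1)) == 0)
			aligned++;
		next += sizes[i];
	}
	return (aligned);
}
```

With three 3MB objects packed contiguously from a superpage-aligned base, the second object starts 3MB in and so misses a superpage boundary. Passing `SUPERPAGE_SIZE` as `align` puts every start on a boundary at the cost of some unused address space.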

Secondly, if we pre-zero a file before mapping it, the file ends up backed by pages scattered all over physical memory rather than by superpages. This may be harder to fix, though alc@ may have a suggestion that might help.

Attachments (1)

spagefile.c (3.7 KB) - added by john 13 years ago.
Test case


Change History (17)

comment:1 Changed 13 years ago by john

  • Status changed from new to accepted

comment:2 Changed 13 years ago by john

  • Status changed from accepted to started

comment:3 Changed 13 years ago by john

Places that call vm_map_find() that should possibly use VMFS_ALIGNED_SPACE:

  • sysv shm (kern/sysv_shm.c)
  • kmem_alloc_nofault? (need to check the callers of this perhaps)
  • vm_mmap

comment:4 Changed 13 years ago by john

Output of test case on stock kernel (idle machine after boot):

Normal page size: 4k
Super page size: 2M
super page sized shm (2): expected 1024 super / 0 small, found 512 / 512
truncated file: expected 16384 super / 0 small, found 15872 / 471
zeroed file: expected 32768 super / 0 small, found 0 / 32768

Changed 13 years ago by john

Test case

comment:5 Changed 13 years ago by john

Bah, testcase had a bug and didn't report superpage misalignment. With that fixed it now outputs:

Normal page size: 4k
Super page size: 2M
super page sized shm (2): start address is not super aligned
truncated file: start address is not super aligned
truncated file: expected 15872 super / 512 small, found 15872 / 497
zeroed file: start address is not super aligned
zeroed file: expected 32256 super / 512 small, found 0 / 32768
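The "expected" figures above are consistent with counting, in 4k pages, how much of the mapping lies inside fully covered 2M-aligned regions. The following is a sketch of that arithmetic only; the attached test case may compute it differently, and `expected_counts()` and `struct page_counts` are hypothetical names.

```c
#include <stdint.h>

#define SMALL_PAGE 4096ULL                /* 4k, per the test output */
#define SUPER_PAGE (2ULL * 1024 * 1024)   /* 2M, per the test output */

struct page_counts {
	uint64_t super;	/* 4k pages inside fully covered 2M regions */
	uint64_t small;	/* 4k pages outside any fully covered region */
};

/*
 * Hypothetical helper: for a mapping of len bytes at address start
 * (both multiples of SMALL_PAGE), count the pages that could be part
 * of a superpage and the pages that cannot.
 */
static struct page_counts
expected_counts(uint64_t start, uint64_t len)
{
	struct page_counts pc = { 0, 0 };
	uint64_t end = start + len;
	/* First superpage boundary at or after start. */
	uint64_t first = (start + SUPER_PAGE - 1) & ~(SUPER_PAGE - 1);
	/* Last superpage boundary at or before end. */
	uint64_t last = end & ~(SUPER_PAGE - 1);

	if (first < last)
		pc.super = (last - first) / SMALL_PAGE;
	pc.small = len / SMALL_PAGE - pc.super;
	return (pc);
}
```

For a 16384-page mapping this gives 16384 / 0 when the start is superpage-aligned and 15872 / 512 when it is not, because the partial regions at the head and tail always add up to one superpage's worth of small pages.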

comment:6 Changed 13 years ago by john

As expected, changing vm_mmap() to always specify VMFS_ALIGNED_SPACE fixed the "start address is not super aligned" cases above. The output with that is now:

Normal page size: 4k
Super page size: 2M
truncated file: expected 16384 super / 0 small, found 15360 / 979
zeroed file: expected 32768 super / 0 small, found 0 / 32768

I don't know yet what's up with the 979 small pages (are they at the front, at the back, etc.).

comment:7 Changed 13 years ago by john

I tried a suggestion from Alan to force page coloring on for vnode objects but it didn't help:

Normal page size: 4k
Super page size: 2M
truncated file: expected 16384 super / 0 small, found 15360 / 1009
zeroed file: expected 32768 super / 0 small, found 0 / 32768

comment:8 Changed 13 years ago by john

I modified my test program to dump out a map of the individual pages (in an RLE format) to try to zero in on the truncated file test. This gave me these odd results:

truncated file: expected 16384 super / 0 small, found 15360 / 1009
Dump of truncated file:
            15: not in RAM
           497: small
         15360: super
           512: small

I then added a sanity dump before the prefault step (so the file should be empty and have no pages) and I got this:

Dump of before prefault:
         16380: not in RAM
             4: small
truncated file: expected 16384 super / 0 small, found 15360 / 1009
Dump of truncated file:
            15: not in RAM
           497: small
         15360: super
           512: small

I tried changing the prefaulting code to touch every byte instead of the first byte in each page to see if I somehow had a bug in my loop, but it made no difference.
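A run-length encoded dump like the ones in this comment could be produced roughly as follows. This is a sketch, not the code from spagefile.c: the `page_state` enum, `state_name()` and `dump_rle()` are hypothetical, and it assumes the per-page classification has already been collected into an array.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical page states; how the real test classifies pages is not shown. */
enum page_state { PAGE_ABSENT, PAGE_SMALL, PAGE_SUPER };

static const char *
state_name(enum page_state s)
{
	switch (s) {
	case PAGE_ABSENT:
		return ("not in RAM");
	case PAGE_SMALL:
		return ("small");
	case PAGE_SUPER:
		return ("super");
	}
	return ("unknown");
}

/*
 * Print one "<count>: <state>" line per run of identical states and
 * return the number of runs printed.
 */
static size_t
dump_rle(FILE *out, const enum page_state *pages, size_t n)
{
	size_t runs = 0;
	size_t i = 0;

	while (i < n) {
		size_t j = i + 1;

		/* Extend the run while the state stays the same. */
		while (j < n && pages[j] == pages[i])
			j++;
		fprintf(out, "%14zu: %s\n", j - i, state_name(pages[i]));
		runs++;
		i = j;
	}
	return (runs);
}
```

Collapsing runs keeps a 16384-page map down to a handful of lines, which makes it easy to see whether the small pages sit at the front, the back, or both.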

comment:9 Changed 13 years ago by john

Using madvise() to disable prefetching on faults (via MADV_RANDOM) improved things:

before prefault: expected 16384 super / 0 small, found 0 / 4
Dump of before prefault:
         16380: not in RAM
             4: small
truncated file: expected 16384 super / 0 small, found 15872 / 512
Dump of truncated file:
         15872: super
           512: small

Using MADV_SEQUENTIAL just forced all the pages out:

before prefault: expected 16384 super / 0 small, found 0 / 4
Dump of before prefault:
         16380: not in RAM
             4: small
truncated file: expected 16384 super / 0 small, found 0 / 5
Dump of truncated file:
         16379: not in RAM
             5: small

comment:10 Changed 13 years ago by john

Re-reading the first super page after the first pre-fault run fixes the first page without needing the madvise(), but the last page is still stuck as a non-super page:

before prefault: expected 16384 super / 0 small, found 0 / 4
Dump before prefault:
         16380: not in RAM
             4: small
truncated file: expected 16384 super / 0 small, found 15360 / 1009
Dump truncated file:
            15: not in RAM
           497: small
         15360: super
           512: small
after second prefault: expected 16384 super / 0 small, found 15872 / 512
Dump after second prefault:
         15872: super
           512: small

comment:11 Changed 13 years ago by john

So the 4 pages at the end of the file that are already allocated before the prefault are the result of ftruncate() on the file. UFS insists on allocating a block at the end of the file, and these pages are not part of a superpage, so the last superpage-sized region of the file cannot become a superpage.

comment:12 Changed 13 years ago by john

The pages for this last block are allocated (I believe) via allocbuf() which just calls vm_page_alloc() in a loop. I think that this doesn't attempt to create a reservation.

comment:13 Changed 13 years ago by john

I tried seeking to the last 2MB of the file and writing it out, hoping that would trigger a superpage, but no such luck.

comment:14 Changed 13 years ago by john

  • Priority changed from major to critical

comment:15 Changed 13 years ago by john

I ended up implementing a new VMFS_OPTIMAL_SPACE option, as suggested by alc@, in r253471 and r253620. It tries harder to reuse superpages in subsequent mappings of objects that already use superpages.

I also implemented a MAP_ALIGN flag à la Solaris as well as a MAP_ALIGNED flag à la NetBSD. Internally, the MAP_ALIGNED flag works better, and when I asked Jason Evans he preferred that API to MAP_ALIGN.

I still need to test the MAP_ALIGNED patch and send it to alc@ for review.

Another open issue I have found is that we do not use superpages for read-only mappings. The reason is that the prefaulting code run during a page fault uses pmap_enter_quick(), which adds PTEs that do not have PG_A set. pmap_enter_quick() doesn't currently check whether it could do a promotion, but such a check would always fail anyway, since at least one page won't have PG_A set. I haven't heard back from alc@ on whether that should be resolved somehow.
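The PG_A condition can be pictured with a toy model. This is not the pmap code and it ignores every other promotion requirement (physical contiguity, matching attributes, and so on); `all_accessed()` is a hypothetical helper, and the bit values are the x86 present and accessed bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_VALID		0x001ULL	/* x86 present bit */
#define PTE_ACCESSED		0x020ULL	/* x86 accessed bit (PG_A) */
#define PTES_PER_SUPERPAGE	512		/* 2M / 4k */

/*
 * Toy check of one promotion precondition: every PTE in the candidate
 * range must be valid and have the accessed bit set.
 */
static bool
all_accessed(const uint64_t *ptes, size_t n)
{
	const uint64_t need = PTE_VALID | PTE_ACCESSED;

	for (size_t i = 0; i < n; i++) {
		if ((ptes[i] & need) != need)
			return (false);
	}
	return (true);
}
```

In this model, a single entry installed without the accessed bit is enough to make the whole 512-entry range fail, which matches the observation that a promotion attempt from pmap_enter_quick() would never succeed.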

Why file I/O doesn't succeed in using superpages is also still an open question. vm_page_alloc() should in fact attempt to use a superpage if OBJ_COLORED is set, since it should call vm_reserv_alloc_page() before calling vm_phys_alloc_pages(). I'm not sure why alc@'s hack doesn't succeed in using superpages after a write(2). Hmm, maybe it's because we won't use superpages for the tail of a file. More investigation is required.

comment:16 Changed 13 years ago by john

Committed MAP_ALIGNED patch in r254430.
