linux/include/trace/events
Zach O'Keefe 58ac9a8993 mm/khugepaged: attempt to map file/shmem-backed pte-mapped THPs by pmds
The main benefit of THPs is that they can be mapped at the pmd level,
increasing the likelihood of a TLB hit and spending fewer cycles in page
table walks.  pte-mapped hugepages - that is - hugepage-aligned compound
pages of order HPAGE_PMD_ORDER mapped by ptes - although being contiguous
in physical memory, don't have this advantage.  In fact, one could argue
they are detrimental to system performance overall since they occupy a
precious hugepage-aligned/sized region of physical memory that could
otherwise be used more effectively.  Additionally, pte-mapped hugepages
can be the cheapest memory to collapse for khugepaged since no new
hugepage allocation or copying of memory contents is necessary - we only
need to update the mapping page tables.

In the anonymous collapse path, we are able to collapse pte-mapped
hugepages (albeit, perhaps suboptimally), but the file/shmem path makes no
effort when compound pages (of any order) are encountered.

Identify pte-mapped hugepages in the file/shmem collapse path.  The
final step of this identification makes a racy check of the value of the
pmd to ensure it maps a pte table.  This should be fine, since races that
result in false-positives (i.e.  we attempt collapse even though we
shouldn't) will fail later in collapse_pte_mapped_thp() once we
actually lock mmap_lock and reinspect the pmd value.  Races that result
in false-negatives (i.e.  where we decide to not attempt collapse, but
should have) shouldn't be an issue, since in the worst case, we do
nothing - which is what we've done up to this point.  We make a similar
check in retract_page_tables().  If we do think we've found a
pte-mapped hugepage in khugepaged context, attempt to update page
tables mapping this hugepage.
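
The racy check followed by a locked recheck, as described above, can be
sketched in userspace C.  This is only an illustration of the pattern, not
the kernel implementation: struct region, enum pmd_state, and every function
name below are hypothetical stand-ins, with a pthread mutex playing the role
of mmap_lock and an atomic int playing the role of the pmd value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel types. */
enum pmd_state { PMD_NONE, PMD_PTE_TABLE, PMD_HUGE };

struct region {
	pthread_mutex_t lock;	/* plays the role of mmap_lock */
	_Atomic int pmd;	/* plays the role of the pmd value */
	int collapsed;		/* number of collapses actually performed */
};

static void region_init(struct region *r, enum pmd_state s)
{
	pthread_mutex_init(&r->lock, NULL);
	atomic_init(&r->pmd, s);
	r->collapsed = 0;
}

/* Lockless hint: may be stale by the time the caller acts on it. */
static bool maybe_pte_mapped(struct region *r)
{
	return atomic_load_explicit(&r->pmd, memory_order_relaxed) ==
	       PMD_PTE_TABLE;
}

/* Authoritative step: reinspect the value under the lock before acting. */
static bool collapse_locked(struct region *r)
{
	bool done = false;

	pthread_mutex_lock(&r->lock);
	if (atomic_load(&r->pmd) == PMD_PTE_TABLE) {
		atomic_store(&r->pmd, PMD_HUGE);
		r->collapsed++;
		done = true;
	}
	pthread_mutex_unlock(&r->lock);
	return done;
}

/*
 * A false positive from the hint is caught by the locked recheck.
 * A false negative simply means nothing is done.
 */
static bool try_collapse(struct region *r)
{
	assert(r != NULL);
	if (!maybe_pte_mapped(r))
		return false;
	return collapse_locked(r);
}
```

In this sketch a stale hint costs at most one wasted lock acquisition,
which mirrors the argument above that neither kind of race is harmful.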

Note that these collapses still count towards the
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed counter,
and if the pte-mapped hugepage was also mapped into multiple processes'
address spaces, the counter could be incremented for each page table
update.  Since we
increment the counter when a pte-mapped hugepage is successfully added to
the list of to-collapse pte-mapped THPs, it's possible that we never
actually update the page table either.  This is different from how
file/shmem pages_collapsed accounting works today where only a successful
page cache update is counted (it's also possible here that no page tables
are actually changed).  Though it incurs some slop, this is preferred to
either not accounting for the event at all, or plumbing through data in
struct mm_slot on whether to account for the collapse or not.

Also note that work still needs to be done to support arbitrary compound
pages, and that this should all be converted to using folios.

[shy828301@gmail.com: Spelling mistake, update comment, and add Documentation]
  Link: https://lore.kernel.org/linux-mm/CAHbLzkpHwZxFzjfX9nxVoRhzup8WMjMfyL6Xiq8mZ9M-N3ombw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20220907144521.3115321-3-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220922224046.1143204-3-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:33 -07:00
..
9p.h 9p fid refcount: add a 9p_fid_ref tracepoint 2022-07-02 18:52:21 +09:00
afs.h afs: Fix access after dec in put functions 2022-08-02 18:21:29 +01:00
alarmtimer.h
asoc.h ASoC: soc-core: tidyup jack.h 2020-11-30 12:54:01 +00:00
avc.h
bcache.h block: remove superfluous param in blk_fill_rwbs() 2021-02-22 06:37:41 -07:00
block.h block: introduce block_rq_error tracepoint 2022-02-11 10:00:16 -07:00
bpf_test_run.h
bridge.h
btrfs.h btrfs: add tracepoints for ordered extents 2022-07-25 17:45:34 +02:00
cachefiles.h cachefiles: add tracepoints for on-demand read mode 2022-05-18 00:11:18 +08:00
cgroup.h cgroup: Trace event cgroup id fields should be u64 2021-12-01 07:23:35 -10:00
clk.h clk: Trace clk_set_rate() "range" functions 2020-12-17 01:54:31 -08:00
cma.h mm, tracing: unify PFN format strings 2021-06-29 10:53:52 -07:00
compaction.h tracing: incorrect gfp_t conversion 2022-05-13 07:20:18 -07:00
context_tracking.h
cpuhp.h
damon.h mm/damon: hide kernel pointer from tracepoint event 2022-01-15 16:30:33 +02:00
devfreq.h PM / devfreq: Add tracepoint for frequency changes 2020-10-26 10:52:37 +09:00
devlink.h tracing: devlink: Use static array for string in devlink_trap_report event 2022-07-14 15:05:57 -04:00
dlm.h fs: dlm: add resource name to tracepoints 2022-06-24 11:53:09 -05:00
dma_fence.h treewide: Add missing semicolons to __assign_str uses 2021-06-30 09:19:14 -04:00
erofs.h erofs: clean up erofs_map_blocks tracepoints 2021-12-09 10:02:10 +08:00
error_report.h panic: use error_report_end tracepoint on warnings 2022-01-20 08:52:55 +02:00
ext4.h fs: Remove flags parameter from aops->write_begin 2022-05-08 14:28:19 -04:00
f2fs.h fs/f2fs: Use the enum req_op and blk_opf_t types 2022-07-14 12:14:32 -06:00
fib6.h tracing/ipv4/ipv6: Use static array for name field in fib*_lookup_table event 2022-07-15 13:35:59 -04:00
fib.h tracing/ipv4/ipv6: Use static array for name field in fib*_lookup_table event 2022-07-15 13:35:59 -04:00
filelock.h
filemap.h filemap: Convert tracing of page cache operations to folio 2022-01-04 13:15:33 -05:00
fs_dax.h
fs.h NFS: Move generic FS show macros to global header 2021-11-02 12:31:23 -04:00
fscache.h fscache: add tracepoint when failing cookie 2022-08-09 14:13:59 +01:00
fsi_master_aspeed.h fsi: Add trace events in initialization path 2022-02-21 19:38:54 +10:30
fsi_master_ast_cf.h
fsi_master_gpio.h
fsi.h fsi: Add trace events in initialization path 2022-02-21 19:38:54 +10:30
gpio.h
gpu_mem.h
host1x.h
huge_memory.h mm/khugepaged: attempt to map file/shmem-backed pte-mapped THPs by pmds 2022-10-03 14:03:33 -07:00
hwmon.h
i2c_slave.h i2c: add tracepoints for I2C slave events 2022-03-20 00:11:05 +01:00
i2c.h
ib_mad.h
ib_umad.h
initcall.h
intel_ifs.h trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations 2022-05-12 15:35:29 +02:00
intel_ish.h
intel-sst.h
io_uring.h io_uring: Add tracepoint for short writes 2022-07-24 18:39:32 -06:00
iocost.h blk-iocost: tracing: atomic64_read(&ioc->vtime_rate) is assigned an extra semicolon 2022-07-12 16:36:37 -04:00
iommu.h iommu: Log iova range in map/unmap trace events 2021-12-06 11:59:31 +01:00
ipi.h
irq_matrix.h
irq.h
iscsi.h scsi: iscsi: tracing: Use the new __vstring() helper 2022-07-19 11:20:25 -04:00
jbd2.h fs/jbd2: Fix the documentation of the jbd2_write_superblock() callers 2022-07-14 12:14:32 -06:00
kmem.h mm/tracing: add 'accounted' entry into output of allocation tracepoints 2022-07-04 17:11:27 +02:00
kvm.h KVM: x86/mmu: rename trace function name for asynchronous page fault 2022-08-10 15:08:26 -04:00
kyber.h kyber: avoid q->disk dereferences in trace points 2021-10-15 21:02:57 -06:00
libata.h ata: libata: add qc->flags in ata_qc_complete_template tracepoint 2022-06-17 16:30:03 +09:00
lock.h locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning 2022-04-05 10:24:36 +02:00
maple_tree.h Maple Tree: add new data structure 2022-09-26 19:46:13 -07:00
mce.h
mctp.h mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag control 2022-02-09 12:00:11 +00:00
mdio.h
migrate.h mm/migration: add trace events for base page and HugeTLB migrations 2022-03-24 19:06:45 -07:00
mlxsw.h
mmap_lock.h mm: mmap_lock: use DECLARE_EVENT_CLASS and DEFINE_EVENT_FN 2021-11-06 13:30:36 -07:00
mmap.h mm: start tracking VMAs with maple tree 2022-09-26 19:46:14 -07:00
mmc.h
mmflags.h include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion" 2022-05-25 10:47:47 -07:00
module.h
mptcp.h mptcp: dump infinite_map field in mptcp_dump_mpext 2022-04-23 11:51:05 +01:00
napi.h
nbd.h
neigh.h neighbor: tracing: Have neigh_create event use __string() 2022-07-15 13:35:59 -04:00
net_probe_common.h
net.h net: Print hashed skb addresses for all net and qdisc events 2022-06-27 11:57:06 +01:00
netfs.h netfs: Add a function to consolidate beginning a read 2022-03-18 09:29:05 +00:00
netlink.h netlink: add tracepoint at NL_SET_ERR_MSG 2021-02-04 18:05:59 -08:00
nfs.h NFS: Move NFS protocol display macros to global header 2021-11-02 12:31:23 -04:00
nilfs2.h fs/nilfs2: Use the enum req_op and blk_opf_t types 2022-07-14 12:14:33 -06:00
nmi.h
objagg.h
oom.h
osnoise.h tracing: Fix spelling in osnoise tracer "interferences" -> "interference" 2021-06-28 14:12:27 -04:00
page_isolation.h
page_pool.h mm, tracing: unify PFN format strings 2021-06-29 10:53:52 -07:00
page_ref.h mm: introduce PAGEFLAGS_MASK to replace ((1UL << NR_PAGEFLAGS) - 1) 2021-09-08 11:50:24 -07:00
pagemap.h mm/lru: Convert __pagevec_lru_add_fn to take a folio 2021-10-18 07:49:40 -04:00
percpu.h include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace" 2022-05-25 10:47:48 -07:00
power_cpu_migrate.h
power.h cpuidle: Add cpu_idle_miss trace event 2022-08-03 17:50:58 +02:00
preemptirq.h
printk.h
pwc.h
pwm.h
qdisc.h net: Print hashed skb addresses for all net and qdisc events 2022-06-27 11:57:06 +01:00
qla.h scsi: qla2xxx: tracing: Use the new __vstring() helper 2022-07-19 11:20:25 -04:00
qrtr.h
rcu.h rcu: Refactor rcu_barrier() empty-list handling 2022-02-08 10:12:28 -08:00
rdma_core.h
rdma.h
regulator.h
rpcgss.h sunrpc: fix header include guard in trace header 2021-11-17 18:27:32 -05:00
rpcrdma.h A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping 2021-11-10 16:45:54 -08:00
rpm.h
rseq.h
rtc.h
rv.h rv/monitor: Add the wwnr monitor 2022-07-30 14:01:30 -04:00
rwmmio.h lib: Add register read/write tracing support 2022-06-15 17:41:12 +02:00
rxrpc.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-05-23 21:19:17 -07:00
sched.h sched/tracing: Append prev_state to tp args instead 2022-05-12 00:37:11 +02:00
scmi.h include: trace: Add SCMI fast channel tracing 2022-07-04 14:28:43 +01:00
scsi.h scsi: trace: Print driver_tag and scheduler_tag in SCSI trace 2022-06-21 21:43:23 -04:00
sctp.h
signal.h
siox.h
skb.h net: skb: use auto-generation to convert skb drop reason to string 2022-06-07 12:51:41 +02:00
smbus.h
sock.h net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer 2022-07-08 12:06:17 +01:00
spi.h spi: Enable tracing of the SPI setup CS selection 2021-05-26 21:22:13 +01:00
spmi.h spmi: trace: fix stack-out-of-bound access in SPMI tracing functions 2022-07-24 16:16:44 +02:00
sunrpc_base.h SUNRPC: Tracepoints should display tk_pid and cl_clid as a fixed-size field 2021-10-20 18:09:54 -04:00
sunrpc.h NFS client updates for Linux 5.20 2022-08-10 14:04:32 -07:00
sunvnet.h
swiotlb.h swiotlb: make the swiotlb_init interface more useful 2022-04-18 07:21:11 +02:00
syscalls.h
target.h
task.h
tcp.h tcp: Add tracepoint for tcp_set_ca_state 2022-04-07 20:33:15 -07:00
tegra_apb_dma.h
thermal_power_allocator.h
thermal_pressure.h arch_topology: Trace the update thermal pressure 2022-05-06 09:57:38 +02:00
thermal.h drivers/thermal/cpufreq_cooling : Refactor thermal_power_cpu_get_power tracing 2022-07-28 17:29:42 +02:00
thp.h mm/migration: add trace events for THP migrations 2022-03-24 19:06:45 -07:00
timer.h tracing/timer: Add missing argument documentation of trace points 2022-04-14 16:14:49 +02:00
tlb.h
udp.h
ufs.h scsi: ufs: core: Enable power management for wlun 2021-05-10 22:28:20 -04:00
v4l2.h
vb2.h
vmscan.h tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate 2022-05-19 14:08:55 -07:00
vsock_virtio_transport_common.h virtio/vsock: update trace event for SEQPACKET 2021-06-11 13:32:47 -07:00
wbt.h
workqueue.h workqueue: Fix type of cpu in trace event 2022-06-07 07:09:47 -10:00
writeback.h remove congestion tracking framework 2022-03-22 15:57:01 -07:00
xdp.h xdp: Extend xdp_redirect_map with broadcast support 2021-05-26 09:46:16 +02:00
xen.h x86/mm/tlb: Flush remote and local TLBs concurrently 2021-03-06 12:59:10 +01:00