DSOs were held on a list for fast iteration and in an rbtree for fast
finds.
Switch to a lazily sorted array, where iteration just walks the array
and binary searches have the same complexity as searching the rbtree.
A find may need to sort the array first, which increases its
complexity, but add operations have lower complexity, so overall the
complexity should remain about the same.
The set-name operations on the dso just record that the array is no
longer sorted, avoiding the complexity of rebalancing the rbtree.
Tighter locking discipline is enforced to avoid the array being resorted
while long and short names or ids are changed.
The array is smaller in size, replacing 6 pointers with 2, so even
with extra allocated space leaving the array up to 50% unoccupied, the
memory saving should be at least 2x.
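
To illustrate the idea, here is a minimal sketch of such a lazily
sorted array. The type and function names are made up for illustration
and do not mirror the actual tools/perf implementation; the tighter
locking mentioned above is reduced to a comment.

/*
 * Minimal sketch of a lazily sorted DSO array; names and fields are
 * illustrative only, not the actual tools/perf code.
 */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct dso_entry { char *long_name; /* ... other per-DSO state ... */ };

struct dso_array {
	struct dso_entry **entries;
	size_t cnt, allocated;
	bool sorted;		/* cleared whenever the order may be stale */
};

static int cmp_entry(const void *a, const void *b)
{
	const struct dso_entry *da = *(const struct dso_entry * const *)a;
	const struct dso_entry *db = *(const struct dso_entry * const *)b;

	return strcmp(da->long_name, db->long_name);
}

/* Amortized O(1): append and mark the array as possibly unsorted. */
static int dso_array__add(struct dso_array *arr, struct dso_entry *e)
{
	if (arr->cnt == arr->allocated) {
		size_t new_alloc = arr->allocated ? arr->allocated * 2 : 16;
		struct dso_entry **tmp =
			realloc(arr->entries, new_alloc * sizeof(*tmp));

		if (!tmp)
			return -1;
		arr->entries = tmp;
		arr->allocated = new_alloc;
	}
	arr->entries[arr->cnt++] = e;
	arr->sorted = false;
	return 0;
}

/* Renaming only invalidates the sort order, no tree rebalancing. */
static void dso_array__set_long_name(struct dso_array *arr,
				     struct dso_entry *e, char *name)
{
	e->long_name = name;
	arr->sorted = false;
}

/*
 * Sort lazily on the first find after a mutation, then binary search.
 * Callers are assumed to hold a lock serializing this against name
 * changes and additions.
 */
static struct dso_entry *dso_array__find(struct dso_array *arr,
					 const char *name)
{
	size_t lo = 0, hi = arr->cnt;

	if (!arr->sorted) {
		qsort(arr->entries, arr->cnt, sizeof(*arr->entries), cmp_entry);
		arr->sorted = true;
	}
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = strcmp(arr->entries[mid]->long_name, name);

		if (c == 0)
			return arr->entries[mid];
		if (c < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
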
Committer testing:
On a previous version of this patchset we were getting a lot of warnings
about deleting a DSO still on a list; now it is ok:
root@x1:~# perf probe -l
root@x1:~# perf probe finish_task_switch
Added new event:
  probe:finish_task_switch (on finish_task_switch)
You can now use it in all perf tools, such as:
  perf record -e probe:finish_task_switch -aR sleep 1
root@x1:~# perf probe -l
  probe:finish_task_switch (on finish_task_switch@kernel/sched/core.c)
root@x1:~# perf trace -e probe:finish_task_switch/max-stack=8/ --max-events=1
     0.000 migration/0/19 probe:finish_task_switch(__probe_ip: -1894408688)
                finish_task_switch.isra.0 ([kernel.kallsyms])
                __schedule ([kernel.kallsyms])
                schedule ([kernel.kallsyms])
                smpboot_thread_fn ([kernel.kallsyms])
                kthread ([kernel.kallsyms])
                ret_from_fork ([kernel.kallsyms])
                ret_from_fork_asm ([kernel.kallsyms])
root@x1:~#
root@x1:~# perf probe -d probe:*
Removed event: probe:finish_task_switch
root@x1:~# perf probe -l
root@x1:~#
I also ran the full 'perf test' suite after applying this one; no
regressions.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240504213803.218974-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
And take it into account when looking up DSOs for which we have the
dso_id fields, obtained from somewhere like PERF_RECORD_MMAP2 records.
Instances of struct map pointing to the same DSO pathname but with
anything different in dso_id are in fact different DSOs, so it is
better to have different 'struct dso' instances to reflect that. At
some point we may want to get copies of the contents of the different
objects if we want to do correct annotation or other analysis.
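
As a hedged sketch of what it means for dso_id to become part of the
DSO identity, the comparison below keys on the pathname first and then
on the device/inode fields that PERF_RECORD_MMAP2 carries; the type and
function names are illustrative, not the exact tools/perf layout.

/*
 * Illustrative only: two maps with the same pathname but different
 * device/inode identity compare as different DSOs.
 */
#include <stdint.h>
#include <string.h>

struct dso_id_sketch {
	uint32_t maj, min;		/* device of the mapped file */
	uint64_t ino, ino_generation;	/* inode identity */
};

static int dso_id_sketch__cmp(const struct dso_id_sketch *a,
			      const struct dso_id_sketch *b)
{
	if (a->maj != b->maj)
		return a->maj < b->maj ? -1 : 1;
	if (a->min != b->min)
		return a->min < b->min ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	if (a->ino_generation != b->ino_generation)
		return a->ino_generation < b->ino_generation ? -1 : 1;
	return 0;
}

/* Lookup key: pathname first, then dso_id to tell same-name DSOs apart. */
static int dso_key_cmp(const char *name_a, const struct dso_id_sketch *id_a,
		       const char *name_b, const struct dso_id_sketch *id_b)
{
	int rc = strcmp(name_a, name_b);

	return rc ? rc : dso_id_sketch__cmp(id_a, id_b);
}
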
With this we get 'struct map' 24 bytes leaner:
$ pahole -C map ~/bin/perf
struct map {
	union {
		struct rb_node     rb_node __attribute__((__aligned__(8))); /*     0    24 */
		struct list_head   node;                                 /*     0    16 */
	} __attribute__((__aligned__(8)));                               /*     0    24 */
	u64                        start;                                /*    24     8 */
	u64                        end;                                  /*    32     8 */
	_Bool                      erange_warned:1;                      /*    40: 0  1 */
	_Bool                      priv:1;                               /*    40: 1  1 */

	/* XXX 6 bits hole, try to pack */
	/* XXX 3 bytes hole, try to pack */

	u32                        prot;                                 /*    44     4 */
	u64                        pgoff;                                /*    48     8 */
	u64                        reloc;                                /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	u64                        (*map_ip)(struct map *, u64);         /*    64     8 */
	u64                        (*unmap_ip)(struct map *, u64);       /*    72     8 */
	struct dso *               dso;                                  /*    80     8 */
	refcount_t                 refcnt;                               /*    88     4 */
	u32                        flags;                                /*    92     4 */

	/* size: 96, cachelines: 2, members: 13 */
	/* sum members: 92, holes: 1, sum holes: 3 */
	/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
	/* forced alignments: 1 */
	/* last cacheline: 32 bytes */
} __attribute__((__aligned__(8)));
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-g4hxxmraplo7wfjmk384mfsb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>