From 00c006130c6786f2d48fb38eed1a4297e9d73944 Mon Sep 17 00:00:00 2001
From: John Garry <john.garry@huawei.com>
Date: Wed, 17 Jun 2020 17:01:54 +0800
Subject: [PATCH 104/201] perf pmu: Improve CPU core PMU HW event list ordering

mainline inclusion
from mainline-v5.9-rc1
commit 218ca91df47712a7b2b5a554e39c4a2c819a9e86
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8C0CX

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=218ca91df47712a7b2b5a554e39c4a2c819a9e86

----------------------------------------------------------------------

For perf list, the CPU core PMU HW event ordering is such that not all
events will necessarily be listed adjacently - consider this example:

  $ tools/perf/perf list

  List of pre-defined events (to be used in -e):

  duration_time [Tool event]

  branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
  branch-misses OR cpu/branch-misses/ [Kernel PMU event]
  bus-cycles OR cpu/bus-cycles/ [Kernel PMU event]
  cache-misses OR cpu/cache-misses/ [Kernel PMU event]
  cache-references OR cpu/cache-references/ [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/ [Kernel PMU event]
  cstate_core/c3-residency/ [Kernel PMU event]
  cstate_core/c6-residency/ [Kernel PMU event]
  cstate_core/c7-residency/ [Kernel PMU event]
  cstate_pkg/c2-residency/ [Kernel PMU event]
  cstate_pkg/c3-residency/ [Kernel PMU event]
  cstate_pkg/c6-residency/ [Kernel PMU event]
  cstate_pkg/c7-residency/ [Kernel PMU event]
  cycles-ct OR cpu/cycles-ct/ [Kernel PMU event]
  cycles-t OR cpu/cycles-t/ [Kernel PMU event]
  el-abort OR cpu/el-abort/ [Kernel PMU event]
  el-capacity OR cpu/el-capacity/ [Kernel PMU event]

Notice in the above example how the cstate_core PMU events are mixed in
the middle of the CPU core events.

For my arm64 platform, all the uncore events get mixed in, making the list
very disorganised:

  page-faults OR faults [Software event]
  task-clock [Software event]
  duration_time [Tool event]
  L1-dcache-load-misses [Hardware cache event]
  L1-dcache-loads [Hardware cache event]
  L1-icache-load-misses [Hardware cache event]
  L1-icache-loads [Hardware cache event]
  branch-load-misses [Hardware cache event]
  branch-loads [Hardware cache event]
  dTLB-load-misses [Hardware cache event]
  dTLB-loads [Hardware cache event]
  iTLB-load-misses [Hardware cache event]
  iTLB-loads [Hardware cache event]
  br_mis_pred OR armv8_pmuv3_0/br_mis_pred/ [Kernel PMU event]
  br_mis_pred_retired OR armv8_pmuv3_0/br_mis_pred_retired/ [Kernel PMU event]
  br_pred OR armv8_pmuv3_0/br_pred/ [Kernel PMU event]
  br_retired OR armv8_pmuv3_0/br_retired/ [Kernel PMU event]
  br_return_retired OR armv8_pmuv3_0/br_return_retired/ [Kernel PMU event]
  bus_access OR armv8_pmuv3_0/bus_access/ [Kernel PMU event]
  bus_cycles OR armv8_pmuv3_0/bus_cycles/ [Kernel PMU event]
  cid_write_retired OR armv8_pmuv3_0/cid_write_retired/ [Kernel PMU event]
  cpu_cycles OR armv8_pmuv3_0/cpu_cycles/ [Kernel PMU event]
  dtlb_walk OR armv8_pmuv3_0/dtlb_walk/ [Kernel PMU event]
  exc_return OR armv8_pmuv3_0/exc_return/ [Kernel PMU event]
  exc_taken OR armv8_pmuv3_0/exc_taken/ [Kernel PMU event]
  hisi_sccl1_ddrc0/act_cmd/ [Kernel PMU event]
  hisi_sccl1_ddrc0/flux_rcmd/ [Kernel PMU event]
  hisi_sccl1_ddrc0/flux_rd/ [Kernel PMU event]
  hisi_sccl1_ddrc0/flux_wcmd/ [Kernel PMU event]
  hisi_sccl1_ddrc0/flux_wr/ [Kernel PMU event]
  hisi_sccl1_ddrc0/pre_cmd/ [Kernel PMU event]
  hisi_sccl1_ddrc0/rnk_chg/ [Kernel PMU event]

  ...

  hisi_sccl7_l3c21/wr_hit_cpipe/ [Kernel PMU event]
  hisi_sccl7_l3c21/wr_hit_spipe/ [Kernel PMU event]
  hisi_sccl7_l3c21/wr_spipe/ [Kernel PMU event]
  inst_retired OR armv8_pmuv3_0/inst_retired/ [Kernel PMU event]
  inst_spec OR armv8_pmuv3_0/inst_spec/ [Kernel PMU event]
  itlb_walk OR armv8_pmuv3_0/itlb_walk/ [Kernel PMU event]
  l1d_cache OR armv8_pmuv3_0/l1d_cache/ [Kernel PMU event]
  l1d_cache_refill OR armv8_pmuv3_0/l1d_cache_refill/ [Kernel PMU event]
  l1d_cache_wb OR armv8_pmuv3_0/l1d_cache_wb/ [Kernel PMU event]
  l1d_tlb OR armv8_pmuv3_0/l1d_tlb/ [Kernel PMU event]
  l1d_tlb_refill OR armv8_pmuv3_0/l1d_tlb_refill/ [Kernel PMU event]

So the events are listed alphabetically. However, CPU core event listing
has been special since commit dc098b35b56f ("perf list: List kernel supplied
event aliases"), in that the alias and the full event are both shown (in
that order). As such, the core events may end up scattered among the events
of other PMUs.

Improve this by grouping the CPU core events and ensuring that they are
listed first among the kernel PMU events. For the first example above, this
now looks like:

  duration_time [Tool event]
  branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
  branch-misses OR cpu/branch-misses/ [Kernel PMU event]
  bus-cycles OR cpu/bus-cycles/ [Kernel PMU event]
  cache-misses OR cpu/cache-misses/ [Kernel PMU event]
  cache-references OR cpu/cache-references/ [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/ [Kernel PMU event]
  cycles-ct OR cpu/cycles-ct/ [Kernel PMU event]
  cycles-t OR cpu/cycles-t/ [Kernel PMU event]
  el-abort OR cpu/el-abort/ [Kernel PMU event]
  el-capacity OR cpu/el-capacity/ [Kernel PMU event]
  el-commit OR cpu/el-commit/ [Kernel PMU event]
  el-conflict OR cpu/el-conflict/ [Kernel PMU event]
  el-start OR cpu/el-start/ [Kernel PMU event]
  instructions OR cpu/instructions/ [Kernel PMU event]
  mem-loads OR cpu/mem-loads/ [Kernel PMU event]
  mem-stores OR cpu/mem-stores/ [Kernel PMU event]
  ref-cycles OR cpu/ref-cycles/ [Kernel PMU event]
  topdown-fetch-bubbles OR cpu/topdown-fetch-bubbles/ [Kernel PMU event]
  topdown-recovery-bubbles OR cpu/topdown-recovery-bubbles/ [Kernel PMU event]
  topdown-slots-issued OR cpu/topdown-slots-issued/ [Kernel PMU event]
  topdown-slots-retired OR cpu/topdown-slots-retired/ [Kernel PMU event]
  topdown-total-slots OR cpu/topdown-total-slots/ [Kernel PMU event]
  tx-abort OR cpu/tx-abort/ [Kernel PMU event]
  tx-capacity OR cpu/tx-capacity/ [Kernel PMU event]
  tx-commit OR cpu/tx-commit/ [Kernel PMU event]
  tx-conflict OR cpu/tx-conflict/ [Kernel PMU event]
  tx-start OR cpu/tx-start/ [Kernel PMU event]
  cstate_core/c3-residency/ [Kernel PMU event]
  cstate_core/c6-residency/ [Kernel PMU event]
  cstate_core/c7-residency/ [Kernel PMU event]
  cstate_pkg/c2-residency/ [Kernel PMU event]
  cstate_pkg/c3-residency/ [Kernel PMU event]
  cstate_pkg/c6-residency/ [Kernel PMU event]
  cstate_pkg/c7-residency/ [Kernel PMU event]

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1592384514-119954-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
---
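Note (illustration only, not part of the change): the ordering rule added
to cmp_sevent() can be tried in isolation with the standalone sketch below.
It is a minimal approximation under stated assumptions: struct demo_event,
cmp_demo_event() and sort_demo_events() are hypothetical names, the
desc/topic comparison that precedes the new check in cmp_sevent() is left
out, and is_cpu is simply set by hand, since how the caller derives it is
not shown in this diff.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct sevent (not the real one). */
struct demo_event {
	const char *name;
	int is_cpu;	/* assumed: 1 for CPU core PMU events, 0 otherwise */
};

/* Two-level ordering: CPU core group first, then alphabetical by name. */
static int cmp_demo_event(const void *a, const void *b)
{
	const struct demo_event *as = a;
	const struct demo_event *bs = b;

	/* Negative when only 'a' is a CPU core event, so it sorts earlier. */
	if (as->is_cpu != bs->is_cpu)
		return bs->is_cpu - as->is_cpu;

	return strcmp(as->name, bs->name);
}

/* Hypothetical helper: sort an array of demo events in place. */
static void sort_demo_events(struct demo_event *ev, size_t n)
{
	qsort(ev, n, sizeof(*ev), cmp_demo_event);
}

#ifdef DEMO_MAIN	/* build with -DDEMO_MAIN to run standalone */
int main(void)
{
	struct demo_event ev[] = {
		{ "cstate_core/c3-residency/", 0 },
		{ "cycles-t", 1 },
		{ "branch-misses", 1 },
		{ "cstate_pkg/c2-residency/", 0 },
	};
	size_t i, n = sizeof(ev) / sizeof(ev[0]);

	sort_demo_events(ev, n);
	for (i = 0; i < n; i++)
		printf("%s\n", ev[i].name);
	return 0;
}
#endif
```

With the sample input, the two events flagged is_cpu come out first in
alphabetical order, followed by the cstate_* events, which matches the
grouping described in the commit message.
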
 tools/perf/util/pmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index cf1b43bddc85..713d1c6204a2 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1379,6 +1379,7 @@ struct sevent {
 	char *pmu;
 	char *metric_expr;
 	char *metric_name;
+	int is_cpu;
 };
 
 static int cmp_sevent(const void *a, const void *b)
@@ -1395,6 +1396,11 @@ static int cmp_sevent(const void *a, const void *b)
 		if (n)
 			return n;
 	}
+
+	/* Order CPU core events to be first */
+	if (as->is_cpu != bs->is_cpu)
+		return bs->is_cpu - as->is_cpu;
+
 	return strcmp(as->name, bs->name);
 }
 
@@ -1486,6 +1492,7 @@ void print_pmu_events(const char *event_glob, bool name_only, bool quiet_flag,
 			aliases[j].pmu = pmu->name;
 			aliases[j].metric_expr = alias->metric_expr;
 			aliases[j].metric_name = alias->metric_name;
+			aliases[j].is_cpu = is_cpu;
 			j++;
 		}
 		if (pmu->selectable &&
--
2.27.0