From 4f573c490bf3b2880b8c93a20d2a9ef0d0b90c90 Mon Sep 17 00:00:00 2001
From: Shaokun Zhang <zhangshaokun@hisilicon.com>
Date: Mon, 8 Mar 2021 14:50:37 +0800
Subject: [PATCH 35/55] docs: perf: Add new description on HiSilicon uncore PMU
 v2

mainline inclusion
from mainline-v5.13-rc1
commit 9b86b1b41e0f48b5b25918e07aeceb00e13d1ce2
bugzilla: https://gitee.com/openeuler/kernel/issues/I8AU2M

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9b86b1b41e0f48b5b25918e07aeceb00e13d1ce2

--------------------------------------------------------------------

Some new functions are added to HiSilicon uncore PMUs. Document them
to provide guidance on how to use them.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Co-developed-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/1615186237-22263-10-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
---
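Note (not part of the commit): the tt_req encodings documented in the hunk
below can be turned into perf event terms mechanically. The following is a
minimal sketch only; the names `TT_REQ` and `l3c_event` are hypothetical
helpers invented for illustration and are not part of perf or the driver. The
PMU name and config value are taken from the examples in the patch.

```python
# Encodings copied from the tt_req description in this patch.
TT_REQ = {
    "read": 0b100,
    "write": 0b101,
    "atomic_store": 0b110,
    "atomic_non_store": 0b111,
}


def l3c_event(pmu, config, **filters):
    """Format one perf event term such as
    'hisi_sccl3_l3c0/config=0x02,tt_req=0x4/' (hypothetical helper)."""
    terms = [f"config=0x{config:02x}"]
    # Keyword order is preserved, so filters appear as the caller wrote them.
    terms += [f"{name}=0x{value:x}" for name, value in filters.items()]
    return f"{pmu}/{','.join(terms)}/"


if __name__ == "__main__":
    # Reproduces the read-only example from section (b) of the document.
    print(l3c_event("hisi_sccl3_l3c0", 0x02, tt_req=TT_REQ["read"]))
```

The resulting string can be passed to `perf stat -a -e ... sleep 5` as in the
documented examples; other filter fields such as tt_core or datasrc_cfg can be
given as additional keyword arguments.
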
Documentation/admin-guide/perf/hisi-pmu.rst | 49 +++++++++++++++++++++
1 file changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/perf/hisi-pmu.rst b/Documentation/admin-guide/perf/hisi-pmu.rst
index 404a5c3d9d00..3b3120e2dd9e 100644
--- a/Documentation/admin-guide/perf/hisi-pmu.rst
+++ b/Documentation/admin-guide/perf/hisi-pmu.rst
@@ -53,6 +53,55 @@ Example usage of perf::
  $# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5

+For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
+as PMU v1, but some new functions are added to the hardware.
+
+(a) L3C PMU supports filtering by core/thread within the cluster which can be
+specified as a bitmap.
+ $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5
+This will only count the operations from core/thread 0 and 1 in this cluster.
+
+(b) Tracetag allows the user to choose to count only read, write or atomic
+operations via the tt_req parameter in perf. The default value counts all
+operations. tt_req is 3 bits: 3'b100 represents read operations, 3'b101
+represents write operations, 3'b110 represents atomic store operations and
+3'b111 represents atomic non-store operations; other values are reserved.
+ $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_req=0x4/ sleep 5
+This will only count the read operations in this cluster.
+
+(c) Datasrc allows the user to check where the data comes from. It is 5 bits.
+Some important codes are as follows:
+5'b00001: comes from L3C in this die;
+5'b01000: comes from L3C in the cross-die;
+5'b01001: comes from L3C which is in another socket;
+5'b01110: comes from the local DDR;
+5'b01111: comes from the cross-die DDR;
+5'b10000: comes from cross-socket DDR;
+etc. It is mainly helpful for checking whether the data comes from the source
+nearest to the CPU cores. If datasrc_cfg is used on a multi-chip system,
+datasrc_skt shall be configured in the perf command.
+ $# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
+ hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5
+
+(d) Some HiSilicon SoCs encapsulate multiple CPU and I/O dies. Each CPU die
+contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
+clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
+SoC has a unique ID. Each ID is 11 bits, including a 6-bit SCCL-ID and a 5-bit
+CCL/ICL-ID. For an I/O die, the ICL-ID is one of the following:
+5'b00000: I/O_MGMT_ICL;
+5'b00001: Network_ICL;
+5'b00011: HAC_ICL;
+5'b10000: PCIe_ICL;
+
+Users can configure IDs to count data coming from a specific CCL/ICL by
+setting srcid_cmd & srcid_msk, and data destined for a specific CCL/ICL by
+setting tgtid_cmd & tgtid_msk. A set bit in srcid_msk/tgtid_msk means the PMU
+will not check that bit when matching against srcid_cmd/tgtid_cmd.
+
+If all of these options are disabled, the PMU works with the default values,
+which apply no filter condition or ID matching, and returns the total counter
+values in the PMU counters.
+
 The current driver does not support sampling. So "perf record" is unsupported.
 Also attach to a task is unsupported as the events are all uncore.

--
2.27.0
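
Note (not part of the patch): the srcid/tgtid matching rule in section (d) can
be modelled in a few lines. This is a sketch under stated assumptions, not the
hardware's or the driver's implementation. The document only says the 11-bit
ID includes a 6-bit SCCL-ID and a 5-bit CCL/ICL-ID; placing the SCCL-ID in the
upper six bits is an assumption made here, and `make_id` and `id_matches` are
hypothetical names.

```python
SCCL_BITS = 6   # width of the SCCL-ID field (from the document)
CCL_BITS = 5    # width of the CCL/ICL-ID field (from the document)
ID_MASK = (1 << (SCCL_BITS + CCL_BITS)) - 1  # 11-bit ID -> 0x7ff


def make_id(sccl_id, ccl_id):
    """Pack an 11-bit ID. ASSUMPTION: SCCL-ID sits in the upper 6 bits."""
    if not 0 <= sccl_id < (1 << SCCL_BITS):
        raise ValueError("SCCL-ID must fit in 6 bits")
    if not 0 <= ccl_id < (1 << CCL_BITS):
        raise ValueError("CCL/ICL-ID must fit in 5 bits")
    return (sccl_id << CCL_BITS) | ccl_id


def id_matches(observed, cmd, msk):
    """Model of the documented rule: bits set in msk are not checked;
    every other bit of observed must equal the same bit of cmd."""
    return ((observed ^ cmd) & ~msk & ID_MASK) == 0


if __name__ == "__main__":
    pcie_icl = 0b10000  # PCIe_ICL code from the table in section (d)
    # Exact match on one ICL: no bits masked.
    print(id_matches(make_id(3, pcie_icl), make_id(3, pcie_icl), 0x0))
    # Masking the low 5 bits matches any CCL/ICL within SCCL 3.
    print(id_matches(make_id(3, 1), make_id(3, 0), 0x1f))
```

Under the same layout assumption, the values returned by `make_id` are what a
user would pass as srcid_cmd or tgtid_cmd, with the corresponding srcid_msk or
tgtid_msk selecting which bits to ignore.
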