[910B][MS] ResNet50 training: the profiler collects performance data successfully, but the analysis fails
Hardware Environment (Ascend/GPU/CPU):
Ascend Training Solution 23.0.RC3.B012
CANN 6.3.RC3.B030
Ascend HDK 23.0.RC2.2.B030
Software Environment (Mandatory):
-- MindSpore version (e.g., 2.0.0): MindSpore 2.1.0.B130
-- Python version (e.g., Python 3.7.5): Python 3.7.5
-- OS platform and distribution (e.g., Linux Ubuntu 16.04): openEuler 22.03
-- GCC/Compiler version (if compiled from source): 10.3.1
Train_MS_Resnet50_Perf_010
1. Modify train.py to add profiler = ms.Profiler(output_path='./profiler_data') and profiler.analyse().
2. Start training: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] [CONFIG_PATH]
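The change described in step 1 might look roughly like the sketch below. Only ms.Profiler(output_path=...) and profiler.analyse() come from the report; the wrapper run_with_profiler, its train_fn argument, and the profiler_cls hook are hypothetical names introduced here for illustration and are not part of train.py.

```python
def run_with_profiler(train_fn, output_path="./profiler_data", profiler_cls=None):
    """Run train_fn with profiling enabled, then analyse the collected data.

    profiler_cls is a hypothetical injection point so the flow can be
    exercised without MindSpore installed; by default it is ms.Profiler.
    """
    if profiler_cls is None:
        # Imported lazily so that defining this helper does not require MindSpore.
        import mindspore as ms
        profiler_cls = ms.Profiler

    # Create the profiler before training starts so the run is recorded.
    profiler = profiler_cls(output_path=output_path)
    train_fn()
    # Parse the collected data; this is the call that fails in this report.
    profiler.analyse()
    return profiler
```

In the reported setup the training entry point is train_net(), so the call would presumably be run_with_profiler(train_net).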
Expected result: the profiler collects data normally and the analysis completes normally.
2023-08-14 19:34:56,479:INFO:epoch: [1/1] loss: 6.905463, epoch time: 246.382 s, per step time: 24638.249 ms
2023-08-14 19:34:57,568:INFO:If run eval and enable_cache Remember to shut down the cache server via "cache_admin --stop"
Mon 14 Aug 2023 19:37:28 [INFO] [MSVP] [51469] msprof_common.py: Start analyzing data in "/home/ywx1249490/profiler/profiler/PROF_000001_20230814193039475_FJFCBNFLBOOKKNRA/host" ...
Mon 14 Aug 2023 19:37:28 [INFO] [MSVP] [51469] msprof_common.py: It may take few minutes, please be patient ...
Mon 14 Aug 2023 19:37:34 [INFO] [MSVP] [51469] msprof_common.py: Analysis data in "/home/ywx1249490/profiler/profiler/PROF_000001_20230814193039475_FJFCBNFLBOOKKNRA/host" finished.
Mon 14 Aug 2023 19:37:34 [INFO] [MSVP] [51469] msprof_common.py: Start analyzing data in "/home/ywx1249490/profiler/profiler/PROF_000001_20230814193039475_FJFCBNFLBOOKKNRA/device_0" ...
Mon 14 Aug 2023 19:37:34 [INFO] [MSVP] [51469] msprof_common.py: It may take few minutes, please be patient ...
Mon 14 Aug 2023 19:37:35 [INFO] [MSVP] [51469] msprof_common.py: Analysis data in "/home/ywx1249490/profiler/profiler/PROF_000001_20230814193039475_FJFCBNFLBOOKKNRA/device_0" finished.
Traceback (most recent call last):
  File "train.py", line 238, in <module>
    train_net()
  File "/home/ywx1249490/models/official/cv/ResNet/scripts/train_parallel0/src/model_utils/moxing_adapter.py", line 104, in wrapped_func
    run_func(*args, **kwargs)
  File "train.py", line 234, in train_net
    profiler.analyse()
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 579, in analyse
    self._ascend_analyse()
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 970, in _ascend_analyse
    self._ascend_graph_analyse()
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 1194, in _ascend_graph_analyse
    op_summary, op_statistic, steptrace = _ascend_graph_msprof_analyse(source_path)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 277, in _ascend_graph_msprof_analyse
    df_op_summary, df_op_statistic, df_step_trace = msprof_analyser.parse()
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/profiler/parser/ascend_msprof_generator.py", line 101, in parse
    self._read_steptrace()
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/profiler/parser/ascend_msprof_generator.py", line 181, in _read_steptrace
    self.steptrace = np.array(steptrace, dtype=steptrace_dt)
ValueError: could not assign tuple of length 15 to structure with 51 fields.
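The same class of ValueError can be reproduced with plain NumPy, independent of MindSpore. This is a minimal sketch: the 3-field dtype and the 2-value row are hypothetical stand-ins for the 51-field steptrace_dt and the 15-value rows in the traceback, and the field names are made up for illustration.

```python
import numpy as np

# Hypothetical 3-field layout standing in for the 51-field steptrace dtype.
steptrace_dt = np.dtype([("iteration_id", int), ("fp_start", float), ("bp_end", float)])

# Each row carries fewer values than the dtype has fields,
# mirroring the 15-versus-51 mismatch in the report.
rows = [(1, 0.5)]

error_message = None
try:
    np.array(rows, dtype=steptrace_dt)
except ValueError as exc:
    # NumPy refuses to fill a structured record from a tuple of the wrong length.
    error_message = str(exc)

print(error_message)
```

The error indicates that the parser's expected column layout and the rows actually read from the msprof output disagree on the number of columns.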
Investigated by: 籍家荣
Please assign a maintainer to check this issue.
@fangwenyi @chengxiaoli @Shawny
The step_trace.csv reported by msprof has a complex format, and MindSpore needs to adapt to it.
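One way to tolerate a column layout that varies is to take the field names from the CSV header instead of a fixed list. The sketch below only illustrates that idea and is not the actual change made in ascend_msprof_generator.py; the function read_step_trace and the sample column names are hypothetical.

```python
import csv
import io


def read_step_trace(text):
    """Parse step-trace CSV text into records keyed by the header row.

    The column set is taken from the file itself, so a file with 15 columns
    and a file with 51 columns are both accepted, while a row whose length
    disagrees with its own header is reported with its line number.
    """
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    records = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
        records.append(dict(zip(header, row)))
    return header, records


# Hypothetical sample; real step_trace.csv column names may differ.
sample = "Iteration ID,FP Start,BP End\n1,10.0,25.5\n2,30.0,47.25\n"
header, records = read_step_trace(sample)
```

Values are kept as strings here; converting them to numeric types would be a separate step once the set of columns is known.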
This issue has been resolved: the latest https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/profiler/parser/ascend_msprof_generator.py
can parse the data without raising an error.
Data collection and analysis both succeed.