name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
Excessive warning logs when fine-tuning the qwen_14b network in a 910B environment
Model repository: https://gitee.com/mindspore/mindformers/blob/dev/research/qwen/run_qwen_7b.yaml
Hardware environment (Ascend/GPU/CPU): Ascend
/device ascend/
CANN version: Milan_C17/20240414
MindSpore version: master_eb182b96
MindFormers version: master_ba75de2
Execution mode (PyNative/Graph):
/mode pynative
/mode graph
Test case repository: MindFormers_Test/cases/qwen/14b/train/
Test case:
test_mf_qwen_14b_train_infer_alpaca_8p_0001
Expected result: the network trains successfully and the logs are normal.
ERROR 2024-04-24 23:55:34 - test_mf_qwen_14b_train_infer_alpaca_8p_0001 - base.py:check_warn_info:996 - The following WARNING logs were repeated more than 10 times:
Repeat count  WARNING log
80 mindspore/ccsrc/frontend/optimizer/slice_activation_in_recompute.cc:137] InsertSliceAllGatherNode] The output_shape first dim:2 cannot be divisible by the repeated size: 8The slice would not activate to this node: @55999_55990_4182_4165_1_mindspore_train_dataset_helper__DataWrapper_construct_87344:equiv_CNode_16902{[0]: ValueNode<Primitive> PrimFunc_Add,
40 mindspore/ccsrc/frontend/optimizer/slice_activation_in_recompute.cc:137] InsertSliceAllGatherNode] The output_shape first dim:2 cannot be divisible by the repeated size: 8The slice would not activate to this node: @55999_55990_4182_4165_1_mindspore_train_dataset_helper__DataWrapper_construct_87344:equiv_hidden_states{[0]: ValueNode<Primitive> PrimFunc_Add,
40 mindspore/ccsrc/frontend/optimizer/slice_activation_in_recompute.cc:137] InsertSliceAllGatherNode] The output_shape first dim:2 cannot be divisible by the repeated size: 8The slice would not activate to this node: @55999_55990_4182_4165_1_mindspore_train_dataset_helper__DataWrapper_construct_87344:equiv_out{[0]: ValueNode<Primitive> PrimFunc_Add,
ERROR 2024-04-24 23:55:34 - test_mf_qwen_14b_train_infer_alpaca_8p_0001 - base.py:check_warn_info:1000 - Warning logs exceed the threshold.
ERROR 2024-04-24 23:55:34 - test_mf_qwen_14b_train_infer_alpaca_8p_0001 - base.py:check_err_info_in_log:933 - Warning logs exceed the threshold.
ERROR 2024-04-24 23:55:34 - test_mf_qwen_14b_train_infer_alpaca_8p_0001 - train.py:check_log_and_process:481 - Some error log in log files
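The failure above is raised by a harness check that counts repeated WARNING messages. The sketch below shows one way such a check could work. It is not the actual `base.py` code: the function name `find_repeated_warnings` and its input format are hypothetical, and only the threshold of 10 is taken from the log.

```python
from collections import Counter


def find_repeated_warnings(messages, threshold=10):
    """Return {message: count} for messages seen more than `threshold` times.

    `messages` is assumed to be an iterable of already-normalized WARNING
    texts (timestamps and process IDs stripped), which is an assumption of
    this sketch rather than something the issue confirms.
    """
    counts = Counter(msg.strip() for msg in messages)
    return {msg: n for msg, n in counts.items() if n > threshold}


if __name__ == "__main__":
    # Hypothetical sample: one message seen 11 times, another seen 10 times.
    sample = ["slice would not activate"] * 11 + ["other warning"] * 10
    for msg, n in find_repeated_warnings(sample).items():
        print(n, msg)
```

A check of this kind cannot tell duplicate prints apart from distinct warnings whose text is identical after normalization, so it can flag warnings that are each emitted once for a different node.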
Assign to 姚逸璠.
Please assign maintainer to check this issue.
@zhangjie18
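This warning is expected. It is emitted because the network enables the slice_activation feature, but none of the cells configured for recomputation meet its requirements: in the log above, the first dimension of the output shape (2) is not divisible by the repeated size (8). The number of warnings matches the number of cells in the network that are configured for recomputation, so these are not duplicate prints.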
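The condition named in the warning text can be illustrated with a small sketch. This is not MindSpore's implementation in `slice_activation_in_recompute.cc`. The function name `slice_activation_applies` is hypothetical, and only the values 2 and 8 come from the log. The remaining shape dimensions are placeholders.

```python
def slice_activation_applies(output_shape, repeated_size):
    """Return True if the first dimension splits evenly into `repeated_size` parts.

    Mirrors only the divisibility condition stated in the warning message;
    any other requirements of the real optimizer pass are not modeled here.
    """
    if not output_shape or repeated_size <= 0:
        return False
    return output_shape[0] % repeated_size == 0


if __name__ == "__main__":
    # First dim 2 and repeated size 8 are taken from the warning in this issue;
    # the trailing dimensions are placeholder values.
    print(slice_activation_applies((2, 1024, 1024), 8))   # False: 2 % 8 != 0
    print(slice_activation_applies((16, 1024, 1024), 8))  # True: 16 % 8 == 0
```

Under this reading, every recompute cell whose output has a first dimension of 2 fails the check once, which matches the reply that the warning count tracks the number of recompute cells.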