MindSpore / models

[Bug]: MindSpore YOLOX code fails to train

REJECTED
Created: 2023-12-11 16:37

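Problem description

Running the official YOLOX code on the OpenI (启智) platform fails with an error. The code repository is at
https://openi.pcl.ac.cn/starlight_glim/YOLO-X
The error message is: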

Traceback (most recent call last):
  File "/home/work/user-job-dir/code/train.py", line 420, in <module>
    run_train(config)
  File "/cache/user-job-dir/code/model_utils/moxing_adapter.py", line 105, in wrapped_func
    run_func(*args, **kwargs)
  File "/home/work/user-job-dir/code/train.py", line 398, in run_train
    sink_size=-1)
  File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/train/model.py", line 1049, in train
    initial_epoch=initial_epoch)
  File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/train/model.py", line 98, in wrapper
    func(self, *args, **kwargs)
  File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/train/model.py", line 623, in _train
    cb_params, sink_size, initial_epoch, valid_infos)
  File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/train/model.py", line 701, in _train_dataset_sink_process
    outputs = train_network(*inputs)
  File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/nn/cell.py", line 578, in __call__
    out = self.compile_and_run(*args)
  File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/nn/cell.py", line 965, in compile_and_run
    self.compile(*inputs)
  File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/nn/cell.py", line 938, in compile
    jit_config_dict=self._jit_config_dict)
  File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/common/api.py", line 1137, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
TypeError: The primitive[OneHot]'s input arguments[off_dtype, on_value] must be all tensor and those type must be same. But got input argument[off_dtype]:Float32
 But got input argument[on_value]:Float32
Valid type list: {Tensor[Bool], Tensor[Complex128], Tensor[Complex64], Tensor[Float16], Tensor[Float32], Tensor[Float64], Tensor[Float], Tensor[Int16], Tensor[Int32], Tensor[Int64], Tensor[Int8], Tensor[Int], Tensor[UInt16], Tensor[UInt32], Tensor[UInt64], Tensor[UInt8], Tensor[UInt]}.

----------------------------------------------------
- The Traceback of Net Construct Code:
----------------------------------------------------
The function call stack (See file '/cache/user-job-dir/workspace/device0/rank_0/om/analyze_fail.dat' for more details. Get instructions about `analyze_fail.dat` at https://www.mindspore.cn/search?inputValue=analyze_fail.dat):
# 0 In file /usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/train/dataset_helper.py(106)
        return self.network(*outputs)
               ^
# 1 In file /cache/user-job-dir/code/src/yolox.py(480)
        if self.ema:
# 2 In file /cache/user-job-dir/code/src/yolox.py(485)
            loss = F.depend(loss, self.optimizer(grads))
                            ^
# 3 In file /cache/user-job-dir/code/src/yolox.py(466)
        loss = self.network(*inputs)
               ^
# 4 In file /cache/user-job-dir/code/src/yolox.py(344)
        if self.use_summary:
# 5 In file /cache/user-job-dir/code/src/yolox.py(294)
        ret_posk = P.Transpose()(ops.one_hot(min_index, gt_max, 1.0, 0.0), (0, 2, 1))
                                 ^
# 6 In file /usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/ops/function/array_func.py(270)
    return onehot(indices, depth, on_value, off_value)
           ^
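
A possible workaround, not confirmed by anyone in this thread: the error reports both arguments as Float32, so the dtypes already match, and the complaint is probably that `1.0` and `0.0` on line 294 of `yolox.py` are Python scalars rather than tensors. The sketch below assumes this MindSpore build accepts only `Tensor` values for `on_value` and `off_value`, and that `gt_max` is a Python int. The function names `one_hot_posk`, `one_hot_reference`, and `transpose_021` are hypothetical; the two reference helpers only illustrate what the line computes and run without MindSpore.

```python
def one_hot_reference(indices, depth, on_value=1.0, off_value=0.0):
    """Pure-Python reference: one-hot encode nested lists along a new last axis."""
    if isinstance(indices, list):
        return [one_hot_reference(i, depth, on_value, off_value) for i in indices]
    return [on_value if k == indices else off_value for k in range(depth)]


def transpose_021(x):
    """Swap the last two axes of a 3-D nested list, like a transpose with perm (0, 2, 1)."""
    return [[list(col) for col in zip(*mat)] for mat in x]


def one_hot_posk(min_index, gt_max):
    """Hypothetical rewrite of yolox.py line 294 using Tensor on/off values.

    Assumption: OneHot in this build rejects Python floats for on_value and
    off_value, as the TypeError above suggests.
    """
    # Imported lazily so the reference helpers above work without MindSpore.
    import mindspore as ms
    from mindspore import Tensor, ops

    on_value = Tensor(1.0, ms.float32)
    off_value = Tensor(0.0, ms.float32)
    one_hot = ops.one_hot(min_index, gt_max, on_value, off_value)
    return ops.Transpose()(one_hot, (0, 2, 1))


# Reference behaviour on a tiny input of shape (1, 2) with depth 3.
encoded = one_hot_reference([[0, 2]], 3)   # shape (1, 2, 3)
swapped = transpose_021(encoded)           # shape (1, 3, 2)
```

If the assumption holds, replacing the expression on line 294 with a call like `one_hot_posk(min_index, gt_max)` should produce the same values as the original intent, with only the argument types changed.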

Environment

Hardware environment: Ascend
Software environment: MindSpore version 1.8.1, Python version 3.7.5
Execution mode: PyNative

Related test case

train.py

Steps to reproduce

NPU: 1*Ascend 910, CPU: 24, device memory: 32GB, RAM: 256GB
backbone: darknet53
config_path = /cache/user-job-dir/code/yolox_darknet53.yaml; data_dir = /cache/data/visdrone2017; is_distributed = 0

Expected result

Training runs without errors.

Logs/Screenshots

Remarks

Comments (3)

glit_white created the issue
glit_white added the kind/bug label

Please assign a maintainer to check this issue.
@fangwenyi @chengxiaoli @Shawny

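Thank you for your feedback. You can comment //mindspore-assistant to get help faster; for more labels, see the label list.

  1. If you are new to MindSpore, you may find the answer in the tutorials.
  2. If you are an experienced PyTorch user, you may need:
    Typical differences from PyTorch / PyTorch and MindSpore API mapping table
  3. If you hit a dynamic-graph (PyNative) problem, you can set mindspore.set_context(pynative_synchronize=True) to view the error stack and help locate the problem.
  4. For model accuracy tuning problems, see the tuning guide on the official website.
  5. If you are reporting a framework bug, please make sure the issue provides the information needed to locate it: the MindSpore version, the backend type (CPU, GPU, Ascend), the environment, the official link to the training code, and how to launch the code that reproduces the error.
  6. If you have already found the root cause, you are welcome to submit a PR to the MindSpore open-source community; we will review it as soon as possible.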
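
Item 3 above can be sketched as follows. This is a minimal illustration that uses only the `mindspore.set_context(pynative_synchronize=True)` call quoted in the list; the helper names `debug_context_options` and `apply_debug_context` are hypothetical, and the MindSpore import is deferred so the option dictionary can be inspected on a machine without MindSpore.

```python
def debug_context_options():
    """Context options suggested for debugging PyNative (dynamic-graph) errors."""
    return {"pynative_synchronize": True}


def apply_debug_context():
    """Apply the options before building the network; requires MindSpore."""
    import mindspore as ms  # deferred import: only needed when actually applying

    ms.set_context(**debug_context_options())
```

In this report the exception is raised while the network is being compiled (the traceback passes through `_graph_executor.compile`), so this setting may not change the reported stack; it is mainly useful when operators execute asynchronously in PyNative mode.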
glit_white edited the description
Shawny set the assignee to Shawny
Shawny changed the issue type from Task to Question
Shawny set the associated project to MindSpore Issue Assistant
Shawny removed the kind/bug label

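Hello, this is the MindSpore community. If you have questions about that code repository, please submit an issue in the corresponding repository.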

Shawny changed the issue status from TODO to DONE
Shawny changed the issue status from DONE to REJECTED
glit_white edited the title
