Error when using an nn operator provided on the MindSpore official site; requesting help with analysis

TODO
Bug-Report
Created on 2024-05-20 15:20

Describe the current behavior (Mandatory):

When using nn.Conv3d, the following error is returned:
[ERROR] GE_ADPT(2061,ffff951bd020,python):2024-05-20-12:04:21.733.064 [mindspore/ccsrc/transform/graph_ir/graph_runner.cc:441] CompileGraph] Call GE CompileGraph Failed, ret is: 1343225857
Traceback (most recent call last):
File "/home/docker/code/SolarWind/train_gaussian.py", line 168, in
model.train(
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/train/model.py", line 1068, in train
self._train(epoch,
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/train/model.py", line 114, in wrapper
func(self, *args, **kwargs)
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/train/model.py", line 617, in _train
self._train_process(epoch, train_dataset, list_callback, cb_params, initial_epoch, valid_infos)
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/train/model.py", line 919, in _train_process
outputs = self._train_network(*next_element)
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/nn/cell.py", line 680, in call
out = self.compile_and_run(*args, **kwargs)
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/nn/cell.py", line 1020, in compile_and_run
self.compile(*args, **kwargs)
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/nn/cell.py", line 997, in compile
_cell_graph_executor.compile(self, phase=self.phase,
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/common/api.py", line 1547, in compile
result = self._graph_executor.compile(obj, args, kwargs, phase, self._use_vm_mode())
RuntimeError: Compile graph kernel_graph_4 failed.


  • Ascend Error Message:

E60108: In op[MatMulV2], [The supported format_out list is ['ND', 'NC1HWC0', 'FRACTAL_NZ'], while the current format_out is NCDHW.]
TraceBack (most recent call last):
Failed to compile Op [recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/MatMul-op256,[recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/Cast-op3319,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/Cast-op3319,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/BiasAdd-op259,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/MatMul-op256]]. (oppath: [Compile /usr/local/Ascend/ascend-toolkit/7.0.1/opp/built-in/op_impl/ai_core/tbe/impl/mat_mul.py failed with errormsg/stack: File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/utils/errormgr/error_manager_util.py", line 69, in raise_runtime_error_cube
raise RuntimeError(args_dict, *msgs)
RuntimeError: ({'errCode': 'E60108', 'op_name': 'MatMulV2', 'reason': "The supported format_out list is ['ND', 'NC1HWC0', 'FRACTAL_NZ'], while the current format_out is NCDHW."}, "In op[MatMulV2], [The supported format_out list is ['ND', 'NC1HWC0', 'FRACTAL_NZ'], while the current format_out is NCDHW.]")
], optype: [MatMulV2])
[SubGraphOpt][Compile][ProcFailedCompTask] Thread[281442461184288] recompile single op[recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/MatMul-op256] failed[FUNC:ProcessAllFailedCompileTasks][FILE:tbe_op_store_adapter.cc][LINE:954]
[SubGraphOpt][Compile][ParalCompOp] Thread[281442461184288] process fail task failed[FUNC:ParallelCompileOp][FILE:tbe_op_store_adapter.cc][LINE:1001]
[SubGraphOpt][Compile][CompOpOnly] CompileOp failed.[FUNC:CompileOpOnly][FILE:op_compiler.cc][LINE:1127]
[GraphOpt][FusedGraph][RunCompile] Failed to compile graph with compiler Normal mode Op Compiler[FUNC:SubGraphCompile][FILE:fe_graph_optimizer.cc][LINE:1292]
Call OptimizeFusedGraph failed, ret:-1, engine_name:AIcoreEngine, graph_name:partition3_rank60_new_sub_graph521[FUNC:OptimizeSubGraph][FILE:graph_optimize.cc][LINE:131]
Failed to compile Op [recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/MatMul-op286,[recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/Cast-op3317,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/Cast-op3317,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/BiasAdd-op289,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/MatMul-op286]]. (oppath: [Compile /usr/local/Ascend/ascend-toolkit/7.0.1/opp/built-in/op_impl/ai_core/tbe/impl/mat_mul.py failed with errormsg/stack: File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/utils/errormgr/error_manager_util.py", line 69, in raise_runtime_error_cube
raise RuntimeError(args_dict, *msgs)
Based on the information above, I tested the network layer by layer and block by block, and eventually narrowed the error down to the Conv3d provided by the nn module; the error above appears whenever it is used. Please help analyze this behavior (a minimal isolation sketch follows below).
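As a starting point for reproduction, here is a minimal sketch that isolates the nn.Conv3d call; the batch size, channel counts, kernel size, and input shape are assumptions for illustration only and are not taken from the actual training script.

import numpy as np
import mindspore as ms
from mindspore import nn, Tensor

ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")  # assumption: same mode/backend as the report

# Assumed toy NCDHW input of shape (1, 4, 8, 64, 64).
conv3d = nn.Conv3d(in_channels=4, out_channels=96,
                   kernel_size=(2, 4, 4), stride=(2, 4, 4), pad_mode="same")
x = Tensor(np.random.randn(1, 4, 8, 64, 64).astype(np.float32))
y = conv3d(x)
print(y.shape)  # expected: (1, 96, 4, 16, 16)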

Environment (Mandatory)

  • Hardware Environment (Ascend/GPU/CPU):

Atlas800 9000T a2

  • Software Environment (Mandatory):
    -- MindSpore version (e.g., 1.7.0.Bxxx): 2.2.11
    -- Python version (e.g., Python 3.7.5): Python 3.9.18
    -- OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04 LTS

  • Execute Mode (Mandatory) (PyNative/Graph):

GRAPH_MODE

Related testcase (Mandatory):

# Imports added so the snippet is self-contained.
import math

from mindspore import nn
from mindspore import ops as P


class PatchEmbed(nn.Cell):
  def __init__(
    self,
    in_channels: int,
    patch_size: tuple[int, int, int],
    patch_norm: bool,
    embed_dim: int,
    pre_patch_embed_paddings: tuple[int, int, int],
    patches_resolution: tuple[int, int, int],
  ) -> None:
    super(PatchEmbed, self).__init__()

    self.embed_dim = embed_dim
    self.num_patches = math.prod(patches_resolution)

    self.dhw_paddings: list[tuple[int, int]] = []
    for padding in pre_patch_embed_paddings:
      ahead = math.floor(padding / 2)
      behind = padding - ahead
      self.dhw_paddings.append((ahead, behind))
    self.pre_patch_embed_pad = P.Pad(
      paddings=(((0, 0), (0, 0)) + tuple(self.dhw_paddings))
    )
    self.proj = nn.Conv3d(
      in_channels=in_channels,
      out_channels=embed_dim,
      kernel_size=patch_size,
      stride=patch_size,  # type: ignore
      pad_mode="same",
    )

    self.norm = nn.LayerNorm((embed_dim,)) if patch_norm else nn.Identity()

    self.reshape = P.Reshape()
    self.transpose = P.Transpose()

  def construct(self, x):  # type: ignore
    """construct function.

    Args:
        x (Tensor): shape = (batch_size, in_channels, *input_size)

    Returns:
        Tensor: (batch_size, num_patches, embed_dim)
    """
    # last x shape = (batch_size, in_channels, *input_size)
    embed_dim = self.embed_dim
    num_patches = self.num_patches
    x = self.pre_patch_embed_pad(x)
    # last x shape = (batch_size, in_channels, *pre_patch_embed_resolution)
    x = self.proj(x)
    # last x shape = (batch_size, embed_dim, *patches_resolution)
    batch_size = x.shape[0]  # type: ignore
    x = self.reshape(x, (batch_size, embed_dim, num_patches))
    # last x shape = (batch_size, embed_dim, num_patches)
    x = self.transpose(x, (0, 2, 1))
    # last x shape = (batch_size, num_patches, embed_dim)
    x = self.norm(x)
    # last x shape = (batch_size, num_patches, embed_dim)
    return x
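For context, a hedged usage sketch of the PatchEmbed cell above; every concrete value (channels, patch size, embed_dim, paddings, input shape) is an assumption chosen for illustration and is not taken from the actual model configuration:

import numpy as np
import mindspore as ms
from mindspore import Tensor

ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")  # assumption: same mode/backend as the report

patch_embed = PatchEmbed(
  in_channels=4,                      # assumed input channels
  patch_size=(2, 4, 4),               # assumed (D, H, W) patch size
  patch_norm=True,
  embed_dim=96,
  pre_patch_embed_paddings=(0, 0, 0),
  patches_resolution=(4, 16, 16),     # 8/2, 64/4, 64/4 for the assumed input below
)
x = Tensor(np.random.randn(1, 4, 8, 64, 64).astype(np.float32))  # (N, C, D, H, W)
out = patch_embed(x)
print(out.shape)  # expected: (1, 1024, 96) = (batch_size, num_patches, embed_dim)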

Comments (3)

baifeng Yao created the Bug-Report

Please assign a maintainer to check this issue.
@fangwenyi @chengxiaoli @Shawny

Thank you for your question. You can comment //mindspore-assistant to get help faster:

  1. If you are new to MindSpore, you may find the answer in the tutorials
  2. If you are an experienced PyTorch user, you may need:
  1. If you run into a PyNative (dynamic graph) issue, you can set set_context(pynative_synchronize=True) to get the error stack and help locate the problem (see the sketch after this list)
  2. For model accuracy tuning issues, refer to the tuning guide on the official site
  3. If you are reporting a framework bug, please make sure the issue includes the information needed to locate it: the MindSpore version, the backend type (CPU, GPU, Ascend), the environment, an official link to the training code, and how to launch code that reproduces the error
  4. If you have already found the root cause, you are welcome to submit a PR to the MindSpore open-source community; we will review it as soon as possible
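A minimal sketch of the debugging tip mentioned in item 1 above; the pynative_synchronize flag comes from that comment, while the device target is an assumption for illustration:

import mindspore as ms

# Run in PyNative mode with synchronous kernel execution so the Python
# stack trace points at the operator that actually failed.
ms.set_context(mode=ms.PYNATIVE_MODE, device_target="Ascend",
               pynative_synchronize=True)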
Shawny set the assignee to hedongdong
Shawny set the associated project to MindSpore Issue Assistant
Shawny set the planned start date to 2024-05-27
Shawny set the planned due date to 2024-06-27
Shawny added the mindspore-assistant label
Shawny added the sig/ops label

Hello, since there has been no reply on this issue, we will close it later. If you still have questions, please provide the specific details and change the issue status to WIP, and we will continue to follow up. Thank you.
