Using nn.Conv3d returns the following error:
[ERROR] GE_ADPT(2061,ffff951bd020,python):2024-05-20-12:04:21.733.064 [mindspore/ccsrc/transform/graph_ir/graph_runner.cc:441] CompileGraph] Call GE CompileGraph Failed, ret is: 1343225857
Traceback (most recent call last):
File "/home/docker/code/SolarWind/train_gaussian.py", line 168, in
model.train(
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/train/model.py", line 1068, in train
self._train(epoch,
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/train/model.py", line 114, in wrapper
func(self, *args, **kwargs)
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/train/model.py", line 617, in _train
self._train_process(epoch, train_dataset, list_callback, cb_params, initial_epoch, valid_infos)
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/train/model.py", line 919, in _train_process
outputs = self._train_network(*next_element)
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/nn/cell.py", line 680, in call
out = self.compile_and_run(*args, **kwargs)
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/nn/cell.py", line 1020, in compile_and_run
self.compile(*args, **kwargs)
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/nn/cell.py", line 997, in compile
_cell_graph_executor.compile(self, phase=self.phase,
File "/home/docker/miniconda3/envs/ms-2.2.11/lib/python3.9/site-packages/mindspore/common/api.py", line 1547, in compile
result = self._graph_executor.compile(obj, args, kwargs, phase, self._use_vm_mode())
RuntimeError: Compile graph kernel_graph_4 failed.
E60108: In op[MatMulV2], [The supported format_out list is ['ND', 'NC1HWC0', 'FRACTAL_NZ'], while the current format_out is NCDHW.]
TraceBack (most recent call last):
Failed to compile Op [recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/MatMul-op256,[recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/Cast-op3319,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/Cast-op3319,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/BiasAdd-op259,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/MatMul-op256]]. (oppath: [Compile /usr/local/Ascend/ascend-toolkit/7.0.1/opp/built-in/op_impl/ai_core/tbe/impl/mat_mul.py failed with errormsg/stack: File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/utils/errormgr/error_manager_util.py", line 69, in raise_runtime_error_cube
raise RuntimeError(args_dict, *msgs)
RuntimeError: ({'errCode': 'E60108', 'op_name': 'MatMulV2', 'reason': "The supported format_out list is ['ND', 'NC1HWC0', 'FRACTAL_NZ'], while the current format_out is NCDHW."}, "In op[MatMulV2], [The supported format_out list is ['ND', 'NC1HWC0', 'FRACTAL_NZ'], while the current format_out is NCDHW.]")
], optype: [MatMulV2])
[SubGraphOpt][Compile][ProcFailedCompTask] Thread[281442461184288] recompile single op[recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/MatMul-op256] failed[FUNC:ProcessAllFailedCompileTasks][FILE:tbe_op_store_adapter.cc][LINE:954]
[SubGraphOpt][Compile][ParalCompOp] Thread[281442461184288] process fail task failed[FUNC:ParallelCompileOp][FILE:tbe_op_store_adapter.cc][LINE:1001]
[SubGraphOpt][Compile][CompOpOnly] CompileOp failed.[FUNC:CompileOpOnly][FILE:op_compiler.cc][LINE:1127]
[GraphOpt][FusedGraph][RunCompile] Failed to compile graph with compiler Normal mode Op Compiler[FUNC:SubGraphCompile][FILE:fe_graph_optimizer.cc][LINE:1292]
Call OptimizeFusedGraph failed, ret:-1, engine_name:AIcoreEngine, graph_name:partition3_rank60_new_sub_graph521[FUNC:OptimizeSubGraph][FILE:graph_optimize.cc][LINE:131]
Failed to compile Op [recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/MatMul-op286,[recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/Cast-op3317,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/Cast-op3317,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/BiasAdd-op289,recompute_Default/network-WithLossCell/_backbone-SwinTransformer3D/enc_layers-CellList/0-EncoderStage/enc_blocks-CellList/0-Block/mlp-MLP/fc2-Dense/MatMul-op286]]. (oppath: [Compile /usr/local/Ascend/ascend-toolkit/7.0.1/opp/built-in/op_impl/ai_core/tbe/impl/mat_mul.py failed with errormsg/stack: File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/utils/errormgr/error_manager_util.py", line 69, in raise_runtime_error_cube
raise RuntimeError(args_dict, *msgs)
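Based on the information above, I tested the network layer by layer and block by block, and eventually traced the error to the Conv3d provided by the nn module. Could you please help analyze this error?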
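The key line in the log above is the E60108 message, which names the failing op and the formats it accepts. The following is a minimal sketch, not part of the original report, of pulling those fields out of the log text with a regular expression. The helper name `parse_format_error` and the pattern are assumptions written against the message exactly as it is printed here; other CANN versions may word it differently.

```python
import ast
import re

# Pattern written against the E60108 message as printed in the log above
# (assumption: the wording is stable for this toolkit version).
E60108_PATTERN = re.compile(
    r"In op\[(?P<op>\w+)\], \[The supported format_out list is "
    r"(?P<supported>\[[^\]]*\]), while the current format_out is "
    r"(?P<current>\w+)\.\]"
)


def parse_format_error(log_text):
    """Return (op_name, supported_formats, current_format), or None if absent."""
    match = E60108_PATTERN.search(log_text)
    if match is None:
        return None
    # The supported list is printed as a Python list literal of strings.
    supported = ast.literal_eval(match.group("supported"))
    return match.group("op"), supported, match.group("current")


# The E60108 line copied from the log above.
sample = (
    "E60108: In op[MatMulV2], [The supported format_out list is "
    "['ND', 'NC1HWC0', 'FRACTAL_NZ'], while the current format_out is NCDHW.]"
)
op_name, supported, current = parse_format_error(sample)
print(op_name, supported, current, current in supported)
```

For the log above this yields MatMulV2, the three supported formats, and NCDHW. Note that the op that fails to compile is the MatMul inside the `fc2` Dense layer of the MLP, not the convolution itself, and that it has been given the 5D format NCDHW.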
Hardware Environment (Ascend/GPU/CPU): Atlas800 9000T a2
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx): 2.2.11
-- Python version (e.g., Python 3.7.5): Python 3.9.18
-- OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04 LTS
Execute Mode (Mandatory) (PyNative/Graph): GRAPH_MODE
# Imports were not shown in the original post; these are the assumed ones.
import math

import mindspore.nn as nn
from mindspore.ops import operations as P


class PatchEmbed(nn.Cell):
    def __init__(
        self,
        in_channels: int,
        patch_size: tuple[int, int, int],
        patch_norm: bool,
        embed_dim: int,
        pre_patch_embed_paddings: tuple[int, int, int],
        patches_resolution: tuple[int, int, int],
    ) -> None:
        super(PatchEmbed, self).__init__()
        self.embed_dim = embed_dim
        self.num_patches = math.prod(patches_resolution)

        # Split each total padding into (ahead, behind) for the D, H, W axes.
        self.dhw_paddings: list[tuple[int, int]] = []
        for padding in pre_patch_embed_paddings:
            ahead = math.floor(padding / 2)
            behind = padding - ahead
            self.dhw_paddings.append((ahead, behind))

        # No padding on the batch and channel axes.
        self.pre_patch_embed_pad = P.Pad(
            paddings=(((0, 0), (0, 0)) + tuple(self.dhw_paddings))
        )
        # Non-overlapping patches: stride equals kernel size.
        self.proj = nn.Conv3d(
            in_channels=in_channels,
            out_channels=embed_dim,
            kernel_size=patch_size,
            stride=patch_size,  # type: ignore
            pad_mode="same",
        )
        self.norm = nn.LayerNorm((embed_dim,)) if patch_norm else nn.Identity()
        self.reshape = P.Reshape()
        self.transpose = P.Transpose()

    def construct(self, x):  # type: ignore
        """construct function.

        Args:
            x (Tensor): shape = (batch_size, in_channels, *input_size)

        Returns:
            Tensor: (batch_size, num_patches, embed_dim)
        """
        # last x shape = (batch_size, in_channels, *input_size)
        embed_dim = self.embed_dim
        num_patches = self.num_patches
        x = self.pre_patch_embed_pad(x)
        # last x shape = (batch_size, in_channels, *pre_patch_embed_resolution)
        x = self.proj(x)
        # last x shape = (batch_size, embed_dim, *patches_resolution)
        batch_size = x.shape[0]  # type: ignore
        x = self.reshape(x, (batch_size, embed_dim, num_patches))
        # last x shape = (batch_size, embed_dim, num_patches)
        x = self.transpose(x, (0, 2, 1))
        # last x shape = (batch_size, num_patches, embed_dim)
        x = self.norm(x)
        # last x shape = (batch_size, num_patches, embed_dim)
        return x
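To make the shape comments in `construct` easier to check without an Ascend device, here is a small pure-Python sketch that traces the shapes through the same steps. It is an illustration only: the function names and the sample sizes are hypothetical, and it assumes that with `pad_mode="same"` and stride equal to kernel size each output dimension of the convolution is `ceil(input / stride)`.

```python
import math


def split_padding(padding):
    """Split a total padding into (ahead, behind), as in PatchEmbed.__init__."""
    ahead = math.floor(padding / 2)
    behind = padding - ahead
    return ahead, behind


def trace_patch_embed_shapes(batch_size, in_channels, input_size,
                             paddings, patch_size, embed_dim):
    """Return the tensor shape after each step of PatchEmbed.construct."""
    dhw_paddings = [split_padding(p) for p in paddings]
    padded = tuple(s + a + b for s, (a, b) in zip(input_size, dhw_paddings))
    # Assumption: pad_mode="same" with stride == kernel_size gives
    # ceil(input / stride) along each of D, H, W.
    patches_resolution = tuple(
        math.ceil(s / k) for s, k in zip(padded, patch_size)
    )
    num_patches = math.prod(patches_resolution)
    return {
        "after_pad": (batch_size, in_channels) + padded,
        "after_proj": (batch_size, embed_dim) + patches_resolution,
        "after_reshape": (batch_size, embed_dim, num_patches),
        "after_transpose": (batch_size, num_patches, embed_dim),
    }


# Hypothetical sizes chosen only for illustration.
shapes = trace_patch_embed_shapes(
    batch_size=2, in_channels=3, input_size=(7, 30, 30),
    paddings=(1, 2, 2), patch_size=(2, 4, 4), embed_dim=96,
)
for step, shape in shapes.items():
    print(step, shape)
```

The output of `proj` is the only 5D tensor produced by the convolution; everything after the reshape is 3D, which is the form the later Dense layers receive.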
Please assign maintainer to check this issue.
@fangwenyi @chengxiaoli @Shawny
Thank you for your question. You can comment //mindspore-assistant to get help faster.
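Hello, since this issue has received no reply, we will close it later. If you still have questions, please provide the specific details and change the issue status to WIP, and we will follow up further. Thank you.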