Our team tried to implement the DETR model, which uses the Hungarian algorithm to assign predicted boxes to ground-truth boxes during training.
The Hungarian algorithm uses 3 nested while loops and operates on tensors whose shapes change dynamically during computation. This makes it impossible to use in GRAPH_MODE, and it runs extremely slowly in PYNATIVE_MODE.
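For reference, the assignment step itself can be reproduced outside the graph. Below is a minimal sketch using SciPy's `linear_sum_assignment`; the cost here is a plain L1 distance between boxes, a simplified stand-in for DETR's full class + box + GIoU matching cost:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(pred_boxes, gt_boxes):
    """Hungarian matching of predicted boxes to ground-truth boxes.

    Builds an L1 cost matrix (a simplified stand-in for DETR's
    matching cost) and solves the assignment problem. The cost
    matrix shape depends on the number of ground-truth boxes,
    which is exactly the dynamic shape that breaks GRAPH_MODE.
    """
    cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return pred_idx, gt_idx

pred = np.array([[0.0, 0.0, 1.0, 1.0],
                 [0.5, 0.5, 1.5, 1.5]])
gt = np.array([[0.6, 0.6, 1.4, 1.4]])
pred_idx, gt_idx = match_boxes(pred, gt)  # the second box matches the target
```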
Currently, the only solution is to compute the losses manually using NumPy and then assign the manually computed gradients as the sense parameter of the GradientOp operator.
The link to the DETR implementation: !2266:Models: DETR
The current solution can be described as follows:
net = MyNetwork(...)
one_step_cell = CustomTrainOneStepCellWithSense(net)
# ...
# Training cycle
input_data, gt = next(dataset_iterator)
# Perform a forward step to calculate the network outputs and the loss.
pred = net(input_data)
# Manually compute the loss and provide the back-propagated gradients for the loss.
loss, loss_grad = my_loss_function(pred, gt)
# Update the sense parameter of the one-step cell so that its
# back-propagation starts from those values.
one_step_cell.sense_param.set_data(loss_grad)
# Perform one more forward step, compute the gradients, and update the weights.
one_step_cell(input_data)
The additional forward pass takes extra time, so this solution is somewhat slower than the original PyTorch model.
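The mechanics of this two-pass workaround can be sketched framework-free. The linear layer, learning rate, and squared-error loss below are toy stand-ins, not the DETR model; the point is the role played by the externally supplied loss gradient (the value written into sense_param above):

```python
import numpy as np

# Toy "network": a single linear layer y = x @ W.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
x = rng.normal(size=(4, 3))
gt = rng.normal(size=(4, 2))

# Forward step 1: compute predictions outside the graph.
pred = x @ W

# Manually compute the loss and its gradient w.r.t. the predictions
# (this gradient plays the role of loss_grad fed into sense_param).
loss = 0.5 * ((pred - gt) ** 2).sum()
loss_grad = pred - gt             # dL/dpred

# Forward step 2 + backward: the train cell re-runs the forward pass
# and back-propagates starting from the supplied sens values.
grad_W = x.T @ loss_grad          # chain rule: dL/dW = x^T @ dL/dpred
W -= 0.01 * grad_W                # weight update
```

The redundant second forward pass is visible here as well: `pred` has to be recomputed inside the train cell before back-propagation can start.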
Support for arbitrary functions would allow MindSpore to support a wider class of models.
The design of a possible solution:
Add a PythonCell class in which a user can define the forward and backward propagation logic, or at least a CustomLossCell operator that accepts a tensor and returns the loss value and the computed gradients. The internal computation of CustomLossCell should support arbitrary functions.
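The proposed interface could look roughly like the pure-Python sketch below. This is not an existing MindSpore API; the class names follow the proposal, and the L2 loss is a hypothetical stand-in for the Hungarian matching loss:

```python
import numpy as np

class PythonCell:
    """Hypothetical cell whose forward and backward passes are
    arbitrary Python functions (free to use loops, NumPy, and
    dynamically shaped intermediates)."""

    def construct(self, pred, gt):
        raise NotImplementedError

    def bprop(self, pred, gt):
        raise NotImplementedError

class SimpleLossCell(PythonCell):
    # Simplified stand-in for the DETR matching loss: plain L2.
    def construct(self, pred, gt):
        return 0.5 * ((pred - gt) ** 2).sum()

    def bprop(self, pred, gt):
        # Gradient w.r.t. the network predictions, to be injected
        # into back-propagation (the role of sense_param today).
        return pred - gt

cell = SimpleLossCell()
pred = np.array([[1.0, 2.0]])
gt = np.array([[0.0, 2.0]])
loss = cell.construct(pred, gt)   # 0.5
grad = cell.bprop(pred, gt)       # [[1.0, 0.0]]
```

With such a cell, a TrainOneStepCell could call `construct` and `bprop` in a single pass instead of requiring the extra forward step described above.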
Please assign a maintainer to check this issue.
@fangwenyi @chengxiaoli
Please add labels (comp or sig); you can also visit https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md to find more.
To get your code reviewed as soon as possible, please add a component (comp) or special interest group (sig) label to the Pull Request; a labeled PR can be pushed directly to the responsible reviewer.
More labels can be found at https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md
Taking a component-related code submission as an example, if you are submitting code for the data component, you can comment like this:
//comp/data
Of course, you can also invite the data SIG to review the code by writing:
//sig/data
You can also tag this PR with a type, for example a bugfix or a feature request:
//kind/bug or //kind/feature
Congratulations, you now know how to add labels with commands; go ahead and add labels in the comments below!
Thank you for your suggestion. This is a good idea for supporting such arbitrary functions.
With the current solution, which computes the loss and scale_sense manually, we have to perform the forward propagation twice, once for the prediction and once for the gradients.
If we could create a special Cell for this, the full TrainOneStepCell would be able to eliminate this redundant computation.
We will discuss this further and produce a complete design and plan.
For GPU and CPU targets, you could try a Custom Operator of the pyfunc type:
https://www.mindspore.cn/docs/programming_guide/en/r1.6/custom_operator_custom.html#defining-custom-operator-of-pyfunc-type
Operator performance issue from the Russian project; Chenghui will review and handle it centrally.