
MindSpore Lab / mindocr


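Introduction

MindOCR is an open-source OCR toolbox built on the MindSpore framework. It integrates a range of mainstream text detection and recognition algorithms and models and provides easy-to-use training and inference tools, helping users quickly develop and apply state-of-the-art (SoTA) text detection and recognition models such as DBNet/DBNet++ and CRNN/SVTR for image-document understanding.

Key Features
  • Modular design: MindOCR decouples the OCR task into several configurable modules. By changing only a few lines of code, users can configure the full training and evaluation pipeline for custom data and models;
  • High performance: the pretrained weights and training recipes provided by MindOCR achieve competitive results on OCR tasks;
  • Ease of use: MindOCR provides easy-to-use tools for detecting and recognizing text in real-world data.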

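Installation

MindSpore Environment Setup

MindOCR is built on the MindSpore AI framework (supporting CPU/GPU/NPU) and is compatible with the framework versions below. For installation instructions, follow the installation link listed with each item.

  • mindspore >= 2.2.0 [install]
  • python >= 3.7
  • openmpi 4.0.3 (for distributed training and evaluation) [install]
  • mindspore lite (for offline inference) >= 2.2.0 [install]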
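As a quick sanity check of the version requirements listed above, the following is a minimal sketch that compares dotted version strings numerically (so that 2.2.10 is correctly treated as newer than 2.2.0). The helper names `parse_version` and `meets_minimum` are hypothetical and are not part of MindOCR or MindSpore; the only library attribute relied on is `mindspore.__version__`.

```python
import sys


def parse_version(text):
    """Turn a dotted version string such as '2.2.10' into a tuple of ints.

    Non-numeric suffixes (for example 'rc1') are ignored in each component.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so '3.7' and '3.7.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    print("python", python_version, "ok" if meets_minimum(python_version, "3.7") else "too old")
    try:
        import mindspore  # only available once MindSpore has been installed
    except ImportError:
        print("mindspore is not installed")
    else:
        ms_version = mindspore.__version__
        print("mindspore", ms_version, "ok" if meets_minimum(ms_version, "2.2.0") else "too old")
```

Comparing tuples of integers rather than raw strings avoids the common pitfall where "2.2.10" sorts before "2.2.9" alphabetically.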

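Dependencies

pip install -r requirements.txt

Install from Source (Recommended)

git clone https://github.com/mindspore-lab/mindocr.git
cd mindocr
pip install -e .

The -e flag installs the package in editable mode, which can help resolve potential module import issues.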
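To confirm that the installation is visible to the current Python interpreter, a small check along the following lines may help. This is only a sketch: the function name `is_importable` is hypothetical, and it uses the standard-library `importlib.util.find_spec` to look up a top-level module without importing it.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("mindspore", "mindocr"):
        status = "found" if is_importable(name) else "NOT found"
        print(f"{name}: {status}")
```

If `mindocr` is reported as not found after `pip install -e .`, the package was probably installed into a different Python environment than the one running the check.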

Install with Docker

The Docker images currently provided contain the following environment:

  • OS: Euler2.8
  • CANN: 7.0
  • Python: 3.9
  • MindSpore: 2.2.10
  • MindSpore Lite: 2.2.10

To install with Docker, follow these steps:

  1. Pull the Docker image

    • 910:
      docker pull swr.cn-central-221.ovaijisuan.com/mindocr/mindocr_dev_910_ms_2_2_10_cann7_0_py39:v1
    • 910*:
      docker pull swr.cn-central-221.ovaijisuan.com/mindocr/mindocr_dev_ms_2_2_10_cann7_0_py39:v1
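  2. Create a container

    docker_name="temp_mindocr"
    # Keep only the image_name line that matches your hardware;
    # if both are kept, the second assignment wins.
    # 910
    image_name="swr.cn-central-221.ovaijisuan.com/mindocr/mindocr_dev_910_ms_2_2_10_cann7_0_py39:v1"
    # 910*
    image_name="swr.cn-central-221.ovaijisuan.com/mindocr/mindocr_dev_ms_2_2_10_cann7_0_py39:v1"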
    
    docker run --privileged --name ${docker_name} \
        --tmpfs /tmp \
        --tmpfs /run \
        -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
        --device=/dev/davinci1 \
        --device=/dev/davinci2 \
        --device=/dev/davinci3 \
        --device=/dev/davinci4 \
        --device=/dev/davinci5 \
        --device=/dev/davinci6 \
        --device=/dev/davinci7 \
        --device=/dev/davinci_manager \
        --device=/dev/hisi_hdc \
        --device=/dev/devmm_svm \
        -v /etc/localtime:/etc/localtime \
        -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
        -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
        --shm-size 800g \
        --cpus 96 \
        --security-opt seccomp=unconfined \
        --network=bridge -itd ${image_name} bash
  3. Enter the container

    # Set the container ID
    container_id="your docker id"
    docker exec -it --user root $container_id bash
  4. Set environment variables. After entering the container, run:

    source env_setup.sh
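The container-creation command above hard-codes NPU devices 1 through 7. On a machine that exposes a different set of devices, the same command can be assembled programmatically. The sketch below only rebuilds the argument list shown in step 2; the function names `device_flags` and `docker_run_command` are hypothetical, and which device IDs exist on a given host is an assumption to verify locally.

```python
import shlex


def device_flags(device_ids):
    """Build the --device flags for the given NPU ids plus the shared device nodes."""
    flags = [f"--device=/dev/davinci{i}" for i in device_ids]
    flags += [
        "--device=/dev/davinci_manager",
        "--device=/dev/hisi_hdc",
        "--device=/dev/devmm_svm",
    ]
    return flags


def docker_run_command(image_name, docker_name, device_ids):
    """Return the `docker run` argument list from step 2 for a chosen device set."""
    cmd = ["docker", "run", "--privileged", "--name", docker_name,
           "--tmpfs", "/tmp", "--tmpfs", "/run",
           "-v", "/sys/fs/cgroup:/sys/fs/cgroup:ro"]
    cmd += device_flags(device_ids)
    cmd += ["-v", "/etc/localtime:/etc/localtime",
            "-v", "/usr/local/Ascend/driver:/usr/local/Ascend/driver",
            "-v", "/usr/local/bin/npu-smi:/usr/local/bin/npu-smi",
            "--shm-size", "800g", "--cpus", "96",
            "--security-opt", "seccomp=unconfined",
            "--network=bridge", "-itd", image_name, "bash"]
    return cmd


if __name__ == "__main__":
    image = "swr.cn-central-221.ovaijisuan.com/mindocr/mindocr_dev_ms_2_2_10_cann7_0_py39:v1"
    command = docker_run_command(image, "temp_mindocr", range(1, 8))
    # Print a shell-safe version of the command for inspection before running it.
    print(" ".join(shlex.quote(part) for part in command))
```

Printing the command instead of executing it keeps the sketch side-effect free; the printed line can be pasted into a shell once the device list looks right.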

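Install from PyPI

pip install mindocr

Because this project is under active development, the version currently available on PyPI is out of date. It will be updated soon.

Quick Start

1. Text Detection and Recognition Example

Once MindOCR is installed, you can run text detection and recognition on any image as follows.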

python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
                                          --det_algorithm DB++  \
                                          --rec_algorithm CRNN

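When the run finishes, the results are saved to ./inference_results by default. The visualized results look like this:

Visualization of text detection and recognition results

All text blocks in the image are detected and recognized correctly. For more detailed usage, see the inference tutorial.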
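When the detection and recognition command above needs to be launched from another Python program, one option is to wrap it with `subprocess`. The sketch below is an assumption-laden illustration: the function names `build_predict_command` and `run_prediction` are hypothetical, only the three flags shown above are used, and `repo_root` is assumed to be the directory of a cloned mindocr repository.

```python
import subprocess
import sys


def build_predict_command(image_dir, det_algorithm="DB++", rec_algorithm="CRNN"):
    """Return the argument list for tools/infer/text/predict_system.py."""
    return [
        sys.executable,
        "tools/infer/text/predict_system.py",
        "--image_dir", str(image_dir),
        "--det_algorithm", det_algorithm,
        "--rec_algorithm", rec_algorithm,
    ]


def run_prediction(image_dir, repo_root=".", **algorithms):
    """Run the script from the repository root and raise if it exits with an error."""
    cmd = build_predict_command(image_dir, **algorithms)
    return subprocess.run(cmd, cwd=repo_root, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_prediction(...) to actually execute it.
    print(build_predict_command("path/to/images"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the image path contains spaces.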

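2. Model Training, Evaluation, and Inference: Quick Guide

Use the tools/train.py script to train OCR models. It supports training both text detection and text recognition models.

python tools/train.py --config {path/to/model_config.yaml}

The --config argument specifies the path to a YAML file that defines the model to train and the training strategy, including the data processing pipeline, optimizer, and learning-rate scheduler.

MindOCR provides a range of SoTA OCR models and their training strategies in the configs folder, which users can quickly adapt to their own tasks or datasets. For example: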

# train text detection model DBNet++ on icdar15 dataset
python tools/train.py --config configs/det/dbnet/dbpp_r50_icdar15.yaml
# train text recognition model CRNN on icdar15 dataset
python tools/train.py --config configs/rec/crnn/crnn_icdar15.yaml

Use the tools/eval.py script to evaluate a trained model, as shown below:

python tools/eval.py \
    --config {path/to/model_config.yaml} \
    --opt eval.dataset_root={path/to/your_dataset} eval.ckpt_load_path={path/to/ckpt_file}
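The --opt argument above takes dotted key=value pairs such as eval.dataset_root=... that override entries of the YAML config. The following sketch illustrates the general idea of applying such overrides to a nested dictionary. It is not MindOCR's actual implementation; the function name `apply_overrides` and the sample values are hypothetical, and values are kept as plain strings without type conversion.

```python
def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' strings to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        keys = dotted_key.split(".")
        node = config
        # Walk (and create, if missing) the intermediate dictionaries.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


if __name__ == "__main__":
    cfg = {"eval": {"ckpt_load_path": None, "dataset_root": "default/path"}}
    apply_overrides(cfg, [
        "eval.dataset_root=/data/ic15",
        "eval.ckpt_load_path=/ckpt/best.ckpt",
    ])
    print(cfg)
```

Overriding values from the command line keeps a single YAML file reusable across datasets and checkpoints without editing it for every run.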

Use the tools/infer/text/predict_system.py script to run online inference, as shown below:

python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
                                          --det_algorithm DB++  \
                                          --rec_algorithm CRNN

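For more usage details, see the model training and inference sections of the tutorials.

3. Offline Model Inference: Quick Guide

You can run MindSpore Lite inference in MindOCR on MindOCR native models or third-party models (such as PaddleOCR and MMOCR). See the project's offline inference documentation for details.

Tutorials

Model List

Text detection
Text recognition
Layout analysis
Key information extraction
Table recognition
OCR large models

For the training procedures and results of the models above, see the README in each model's subdirectory under configs.

For the lists of models supported by MindSpore Lite and ACL inference, see the MindOCR native model inference support list and the third-party model inference support list (such as PaddleOCR and MMOCR).

Dataset List

MindOCR provides a dataset format conversion tool to support OCR datasets in different formats as well as user-defined datasets. The public OCR datasets that have been verified in model training and evaluation are listed below.

General OCR datasets
Layout analysis datasets
Key information extraction datasets
Table recognition datasets

We will train and validate models on more datasets. This list will be updated continuously.

FAQ

For frequently encountered problems with environment setup and with using mindocr, see the FAQ.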

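Important Information

Change Log

Details
  • 2023/04/01
  1. Added new models
  • 2024/03/20
  1. Added new models
    • OCR large model Vary-toy, supporting detection and OCR based on the Tongyi Qianwen (Qwen) 1.8B LLM
  • 2023/12/25
  1. Added new models
  2. Added more benchmark datasets and their results
  • 2023/12/14
  1. Added new models
  2. Added more benchmark datasets and their results
  3. Multi-specification support for Ascend 910 hardware: DBNet ResNet-50, DBNet++ ResNet-50, CRNN VGG7, SVTR-Tiny, FCENet, ABINet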
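  • 2023/11/28
  1. Added offline inference support for PP-OCRv4 models
  2. Fixed bugs in offline inference of third-party models
  • 2023/11/17
  1. Added new models
  2. Added more benchmark datasets and their results
  • 2023/07/06
  1. Added new models
  • 2023/07/05
  1. Added new models
  • 2023/06/29
  1. Added 2 new SoTA models
  • 2023/06/07
  1. Added new models
  2. Added more benchmark datasets and their results
  3. Added resume training, which can be used when training is interrupted unexpectedly. To use it, add the resume parameter under the model field in the config file. You can either pass a specific path, resume: /path/to/train_resume.ckpt, or set resume: True to load the train_resume.ckpt saved under ckpt_save_dir
  4. Improved post-processing in the detection module: detected text polygons are rescaled to the original image space by default, which is achieved by adding "shape_list" to the eval.dataset.output_columns list.
  5. Refactored online inference to support more models; see README.md for details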
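  • 2023/05/15
  1. Added new models
  2. Added more benchmark datasets and their results
  3. Added a checkpoint manager for saving the top-k checkpoints and improved logging.
  4. Refactored the Python inference code.
  5. Bug fixes: use an average loss meter for large datasets, disable pred_cast_fp32 for ctcloss in AMP training, and fix an error caused by invalid polygons.
  • 2023/05/04
  1. Added support for loading custom pretrained checkpoints by setting model-pretrained in the YAML config to a checkpoint URL or local path.
  2. Added support for setting the probability of applying data augmentation operations, including rotation and flipping.
  3. Added EMA for model training, enabled by setting train-ema (default: False) and train-ema_decay in the YAML config.
  4. Parameter change: num_columns_to_net -> net_input_column_index: the number of columns fed to the network is replaced by the indices of the columns fed to the network
  5. Parameter change: num_columns_of_labels -> label_column_index: the count is replaced by an index that indicates the position of the label.
  • 2023/04/21
  1. Added parameter grouping to support regularization in training. Usage: add the grouping_strategy parameter to the YAML config to select a predefined grouping strategy, or use the no_weight_decay_params parameter to select layers to exclude from weight decay (e.g., bias, norm). See configs/rec/crnn/crnn_icdar15.yaml for an example
  2. Added gradient accumulation to support large-batch training. Usage: add gradient_accumulation_steps to the YAML config; global batch size = batch_size * devices * gradient_accumulation_steps. See configs/rec/crnn/crnn_icdar15.yaml for an example
  3. Added gradient clipping to stabilize training. Enable it by setting grad_clip to True in the YAML config.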
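  • 2023/03/23
  1. Added dynamic loss scaler support, compatible with drop overflow update. To use it, add the loss_scale field to the config file and set its type parameter to dynamic; see configs/rec/crnn/crnn_icdar15.yaml for an example
  • 2023/03/20
  1. Parameter renames: output_keys -> output_columns, num_keys_to_net -> num_columns_to_net
  2. Updated the data pipeline.
  • 2023/03/13
  1. Added system tests and a CI workflow;
  2. Added a modelarts platform adapter to support training on the OpenI platform. Training on OpenI requires the following steps:
  i)   Create a training task on the OpenI cloud platform;
  ii)  Link a dataset on the web page, e.g. ic15_mindocr;
  iii) Add the `config` parameter and set the YAML file path in the web UI, e.g. '/home/work/user-job-dir/V0001/configs/rec/test.yaml';
  iv)  Add the run parameter `enable_modelarts` in the web UI and set it to True;
  v)   Fill in the remaining fields and start the training task.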
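The 2023/04/21 entry in the change log gives the relation global batch size = batch_size * devices * gradient_accumulation_steps. As a worked illustration of that arithmetic, here is a minimal sketch; the function name `global_batch_size` and the sample numbers are hypothetical and do not come from any MindOCR config.

```python
def global_batch_size(batch_size, devices, gradient_accumulation_steps=1):
    """Effective number of samples contributing to one optimizer update."""
    for name, value in (("batch_size", batch_size),
                        ("devices", devices),
                        ("gradient_accumulation_steps", gradient_accumulation_steps)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return batch_size * devices * gradient_accumulation_steps


if __name__ == "__main__":
    # Hypothetical setup: per-device batch of 16, 8 devices, 2 accumulation steps.
    print(global_batch_size(16, 8, 2))  # 16 * 8 * 2 = 256
```

With gradient accumulation, the per-device batch can stay small enough to fit in memory while the optimizer still sees the larger effective batch.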

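How to Contribute

We welcome all contributions, including issues and PRs, to make MindOCR better.

Please refer to CONTRIBUTING.md for the contribution guidelines, and follow the Model Template and Guideline to contribute a model that fits all the interfaces. Thank you for your cooperation.

License

This project is released under the Apache License 2.0.

Citation

If this project is helpful to your research, please consider citing: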

@misc{MindSpore OCR 2023,
    title={{MindSpore OCR }:MindSpore OCR Toolbox},
    author={MindSpore Team},
    howpublished = {\url{https://github.com/mindspore-lab/mindocr/}},
    year={2023}
}
