文幕 committed on 2022-08-16 07:45: add dataset desc

Table Recognition Datasets

This page lists commonly used table recognition datasets and is updated continuously. Contributions of additional datasets are welcome.

Dataset Summary

| Dataset | Image download link | PPOCR format annotation download link |
|---|---|---|
| PubTabNet | https://github.com/ibm-aur-nlp/PubTabNet | jsonl format, which can be loaded directly with pubtab_dataset.py |
| TAL Table Recognition Competition Dataset | https://ai.100tal.com/dataset | jsonl format, which can be loaded directly with pubtab_dataset.py |
| WTW Chinese scene table dataset | https://github.com/wangwen-whu/WTW-Dataset | Conversion is required to load with pubtab_dataset.py |
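Before training, it can help to inspect a jsonl annotation file by hand. The following is a minimal sketch, not the loader in pubtab_dataset.py. It assumes the PubTabNet-style record layout (`filename`, `html.structure.tokens`, `html.cells` with an optional `bbox` per cell); these field names are assumptions and should be checked against the file you actually downloaded. The helper names `iter_jsonl` and `summarize` are hypothetical.

```python
import json
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(record: dict) -> dict:
    """Collect a few basic facts about one table annotation.

    Assumed layout (PubTabNet-style, verify against your data):
    record["html"]["structure"]["tokens"] and record["html"]["cells"],
    where a cell may carry a "bbox".
    """
    html = record.get("html", {})
    structure = html.get("structure", {}).get("tokens", [])
    cells = html.get("cells", [])
    return {
        "filename": record.get("filename"),
        "num_structure_tokens": len(structure),
        "num_cells": len(cells),
        "num_cells_with_bbox": sum(1 for c in cells if "bbox" in c),
    }
```

Calling `summarize` on the first few records from `iter_jsonl` is usually enough to confirm that a file follows the expected layout before pointing a training config at it.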

1. PubTabNet

  • Data introduction: The training set of the PubTabNet dataset contains 500,000 images and the validation set contains 9,000 images. Some sample images are shown below.
  • Note: Use of this dataset must comply with the CDLA-Permissive license.
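One way to check a downloaded annotation file against the image counts above is to tally records per split. This is a sketch under the assumption that each jsonl record carries a `split` field (as in the PubTabNet release); the function name `count_splits` is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_splits(lines: Iterable[str], key: str = "split") -> Counter:
    """Count JSONL records per value of `key` (assumed field name)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get(key, "unknown")] += 1
    return counts
```

Since an open text file iterates line by line, the file object can be passed directly as `lines`, which avoids reading the whole annotation file into memory.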

2. TAL Table Recognition Competition Dataset

  • Data introduction: The training set of the TAL table recognition competition dataset contains 16,000 images. No trainable annotations are provided for the validation set.
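Because the validation set has no trainable annotations, one option (an assumption here, not something the dataset prescribes) is to hold out part of the annotated training lines for local evaluation. The sketch below splits a list of jsonl lines reproducibly; `split_train_val`, `val_ratio`, and `seed` are hypothetical names chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(
    lines: Sequence[str], val_ratio: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split annotation lines into (train, val), keeping original order."""
    if not 0.0 < val_ratio < 1.0:
        raise ValueError("val_ratio must be strictly between 0 and 1")
    indices = list(range(len(lines)))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    num_val = max(1, int(len(lines) * val_ratio))
    val_idx = set(indices[:num_val])
    train = [line for i, line in enumerate(lines) if i not in val_idx]
    val = [line for i, line in enumerate(lines) if i in val_idx]
    return train, val
```

Fixing the seed means the same lines land in the held-out set on every run, so evaluation numbers stay comparable across experiments.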

3. WTW Chinese scene table dataset
