雨落辟湖 / IE-Survey

TaoGe committed on 2020-10-05 09:55: Update 关系抽取-学术界.md

Relation Extraction Survey: Academia (work in progress)


1.1. Task Definition

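Relation extraction automatically identifies the semantic relation that holds between entities in a sentence. By the number of participating entities, it divides into binary relation extraction (two entities) and n-ary relation extraction (three or more entities).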

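Focusing on the semantic relation between two entities yields a triple (arg1, relation, arg2), where arg1 and arg2 are the two entities and relation is the semantic relation between them.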
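As a minimal sketch of the triple form described above, the snippet below stores one (arg1, relation, arg2) triple in Python. The class name `RelationTriple`, the example sentence, and the relation label `founder_of` are illustrative assumptions and do not come from any dataset in this survey.

```python
from typing import NamedTuple


class RelationTriple(NamedTuple):
    """One binary relation instance: (arg1, relation, arg2)."""
    arg1: str      # first entity
    relation: str  # semantic relation between the two entities
    arg2: str      # second entity


# Hypothetical example sentence and the triple a system might extract from it.
sentence = "Steve Jobs founded Apple in 1976."
triple = RelationTriple(arg1="Steve Jobs", relation="founder_of", arg2="Apple")

print(triple)
```

An n-ary relation would need more than two argument slots; this sketch covers only the binary case.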

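By the type of data source, relation extraction falls into three kinds:

  • Relation extraction from structured text: tables, XML documents, database records, and so on
  • Relation extraction from unstructured text: plain text
  • Relation extraction from semi-structured text: text that falls between structured and unstructured

By the scope of the text being processed, relation extraction falls into two kinds:

  • Sentence-level relation extraction: decide which semantic relation holds between two entities from a single sentence
  • Corpus-level (document-level) relation extraction: the context in which the two target entities appear is not restricted

By the domain covered, relation extraction again falls into two kinds:

  • Closed-domain relation extraction: extracts semantic relations between entities within one or more restricted domains; the set of relation types is fixed in advance, so the task can be treated as text classification
  • Open-domain relation extraction: the set of relation types is not fixed

Closed-domain relation extraction methods:

  • Template-based methods: extract and classify entity relations using templates that are hand-written or learned; they are limited by template quality and coverage, so they extend poorly
  • Machine-learning-based methods: treat relation extraction as a classification problem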
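The template-based approach in the list above can be sketched with regular expressions. This is an illustrative toy, not a method from any cited paper: the two templates, the relation labels `born_in` and `capital_of`, and the function name `extract_with_templates` are all assumptions made for the example.

```python
import re
from typing import List, Tuple

# Hypothetical hand-written templates: each pairs a surface pattern with a relation label.
TEMPLATES = [
    (re.compile(r"(?P<arg1>[A-Z][\w ]+?) was born in (?P<arg2>[A-Z][\w ]+)"), "born_in"),
    (re.compile(r"(?P<arg1>[A-Z][\w ]+?) is the capital of (?P<arg2>[A-Z][\w ]+)"), "capital_of"),
]


def extract_with_templates(sentence: str) -> List[Tuple[str, str, str]]:
    """Return every (arg1, relation, arg2) triple matched by any template."""
    triples = []
    for pattern, relation in TEMPLATES:
        for match in pattern.finditer(sentence):
            triples.append(
                (match.group("arg1").strip(), relation, match.group("arg2").strip())
            )
    return triples


print(extract_with_templates("Marie Curie was born in Warsaw."))
print(extract_with_templates("Paris is the capital of France."))
# A paraphrase that no template covers yields nothing, which is the coverage limit noted above.
print(extract_with_templates("Warsaw is the birthplace of Marie Curie."))
```

The last call returns an empty list even though the sentence expresses the same fact as the first, which is why such systems depend so heavily on template coverage.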

1.2. Common Datasets

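  • ACE 2005

    Overview: The ACE 2005 corpus, released by the Linguistic Data Consortium (LDC), contains data of various types annotated for entities, relations, and events, including English, Arabic, and Chinese training data. Its goal is to support the development of automatic content extraction technology for processing human language in text form. The ACE corpus covers five recognition subtasks: entities, values, temporal expressions, relations, and events. Each task requires a system to process the language data in a document and then output, per document, information about the entities, values, temporal expressions, relations, and events mentioned or discussed in it.

    Access: The dataset is paid; register on the LDC website, then purchase it. Links: LDC account registration, ACE 2005 download page

  • TACRED

    Overview: TACRED (TAC Relation Extraction Dataset) is a large-scale relation extraction dataset with 106,264 examples, drawn from the newswire and web text in the corpora used by the yearly TAC KBP (TAC Knowledge Base Population) challenges. It covers 41 relation types; sentences that express none of the defined relations are labeled no_relation. For details, see the TACRED documentation.

    Access: The dataset is paid; register as a member on the LDC website, then purchase it. Links: LDC account registration, TACRED download page

  • SemEval2010_task8

    Overview: Given a sentence and two annotated nominals, choose the most suitable relation from a given inventory. The dataset has 9 relation classes plus an Other class, and contains 6,674 instances.

    Access: raw data

  • FewRel

    Overview: FewRel was, at the time of writing, the largest finely annotated relation extraction dataset, released by the Natural Language Processing Lab at Tsinghua University led by Prof. Maosong Sun. It has 100 relation classes and 70,000 instances.

    Access: FewRel website, paper

  • NYT10

    Overview: The text of the NYT-10 dataset comes from The New York Times. Named entities were annotated with the Stanford NER tool in combination with the Freebase knowledge base. Relations between entity pairs were obtained by linking to relations in Freebase through distant supervision. The dataset has 53 relation types, including the special type NA, which means the head and tail entities have no relation.

    Access: raw data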
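The distant-supervision labeling used for NYT10 can be illustrated with a toy sketch: any sentence that mentions both entities of a knowledge-base pair inherits that pair's relation, and pairs missing from the knowledge base get NA. The toy knowledge base, the relation name `place_of_birth`, and the function `distant_label` are assumptions for illustration only; the real dataset uses Stanford NER and Freebase rather than the substring matching shown here.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a knowledge base: (head, tail) -> relation. Hypothetical content.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Hawaii"): "place_of_birth",
}


def distant_label(
    sentence: str, entities: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str]]:
    """Label every ordered entity pair found in the sentence with its KB relation, or NA."""
    labeled = []
    for head in entities:
        for tail in entities:
            if head == tail:
                continue
            # Naive mention detection by substring match (an assumption of this sketch).
            if head in sentence and tail in sentence:
                labeled.append((head, tail, kb.get((head, tail), "NA")))
    return labeled


# The sentence does not say where Obama was born, yet it still receives the KB label.
print(distant_label("Barack Obama visited Hawaii last week.", ["Barack Obama", "Hawaii"], KB))
```

The example shows why distantly supervised data is noisy: the label comes from the knowledge base, not from what the sentence actually states.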

For more relation extraction datasets, see Annotated-Semantic-Relationships-Datasets.

1.3. Evaluation Metrics

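Binary classification

Accuracy = (number of correctly predicted samples) / (total number of samples) = (TP+TN)/(TP+TN+FP+FN)

Precision = (number of samples predicted positive that are truly positive) / (number of samples predicted positive) = TP/(TP+FP)

Recall = (number of samples predicted positive that are truly positive) / (number of truly positive samples) = TP/(TP+FN)

F1 = 2 * (Precision * Recall) / (Precision + Recall)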
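The four formulas above translate directly into code. The sketch below is a plain-Python illustration; the function name `binary_metrics` and the choice to return 0.0 when a denominator is zero are assumptions of this example, not part of the definitions.

```python
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts: many true negatives inflate accuracy while recall stays low.
metrics = binary_metrics(tp=8, tn=80, fp=2, fn=10)
print(metrics)
```

With these made-up counts, accuracy is 0.88 while F1 is only about 0.57, which is one reason relation extraction work usually reports F1 rather than accuracy.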

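Multi-class classification

Macro Average

To compute F1/P/R over N classes, compute F1/P/R for each of the N classes in turn, treating the current class as positive and all other classes as negative, then sum the per-class results and divide by the number of classes.

Micro Average

Count TP, TN, FP, and FN for each class, sum each of the four counts over all classes to obtain pooled TP, TN, FP, and FN, then compute F1/P/R with the same formulas as in the binary case.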
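The difference between the two averages is easiest to see on a small example. The sketch below computes both from gold and predicted label lists; the helper names and the toy labels A, B, C are assumptions for illustration. TN is omitted because precision, recall, and F1 do not use it.

```python
from typing import Dict, Sequence, Tuple

Counts = Tuple[int, int, int]  # (TP, FP, FN) for one class


def per_class_counts(
    gold: Sequence[str], pred: Sequence[str], labels: Sequence[str]
) -> Dict[str, Counts]:
    """One-vs-rest counts: the current label is positive, all others negative."""
    counts = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def macro_f1(counts: Dict[str, Counts]) -> float:
    """Average the per-class F1 scores."""
    return sum(f1_from_counts(*c) for c in counts.values()) / len(counts)


def micro_f1(counts: Dict[str, Counts]) -> float:
    """Pool the counts over all classes, then apply the binary formula once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1_from_counts(tp, fp, fn)


gold = ["A", "A", "A", "A", "B", "B", "C"]
pred = ["A", "A", "A", "B", "B", "C", "C"]
counts = per_class_counts(gold, pred, ["A", "B", "C"])
print(counts, macro_f1(counts), micro_f1(counts))
```

Here the macro average weights the three classes equally, while the micro average weights every sample equally, so the frequent class A dominates the micro score.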

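P@N (precision of the N highest-confidence predictions):

This metric is commonly used in distantly supervised relation extraction. Because the knowledge base is incomplete, an entity pair that the system predicts with high confidence to hold a relation can be judged a negative example simply because the knowledge base lacks it, which underestimates the system's precision. In that case manual evaluation can be used: remove from the predictions the triples already contained in the knowledge base, have annotators judge whether the remaining extracted relation instances are correct, and evaluate the system by the precision of its top N predictions.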
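The manual P@N procedure described above can be sketched as follows. The function name `precision_at_n`, the data layout (a list of triple and confidence pairs, a set of known triples, and a dictionary of human judgments), and the toy values are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def precision_at_n(
    predictions: List[Tuple[Triple, float]],
    kb_triples: Set[Triple],
    human_judgments: Dict[Triple, bool],
    n: int,
) -> float:
    """Drop triples already in the KB, rank the rest by confidence, score the top n."""
    novel = [(t, conf) for t, conf in predictions if t not in kb_triples]
    ranked = sorted(novel, key=lambda item: item[1], reverse=True)[:n]
    if not ranked:
        return 0.0
    # Triples without a human judgment are counted as incorrect in this sketch.
    correct = sum(1 for t, _ in ranked if human_judgments.get(t, False))
    return correct / len(ranked)


# Toy predictions with made-up confidences and hypothetical manual labels.
predictions = [
    (("a", "r", "b"), 0.95),
    (("c", "r", "d"), 0.90),
    (("e", "r", "f"), 0.80),
    (("g", "r", "h"), 0.60),
    (("i", "r", "j"), 0.40),
]
kb_triples = {("a", "r", "b")}
human_judgments = {
    ("c", "r", "d"): True,
    ("e", "r", "f"): False,
    ("g", "r", "h"): True,
    ("i", "r", "j"): False,
}

print(precision_at_n(predictions, kb_triples, human_judgments, n=2))
print(precision_at_n(predictions, kb_triples, human_judgments, n=3))
```

The highest-confidence prediction is excluded because the knowledge base already contains it, so the ranking starts from the second prediction.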

1.4. SOTA

Relation Extraction on TACRED

| Model | Average F1 | Paper | Year | Paper link | Code |
| --- | --- | --- | --- | --- | --- |
| BERTEM+MTB | 71.5 | Matching the Blanks: Distributional Similarity for Relation Learning | 2019 | https://arxiv.org/pdf/1906.03158v1.pdf | https://github.com/plkmo/BERT-Relation-Extraction |
| KnowBert-W+W | 71.5 | Knowledge Enhanced Contextual Word Representations | 2019 | https://arxiv.org/pdf/1909.04164v2.pdf | |
| DG-SpanBERT | 71.5 | Efficient long-distance relation extraction with DG-SpanBERT | 2020 | https://arxiv.org/pdf/2004.03636v1.pdf | |
| SpanBERT | 70.8 | SpanBERT: Improving Pre-training by Representing and Predicting Spans | 2019 | https://arxiv.org/pdf/1907.10529v3.pdf | https://github.com/facebookresearch/SpanBERT |
| R-BERT | 69.4 | Enriching Pre-trained Language Model with Entity Information for Relation Classification | 2020 | https://arxiv.org/pdf/1905.08284v1.pdf | https://github.com/wang-h/bert-relation-classification |
| C-GCN + PA-LSTM | 68.2 | Graph Convolution over Pruned Dependency Trees Improves Relation Extraction | 2018 | https://arxiv.org/pdf/1809.10185v1.pdf | https://github.com/qipeng/gcn-over-pruned-trees |

Relation Extraction on SemEval-2010 Task 8

| Model | Average F1 | Paper | Year | Paper link | Code |
| --- | --- | --- | --- | --- | --- |
| Skeleton-Aware BERT | 90.36 | Enhancing Relation Extraction Using Syntactic Indicators and Sentential Contexts | 2019 | https://arxiv.org/pdf/1912.01858v1.pdf | https://github.com/wang-h/bert-relation-classification |
| EPGNN | 90.2 | Improving Relation Classification by Entity Pair Graph | 2019 | http://proceedings.mlr.press/v101/zhao19a/zhao19a.pdf | |
| BERTEM+MTB | 89.5 | Matching the Blanks: Distributional Similarity for Relation Learning | 2019 | https://arxiv.org/pdf/1906.03158v1.pdf | https://github.com/plkmo/BERT-Relation-Extraction |
| R-BERT | 89.25 | Enriching Pre-trained Language Model with Entity Information for Relation Classification | 2020 | https://arxiv.org/pdf/1905.08284v1.pdf | https://github.com/wang-h/bert-relation-classification |
| KnowBert-W+W | 89.1 | Knowledge Enhanced Contextual Word Representations | 2019 | https://arxiv.org/pdf/1909.04164v2.pdf | |
| Entity-Aware BERT | 89 | Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers | 2019 | https://arxiv.org/pdf/1902.01030v2.pdf | https://github.com/helloeve/mre-in-one-pass |

Relation Extraction on ACE 2005

| Model | Relation F1 | Entity F1 | Sentence encoder | Paper | Year | Paper link | Code |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MRC4ERE++ | 62.1 | 85.5 | BERT base | Asking Effective and Diverse Questions: A Machine Reading Comprehension based Framework for Joint Entity-Relation Extraction | 2020 | https://www.ijcai.org/Proceedings/2020/0546.pdf | https://github.com/TanyaZhao/MRC4ERE |
| Multi-turn QA | 60.2 | 84.8 | BERT base | Entity-Relation Extraction as Multi-Turn Question Answering | 2019 | https://arxiv.org/pdf/1905.05529v4.pdf | |
| MRT | 59.6 | 83.6 | biLSTM | Extracting Entities and Relations with Joint Minimum Risk Training | 2018 | https://www.aclweb.org/anthology/D18-1249 | |
| GCN | 59.1 | 84.2 | biLSTM | Joint Type Inference on Entities and Relations via Graph Convolutional Networks | 2019 | https://www.aclweb.org/anthology/P19-1131 | |
| Global | 57.5 | 83.6 | biLSTM | End-to-End Neural Relation Extraction with Global Optimization | 2017 | https://www.aclweb.org/anthology/D17-1182 | |
| SPTree | 55.6 | 83.4 | biLSTM | End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures | 2016 | https://arxiv.org/pdf/1601.00770v3.pdf | https://github.com/tticoin/LSTM-ER |

Relation Extraction on ACE 2004

| Model | Relation F1 | Entity F1 | Paper | Year | Paper link | Code |
| --- | --- | --- | --- | --- | --- | --- |
| DYGIE | 59.7 | 87.4 | A General Framework for Information Extraction using Dynamic Span Graphs | 2019 | https://arxiv.org/pdf/1904.03296v1.pdf | https://github.com/luanyi/DyGIE |
| Multi-turn QA | 49.4 | 83.6 | Entity-Relation Extraction as Multi-Turn Question Answering | 2019 | https://arxiv.org/pdf/1905.05529v4.pdf | |
| SPTree | 48.4 | 81.8 | End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures | 2016 | https://arxiv.org/pdf/1601.00770v3.pdf | https://github.com/tticoin/LSTM-ER |
| multi-head + AT | 47.45 | 81.64 | Adversarial training for multi-context joint entity and relation extraction | 2018 | https://arxiv.org/pdf/1808.06876v3.pdf | https://github.com/bekou/multihead_joint_entity_relation_extraction |
| multi-head | 47.14 | 81.16 | Joint entity recognition and relation extraction as a multi-head selection problem | 2018 | https://arxiv.org/pdf/1804.07847v3.pdf | https://github.com/bekou/multihead_joint_entity_relation_extraction |
| Attention | 45.7 | 79.6 | Going out on a limb: Joint Extraction of Entity Mentions and Relations without Dependency Trees | 2017 | https://www.aclweb.org/anthology/P17-1085 | |

Relation Extraction on NYT

| Model | Average F1 | Paper | Year | Paper link | Code |
| --- | --- | --- | --- | --- | --- |
| REDN | 89.8 | Downstream Model Design of Pre-trained Language Model for Relation Extraction Task | 2020 | https://arxiv.org/pdf/2004.03786v1.pdf | https://github.com/slczgwh/REDN |
| CASREL | 89.6 | A Novel Cascade Binary Tagging Framework | 2019 | https://arxiv.org/pdf/1909.03227v4.pdf | https://github.com/weizhepei/CasRel |
| HBT | 89.5 | A Novel Cascade Binary Tagging Framework for Relational Triple Extraction | 2019 | https://arxiv.org/pdf/1909.03227v4.pdf | https://github.com/weizhepei/CasRel |
| WDec | 84.4 | Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction | 2019 | https://arxiv.org/pdf/1911.09886v1.pdf | https://github.com/nusnlp/PtrNetDecoding4JERE |
| ETL-Span | 78.0 | Joint Extraction of Entities and Relations Based on a Novel Decomposition Strategy | 2019 | https://arxiv.org/pdf/1909.04273v3.pdf | https://github.com/yubowen-ph/JointER |
| CopyRE' OneDecoder | 72.2 | CopyMTL: Copy Mechanism for Joint Extraction of Entities and Relations with Multi-Task Learning | 2019 | https://arxiv.org/pdf/1911.10438v1.pdf | https://github.com/WindChimeRan/CopyMTL |

Relation Extraction on CoNLL04

| Model | Relation F1 | Entity F1 | Paper | Year | Paper link | Code |
| --- | --- | --- | --- | --- | --- | --- |
| SpERT | 71.47 | 88.94 | Span-based Joint Entity and Relation Extraction with Transformer Pre-training | 2019 | https://arxiv.org/pdf/1909.07755v3.pdf | https://github.com/markus-eberts/spert |
| Multi-turn QA | 68.9 | 87.8 | Entity-Relation Extraction as Multi-Turn Question Answering | 2019 | https://arxiv.org/pdf/1905.05529v4.pdf | |
| Global | 67.8 | 85.6 | End-to-End Neural Relation Extraction with Global Optimization | 2017 | https://www.aclweb.org/anthology/D17-1182 | |
| Biaffine attention | 64.40 | 86.20 | End-to-end neural relation extraction using deep biaffine attention | 2018 | https://arxiv.org/pdf/1812.11275v1.pdf | https://github.com/datquocnguyen/jointRE |
| Relation-Metric with AT | 62.29 | 84.15 | Neural Metric Learning for Fast End-to-End Relation Extraction | 2019 | https://arxiv.org/pdf/1905.07458v4.pdf | |
| multi-head | 62.04 | 83.9 | Joint entity recognition and relation extraction as a multi-head selection problem | 2018 | https://arxiv.org/pdf/1804.07847v3.pdf | https://github.com/bekou/multihead_joint_entity_relation_extraction |

Relation Extraction on FewRel

| Model | Average F1 | Paper | Year | Paper link | Code |
| --- | --- | --- | --- | --- | --- |
| ERNIE | 88.32 | ERNIE: Enhanced Language Representation with Informative Entities | 2019 | https://arxiv.org/pdf/1905.07129v3.pdf | https://github.com/thunlp/ERNIE |

6. Paper List

6.1. Papers

6.1.1. Supervised methods

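6.1.1.1. Methods that use syntactic information

| Paper | Extraction task | Keywords | Paper link | Venue and year | Code |
| --- | --- | --- | --- | --- | --- |
| Attention Guided Graph Convolutional Networks for Relation Extraction | Relation extraction | Attention Guided Graph Convolutional Networks (AGGCN); semantic dependency tree; soft pruning; automatically learned substructures | https://www.aclweb.org/anthology/P19-1024.pdf | ACL 2019 | |
| A Richer-but-Smarter Shortest Dependency Path with Attentive Augmentation for Relation Extraction | Relation extraction | Shortest dependency path (SDP); attention model; deep neural model; LSTM network; CNN | https://www.aclweb.org/anthology/N19-1298 | NAACL 2019 | https://github.com/catcd/RbSP |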
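6.1.1.2. Methods that do not use syntactic information

| Paper | Extraction task | Keywords | Paper link | Venue and year | Code |
| --- | --- | --- | --- | --- | --- |
| Joint Type Inference on Entities and Relations via Graph Convolutional Networks | Joint triple extraction | Joint inference of entities and relations; graph convolutional network (GCN); binary relation classification | https://pdfs.semanticscholar.org/7ce8/ce2768907421fb1a6cbfe13a8a36992721a7.pdf | ACL 2019 | |
| GraphRel: Modeling Text as Relational Graphs for Joint Entity and Relation Extraction | Joint triple extraction | End-to-end relation extraction; graph convolutional network | https://tsujuifu.github.io/pubs/acl19_graph-rel.pdf | ACL 2019 | |
| Exploiting Entity BIO Tag Embeddings and Multi-task Learning for Relation Extraction with Imbalanced Data | Relation extraction | BIO character/word embeddings; multi-task architecture; relation classification | https://arxiv.org/pdf/1906.08931.pdf | ACL 2019 | |
| Entity-Relation Extraction as Multi-turn Question Answering | Relation extraction | Multi-turn QA; identifying answer spans from context | https://arxiv.org/pdf/1905.05529.pdf | ACL 2019 | |
| Graph Neural Networks with Generated Parameters for Relation | Relation extraction | Graph neural network (GNN); multi-hop relational reasoning | https://arxiv.org/pdf/1902.00756.pdf | ACL 2019 | |
| Kernelized Hashcode Representations for Biomedical Relation Extraction | Relation classification | Kernelized locality-sensitive hashing (KLSH); reduced computational cost | https://arxiv.org/pdf/1711.04044.pdf | ACL 2019 | |
| Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs | Relation extraction | Graph neural network model; document-level relation extraction | https://arxiv.org/pdf/1909.00228v1.pdf | EMNLP 2019 | |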

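6.1.2. Distantly supervised methods

| Paper | Extraction task | Keywords | Paper link | Venue and year | Code |
| --- | --- | --- | --- | --- | --- |
| Hybrid Attention-based Prototypical Networks for Noisy Few-Shot Relation Classification | Relation classification | Distant supervision; noise; hybrid attention-based prototypical networks | https://gaotianyu1350.github.io/assets/aaai2019_hatt_paper.pdf | AAAI 2019 | https://github.com/thunlp/HATT-Proto |
| A Hierarchical Framework for Relation Extraction with Reinforcement Learning | Relation extraction | Stronger interaction between relation types and entities; hierarchical reinforcement learning (HRL) framework; distantly supervised datasets | https://arxiv.org/pdf/1811.03925.pdf | AAAI 2019 | |
| Cross-relation Cross-bag Attention for Distantly-supervised Relation Extraction | Relation extraction | Noise-robust distant supervision; Cross-relation Cross-bag Selective Attention; multi-instance learning; sentence level; attention mechanism; focus on high-quality entity pairs | https://arxiv.org/pdf/1812.10604.pdf | AAAI 2019 | |
| Structured Minimally Supervised Learning for Neural Relation Extraction | Relation extraction | Minimal supervision; learned representations; structured learning | https://arxiv.org/pdf/1904.00118.pdf | NAACL 2019 | |
| Combining Distant and Direct Supervision for Neural Relation Extraction | Relation extraction | Denoising; supervised learning combined with a distantly supervised model | https://arxiv.org/pdf/1810.12956.pdf | NAACL 2019 | https://github.com/allenai/comb_dist_direct_relex/ |
| Distant Supervision Relation Extraction with Intra-Bag and Inter-Bag Attentions | Relation extraction | Sentence-level attention | https://www.aclweb.org/anthology/N19-1288.pdf | NAACL 2019 | |
| GAN Driven Semi-distant Supervision for Relation Extraction | Relation extraction | Semi-distant supervision; generative adversarial network (GAN) | https://www.aclweb.org/anthology/N19-1307 | NAACL 2019 | |
| Improving Distantly-Supervised Relation Extraction with Joint Label Embedding | Relation extraction | Multi-layer attention model; joint label embedding | https://www.aclweb.org/anthology/D19-1395.pdf | EMNLP 2019 | |
| Self-Attention Enhanced CNNs and Collaborative Curriculum Learning for Distantly Supervised Relation Extraction | Relation extraction | Collaborative learning; convolutional neural network (CNN); self-attention inside the convolution operation | https://www.aclweb.org/anthology/D19-1037.pdf | EMNLP 2019 | |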