Paper translation: SENT: Sentence-level Distant Relation Extraction via Negative Training

$Ruotian\ Ma^1, Tao\ Gui^{2*}, Linyang\ Li^1, Qi\ Zhang^{1*}, Yaqian\ Zhou^1\ and\ Xuanjing\ Huang^1$
$^1$School of Computer Science, Fudan University, Shanghai, China
$^2$Institute of Modern Languages and Linguistics, Fudan University, Shanghai, China
$\{rtma19,tgui16,linyangli19,qz,yqzhou,xjhuang\}@fudan.edu.cn$

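Abstract

Distant supervision for relation extraction provides a uniform bag-level label for every sentence inside a bag, while accurate sentence labels are important for downstream tasks that need the exact relation type. Directly using bag-level labels for sentence-level training introduces much noise and severely degrades performance. In this work, we propose the use of negative training (NT), in which a model is trained with complementary labels stating that "the instance does not belong to these complementary labels". Since the probability of selecting the true label as a complementary label is low, NT provides less noisy information. Furthermore, a model trained with NT is able to separate the noisy data from the training data. Based on NT, we propose a sentence-level framework, SENT, for distantly supervised relation extraction. SENT not only filters the noisy data to construct a cleaner dataset, but also performs a re-labeling process that transforms the noisy data into useful training data, which further improves the model's performance. Experimental results show that the proposed method significantly improves over previous methods on sentence-level evaluation and on de-noising effect.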

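1 Introduction

Relation extraction (RE), which aims to extract the relation between entity pairs from unstructured text, is a fundamental task in natural language processing. The extracted relation facts can benefit various downstream applications, e.g., knowledge graph completion (Bordes et al., 2013; Wang et al., 2014), information extraction (Wu and Weld, 2010) and question answering (Yao and Van Durme, 2014; Fader et al., 2014).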

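A significant challenge for relation extraction is the lack of large-scale labeled data. Distant supervision (Mintz et al., 2009) was therefore proposed to gather training data through automatic alignment between a database and plain text. Such an annotation paradigm results in an inevitable noise problem, which previous studies have alleviated using multi-instance learning (MIL). In MIL, the training and testing processes are performed at the bag level, where a bag contains noisy sentences that mention the same entity pair but possibly do not describe the same relation. Studies using MIL can be broadly classified into two categories: 1) soft de-noising methods that leverage soft weights to differentiate the influence of each sentence (Lin et al., 2016; Han et al., 2018c; Li et al., 2020; Hu et al., 2019a; Ye and Ling, 2019; Yuan et al., 2019a,b); 2) hard de-noising methods that remove noisy sentences from the bag (Zeng et al., 2015; Qin et al., 2018; Han et al., 2018a; Shang, 2019).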

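However, these bag-level approaches fail to map each sentence inside a bag to an explicit sentence label. This problem limits the application of RE in downstream tasks that require sentence-level relation types; for example, Yao and Van Durme (2014) and Xu et al. (2016) use sentence-level relation extraction to identify the relation between the answer and the entity in the question. Therefore, several studies (Jia et al., 2019; Feng et al., 2018) have made efforts on sentence-level (or instance-level) distantly supervised RE and have empirically verified the deficiency of bag-level methods on sentence-level evaluation. However, the instance selection approaches of these methods depend on rewards (Feng et al., 2018) or frequent patterns (Jia et al., 2019) determined by bag-level labels, which contain much noise. On the one hand, a bag may be assigned multiple bag-level labels, which makes a one-to-one mapping between sentences and labels difficult. As shown in Figure 1, for the sentence "Obama was born in the United States.", we cannot tell whether the exact relation is "place_of_birth" or "employee_of". On the other hand, the sentences inside a bag might not express the bag relations. In Figure 1, the sentence "Obama was back to the United States" actually expresses the relation "live_in", which is not included in the bag-level labels.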

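Figure 1: Two types of noise exist in bag-level labels: 1) multi-label noise: the exact label ("place_of_birth" or "employee_of") of each sentence is unclear; 2) wrong-label noise: the third sentence inside the bag actually expresses "live_in", which is not included in the bag-level labels.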

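In this work, we propose the use of negative training (NT) (Kim et al., 2019) for distantly supervised RE. Different from positive training (PT), NT trains a model by selecting a complementary label of the given label, i.e., "the input sentence does not belong to this complementary label". Since the probability of selecting the true label as a complementary label is low, NT decreases the risk of providing noisy information and prevents the model from overfitting the noisy data. Moreover, a model trained with NT is able to separate the noisy data from the training data (the histograms in Figure 3 show the separated data distribution during NT). Based on NT, we propose SENT, a sentence-level framework for distantly supervised RE. During SENT training, the noisy instances are not only filtered with a noise-filtering strategy, but also transformed into useful training data with a re-labeling method. We further design an iterative training algorithm to take full advantage of these data-refining processes, which significantly boosts performance. Our code is publicly available on GitHub (https://github.com/rtmaww/SENT).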

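To summarize, the contributions of this work are as follows:

We propose the use of negative training for sentence-level distantly supervised RE, which greatly protects the model from noisy information.
We present a sentence-level framework, SENT, which includes a noise-filtering strategy and a re-labeling strategy for refining distantly supervised data.
The proposed method achieves significant improvement over previous methods in terms of both RE performance and de-noising effect.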

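2 Related Work

2.1 Distant Supervision for Relation Extraction

Supervised relation extraction (RE) has been constrained by the lack of large-scale labeled data. Therefore, Mintz et al. (2009) introduced distant supervision (DS), which uses an existing knowledge base (KB) as the source of supervision instead of annotated text. Riedel et al. (2010) relaxed the DS assumption to the expressed-at-least-once assumption. As a result, multi-instance learning was introduced for this task (Riedel et al., 2010; Hoffmann et al., 2011; Surdeanu et al., 2012), where the training and evaluation processes are performed at the bag level, with potentially noisy sentences existing in each bag. Most subsequent studies on distantly supervised RE adopt this paradigm and aim to reduce the impact of noisy sentences in each bag. These studies include attention-based methods that attend to useful information (Lin et al., 2016; Han et al., 2018c; Li et al., 2020; Hu et al., 2019a; Ye and Ling, 2019; Yuan et al., 2019a; Zhu et al., 2019; Yuan et al., 2019b; Wu et al., 2017), selection strategies such as RL or adversarial training that remove noisy sentences from the bag (Zeng et al., 2015; Shang, 2019; Qin et al., 2018; Han et al., 2018a), and the incorporation of extra information such as KGs, multi-lingual corpora or other information (Ji et al., 2017; Lei et al., 2018; Vashishth et al., 2018; Han et al., 2018b; Zhang et al., 2019; Qu et al., 2019; Verga et al., 2016; Lin et al., 2017; Wang et al., 2018; Deng and Sun, 2019; Beltagy et al., 2019). Other approaches include soft-label strategies for de-noising (Liu et al., 2017), leveraging pre-trained LMs (Alt et al., 2019), pattern-based methods (Zheng et al., 2019), structured learning methods (Bai and Ritter, 2019), and others (Luo et al., 2017; Chen et al., 2019).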

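In this work, we focus on sentence-level relation extraction. Several previous studies also perform distantly supervised RE at the sentence level. Feng et al. (2018) proposed a reinforcement learning framework for sentence selection, where the reward is given by the classification scores on bag-level labels. Jia et al. (2019) built an initial training set and further selected confident instances based on selected patterns. The difference between the proposed work and previous work is that we do not rely on bag-level labels for sentence selection. Furthermore, we leverage NT to dynamically separate the noisy data from the training data, which makes it possible to use diversified clean data.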

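2.2 Learning with Noisy Data

Learning with noisy data is a widely discussed problem in deep learning, especially in computer vision. Existing approaches include robust learning methods, such as leveraging a robust loss function or regularization method (Lyu and Tsang, 2020; Zhang and Sabuncu, 2018; Hu et al., 2019b; Kim et al., 2019), re-weighting the loss of potentially noisy samples (Ren et al., 2018; Jiang et al., 2018), modeling the corruption probability with a transition matrix (Goldberger and Ben-Reuven, 2016; Xia et al.), and so on. Another line of research tries to recognize or even correct the noisy instances in the training data (Malach and Shalev-Shwartz, 2017; Yu et al., 2019; Arazo et al., 2019; Li et al., 2019).

In this paper, we focus on the noisy-label problem in distantly supervised RE. We first leverage a robust negative loss (Kim et al., 2019) for model training. We then develop a new iterative training algorithm for noise selection and correction.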

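3 Methodology

To achieve sentence-level relation classification using bag-level labels in distantly supervised RE, we propose a framework, SENT, which contains three main steps (as shown in Figure 2): (1) separating the noisy data from the training data with negative training (Section 3.1); (2) filtering the noisy data and re-labeling a portion of confident instances (Section 3.2); (3) leveraging an effective training algorithm based on (1) and (2) to further boost performance (Section 3.3).

Figure 2: An overview of the proposed framework, SENT, for sentence-level distantly supervised RE. It includes three steps: (1) negative training that separates the noisy data from the training data; (2) noise filtering and re-labeling; (3) iterative training that further boosts performance.

$^2$Here, we randomly select one of the multiple bag-level labels for single-label relation classification. See Section 4.2 for details.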

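3.1 Negative Training on Distant Data

In order to perform robust training on noisy distantly supervised data, we propose the use of negative training (NT), which trains a model on the concept that "the input sentence does not belong to this complementary label". We find that NT not only provides less noisy information, but also separates the noisy data from the clean data during training.

3.1.1 Positive Training

Positive training (PT) trains the model to predict the given label, based on the concept that "the input sentence belongs to this label". Here, given any input $s$ with a label $y^* \in \mathcal{R} = \{1, 2, \dots, C\}$, $y \in \{0, 1\}^C$ is the $C$-dimensional one-hot vector of $y^*$. We denote $p = f(s)$ as the probability vector of a sentence given by a relation classifier $f(\cdot)$. With the cross-entropy loss function, the loss defined in typical positive training is: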

$L_{PT}(f,y^*)=-\sum_{k=1}^{C}y_k \log p_k \qquad (1)$

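where $p_k$ denotes the probability of the $k$-th label. Optimizing Eq. 1 meets the requirement of PT, since the probability of the given label approaches 1 as the loss decreases.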

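3.1.2 Negative Training

In negative training (NT), for each input $s$ with a label $y^* \in \mathcal{R}$, we generate a complementary label $\bar{y^*}$ by randomly sampling from the label space excluding $y^*$, i.e., $\bar{y^*} \in \mathcal{R} \setminus \{y^*\}$. With the cross-entropy loss function, we define the loss in negative training as: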

$L_{NT}(f,y^*)=-\sum_{k=1}^{C}\bar{y_k}\log (1-p_k) \qquad (2)$

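Here $\bar{y_k}$ is the $k$-th entry of the one-hot vector of the complementary label $\bar{y^*}$. Different from PT, Eq. 2 aims to reduce the probability of the complementary label: as the loss decreases, $p_k \to 0$.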
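To make the contrast between Eq. 1 and Eq. 2 concrete, the following is a minimal sketch of the two losses for a single instance. With one-hot vectors, each sum reduces to a single term. The function names and the example probability vector are illustrative assumptions, not part of the released SENT code.

```python
import math

def pt_loss(probs, label):
    """Positive-training loss (Eq. 1) for one instance: -log p_label."""
    return -math.log(probs[label])

def nt_loss(probs, complementary_label):
    """Negative-training loss (Eq. 2) for one instance: -log(1 - p_complementary)."""
    return -math.log(1.0 - probs[complementary_label])

# Hypothetical classifier output over C = 3 relation types.
probs = [0.7, 0.2, 0.1]

# PT pushes the probability of the given label toward 1.
print(pt_loss(probs, 0))
# NT pushes the probability of the complementary label toward 0:
# the loss is small when that label is already unlikely.
print(nt_loss(probs, 2))
```

Note that the NT loss only says which label an instance is not, so a wrongly labeled instance is penalized only when its true label happens to be drawn as the complementary label.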

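To further illustrate the effect of NT, we train a classifier with PT and with NT, respectively, on a constructed TACRED dataset with 30% noise (details in Section 4.1). The histograms of the training data after PT and NT are shown in Figure 3$^3$. Figures 3(a) and 3(b) show that, when training with PT, the confidence of both clean data and noisy data increases with no difference, causing the model to overfit the noisy training data. On the contrary, when training with NT, the confidence of noisy data is much lower than that of clean data. This result confirms that a model trained with NT suffers less from overfitting noisy data, since NT provides less noisy information. Moreover, as the confidence values of clean data and noisy data are separated from each other, we are able to filter noisy data with a certain threshold. Figure 4 shows the details of the data-filtering effect. After the first iteration of NT, a modest threshold yields noise filtering with 97% precision and about 50% recall, which further verifies the effectiveness of NT for training on noisy data.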

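Figure 3: Data distribution when training with PT and SENT. (a) During PT, the confidence of clean data and noisy data increases simultaneously; (b) during NT, the confidence of noisy data is much lower than that of clean data; (c) after training with SENT, the clean and noisy data are further separated; (d) PT after SENT helps the clean data converge better.

$^3$When plotting the histograms, we omit the large amount of "NA"-class data (80% of the training data) for a clearer representation of the positive-class data.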

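3.2 Noise Filtering and Re-labeling

In Section 3.1, we illustrated the effectiveness of NT for training on noisy data, as well as its ability to recognize noisy instances. While filtering noisy data is important for training on distant data, the filtered data contain useful information that can boost performance if properly re-labeled. In this section, we describe the proposed noise-filtering and label-recovering strategies for refining distantly supervised data based on NT.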

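3.2.1 Filtering Noisy Data

As mentioned above, it is intuitive to construct a filtering strategy based on a certain threshold after NT. However, in distantly supervised RE, the long-tail problem cannot be neglected. During training, the degree of convergence differs across classes. Simply setting a uniform threshold might harm the data distribution, since instances of long-tail relations would be largely filtered out. Therefore, we leverage a dynamic threshold to filter noisy data. Suppose the probability of class $c$ for the $i$-th instance is $p^i_c \in (0, p^h_c]$, where $p^h_c$ is the maximum probability value in class $c$. Empirically, we assume that the probability values follow a distribution in which the noisy data lie largely in the low-value area, while the clean data generally lie in the middle- or high-value area. Therefore, the filtering threshold of class $c$ is set to: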

$Th_c = Th \cdot p^h_c, \quad p^h_c = \max_{i=1}^{N} \{p^i_c\} \qquad (3)$

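where $Th$ is the global threshold. In this way, the noise-filtering threshold not only depends on the degree of convergence of each class, but also changes dynamically during the training phase, which makes it more suitable for noise filtering on long-tail data.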
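The dynamic threshold of Eq. 3 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes that $p^i_c$ is the probability the classifier assigns to the label currently attached to instance $i$, that the maximum is taken over the instances carrying that label, and that an instance is treated as noisy when its probability falls strictly below $Th_c$. The function names are hypothetical.

```python
from collections import defaultdict

def class_thresholds(probs, labels, global_th):
    """Eq. 3: Th_c = Th * max_i p^i_c, taken over instances currently labeled c."""
    max_prob = defaultdict(float)
    for p, c in zip(probs, labels):
        max_prob[c] = max(max_prob[c], p[c])
    return {c: global_th * m for c, m in max_prob.items()}

def split_noisy(probs, labels, global_th):
    """Return (clean_indices, noisy_indices) using the per-class thresholds."""
    th = class_thresholds(probs, labels, global_th)
    clean, noisy = [], []
    for i, (p, c) in enumerate(zip(probs, labels)):
        (noisy if p[c] < th[c] else clean).append(i)
    return clean, noisy

# Toy example: class 0 has converged well, class 1 (a tail class) has not.
probs = [[0.8, 0.2], [0.1, 0.9], [0.7, 0.3], [0.9, 0.1]]
labels = [0, 0, 1, 1]
print(split_noisy(probs, labels, global_th=0.5))  # ([0, 2], [1, 3])
```

In this toy example a uniform threshold of 0.4 would also discard instance 2 (probability 0.3 on its tail-class label), whereas the per-class threshold keeps it because the whole class has low confidence.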

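3.2.2 Re-labeling Useful Data

After noise filtering, the noisy instances are regarded as unlabeled data, which also contain useful information for training. Here, we design a simple strategy for re-labeling these unlabeled data. Given a set of filtered data $D_u = \{s_1, \dots, s_m\}$, we use the classifier trained in the current iteration to predict the probability vectors $\{p_1, \dots, p_m\}$. We then re-label these instances by: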

$\hat{y_i} = \arg\max_{k} \{p^i_k\}, \quad \text{if } \max_k \{p^i_k\} > Th_{relabel} \qquad (4)$

where $p^i_k$ is the probability of the $i$-th instance in class $k$, and $Th_{relabel}$ is the re-labeling threshold.
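A minimal sketch of the re-labeling rule in Eq. 4 is given below. Representing an instance that stays unlabeled with `None` is an assumption made for illustration, as is the function name.

```python
def relabel(prob_vectors, th_relabel):
    """Eq. 4: assign the argmax class when its probability exceeds th_relabel.

    Instances whose most likely class is not confident enough stay unlabeled (None).
    """
    new_labels = []
    for p in prob_vectors:
        k = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        new_labels.append(k if p[k] > th_relabel else None)
    return new_labels

# Two filtered instances: one confident prediction, one ambiguous.
print(relabel([[0.05, 0.9, 0.05], [0.4, 0.35, 0.25]], th_relabel=0.7))  # [1, None]
```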

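3.3 Iterative Training Algorithm

Although effective, simply running a single pipeline of NT, noise filtering and re-labeling does not take full advantage of each part, so the model performance can be further boosted through iterative training.

As shown in Figure 2, in each iteration we first train the classifier on the noisy data using NT: for each instance, we randomly sample $K$ complementary labels and compute the loss on these labels with Eq. 2. After $M$ epochs of negative training, the noise-filtering and re-labeling processes are carried out to update the training data. Next, we perform a new iteration of training on the newly refined data. Here, we re-initialize the classifier in every iteration for two reasons. First, re-initialization ensures that in each iteration the new classifier is trained on a dataset of higher quality. Second, re-initialization introduces randomness, which contributes to more robust data filtering. Finally, we stop the iteration once the best result on the dev set has been observed. We then perform one more round of noise filtering and re-labeling with the best model from the last iteration to obtain the final refined data.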
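The control flow of the iterative procedure can be sketched as below. This is a skeleton under stated assumptions, not the released implementation: `make_model`, `train_nt`, `refine` and `evaluate` are hypothetical callables supplied by the caller, complementary labels are drawn uniformly with replacement, and the loop stops at the first iteration whose dev score does not improve.

```python
import random

def sample_complementary(label, num_classes, k, rng=random):
    """Draw k complementary labels uniformly from the label space minus `label`."""
    candidates = [c for c in range(num_classes) if c != label]
    return [rng.choice(candidates) for _ in range(k)]

def sent_iterations(data, labels, make_model, train_nt, refine, evaluate, max_iters):
    """Outer SENT loop: re-initialize, train with NT, check dev score, refine the data."""
    best_score, best_model = float("-inf"), None
    for _ in range(max_iters):
        model = make_model()                  # fresh classifier in every iteration
        train_nt(model, data, labels)         # M epochs of NT with K complementary labels
        score = evaluate(model)               # dev-set score of this iteration
        if score <= best_score:
            break                             # the previous iteration was the best one
        best_score, best_model = score, model
        labels = refine(model, data, labels)  # noise filtering + re-labeling
    return best_model, labels
```

With this structure, the labels returned at the end are those produced by filtering and re-labeling with the best model, which matches the final refinement round described above.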

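Figure 3(c) shows the data distribution after a certain number of SENT iterations. As can be seen, the noisy and clean data are separated by a large margin. Most of the noisy data are successfully filtered out, and the amount of clean data filtered by mistake is acceptable. However, we can see that the model trained with NT still lacks convergence (low-confidence predictions). Therefore, we train the classifier with PT on the iteratively refined data for better convergence. As shown in Figure 3(d), after PT training the model predicts most of the clean data with high confidence.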

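4 Experiments

The experiments in this work are divided into two parts, conducted on two datasets: the NYT-10 dataset (Riedel et al., 2010) and the TACRED dataset (Zhang et al., 2017).

The first part studies the effectiveness of the method on sentence-level evaluation for distantly supervised RE. Different from bag-level evaluation, sentence-level evaluation computes precision (Prec.), recall (Rec.) and F1 directly on all individual instances in the dataset. In this part, we use the NYT-10 dataset for sentence-level training, following the setting of Jia et al. (2019), who released a manually labeled sentence-level test set$^4$. They also released a test set for evaluating noise-filtering ability. Details of the adopted datasets are shown in Table 1.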

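Table 1: Statistics of the datasets$^6$. "Positive" refers to positive instances that are not labeled "NA". Note that the positive instances of noisy-TACRED include false-positive noise, and that the amount of noise in NYT-10 is unknown because of inaccurate annotation.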

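We construct the second part of the experiments (Section 4.4) to better understand the behavior of SENT. Since no labeled training data are available in the distantly supervised setting, we construct a noisy dataset with 30% noise from the labeled dataset TACRED (Zhang et al., 2017)$^5$. We refer to this constructed dataset as noisy-TACRED. We choose this dataset because 80% of the instances in its training data are "no_relation". This "NA" rate is similar to that of the NYT data, which contain 70% "NA" relation types, so the analysis on this dataset is more credible.

When constructing noisy-TACRED, the noisy instances are selected uniformly with a 30% noise ratio. Each noisy label is then created by sampling a label from the complementary classes, weighted by class frequency (in order to maintain the data distribution). Note that the original dataset consists of 80% "no_relation" data, which means that 80% of the noisy instances are "false-positive" instances, corresponding to the large amount of "false-positive" noise in NYT-10. Table 1 also shows the details of noisy-TACRED.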
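The noise-injection procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the function name, the use of a seeded `random.Random`, and the rounding of the number of corrupted instances are assumptions.

```python
import random
from collections import Counter

def inject_noise(labels, noise_ratio, rng):
    """Corrupt a uniformly chosen fraction of labels.

    Each corrupted label is sampled from the other classes, weighted by class
    frequency so that the overall label distribution is roughly preserved.
    Returns the noisy label list and the set of corrupted indices.
    """
    freq = Counter(labels)
    classes = sorted(freq)
    noisy = list(labels)
    n_flip = round(noise_ratio * len(labels))
    flip_idx = rng.sample(range(len(labels)), n_flip)
    for i in flip_idx:
        candidates = [c for c in classes if c != labels[i]]
        weights = [freq[c] for c in candidates]
        noisy[i] = rng.choices(candidates, weights=weights, k=1)[0]
    return noisy, set(flip_idx)

# Toy label list with the same 80% "no_relation" rate as TACRED.
labels = ["no_relation"] * 8 + ["per:title", "per:cause_of_death"]
noisy, flipped = inject_noise(labels, noise_ratio=0.3, rng=random.Random(0))
print(len(flipped))  # 3
```

Because most instances are "no_relation", most corrupted instances receive a positive label, which is the "false-positive" noise the paragraph above refers to.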

$^4$https://github.com/PaddlePaddle/Research/tree/master/NLP/ACL2019-ARNOR
$^5$https://github.com/yuhaozhang/tacred-relation
$^6$The statistics of NYT-10 are quoted from (Jia et al., 2019).

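4.1 Baselines

We compare our SENT method with several strong baselines in distantly supervised RE. The compared methods can be categorized as bag-level de-noising methods, sentence-level de-noising methods, and sentence-level non-de-noising methods.

PCNN+SelATT (Lin et al., 2016): a bag-level RE model that leverages an attention mechanism to reduce the influence of noise.

PCNN+RA_BAG_ATT (Ye and Ling, 2019): short for PCNN+ATT_RA+BAG_ATT, a bag-level model that uses intra-bag and inter-bag attention to alleviate noise.

CNN+RL$_1$ (Qin et al., 2018): an RL-based bag-level method. Different from CNN+RL$_2$, it redistributes the filtered data into the negative examples.

CNN+RL$_2$ (Feng et al., 2018): a sentence-level RE model. It jointly trains an instance selector and a CNN classifier using reinforcement learning (RL).

ARNOR (Jia et al., 2019): a sentence-level RE model that selects confident instances based on the attention scores on selected patterns. It is the state-of-the-art method at the sentence level.

CNN (Zeng et al., 2014), PCNN (Zeng et al., 2015) and BiLSTM (Zhang et al., 2015) are typical architectures used in RE.

BiLSTM+ATT (Zhang et al., 2017): leverages an attention mechanism on top of BiLSTM to capture useful information.

BiLSTM+BERT (Devlin et al., 2019): based on BiLSTM, it uses pre-trained BERT representations as word embeddings.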

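4.2 Implementation Details

Since SENT is a model-agnostic framework, we implement the classification model with two typical architectures: BiLSTM and BiLSTM+BERT. As BiLSTM is also the base model of ARNOR, this allows a fairer comparison between the two methods. During SENT training, we use 50-dimensional GloVe vectors as word embeddings. For PT after SENT, we randomly initialize 50-dimensional word embeddings, as in ARNOR. In both training phases, we use 50-dimensional randomly initialized position and entity-type embeddings. We train a single-layer BiLSTM with a hidden size of 256 using the Adam optimizer with a learning rate of 5e-4. When implemented with BiLSTM+BERT, the settings are the same as with BiLSTM, except that we use 768-dimensional fixed BERT representations as word embeddings (we use the "bert-base-uncased" pre-trained model). We tune the hyperparameters on the dev set via grid search. Specifically, when training on the NYT data, we train the model for 10 epochs in each iteration, with global data-filtering threshold $Th = 0.25$, re-labeling threshold $Th_{relabel} = 0.7$, and number of negative samples $K = 10$. When training on noisy-TACRED, we train for 50 epochs in each iteration, with $Th = 0.15$, $Th_{relabel} = 0.85$ and $K = 50$.

To deal with the multi-label problem, we use a simple method that randomly selects one of the bag-level labels for each sentence. This random selection turns multi-label noise into wrong-label noise, which is easier to handle. According to Surdeanu et al. (2012), NYT-10 contains 31% wrong-label noise and 7.5% multi-label noise, so a wrong selection may cause about 4% additional wrong-label noise; these instances can be filtered out by NT in the same way as other wrong-label instances.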
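The random label selection for multi-label bags can be sketched as follows. The bag representation (a list of sentences paired with a list of bag-level labels) and the function name are assumptions made for illustration; the example sentences are those quoted from Figure 1.

```python
import random

def assign_sentence_labels(bags, rng):
    """Give every sentence one label drawn uniformly from its bag's labels.

    `bags` is a list of (sentences, bag_labels) pairs. A sentence from a
    single-label bag simply inherits that label.
    """
    pairs = []
    for sentences, bag_labels in bags:
        for sentence in sentences:
            pairs.append((sentence, rng.choice(bag_labels)))
    return pairs

bags = [
    (["Obama was born in the United States.",
      "Obama was back to the United States"],
     ["place_of_birth", "employee_of"]),
]
for sentence, label in assign_sentence_labels(bags, random.Random(0)):
    print(label, "|", sentence)
```

A wrong draw here leaves the sentence with a wrong label, which is then handled like any other wrong-label instance by the NT-based filtering.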

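4.3 Sentence-level Evaluation

Table 2: Main results of sentence-level evaluation. The compared baselines include normal RE models (the first part of the table) and distantly supervised RE models (the second part of the table). We ran each model 3 times to obtain the average results.

Table 2 shows the sentence-level evaluation results of SENT and the other baselines, where the results of SENT are obtained with PT after SENT. We can observe the following. 1) Bag-level methods perform poorly on sentence-level evaluation, indicating that it is hard for these bag-level methods to benefit downstream tasks that need accurate sentence labels. This result is consistent with that of Feng et al. (2018). 2) When performing sentence-level training on the noisy distantly supervised data, all baseline models show poor results, including the strong pre-trained language model BERT. These results indicate the negative impact of directly using bag-level labels for sentence-level training regardless of the noise. 3) The proposed SENT method achieves significant improvement over previous sentence-level de-noising methods. When implemented with BiLSTM, the model obtains an F1 score 4.09% higher than ARNOR. Moreover, when implemented with BiLSTM+BERT, the F1 score is further improved by 8.52%. 4) SENT achieves much higher precision than previous de-noising methods while maintaining comparable or higher recall, indicating the effectiveness of the noise-filtering and re-labeling methods.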

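4.3.1 Noise-filtering Effect on Distant Data

To demonstrate the effectiveness of SENT in de-noising distantly supervised data, we conduct a noise-filtering experiment following ARNOR. We use the test set released by ARNOR, which consists of 200 randomly selected sentences annotated with "is_noise". We perform the noise-filtering process described in Section 3.2.1 and compute the de-noising accuracy. As shown in Table 3, SENT improves the F1 score by 12% over ARNOR. While improving precision, SENT also improves recall by 20% over ARNOR. Since ARNOR initializes its training data with a small set of frequent patterns, these patterns might limit the model's ability to generalize to diverse correct data. Different from ARNOR, SENT leverages negative training to learn the correct patterns automatically, showing better diversity and generalization.

Table 3: Evaluation of the noise-filtering effect on the noise-annotated test set of NYT-10.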

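4.4 Analysis of SENT on "Labeled Noise"

In this section, we analyze the effectiveness of the data-refining process using the self-constructed noisy dataset, noisy-TACRED (details in Table 1).

4.4.1 Performance on Noisy-TACRED

Table 4: Model performance on clean and noisy TACRED. When trained on noisy data, the performance of the base models degrades significantly, while SENT obtains results comparable to those of models trained on clean data.

Table 4 shows the results of training on TACRED and noisy-TACRED. As can be seen, the baseline models degrade dramatically on the noisy data, with a 20.2% drop for LSTM. However, after training with SENT, the BiLSTM model obtains results comparable to those of the model trained on clean data. Note that the de-noising method helps considerably in improving precision, but recall remains lower than on clean data.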

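4.4.2 Effect of Data Refining

Figure 4: Details of data refining on noisy-TACRED.

We also evaluate the noise-filtering and label-recovering ability on the noisy-TACRED training set, as shown in Figure 4. We can observe the following. 1) SENT achieves an F1 score of about 85% on the noisy-TACRED data. This result is consistent with the noise-filtering result obtained on the NYT dataset (with 200 sampled instances), verifying the de-noising ability of SENT across datasets. 2) As the training iterations proceed, the precision of noise filtering decreases while the recall increases. More noise filtering helps to obtain a cleaner dataset, but it might also mistakenly flag more clean instances as noise. Therefore, we stop the iteration when the model reaches its best score on the dev set. 3) For label recovering, SENT achieves a precision of about 70% and a recall of about 25%. Here, the threshold setting is also a trade-off, and we prefer a moderate value for more accurate re-labeling.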

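4.4.3 Effect of Dynamic Filtering

As described in Section 3.2, we design a dynamic filtering threshold for long-tail data. The effect of this strategy is shown in Figure 5. As can be seen, the degree of convergence of the long-tail relation "per:cause_of_death" is much lower than that of the head relation. Simply setting a uniform threshold would harm the data distribution, since instances of "per:cause_of_death" would be largely filtered out. In contrast, with the dynamically determined threshold, data from both head and long-tail relations are filtered appropriately.

Figure 5: Data distribution of a head relation (per:title) and a long-tail relation (per:cause_of_death) during NT. The dynamically designed threshold benefits filtering.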

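4.5 Ablation Study

To better illustrate the contribution of each component of SENT, we conduct an ablation study by removing the following components: the final PT, re-labeling, the dynamic threshold, re-initialization, and NT. The test results are shown in Table 6. We can observe the following. 1) Removing the final positive training has little effect on performance. This is because the model trained with NT has already reached high accuracy, and the final PT only serves to produce more confident predictions. 2) Removing the re-labeling process hurts performance, because the filtered instances are simply discarded regardless of whether they contain useful training information. 3) Without the dynamic threshold, clean instances from tail classes are incorrectly filtered out, which severely degrades performance. 4) Re-initialization also contributes substantially to performance. A model trained on the original noisy data inevitably fits the noisy distribution, while re-initialization helps to wash out the overfitted parameters and eliminate the noise effects, thus contributing to better training and noise filtering. 5) Training with PT instead of NT causes a dramatic drop in performance, especially in precision, which verifies the effectiveness of NT in preventing the model from overfitting noisy data.

Table 6: Ablation study on NYT-10.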

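4.6 Case Study

As mentioned above, SENT is able to refine the distantly supervised RE dataset. In fact, the NYT data contain much noise that is hard to handle with bag-level methods. We show some examples in Table 5. (1) The first two rows are sentences from a multi-label bag. We randomly choose one bag-level label for each sentence, and the model is able to correct the wrong choices (by correcting the second sentence to "place_living" and the first sentence to "NA"). (2) The following three rows show a bag labeled "place_of_death", while the whole bag is actually an "NA" bag wrongly labeled as positive. (3) SENT can also recognize positive samples within "NA". As shown in the last three rows, each sentence labeled "NA" actually expresses a positive label. In fact, this false-negative problem is frequent in the NYT data, 70% of which are negative instances labeled "NA" merely because the entity pair does not participate in a relation in the database. We believe that the ability to recognize these false-negative samples can significantly benefit performance.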

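5 Conclusion

In this paper, we present SENT, a novel sentence-level framework based on negative training (NT), for sentence-level training on distantly supervised RE data. NT not only prevents the model from overfitting noisy data, but also separates the noisy data from the training data. By iteratively performing noise filtering and re-labeling based on NT, SENT helps to refine the noisy distant data and achieves superior performance. Experimental results verify the improvement of SENT over previous methods on sentence-level relation extraction and on noise-filtering effect.

References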

Christoph Alt, Marc Hubner, and Leonhard Hennig. 2019. Fine-tuning pre-trained transformer language models to distantly supervised relation extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1388–1398, Florence, Italy. Association for Computational Linguistics.

Eric Arazo, Diego Ortego, Paul Albert, Noel O’Connor, and Kevin McGuinness. 2019. Unsupervised label noise modeling and loss correction. In International Conference on Machine Learning, pages 312–321. PMLR.

Fan Bai and Alan Ritter. 2019. Structured Minimally Supervised Learning for Neural Relation Extraction. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3057–3069, Minneapolis, Minnesota. Association for Computational Linguistics.

Iz Beltagy, Kyle Lo, and Waleed Ammar. 2019. Combining distant and direct supervision for neural relation extraction. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1858–1867, Minneapolis, Minnesota. Association for Computational Linguistics.

Antoine Bordes, Nicolas Usunier, Alberto Garcia Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi relational data. Advances in neural information processing systems, 26:2787–2795.

Junfan Chen, Richong Zhang, Yongyi Mao, Hongyu Guo, and Jie Xu. 2019. Uncover the ground-truth relations in distant supervision: A neural expectation maximization framework. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 326–336, Hong Kong, China. Association for Computational Linguistics.

Xiang Deng and Huan Sun. 2019. Leveraging 2-hop distant supervision from table entity pairs for relation extraction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 410–420, Hong Kong, China. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Anthony Fader, Luke Zettlemoyer, and Oren Etzioni. 2014. Open question answering over curated and extracted knowledge bases. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1156–1165.

Jun Feng, Minlie Huang, Li Zhao, Yang Yang, and Xiaoyan Zhu. 2018. Reinforcement learning for relation classification from noisy data. In Proceedings of the aaai conference on artificial intelligence.

Jacob Goldberger and Ehud Ben-Reuven. 2016. Training deep neural-networks using a noise adaptation layer.

Xu Han, Zhiyuan Liu, and Maosong Sun. 2018a. Denoising distant supervision for relation extraction via instance-level adversarial training. arXiv preprint arXiv:1805.10959.

Xu Han, Zhiyuan Liu, and Maosong Sun. 2018b. Neural knowledge acquisition via mutual attention between knowledge graph and text. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.

Xu Han, Pengfei Yu, Zhiyuan Liu, Maosong Sun, and Peng Li. 2018c. Hierarchical relation extraction with coarse-to-fine grained attention. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2236–2245.

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 541–550, Portland, Oregon, USA. Association for Computational Linguistics.

Linmei Hu, Luhao Zhang, Chuan Shi, Liqiang Nie, Weili Guan, and Cheng Yang. 2019a. Improving distantly-supervised relation extraction with joint label embedding. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3821–3829, Hong Kong, China. Association for Computational Linguistics.

Wei Hu, Zhiyuan Li, and Dingli Yu. 2019b. Simple and effective regularization methods for training on noisily labeled data with generalization guarantee. In International Conference on Learning Representations.

Guoliang Ji, Kang Liu, Shizhu He, Jun Zhao, et al. 2017. Distant supervision for relation extraction with sentence-level attention and entity descriptions. In AAAI, volume 3060.

Wei Jia, Dai Dai, Xinyan Xiao, and Hua Wu. 2019. ARNOR: Attention regularization based noise reduction for distant supervision relation classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1399–1408, Florence, Italy. Association for Computational Linguistics.

Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. 2018. MentorNet: Learning data driven curriculum for very deep neural networks on corrupted labels. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2304–2313. PMLR.

Youngdong Kim, Junho Yim, Juseung Yun, and Junmo Kim. 2019. Nlnl: Negative learning for noisy labels. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 101–110.

Kai Lei, Daoyuan Chen, Yaliang Li, Nan Du, Min Yang, Wei Fan, and Ying Shen. 2018. Cooperative denoising for distantly supervised relation extraction. In Proceedings of the 27th International Conference on Computational Linguistics, pages 426–436, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Junnan Li, Richard Socher, and Steven CH Hoi. 2019. Dividemix: Learning with noisy labels as semisupervised learning. In International Conference on Learning Representations.

Yang Li, Guodong Long, Tao Shen, Tianyi Zhou, Lina Yao, Huan Huo, and Jing Jiang. 2020. Self-attention enhanced selective gate with entity aware embedding for distantly supervised relation extraction. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):8269–8276.

Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2017. Neural relation extraction with multi-lingual attention. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 34–43, Vancouver, Canada. Association for Computational Linguistics.

Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2016. Neural relation extraction with selective attention over instances. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2124–2133.

Tianyu Liu, Kexiang Wang, Baobao Chang, and Zhifang Sui. 2017. A soft-label method for noise tolerant distantly supervised relation extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1790–1795, Copenhagen, Denmark. Association for Computational Linguistics.

Bingfeng Luo, Yansong Feng, Zheng Wang, Zhanxing Zhu, Songfang Huang, Rui Yan, and Dongyan Zhao. 2017. Learning with noise: Enhance distantly supervised relation extraction with dynamic transition matrix. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 430–439, Vancouver, Canada. Association for Computational Linguistics.

Yueming Lyu and Ivor W. Tsang. 2020. Curriculum loss: Robust learning and generalization against label corruption. In International Conference on Learning Representations.

Eran Malach and Shai Shalev-Shwartz. 2017. Decoupling "when to update" from "how to update". In NIPS.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 1003–1011.

Pengda Qin, Weiran Xu, and William Yang Wang. 2018. Robust distant supervision relation extraction via deep reinforcement learning. arXiv preprint arXiv:1805.09927.

Jianfeng Qu, Wen Hua, Dantong Ouyang, Xiaofang Zhou, and Ximing Li. 2019. A fine-grained and noise-aware method for neural relation extraction. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 659–668.

Mengye Ren, Wenyuan Zeng, Bin Yang, and Raquel Urtasun. 2018. Learning to reweight examples for robust deep learning. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4334–4343. PMLR.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 148–163. Springer.

Yuming Shang. 2019. Are noisy sentences useless for distant supervised relation extraction?

Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 455–465, Jeju Island, Korea. Association for Computational Linguistics.

Shikhar Vashishth, Rishabh Joshi, Sai Suman Prayaga, Chiranjib Bhattacharyya, and Partha Talukdar. 2018. RESIDE: Improving distantly-supervised neural relation extraction using side information. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1257–1266, Brussels, Belgium. Association for Computational Linguistics.

Patrick Verga, David Belanger, Emma Strubell, Benjamin Roth, and Andrew McCallum. 2016. Multilingual relation extraction using compositional universal schema. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 886–896, San Diego, California. Association for Computational Linguistics.

Xiaozhi Wang, Xu Han, Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2018. Adversarial multi-lingual neural relation extraction. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1156–1166, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 28.

Fei Wu and Daniel S. Weld. 2010. Open information extraction using Wikipedia. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 118–127, Uppsala, Sweden. Association for Computational Linguistics.

Yi Wu, David Bamman, and Stuart Russell. 2017. Adversarial training for relation extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1778–1783, Copenhagen, Denmark. Association for Computational Linguistics.

Xiaobo Xia, Tongliang Liu, Nannan Wang, Bo Han, Chen Gong, Gang Niu, and Masashi Sugiyama. Are anchor points really indispensable in label-noise learning?

Kun Xu, Siva Reddy, Yansong Feng, Songfang Huang, and Dongyan Zhao. 2016. Question answering on Freebase via relation extraction and textual evidence. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2326–2336, Berlin, Germany. Association for Computational Linguistics.

Xuchen Yao and Benjamin Van Durme. 2014. Information extraction over structured data: Question answering with Freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 956–966, Baltimore, Maryland. Association for Computational Linguistics.

Zhi-Xiu Ye and Zhen-Hua Ling. 2019. Distant supervision relation extraction with intra-bag and inter-bag attentions. arXiv preprint arXiv:1904.00143.

Xingrui Yu, Bo Han, Jiangchao Yao, Gang Niu, Ivor Tsang, and Masashi Sugiyama. 2019. How does disagreement help generalization against label corruption? In International Conference on Machine Learning, pages 7164–7173. PMLR.

Changsen Yuan, Heyan Huang, Chong Feng, Xiao Liu, and Xiaochi Wei. 2019a. Distant supervision for relation extraction with linear attenuation simulation and non-iid relevance embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7418–7425.

Yujin Yuan, Liyuan Liu, Siliang Tang, Zhongfei Zhang, Yueting Zhuang, Shiliang Pu, Fei Wu, and Xiang Ren. 2019b. Cross-relation cross-bag attention for distantly-supervised relation extraction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 419–426.

Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao. 2015. Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1753–1762.

Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 2335–2344, Dublin, Ireland. Dublin City University and Association for Computational Linguistics.

Ningyu Zhang, Shumin Deng, Zhanlin Sun, Guanying Wang, Xi Chen, Wei Zhang, and Huajun Chen. 2019. Long-tail relation extraction via knowledge graph embeddings and graph convolution networks. arXiv preprint arXiv:1903.01306.

Shu Zhang, Dequan Zheng, Xinchen Hu, and Ming Yang. 2015. Bidirectional long short-term memory networks for relation classification. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, pages 73–78, Shanghai, China.

Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D. Manning. 2017. Position aware attention and supervised data improve slot filling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 35–45, Copenhagen, Denmark. Association for Computational Linguistics.

Zhilu Zhang and Mert Sabuncu. 2018. Generalized cross entropy loss for training deep neural networks with noisy labels. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc.

Shun Zheng, Xu Han, Yankai Lin, Peilin Yu, Lu Chen, Ling Huang, Zhiyuan Liu, and Wei Xu. 2019. DIAG-NRE: A neural pattern diagnosis framework for distantly supervised neural relation extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1419–1429, Florence, Italy. Association for Computational Linguistics.

Zhangdong Zhu, Jindian Su, and Yang Zhou. 2019. Improving distantly supervised relation classification with attention and semantic weight. IEEE Access, 7:91160–91168.

Published at ACL 2021.