Mohapatra, H.; Jain, S.; Chakrabarti, S. (2013) Joint bootstrapping of corpus annotations and entity types. In: Conference on Empirical Methods in Natural Language Processing.
Official URL: http://mirror.aclweb.org/emnlp2013/papers.html
Abstract
Web search can be enhanced in powerful ways if token spans in Web text are annotated with disambiguated entities from large catalogs like Freebase. Entity annotators need to be trained on sample mention snippets. Wikipedia entities and annotated pages offer high-quality labeled data for training and evaluation. Unfortunately, Wikipedia features only one-ninth as many entities as Freebase, and these are a highly biased sample of well-connected, frequently mentioned "head" entities. To bring hope to "tail" entities, we broaden our goal to a second task: assigning types to entities that are in Freebase but not in Wikipedia. The two tasks are synergistic: knowing the types of unfamiliar entities helps disambiguate mentions, and words in mention contexts help assign types to entities. We present TMI, a bipartite graphical model for joint type-mention inference. TMI attempts no schema integration or entity resolution, but exploits the above-mentioned synergy. In experiments involving 780,000 people in Wikipedia, 2.3 million people in Freebase, 700 million Web pages, and over 20 professional editors, TMI shows considerable annotation accuracy improvement (e.g., 70%) compared to baselines (e.g., 46%), especially for "tail" and emerging entities. We also compare with Google's recent annotations of the same corpus with Freebase entities, and report considerable improvements within the people domain.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Source: | Copyright of this article belongs to Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics. |
ID Code: | 99980 |
Deposited On: | 12 Feb 2018 12:26 |
Last Modified: | 12 Feb 2018 12:26 |