Exploiting dictionaries in named entity extraction

Cohen, William W.; Sarawagi, Sunita (2004) Exploiting dictionaries in named entity extraction. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Official URL: http://doi.org/10.1145/1014052.1014065

Related URL: http://dx.doi.org/10.1145/1014052.1014065


We consider the problem of improving named entity recognition (NER) systems by using external dictionaries: more specifically, the problem of extending state-of-the-art NER systems by incorporating information about the similarity of extracted entities to entities in an external dictionary. This is difficult because most high-performance NER systems operate by sequentially classifying each word according to whether or not it participates in an entity name, whereas the most useful similarity measures score entire candidate names. To correct this mismatch we formalize a semi-Markov extraction process, which is based on sequentially classifying segments of several adjacent words rather than single words. In addition to allowing a natural way of coupling high-performance NER methods and high-performance similarity functions, this formalism also allows the direct use of other useful entity-level features, and provides a more natural formulation of the NER problem than sequential word classification. Experiments in multiple domains show that the new model can substantially improve extraction performance over previous methods for using external dictionaries in NER.
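
The idea of classifying segments rather than single words can be illustrated with a small sketch. The code below is not the authors' model: it assumes a hand-set scoring function in place of learned weights, uses token-set Jaccard overlap as a stand-in similarity measure, and omits any dependence on the previous segment's label. All names (`segment_score`, `semi_markov_decode`, the labels `ENT` and `O`, the limit `max_len`) are hypothetical and chosen only for illustration. What it does show is the key point of the abstract: because the dynamic program scores whole candidate segments, a dictionary similarity computed on an entire candidate name can be used directly as a feature.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Token-set Jaccard similarity (an assumed stand-in similarity measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dict_similarity(tokens: Sequence[str], dictionary: Sequence[str]) -> float:
    """Best similarity between a candidate segment and any dictionary entry."""
    lowered = [t.lower() for t in tokens]
    return max((jaccard(lowered, e.lower().split()) for e in dictionary),
               default=0.0)


def segment_score(tokens: Sequence[str], label: str,
                  dictionary: Sequence[str]) -> float:
    """Hypothetical hand-set score; a trained model would learn these weights."""
    if label == "ENT":
        # Reward entity segments that resemble a dictionary name as a whole.
        return 2.0 * dict_similarity(tokens, dictionary) - 1.0
    return 0.0  # non-entity segments get a neutral score


def semi_markov_decode(words: List[str], dictionary: Sequence[str],
                       max_len: int = 3) -> List[Segment]:
    """Find the best-scoring segmentation by dynamic programming over segments.

    Simplification: scores depend only on the segment itself, not on the
    label of the preceding segment.
    """
    n = len(words)
    best = [float("-inf")] * (n + 1)   # best[i]: best score for words[:i]
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            for label in ("O", "ENT"):
                if label == "O" and length > 1:
                    continue  # non-entity segments are single words
                score = best[start] + segment_score(words[start:end], label,
                                                    dictionary)
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, label)
    # Follow back-pointers from the end of the sentence to recover segments.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    return segments[::-1]
```

With the toy dictionary `["william w. cohen", "sunita sarawagi"]` and the sentence "I met William Cohen yesterday", this sketch labels the two-word segment "William Cohen" as an entity, even though neither word alone is similar enough to a dictionary entry to be selected under this scoring.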

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to ACM, Inc.
ID Code: 128406
Deposited On: 20 Oct 2022 06:26
Last Modified: 14 Nov 2022 11:33