GraphNER

GraphNER is a named entity recognizer that uses graph propagation to improve on the BANNER and BANNER-ChemDNER systems. Data is available for the gene mention detection task.

Project Description

BANNER and BANNER-ChemDNER are named entity recognition systems based on first-order and second-order linear-chain conditional random fields (CRFs). These systems formulate named entity recognition as a tagging task in which each entity type has a distinct beginning marker and inside marker, plus a single tag for words that do not belong to any named entity. For example, if we are interested in genes, mutations, and diseases, the tag set is B-GENE, I-GENE, B-MUTATION, I-MUTATION, B-DISEASE, I-DISEASE, and O.

A CRF is a supervised model and ignores corpus-level similarities between words. GraphNER improves upon a CRF-based system by using a graph that encodes these similarities. The output of the CRF-based model is extracted in the form of posterior and transition probabilities. The posteriors are propagated over the graph vertices so that similar vertices receive similar label distributions, and the updated label distributions are then combined with the transition probabilities in a Viterbi decoding step.
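The propagate-then-decode pipeline described above can be illustrated with a small sketch. This is not GraphNER's actual code: the function names (`propagate`, `viterbi`), the mixing weight `alpha`, the toy graph, and all probabilities are assumptions made up for illustration. The propagation step here is a plain iterative neighbor average interpolated with the CRF posterior, which may differ from the objective GraphNER optimizes, and vertices are keyed by token string only to keep the example short.

```python
import math

TAGS = ["B-GENE", "I-GENE", "O"]

def propagate(posteriors, neighbors, alpha=0.5, iterations=10):
    """Smooth label distributions over a similarity graph (illustrative only).

    posteriors: vertex -> list of tag probabilities taken from the CRF.
    neighbors:  vertex -> {neighbor: similarity weight}.
    alpha:      hypothetical weight given to the neighborhood average.
    """
    current = {v: list(p) for v, p in posteriors.items()}
    for _ in range(iterations):
        updated = {}
        for v, prior in posteriors.items():
            nbrs = neighbors.get(v, {})
            total = sum(nbrs.values())
            if total == 0:
                updated[v] = list(prior)  # isolated vertex keeps its CRF posterior
                continue
            avg = [sum(w * current[u][k] for u, w in nbrs.items()) / total
                   for k in range(len(prior))]
            updated[v] = [(1 - alpha) * p + alpha * a for p, a in zip(prior, avg)]
        current = updated
    return current

def viterbi(node_dists, transitions, tags=TAGS):
    """Best tag sequence from per-token distributions and transition probabilities."""
    n, eps = len(tags), 1e-12  # eps avoids log(0) for forbidden transitions
    score = [math.log(node_dists[0][k] + eps) for k in range(n)]
    back = []
    for dist in node_dists[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n),
                         key=lambda i: score[i] + math.log(transitions[i][j] + eps))
            new_score.append(score[best_i] + math.log(transitions[best_i][j] + eps)
                             + math.log(dist[j] + eps))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    path = [max(range(n), key=lambda k: score[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[k] for k in path]

# Toy CRF posteriors over (B-GENE, I-GENE, O); "BRCA2" occurs elsewhere in the corpus.
posteriors = {
    "the":   [0.05, 0.05, 0.90],
    "BRCA1": [0.40, 0.10, 0.50],   # the CRF alone is unsure here
    "gene":  [0.05, 0.15, 0.80],
    "BRCA2": [0.90, 0.05, 0.05],   # confidently tagged similar vertex
}
neighbors = {"BRCA1": {"BRCA2": 1.0}, "BRCA2": {"BRCA1": 1.0}}
# transitions[i][j] = P(tag j follows tag i); O -> I-GENE is disallowed.
transitions = [[0.05, 0.35, 0.60],
               [0.05, 0.35, 0.60],
               [0.45, 0.00, 0.55]]

sentence = ["the", "BRCA1", "gene"]
smoothed = propagate(posteriors, neighbors)
before = viterbi([posteriors[t] for t in sentence], transitions)
after = viterbi([smoothed[t] for t in sentence], transitions)
print(before, after)
```

With these toy numbers, decoding on the raw posteriors leaves "BRCA1" tagged O, while decoding on the propagated distributions tags it B-GENE, because the similar vertex "BRCA2" pulls its distribution toward the gene label.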

GraphNER works with the data format of the BioCreative II shared task, which BANNER and BANNER-ChemDNER also support. The data for the gene mention detection subtask of the BioCreative II shared task is available for testing.
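As a rough illustration of how such data maps onto the B/I/O tags, the sketch below assumes the usual layout of the BioCreative II gene mention files: one sentence per line as "ID text", and one annotation per line as "ID|start end|mention", where the offsets count non-whitespace characters only and the end offset is inclusive. Please check this against the README shipped with the corpus. The sentence, the ID "S0001", the function names, and the whitespace tokenization are all hypothetical; BANNER uses its own tokenizer.

```python
def parse_sentence(line):
    """Split 'ID text' into (sentence id, sentence text)."""
    sent_id, text = line.rstrip("\n").split(" ", 1)
    return sent_id, text

def parse_annotation(line):
    """Split 'ID|start end|mention' into (id, start, end, mention)."""
    sent_id, span, mention = line.rstrip("\n").split("|", 2)
    start, end = (int(x) for x in span.split())
    return sent_id, start, end, mention

def bio_tags(text, spans, entity="GENE"):
    """Tag whitespace tokens using offsets that ignore spaces (end inclusive)."""
    tagged = []
    offset = 0  # non-space characters consumed so far
    for token in text.split():
        tok_start, tok_end = offset, offset + len(token) - 1
        offset += len(token)
        tag = "O"
        for start, end in spans:
            if tok_start <= end and tok_end >= start:  # token overlaps the mention
                tag = ("B-" if tok_start <= start else "I-") + entity
                break
        tagged.append((token, tag))
    return tagged

# Hypothetical sentence and annotation, not taken from the corpus.
sent_id, text = parse_sentence("S0001 Expression of tumor necrosis factor increased")
ann_id, start, end, mention = parse_annotation("S0001|12 30|tumor necrosis factor")
tagged = bio_tags(text, [(start, end)])
print(tagged)
```

Counting only non-space characters makes the annotations independent of tokenization, which is why the sketch tracks a running character offset instead of token indices.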