Snowball: extracting relations from large plain-text collections
Cited by: 486
Abstract:
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, that in turn result in new tuples being extracted from the document collection. We build on this idea and present our Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents. At each iteration of the extraction process, Snowball evaluates the quality of these patterns and tuples without human intervention, and keeps only the most reliable ones for the next iteration. In this paper we also develop a scalable evaluation methodology and metrics for our task, and present a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.
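The iterative loop the abstract describes (seed tuples produce patterns, patterns extract new tuples, and only reliable patterns and tuples survive to the next round) can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the capitalised-word entity matcher, the use of the text between two entities as a "pattern", the positive/negative confidence ratio, the thresholds, and the sample sentences.

```python
import re
from collections import defaultdict

# Toy entity matcher (assumption): a run of capitalised words stands in
# for a real named-entity tagger.
ENTITY = r"\b([A-Z][a-zA-Z]+(?: [A-Z][a-zA-Z]+)*)"

def find_contexts(sentences, tuples):
    """Collect the text found between the two entities of each known tuple."""
    contexts = set()
    for s in sentences:
        for org, loc in tuples:
            m = re.search(re.escape(org) + r"(.{1,30}?)" + re.escape(loc), s)
            if m:
                contexts.add(m.group(1))
    return contexts

def apply_pattern(middle, sentences):
    """Return every (entity, entity) pair joined by the given middle text."""
    rx = re.compile(ENTITY + re.escape(middle) + ENTITY)
    return {(m.group(1), m.group(2)) for s in sentences for m in rx.finditer(s)}

def pattern_confidence(matches, known):
    """Share of matches that agree with known tuples (hypothetical scoring rule).
    Matches whose organization is unknown are ignored."""
    pos = sum(1 for org, loc in matches if known.get(org) == loc)
    neg = sum(1 for org, loc in matches if org in known and known[org] != loc)
    return pos / (pos + neg) if pos + neg else 0.0

def bootstrap(sentences, seeds, iterations=2, tau_pattern=0.8, tau_tuple=0.8):
    known = dict(seeds)  # organization -> location
    for _ in range(iterations):
        candidates = defaultdict(list)  # tuple -> confidences of supporting patterns
        for middle in find_contexts(sentences, known.items()):
            matches = apply_pattern(middle, sentences)
            conf = pattern_confidence(matches, known)
            if conf < tau_pattern:
                continue  # drop unreliable patterns
            for t in matches:
                candidates[t].append(conf)
        for (org, loc), confs in candidates.items():
            miss = 1.0
            for c in confs:
                miss *= 1.0 - c  # chance that every supporting pattern is wrong
            if 1.0 - miss >= tau_tuple:
                known.setdefault(org, loc)  # never overwrite an existing tuple
    return set(known.items())

SENTENCES = [
    "Microsoft, based in Redmond, released a new product.",
    "Exxon, based in Irving, reported earnings.",
    "Intel, based in Santa Clara, cut prices.",
    "Microsoft opened a store in Redmond.",
    "Exxon opened a store in Houston.",
    "Apple opened a store in Paris.",
]
SEEDS = [("Microsoft", "Redmond"), ("Exxon", "Irving")]

if __name__ == "__main__":
    for org, loc in sorted(bootstrap(SENTENCES, SEEDS)):
        print(org, "->", loc)
```

In this toy data the context ", based in " agrees with both seeds and is kept, so it adds (Intel, Santa Clara). The context " opened a store in " contradicts the seed for Exxon, scores 0.5, and is discarded, so (Apple, Paris) is never extracted.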
- Year: 2000
- Pages: 10
- In Proceedings: ACM DL
Authors:
Eugene Agichtein
(Courant Institute of Mathematical Sciences, NYU, The Proteus Project)
H-index: 17; Papers: 46; Citation: 1582; Homepage: http://www.cooper.edu/~agicht/resume.html; Expertise: Information Retrieval / Probabilistic Indexing; XML Data; Data mining; Web Mining; Digital library / Information Access
Luis Gravano
(Associate Professor, Computer Science Department Columbia University)
H-index: 37; Papers: 85; Citation: 5439; Homepage: http://www.cs.columbia.edu/~gravano; Expertise: XML Data; Digital library / Information Access; Database Systems; Parallel Algorithms / Wormhole Networks
Reference:
Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer
Authors: Gerard Salton
Automatically Generating Extraction Patterns from Untagged Text
Authors: Ellen Riloff
Organization: AAAI/IAAI, Vol. 2
Description of the UMass system as used for MUC-6
Authors: David Fisher Stephen Soderland Fangfang Feng Wendy G. Lehnert
Organization: MUC
Learning to construct knowledge bases from the World Wide Web
Authors: Mark Craven Dan DiPasquo Dayne Freitag Andrew McCallum Tom M. Mitchell Kamal Nigam Sean Slattery
Organization: Artif. Intell.
Cited By:
Extracting position relations from the web
Authors: Yanhong Liu Peiquan Jin Lihua Yue
Organization: WIDM
A stopping criterion for active learning
Authors: Andreas Vlachos
Organization: Computer Speech & Language
Improving the performance of question answering with semantically equivalent answer patterns
Authors: Leila Kosseim Jamileh Yousefi
Organization: Data Knowl. Eng.
Relation discovery from web data for competency management
Authors: Jianhan Zhu Alexandre L. Goncalves Victoria S. Uren Enrico Motta Roberto Pacheco Marc Eisenstadt Dawei Song
Organization: Web Intelligence and Agent Systems
Flint: Google-basing the Web
Authors: Lorenzo Blanco Valter Crescenzi Paolo Merialdo Paolo Papotti
Organization: EDBT
A quality-aware optimizer for information extraction
Authors: Alpa Jain Panagiotis G. Ipeirotis
Organization: ACM Trans. Database Syst.
Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora
Authors: Tri-Thanh Nguyen Akira Shimazu
Organization: IEICE - Transactions on Information and Systems
Semantic relation extraction from socially-generated tags: a methodology for metadata generation
Authors: Miao Chen Xiaozhong Liu Jian Qin
Organization: Proceedings of the 2008 International Conference on Dublin Core and Metadata Applications
Building query optimizers for information extraction: the SQoUT project
Authors: Alpa Jain Panagiotis Ipeirotis Luis Gravano
Organization: ACM SIGMOD Record
Unsupervised named-entity extraction from the Web: An experimental study
Authors: Oren Etzioni Michael J. Cafarella Doug Downey Ana-Maria Popescu Tal Shaked Stephen Soderland Daniel S. Weld Alexander Yates
Organization: Artif. Intell.
Assessing the correlation between contextual patterns and biological entity tagging
Authors: M. Krallinger M. Padrón C. Blaschke A. Valencia
Organization: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Information Extraction and Semantic Annotation of Wikipedia
Authors: Maria Ruiz-Casado Enrique Alfonseca Manabu Okumura Pablo Castells
Organization: Proceeding of the 2008 conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge
The semantics of a definiendum constrains both the lexical semantics and the lexicosyntactic patterns in the definiens
Authors: Hong Yu Ying Wei
Organization: Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
Task Driven Coreference Resolution for Relation Extraction
Authors: Feiyu Xu Hans Uszkoreit Hong Li
Organization: ECAI
Ontology extraction and conceptual modeling for web information
Authors: Hyoil Han Ramez Elmasri
Organization: Information modeling for internet applications
Ontology-driven, unsupervised instance population
Authors: Luke McDowell Michael J. Cafarella
Organization: J. Web Sem.
Learning Rules for Conceptual Structure on the Web
Authors: Hyoil Han Ramez Elmasri
Organization: J. Intell. Inf. Syst.
Automatic thesaurus generation for Chinese documents
Authors: Yuen-Hsien Tseng
Organization: Journal of the American Society for Information Science and Technology
YAGO: A Large Ontology from Wikipedia and WordNet
Authors: Fabian M. Suchanek Gjergji Kasneci Gerhard Weikum
Organization: J. Web Sem.
A Rote Extractor with Edit Distance-Based Generalisation and Multi-Corpora Precision Calculation
Authors: Enrique Alfonseca Pablo Castells Manabu Okumura Maria Ruiz-Casado
Organization: ACL
URES: an Unsupervised Web Relation Extraction System
Authors: Binyamin Rosenfeld Ronen Feldman
Organization: ACL
On-Demand Information Extraction
Authors: Satoshi Sekine
Organization: ACL
Unsupervised Relation Disambiguation Using Spectral Clustering
Authors: Jinxiu Chen Dong-Hong Ji Chew Lim Tan Zheng-Yu Niu
Organization: ACL
A portable method for acquiring information extraction patterns without annotated corpora
Authors: Neus Català Núria Castell Mario Martín
Organization: Natural Language Engineering
The role of documents vs. queries in extracting class attributes from text
Authors: Marius Paşca Benjamin Van Durme Nikesh Garera
Organization: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
A Relational Approach to Incrementally Extracting and Querying Structure in Unstructured Data
Authors: Eric Chu Akanksha Baid Ting Chen AnHai Doan Jeffrey F. Naughton
Organization: VLDB
Self-supervised relation extraction from the Web
Authors: Benjamin Rosenfeld Ronen Feldman
Organization: Knowl. Inf. Syst.
Relation Extraction from Wikipedia Using Subtree Mining
Authors: Dat P. T. Nguyen Yutaka Matsuo Mitsuru Ishizuka
Organization: AAAI
Harvesting relations from the web: quantifiying the impact of filtering functions
Authors: Sebastian Blohm Philipp Cimiano Egon Stemle
Organization: Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Subtree Mining for Relation Extraction from Wikipedia
Authors: Dat P. T. Nguyen Yutaka Matsuo Mitsuru Ishizuka
Organization: HLT-NAACL (Short Papers)
Unsupervised information extraction approach using graph mutual reinforcement
Authors: Hany Hassan Ahmed Hassan Ossama Emam
Organization: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Semi-supervised Relation Extraction with Label Propagation
Authors: Jinxiu Chen Dong-Hong Ji Chew Lim Tan Zheng-Yu Niu
Organization: HLT-NAACL
Learning field compatibilities to extract database records from unstructured text
Authors: Michael Wick Aron Culotta Andrew McCallum
Organization: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Coupling semi-supervised learning of categories and relations
Authors: Andrew Carlson Justin Betteridge Estevam R. Hruschka Jr. Tom M. Mitchell
Organization: Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
Unsupervised relation disambiguation with order identification capabilities
Authors: Jinxiu Chen Donghong Ji Chew Lim Tan Zhengyu Niu
Organization: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Entity annotation based on inverse index operations
Authors: Ganesh Ramakrishnan Sreeram Balakrishnan Sachindra Joshi
Organization: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Boosting unsupervised relation extraction by using NER
Authors: Ronen Feldman Benjamin Rosenfeld
Organization: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
May all your wishes come true: a study of wishes and how to recognize them
Authors: Andrew B. Goldberg Nathanael Fillmore David Andrzejewski Zhiting Xu Bryan Gibson Xiaojin Zhu
Organization: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Surrogate learning: from feature independence to semi-supervised classification
Authors: Sriharsha Veeramachaneni Ravi Kumar Kondadadi
Organization: Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
Relation detection between named entities: report of a shared task
Authors: Cláudia Freitas Diana Santos Cristina Mota Hugo Gonçalo Oliveira Paula Carvalho
Organization: Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions
Structural, Transitive and Latent Models for Biographic Fact Extraction
Authors: Nikesh Garera David Yarowsky
Organization: EACL
Combining linguistic and statistical analysis to extract relations from web documents
Authors: Fabian M. Suchanek Georgiana Ifrim Gerhard Weikum
Organization: KDD