Journal article
Normalisation of imprecise temporal expressions extracted from text
Knowledge and information systems, v 61(3), pp 1361-1394
01 Dec 2019
Featured in Collection: UN Sustainable Development Goals @ Drexel
Abstract
Information extraction systems and techniques have been largely used to deal with the increasing amount of unstructured data available nowadays. Time is among the different kinds of information that may be extracted from such unstructured data sources, including text documents. However, the inability to correctly identify and extract temporal information from text makes it difficult to understand how the extracted events are organised in a chronological order. Furthermore, in many situations, the meaning of temporal expressions (timexes) is imprecise, such as in “less than 2 years” and “several weeks”, and cannot be accurately normalised, leading to interpretation errors. Although there are some approaches that enable representing imprecise timexes, they are not designed to be applied to specific scenarios and are difficult to generalise. This paper presents a novel methodology to analyse and normalise imprecise temporal expressions by representing temporal imprecision in the form of membership functions, based on human interpretation of time in two different languages (Portuguese and English). Each resulting model is a generalisation of probability distributions in the form of trapezoidal and hexagonal fuzzy membership functions. We use an adapted F1-score to guide the choice of the best models for each kind of imprecise timex and a weighted F1-score (F1 3D) as a complementary metric in order to identify relevant differences when comparing two normalisation models. We apply the proposed methodology for three distinct classes of imprecise timexes, and the resulting models give distinct insights in the way each kind of temporal expression is interpreted.
Details
- Title
- Normalisation of imprecise temporal expressions extracted from text
- Creators
- Hegler Tissot - University College London; Marcos Didonet Del Fabro - C3SL Labs, Universidade Federal do Paraná; Leon Derczynski - IT University of Copenhagen; Angus Roberts - King's College London
- Publication Details
- Knowledge and information systems, v 61(3), pp 1361-1394
- Publisher
- Springer London
- Grant note
- University College London (UCL)
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Information Science (Informatics)
- Web of Science ID
- WOS:000491431700007
- Scopus ID
- 2-s2.0-85061660895
- Other Identifier
- 991021862413904721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- International collaboration
- Web of Science research areas
- Computer Science, Artificial Intelligence
- Computer Science, Information Systems