Boundary-based MWE segmentation with text partitioning

Jake Ryland Williams

doi:10.48550/arxiv.1608.02025

Preprint

Open access

arXiv.org

09 Jun 2017

DOI: https://doi.org/10.48550/arxiv.1608.02025

Files and links (1)

URL: https://doi.org/10.48550/arxiv.1608.02025

Preprint (Author's original), arXiv.org - Non-exclusive license to distribute, Open

Abstract

Computer Science - Computation and Language

This work presents a fine-grained text-chunking algorithm designed for the task of multiword expression (MWE) segmentation. As a lexical class, MWEs include a wide variety of idioms, whose automatic identification is a necessity for the handling of colloquial language. This algorithm's core novelty is its use of non-word tokens, i.e., boundaries, in a bottom-up strategy. Leveraging boundaries refines token-level information, forging high-level performance from relatively basic data. The generality of this model's feature space allows for its application across languages and domains. Experiments spanning 19 different languages exhibit a broadly applicable, state-of-the-art model. Evaluation against recent shared-task data places text partitioning as the overall best-performing MWE segmentation algorithm, covering all MWE classes and multiple English domains (including user-generated text). This performance, coupled with a non-combinatorial, fast-running design, produces an ideal combination for implementations at scale, which are facilitated through the release of open-source software.
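
As a rough illustration of the boundary-based idea in the abstract, the toy sketch below treats the non-word gaps between words as tokens in their own right and builds segments bottom-up by deciding, boundary by boundary, whether the two neighboring words stay together. It is not the paper's model: the function names, the `BOUND_PAIRS` lexicon, and the decision rule `toy_is_bound` are all hypothetical stand-ins, since this record does not describe the actual features or training procedure.

```python
import re

def tokenize(text):
    """Split text into alternating word and non-word (boundary) tokens."""
    return re.findall(r"\w+|\W+", text)

def segment(text, is_bound):
    """Merge neighboring words whenever the boundary between them is bound.

    is_bound(left_word, boundary, right_word) -> bool is a caller-supplied,
    hypothetical decision rule; any boundary it rejects breaks the segment.
    """
    tokens = tokenize(text)
    segments, current = [], ""
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"\w+", tok):
            current += tok  # word token: extend the open segment
            continue
        # Boundary token: look at the words on either side, if both exist.
        left = tokens[i - 1] if i > 0 else None
        right = tokens[i + 1] if i + 1 < len(tokens) else None
        if left is not None and right is not None and is_bound(left, tok, right):
            current += tok  # bound: keep the boundary inside the segment
        elif current:
            segments.append(current)  # broken: close the open segment
            current = ""
    if current:
        segments.append(current)
    return segments

# Hypothetical toy lexicon of word pairs whose separating space is "bound".
BOUND_PAIRS = {("kick", "the"), ("the", "bucket")}

def toy_is_bound(left, boundary, right):
    """Assumed rule: a single space is bound if the word pair is in the lexicon."""
    return boundary == " " and (left.lower(), right.lower()) in BOUND_PAIRS

print(segment("He will kick the bucket soon.", toy_is_bound))
```

Because each boundary is decided once, in a single left-to-right pass, a sketch like this runs in time linear in the number of tokens, which is in the spirit of the non-combinatorial design the abstract mentions.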

Details

Title: Boundary-based MWE segmentation with text partitioning
Creators: Jake Ryland Williams
Publication Details: arXiv.org
Resource Type: Preprint
Language: English
Academic Unit: Information Science (Informatics)
Other Identifier: 991021806679204721
