Text mixing shapes the anatomy of rank-frequency distributions: A modern Zipfian mechanics for natural language

Jake Ryland Williams; James P Bagrow; Christopher M Danforth; Peter Sheridan Dodds

doi:10.48550/arxiv.1409.3870

Preprint

Open access

arXiv.org

30 Jan 2015

Files and links

https://doi.org/10.48550/arxiv.1409.3870

Preprint (Author's original). License: arXiv.org - Non-exclusive license to distribute. Access: Open.

Abstract

Subjects: Computer Science - Computation and Language; Physics - Physics and Society

Journal reference: Phys. Rev. E 91, 052811 (2015).

Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this 'law' of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora over the last 15 years have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting that languages separate into core and non-core lexica. Here, we present and defend an alternative hypothesis: that the two scaling regimes result from the act of aggregating texts. We observe that text mixing leads to an effective decay of word introduction, which we show provides accurate predictions of the location and severity of breaks in scaling. Upon examining large corpora from 10 languages in the Project Gutenberg eBooks collection (eBooks), we find emphatic empirical support for the universality of our claim.
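Zipf's law, the starting point of the abstract, says that a word's frequency f at rank r scales roughly as f(r) ∝ r^(-α) with α near 1. The following Python sketch is not the authors' method. It only shows, under an assumed simple tokenizer, how an empirical rank-frequency distribution is built, how a naive single-exponent log-log fit is computed, and how a basic vocabulary-growth curve (one hypothetical way to track word introduction) is obtained. The function names are illustrative. Because the paper reports two scaling regimes, a single fit over all ranks is a simplification.

```python
from collections import Counter
import math
import re


def rank_frequency(text):
    """Return (word, count) pairs ordered by rank (most frequent first)."""
    # Assumed tokenizer: lowercase runs of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def zipf_exponent(freqs):
    """Estimate alpha in f(r) ~ C * r**(-alpha) by least squares on log-log axes.

    `freqs` must be ordered by rank (rank 1 first) and contain at least two values.
    A single slope over all ranks ignores any break between scaling regimes.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Zipf's law corresponds to a slope near -1, so report the negated slope.
    return -sxy / sxx


def vocabulary_growth(tokens):
    """Distinct word types seen after each token: a simple word-introduction curve."""
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token)
        curve.append(len(seen))
    return curve


if __name__ == "__main__":
    text = "the cat and the hat and the bat"
    ranked = rank_frequency(text)
    print(ranked)
    print(zipf_exponent([count for _, count in ranked]))
    print(vocabulary_growth(text.split()))
```

Applying `vocabulary_growth` to individual books and to their concatenation is one informal way to see how aggregating texts changes the rate at which new words appear; it does not reproduce the paper's model or its predictions of where breaks in scaling occur.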

Details

Title: Text mixing shapes the anatomy of rank-frequency distributions: A modern Zipfian mechanics for natural language
Creators: Jake Ryland Williams; James P Bagrow; Christopher M Danforth; Peter Sheridan Dodds
Publication Details: arXiv.org
Resource Type: Preprint
Language: English
Academic Unit: Information Science
Other Identifier: 991021806684804721
