Historical subject representation: an analysis of historical vocabularies for temporally-aligned and contextual access points
Dissertation   Open access

Sam Grabus
Doctor of Philosophy (Ph.D.), Drexel University
Jun 2023
DOI: https://doi.org/10.17918/00001709
PDF: Grabus_Sam_2023 (2.35 MB)

Abstract

Keywords: Automatic indexing; Controlled vocabularies; Encyclopedia Britannica; Subject headings, Library of Congress; Metadata; Temporal drift; Library Science
Introduction: Topical metadata for historical digital collections is primarily generated using contemporary terminologies. One key challenge with these controlled vocabulary terms is that they can differ significantly from the language of the historical documents being indexed. As a result, key topics in the historical text may not be represented in the metadata record because the historical terms are no longer part of our contemporary language, and therefore not part of our current knowledge organization systems. This dissertation research analyzes changes in these terms over time, as they manifest through historical resources and controlled vocabulary versions. The study compares the use of historical vocabularies (e.g., historical and contemporary versions of the Library of Congress Subject Headings) for providing contextualized and temporally-aligned access to historical resources, working with the 19th-Century Knowledge Project use case.

Goal: The goal of the dissertation research is to advance our understanding of temporal drift as it manifests in historical resources and the controlled vocabulary versions used to index them. Specifically, the research explores and compares the use of temporally-aligned and contemporary controlled vocabularies to highlight semantic changes in both historical documents and knowledge organization systems.

Method: This study uses a comparative mixed-methods design consisting of a mapping analysis, document analysis, content analysis, and user relevance judgment. Data collection and analysis proceeded in four phases. Phase 1 includes data collection through the comparison of automatic subject indexing outputs using HIVE for the two selected vocabulary versions (1910 LCSH and 2021 FAST). The resulting data are terms exclusive to the 1910 LCSH indexing output. Phase 2 includes a synthesis of document and mapping analysis to identify terms that demonstrate temporal drift and determine their contemporary equivalent subject headings. Phase 3 has a two-pronged approach: content analysis to qualitatively cluster the terms into conceptual categories, and relevance judgment to account for common indexing errors and ensure that the results are relevant to the encyclopedia entries from which they were retrieved. Phase 4 synthesizes the results through descriptive statistics to identify conceptual trends among the terms, and a crosswalk table to illustrate how the terms have changed.

Findings: From a population of 335 unique 1910 LCSH terms, 14% (47 terms) demonstrated temporal drift. A total of 48 high-level categories and 55 subcategories were assigned to these 47 terms. The most common category was Geography & Travel, which accounts for 25% of the categories assigned. The second-most prevalent category was Philosophy and Religion, accounting for 14.58% of the categories assigned. The most common subcategory was Human Languages, accounting for 14.54% of the subcategories assigned, followed by Languages, which accounted for 12.73% of the subcategories. A final crosswalk documenting temporal drift terms and their contemporary equivalent terms demonstrates how the terms changed over time.

Implications: This research provides a way to address the temporal misalignment between historical resources and the controlled vocabularies used to index them. It also demonstrates how automated infrastructural inversion can be achieved to study how terms in knowledge organization systems have changed over time. The study contributes a better understanding of temporal drift as it manifests in historical resources and controlled vocabulary versions, provides a four-phase method for studying this topic, and offers a refined methodological approach that could be modified and reused to study temporal drift across different vocabularies and historical corpora.
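
The Phase 1 comparison and the headline Phase 4 statistic described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dissertation's implementation: the function names `exclusive_terms` and `percent` are hypothetical, the term lists are placeholders rather than real HIVE output, and the case-insensitive matching rule is an assumption. Only the counts 47 and 335 come from the abstract.

```python
def exclusive_terms(historical_output, contemporary_output):
    """Return terms found in the historical output but not in the contemporary one.

    Matching is case-insensitive and ignores surrounding whitespace
    (an assumption made for this sketch).
    """
    contemporary = {term.strip().casefold() for term in contemporary_output}
    return sorted(
        {term.strip() for term in historical_output
         if term.strip().casefold() not in contemporary}
    )


def percent(part, whole):
    """Express part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Placeholder indexing outputs for a single entry (not real data).
lcsh_1910_output = ["Term A", "Term B", "Term C"]
fast_2021_output = ["term a", "Term B", "Term D"]

# Phase 1: terms exclusive to the historical vocabulary's output.
exclusive = exclusive_terms(lcsh_1910_output, fast_2021_output)

# Phase 4: share of unique 1910 LCSH terms showing temporal drift,
# using the counts reported in the abstract (47 of 335).
drift_share = percent(47, 335)

print(exclusive)
print(drift_share)
```

Using a set difference keeps the comparison independent of term order and of duplicates across entries. A real pipeline would also need to decide how to treat subdivided or inverted headings, which this sketch does not address.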
