A comparison of document clusters derived from co-cited references and co-assigned index terms
Barbara Ann Rapp
Doctor of Philosophy (Ph.D.), Drexel University
1985
DOI: https://doi.org/10.17918/00009885
Files and links (1)
pdf
Rapp_Barbara_1985 (8.30 MB)
PDF Restricted Access, VIEWABLE UPON REQUEST: contact archives@drexel.edu
Abstract
The major objective of this dissertation was to determine how similarly co-cited references and co-assigned MeSH terms structure a set of documents. It was a large-scale, exploratory study of the database partitioning effects of two fundamentally different indexing systems, both of which have the common information retrieval purpose of identifying documents related in subject, yet each of which, judging from our results, performs quite independently. We drew a matched data set of over 8500 documents, published during the years 1975 to 1977 in 20 cardiology journals, from the MEDLINE database and from the Institute for Scientific Information's citation database. Single-link clustering was used to partition the data set in three different ways. Co-citation clustering was used to partition on the basis of reference relationships. To partition on the basis of MeSH index terms, we used both term clustering and document clustering. Results of the three cluster analyses were compared pairwise to measure similarity of document placement. Overlap between cluster systems was measured in two ways. First, an asymmetric measure of percent overlap was computed between all cluster pairs in two systems being compared. Second, beginning with selected clusters in one of the systems based on MeSH term relationships, we traced their member papers in each of the other two cluster systems. When seed papers mapped to several clusters, inter-cluster relationships of the clusters so identified were examined. In both overlap analyses, we found extremely low overlap between the clusters derived from co-citation analysis and those derived from term clustering or document clustering based on MeSH term relationships. Consistent findings of low overlap suggest that, as alternative document representations, cited references and MeSH index terms organize a database in very different ways.
Differences in organization are most likely accounted for by differences in intellectual structure of the systems for assigning references and index terms. Subject analysis of selected clusters showed that, in spite of low overlap between cluster systems, clusters within each system were cohesive in subject content. Our results illustrate the complementary nature of references and index terms in how they partition a database. Results are consistent with low overlap observed in comparative retrieval studies, and reinforce the notion that these two methods of representing document content are complementary in retrieval performance.
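As an illustration of the two techniques named in the abstract, the sketch below applies single-link clustering to a toy set of documents and then computes an asymmetric percent overlap between clusters from two systems. It is not the dissertation's implementation: the document IDs, index terms, Jaccard similarity, the 0.5 threshold, the hypothetical citation-based clusters, and the overlap formula (shared documents divided by the size of the first cluster) are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two term sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_link_clusters(items, similarity, threshold):
    """Single-link clustering at a fixed threshold.

    Two items end up in the same cluster if they are connected by a chain
    of pairs whose similarity is at least `threshold`.
    """
    parent = {item: item for item in items}

    def find(x):
        # Follow parent pointers to the cluster root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), set()).add(item)
    return list(clusters.values())


def percent_overlap(cluster_a, cluster_b):
    """Percent of cluster_a's documents that also appear in cluster_b.

    Asymmetric: percent_overlap(a, b) generally differs from
    percent_overlap(b, a). The formula is an assumption for illustration.
    """
    if not cluster_a:
        return 0.0
    return 100.0 * len(cluster_a & cluster_b) / len(cluster_a)


# Hypothetical documents with hypothetical index terms.
doc_terms = {
    "d1": {"Myocardial Infarction", "Electrocardiography"},
    "d2": {"Myocardial Infarction", "Electrocardiography", "Prognosis"},
    "d3": {"Heart Valve Prosthesis", "Mitral Valve"},
    "d4": {"Heart Valve Prosthesis", "Mitral Valve", "Postoperative Complications"},
    "d5": {"Electrocardiography", "Prognosis"},
}

term_clusters = single_link_clusters(
    list(doc_terms),
    lambda a, b: jaccard(doc_terms[a], doc_terms[b]),
    threshold=0.5,
)

# Hypothetical clusters from a citation-based system over the same documents.
citation_clusters = [{"d2", "d5"}, {"d1", "d3", "d4"}]

for t in term_clusters:
    for c in citation_clusters:
        print(sorted(t), sorted(c), percent_overlap(t, c), percent_overlap(c, t))
```

In this toy data, d1 and d5 fall below the threshold when compared directly but are joined through d2, which is the chaining behavior characteristic of single-link clustering. Computing the overlap in both directions shows why an asymmetric measure is informative when clusters differ in size.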
Details
Title: A comparison of document clusters derived from co-cited references and co-assigned index terms
Creators: Barbara Ann Rapp
Awarding Institution: Drexel University
Degree Awarded: Doctor of Philosophy (Ph.D.)
Publisher: Drexel University; Philadelphia, Pennsylvania
Number of pages: x, 216 pages
Resource Type: Dissertation
Language: English
Academic Unit: College of Information Studies (1984-1995); Drexel University
Other Identifier: 991021888779904721