Information science; Information visualization; Communication in science--Data processing
Science is entering its fourth paradigm, "data-intensive science". Relatively little attention has been paid to the users of scientific data, particularly their data practices. This dissertation endeavors to advance our knowledge of data behavior in the new paradigm. Within the scope of the Sloan Digital Sky Survey (SDSS) project, I conducted two major lines of research: a content analysis of SDSS-related scientific publications to investigate astronomers' data use behavior, and a visual exploration analysis (VEA) of SDSS SQL query logs with the design of a visualization tool, SDSS Log Viewer. By integrating results from VEA and statistics, I conducted three case studies of SDSS log data to investigate users' data seeking behavior. For astronomers' data usage behavior, I found that: 1) while a large volume of scientific data is produced in SDSS, researchers that rely on SDSS only intended to leverage the large number and use more data; 2) studies that leveraged a large volume of data from multiple data sources are relatively rare in the SDSS research domain; 3) using data collected by others, both data collection projects and other researchers, is a common data behavior in the SDSS research community; and 4) the results on the possibility of data reconstruction suggest that scientific publications themselves are insufficient for linking scientific data with the data sources. For users' data seeking behavior, I found that: 1) a small number of automatic query generators formed the major query traffic (in terms of the number of queries) to the SDSS data archive, and six common categories of queries were identified; the number of query templates used by automatic query generators is small; 2) academic researchers, who are the target users of the SDSS data archive, issued a relatively large number of queries manually; compared to the queries generated by automatic data requestors, the query templates used by this type of user are rather diverse in terms of both sophistication of condition strings and complexity of query structures, and a possible learning hierarchy is observed in this user group; and 3) occasional passing-by users are large in number, but their behavior is still unclear. As this is possibly the first empirical study of users' data use and access behavior, the aforementioned findings lay the foundation for a wide range of future studies. Also, the method used in this study is generic, and much of it is applicable to other fields because the specific steps in the method are independent of application domains.
Details
Title
Data use and access behavior in eScience
Creators
Jian Zhang - DU
Contributors
Chaomei Chen (Advisor) - Drexel University (1970-)
Awarding Institution
Drexel University
Degree Awarded
Doctor of Philosophy (Ph.D.)
Publisher
Drexel University; Philadelphia, Pennsylvania
Resource Type
Dissertation
Language
English
Academic Unit
College of Information Science and Technology (1995-2013); Drexel University
Other Identifier
3543; 991014632686804721