Conference proceeding
Automatic approaches to clustering occupational description data for prediction of probability of workplace exposure to beryllium
2011 IEEE International Conference on Granular Computing, pp 596-601
Nov 2011
Abstract
We investigated automatic approaches for clustering data that describe occupations associated with a hazardous airborne exposure (beryllium). The regulatory compliance data from the Occupational Safety and Health Administration include records containing short free-text job descriptions and associated numerical exposure levels. Researchers in the public health domain need to map job descriptions to the Standard Occupational Classification (SOC) nomenclature in order to estimate occupational health risks. The previous manual process was time-consuming and never progressed as far as linkage to SOC. We investigated alternative automatic approaches for clustering job descriptions. The clustering results are an essential first step towards discovering the corresponding SOC terms. Our study indicated that the Tolerance Rough Set with Jaccard similarity was a better combination overall. The utility of the algorithm was further verified by applying logistic regression and validating that the predictive power of the automatically generated classifications, in terms of the association of "job" with the probability of exposure to beryllium above a certain threshold, closely approached that of the manually assembled classification of the same 12,148 records.
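The idea of grouping short job descriptions by Jaccard similarity can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the 0.5 threshold, the sample job strings, and the function names `tokenize`, `jaccard`, and `tolerance_classes` are all assumptions made for illustration. The sketch treats two descriptions as "tolerant" of each other when the Jaccard similarity of their token sets meets the threshold, which gives a reflexive, symmetric, but not necessarily transitive relation.

```python
def tokenize(text):
    """Split a free-text job description into a set of lowercase tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token sets."""
    if not a and not b:
        return 1.0  # convention for two empty sets (assumption)
    return len(a & b) / len(a | b)


def tolerance_classes(descriptions, threshold=0.5):
    """For each description, list the indices of all descriptions whose
    Jaccard similarity to it is at least `threshold` (illustrative value)."""
    tokens = [tokenize(d) for d in descriptions]
    return [
        [j for j, tj in enumerate(tokens) if jaccard(ti, tj) >= threshold]
        for ti in tokens
    ]


# Hypothetical job descriptions, not taken from the OSHA data.
jobs = ["machine operator", "operator machine shop", "welder", "arc welder"]
classes = tolerance_classes(jobs, threshold=0.5)
```

With these sample strings, the two operator descriptions fall into one tolerance class and the two welder descriptions into another. Because the relation is not transitive, classes may overlap on real data, which is one way such an approach differs from a hard partition.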
Details
- Title
- Automatic approaches to clustering occupational description data for prediction of probability of workplace exposure to beryllium
- Creators
- A Slutsky - Drexel University; Yuan An - Drexel University; T Hu - Drexel University; I Burstyn - Drexel University
- Publication Details
- 2011 IEEE International Conference on Granular Computing, pp 596-601
- Conference
- 2011 IEEE International Conference on Granular Computing
- Publisher
- IEEE
- Number of pages
- 1
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Information Science; Environmental and Occupational Health
- Scopus ID
- 2-s2.0-84856795783
- Other Identifier
- 991019173569704721