Dissertation
Bridging distinct domains in privacy related learning problems
Doctor of Philosophy (Ph.D.), Drexel University
Sep 2017
DOI: https://doi.org/10.17918/etd-7607
Abstract
The development of efficient and effective machine learning methods has prompted a surge of research on their application, ranging from spam filtering to recommender systems. Blindly applying machine learning tools to learning problems in privacy and security, however, often does not produce the desired results. Privacy and security problems differ from these settings because adversaries are ordinarily present and training data with reliable ground truth is frequently difficult to obtain. This problem is exacerbated by the fact that the data used for testing methods may differ from the real-world data for which the model is built. This thesis addresses three learning problems in privacy and security, each of which involves data from different domains that must be considered together. In authorship attribution, we tackle the cross-domain case, in which the training data and testing data are written in different contexts and media. Research in this area has been limited to texts written in the same domain, an assumption that often cannot be made in real-world settings. We explore cross-domain attribution across three domains: blogs, Twitter feeds, and Reddit comments. Research in website fingerprinting focuses on a single domain, the incoming and outgoing packets on a network, to determine which webpage a user is visiting. In addition to this domain, we focus on the websites themselves and develop methods that successfully determine which website-level features make a site more or less susceptible to this type of attack. Similarly, most research on the economies and structure of cybercriminal forums focuses only on the domain of private messages. While some research has investigated what can be learned from the public interactions on these forums, no work has tried to bridge these two domains. We present a method to predict which public threads are likely to trigger private interactions.
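As a rough illustration of the cross-domain authorship attribution setting (training text from one domain, test text from another), the following Python sketch builds a character-trigram profile for each author from source-domain texts and attributes a target-domain text to the author whose profile is most similar by cosine similarity. This is a hypothetical example, not the method used in the dissertation: the function names, the choice of character trigrams with cosine similarity, and the toy "blog" and "tweet" strings are all assumptions made for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train_profiles(training_docs):
    """Merge each author's source-domain documents into one n-gram profile.

    training_docs maps author name -> list of texts (hypothetical format).
    """
    profiles = {}
    for author, docs in training_docs.items():
        profile = Counter()
        for doc in docs:
            profile.update(char_ngrams(doc))
        profiles[author] = profile
    return profiles


def attribute(profiles, text):
    """Return the author whose source-domain profile best matches a target-domain text."""
    query = char_ngrams(text)
    return max(profiles, key=lambda author: cosine(profiles[author], query))


# Hypothetical toy data: the source domain stands in for blog posts,
# the target domain for tweets. Real experiments need far more text.
BLOG_POSTS = {
    "alice": ["Honestly, I think the new library is wonderful, honestly wonderful."],
    "bob": ["lol the build broke again lol, gonna fix it tmrw lol"],
}
TWEETS = {"alice": "honestly wonderful day", "bob": "build broke lol"}

profiles = train_profiles(BLOG_POSTS)
for true_author, tweet in TWEETS.items():
    print(true_author, "->", attribute(profiles, tweet))
```

Because the profiles are built only from source-domain text, any loss of accuracy on target-domain text reflects the domain shift that the cross-domain setting is meant to expose; the toy strings here are deliberately easy to separate.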
Details
- Title: Bridging distinct domains in privacy related learning problems
- Creators: Rebekah Overdorf - DU
- Contributors: Rachel Greenstadt (Advisor) - Drexel University (1970-)
- Awarding Institution: Drexel University
- Degree Awarded: Doctor of Philosophy (Ph.D.)
- Publisher: Drexel University; Philadelphia, Pennsylvania
- Number of pages: xiv, 112 pages
- Resource Type: Dissertation
- Language: English
- Academic Unit: Computer Science (Computing) (2013-2026); College of Computing and Informatics (2013-2026); Drexel University
- Other Identifier: 7607; 991014632258404721