Trustworthy generative AI through security, safety, grounding, and verification
Dissertation (open access)


Manil Shrestha
Doctor of Philosophy (Ph.D.), Drexel University
Mar 2026
DOI: https://doi.org/10.17918/00011305

Abstract

Keywords: generative artificial intelligence, large language models, trustworthy artificial intelligence
The rapid growth of generative artificial intelligence has created opportunities across critical domains, yet fundamental trust barriers still limit deployment in high-stakes applications. The field currently faces three interconnected challenges: safety risks from agents that can be exploited for harmful purposes, privacy vulnerabilities arising from centralized inference, and unreliable hallucinated outputs. This dissertation addresses these challenges through four independent research contributions aligned with the themes of Safety, Security, Grounding, and Verification. The first contribution examines application-level safety by analyzing the behavior of LLM-powered penetration testing agents, characterizing their capabilities, limitations, and the risks associated with autonomous offensive-security tools. The second contribution investigates Secure Multi-Party Computation (SMPC) for privacy-preserving inference, demonstrating how generative models can operate over decentralized servers while protecting both sensitive data and proprietary model parameters. The third contribution focuses on grounding by coupling language models with knowledge graphs and embedding-guided graph traversal so that generated outputs remain connected to structured, verifiable information rather than relying on unconstrained text generation. The fourth contribution develops conformal prediction-based methods that provide finite-sample statistical guarantees on model outputs. By decomposing predictions into atomic statements and attaching calibrated confidence measures, this work offers a principled mechanism to quantify uncertainty, validate outputs, and enforce safety constraints during generative inference. Together, these contributions unify statistical guarantees, structured grounding, and privacy-preserving computation to support interpretable and secure deployment of generative AI in high-stakes environments.
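The abstract's fourth contribution, attaching finite-sample statistical guarantees to generated statements, is an application of conformal prediction. As a minimal illustration of the general idea (not the dissertation's actual method), the sketch below shows split conformal calibration: given nonconformity scores on a held-out calibration set, a finite-sample-corrected quantile yields a threshold for accepting atomic statements at a target error rate. The score distribution and variable names here are hypothetical.

```python
# Minimal sketch of split conformal prediction for filtering generated
# statements. Assumption: each atomic statement has a nonconformity score
# (e.g. 1 - model confidence), and scores for new statements are
# exchangeable with the calibration scores.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration scores for statements with known correctness.
cal_scores = rng.uniform(0.0, 1.0, size=1000)

alpha = 0.1  # target error rate: 90% coverage guarantee
n = len(cal_scores)

# Finite-sample quantile correction: ceil((n + 1) * (1 - alpha)) / n.
q_level = np.ceil((n + 1) * (1 - alpha)) / n
threshold = np.quantile(cal_scores, q_level, method="higher")

# At inference time, accept only statements whose score is at or below
# the calibrated threshold; rejected statements are flagged as uncertain.
new_scores = rng.uniform(0.0, 1.0, size=5)
accepted = new_scores <= threshold
```

The `(n + 1)(1 - alpha) / n` correction is what makes the coverage guarantee hold for finite calibration sets rather than only asymptotically, which matches the abstract's emphasis on finite-sample guarantees.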


