Accuracy of Autism-Related TikTok Information in Italian: A Comparison Between Human Raters and Large Language Models
Journal article   Open access   Peer reviewed

Alessandro Carollo, Seraphina Fong, Giovanni Belardinelli, Silvia Perzolli, Giacomo Vivanti, Daniel Messinger, Dagmara Dimitriou and Gianluca Esposito
Journal of Autism and Developmental Disorders
18 Feb 2026
PMID: 41706307
URL: https://doi.org/10.1007/s10803-026-07249-9
Published, Version of Record (VoR); CC BY 4.0; Open access

Abstract

Keywords: Autism spectrum disorder; Social networking sites; Large language models; TikTok; Clinical information accuracy
Purpose: Social networking sites are major channels for sharing information on neurodiversity, including autism spectrum disorder. TikTok has become a particularly influential platform for autism-related communication, yet concerns remain about the scientific accuracy of such content. Most prior studies have focused on English-language videos and have evaluated accuracy with limited granularity. Additionally, the difficulty of achieving consistent expert ratings underscores the need for automated reliability assessment.

Methods: In this study, we examined 408 informational statements extracted from 148 TikTok videos posted under the hashtag #Autismo (Italian for #Autism). Three clinical experts independently classified each statement as inaccurate, overgeneralized, or accurate; their median ratings served as the human-derived ground truth and were compared with classifications from two large language models: ChatGPT 4.0 mini and Gemini 1.5 Flash.

Results: Human raters showed moderate agreement (mean κ = 0.52) and high specific agreement only for accurate statements, with lower agreement for overgeneralized and inaccurate content. ChatGPT achieved moderate agreement with human ratings (κ = 0.58), while Gemini reached only fair agreement (κ = 0.29). ChatGPT also exhibited a more conservative evaluation pattern (accurate information: precision = 0.89, recall = 0.82), whereas Gemini tended to overestimate accuracy (accurate information: precision = 0.76, recall = 0.93).

Conclusion: These findings suggest that LLMs, particularly ChatGPT, may support cautious and assistive evaluation of online health content. Future research should assess their applicability across online communities and platforms and explore their integration into accuracy-based alert systems that provide users with contextual reliability cues.
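
The abstract reports a median-of-three ground truth, Cohen's κ, and per-class precision and recall. The sketch below shows one way such quantities could be computed. It is not the authors' analysis code: the ordinal ordering of the three categories, the function names, and the four toy statements are all assumptions made for illustration, and the numbers it produces have no relation to the study's results.

```python
from collections import Counter

# Assumed ordinal order of the three categories (not stated in the abstract).
LABELS = ["inaccurate", "overgeneralized", "accurate"]
RANK = {label: i for i, label in enumerate(LABELS)}


def median_label(ratings):
    """Median of an odd number of ordinal ratings, returned as a label."""
    ranks = sorted(RANK[r] for r in ratings)
    return LABELS[ranks[len(ranks) // 2]]


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(count_a[l] * count_b[l] for l in LABELS) / (n * n)
    return (observed - expected) / (1 - expected)


def precision_recall(truth, predicted, target):
    """Precision and recall for one class; assumes the class occurs in both lists."""
    tp = sum(t == target and p == target for t, p in zip(truth, predicted))
    predicted_pos = sum(p == target for p in predicted)
    actual_pos = sum(t == target for t in truth)
    return tp / predicted_pos, tp / actual_pos


# Hypothetical toy data: three expert ratings per statement, one model label each.
expert_ratings = [
    ("accurate", "accurate", "overgeneralized"),
    ("inaccurate", "overgeneralized", "inaccurate"),
    ("overgeneralized", "overgeneralized", "accurate"),
    ("accurate", "accurate", "accurate"),
]
model_labels = ["accurate", "overgeneralized", "accurate", "accurate"]

truth = [median_label(r) for r in expert_ratings]
kappa = cohen_kappa(truth, model_labels)
precision, recall = precision_recall(truth, model_labels, "accurate")

print(truth)
print(f"kappa={kappa:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the model labels three statements as accurate but only two of them match the median expert rating, so precision falls below recall for the accurate class. This is the same direction of imbalance the abstract describes for Gemini.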

Details

Research   19 Feb 2026

La Voce Del Trentino (Redazione Trento)