Beyond Cross-Entropy: Discounted Least Information Theory of Entropy (DLITE) Loss and the Impact of Loss Functions on AI-Driven Named Entity Recognition

Sonia Pascua; Michael Pan; Weimao Ke

doi:10.3390/info16090760

Journal article

Open access

Peer reviewed

Information (Basel), v 16(9), 760

02 Sep 2025

Featured in Collection: Research Supported by Drexel Libraries' OA Programs

Files and links (1)

https://doi.org/10.3390/info16090760

Published, Version of Record (VoR); Open Access Discount via Drexel Libraries Read and Publish Program 2025; CC BY V4.0, Open

Keywords: DLITE Loss; named entity recognition; loss functions; information theory; transformer models; information-theoretic optimization; entropy-aware learning; recall optimization; model uncertainty; noisy datasets

Abstract

Loss functions play a significant role in shaping model behavior in machine learning, yet their design implications remain underexplored in natural language processing tasks such as Named Entity Recognition (NER). This study investigates the performance and optimization behavior of five loss functions—L1, L2, Cross-Entropy (CE), KL Divergence (KL), and the proposed DLITE (Discounted Least Information Theory of Entropy) Loss—within transformer-based NER models. DLITE introduces a bounded, entropy-discounting approach to penalization, prioritizing recall and training stability, especially under noisy or imbalanced data conditions. We conducted empirical evaluations across three benchmark NER datasets: Basic NER, CoNLL-2003, and the Broad Twitter Corpus. While CE and KL achieved the highest weighted F1-scores in clean datasets, DLITE Loss demonstrated distinct advantages in macro recall, precision–recall balance, and convergence stability—particularly in noisy environments. Our findings suggest that the choice of loss function should align with application-specific priorities, such as minimizing false negatives or managing uncertainty. DLITE adds a new dimension to model design by enabling more measured predictions, making it a valuable alternative in high-stakes or real-world NLP deployments.
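
To make the contrast between bounded and unbounded penalties concrete, the following is a minimal Python sketch of the four baseline losses named in the abstract (L1, L2, Cross-Entropy, KL Divergence), evaluated on a single token's label distribution. The function names, the `eps` clipping constant, and the three-class example are assumptions made for illustration and are not taken from the article. DLITE itself is not implemented here because this record does not state its formula.

```python
import math

def l1_loss(p, q):
    """Sum of absolute differences; at most 2 for probability vectors."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def l2_loss(p, q):
    """Sum of squared differences; at most 2 for probability vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def cross_entropy(p, q, eps=1e-12):
    """-sum p*log(q); grows without bound as q -> 0 on the true class.
    eps is an illustrative clipping constant to avoid log(0)."""
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))

def kl_divergence(p, q, eps=1e-12):
    """sum p*log(p/q); terms with p == 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)

# Hypothetical three-class token: the true label is class 1 (one-hot target).
target = [0.0, 1.0, 0.0]
good_pred = [0.05, 0.90, 0.05]
confident_wrong = [0.98, 0.01, 0.01]

for name, pred in [("good", good_pred), ("confident-wrong", confident_wrong)]:
    print(name,
          "L1=%.3f" % l1_loss(target, pred),
          "L2=%.3f" % l2_loss(target, pred),
          "CE=%.3f" % cross_entropy(target, pred),
          "KL=%.3f" % kl_divergence(target, pred))
```

With a one-hot target, CE and KL coincide, and both increase sharply for a confidently wrong prediction, whereas L1 and L2 stay within a fixed range. The abstract describes DLITE as bounded as well, which places it closer to L1 and L2 in this one respect.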

Details

Title: Beyond Cross-Entropy: Discounted Least Information Theory of Entropy (DLITE) Loss and the Impact of Loss Functions on AI-Driven Named Entity Recognition
Creators: Sonia Pascua - Drexel University; Michael Pan - University of Rhode Island; Weimao Ke (Corresponding Author) - Drexel University
Publication Details: Information (Basel), v 16(9), 760
Publisher: MDPI
Number of pages: 18
Resource Type: Journal article
Language: English
Academic Unit: Information Science
Web of Science ID: WOS:001580073200001
Scopus ID: 2-s2.0-105017010023
Other Identifier: 991022086609904721

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#16 Peace, Justice and Strong Institutions

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types: Domestic collaboration
Web of Science research areas: Computer Science, Information Systems
