Fusion of Text and Audio Semantic Representations Through CCA

Kamelia Aryafar; Ali Shokoufandeh

doi:10.1007/978-3-319-14899-1_7

Conference proceeding

Peer reviewed

MULTIMODAL PATTERN RECOGNITION OF SOCIAL SIGNALS IN HUMAN-COMPUTER-INTERACTION, v 8869, pp 66-73

01 Jan 2015

Featured in Collection: UN Sustainable Development Goals @ Drexel

Subjects: Computer Science; Computer Science, Artificial Intelligence; Computer Science, Theory & Methods; Science & Technology; Technology

Abstract

Humans are natural multimedia processing machines. Multimedia spans multiple modalities, including audio, text, and images, and a central aspect of multimedia processing is the coherent integration of these modalities into a single entity. Multimodal information fusion architectures become a necessity when not all information channels are available at all times. In this paper, we introduce a multimodal fusion of audio signals and lyrics in a shared semantic space through canonical correlation analysis (CCA). We propose an audio retrieval system based on extended semantic analysis of audio signals and combine this model with a tf-idf representation of lyrics to obtain a multimodal retrieval system. CCA and supervised learning methods serve as the basis for relating audio and lyrics information. Our experimental evaluation indicates that the proposed model outperforms prior approaches based on simple CCA methods. Finally, the efficiency of the proposed method allows it to scale to large music and lyrics collections, enabling users to explore lyrics relevant to music datasets.
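
The abstract names two concrete ingredients: a tf-idf representation of lyrics, and CCA to map audio and lyrics into a shared space where one modality can be used to retrieve the other. The sketch below illustrates those two pieces under stated assumptions; it is not the authors' implementation. The paper's extended semantic analysis of audio is not reproduced: both views in the demo are synthetic stand-ins generated from a shared latent variable. The function names (`tfidf`, `cca`, `retrieve`), the ridge term `reg`, the smoothed idf formula, and the Euclidean nearest-neighbour ranking are all hypothetical illustration choices.

```python
import math
from collections import Counter

import numpy as np


def tfidf(docs):
    """Tf-idf matrix (n_docs x n_terms) for a list of tokenized lyrics (assumed weighting)."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency per term
    X = np.zeros((n, len(vocab)))
    for r, d in enumerate(docs):
        for w, c in Counter(d).items():
            # normalized term frequency times smoothed inverse document frequency
            X[r, index[w]] = (c / len(d)) * (math.log((1 + n) / (1 + df[w])) + 1)
    return X, vocab


def _inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(X, Y, k, reg=1e-3):
    """Regularized CCA: returns view means, projections Wx, Wy and the top-k correlations."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # hypothetical ridge term keeps the covariance matrices invertible
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # singular values of the whitened cross-covariance are the canonical correlations
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return mx, my, Kx @ U[:, :k], Ky @ Vt.T[:, :k], s[:k]


def retrieve(query_audio, mx, Wx, lyrics_shared):
    """Rank lyrics (already projected into the shared space) by distance to an audio query."""
    q = (query_audio - mx) @ Wx
    return np.argsort(np.linalg.norm(lyrics_shared - q, axis=1))


# Synthetic paired data: both views are noisy linear images of a shared 2-d latent.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
audio = z @ rng.normal(size=(2, 6)) + 0.005 * rng.normal(size=(200, 6))
lyrics = z @ rng.normal(size=(2, 8)) + 0.005 * rng.normal(size=(200, 8))

mx, my, Wx, Wy, corr = cca(audio, lyrics, k=2)
lyrics_shared = (lyrics - my) @ Wy
top1 = [retrieve(audio[i], mx, Wx, lyrics_shared)[0] for i in range(200)]
print(corr)  # both canonical correlations should be close to 1 for this low-noise data
```

Here CCA is solved by whitening each view and taking an SVD of the whitened cross-covariance. The ridge term matters most for real lyrics, where a tf-idf vocabulary can have more dimensions than there are songs. In practice, the `tfidf` output would replace the synthetic `lyrics` matrix.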

Details

Title: Fusion of Text and Audio Semantic Representations Through CCA
Creators: Kamelia Aryafar - Drexel University
Ali Shokoufandeh - Drexel University
Contributors: F Schwenker (Editor)
S Scherer (Editor)
L P Morency (Editor)
Publication Details: MULTIMODAL PATTERN RECOGNITION OF SOCIAL SIGNALS IN HUMAN-COMPUTER-INTERACTION, v 8869, pp 66-73
Series: Lecture Notes in Artificial Intelligence
Publisher: Springer Nature
Number of pages: 8
Resource Type: Conference proceeding
Language: English
Academic Unit: Computer Science
Web of Science ID: WOS:000360223900007
Scopus ID: 2-s2.0-84927928093
Other Identifier: 991019168075004721

InCites Highlights

Data related to this publication from the InCites Benchmarking & Analytics tool:

Web of Science research areas: Computer Science, Artificial Intelligence; Computer Science, Theory & Methods
