Word EMBeddings: From Cognitive Linguistics to Language Engineering, and Back

Principal Investigator: Marianna Bolognesi


Project Code: 2022EPTPJ9_002

Start date: 28/09/2023

End date: 27/09/2025

CUP: J53D23007100001

Financed by the European Union - NextGenerationEU under PNRR - Mission 4 Component 2, Investment 1.1

In the past decade, advancements in deep learning, particularly in natural language processing (NLP) and text mining, have significantly enhanced semantic analysis tasks such as text classification, word sense disambiguation, machine translation, text summarization, question answering, and sentiment analysis. This progress is largely attributable to the concept of the word embedding: a representation of a word's meaning as a set of numeric coordinates, i.e., a vector. Current word embeddings, derived from large textual corpora, have proven effective, but they raise questions about how well they align with human language processing.
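To make the idea of "meaning as numeric coordinates" concrete, here is a minimal sketch using invented toy vectors (real embeddings have hundreds of dimensions and are learned from large corpora; the numbers and words below are illustrative assumptions, not project data). Semantic similarity is typically measured as the cosine of the angle between two word vectors:

```python
import math

# Toy 3-dimensional embeddings; values are invented for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: near 1.0 means the
    vectors point in almost the same direction (similar meaning)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Words with related meanings get geometrically closer vectors.
sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
```

With these toy values, "cat" and "dog" come out far more similar to each other than either is to "car", which is the geometric property that downstream NLP tasks exploit.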

The WEMB project addresses this by pursuing two objectives: first, gaining a deeper understanding of how word embeddings align with human language processing; second, leveraging this understanding to develop a new generation of embeddings for NLP tasks. The project employs a "from mind to application and back" approach, bridging UniBO's expertise in human language processing with ISTI-CNR's proficiency in NLP. WEMB focuses on three key aspects: 

  1. Embeddings and Cross-Modality: Investigating the relationship between embeddings that incorporate cross-modal information (e.g., from text and images) and traditional text-based embeddings in language processing. 
  2. Embeddings and Misspellings: Exploring the connection between embeddings and misspellings, a linguistic behavior that is increasingly prevalent in texts for a variety of reasons. 
  3. Embeddings and Word Senses: Examining the relationship between embeddings and word senses, particularly among different embeddings associated with different senses of the same ambiguous word. 
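The third aspect, different embeddings for different senses of one ambiguous word, can be sketched with toy numbers. Assuming hypothetical per-sense vectors for "bank" and a vector summarizing the surrounding context (all values invented for illustration), sense selection reduces to picking the sense embedding closest to the context:

```python
import math

# Invented per-sense embeddings for the ambiguous word "bank".
sense_embeddings = {
    "bank/finance": [0.9, 0.1],
    "bank/river": [0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def disambiguate(context_vector):
    """Return the sense whose embedding best matches the context."""
    return max(sense_embeddings,
               key=lambda s: cosine(sense_embeddings[s], context_vector))

# A money-flavored context should select the finance sense.
money_sense = disambiguate([0.8, 0.2])
# A water-flavored context should select the river sense.
river_sense = disambiguate([0.2, 0.8])
```

This is only a conceptual sketch: in practice, contextual models produce a distinct vector for each occurrence of a word, and the project investigates how such sense-specific vectors relate to human sense representations.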

Through these investigations, WEMB aims to contribute to both a theoretical understanding of word embeddings in human language processing and the practical development of enhanced embeddings for NLP applications.