A Bolu: A Structured Dataset for the Computational Analysis of Sardinian Improvisational Poetry

Source: Computation and Language · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, quick

Summary

A Bolu is the first structured corpus designed for the computational analysis of Sardinian extemporaneous poetry, known as cantada logudorese. It addresses a gap in Natural Language Processing (NLP) for minority languages and supports the preservation of oral linguistic heritage. The corpus comprises 2,835 stanzas and 141,321 tokens and is built to document and analyze improvised poetic structures. The study describes the corpus architecture and applies a multidimensional analysis that combines descriptive statistics with computational linguistics techniques. Initial findings reveal recurring patterns in Sardinian extemporaneous poetry, which supports Parry and Lord's theory of formulaicity, and they contribute to more inclusive NLP tools for less widely spoken languages.

Key takeaway

For NLP engineers and research scientists working on linguistic diversity, A Bolu is a structured resource for studying extemporaneous poetry. It can be used to develop more inclusive NLP models, particularly for minority languages, and to test theories of oral creativity such as formulaicity computationally. Similar corpus-creation methods may be worth applying to other under-resourced oral traditions.

Key insights

A Bolu is the first structured corpus for analyzing Sardinian extemporaneous poetry, revealing formulaic patterns.

Principles

Method

The study uses a multidimensional analysis, combining descriptive statistical indices and computational linguistics techniques to map poetic text characteristics within the A Bolu corpus.
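To make the two strands of that analysis concrete, here is a minimal Python sketch. It assumes the corpus can be loaded as a plain list of stanza strings. The functions `tokenize`, `descriptive_stats`, and `recurring_ngrams` are hypothetical helpers written for illustration, not the authors' pipeline, and the toy stanzas are English placeholders, not lines from A Bolu. Counting word n-grams shared across stanzas is one simple proxy for formulaicity, and the study may well use different measures.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(stanza: str) -> list[str]:
    """Naive tokenizer (assumption): lowercase, split on whitespace, strip punctuation."""
    stripped = (word.strip(PUNCT).lower() for word in stanza.split())
    return [tok for tok in stripped if tok]

def descriptive_stats(stanzas: list[str]) -> dict[str, float]:
    """Basic descriptive indices: sizes, type/token ratio, mean stanza length."""
    tokens = [tok for s in stanzas for tok in tokenize(s)]
    types = set(tokens)
    return {
        "stanzas": len(stanzas),
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        "mean_tokens_per_stanza": len(tokens) / len(stanzas) if stanzas else 0.0,
    }

def recurring_ngrams(stanzas: list[str], n: int = 3, min_stanzas: int = 2) -> dict[tuple[str, ...], int]:
    """Return n-grams that occur in at least `min_stanzas` distinct stanzas."""
    stanza_freq: Counter = Counter()
    for s in stanzas:
        toks = tokenize(s)
        # A set, so each n-gram is counted at most once per stanza.
        grams = {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
        stanza_freq.update(grams)
    return {g: c for g, c in stanza_freq.items() if c >= min_stanzas}

# Placeholder data for illustration only (not taken from the corpus).
toy_stanzas = [
    "In this square tonight I sing",
    "I sing for you, in this square",
    "Tonight the theme is hard",
]

stats = descriptive_stats(toy_stanzas)
formulas = recurring_ngrams(toy_stanzas, n=3, min_stanzas=2)
print(stats)
print(formulas)
```

As a sanity check on the same kind of index, the corpus sizes reported in the summary (141,321 tokens over 2,835 stanzas) work out to roughly 49.8 tokens per stanza. A real analysis of Sardinian text would likely need a language-aware tokenizer in place of the whitespace split used here.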

Best for: NLP Engineer, Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.