A BART-based approach with hierarchical strategy for Vietnamese abstractive multi-document summarization

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Advanced, quick

Summary

This technical report presents a BART-based hierarchical strategy for Vietnamese abstractive multi-document summarization, developed for a challenge at the International Workshop on Vietnamese Language and Speech Processing (VLSP) 2022. The approach first condenses each document individually, then aggregates the condensed documents and generates the final summary. A document shortening strategy guided by the "golden summary" (the reference summary) keeps the stages of the hierarchy strongly correlated. The method achieved a ROUGE2-F1 score of 0.2468 on VLSP's public test set and produced fluent, concise summaries. The authors also used external sources to increase the amount of data for Vietnamese multi-document summarization and made this supplementary data available to the community.

Key takeaway

For NLP engineers building multi-document summarization systems for low-resource languages such as Vietnamese, this report describes a hierarchical BART-based approach evaluated on the VLSP 2022 public test set. Consider a golden summary-driven document shortening strategy to improve correlation between stages, and external data sources to increase the amount of available data. The method reached a ROUGE2-F1 of 0.2468 on that test set and, according to the authors, produces fluent and concise summaries.

Key insights

A BART-based hierarchical strategy with golden summary-driven shortening improves Vietnamese multi-document abstractive summarization, achieving 0.2468 ROUGE2-F1.
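To make the reported metric concrete, here is a minimal sketch of how a ROUGE-2 F1 score can be computed from bigram overlap. It is an illustration only: the tokenizer (lowercased word characters) and the function names are assumptions, and the official VLSP evaluation may tokenize or normalize text differently.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Multiset of adjacent word pairs (assumed tokenizer: lowercased \\w+ runs)."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate, reference):
    """Harmonic mean of bigram precision and recall between two texts."""
    cand = bigram_counts(candidate)
    ref = bigram_counts(reference)
    # Clipped overlap: each bigram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, "the cat sat on the mat" scored against "the cat lay on the mat" shares 3 of 5 bigrams on each side, so precision, recall, and F1 are all 0.6.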

Principles

Method

The method condenses each document using a novel golden summary-driven shortening strategy, followed by aggregation and abstractive summarization with a BART-based model.
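The shortening and aggregation steps can be sketched as follows. This is a hedged illustration, not the authors' implementation: the summary above does not state how sentences are scored, so bigram overlap with the golden summary is an assumed stand-in, and all function names and the `max_sentences` parameter are hypothetical. The BART summarizer itself is omitted; the sketch only builds the shortened, aggregated input that such a model would receive. Because a golden summary exists only for training data, this applies to preparing training inputs.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def bigrams(text):
    """Multiset of adjacent word pairs (assumed tokenizer: lowercased \\w+ runs)."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def overlap_score(sentence, golden_summary):
    """Assumed relevance score: number of bigrams shared with the golden summary."""
    return sum((bigrams(sentence) & bigrams(golden_summary)).values())


def shorten_document(document, golden_summary, max_sentences=2):
    """Keep the sentences closest to the golden summary, in their original order."""
    sentences = split_sentences(document)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: overlap_score(sentences[i], golden_summary),
        reverse=True,
    )
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


def build_aggregated_input(documents, golden_summary, max_sentences=2):
    """Shorten every document, then concatenate the results for the final summarizer."""
    return " ".join(
        shorten_document(doc, golden_summary, max_sentences) for doc in documents
    )
```

A short usage example: with the golden summary "The flood hit the city and officials opened ten shelters.", the document "The flood hit the city on Monday. Officials opened ten shelters. The weather was sunny last year." is shortened to its first two sentences, since the third shares no bigrams with the summary.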

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.