A BART-based approach with hierarchical strategy for Vietnamese abstractive multi-document summarization
Summary
A technical report introduces a BART-based hierarchical strategy for Vietnamese abstractive multi-document summarization, developed for a shared task at the International Workshop on Vietnamese Language and Speech Processing (VLSP) 2022. The approach first condenses each document individually, then aggregates the condensed documents and produces the final summary. A novel document shortening strategy, guided by the "golden summary" (the reference summary for each document cluster), keeps the hierarchical stages strongly correlated with one another. The method achieved a ROUGE2-F1 score of 0.2468 on VLSP's public test set and produced fluent, concise summaries. The authors also drew on external sources to enlarge the training data for Vietnamese multi-document summarization and made this supplementary data available to the community.
Key takeaway
For NLP engineers building multi-document summarization systems for low-resource languages such as Vietnamese, this report describes a hierarchical BART-based approach evaluated on the VLSP 2022 task. Consider a golden summary-driven document shortening strategy to improve correlation between stages, and consider external data sources to enlarge the training set. The method reached a ROUGE2-F1 of 0.2468 on VLSP's public test set and produced fluent, concise summaries.
Key insights
A BART-based hierarchical strategy with golden summary-driven shortening reached 0.2468 ROUGE2-F1 on VLSP's public test set for Vietnamese abstractive multi-document summarization.
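For readers unfamiliar with the metric, the sketch below shows how a ROUGE-2 F1 score (written ROUGE2-F1 in this summary) is commonly computed from bigram overlap. It is a simplified illustration that assumes whitespace tokenization and a single reference; the function name `rouge2_f1` is hypothetical, and the VLSP evaluation script may tokenize and aggregate differently, so its output is not comparable to the 0.2468 figure.

```python
from collections import Counter


def rouge2_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-2 F1: harmonic mean of bigram precision and recall.

    Assumption: lowercase whitespace tokenization, one reference summary.
    """
    def bigram_counts(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(zip(tokens, tokens[1:]))

    cand = bigram_counts(candidate)
    ref = bigram_counts(reference)
    # Clipped overlap: each bigram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge2_f1("the cat sat on the mat", "the cat lay on the mat"))
```

In the example call, three of the five bigrams on each side match, so precision and recall are both 3/5.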
Principles
- Hierarchical summarization condenses documents then aggregates.
- Golden summary-driven shortening enhances stage correlation.
- External data augmentation improves summarization performance.
Method
The method condenses each document using a novel golden summary-driven shortening strategy, followed by aggregation and abstractive summarization with a BART-based model.
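One way to picture golden summary-driven shortening is as sentence selection against the reference summary when preparing training data. The sketch below is an assumption, not the report's procedure: the scoring rule (clipped bigram overlap), the regex tokenizer, and the names `shorten_document` and `max_sentences` are all hypothetical, chosen only to make the idea concrete.

```python
import re
from collections import Counter


def bigrams(text: str) -> Counter:
    """Return a multiset of lowercase word bigrams (simplistic regex tokenizer)."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def shorten_document(document: str, golden_summary: str, max_sentences: int = 3) -> str:
    """Keep the sentences that share the most bigrams with the golden summary.

    Hypothetical heuristic: the scoring rule used in the report is not stated here.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    reference = bigrams(golden_summary)
    # Score each sentence by its clipped bigram overlap with the golden summary.
    scored = [
        (sum((bigrams(sentence) & reference).values()), index)
        for index, sentence in enumerate(sentences)
    ]
    # Pick the best-scoring sentences (earlier sentences win ties),
    # then restore the original document order.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    keep = sorted(index for _, index in top)
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    doc = (
        "The storm hit the coast on Monday. "
        "Local shops sold umbrellas. "
        "Officials said the storm hit the coast hard."
    )
    print(shorten_document(doc, "The storm hit the coast.", max_sentences=2))
```

Because the golden summary is only available for training data, a heuristic like this could only build shortened training inputs; at inference time the shortening step would need a learned model or a reference-free rule.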
In practice
- Use BART for abstractive summarization tasks.
- Apply a hierarchical strategy to multi-document inputs.
- Incorporate external data to enlarge training sets.
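The hierarchical strategy in the points above can be sketched as a small pipeline: condense each document, join the condensed texts, then summarize the result. This is a minimal sketch under stated assumptions. The names `hierarchical_summarize`, `condense`, `summarize`, and `first_sentence` are hypothetical, and `first_sentence` is only a stand-in so the example runs without a trained model; in a real system both callables would wrap a fine-tuned sequence-to-sequence model such as BART.

```python
from typing import Callable, List

# Any function that maps a text to a shorter text can act as a stage.
Summarizer = Callable[[str], str]


def hierarchical_summarize(
    documents: List[str],
    condense: Summarizer,
    summarize: Summarizer,
    separator: str = " ",
) -> str:
    """Condense each document, merge the results, then summarize once more."""
    # Stage 1: shorten every non-empty document on its own.
    condensed = [condense(doc) for doc in documents if doc.strip()]
    # Stage 2: aggregate the condensed documents into a single input.
    merged = separator.join(condensed)
    # Stage 3: produce the final summary from the merged text.
    return summarize(merged)


def first_sentence(text: str) -> str:
    """Stand-in for a trained model: returns only the first sentence."""
    head = text.split(". ")[0].strip()
    return head if head.endswith(".") else head + "."


if __name__ == "__main__":
    docs = ["Rain fell in Hanoi. Streets flooded.", "Schools closed early. Buses stopped."]
    print(hierarchical_summarize(docs, first_sentence, first_sentence))
```

Passing the stages in as callables keeps the pipeline independent of any particular model library, so the condensing and final summarization models can be swapped or tested separately.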
Topics
- Multi-document Summarization
- Abstractive Summarization
- Vietnamese NLP
- BART Model
- Hierarchical Summarization
- VLSP 2022
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.