On Solving the Multiple Variable Gapped Longest Common Subsequence Problem

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

This paper introduces a novel Iterative Multi-Source Beam Search (IMSBS) framework to address the Variable Gapped Longest Common Subsequence (VGLCS) problem, a generalization of the classical LCS problem. VGLCS incorporates flexible gap constraints between consecutive characters in solutions, relevant for molecular sequence comparison and time-series analysis. The proposed IMSBS framework utilizes a root-based state graph representation and an iterative beam search strategy to manage the combinatorial explosion arising from a large number of rooted state subgraphs. It dynamically maintains a global pool of promising candidate root nodes, enabling effective diversification across iterations. The study includes a comprehensive computational evaluation on 320 synthetic instances, featuring up to 10 input sequences and 500 characters, demonstrating the robustness of IMSBS over a baseline beam search within comparable runtimes. For $m=2$ cases, IMSBS achieves near-optimal solutions in 63 out of 80 instances, outperforming other heuristic approaches.

Key takeaway

For AI Scientists and Research Scientists working on sequence alignment or time-series analysis with flexible distance constraints, the Iterative Multi-Source Beam Search (IMSBS) offers a robust solution. You should consider implementing IMSBS, especially for problems with multiple input sequences ($m>2$) where traditional dynamic programming is computationally prohibitive. Its ability to explore diverse search regions leads to higher quality solutions compared to single-source beam search methods, making it a strong candidate for complex VGLCS instances.

Key insights

Iterative Multi-Source Beam Search effectively solves the Variable Gapped Longest Common Subsequence problem by exploring multiple root-based state subgraphs.

Principles

Method

IMSBS employs a backward beam search for root refinement, followed by a forward beam search. It dynamically generates new candidate root nodes from complete states, adding them to a global priority queue for subsequent iterations, guided by heuristics like character frequency alignment.

In practice

Topics

Code references

Best for: AI Scientist, Research Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.