Revisiting Structural Dependency in Autoregressive Multi-Task Table Recognition via Order-Independent Cell-Level Representations

2026-06-16 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

A new paper introduces a structural refinement module that improves multi-task table recognition by removing the order dependence of cell representations. Existing methods use autoregressive decoders for table structure prediction, cell localization, and cell content recognition, and the resulting cell features are often sensitive to generation order, which hurts global consistency. The proposed module applies non-causal attention to produce order-independent cell features. Because the refined features already encode global context, cell contents can be inferred in parallel. Experiments on two large datasets show consistent improvements in both cell localization and end-to-end recognition, with overall inference time cut by a factor of about three.

Key takeaway

Computer vision engineers building multi-task table recognition systems should consider non-causal attention as a way to remove the order dependence of cell representations. The paper reports consistent gains in cell localization and end-to-end recognition accuracy, with inference time cut by a factor of about three, which makes table processing both more robust and more efficient.

Key insights

Order-independent cell representations, achieved via non-causal attention, enhance multi-task table recognition and reduce inference time.

Principles

Autoregressive decoders can create order-dependent cell features.
Non-causal attention enables order-independent feature generation.
Global context improves cell localization and content recognition.
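
The first two principles can be illustrated with a minimal, dependency-free sketch. It is not the paper's module: it assumes a single attention head, identity query/key/value projections, no positional encoding, and made-up cell features. Under those assumptions, reordering the cells simply reorders the output of non-causal attention, whereas a causal mask makes each cell's output depend on where it falls in the generation order.

```python
import math
import random

def self_attention(x, causal=False):
    """Single-head self-attention with identity Q/K/V projections (illustrative only)."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Under a causal mask, position i sees only positions 0..i.
        visible = range(i + 1) if causal else range(n)
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in visible]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * x[j][k] for w, j in zip(weights, visible))
            for k in range(d)
        ])
    return out

def close(a, b, tol=1e-9):
    """Element-wise comparison of two lists of feature vectors."""
    return all(abs(p - q) <= tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))

random.seed(0)
# Hypothetical features for 5 cells, dimension 8.
cells = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
perm = [3, 0, 4, 1, 2]  # an alternative generation order

def reorder(rows):
    return [rows[i] for i in perm]

# Non-causal: attending then reordering equals reordering then attending.
noncausal_same = close(reorder(self_attention(cells)), self_attention(reorder(cells)))
# Causal: the same check fails, because each cell's context depends on its position.
causal_same = close(reorder(self_attention(cells, causal=True)),
                    self_attention(reorder(cells), causal=True))
print(noncausal_same, causal_same)
```

Adding positional encodings tied to generation order would reintroduce order dependence even without a causal mask, so the check covers only the attention pattern itself.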

Method

A structural refinement module uses non-causal attention to produce order-independent cell features. This design enables parallel inference of cell contents, conditioning each cell on global context for improved consistency.
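
To see why parallel inference of cell contents can shorten decoding, the following back-of-the-envelope sketch counts decoder steps only. It assumes a baseline that emits the tokens of all cells one after another in a single sequence, and a parallel variant that decodes every cell as one batch entry. The function names and cell lengths are hypothetical, and the count ignores structure decoding and per-step cost, so it does not reproduce the reported speedup.

```python
def sequential_steps(cell_lengths):
    """Decoder steps when cell contents are generated one token at a time, cell after cell."""
    return sum(cell_lengths)

def parallel_steps(cell_lengths):
    """Decoder steps when all cells are decoded together as a batch:
    the batch finishes when its longest cell finishes."""
    return max(cell_lengths, default=0)

# Hypothetical token counts for the contents of six cells.
lengths = [4, 12, 3, 7, 9, 5]

print(sequential_steps(lengths))  # 40
print(parallel_steps(lengths))    # 12
```

The gap grows with the number of cells, which is why parallel decoding matters most for large tables. Actual wall-clock gains depend on hardware and batch efficiency.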

In practice

Improve cell localization accuracy.
Cut table recognition inference time by a factor of about three.
Enhance global consistency in cell representations.

Topics

Multi-Task Table Recognition
Non-Causal Attention
Cell Representations
Autoregressive Models
Inference Optimization
Computer Vision

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.