Interpretable experiential learning based on state history and global feedback

2026-05-05 · Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

A new interpretable experiential learning model, based on state history and global feedback, has been developed and evaluated. The model learns a behavioral representation as a transition graph between state sets, in which each transition carries a utility value and an evidence count. Designed for resource-constrained environments, it was tested on the Atari Breakout benchmark from OpenAI Gym, where it achieved performance comparable to established neural network-based solutions such as DQN and Rainbow-IQN and exceeded human expert scores. The architecture comprises a state transformer, a state learning layer, and a decision-making layer, and it relies on a custom computer vision algorithm for dimensionality reduction. In experiments on low-cost laptops, the model learned from scratch and improved quickly: with certain random seeds it reached average scores of 120 after 30 million training frames and 196 after 41 million frames, whereas some baselines need 50 million frames to reach similar scores.
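
The transition graph described above can be pictured as a small in-memory data structure. The following Python is a minimal sketch, not the aigents-python implementation. It assumes that states are frozensets of discrete features and that each (state, action, next state) edge stores a utility value and an evidence count. `TransitionGraph`, `Edge`, `observe`, and `outgoing` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edge:
    """Weights attached to one transition (hypothetical layout)."""
    utility: float = 0.0  # accumulated feedback credited to this transition
    count: int = 0        # evidence: how many times the transition was observed


class TransitionGraph:
    """Hypothetical weighted graph: (state, action, next_state) -> Edge.

    States are assumed to be frozensets of discrete features, standing in
    for whatever the paper's state transformer actually produces.
    """

    def __init__(self) -> None:
        # defaultdict creates a zeroed Edge the first time a transition is seen
        self.edges: defaultdict = defaultdict(Edge)

    def observe(self, state: frozenset, action: int, next_state: frozenset) -> Edge:
        """Record one observed transition and return its edge."""
        edge = self.edges[(state, action, next_state)]
        edge.count += 1
        return edge

    def outgoing(self, state: frozenset) -> list:
        """List (action, next_state, edge) for every edge leaving `state`."""
        return [(a, s2, e) for (s1, a, s2), e in self.edges.items() if s1 == state]


# Usage: the same transition seen twice accumulates two units of evidence.
graph = TransitionGraph()
s0, s1 = frozenset({"ball_left"}), frozenset({"ball_center"})
graph.observe(s0, 1, s1)
graph.observe(s0, 1, s1)
```

The linear scan in `outgoing` keeps the sketch short; an index keyed by source state, or the in-memory graph database the method uses, would avoid it at scale.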

Key takeaway

For research scientists developing reinforcement learning solutions for mission-critical or resource-constrained applications, this interpretable experiential learning model offers a viable alternative to complex deep RL. Consider implementing its weighted state transition graph and "global feedback" principle to achieve competitive performance on low-end hardware while gaining the model transparency needed for verification and auditing. As a starting point, use the best-performing hyperparameter values reported: context size CS=2 and state similarity SS between 0.9 and 0.95.

Key insights

An interpretable experiential learning model achieves competitive RL performance in resource-constrained settings using state history and global feedback.

Principles

Global feedback updates utility across entire state sequences.
Explicit state representations enhance interpretability.
Context size (CS) and state similarity (SS) are critical hyperparameters.
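
The global feedback principle can be illustrated with a short sketch under stated assumptions. A single scalar feedback value, positive or negative, is added uniformly and without discounting to every transition visited in an episode, in contrast to the per-step bootstrapped updates used by DQN. The function name `apply_global_feedback` and the dictionary layout are hypothetical, and the paper's exact update rule may differ.

```python
from collections import defaultdict

# Hypothetical transition key: (state, action, next_state), states as frozensets.
Transition = tuple[frozenset, int, frozenset]


def apply_global_feedback(
    utility: defaultdict,
    counts: defaultdict,
    episode: list[Transition],
    feedback: float,
) -> None:
    """Credit one feedback value to every transition in the episode.

    Assumption for illustration: uniform, undiscounted credit; a transition
    visited twice in the same episode receives the feedback twice.
    """
    for transition in episode:
        utility[transition] += feedback
        counts[transition] += 1


# Usage with both positive and negative feedback (cf. LM=2).
utility: defaultdict = defaultdict(float)
counts: defaultdict = defaultdict(int)
a, b, c = frozenset({"x"}), frozenset({"y"}), frozenset({"z"})
rewarded = [(a, 0, b), (b, 1, c)]   # e.g. an episode segment that hit a brick
penalized = [(a, 0, b), (b, 2, a)]  # e.g. an episode segment that lost a life
apply_global_feedback(utility, counts, rewarded, +1.0)
apply_global_feedback(utility, counts, penalized, -1.0)
```

Because the shared transition `(a, 0, b)` appears in both sequences, its positive and negative credit cancel while its evidence count still grows, which is the kind of trace a reviewer can inspect directly.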

Method

The model uses a state transformer to produce interpretable state representations, a state learning component that stores weighted transition graphs in an in-memory graph database, and a decision component that, given the recent state history, selects the action maximizing either utility or counted utility.
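
The decision step can be sketched in the same spirit. Several assumptions here are not confirmed by the source. The memory maps a context (the last CS states) plus an action to a summed utility and an evidence count. "Counted utility" is read as utility divided by evidence count. An unseen context falls back to a random action. `choose_action` and the `Memory` layout are hypothetical.

```python
import random
from typing import Optional

CS = 2  # context size: how many recent states form the lookup key

# Hypothetical memory: (context, action) -> (summed utility, evidence count),
# where context is a tuple of the last CS states (each a frozenset).
Memory = dict[tuple[tuple[frozenset, ...], int], tuple[float, int]]


def choose_action(
    memory: Memory,
    history: list[frozenset],
    actions: list[int],
    counted: bool = False,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick the action with the highest score for the current context.

    With counted=True the score is utility per unit of evidence (an assumed
    reading of "counted utility"); otherwise it is the summed utility.
    Stored evidence counts are assumed to be at least 1.
    """
    if rng is None:
        rng = random.Random()
    context = tuple(history[-CS:])
    best_action, best_score = None, float("-inf")
    for action in actions:
        entry = memory.get((context, action))
        if entry is None:
            continue  # no evidence for this action in this context
        utility, count = entry
        score = utility / count if counted else utility
        if score > best_score:
            best_action, best_score = action, score
    # Explore at random when no action has evidence for this context.
    return best_action if best_action is not None else rng.choice(actions)


# Usage: summed utility favors action 1, utility per observation favors action 0.
s0, s1, s2 = (frozenset({f"f{i}"}) for i in range(3))
memory: Memory = {((s1, s2), 0): (3.0, 3), ((s1, s2), 1): (4.0, 8)}
history = [s0, s1, s2]
```

The two scoring modes can disagree: action 1 has more total utility (4.0 vs 3.0), but action 0 earns more per observation (1.0 vs 0.5).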

In practice

Use a context size of CS=2, the best-performing amount of state history reported.
Set SS between 0.9 and 0.95 for effective state similarity matching.
Use both positive and negative feedback (LM=2) for more robust learning.
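
State similarity matching with the SS threshold might look like the sketch below. This summary does not specify the paper's similarity measure, so Jaccard similarity over feature sets is an assumed stand-in; `jaccard` and `match_state` are hypothetical names. A new observation reuses the most similar stored state when the similarity reaches SS and is otherwise stored as a new state, so in this sketch a higher SS yields more, finer-grained states.

```python
SS = 0.9  # state similarity threshold, low end of the reported 0.9-0.95 range


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of features two states have in common (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_state(state: frozenset, known: list, threshold: float = SS) -> frozenset:
    """Return the closest known state if similar enough, else register `state`."""
    best = max(known, key=lambda k: jaccard(state, k), default=None)
    if best is not None and jaccard(state, best) >= threshold:
        return best
    known.append(state)
    return state


# Usage: a one-feature difference on ten features still matches at SS=0.9.
known: list = []
stored = match_state(frozenset(range(10)), known)
near = match_state(frozenset(range(11)), known)      # Jaccard 10/11, about 0.909
far = match_state(frozenset({0, 1, 20, 21}), known)  # Jaccard 2/12 with stored
```

At SS=0.95 the `near` observation would instead become a separate state, which illustrates why the threshold trades generalization against precision.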

Topics

Interpretable Experiential Learning
Global Feedback
Resource-Constrained Computing
OpenAI Gym Atari Breakout
State Transition Graphs

Code references

aigents/aigents-python

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Automation Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.