Separation Power of Equivariant Neural Networks

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

This paper presents a theoretical framework to analyze the separation power of equivariant neural networks employing point-wise activations. It provides an explicit, recursive formula to characterize inputs indistinguishable by a network with a fixed architecture. A key finding is that all non-polynomial activation functions, including ReLU and sigmoid, are equivalent in terms of expressivity and achieve maximal discrimination capacity, provided intermediate layers have complete bias. The framework simplifies separation power assessment to evaluating minimal representations, which are shown to form a hierarchy corresponding to subgroups of the symmetry group. This work introduces the "twin network trick" to convert separation problems into zero locus problems, offering a precise method to understand architectural influence on network expressivity.

Key takeaway

If you design equivariant neural networks, the choice among non-polynomial activation functions such as ReLU or sigmoid does not change the network's maximal separation power, provided the intermediate layers have complete bias. Activation selection can therefore be treated as a secondary decision. Architectural effort is better spent on the representation types, which form a hierarchy that directly determines which inputs the network can tell apart.

Key insights

Non-polynomial activations in equivariant networks offer equivalent, maximal separation power, simplifying architectural choices.

Principles

Non-polynomial activations provide equivalent, maximal separation power.
Equivariant network separation power forms a hierarchy based on representation type.
Intermediate layer multiplicity does not impact separability.
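
The first principle can be illustrated on a toy case. The sketch below is not the paper's construction. It assumes the symmetry group is the permutation group acting on coordinates, uses a hypothetical one-layer network (`equivariant_layer` and `invariant_net` are illustrative names) with hand-picked parameters, and compares a linear (polynomial) activation against ReLU and sigmoid on two inputs that share the same sum.

```python
import numpy as np

def equivariant_layer(x, a, b, c):
    """Permutation-equivariant affine map: a*x + b*mean(x) + c on every coordinate."""
    return a * x + b * x.mean() + c

def invariant_net(x, activation, a=1.0, b=0.3, c=-0.5):
    """One equivariant layer, a point-wise activation, then sum pooling.

    The parameter values are arbitrary choices for illustration only.
    """
    return activation(equivariant_layer(x, a, b, c)).sum()

identity = lambda z: z                          # polynomial (degree 1) activation
relu = lambda z: np.maximum(z, 0.0)             # non-polynomial
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))    # non-polynomial

# Two inputs that are not permutations of each other but have the same sum.
x = np.array([0.0, 2.0])
y = np.array([1.0, 1.0])

for name, act in [("identity", identity), ("relu", relu), ("sigmoid", sigmoid)]:
    separated = not np.isclose(invariant_net(x, act), invariant_net(y, act))
    print(f"{name}: separates x and y -> {separated}")
```

With the identity activation the pooled output reduces to (a + b) * sum(x) + n * c, so no parameter choice can separate x from y in this toy network. Both non-polynomial activations separate them for the parameters shown. This only illustrates the polynomial versus non-polynomial contrast on one pair of inputs and does not demonstrate the general equivalence result.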

Method

The "twin network trick" converts network separation problems into zero locus problems, which are then solved using a recursive formula over network depth.
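
One possible reading of the trick, sketched under assumptions: run the same network on two inputs and look at the difference, so that "x and y are not separated" becomes "the twin function vanishes at (x, y) for every parameter setting". The code below is a minimal numerical illustration of that reading, not the paper's recursive formula. The toy permutation-invariant network and the names `net`, `twin`, and `in_zero_locus` are hypothetical, and a finite list of parameter settings stands in for the full parameter space.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def net(x, params):
    """Toy permutation-invariant network: equivariant ReLU layers, then sum pooling.

    params is a list of (a, b, c) triples, one per layer.
    """
    h = x
    for a, b, c in params:
        h = relu(a * h + b * h.mean() + c)
    return h.sum()

def twin(x, y, params):
    """Twin network: the same network applied to both inputs, outputs subtracted."""
    return net(x, params) - net(y, params)

def in_zero_locus(x, y, param_samples, tol=1e-9):
    """True if the twin output vanishes for every supplied parameter setting.

    Checking finitely many settings is only a heuristic stand-in for 'all parameters'.
    """
    return all(abs(twin(x, y, p)) <= tol for p in param_samples)

rng = np.random.default_rng(0)
# 50 random two-layer settings plus one hand-picked one-layer setting.
samples = [[tuple(rng.normal(size=3)) for _ in range(2)] for _ in range(50)]
samples.append([(1.0, 0.0, -0.5)])

x = np.array([0.0, 2.0, 1.0])
print(in_zero_locus(x, x[[2, 0, 1]], samples))               # permuted copy of x
print(in_zero_locus(x, np.array([1.0, 1.0, 1.0]), samples))  # same sum, different multiset
```

A permuted copy of x stays in the zero locus for every setting, because the toy network is invariant by construction. The second pair leaves it: the hand-picked setting alone gives a twin output of 0.5, which is enough to conclude that this architecture separates the two inputs.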

In practice

Any non-polynomial activation function provides maximal separation, as long as intermediate layers have complete bias.
Decompose complex hidden representations into minimal factors.
Match the representation type to the level of the separation-power hierarchy you need.

Topics

Equivariant Neural Networks
Separation Power
Activation Functions
Neural Network Expressivity
Representation Theory
Architectural Design

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.