Wallpaper Group Symmetry Benchmark (ImageNetSYM, NoiseSYM, AtomSYM)
Dataset

Yichen Guo and Joshua Agar
19 Feb 2026
url
https://doi.org/10.5281/zenodo.18703668
Open

Abstract

Overview

This dataset collection provides a large-scale benchmark for symmetry recognition in computer vision and scientific machine learning. It is designed to systematically evaluate whether modern deep learning architectures can internalize two-dimensional crystallographic symmetry as a transferable geometric abstraction rather than as domain-specific visual heuristics.

The benchmark consists of three procedurally generated datasets:
- ImageNetSYM
- NoiseSYM
- AtomSYM

Each dataset is constructed around the 17 wallpaper group symmetries, and together they contain over 10 million labeled images. The primary goal of this benchmark is to enable rigorous in-domain and cross-domain evaluation of symmetry recognition and generalization across distinct visual modalities.

Scientific Motivation

Symmetry governs structure–property relationships across physical systems in materials science, condensed matter physics, chemistry, and crystallography. Despite the rapid adoption of deep learning models in scientific domains, current architectures often fail to capture symmetry as a fundamental geometric concept. Instead, models tend to rely on texture statistics, color cues, or dataset-specific visual patterns.

This benchmark was developed to:
- Quantify symmetry recognition performance
- Test cross-domain generalization
- Identify architectural limitations
- Provide a foundation for symmetry-aware model development

The datasets are designed to decouple symmetry structure from visual appearance, forcing models to confront the underlying geometric transformations.

Dataset Components

1. ImageNetSYM

ImageNetSYM consists of natural image textures transformed to obey one of the 17 wallpaper group symmetries. Base images are procedurally tiled and symmetrized to enforce exact group operations including rotations, reflections, glide reflections, and translations.
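To make the idea of tiling and symmetrizing a base image concrete, the following is a minimal sketch and not the authors' generator. It assumes a square grayscale cell and uses group averaging over point-group operations, a technique chosen here only for illustration. The names `POINT_OPS` and `symmetrize` are hypothetical, and only five of the 17 groups are covered: groups that need glide reflections or hexagonal cells are left out.

```python
import numpy as np

# Point-group operations on a square cell for a few wallpaper groups.
# Assumption: this subset is illustrative only; glide reflections and
# hexagonal lattices (p3, p6, ...) are not handled here.
POINT_OPS = {
    "p1": [lambda a: a],
    "p2": [lambda a: a, lambda a: np.rot90(a, 2)],
    "pm": [lambda a: a, np.fliplr],
    "p4": [lambda a, k=k: np.rot90(a, k) for k in range(4)],
    "p4m": [lambda a, k=k: np.rot90(a, k) for k in range(4)]
    + [lambda a, k=k: np.rot90(np.fliplr(a), k) for k in range(4)],
}


def symmetrize(cell, group, reps=3):
    """Average a square cell over a point group, then tile it periodically.

    The result has at least the requested symmetry; averaging can blur
    the base image, which a production generator may want to avoid.
    """
    cell = np.asarray(cell, dtype=float)
    if cell.ndim != 2 or cell.shape[0] != cell.shape[1]:
        raise ValueError("expected a square 2-D array")
    ops = POINT_OPS[group]
    # Group averaging: the mean over all operations is invariant under each one.
    unit = sum(op(cell) for op in ops) / len(ops)
    # Translational symmetry comes from repeating the unit cell.
    return np.tile(unit, (reps, reps))
```

For example, `symmetrize(np.random.default_rng(0).random((64, 64)), "p4m")` returns a 192 x 192 array that is unchanged, up to floating-point rounding, by 90-degree rotation and by left-right reflection.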
Purpose:
- Test symmetry recognition in naturalistic visual domains
- Evaluate reliance on texture and semantic content

2. NoiseSYM

NoiseSYM contains purely synthetic, noise-based patterns generated algorithmically and symmetrized according to the 17 wallpaper groups.

Purpose:
- Remove semantic cues
- Isolate geometric symmetry recognition
- Provide a texture-agnostic evaluation setting

3. AtomSYM

AtomSYM contains atomistic lattice-like renderings designed to mimic crystalline structures. These images are generated using procedural atomic motif placement consistent with wallpaper group operations.

Purpose:
- Bridge abstract symmetry and materials-inspired representations
- Evaluate relevance to crystallographic and materials science workflows

Structure and Organization

Each dataset follows a hierarchical directory structure with:
- 17 classes corresponding to the 17 wallpaper groups
- Data with rich metadata
- Balanced class distributions
- Procedurally generated samples with controlled randomness

Scale

Total images across the complete dataset collection: 10,000,000+

Due to file size limitations of the hosting platform, the files deposited here represent a curated portion of the full benchmark. Researchers who require access to the full dataset may also contact the authors directly to arrange data transfer.
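As a quick sanity check on a downloaded portion, the class layout and balance described under Structure and Organization could be verified with a short script. This is a sketch under assumptions: the record does not state the folder names or file formats, so the layout `root/<group name>/...`, the short Hermann-Mauguin folder names, and the image suffixes below are all hypothetical, as are the helper names `class_counts` and `is_balanced`.

```python
from collections import Counter
from pathlib import Path

# The 17 wallpaper groups in short Hermann-Mauguin notation.
# Assumption: class folders are named this way; adjust to the real layout.
WALLPAPER_GROUPS = (
    "p1", "p2", "pm", "pg", "cm", "pmm", "pmg", "pgg", "cmm",
    "p4", "p4m", "p4g", "p3", "p3m1", "p31m", "p6", "p6m",
)


def class_counts(root, suffixes=(".png", ".jpg", ".jpeg")):
    """Count image files under root/<group>/ for each wallpaper group."""
    root = Path(root)
    counts = Counter()
    for group in WALLPAPER_GROUPS:
        class_dir = root / group
        if class_dir.is_dir():
            counts[group] = sum(
                1
                for p in class_dir.rglob("*")
                if p.is_file() and p.suffix.lower() in suffixes
            )
        else:
            counts[group] = 0  # missing class folder counts as empty
    return counts


def is_balanced(counts):
    """True when every class has the same non-zero number of images."""
    values = set(counts.values())
    return len(values) == 1 and 0 not in values
```

Calling `is_balanced(class_counts("path/to/ImageNetSYM"))` on an extracted archive would then flag missing or unevenly populated classes before training; the path shown is a placeholder.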
Benchmark Tasks

This benchmark supports:
- In-domain classification: train and evaluate within the same dataset
- Cross-domain generalization: train on one dataset, evaluate on another
- Scaling studies: evaluate performance as a function of dataset size
- Attention and feature analysis: study learned symmetry representations using attention maps, confusion matrices, and feature embedding analysis

Intended Use

This dataset is intended for:
- Machine learning researchers studying geometric inductive biases
- Materials informatics researchers
- Computer vision researchers
- Crystallography and symmetry modeling studies
- Benchmarking symmetry-aware architectures

It is particularly suited for testing:
- CNNs such as ResNet-50
- Multi-scale architectures such as Feature Pyramid Networks
- Transformer-based models including Cross-Covariance Image Transformers
- Equivariant or symmetry-aware neural networks

Key Findings Enabled by This Dataset

Using this benchmark, we observe:
- High in-domain classification accuracy across architectures
- Significant degradation in cross-domain performance
- Improved robustness from global attention mechanisms
- Persistent failure to encode symmetry as a fully transferable abstraction

These findings highlight the need for explicit geometric priors and symmetry-aware model design.

Data Generation

All datasets are procedurally generated to ensure:
- Exact enforcement of wallpaper group operations
- Reproducibility
- Controlled random seeds
- Balanced class sampling
- Absence of labeling noise

Generation scripts are included in the associated repository when applicable.

Limitations

- Synthetic generation may not capture all real-world symmetry imperfections
- Models trained on these datasets may still rely on statistical shortcuts
- Recognition does not imply physical understanding
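The in-domain and cross-domain tasks listed under Benchmark Tasks both reduce to comparing predicted and true group labels. The sketch below shows one way to tabulate a 17-class confusion matrix and the drop from in-domain to cross-domain accuracy. It is not the authors' evaluation code: the function names are hypothetical, and labels are assumed to be integers 0 to 16 in any fixed ordering of the wallpaper groups.

```python
def confusion_matrix(y_true, y_pred, n_classes=17):
    """Rows index the true class, columns the predicted class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal of a confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def cross_domain_gaps(acc):
    """Map (train, test) accuracies to in-domain minus cross-domain gaps.

    `acc` is keyed by (train_domain, test_domain) and must contain the
    in-domain entry (train, train) for every training domain present.
    """
    return {
        (train, test): acc[(train, train)] - value
        for (train, test), value in acc.items()
        if train != test
    }
```

A model trained on one dataset and evaluated on all three would contribute one in-domain entry and two cross-domain entries to `acc`; a large positive gap indicates that accuracy does not transfer across visual domains.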
