§ 02 — publications


Research on efficient transformer models — knowledge distillation, multimodal vision–language learning, and the quality of training data.

17 papers · 18 co-authors · 4 research areas · Google Scholar ↗
2026 · 4 papers
TextTeacher: What Can Language Teach About Images?
Preprint · 2026

Tobias Christian Nauen, Stanislav Frolov, Brian Bernhard Moser, Federico Raue, Ahmed Anwar, Andreas Dengel

We use a frozen text encoder on image captions as a lightweight training-time auxiliary objective for image classifiers. The text components are dropped at inference, leaving a fast, unimodal vision model. Accuracy on ImageNet improves by up to +2.7 p.p. and downstream transfer by +1.0 p.p. on average, outperforming vision knowledge distillation at a fraction of the compute.
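A minimal NumPy sketch of such a training-time auxiliary objective (the function names and the weight `lam` are illustrative assumptions, not the paper's exact formulation): the usual classification loss is combined with a term that pulls image embeddings toward frozen caption embeddings, and only the classifier branch is needed at inference.

```python
import numpy as np

def cosine_align_loss(img_emb, txt_emb):
    # Auxiliary term: 1 - mean cosine similarity between image embeddings
    # and (frozen) caption embeddings; zero when they are perfectly aligned.
    a = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    b = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return 1.0 - np.mean(np.sum(a * b, axis=1))

def cross_entropy(logits, labels):
    # Standard softmax cross-entropy over class logits.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(logp[np.arange(len(labels)), labels])

def total_loss(logits, labels, img_emb, txt_emb, lam=0.5):
    # lam is a hypothetical weighting; at inference the text branch is
    # dropped, so it adds no runtime cost to the vision model.
    return cross_entropy(logits, labels) + lam * cosine_align_loss(img_emb, txt_emb)
```

Since the text encoder is frozen, the caption embeddings can even be precomputed once per dataset, keeping the training overhead small.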

When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators
Accepted to CVPR 2026 · 2026

Krzysztof Adamkiewicz, Brian Bernhard Moser, Stanislav Frolov, Tobias Christian Nauen, Federico Raue, Andreas Dengel

We show that newer text-to-image models are progressively worse as training data generators, despite better visual quality, because they collapse to a narrow aesthetic-centric distribution that diverges from real data.

2025 · 7 papers
Stochastic Control with Signatures
SIAM · 2025

Peter Bank, Christian Bayer, Paul Peter Hager, Sebastian Riedel, Tobias Christian Nauen

This paper proposes a new method to parameterize open-loop controls in stochastic optimal control problems using path signatures. We show that these signature controls are dense in the space of admissible controls and establish conditions for the stability of the controlled dynamics and the target functional.

ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation
arXiv · 2025

Tobias Christian Nauen, Brian Bernhard Moser, Federico Raue, Stanislav Frolov, Andreas Dengel

We improve the training of vision transformers by segmenting objects out of dataset images and recombining them with backgrounds from other images. This makes the resulting transformers both more accurate and more robust.
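The recombination step can be illustrated with a minimal sketch (a hypothetical helper, not the paper's full pipeline): given a binary foreground mask, the segmented object is pasted pixel-wise onto a background taken from a different image.

```python
import numpy as np

def recombine(fg_img, fg_mask, bg_img):
    """Paste the masked foreground onto a new background.

    fg_img, bg_img: (H, W, 3) uint8 arrays; fg_mask: (H, W) boolean array.
    Wherever the mask is True, take the foreground pixel; elsewhere, the
    background pixel.
    """
    return np.where(fg_mask[..., None], fg_img, bg_img)

# Toy example: a white 2x2 "object" in the top-left corner is placed
# onto an all-black background image.
fg = np.full((4, 4, 3), 255, dtype=np.uint8)
bg = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
out = recombine(fg, mask, bg)
```

In practice the segmentation masks would come from a pretrained segmentation model, and foreground placement, scale, and background choice become augmentation hyperparameters.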

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
WACV 2025 · 2025

Tobias Christian Nauen, Sebastian Palacio, Federico Raue, Andreas Dengel

A comprehensive benchmark and analysis of more than 45 transformer models for image classification, evaluating their efficiency across multiple performance metrics. We identify the best architectures to use and find that scaling the model is more efficient than scaling the input image.

2024 · 4 papers
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
ICPR 2024 (oral) · 2024

Tobias Christian Nauen, Sebastian Palacio, Andreas Dengel

This paper introduces TaylorShift, a novel reformulation of the attention mechanism using Taylor softmax that enables computing full token-to-token interactions in linear time. We analytically and empirically determine the crossover points where employing TaylorShift becomes more efficient than traditional attention. TaylorShift outperforms the traditional transformer architecture in 4 out of 5 tasks.
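The underlying idea can be illustrated with a toy NumPy sketch (a second-order Taylor approximation of the exponential, not the paper's optimized implementation): because 1 + x + x²/2 factorizes through an explicit feature map, the same attention output can be computed either quadratically in sequence length or linearly, with identical results.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 8, 4  # sequence length, head dimension
Q = rng.normal(size=(N, d)) / d**0.5
K = rng.normal(size=(N, d)) / d**0.5
V = rng.normal(size=(N, d))

def phi(X):
    # Feature map with phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2,
    # using (q.k)^2 = <q q^T, k k^T> (Frobenius inner product).
    ones = np.ones((X.shape[0], 1))
    outer = np.einsum('ni,nj->nij', X, X).reshape(X.shape[0], -1) / np.sqrt(2)
    return np.concatenate([ones, X, outer], axis=1)  # (N, 1 + d + d^2)

# Quadratic-time attention with Taylor-softmax scores
# (1 + x + x^2/2 > 0 for all x, so the normalizer is always positive).
S = 1 + Q @ K.T + 0.5 * (Q @ K.T) ** 2
attn_quadratic = (S / S.sum(axis=1, keepdims=True)) @ V

# Linear-time equivalent: aggregate keys/values once, then apply per query.
PQ, PK = phi(Q), phi(K)
num = PQ @ (PK.T @ V)        # O(N) in sequence length
den = PQ @ PK.sum(axis=0)    # normalizer, also O(N)
attn_linear = num / den[:, None]
```

The feature map has dimension 1 + d + d², so the linear form wins once the sequence length is large relative to the head dimension; this is the crossover the paper characterizes analytically.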

2022 · 1 paper

2021 · 1 paper