§ 02 — publications


Research on efficient transformer models — knowledge distillation, multimodal vision–language learning, and the quality of training data.

17 papers · 18 co-authors · 4 research areas · Google Scholar ↗
2026 · 4 papers
TextTeacher: What Can Language Teach About Images?
Preprint · 2026

Tobias Christian Nauen, Stanislav Frolov, Brian Bernhard Moser, Federico Raue, Ahmed Anwar, Andreas Dengel

We use a frozen text encoder on image captions as a lightweight training-time auxiliary objective for image classifiers. The text components are dropped at inference, leaving a fast, unimodal vision model. Accuracy on ImageNet improves by up to +2.7 p.p. and downstream transfer by +1.0 p.p. on average, outperforming vision knowledge distillation at a fraction of the compute.
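A minimal NumPy sketch of such a training-time auxiliary objective (the function names and the weight `lam` are illustrative assumptions, not the paper's exact formulation): the usual classification loss is combined with a term that pulls image embeddings toward frozen caption embeddings, and only the classifier branch is needed at inference.

```python
import numpy as np

def cosine_align_loss(img_emb, txt_emb):
    # Auxiliary term: 1 - mean cosine similarity between image embeddings
    # and (frozen) caption embeddings; zero when they are perfectly aligned.
    a = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    b = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return 1.0 - np.mean(np.sum(a * b, axis=1))

def cross_entropy(logits, labels):
    # Standard softmax cross-entropy over class logits.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(logp[np.arange(len(labels)), labels])

def total_loss(logits, labels, img_emb, txt_emb, lam=0.5):
    # lam is a hypothetical weighting; at inference the text branch is
    # dropped, so it adds no runtime cost to the vision model.
    return cross_entropy(logits, labels) + lam * cosine_align_loss(img_emb, txt_emb)
```

Since the text encoder is frozen, the caption embeddings can even be precomputed once per dataset, keeping the training overhead small.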

When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators
Accepted to CVPR 2026 · 2026

Krzysztof Adamkiewicz, Brian Bernhard Moser, Stanislav Frolov, Tobias Christian Nauen, Federico Raue, Andreas Dengel

We show that newer text-to-image models are progressively worse as training data generators, despite better visual quality, because they collapse to a narrow aesthetic-centric distribution that diverges from real data.

2025 · 7 papers
Stochastic Control with Signatures
SIAM · 2025

Peter Bank, Christian Bayer, Paul Peter Hager, Sebastian Riedel, Tobias Christian Nauen

This paper proposes a new method to parameterize open-loop controls in stochastic optimal control problems using path signatures. We show that these signature controls are dense in the space of admissible controls and establish conditions for the stability of the controlled dynamics and the target functional.

ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation
arXiv · 2025

Tobias Christian Nauen, Brian Bernhard Moser, Federico Raue, Stanislav Frolov, Andreas Dengel

We improve the training of vision transformers by segmenting objects out of dataset images and recombining them with backgrounds from other images. This makes the resulting transformers both more accurate and more robust.
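The recombination step can be illustrated with a minimal sketch (a hypothetical helper, not the paper's full pipeline): given a binary foreground mask, the segmented object is pasted pixel-wise onto a background taken from a different image.

```python
import numpy as np

def recombine(fg_img, fg_mask, bg_img):
    """Paste the masked foreground onto a new background.

    fg_img, bg_img: (H, W, 3) uint8 arrays; fg_mask: (H, W) boolean array.
    Wherever the mask is True, take the foreground pixel; elsewhere, the
    background pixel.
    """
    return np.where(fg_mask[..., None], fg_img, bg_img)

# Toy example: a white 2x2 "object" in the top-left corner is placed
# onto an all-black background image.
fg = np.full((4, 4, 3), 255, dtype=np.uint8)
bg = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
out = recombine(fg, mask, bg)
```

In practice the segmentation masks would come from a pretrained segmentation model, and foreground placement, scale, and background choice become augmentation hyperparameters.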

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
WACV 2025 · 2025

Tobias Christian Nauen, Sebastian Palacio, Federico Raue, Andreas Dengel

A comprehensive benchmark and analysis of more than 45 transformer models for image classification, evaluating their efficiency across multiple performance metrics. We identify the best architectures to use and find that scaling the model is more efficient than scaling the input image.

2024 · 4 papers
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
ICPR 2024 (oral) · 2024

Tobias Christian Nauen, Sebastian Palacio, Andreas Dengel

This paper introduces TaylorShift, a novel reformulation of the attention mechanism using Taylor softmax that enables computing full token-to-token interactions in linear time. We analytically and empirically determine the crossover points where employing TaylorShift becomes more efficient than traditional attention. TaylorShift outperforms the traditional transformer architecture in 4 out of 5 tasks.
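The underlying idea can be illustrated with a toy NumPy sketch (a second-order Taylor approximation of the exponential, not the paper's optimized implementation): because 1 + x + x²/2 factorizes through an explicit feature map, the same attention output can be computed either quadratically in sequence length or linearly, with identical results.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 8, 4  # sequence length, head dimension
Q = rng.normal(size=(N, d)) / d**0.5
K = rng.normal(size=(N, d)) / d**0.5
V = rng.normal(size=(N, d))

def phi(X):
    # Feature map with phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2,
    # using (q.k)^2 = <q q^T, k k^T> (Frobenius inner product).
    ones = np.ones((X.shape[0], 1))
    outer = np.einsum('ni,nj->nij', X, X).reshape(X.shape[0], -1) / np.sqrt(2)
    return np.concatenate([ones, X, outer], axis=1)  # (N, 1 + d + d^2)

# Quadratic-time attention with Taylor-softmax scores
# (1 + x + x^2/2 > 0 for all x, so the normalizer is always positive).
S = 1 + Q @ K.T + 0.5 * (Q @ K.T) ** 2
attn_quadratic = (S / S.sum(axis=1, keepdims=True)) @ V

# Linear-time equivalent: aggregate keys/values once, then apply per query.
PQ, PK = phi(Q), phi(K)
num = PQ @ (PK.T @ V)        # O(N) in sequence length
den = PQ @ PK.sum(axis=0)    # normalizer, also O(N)
attn_linear = num / den[:, None]
```

The feature map has dimension 1 + d + d², so the linear form wins once the sequence length is large relative to the head dimension; this is the crossover the paper characterizes analytically.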

2022 · 1 paper

2021 · 1 paper