Deep Learning

When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators

arXiv
We show that newer text-to-image models are progressively worse as training data generators, despite better visual quality, because they collapse to a narrow aesthetic-centric distribution that diverges from real data.
Krzysztof Adamkiewicz
PDF

PRISM: Diversifying Dataset Distillation by Decoupling Architectural Priors

arXiv
We introduce PRISM, a framework that disentangles architectural priors for dataset distillation, outperforming single-teacher setups.
Brian Bernhard Moser
PDF
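
A rough sketch of the multi-teacher idea the summary alludes to: a distribution-matching loss averaged over a pool of heterogeneous teacher encoders, so the synthetic set does not inherit a single architecture's inductive bias. This is an illustrative baseline, not PRISM's actual objective; the function and argument names are hypothetical.

```python
import torch

def multi_teacher_dm_loss(teachers, x_real, x_syn):
    """Distribution-matching loss averaged over several teacher architectures.

    Hypothetical sketch: with a single teacher, the distilled images absorb
    that one network's priors; averaging over a heterogeneous pool
    (e.g. a CNN, a ViT, an MLP) is the simplest way to decouple them.
    """
    loss = x_syn.new_zeros(())
    for f in teachers:  # each f: batch of images -> feature vectors (frozen)
        with torch.no_grad():
            mu_real = f(x_real).mean(dim=0)  # real-data feature centroid
        mu_syn = f(x_syn).mean(dim=0)        # gradients flow into x_syn only
        loss = loss + ((mu_real - mu_syn) ** 2).sum()
    return loss / len(teachers)
```

Here x_syn would be a small learnable tensor of synthetic images, updated by gradient descent on this loss.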

SubZeroCore: A Submodular Approach with Zero Training for Coreset Selection

arXiv
We introduce SubZeroCore, a novel, training-free coreset selection method that integrates submodular coverage and density into a single, unified objective.
Brian Bernhard Moser
PDF
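
The summary names the two ingredients, submodular coverage and density, folded into a single objective and maximized greedily with no training. Below is a minimal sketch of that pattern on raw feature vectors; the radius, the weighting, and the exact objective are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def greedy_coreset(features, budget, radius=1.0):
    """Greedy maximization of a coverage-plus-density objective (illustrative).

    Coverage (how many still-uncovered points a candidate's radius-ball
    contains) is submodular, so greedy selection enjoys the usual (1 - 1/e)
    approximation guarantee; density weighting favors points in populated
    regions. Entirely training-free.
    """
    n = len(features)
    dists = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    covers = dists < radius                  # covers[i, j]: i's ball contains j
    density = covers.sum(axis=1) / n         # local density of each candidate
    covered = np.zeros(n, dtype=bool)
    chosen = []
    for _ in range(budget):
        gain = (covers & ~covered).sum(axis=1) * (1.0 + density)
        gain[chosen] = -np.inf               # never re-pick a selected point
        best = int(np.argmax(gain))
        chosen.append(best)
        covered |= covers[best]
    return chosen
```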

HyperCore: Coreset Selection under Noise via Hypersphere Models

arXiv
We present HyperCore, a lightweight, adaptive coreset selection framework designed for noisy environments. HyperCore uses per-class hypersphere models and adaptively selects pruning thresholds.
Brian Bernhard Moser
PDF
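
To make the per-class hypersphere idea concrete, here is a minimal sketch: fit a centroid per class in feature space and keep only the samples within an adaptively chosen radius, so mislabeled outliers far from their class are pruned. The threshold rule below (median plus one median absolute deviation) is an illustrative assumption, not the paper's rule.

```python
import numpy as np

def hypersphere_filter(features, labels):
    """Illustrative per-class hypersphere pruning for noisy labels."""
    keep = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        center = features[idx].mean(axis=0)              # class centroid
        dist = np.linalg.norm(features[idx] - center, axis=1)
        med = np.median(dist)
        mad = np.median(np.abs(dist - med))
        keep[idx[dist <= med + mad]] = True              # adaptive radius
    return np.where(keep)[0]                             # surviving indices
```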

When 512×512 is not Enough: Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution

ICIP 2025
We extend pretrained super-resolution models to images larger than their training resolution by using local degradation-aware prompts.
Brian B. Moser
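
A sketch of the general multi-diffusion tiling pattern behind such extensions: restore overlapping tiles at the backbone's native resolution, condition each tile on its own prompt (prompt_fn below is a hypothetical hook for the degradation-aware part), and average the overlaps. The sr_model interface and same-size restoration are simplifying assumptions, not the paper's pipeline.

```python
import torch

def tiled_restore(sr_model, image, tile=512, overlap=64, prompt_fn=None):
    """Process an oversized image as overlapping tiles and blend by averaging."""
    _, h, w = image.shape                      # (C, H, W)
    out = torch.zeros_like(image)
    weight = image.new_zeros(1, h, w)
    step = tile - overlap
    for top in range(0, max(h - overlap, 1), step):
        for left in range(0, max(w - overlap, 1), step):
            patch = image[:, top:top + tile, left:left + tile]
            prompt = prompt_fn(patch) if prompt_fn else None
            restored = sr_model(patch, prompt)  # assumed same-size restoration
            ph, pw = restored.shape[-2:]
            out[:, top:top + ph, left:left + pw] += restored
            weight[:, top:top + ph, left:left + pw] += 1.0
    return out / weight.clamp(min=1.0)         # average where tiles overlap
```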

ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation

arXiv
We improve the training of vision transformers by segmenting foreground objects and recombining them with backgrounds from other images in the dataset. This makes the transformers both more accurate and more robust.
Tobias Christian Nauen
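
The recombination step itself reduces to a masked paste: a segmented foreground goes onto a background taken from a different training image, breaking spurious object-background correlations. A minimal sketch, assuming segmentation masks were produced upstream (e.g. by an off-the-shelf segmenter):

```python
import torch

def recombine(fg_image, fg_mask, bg_image):
    """Paste a segmented foreground onto an unrelated background.

    fg_image, bg_image: (C, H, W) tensors; fg_mask: (1, H, W), 1 on the object.
    Pairing foregrounds with randomly drawn backgrounds during training keeps
    the model from leaning on background shortcuts.
    """
    mask = fg_mask.float()
    return mask * fg_image + (1.0 - mask) * bg_image
```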

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

Presentation at WACV 2025 on a large-scale benchmark of 45+ transformer models for image classification, evaluating accuracy, speed, and memory efficiency.

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

WACV 2025
A comprehensive benchmark and analysis of more than 45 transformer models for image classification, evaluating their efficiency across accuracy, speed, and memory. We identify the optimal architectures to use and find that scaling the model is more efficient than scaling the input image.

A Study in Dataset Distillation for Image Super-Resolution

arXiv
We conduct the first systematic study of dataset distillation for image super-resolution.
Tobias Dietz
PDF

TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax

ICPR 2024 (oral)
This paper introduces TaylorShift, a novel reformulation of the attention mechanism using Taylor softmax that enables computing full token-to-token interactions in linear time. We analytically and empirically determine the crossover points where employing TaylorShift becomes more efficient than traditional attention. TaylorShift outperforms the traditional transformer architecture in 4 out of 5 tasks.
Tobias Christian Nauen
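
The linear-time reformulation rests on the second-order Taylor expansion exp(t) ≈ 1 + t + t²/2: scores of that form factor through a finite feature map, so keys and values can be summarized once instead of materializing the N×N attention matrix. Below is a minimal sketch of that mechanism with simplified normalization; it is not the paper's exact formulation.

```python
import torch

def taylor_features(x):
    """phi(x) = [1, x, vec(x x^T)/sqrt(2)], chosen so that
    phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2  (2nd-order Taylor of exp(q.k)).
    Shape: (..., n, d) -> (..., n, 1 + d + d*d)."""
    ones = torch.ones_like(x[..., :1])
    outer = (x.unsqueeze(-1) * x.unsqueeze(-2)).flatten(-2) / (2 ** 0.5)
    return torch.cat([ones, x, outer], dim=-1)

def taylor_attention(q, k, v):
    """Attention with Taylor-softmax weights in O(n * d^2) instead of O(n^2 * d).

    The weights 1 + t + t^2/2 = ((t + 1)^2 + 1) / 2 are always positive, so
    the normalizer is well defined without max-subtraction tricks.
    """
    fq, fk = taylor_features(q), taylor_features(k)
    kv = fk.transpose(-2, -1) @ v                             # summarize K, V once
    z = fq @ fk.sum(dim=-2, keepdim=True).transpose(-2, -1)   # row normalizers
    return (fq @ kv) / z
```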