Article

TextTeacher: What Can Language Teach About Images?

Preprint
We use a frozen text encoder on image captions as a lightweight training-time auxiliary objective for image classifiers. The text components are dropped at inference, leaving a fast, unimodal vision model. Accuracy on ImageNet improves by up to +2.7 p.p. and downstream transfer by +1.0 p.p. on average, outperforming vision knowledge distillation at a fraction of the compute.
Tobias Christian Nauen
SubZeroCore: A Submodular Approach with Zero Training for Coreset Selection

arXiv
We introduce SubZeroCore, a novel, training-free coreset selection method that integrates submodular coverage and density into a single, unified objective.
brian-bernhard-moser
ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation

arXiv
We improve the training of vision transformers by segmenting and recombining objects and backgrounds from datasets. This makes the transformers both more accurate and more robust.
Tobias Christian Nauen
Just Leaf It: Accelerating Diffusion Classifiers with Hierarchical Class Pruning

arXiv
We speed up diffusion classifiers by utilizing a label hierarchy and pruning unrelated paths.
arundhati-s-shanbhag
Distill the Best, Ignore the Rest: Improving Dataset Distillation with Loss-Value-Based Pruning

arXiv
We improve dataset distillation by distilling only a representative coreset.
brian-bernhard-moser