Distill the Best, Ignore the Rest: A Study in Latent Dataset Distillation on Core-Sets

Abstract

Latent dataset distillation, which exploits pre-trained generative priors, has gained significant interest in recent years. It is agnostic to the underlying distillation algorithm and addresses two significant limitations of classical distillation algorithms: cross-architecture generalization and high-resolution synthesis. However, existing approaches typically distill from the entire dataset, potentially including non-beneficial samples. We introduce a novel “Prune First, Distill After” framework that systematically prunes datasets via loss-based sampling prior to latent distillation. By pruning before applying classical distillation techniques with generative priors, we create a representative coreset that improves generalization to unseen architectures, a significant challenge for current distillation methods. More specifically, our proposed framework significantly boosts distilled quality, achieving an accuracy increase of up to 5.2 percentage points even with substantial dataset pruning, i.e., removing 80% of the original dataset prior to distillation. Overall, our experimental results highlight the advantages of our easy-sample prioritization and cross-architecture robustness, paving the way for more effective and high-quality dataset distillation.
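The pruning step described above can be illustrated with a minimal sketch of loss-based, easy-first selection. This is not the authors' implementation. The function name `prune_easy_first` is hypothetical. The sketch assumes that per-sample losses are already available, for example from a pre-trained classifier, that "easy" means "lowest loss", and that pruning is done per class so that every class survives. A `keep_ratio` of 0.2 corresponds to removing 80% of the dataset before distillation.

```python
import math
from collections import defaultdict


def prune_easy_first(losses, labels, keep_ratio=0.2):
    """Return sorted indices of the lowest-loss ("easiest") samples in each class.

    Hypothetical helper (assumption, not the paper's code):
    losses[i] is the loss of sample i under some pre-trained model,
    labels[i] is its class, and keep_ratio is the fraction kept per class.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if len(losses) != len(labels):
        raise ValueError("losses and labels must have the same length")

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        # Keep at least one sample per class so that no class disappears.
        n_keep = max(1, math.ceil(keep_ratio * len(indices)))
        # Easy-sample prioritization: sort by ascending loss, keep the lowest.
        easiest = sorted(indices, key=lambda i: losses[i])[:n_keep]
        kept.extend(easiest)
    return sorted(kept)


if __name__ == "__main__":
    losses = [0.1, 2.3, 0.5, 1.7, 0.05, 3.0, 0.9, 0.2, 1.1, 0.4]
    labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    # Keep 20% per class, i.e. the single easiest sample in each class.
    print(prune_easy_first(losses, labels, keep_ratio=0.2))  # [4, 7]
```

In this sketch, the returned indices define the coreset that would then be handed to a latent distillation algorithm in place of the full dataset. Selecting per class is a design assumption that keeps the class balance intact. A global threshold would instead let classes with uniformly high losses be pruned away entirely.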


Citation

If you use this work, please cite our paper:

BibTeX

@inproceedings{moser2025prunedistill,
  author = {Moser, Brian B. and Raue, Federico and Nauen, Tobias C. and Frolov,
            Stanislav and Dengel, Andreas},
  booktitle = {2025 International Joint Conference on Neural Networks (IJCNN)},
  title = {Distill the Best, Ignore the Rest: A Study in Latent Dataset
           Distillation on Core-Sets},
  year = {2025},
  pages = {1-8},
  doi = {10.1109/IJCNN64981.2025.11229108},
}
