Conference Papers


When 512×512 is not Enough: Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution

ICIP 2025
We extend pretrained super-resolution models to images larger than their native resolution by guiding each local region with a degradation-aware prompt (a sketch of the tiled fusion pattern follows below).
Brian B. Moser
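The teaser names two ingredients: tiled (multi-)diffusion and locally adapted prompts. Below is a minimal NumPy sketch of the generic overlapping-tile fusion pattern only, not the paper's method; `denoise_tile` and `prompt_for_tile` are hypothetical callbacks standing in for a pretrained diffusion step and the local degradation-aware prompt extraction.

```python
import numpy as np

def fused_denoise_step(latent, denoise_tile, prompt_for_tile, tile=64, stride=48):
    """One multi-diffusion step over a latent larger than the model's
    native size: denoise overlapping tiles, each with a locally derived
    prompt, then average the overlapping predictions.
    `denoise_tile` and `prompt_for_tile` are hypothetical callbacks."""
    h, w = latent.shape[:2]
    out = np.zeros_like(latent)
    weight = np.zeros((h, w) + (1,) * (latent.ndim - 2))
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            patch = latent[y:y + tile, x:x + tile]
            prompt = prompt_for_tile(patch)   # local degradation-aware prompt
            out[y:y + tile, x:x + tile] += denoise_tile(patch, prompt)
            weight[y:y + tile, x:x + tile] += 1.0
    # Average overlaps; any uncovered border pixels (when tile/stride do
    # not evenly cover the latent) simply stay zero in this sketch.
    return out / np.maximum(weight, 1.0)
```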

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

WACV 2025
A comprehensive benchmark and analysis of more than 45 transformer models for image classification, evaluating their efficiency across various performance metrics. We identify the optimal architectures to use and uncover that scaling the model is more efficient than scaling the input image.

TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax

ICPR 2024 (oral)
This paper introduces TaylorShift, a novel reformulation of the attention mechanism using Taylor softmax that enables computing full token-to-token interactions in linear time. We analytically and empirically determine the crossover points where employing TaylorShift becomes more efficient than traditional attention. TaylorShift outperforms the traditional transformer architecture in 4 out of 5 tasks.
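To make the linear-time idea concrete, here is a minimal NumPy sketch of attention with a second-order Taylor expansion of exp in place of the softmax exponential. It is an illustration of the general technique, not the authors' implementation, and the function names are ours; the key point is that the expanded feature map lets keys and values be summarized once, avoiding the N×N score matrix.

```python
import numpy as np

def taylor_features(x):
    """phi(x) = [1, x, vec(x x^T)/sqrt(2)], chosen so that
    phi(q) @ phi(k) = 1 + q.k + (q.k)**2 / 2, the second-order
    Taylor expansion of exp(q.k)."""
    n, d = x.shape
    outer = np.einsum("ni,nj->nij", x, x).reshape(n, d * d) / np.sqrt(2)
    return np.concatenate([np.ones((n, 1)), x, outer], axis=1)

def taylor_linear_attention(Q, K, V):
    """Linear in sequence length N: summarize keys/values once,
    then query the summary, instead of forming the N x N matrix."""
    phi_q, phi_k = taylor_features(Q), taylor_features(K)
    kv = phi_k.T @ V                   # (1+d+d^2, d_v) global summary
    norm = phi_q @ phi_k.sum(axis=0)   # softmax-style normalizer per query
    return (phi_q @ kv) / norm[:, None]

# Sanity check: the linear form matches the quadratic-time computation.
Q = np.random.randn(128, 8) / np.sqrt(8)
K = np.random.randn(128, 8) / np.sqrt(8)
V = np.random.randn(128, 16)
scores = 1 + Q @ K.T + (Q @ K.T) ** 2 / 2   # explicit N x N scores
assert np.allclose(taylor_linear_attention(Q, K, V),
                   (scores @ V) / scores.sum(1, keepdims=True))
```

The two equivalent formulations also illustrate the "(and Back)" in the title: the linear form pays a fixed cost in the expanded feature dimension (1 + d + d²), so for short sequences the explicit N×N computation can be cheaper, which is the crossover the paper determines.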
Tobias Christian Nauen