Description
Human visual perception relies on a dynamic interplay between bottom-up and top-down processes. Our visual system selectively enhances important details, maintains spatial relationships, and processes scenes across multiple scales. These processes are guided by mechanisms such as selective attention and eye movements, which help us integrate fine details into a coherent whole. Despite their importance, these perceptual strategies are not yet fully integrated into computer vision models, even though such models are often described as replicating human vision. Aesthetic perception, in particular, depends on the ability to balance fine detail with broader compositional awareness, an ability that current models lack. Traditional computer vision pipelines rely on fixed-size inputs obtained through resizing or cropping, which disrupts critical spatial and compositional information. As a result, fine details are lost and the capacity for aesthetic assessment is reduced. In this talk, I will present CHARM, a novel preprocessing method designed to enhance Vision Transformers (ViTs) by preserving Composition, High-resolution details, Aspect Ratio, and Multiscale information. Inspired by human vision, CHARM selectively retains high-resolution details in key regions while downscaling less relevant areas, avoiding the need for arbitrary cropping or aspect-ratio distortion. This allows models to capture richer contextual and compositional cues, improving both performance and generalization in image aesthetic assessment. By incorporating human-like perceptual strategies, CHARM enhances ViTs’ ability to process images in a way that aligns with human visual efficiency, underscoring the importance of detail preservation, spatial integrity, and multiscale processing in both artificial and biological vision systems.

Fatemeh Behrad

| Period | 28 Oct 2025 |
|---|---|
| Event type | Seminar |
| Location | Leuven, Belgium |
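
The description above contrasts fixed-size resizing or cropping with keeping key regions at high resolution while downscaling the rest. The talk page gives no implementation details. The following is therefore only a generic sketch of mixed-resolution patch tokenization under stated assumptions, not CHARM itself. It assumes a single-channel image and uses zero padding instead of resizing, so the aspect ratio is kept. Pixel variance serves as a stand-in importance score. The names `mixed_resolution_tokens`, `pad_to_multiple`, `patch`, `scale`, and `keep` are hypothetical.

```python
import numpy as np


def pad_to_multiple(img: np.ndarray, m: int) -> np.ndarray:
    """Zero-pad a 2-D image so height and width are multiples of m.

    Padding instead of resizing leaves the original pixels and aspect ratio intact.
    """
    h, w = img.shape
    return np.pad(img, ((0, (-h) % m), (0, (-w) % m)))


def mixed_resolution_tokens(img: np.ndarray, patch: int = 4, scale: int = 2, keep: int = 2):
    """Hypothetical mixed-resolution tokenizer (illustration only, not CHARM).

    The padded image is divided into square regions of side patch * scale.
    The `keep` regions with the highest pixel variance (an assumed importance
    proxy) are emitted as scale * scale full-resolution patches; every other
    region is average-pooled by `scale` and emitted as one patch. Each token
    is (scale, top, left, vector), with the top-left corner in padded pixels.
    """
    region = patch * scale
    img = pad_to_multiple(img.astype(float), region)
    h, w = img.shape
    # View as a grid of regions: shape (rows, cols, region, region).
    regions = img.reshape(h // region, region, w // region, region).swapaxes(1, 2)
    scores = regions.var(axis=(2, 3))
    # Indices of the `keep` highest-scoring regions, as (row, col) pairs.
    order = np.argsort(scores, axis=None)[::-1][:keep]
    top = {divmod(int(i), scores.shape[1]) for i in order}

    tokens = []
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            block = regions[r, c]
            y0, x0 = r * region, c * region
            if (r, c) in top:
                # Important region: keep native resolution, split into scale*scale patches.
                for i in range(scale):
                    for j in range(scale):
                        sub = block[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
                        tokens.append((1, y0 + i * patch, x0 + j * patch, sub.ravel()))
            else:
                # Less relevant region: average-pool scale x scale pixels into one patch.
                pooled = block.reshape(patch, scale, patch, scale).mean(axis=(1, 3))
                tokens.append((scale, y0, x0, pooled.ravel()))
    return tokens


# A wide 8 x 24 image: flat everywhere except a textured middle region.
img = np.zeros((8, 24))
img[:, 8:16] = np.indices((8, 8)).sum(axis=0) % 2  # checkerboard
tokens = mixed_resolution_tokens(img, patch=4, scale=2, keep=1)
print(len(tokens), [t[:3] for t in tokens])
# prints: 6 [(2, 0, 0), (1, 0, 8), (1, 0, 12), (1, 4, 8), (1, 4, 12), (2, 0, 16)]
```

For this wide image, the sketch yields six tokens instead of the twelve that a uniform 4 × 4 grid would produce. Four tokens are full resolution and cover the textured region. Two are pooled and cover the flat regions. Every token has the same length, so a ViT-style encoder could embed all of them with a single projection. The stored corner coordinates and scale could then inform positional encodings.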