| dc.description.abstract |
Despite extensive research into human visual attention, a comprehensive computational model capturing end-to-end free-viewing behavior remains elusive. In this thesis, we address this gap by designing a biologically inspired autonomous agent that reproduces human visual exploration of natural scenes. The agent is built upon two distinct latent-space frameworks: a linear generative Recognition Model for visual processing and a nonlinear Motor Execution Model for proprioceptive saccadic control. Through a developmental analysis of the Recognition Model, we demonstrate that unbiased, random visual exploration is fundamentally required for the emergence of V1 simple-cell-like basis functions, computationally mirroring the biological "critical period" of heightened plasticity. Evaluating the mature agent against human behavioral data reveals a dichotomy in visual processing. Saliency analysis indicates that the V1-level generative model is robust and sufficient to predict spatial gaze allocation (where humans look) from its marginal log-likelihood. However, temporal analysis shows that this low-level representation cannot account for fixation durations (how long humans look). Instead, fixation duration correlates significantly with the model’s residual reconstruction error. Viewed through the lens of predictive coding, this suggests that high-error stimuli require the recruitment of time-intensive, higher-order cognitive processes. Finally, our main sequence analysis demonstrates that the nonlinear Motor Execution Model, using a novel inverse-observation prior framework, reproduces the kinematic trajectories of human saccades. Ultimately, this work provides a rigorous computational framework that bridges early visual development, spatiotemporal attention, and goal-directed motor execution. |
en_US |