We trained monkeys to perform a feature-conjunction search task for color and motion and recorded from neurons in area MT during task performance. To place the experimental results in a theoretical context, we developed a system-level model of visual processing that incorporates several attentional mechanisms known to operate in mammalian visual systems. A reinforcement learning (temporal difference) algorithm was used to replicate the learning process in monkeys. The model learned to perform the feature-conjunction search task, with performance closely resembling that of human and monkey conjunction search. The model builds on the notion of two visual streams: the temporal (ventral) stream, crucial for object recognition, exerts top-down influences on early visual representations; these influences (feature-specific attention) prime feature detectors, biasing their sensitivity towards object features of interest (feature selection). The parietal (dorsal) stream, involved predominantly in spatial vision (coordinate transformations for various actions), exerts top-down spatial selection on the feature maps. Both feature and spatial selection bias the bottom-up activation of the feature maps, so that stimulus salience and behavioral goals are both reflected in the resulting saliency map (Koch & Ullman, 1985), which determines the site of information readout into memory.
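The combination of mechanisms described above can be sketched in a few lines of code. This is a minimal illustration, not the model itself: the map sizes, gain values, and the simple multiplicative biasing and TD(0) update are all assumptions introduced here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8  # illustrative map size (assumption)

# Bottom-up feature maps for the two searched features, driven by the stimulus.
feature_maps = {
    "color": rng.random((H, W)),
    "motion": rng.random((H, W)),
}

# Feature-specific (ventral) attention: multiplicative gains that prime
# detectors for the task-relevant conjunction features (values are assumptions).
feature_gain = {"color": 1.5, "motion": 1.5}

# Spatial (dorsal) selection: a gain field over locations.
spatial_gain = np.ones((H, W))
spatial_gain[2:5, 2:5] = 1.3  # currently attended region (assumption)

# Saliency map: biased bottom-up activation summed across features,
# so both stimulus salience and behavioral goals are reflected.
saliency = spatial_gain * sum(
    feature_gain[f] * fmap for f, fmap in feature_maps.items()
)

# The peak of the saliency map determines the site of readout into memory.
readout_site = np.unravel_index(np.argmax(saliency), saliency.shape)

# A TD(0)-style update of the value estimate at the readout site, as a
# stand-in for the reinforcement learning component (parameters assumed).
V = np.zeros((H, W))
alpha, gamma = 0.1, 0.9
reward = 1.0  # e.g. target found at the readout site
V[readout_site] += alpha * (reward + gamma * 0.0 - V[readout_site])
```

In this sketch, feature and spatial attention act purely as multiplicative gains on the feature maps; the summed, biased activation plays the role of the Koch & Ullman saliency map, and the TD error trains a value estimate for the selected location.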