Feature binding in human visual working memory

Posted comment on 'Neural architecture for feature binding in visual working memory' by S. Schneegans and P.M. Bays, published in Journal of Neuroscience, April 2017, vol 37 (14), page 3913, doi.org/10.1523/JNEUROSCI.3943-16.2017


Schneegans and Bays propose in their article that populations of interacting neurons are required for feature binding in visual working memory and that non-spatial features are related to location by simultaneous conjunctive coding.

Their experiments involved cued recall tasks in which subjects memorised object arrays composed of simple visual features, e.g. colour, orientation and location. After learning the items and following a short delay, the subjects were given one feature of an item and had to report one or both of its other features. Swap errors (where the subject reported an item other than the one indicated by the cue) were classed as binding failures. In Experiment 1, 8 participants with normal or corrected-to-normal vision performed a cued recall task testing their memory for the binding of colour and location. The stimuli were presented at a refresh rate of 13 Hz suitable for learning. Each trial began with the presentation of a white fixation cross against a black background to maintain subject concentration and gaze fixation. This was followed by a sample array of 6 coloured discs presented for 2 s each at random locations on the fixation cross. The display then disappeared for 1 s, after which one of the discs, chosen at random, was presented on the display. In the report-location test, the subjects were asked to move the disc to the matching location using a dial. In the report-colour test, the subjects were presented with a location and used the dial to select the matching colour. The responses were not timed. In Experiment 2, the same participants performed the cued recall task, but in this case with 2 responses so that the simultaneous presentation of more than one feature could be investigated. In this experiment, the subjects were presented with a sample array consisting of six bars with randomly chosen colours, orientations and locations. For the orientation-cue test, the subjects were presented with a random orientation and had to use a dial to match colour and location. For the colour-cue test, the subjects were shown a coloured disc and had to match the orientation and location using a single input dial.

Stimulus features were analysed and reported in a circular parameter space of possible values in radians. To allow easier comparison of the results, orientation values were scaled to cover the same range as colour and angular location. The recall error for each trial was calculated as the angular deviation of the reported value from the true value. Recall variability was measured using the circular SD. To estimate the distribution of response deviations expected by chance, a randomisation method was used for each subject and condition: the deviations of non-target feature values from target feature values were randomly shuffled and the deviations of the corresponding responses were recorded. An estimate of the chance distribution was calculated from over 1000 repetitions and this was subtracted from the observed response frequencies.
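The circular measures described above can be illustrated with a short sketch. This is not the authors' analysis code: the function names are hypothetical, and it assumes the common definition of circular SD as sqrt(-2 ln R), where R is the mean resultant length, and that orientations lie in the range 0 to pi before scaling.

```python
import cmath
import math

TWO_PI = 2 * math.pi


def scale_orientation(theta):
    """Map an orientation in [0, pi) onto the full circle [0, 2*pi)."""
    return (2.0 * theta) % TWO_PI


def circular_error(response, target):
    """Signed angular deviation (response - target), wrapped to [-pi, pi)."""
    return (response - target + math.pi) % TWO_PI - math.pi


def circular_sd(angles):
    """Circular SD, assuming the definition sqrt(-2 * ln(R))."""
    resultant = sum(cmath.exp(1j * a) for a in angles) / len(angles)
    r = min(abs(resultant), 1.0)  # guard against rounding slightly above 1
    if r == 0.0:
        return math.inf  # angles cancel out completely
    return math.sqrt(-2.0 * math.log(r))


# Example: a response just past zero compared with a target just before 2*pi
# gives a small error rather than an error of almost a full turn.
error = circular_error(0.1, TWO_PI - 0.1)
```

Wrapping the deviation in this way is what allows errors for colour, scaled orientation and angular location to be compared on the same scale.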

In the case of Experiment 2, tests were additionally classified according to whether the spatial response was directed to the target (a spatial target test) or to one of the non-target items (a spatial swap test). For this purpose, neural population models were fitted to the spatial responses only and, for each test, the probability that each item had been selected for spatial response generation, given the response location, was calculated. Only spatial target tests or spatial swap tests giving a probability of 75% were analysed further.

The population-coding model used by Schneegans and Bays in this investigation was an extension of that proposed by Bays in 2014 for memorising individual feature values. It was assumed that during the presentation of the array the features of each item were encoded in the activity of a particular population of neurons, with the relationship between the feature and a neuron's mean firing rate reflecting that neuron's preferred feature value and a tuning function, which the authors assumed was normal. In recall, the memorised feature values were estimated by maximum likelihood decoding, meaning that the decoder observed the activity of the neuron population and reported the feature value most likely to have produced that particular pattern of activity. Recall errors were explained by random noise in neural activity that caused deviations between the encoded and decoded feature values. The authors assumed that the total neuronal activity remained constant regardless of the amount of information encoded; therefore, larger memory arrays produced fewer spikes for each item's features and resulted, as expected, in poorer recall performance.
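The reasoning in the paragraph above (fixed total activity, so fewer spikes per item and poorer recall at larger set sizes) can be sketched as a toy simulation. It is only an illustration under stated assumptions and not the authors' model: spikes are assumed to be independent Poisson counts, tuning is a Gaussian of circular distance, only the target item's single feature is simulated, and all names and parameter values (`n_neurons`, `total_peak_rate`, `width`) are hypothetical.

```python
import math
import random

TWO_PI = 2 * math.pi


def circ_dist(a, b):
    """Signed circular distance a - b, wrapped to [-pi, pi)."""
    return (a - b + math.pi) % TWO_PI - math.pi


def tuning(pref, theta, peak_rate, width):
    """Mean spike count of a neuron with preferred value `pref` (assumed Gaussian)."""
    d = circ_dist(pref, theta)
    return peak_rate * math.exp(-d * d / (2 * width * width))


def poisson(lam, rng):
    """Draw a Poisson count with mean `lam` (Knuth's method, small means only)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def decode(spikes, prefs, width, rng, grid=180):
    """Grid-search maximum likelihood estimate of the encoded value.

    Assumes the summed tuning curves are roughly constant over the circle,
    so the log-likelihood reduces to -sum(n_i * d_i**2) / (2 * width**2).
    With no spikes at all, the decoder can only guess.
    """
    if sum(spikes) == 0:
        return rng.uniform(-math.pi, math.pi)
    best_theta, best_ll = 0.0, -math.inf
    for g in range(grid):
        theta = -math.pi + TWO_PI * g / grid
        ll = -sum(n * circ_dist(p, theta) ** 2
                  for n, p in zip(spikes, prefs)) / (2 * width * width)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


def mean_abs_error(set_size, trials=200, n_neurons=32,
                   total_peak_rate=4.0, width=0.5, seed=0):
    """Mean absolute recall error when total activity is shared across items."""
    rng = random.Random(seed)
    prefs = [-math.pi + TWO_PI * i / n_neurons for i in range(n_neurons)]
    peak = total_peak_rate / set_size  # normalisation: fewer spikes per item
    total = 0.0
    for _ in range(trials):
        target = rng.uniform(-math.pi, math.pi)
        spikes = [poisson(tuning(p, target, peak, width), rng) for p in prefs]
        total += abs(circ_dist(decode(spikes, prefs, width, rng), target))
    return total / trials


error_one_item = mean_abs_error(set_size=1)
error_six_items = mean_abs_error(set_size=6)
```

Under these assumptions the error for six items comes out larger than for one item, and trials with no spikes at all produce the guess-like large deviations mentioned in the text.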

Using the population coding model, decoding from the neural population activity reproduced quantitative details of the error distributions in the cued recall tasks. The distributions showed specific deviations from normality, including an increased proportion of large deviations from the memorised value, accounting for response errors that could have represented guesses. The proportion of large errors increased as the number of spikes per item decreased, which was attributed to the higher set sizes. For feature binding, population codes were considered for feature conjunctions so that each neuron population had preferred values and associated tuning curves for 2 features. Each memorised item had separate population activity computed from the item's feature combination and modulated by random noise. In the cued recall test, the item whose decoded cue feature value was closest to the given cue was selected for the response. The population coding model was also extended to accommodate binding of multiple visual features by combining several conjunctive population codes, each representing two features. The authors considered two extensions of the model to explain the binding of multiple features: the direct-binding model, where one conjunctive population exists for each pair of feature dimensions and the results would therefore explicitly show binding between any two features; and the spatial-binding model, where one feature dimension (in this case location) takes a leading role in binding all other features together.

Mathematical analysis of the results obtained from both Experiments 1 and 2 was carried out to calculate the mean firing rates, normalisation of the neuron firing rates, tuning functions, and response probabilities. The authors simplified the analysis of the model by using rate coding with additive Gaussian noise and, rather than using the 'palimpsest' model, by computing an explicit distribution of response probabilities for each trial. The results were fitted to different models. For the colour-location task of Experiment 1, two versions of the model were considered: the joint model (which assumed that a single neuronal population is used to generate responses in both task conditions) and the independent model (which assumed that there are two separate neuronal populations, giving 6 parameters). Two models were also considered for Experiment 2: the joint model for colour-location binding; and the direct binding model (where the two responses are assumed to be generated independently from the cue, using one population code to associate the cue feature with location and a second to associate the cue feature with the report feature). Spatial swap errors were assessed using a reduced model fitted only to the spatial responses from Experiment 2, linking the cue feature (colour or orientation) to location.

The results of Experiment 1, investigating colour-location binding, showed that there was binding between non-spatial and spatial features and that the memory representation was a single neuron population, as shown by successful recall of the feature. The authors found that, even though there were substantial differences in the shapes of the plotted error distributions, there was no significant difference in variability as measured by SD. This indicated that the responses were not simply noisy estimates of the target. A central tendency in the distribution plots indicated the presence of swap errors (a non-target was reported in place of the target), with the report-location condition showing a strong tendency and the report-colour condition a weaker, but still significant, one. This indicated to the authors that swap errors were more common in recalling locations than in recalling colours. The deviation of location estimates from a non-target's location varied according to the similarity of that non-target's colour to the colour of the target. Results were consistent with swap errors, with the deviation significantly lower than chance, but only when target and non-target had similar colours. The deviation of colour estimates from the non-target colour varied with the similarity of that non-target's location to the location of the target. Therefore, the authors reported swap errors in the report-colour condition only for non-targets that were similar to the target in the cue feature, i.e. very close together in space. When the authors fitted their results to the neural population model, they found that the joint model (i.e. a single population for both conditions of the task, changing only which feature dimension, colour or space, takes the role of the cue feature and which takes the role of the report feature) gave a better fit, but the difference was not statistically significant for all subjects.
The fit also indicated that swap errors were present: about 16% of trials for report-colour and 51% for report-location.

The results for Experiment 2 showed that in both conditions the error distribution for location had a significantly lower SD than for either non-spatial feature, with colour producing the better result, indicating that colour cues were more effective than orientation cues for reporting location. The results for colour compared favourably with those obtained in Experiment 1, which indicated that the additional demand of learning orientation did not significantly interfere with recall performance. The pronounced central peak in the figures representing response deviations indicated the occurrence of swap errors. These swap errors were found to be linked to the similarity between the non-target and the target in the cue feature, and the range of cue feature values for which swap errors were reported was comparable for the spatial and non-spatial reports. When the results were fitted to the models, the spatial-binding model (i.e. neural populations representing colour-location or orientation-location binding) provided a better fit to the observed results than the direct binding model. In this model, the spatial response is generated directly from the given cue, and the estimated item location is then used as a cue to generate the non-spatial response. Two types of swap error were observed: first, when the spatial response was selected based on the cue information (the same as observed in Experiment 1 for the report-location task), which occurred 27% of the time; and secondly, the more common error, when the estimated spatial location was used to select the memorised item for the non-spatial response (49% of colour-cue trials and 55% of orientation-cue trials). The higher percentage observed for orientation reflected, according to the authors, its generally lower effectiveness as a cue.
Again fitting to the model, the authors found that the neural tuning curves for the spatial dimension were significantly higher than for both colour and orientation, which also differed significantly from each other. This supported the observation of a higher proportion of swap errors for orientation than for colour.

The authors also investigated the error distributions for both experiments and models, since they found that nearly identical fits were achieved for both models. They stated that the differences between the two models were related to the pattern of swap errors across the two predicted responses. In the direct binding model, a swap error in the spatial response had no effect on the response for a non-spatial feature and vice versa. However, in the spatial binding model, a swap error in the spatial response meant that the location of the selected non-target item would have been used to generate the non-spatial response. Therefore, the non-spatial response should be centred on the feature value of the non-target at the selected location rather than on that of the target. This mechanism predicts a strong correlation of swap errors, and hence of absolute response errors, between the spatial and non-spatial responses. To examine this, the authors determined a Pearson's product-moment correlation coefficient for the absolute response errors across the trials for all subjects and found that the predictions of the spatial binding model were met. They did note, however, that the results were also reproduced when only those tests with the non-spatial response produced first and the spatial response second were analysed. This indicated that the spatial response did not force the selection of the memorised item before the non-spatial response was initiated.
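The correlation test described above can be sketched as follows. The function is a plain implementation of Pearson's product-moment coefficient; the two error lists are made-up numbers for illustration only (not data from the paper) and the variable names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical absolute errors (radians) per trial for the two responses.
# Under spatial binding, a large spatial error (a swap) should tend to be
# accompanied by a large non-spatial error on the same trial.
abs_spatial_error = [0.05, 0.10, 2.40, 0.08, 1.90, 0.12]
abs_nonspatial_error = [0.20, 0.35, 2.10, 0.15, 1.60, 0.40]

r = pearson_r(abs_spatial_error, abs_nonspatial_error)
```

A coefficient near zero would instead be expected if, as in the direct binding model, a swap in one response had no effect on the other.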

Schneegans and Bays concluded that their results showed that the model for feature binding that combined neural population representations with conjunctive coding was correct. Other studies had shown that recall errors were the result of noisy neural activity. In their experiments the authors saw swap errors in both cue and report features, e.g. a non-target item was judged to be the one most similar to the cue, and attributed these swap errors to the decoding of noisy activity in the active neural populations. The errors seen in Experiment 1, either sharp distributions around the target location combined with a large proportion of swap errors in the case of the spatial responses, or wider distributions with fewer swap errors in the case of the colour responses, were attributed to the different widths of the neural tuning curves for the two features investigated.

The authors then used models (direct binding and spatial binding) to compare alternative mechanisms for binding non-spatial features and found that the behavioural results were fully consistent with the spatial binding model, where non-spatial features, e.g. colour and orientation, are bound exclusively via their shared location with no indication of direct binding between them. This showed that location plays a special role in feature binding and that learning of the non-spatial feature occurs with the location. They also showed that it is only possible to recall one non-spatial feature from another via their shared location. Schneegans and Bays' conclusions supported work by Nissen in 1985 on perception and also studies which showed that spatial attention is engaged when retrieving items from working memory even when cued by non-spatial features. Schneegans and Bays also went on in their discussion to compare their results favourably with the feature binding model for neural populations with conjunctive coding proposed by Matthey et al. (2015). Differences between the two theories were explained by the fact that Matthey had not looked at spatial and non-spatial responses. The authors also ruled out other alternative models on the basis that they predicted low error correlations with spatial cues, and concluded their discussion with supporting evidence for their spatial binding model.

To summarise, Schneegans and Bays showed in their experiments that feature binding in visual working memory can be explained by a model of neural population activity with conjunctive coding. Their investigations showed that non-spatial features are bound to location and that one non-spatial feature is bound to another only via binding to a shared location.


What makes this article interesting is that it explores how visual information is linked together in human visual memory. Schneegans and Bays found that the colour feature and/or orientation feature of a human visual event is stored and recalled with its respective location in visual memory. Hence, it is tempting to compare the findings obtained here to the binding of object information to location information required in the more commonly performed spatial memory experiments with mice. However, before we draw any definitive conclusions we should think about what the experiments carried out by Schneegans and Bays actually mean. Schneegans and Bays' subjects were presented with a definitive shape (the fixation cross) of a particular 'colour' (white) against a black background. In spatial memory experiments this would be classed as the object, albeit a basic one. On the cross shape were placed either discs of colour or bars orientated in different directions and therefore, these would be treated by the subjects as mere additions to the basic shape common to all trials. Therefore, the information learnt would be a common shape with each one having a slightly different 'pattern'. This would correlate with the spatial memory experiments where the objects learnt are shapes (2D or 3D) comprising visual features which may or may not include colour and different patterns. The objects of Schneegans and Bays' experiments are naturally of much lower complexity.

The other piece of information stored with the colour and/or orientation according to Schneegans and Bays was the location. The word 'location' normally relates to a physical place or space, and in terms of the mouse spatial memory experiments it does actually mean the place where the object is, because this is how the mice remember their way around the test maze. In this case, the location is for movement, and this would be the same if we were talking about moving to grasp something or stepping over something to avoid it. However, in Schneegans and Bays' experiments the location is actually the place where the experimental colour disc or bar has been added. Therefore, the location is relative to the shape and outline of the cross and is more a location for object placement than a location for movement as described above. (Even if the fixation cross was not used, the location would be fixed relative to the boundaries of the screen and would still be classed as a location for object placement.) Therefore, location in Schneegans and Bays' experiments is again of a lower complexity than that recorded and recalled in the mouse spatial memory experiments. We are not disputing that colour and/or orientation features are recorded with location, just that the experiments carried out require the storage and recall of features much less complicated than, and of a different nature to, those recorded in a known system already in place for successful spatial memory and associated with the word location.

Now that we can see that human visual pathways and memory bind colour and/or orientation features to location, we can look at why this happens and which mechanisms bring it about. Whereas location is bound to an object in the consideration of movement (e.g. to achieve reaching and grabbing), colour and shape are bound to location or placement to give an object form and detail complexity. This is required cognitively, directly for better perception and object recognition and indirectly for better higher order cognitive functions like creativity and decision-making. The characteristics of visual memory require features such as colour, size, shape, and location, with binding ('chunking') of this material into advantageous groups. The characteristics are bound together in time, but not necessarily in place, meaning that many different brain areas and sensory systems may be involved, and it is likely that stronger firing of particular features means that these take the role of the reference point or cue in later cognitive tasks such as helping in recall.

Neurochemical systems have to fulfil the demands of the brain's required functions and for human visual input and memory this is no different. Just as with other forms of memory, the system relies on the firing activity of appropriate neuronal cells. The general biochemical mechanisms involved rely on neurotransmitter release and neuronal pathways of multiple neuronal cells linked by their axons and dendrites. The firing activity occurring at any point in time can form a neuronal cell assembly (Schneegans and Bays' neural population) and this would represent the new input as well as learnt information recovered by reactivation of cells whose physiology has been altered by previous events. The synchronicity of firing of these neuronal cell assemblies provides the conditions for the binding of event characteristics in time. Most research suggests that cells represent specific features, but it has been suggested that some cells represent multiple features (e.g. the place cells in the hippocampus that appear to represent both location and event features in spatial memory) or multiple cognitive functions (e.g. Messenger's cells involved in both working memory and attention). Although the idea would be logical (a cell that fires representing two features would be an example of energy conservation), there is no evidence of this type of capability for visual features, e.g. the cells forming the retinotopic map only represent single event features. However, brain areas have multiple layers with specific cells all having many dendrites and axons. It is more likely that, in the case of location, the feature is represented by the firing 'strand' and the location feature by where that strand ends in the relevant brain area, e.g. a red rope tied to a post means red represents the colour and the post the location of its end-point.
Binding of the information therefore falls to the responsibility of brain area functioning, and in the case of visual information this appears to mean firing from the hippocampus, an area known as a relay station with important roles in informational input, working memory and attention, to the lateral entorhinal cortex for the selectivity process of object information and to the medial entorhinal cortex for spatial memory. The frequency of firing also relates to the functioning of these areas with regard to event characteristics, with beta brain wave synchronicity in the former for encoding the information and between the lateral entorhinal cortex and medial prefrontal cortex for recall. Theta brain wave synchronicity is also observed between the medial entorhinal cortex and hippocampus in the case of spatial memory.

Although binding will bring event characteristics together in time, not all incoming features are stored for later use. We have already discussed the relevance of features to tasks, as seen above with reference points. In general, different forms of neuronal cell assemblies are produced during the memory process and so only the strongest firing, representing the strongest or most relevant feature, would survive. It should also be remembered, however, that unattended information (i.e. information of which the subject is not aware) could also be included. In the case of this experiment this is unlikely to occur, since the focus is on a very small amount of visual information and the task is simple. The neuronal cell assemblies formed in Schneegans and Bays' experiments also do not have to undergo the physiological changes associated with transforming short-term memories into long-term memories, since there is only a short delay between the input and recall (1 second) with no interference and therefore, sustained activation to transform temporary firing populations into more permanent stores is not required. The experimental set-up also means that the neuronal cell assembly is not altered by working memory intervention, since the features require no supplementary processing such as decision-making. This means essentially 'what is seen is what is learnt' (formation of sensory stores and short-term memory stores) and remembered (recall without processing). There is also limited emotional pathway involvement, since the only definitive reward is that offered by successful recall, together with an indirect positive emotional status reflecting self-satisfaction at the personal performance.

The above paragraphs describe the general neurochemical mechanisms associated with neuronal firing and the input and recall of information on which the studies of Schneegans and Bays are based. However, the selectivity of the event information, i.e. colour, orientation and location, relies on the firing of specific neuronal pathways associated with visual characteristics, leading from the basic eye (the conduit from the external environment to the internal) to the higher order brain areas associated with the more complex visual characteristics. The initial experimental set-up of Schneegans and Bays, i.e. the visualisation of the fixation cross, evokes retinal firing relating to the object's shape and outline (the role of the retinal rods) and colour (the cones). The pathway firing occurs in a forward sweep from the lower hierarchical areas, e.g. the retina, to the highest order areas, the visual cortices. This firing is neurotransmitter based, has a particular brain wave frequency and exhibits saccades where firing is paused in cells due to cellular exhaustion and subsequent replenishment. Therefore, information received higher up in the hierarchy appears in 'bursts'.

Activation of the visual pathway hierarchy therefore provides the individual with the attributed features of form, shape and movement. Location is not recognised at the lower levels, but the characteristics of depth and size are. Therefore, these and movement perception give rise at higher levels to the event characteristic of location. In the case of movement and hence location, the hierarchy followed in the forward sweep of firing involves the retinal rods, bipolar cells (giving edges), M ganglion cells, the LGN magnocellular pathway and parvointerblobs (giving orientation, movement and contrast), leading to the visual cortex V1 magnocellular pathway (both orientation and direction sensitive), then to the higher V2 area, then V3 and V5 (middle temporal lobe) and eventually to the MST, which is known for linear motion. Whether all of the visual pathway for movement is involved in the input and perception of location in these experiments was not investigated, but we can assume that it follows the regime given above for other objects. The firing patterns observed relate to the models attributed to visual perception. The WHERE model (or perception-action model) relates to movement perception and location and correlates, as expected, with the involvement of dorsal brain areas (V1 to the parietal lobe and medial temporal lobe). Milner and Goodale described this dorsal pathway as providing short-lasting and viewpoint-independent events, which fits the results of Schneegans and Bays' experiments showing that orientation as a descriptive event feature appeared to be less important than colour. This is understandable if we consider the example of a cup, which we still recognise as a cup whether it is viewed from one side, the other or even upside down.

Therefore, we can say that the neurochemical quality that allows an event to be perceived is the binding of those of its characteristics that are perceived by the systems available, e.g. in the case of the human visual system colour, shape and movement. If we imagine that only the colour characteristic of an event could be perceived, stored and recalled, we can see that the capability of object recognition would be severely hampered. Therefore, binding of simultaneously presented characteristics into a single population of firing neurons, which may cover many systems, is ideal and considerably increases the chances of recognition at a later date. Schneegans and Bays have therefore provided by their experiments further evidence that in the case of the human visual system this occurs at least between colour and location, and even between colour, orientation and location. However, nothing is ever that simple and this is also the case with feature binding for human visual systems. There are at least three examples where the binding of location and event features is more complicated and these are:

  • The case of sequences. In this case, the event characteristic of location would vary with time and neurochemically, this can be portrayed by event content changing, but where location is replaced by the event having a particular order. Natural saccades give firing 'black outs' so that sequences of events are represented by a series of neuronal cell assemblies that portray a majority of features that are the same and a smaller number of features that are slightly changed. Binding of the features together to form the single population representing a point of time or order would be brought about by the firing between different parts of the hippocampus and the entorhinal cortex. For example, the input of depth and size information from the visual system would mean that movement of an object is accounted for, with support from changes in colour density and contrast. Therefore, the hippocampus would coordinate firing with the lateral entorhinal cortex for the object information and the medial entorhinal cortex for the location.
  • The case of expected location such as that seen with facial recognition. In this case, the location of features is already known since each face has features assumed to be in the same places for everyone. Variation of this placement may be slight, e.g. the width between the eyes or the face shape, but the characteristic that is most likely to change between individuals is colour. Therefore, the neuronal cell assembly has to assign colours to known locations and the face is remembered essentially as a 'whole'. This is why the details are less likely to be remembered individually unless changed or reminded by external prompts. Facial recognition requires a number of specific neuronal pathways including the fusiform gyrus, which is known for face processing.
  • The case of feature exchange such as that seen with 'hybrid' images (e.g. where the images of two famous people are swapped as the individual looks at the same location). In this case, it appears that the location is assigned different event features that appear linked to time. This is in fact an illusion and results from the visual system's neurochemical capabilities. It is probably due to the visual features of one image causing firing of the appropriate neuronal cell assembly leading to recognition, but as that firing dies away because of the firing cells' refractory period, firing of cells at millimetre distances away will start due to the visual system rule of priority given to firing of unattended cells. Therefore, even though the second image seemingly appears in the same external location and each location then appears to carry the features of two images, there are differences in the internal location of the firing cells.

Therefore, to conclude, binding of the visual features of an event is important and we can use this knowledge to promote learning and recall. It is for this reason that we have to be careful about how we experiment and what interpretations are made from the results we get, particularly in the case of human beings. Schneegans and Bays carried out a series of experiments which on the surface appear to link one event characteristic to another, but further examination of the details of the experiments shows that this interpretation is not as clear-cut as first envisaged. Therefore, any experiments involving human subjects should be considered from all angles and appropriate controls put in place so that any results can be correctly interpreted.

Since we are talking about the topic ...

... can we assume that if, in the experiment's delay between feature presentation and recall, the individuals were exposed to some form of visual interference, the recall performance would be adversely affected and that this adverse effect would be greater if the interference involved colour or location?

... the cross-modal effect is said to increase the amount of information a person can remember at any one time due to different sensory pathways being in play. Therefore, if auditory stimuli were included with the visual cue and the experiment repeated, would we see no change in the performance of recalling location and colour/orientation, and if the sound was used as a cue, would an observed difference reflect a preference between sound information and either colour or location?

... would the introduction of more complex visual events including patterns and camouflaging lead to the expected change in recall performance due to an increased demand on the visual system?

... can we assume that the use of reward would engage more brain areas in the recall performance?
