Dissecting computations in the dopamine reward circuit
It has been proposed that dopamine neurons in the midbrain signal reward prediction errors, that is, the discrepancy between actual and expected reward (Schultz et al., 1997; Bayer and Glimcher, 2005).
Reward prediction error = actual reward – expected reward
These signals resemble the error signals used for training in machine learning and artificial intelligence. However, the mechanism by which the brain performs this calculation remains unknown. To probe how dopamine neurons calculate reward prediction error, we have developed a mouse model that allows us to combine electrophysiology in behaving animals with emerging molecular and genetic techniques (Cohen et al., 2012). In a recent study (Eshel et al., in preparation), we recorded from optogenetically identified dopamine neurons in the ventral tegmental area (VTA) while mice performed classical conditioning tasks that varied expectation level, reward size, or both. We found that a simple, universal function predicts how individual dopamine neurons respond to unexpected rewards of various sizes. When the reward is expected, this function shifts downward in a purely subtractive manner, consistent with the canonical prediction error equation above. Furthermore, the effect of expectation on each neuron's reward response scaled multiplicatively with that neuron's responsiveness. In other words, each dopamine neuron appears to calculate reward prediction error in the same way, but scaled upward or downward. Such a process could emerge naturally from a homeostatic balance of excitation and inhibition, and it allows each dopamine neuron to contribute fully to the prediction error signal.
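The two findings, a subtractive shift with expectation and a per-neuron multiplicative scaling, can be illustrated with a small numerical sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the saturating (Hill-type) shape of the shared response curve, the names `base_response`, `neuron_response`, and `expectation_drop`, and all parameter values.

```python
import numpy as np


def base_response(reward, r_max=1.0, half_sat=3.0, exponent=0.7):
    """Hypothetical shared response curve for unexpected rewards.

    The saturating Hill-type form and its parameters are assumed for
    illustration; the text only says the function is simple and universal.
    """
    reward = np.asarray(reward, dtype=float)
    powered = reward ** exponent
    return r_max * powered / (powered + half_sat ** exponent)


def neuron_response(reward, gain, expectation_shift=0.0):
    """One neuron's response: the shared curve minus a subtractive
    expectation term, all multiplied by a neuron-specific gain."""
    return gain * (base_response(reward) - expectation_shift)


def expectation_drop(reward, gain, expectation_shift):
    """Response to unexpected reward minus response to expected reward."""
    unexpected = neuron_response(reward, gain)
    expected = neuron_response(reward, gain, expectation_shift)
    return unexpected - expected


rewards = np.array([0.1, 1.0, 5.0, 20.0])  # arbitrary reward sizes
shift = 0.3                                # arbitrary expectation term

for gain in (0.5, 1.0, 2.0):               # arbitrary per-neuron gains
    drop = expectation_drop(rewards, gain, shift)
    print(f"gain={gain}: drop per reward size = {np.round(drop, 3)}")
```

Under these assumptions, the drop caused by expectation is the same at every reward size for a given neuron (a purely subtractive shift) and is proportional to that neuron's gain (multiplicative scaling across neurons).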