Reward-guided learning and decision making
In the first part, I will present data arguing for separable, parallel mechanisms for
solving the credit assignment problem. When a choice results in a
reward, we should update the value of this choice accordingly. However, in our everyday
lives, rewards are often delayed, with many intervening actions, and some outcomes are
even independent of our behaviour. It is therefore often not trivial to tell which choice
(if any) is causally responsible for a particular outcome. This credit assignment problem
is usually trivial in standard learning experiments with one choice and outcome per trial.
Here, I will show that in an environment only slightly more complex, healthy humans deploy
several learning mechanisms operating in parallel. Humans are very proficient at
establishing contingent associations between outcomes and their causal choices.
However, their behaviour is also guided by statistical and heuristic learning
mechanisms: the former spreads credit for a reward across the average history
of choices and rewards; the latter relies on temporal proximity between choices and
rewards. These learning mechanisms were anatomically separable, and only contingent
learning was affected by lesions to orbitofrontal cortex.

In the second part, I would
like to show some very early data from a series of recent experiments trying to link
neurochemistry to behaviour via their effects on network oscillations recorded with MEG.
First, we have manipulated dopaminergic transmission during simple value-guided decision making.
Second, we have measured GABA and glutamate concentrations with MRS at 7T to relate baseline
concentrations of these neurotransmitters to oscillations and behaviour during i)
value-guided choice and ii) a novel patch-leaving decision task.
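As a hedged illustration of the three credit-assignment mechanisms described in the first part, the contrast between contingent, statistical, and proximity-based learning could be sketched as simple value-update rules. All function names, parameters, and update forms below are my own assumptions for exposition, not the models tested in the experiments:

```python
# Illustrative sketch of three credit-assignment rules (assumed forms, not the
# talk's actual models). Each takes current option values and returns updated ones.

def contingent_update(values, chosen, reward, alpha=0.1):
    """Credit only the choice that is causally responsible for the reward."""
    values = values.copy()
    values[chosen] += alpha * (reward - values[chosen])
    return values

def statistical_update(values, choice_history, reward, alpha=0.1):
    """Spread credit for the reward evenly across the recent history of choices."""
    values = values.copy()
    n = len(choice_history)
    for c in choice_history:
        values[c] += (alpha / n) * (reward - values[c])
    return values

def proximity_update(values, recent_choices, reward, alpha=0.1, decay=0.5):
    """Credit choices by temporal proximity to the reward: the most recent
    choice receives the most credit, earlier choices exponentially less."""
    values = values.copy()
    weight = 1.0
    for c in reversed(recent_choices):  # iterate from most recent backwards
        values[c] += alpha * weight * (reward - values[c])
        weight *= decay
    return values

# Example: two options, a reward of 1.0 after choosing option 0 last.
print(contingent_update([0.0, 0.0], chosen=0, reward=1.0))
print(statistical_update([0.0, 0.0], choice_history=[0, 1], reward=1.0))
print(proximity_update([0.0, 0.0], recent_choices=[0, 1], reward=1.0))
```

The point of the sketch is only that the three rules make different predictions from the same choice-reward sequence, which is what allows them to be dissociated behaviourally.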
Jan 09, 2017 | 04:00 PM