Emergence of rapid value inference through meta-reinforcement learning

Publication information:

Lee, J.; Hennig, J. A.; Frelih, V.; Gershman, S. J.; Uchida, N.

Emergence of Rapid Value Inference through Meta-Reinforcement Learning. bioRxiv 2026.

Abstract

The ability to estimate the value associated with a specific stimulus or action is essential for adaptive behavior. Value can be updated either incrementally through experience or rapidly by inference based on latent environmental structure. Yet, how the brain implements and transitions between these modes of value computation remains unclear. To address this question, we examined the neuronal mechanisms underlying reversal learning. Mice were trained in an odor-outcome association task either with stable contingencies or with dynamically changing contingencies. Mice trained on stable contingencies formed long-term value representations that depended on synaptic plasticity in the basolateral amygdala (BLA). In contrast, mice exposed to repeated reversals acquired the ability to infer values independently of plasticity in the BLA, enabling faster learning but with more rapid memory decay. Recurrent neural network (RNN) models trained with continuous weight updates recapitulated this transition, shifting from plasticity-based to dynamics-based value computation. Neural activity in the BLA encoded both value and the contextual information necessary for computing value based on latent task structure, similar to the representations found in the RNNs. Disrupting BLA activity before cue delivery preferentially impaired dynamics-based value updating. Furthermore, mice could learn distinct correlation structures that enabled structure-specific value inference. Together, these findings provide a mechanistic framework for fast value updates via inference, a core feature of intelligent behavior.
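
As a rough illustration of the two modes of value updating contrasted in the abstract, the following Python sketch compares an incremental delta-rule update with a Bayesian belief update over a hidden two-state contingency. It is not the authors' model (the abstract describes RNN models), and every name and number in it is an assumption chosen for illustration: the function names, the reward probabilities (0.9 and 0.1), the learning rate, the reversal hazard rate, and the initial belief.

```python
def delta_rule_update(value, reward, alpha=0.1):
    """Incremental update: move the value a small step toward the outcome."""
    return value + alpha * (reward - value)


def belief_update(belief, reward, p_high=0.9, p_low=0.1, hazard=0.05):
    """Inference update: revise the belief that the cue is in its
    high-reward state, given one observed outcome (1 = reward, 0 = omission).

    All parameters are illustrative assumptions, not values from the paper.
    """
    # Account for a possible unsignaled reversal since the last trial.
    prior = belief * (1 - hazard) + (1 - belief) * hazard
    # Likelihood of the observed outcome under each hidden state.
    like_high = p_high if reward else 1 - p_high
    like_low = p_low if reward else 1 - p_low
    # Bayes' rule.
    evidence = prior * like_high + (1 - prior) * like_low
    return prior * like_high / evidence


def inferred_value(belief, p_high=0.9, p_low=0.1):
    """Expected reward given the current belief about the hidden state."""
    return belief * p_high + (1 - belief) * p_low


def values_after_omissions(n_omissions, start_value=0.9, start_belief=0.95):
    """Return (incremental value, inferred value) after a run of omitted
    rewards, as would follow a reversal of a previously rewarded cue."""
    value, belief = start_value, start_belief
    for _ in range(n_omissions):
        value = delta_rule_update(value, 0)
        belief = belief_update(belief, 0)
    return value, inferred_value(belief)


if __name__ == "__main__":
    v_incremental, v_inferred = values_after_omissions(3)
    print(f"incremental: {v_incremental:.3f}, inferred: {v_inferred:.3f}")
```

Under these assumed parameters, a few unexpected omissions move the inferred value much further than the delta-rule value, which is the qualitative difference between slow experience-driven updating and fast structure-based inference. The sketch does not capture the faster memory decay reported for the inference mode.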