Recurrent neural networks outperform canonical computational models at fitting auditory brain responses


Abstract

Computational models of auditory neural responses originate from the ubiquitous Spectro-Temporal Receptive Field (STRF), or Linear (L), model, which states that neural activation is simply a direct weighting of the stimulus frequency-time bins over a past window of arbitrarily defined length. However, despite successive additions of nonlinearities to account for specific properties observed in biological brains, such as adaptation, contrast gain control or ON-OFF response generation, the core philosophy when modelling auditory neural activity remains that of a stateless temporal convolution performed on the cochleagram.
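As a point of reference, the stateless formulation can be sketched in a few lines of NumPy; the cochleagram shape, number of lags and variable names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the Linear (STRF) model: the predicted response at time t
# is a direct weighting of the past n_lags frequency-time bins of the
# cochleagram S (shape: n_freq x n_time) by the receptive field w
# (shape: n_freq x n_lags). No state is carried between time steps.
def strf_predict(S, w, b=0.0):
    n_freq, n_time = S.shape
    _, n_lags = w.shape
    r = np.zeros(n_time)
    for t in range(n_lags, n_time):
        window = S[:, t - n_lags:t]        # past stimulus bins only
        r[t] = np.sum(w * window) + b      # stateless linear weighting
    return r
```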

Although simple and convenient, this representation appears distant from the established reality of biological neurons as stateful units with an internal “memory” and short- to long-range dependencies on previous inputs and states. For example, the membrane potential at a given instant depends not only on current and past inputs, but also on its own preceding value and, more generally, on its history. The corresponding class of autoregressive computational models, known as recurrent neural networks (RNNs) in the modern deep learning literature, is surprisingly absent from auditory computational neuroscience, even though sound is a purely temporal stimulus. We therefore propose a novel recurrent architectural backbone, which we call StateNet, capable of processing auditory signals and accurately fitting real brain responses by leveraging statefulness.
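The abstract does not specify StateNet's layers, so the following PyTorch sketch only illustrates the general stateful idea: a recurrent hidden state that depends on its own previous value, feeding a linear readout per recorded unit. Class name, layer sizes and the choice of a GRU are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Illustrative stateful model: the hidden state h_t depends on the current
# input x_t and on its own previous value h_{t-1}, giving the unit an
# internal "memory" of the stimulus history.
class RecurrentEncoder(nn.Module):
    def __init__(self, n_freq=64, n_hidden=128, n_neurons=1):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_freq, hidden_size=n_hidden, batch_first=True)
        self.readout = nn.Linear(n_hidden, n_neurons)

    def forward(self, spectrogram):
        # spectrogram: (batch, n_time, n_freq)
        h, _ = self.rnn(spectrogram)       # hidden state carries past context
        return self.readout(h)             # predicted response per time bin

# Example: predict one unit's response to a 200-frame, 64-band spectrogram.
model = RecurrentEncoder()
pred = model(torch.randn(1, 200, 64))      # shape: (1, 200, 1)
```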

We train our model to reproduce single-unit electrophysiological data recorded in anesthetized animals from the spectrograms of the experimental stimuli, and compare it against a broad gamut of traditional models. We find that RNNs systematically outperform stateless networks by a substantial margin; our results are robust and validated on a recent benchmark comprising 3 publicly available datasets spanning 3 species (rat, mouse, ferret) and 3 cortical areas (A1, AAF, PEG). Finally, we propose a reverse-engineering method inspired by the “deep dream” technique from the AI/deep learning community, which produces interpretable features, analogous to nonlinear STRFs, for stateful networks.
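As a rough illustration of the “deep dream” idea (not the paper's exact procedure), one can optimize an input spectrogram by gradient ascent so that it maximizes a fitted unit's predicted response; the optimized input can then be inspected as an STRF-like feature. The function name, initialization and hyperparameters below are placeholders.

```python
import torch

# Gradient-ascent sketch: starting from low-amplitude noise, adjust the input
# spectrogram to maximize the model's mean predicted response, then return the
# optimized input as an interpretable "feature" of the stateful model.
def dream_feature(model, n_time=200, n_freq=64, steps=200, lr=0.05):
    x = (0.01 * torch.randn(1, n_time, n_freq)).requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        response = model(x).mean()         # unit's mean predicted activity
        (-response).backward()             # ascend the response gradient
        opt.step()
    return x.detach().squeeze(0)           # optimized spectrogram "feature"

# Usage with the RecurrentEncoder sketched above:
# feature = dream_feature(RecurrentEncoder())
```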

Together, our findings help bring computational models of audition closer to biological neurons, and contribute to a better understanding of their computations.