Learning without training: The implicit dynamics of in-context learning

Best AI papers explained - A podcast by Enoch H. Kang

This research paper explores In-Context Learning (ICL) in Large Language Models (LLMs): the striking ability of these models to learn new patterns from examples given in a prompt, without any explicit weight updates at inference time. The authors hypothesize, and support with both theory and experiments, that the combination of a self-attention layer and a Multi-Layer Perceptron (MLP) within the transformer architecture allows the context to implicitly modify the MLP's weights. They generalize this idea with the notion of a contextual block and derive a formula showing that the effect of the context is equivalent to a low-rank update to the network's first weight layer. This implicit process, they argue, behaves like a learning dynamic akin to gradient descent: as the tokens of the context are consumed sequentially, each one drives a further implicit weight adjustment. The findings suggest that ICL is rooted in how ordinary neural networks can transfer modifications of their input into modifications of their weights, rather than being a property of the self-attention mechanism alone.