Colloquium credits

Presentation Master's thesis - Berkan Eray Akin - Brain & Cognition

Last modified on 19-06-2025 15:47
Brute Force Versus Comprehension: Can Large Language Models Flexibly Adapt to Novel Task Demands?
Start: 24-06-2025, 15:00
End: 24-06-2025, 16:00
Location: Roeterseilandcampus - Building C, Street: Nieuwe Achtergracht 129-B, Room: GS.34. Due to limited room capacity, attendance is on a first-come, first-served basis. Teachers must also adhere to this.

Transformer-based Large Language Models (LLMs) achieve human-level performance across a range of domains, including language comprehension and multimodal reasoning. However, fundamental computational limits suggest an inherent trade-off between cognitive flexibility, accuracy, and computational resources in LLMs, such that their performance should degrade systematically in out-of-distribution contexts. To test this, we evaluated ten state-of-the-art LLMs on a novel sentence-completion benchmark involving linguistic self-reference and hierarchical meta-rules that require flexible task switching. Overall, LLMs performed worse on self-referential sentences than on simpler control sentences. Crucially, accuracy declined markedly when meta-rules were introduced, with all models except o3 collapsing to chance-level performance at the first meta-rule level. These results indicate a lack of cognitive flexibility and underscore the limitations of pattern-matching strategies in current LLM architectures. Scaling and optimization appear to delay, but not prevent, breakdown at higher meta-rule levels, challenging claims of near-term human-like reasoning abilities in LLMs.
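The abstract's claim that most models "collapse to chance-level performance" can be made concrete with a standard statistical check. The sketch below is a hypothetical illustration, not the thesis's actual analysis: it assumes a two-alternative completion task, an invented item count of 100 per condition, and invented per-level accuracy counts, and uses a one-sided binomial test to ask whether a model's accuracy exceeds chance at each meta-rule level.

```python
# Hypothetical sketch: testing whether benchmark accuracy is distinguishable
# from chance. Item counts, accuracies, and the chance level are assumptions
# for illustration only.
from scipy.stats import binomtest

N_ITEMS = 100       # assumed number of items per meta-rule level
CHANCE_LEVEL = 0.5  # assumed chance accuracy for a two-alternative completion task

# Invented example counts of correct completions per meta-rule level
correct_by_level = {0: 87, 1: 54, 2: 49}

for level, n_correct in correct_by_level.items():
    # One-sided test: is accuracy greater than chance?
    result = binomtest(n_correct, N_ITEMS, p=CHANCE_LEVEL, alternative="greater")
    verdict = "above chance" if result.pvalue < 0.05 else "not distinguishable from chance"
    print(f"meta-rule level {level}: accuracy {n_correct / N_ITEMS:.2f} "
          f"({verdict}, p={result.pvalue:.3f})")
```

Under these assumed numbers, only level 0 would register as above chance, mirroring the pattern of breakdown at the first meta-rule level described in the abstract.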