Roeterseilandcampus - Building C, Street: Nieuwe Achtergracht 129-B, Room: GS.34. Due to limited room capacity, admission is on a first-come, first-served basis. Lecturers must also adhere to this.
Transformer-based Large Language Models (LLMs) achieve human-level performance across a range of domains, including language comprehension and multimodal reasoning. However, fundamental computational limits suggest an inherent trade-off between cognitive flexibility, accuracy, and computational resources in LLMs, such that their performance should degrade systematically in out-of-distribution contexts. To test this, we evaluated ten state-of-the-art LLMs on a novel sentence-completion benchmark involving linguistic self-reference and hierarchical meta-rules that require flexible task switching. Overall, LLMs performed worse on self-referential sentences than on simpler control sentences. Crucially, accuracy declined markedly once meta-rules were introduced, with all models except o3 collapsing to chance-level performance at the first meta-rule level. These results indicate a lack of cognitive flexibility and underscore the limitations of pattern-matching strategies in current LLM architectures. Scaling and optimization appear to delay, but not prevent, breakdown at higher meta-rule levels, challenging claims of near-term human-like reasoning abilities in LLMs.
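The abstract does not reproduce the benchmark items, so the following is a minimal illustrative sketch under my own assumptions, not the authors' materials: the item, the trial-parity meta-rule, and the names `ITEM`, `target`, and `pattern_matcher` are all hypothetical. It shows why a model that keeps applying a memorized base pattern while ignoring a first-level meta-rule lands exactly at chance on a two-option self-referential completion.

```python
# Hypothetical sketch of a self-referential completion item plus one meta-rule.
# Everything here is invented for illustration; it is not the paper's benchmark.

ITEM = {
    # Self-referential: the correct completion depends on a property of the
    # completed sentence itself (here, its own word count).
    "prompt": "This sentence contains exactly ___ words.",
    "options": ("six", "seven"),
    "base_answer": "six",  # "This sentence contains exactly six words." has six words
}

def flipped_answer() -> str:
    """Return the complement of the base answer within the two options."""
    a, b = ITEM["options"]
    return b if ITEM["base_answer"] == a else a

def target(trial: int) -> str:
    # Assumed first-level meta-rule: on odd-numbered trials the required
    # completion flips from the self-referentially true answer to its complement,
    # so responding correctly requires switching task sets across trials.
    return ITEM["base_answer"] if trial % 2 == 0 else flipped_answer()

def pattern_matcher(trial: int) -> str:
    # Stand-in for a model that has learned the base pattern but ignores the
    # meta-rule; replace with a real LLM call to run the probe.
    return ITEM["base_answer"]

trials = 1000
correct = sum(pattern_matcher(t) == target(t) for t in range(trials))
print(f"accuracy under the level-1 meta-rule: {correct / trials:.2f}")  # 0.50
```

The printed accuracy is 0.50, i.e. chance for a two-option item: the stand-in is right on every even trial and wrong on every odd one. This is one way to read the abstract's claim that most models collapse to chance at the first meta-rule level.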