Roeterseilandcampus - Gebouw C, Street: Nieuwe Achtergracht 129-B, Room: GS.34. Due to limited room capacity, attendance is on a first-come, first-served basis. Teachers are asked to adhere to this.
As artificial intelligence (AI), and Large Language Models (LLMs) in particular, advances, questions arise about how its reasoning compares to human cognition. Cognitive flexibility, the ability to adapt one's thoughts and behaviour to shifting environmental demands, is a core human trait underlying reasoning, yet it is largely absent from current LLM benchmarks, which focus on narrow domains such as coding or STEM tasks. This study tested LLMs on a deliberately simple meta-rules game designed to be easy for humans: once a rule is recognised, it should be straightforward to apply consistently. The results showed that all models suffered a significant drop in accuracy whenever a new meta-rule was introduced. Claude-4 and GPT-o3 declined more gradually, but still approached 0% accuracy at the last meta-level. These results reveal that LLMs perform well when following a single rule, but that their accuracy drops sharply when new rules must be recognised and applied across subsequent trials. New benchmarks that target cognitive flexibility are needed to evaluate and improve reasoning abilities in AI.
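The abstract does not describe the game's mechanics, so the sketch below is a purely hypothetical illustration of what a rule-shift evaluation of this kind could look like: a hidden rule selects one of two numbers, the player must infer the rule from trial-by-trial feedback, and the rule silently changes between blocks. All names here (RULES, FrequencyAgent, play_block) are assumptions for illustration; a real evaluation would query an LLM in place of the toy agent.

    # Hypothetical sketch of a rule-shift ("meta-rules") game; the task,
    # rules, and scoring are illustrative assumptions, not the study's setup.
    import random

    # Candidate hidden rules: given a pair of numbers, pick one of them.
    RULES = {
        "larger":  lambda a, b: max(a, b),
        "smaller": lambda a, b: min(a, b),
        "first":   lambda a, b: a,
    }

    def play_block(agent, rule_name, n_trials=20):
        """Run one block under a fixed hidden rule; return block accuracy."""
        rule = RULES[rule_name]
        correct = 0
        for _ in range(n_trials):
            a, b = random.sample(range(10), 2)
            guess = agent.choose(a, b)
            target = rule(a, b)
            agent.feedback(a, b, guess, guess == target)
            correct += guess == target
        return correct / n_trials

    class FrequencyAgent:
        """Toy stand-in for an LLM: scores each candidate rule by how often
        it is consistent with the observed feedback, then plays the best one."""
        def __init__(self):
            self.scores = {name: 0 for name in RULES}

        def choose(self, a, b):
            best = max(self.scores, key=self.scores.get)
            return RULES[best](a, b)

        def feedback(self, a, b, guess, was_correct):
            # A rule is consistent with the feedback if it agrees with a
            # correct guess, or disagrees with an incorrect one.
            for name, rule in RULES.items():
                if (rule(a, b) == guess) == was_correct:
                    self.scores[name] += 1

    if __name__ == "__main__":
        agent = FrequencyAgent()
        # Each block silently switches the hidden rule (the meta-rule shift).
        for level, rule_name in enumerate(["larger", "smaller", "first"]):
            acc = play_block(agent, rule_name)
            print(f"meta-level {level} (rule={rule_name}): accuracy {acc:.0%}")

Because the toy agent never resets its rule scores, its accuracy dips immediately after each silent switch and recovers only as contradicting evidence accumulates, loosely mirroring the per-meta-level drop the abstract reports.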