AIs like ChatGPT fall apart in classic 'Stroop' psychological test, and that could stand in the way of achieving artificial general intelligence

By Darren Allan, published 4 June 2026

Executive control mechanisms are key

A new study tasked AIs with tackling the 'Stroop' test
GPT and Claude performed very poorly compared to humans
There are nuances here, but broadly, the researchers argue that improving this side of AIs is crucial for achieving artificial general intelligence

A freshly published study has pointed out a limitation of big-name AI models such as ChatGPT. It has caused some controversy because the primary piece of research uses now-outdated versions of those models, but there are nuances here, and that doesn't make the findings irrelevant.

I'll go into that more shortly, but first, let's look at the study itself, which was highlighted on Reddit ("New study reveals top AI models completely fail the classic 'Stroop' psychological attention test") and published by Oxford University Press in the journal PNAS Nexus.

The research consists of testing the so-called 'Stroop effect' with GPT-4o and Claude 3.5 Sonnet. As noted, these aren't the cutting-edge versions of those AIs (Large Language Models, or LLMs) – but they were at the time the initial study was carried out.

The Stroop effect refers to the phenomenon whereby the human brain gets confused when asked to name the color of the ink a word is written in, if that word is itself the name of a different (incongruent) color. So, if the word 'red' is written in blue ink, that'll cause a slower response, or possibly a wrong one, where the viewer accidentally says "red" rather than the actual color of the ink, which is blue.

This is because the brain is trying to juggle two different tasks – reading comprehension and color recognition – and so cognitive interference arises. Overriding the compulsion to read the word and say the color instead requires "executive control of attention," and this is what the authors were testing in the AI models. Both color-naming and word-reading were tested in shorter and longer lists of words (5, 10, 20, and 40 words).
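
To make that setup concrete, here's a minimal Python sketch of how text-based Stroop lists at those four lengths could be built and scored. It's purely illustrative: the color palette, the word/ink pairing format, and the function names (make_list, expected, accuracy) are my assumptions, not the researchers' actual materials or method.

```python
import random

# Assumed palette for illustration; the study's actual colors may differ.
COLORS = ["red", "blue", "green", "yellow"]


def make_trial(rng, congruent):
    """Build one item: a color word plus the ink color it is 'written' in."""
    word = rng.choice(COLORS)
    if congruent:
        ink = word
    else:
        # Incongruent: the ink must differ from the word's meaning.
        ink = rng.choice([c for c in COLORS if c != word])
    return {"word": word, "ink": ink}


def make_list(length, congruent, seed=0):
    """Build a list of trials; seeded so the same list can be regenerated."""
    rng = random.Random(seed)
    return [make_trial(rng, congruent) for _ in range(length)]


def expected(trials, task):
    """Correct answers: ink colors for color-naming, the words for word-reading."""
    key = "ink" if task == "color-naming" else "word"
    return [t[key] for t in trials]


def accuracy(responses, trials, task):
    """Fraction of positions answered correctly; missing answers count as wrong."""
    target = expected(trials, task)
    hits = sum(r == t for r, t in zip(responses, target))
    return hits / len(target)


if __name__ == "__main__":
    for length in (5, 10, 20, 40):  # list lengths mentioned in the article
        trials = make_list(length, congruent=False)
        # A responder that reads the word instead of naming the ink
        # gets every incongruent item wrong.
        word_readers_answers = expected(trials, "word-reading")
        print(length, accuracy(word_readers_answers, trials, "color-naming"))
```

The point of the last loop is that simply reading the word, which is the tempting default, scores zero on an all-incongruent color-naming list, and that's the interference the test is designed to expose.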

Analysis: another necessary step on the path to AGI?

If you've scanned through the Reddit thread, you'll doubtless have noticed that, as mentioned at the outset, commenters have fired a lot of flak at this study for using outdated versions of GPT and Claude.

Indeed, these older LLMs are called "state of the art" at one point by the authors – and of course, as already noted, they were cutting-edge when the main study was conducted. Still, this is unfortunate phrasing that should've been updated and tweaked now that the paper has just been published (after peer review and so forth).

However, the researchers did conduct tests on GPT-5, Claude Opus 4.1, and Gemini 2.5 Pro in September 2025, although this is somewhat buried in the paper. That more recent testing found that the newer GPT and Claude models offered only "slight" improvements on their predecessors, and that they still exhibited "ongoing executive attention deficiencies, consistent with our comprehensive analysis of earlier transformer models", as did Gemini 2.5 Pro, which was new to the testing.

Granted, a smaller sample size was used, but the researchers still argue that overall, their study reflects a fundamental limitation which is "inherent to the architectural constraints of transformer-based LLMs".

One caveat the authors note is that GPT-5 in 'Thinking' mode can write and then run code to ensure it performs the Stroop test flawlessly, and other LLMs can use similar functionality, but this is essentially the AI (cleverly) fudging around its inadequacies. It isn't changing the way it works or reasons more broadly, of course.
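
For a sense of why running code makes the test trivial, here's a rough Python sketch. The "WORD(ink=COLOR)" text encoding and the function names are hypothetical, invented for illustration; the paper's actual prompt format, and whatever code GPT-5 wrote, aren't reproduced here.

```python
import re

# Hypothetical encoding of a Stroop list as text, e.g. "RED(ink=blue)".
ITEM = re.compile(r"(\w+)\(ink=(\w+)\)")


def name_ink_colors(stimulus):
    """Color-naming task: return the ink color of every item, ignoring the word."""
    return [ink.lower() for _word, ink in ITEM.findall(stimulus)]


def read_words(stimulus):
    """Word-reading task: return every word, ignoring the ink color."""
    return [word.lower() for word, _ink in ITEM.findall(stimulus)]


if __name__ == "__main__":
    stimulus = "RED(ink=blue) GREEN(ink=green) BLUE(ink=yellow)"
    print(name_ink_colors(stimulus))
    print(read_words(stimulus))
```

A program like this never experiences interference because it only ever looks at one field per item, which is why handing the task to code sidesteps the attention problem rather than solving it.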

The researchers note that transformer architecture innovations for LLMs are focused on enhancing memory capabilities, an approach that fails to address the "core limitations of attention mechanisms, specifically the need for sophisticated alerting, orienting, and executive control networks to enable cognitive flexibility."

The ultimate aim is effective goal-directed behavior, and the study observes: "Future [LLM] development might benefit from implementing more sophisticated executive control systems that can handle decision conflicts through structured, goal-directed processing rather than relying solely on enhanced memory capabilities."

The authors argue that "incorporating executive control mechanisms akin to those in biological attention is crucial for achieving artificial general intelligence [AGI]."

Darren is a freelancer writing news and features for TechRadar (and occasionally T3) across a broad range of computing topics including CPUs, GPUs, various other hardware, VPNs, antivirus and more. He has written about tech for the best part of three decades, and writes books in his spare time (his debut novel - 'I Know What You Did Last Supper' - was published by Hachette UK in 2013).
