When AI gets it wrong: Teaching critical thinking through logical puzzles
Dr Yuan Zhao argues that if we want our students to use AI tools critically and ethically, we must create learning experiences that demonstrate both their strengths and pitfalls.

Since ChatGPT became widely available at the end of 2022, it has changed (almost) everything - including how we teach and learn in higher education. Many of us use generative AI tools based on large language models daily: to correct grammar, draft lesson plans, or suggest culturally appropriate emojis (I can’t be the only one?!). Similarly, our students are using them to find information, summarise text, write CVs, and check - or in some cases complete - their assignments.
We have had plenty of discussion about the limitations of AI - data biases, fabricated facts, and other well-publicised issues. Yet, for the most part, large language models handle our everyday coursework with surprising competence. In one example, students were asked to identify inaccuracies in an AI-generated summary of a research article; however, the summary was so accurate that opportunities for meaningful critical analysis were limited. From a student perspective, this can be disorienting. It reinforces a perception - however misleading - that AI can do the job for them, sometimes better than their own effort would. This risks not only demotivating students, but also encouraging a passive approach to AI, in which outputs are trusted without critical evaluation by the human user. As generative AI tools become ever more capable, we face a new pedagogical challenge: how can we design learning experiences that develop students’ evaluative and critical thinking skills amid such sophisticated AI output?
There are things we could do to mitigate these risks. We can redesign our learning and assessment activities to be more authentic, so that students’ input is clearly evident and valued. We can also set tasks where AI tends to struggle - at least for now - and guide students to explore the limitations and identify the issues themselves.
To this end, I created a workshop centred around logical-reasoning problems. One activity was based on the classic logic puzzle ‘Einstein’s Riddle’. While non-reasoning models such as ChatGPT 4o were capable of proposing the correct high-level structure for solving the puzzle, their step-by-step reasoning contained errors and unfounded assumptions that survived into the final output. To complement the non-academic puzzle, I also included a subject-specific question from my biology module - one that similarly relies on logical reasoning and that several models had answered incorrectly. Together, the two examples demonstrated that generative AI can stumble in many scenarios, including academic work. For both examples, students were asked to first attempt their own answers, then consult an AI tool (ChatGPT 4o), before dissecting the model’s reasoning to uncover any logical errors.
I piloted the workshop with a small cohort of second-year Biology undergraduates as part of their module. Students reported a significant improvement in their awareness of the limitations of generative AI in logical reasoning (p = 0.0025), as well as an increased appreciation of the importance of critically evaluating AI-generated content in academic work (p = 0.0379). There are several caveats, though. Firstly, more advanced reasoning models such as ChatGPT o3 were able to detect contradictions, validate constraints, backtrack on unfounded assumptions, and generate correct answers without further prompting. Since the more capable reasoning models typically sit behind paid subscriptions, this ties into broader discussions about unequal access to AI based on socioeconomic background. Secondly, while students were more aware of the limitations and agreed on the importance of critically evaluating AI output, they did not feel more confident judging AI output after the session (p = 0.7560). It was straightforward to identify the error in Einstein’s Riddle because every clue was explicit, but finding the unfounded assumption in the biology answer was much harder. The missing piece may be subject knowledge - you can’t recognise a wrong answer unless you know the right one.
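For readers curious why the riddle lends itself so well to this exercise: every clue in an Einstein’s-Riddle-style puzzle is an explicit constraint, so a candidate answer - including one proposed by an AI - can be checked mechanically, clue by clue. The minimal Python sketch below uses a deliberately simplified three-house variant with made-up clues (not the workshop’s actual puzzle or materials) purely to illustrate the idea.

```python
from itertools import permutations

# A deliberately simplified, three-house variant of an Einstein's-Riddle-style
# puzzle. The clues below are illustrative only, not the workshop's puzzle.
# Houses are positions 0, 1 and 2, from left to right.
nationalities = ["Brit", "Swede", "Dane"]
drinks = ["tea", "coffee", "milk"]

# Each clue is an explicit, mechanically checkable constraint on a candidate answer.
clues = [
    ("The Brit lives in the leftmost house",
     lambda nat, drk: nat[0] == "Brit"),
    ("The Swede drinks coffee",
     lambda nat, drk: drk[nat.index("Swede")] == "coffee"),
    ("Milk is drunk in the middle house",
     lambda nat, drk: drk[1] == "milk"),
]

def violated_clues(nat, drk):
    """Return the clues that a candidate answer breaks."""
    return [text for text, holds in clues if not holds(nat, drk)]

# Exhaustive search: test every possible assignment against every clue,
# so any unfounded assumption surfaces as a concrete clue violation.
solutions = [
    (nat, drk)
    for nat in permutations(nationalities)
    for drk in permutations(drinks)
    if not violated_clues(nat, drk)
]
print("Assignments consistent with all clues:", solutions)

# Checking an AI-proposed answer works the same way: list exactly which
# explicit clues it contradicts, if any.
proposed = (("Swede", "Brit", "Dane"), ("coffee", "milk", "tea"))
print("Clues violated by the proposed answer:", violated_clues(*proposed))
```

In the workshop itself students did this checking by hand, of course, but the same clue-by-clue logic is what makes unfounded assumptions in a model’s reasoning identifiable - and what the biology question, with its reliance on prior subject knowledge, lacks.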
As educators, we are still in the early stages of understanding how AI tools will reshape higher education. However, if we want our students to use those tools critically and ethically, we must create learning experiences that demonstrate both their strengths and pitfalls. I hope my attempt contributes to a broader conversation about how we can help our students engage critically in the AI era.
Links to ChatGPT Solutions:
- Non-reasoning model ChatGPT 4o: https://chatgpt.com/share/68518ad1-cb30-8007-ba14-557c15040c01
- Reasoning model ChatGPT o3: https://chatgpt.com/share/68518dc5-a0c4-8007-9e63-0e5b1f556b11
Dr Yuan Zhao
Lecturer in Medical Genetics, School of Biological and Behavioural Sciences
https://www.qmul.ac.uk/sbbs/staff/yuan-zhao.html