Pi Joins the Reading Quiz Generation Comparison

In “Building Reading Quizzes with AI,” I presented the prompt below to ChatGPT and Claude. Then, I discussed the strengths and weaknesses of each reply. Here’s the prompt:

Write five multiple-choice reading quiz questions for Hemingway’s short story, “Soldier’s Home.”
The prompt I fed to the AI chatbots

Today, a colleague introduced me to Pi, an AI from Inflection. Here was Pi’s reply:

1. What is the main character’s name?
A. Krebs
B. Jones
C. Wilson
D. Taylor

2. What conflict does Krebs fight in?
A. World War I
B. World War II
C. Korean War
D. Vietnam War

3. How does Krebs feel about returning home after the war?
A. Overjoyed
B. Disillusioned
C. Nostalgic
D. Indifferent

4. What occupation does Krebs take up after the war?
A. Lawyer
B. Doctor
C. Teacher
D. Farmer

5. What is Krebs’ relationship like with his family after the war?
A. Loving and supportive
B. Strained and distant
C. Non-existent

Pi’s reply to the prompt, 10/30/2023

Discussion

Pi starts off with a few common questions: the protagonist’s name and the war in which he had recently served. These are good questions. But, as with ChatGPT’s quiz, the protagonist’s name is revealed in the other questions. More problematic is Question #4. Krebs’ only stated plan for work is to “go to Kansas City and get a job.” The story doesn’t say what sort of job, and we don’t know what sort of work Krebs is capable of doing. I called Pi on this discrepancy, just as I would have with a student. I asked for chapter and verse. Here’s the exchange:

Me: “That’s curious. I don’t recall Krebs taking up any particular occupation after the war. Can you cite a line from the story to support that question?”

Pi: “Ah, you’re right – Krebs doesn’t take up a specific occupation after the war. I must have extrapolated that from the broader theme of Krebs struggling to find his place in civilian life. My apologies – you caught me slipping! 😂 The question about his occupation was more of an interpretation than a fact pulled directly from the text. Thanks for keeping me honest!”
My conversation with Pi concerning Question #4

Pi is right that there is a broader theme about Krebs’ struggle to fit into post-war civilian life. But it’s too charitable to call a mistake of this sort an “interpretation.” An interpretation, to be taken seriously, would need to be grounded in the text or in the wider historical and social context. And, in any case, questions that call upon tangential or, worse, non-existent aspects of the text are not the sort of thing one wants, as teacher or student, in a reading quiz.

Question #5 is problematic because it calls for a subjective judgement. Answer C is clearly wrong, but answers A and B are both true to some extent, so no unambiguously correct answer is available. Like Claude (but unlike ChatGPT), Pi does not include an answer key.
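Two of the flaws noted above are mechanical enough to check automatically: a question whose options are given away by the wording of another question, and a question with no keyed answer. The following is a minimal Python sketch of such a check, offered only as an illustration. The names `Question` and `audit_quiz` are hypothetical, the substring matching is deliberately crude, and nothing here can catch an invented fact like the one in Question #4 or a subjective item like Question #5. Those still need a reader who knows the story.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Question:
    """Hypothetical container for one multiple-choice item."""
    stem: str
    options: List[str]
    answer: Optional[str] = None  # None means the chatbot gave no answer key


def audit_quiz(questions: List[Question]) -> List[str]:
    """Return plain-English warnings for two mechanical flaws."""
    warnings = []
    for i, q in enumerate(questions, start=1):
        # Flaw 1: no keyed answer for this question.
        if q.answer is None:
            warnings.append(f"Question #{i} has no keyed answer.")
        # Flaw 2: an option for this question appears in another stem.
        # Crude, case-insensitive substring test; a person should review hits.
        for j, other in enumerate(questions, start=1):
            if j == i:
                continue
            leaked = [o for o in q.options if o.lower() in other.stem.lower()]
            if leaked:
                warnings.append(
                    f"Question #{j} mentions {leaked[0]!r}, "
                    f"an option in Question #{i}."
                )
    return warnings


# Pi's quiz as quoted above; Pi supplied no answer key, so answer stays None.
pi_quiz = [
    Question("What is the main character’s name?",
             ["Krebs", "Jones", "Wilson", "Taylor"]),
    Question("What conflict does Krebs fight in?",
             ["World War I", "World War II", "Korean War", "Vietnam War"]),
    Question("How does Krebs feel about returning home after the war?",
             ["Overjoyed", "Disillusioned", "Nostalgic", "Indifferent"]),
    Question("What occupation does Krebs take up after the war?",
             ["Lawyer", "Doctor", "Teacher", "Farmer"]),
    Question("What is Krebs’ relationship like with his family after the war?",
             ["Loving and supportive", "Strained and distant", "Non-existent"]),
]

for warning in audit_quiz(pi_quiz):
    print(warning)
```

Run against Pi's quiz, the sketch flags the missing key on all five questions and notes that Questions #2 through #5 each name Krebs, which gives away Question #1.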

So, of the AIs I’ve surveyed so far, ChatGPT has done the best job with this prompt. Things could be different with a different prompt, and they surely will be different with the same prompt over time. Ranking the individual outputs so far by quality, I’d place them in this order:

1. ChatGPT
2. Claude
3. Pi

Comparing and evaluating the output of several AIs is a smarter strategy than relying on one. Comparison puts you in the role of a critical consumer of AI-generated information rather than an uncritical recipient of it. It’s a simple strategy, and one I recommend.

Photo Credit

The featured image was generated by DALL-E 2 in response to this prompt: “Create a Van Gogh-style painting of two robots. One robot is a literature professor. The other robot is a student. The teacher is giving the student a reading quiz.” Here’s the full version:
