New Scientist: What happens when ChatGPT poses as a university student and sits exams? | 美国华裔教授专家网 scholarsupdate.hi2net.com

2024-07-02

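If ChatGPT poses as a university student and sits an exam, will teachers spot its answers? Who will score higher, ChatGPT or the real students? Could ChatGPT become the ultimate cheating tool for online exams?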

University examiners fail to spot ChatGPT answers in real-world test

ChatGPT-written exam submissions for a psychology degree mostly went undetected and tended to get better marks than real students’ work.

Ninety-four per cent of university exam submissions created using ChatGPT weren’t detected as being generated by artificial intelligence, and these submissions tended to get higher scores than real students’ work.

Peter Scarfe at the University of Reading, UK, and his colleagues used ChatGPT to produce answers to 63 assessment questions on five modules across the university’s psychology undergraduate degrees. Students sat these exams at home, so they were allowed to look at notes and references, and they could potentially have used AI although this wasn’t permitted.

The AI-generated answers were submitted alongside real students’ work, and accounted for, on average, 5 per cent of the total scripts marked by academics. The markers weren’t informed that they were checking the work of 33 fake students, whose names were themselves generated by ChatGPT.

The assessments included two types of questions: short answers and longer essays. The prompts given to ChatGPT began with the words “Including references to academic literature but not a separate reference section”, then copied the exam question.

Across all modules, only 6 per cent of the AI submissions were flagged as potentially not being a student’s own work, though in some modules no AI-generated work was flagged as suspicious at all. “On average, the AI responses gained higher grades than our real student submissions,” says Scarfe, though there was some variability across modules.

“Current AI tends to struggle with more abstract reasoning and integration into information,” he adds. But across all 63 AI submissions, there was an 83.4 per cent chance that the AI work outscored that of the students.

The researchers claim that their work is the largest and most robust study of its kind to date. Although the study only checked work on the University of Reading’s psychology degree, Scarfe believes it is a concern for the whole academic sector. “I have no reason to think that other subject areas wouldn’t have just the same kind of issue,” he says.

“The results show exactly what I’d expect to see,” says Thomas Lancaster at Imperial College London. “We know that generative AI can produce reasonable sounding responses to simple, constrained textual questions.” He points out that unsupervised assessments including short answers have always been susceptible to cheating.

The workload for academics expected to mark work also doesn’t help their ability to pick up AI fakery. “Time-pressured markers of short answer questions are highly unlikely to raise AI misconduct cases on a whim,” says Lancaster. “I am sure this isn’t the only institution where this is happening.”

Tackling it at source is going to be near-impossible, says Scarfe. So the sector must instead reconsider what it is assessing. “I think it’s going to take the sector as a whole to acknowledge the fact that we’re going to have to be building AI into the assessments we give to our students,” he says.
