Austin Z. Henley

Associate Teaching Professor
Carnegie Mellon University


Home | Publications | Teaching | Blog

Learning to code with and without AI

3/31/2024

This post was co-written with Majeed Kazemitabaar, a PhD student at the University of Toronto with whom I've been collaborating on AI tools for CS education. It summarizes two research papers that Majeed led.


Tools like ChatGPT are capable of solving many introductory programming tasks. In fact, you can often just copy-paste the instructions without any additional effort and get a detailed solution and explanation back.
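To make that concrete, here is a hypothetical CS1-style exercise (ours, not from the study) and the kind of complete, commented Python solution an LLM typically returns for it:

    # Task (hypothetical): "Write a function that returns the number of
    # vowels in a string." An LLM given just that sentence will usually
    # produce a working solution along these lines:

    def count_vowels(text):
        """Return the number of vowels (a, e, i, o, u) in text, ignoring case."""
        count = 0
        for ch in text.lower():
            if ch in "aeiou":
                count += 1
        return count

    print(count_vowels("Hello, world!"))  # prints 3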

This has sparked a lot of discussion about the effects that LLMs which generate code from natural language descriptions will have on programming education (e.g., The Robots Are Coming). Educators are concerned about students becoming over-reliant on AI tools and not learning effectively. On the other hand, these tools might lower the barrier to entry for programming and even broaden participation in computing!

To understand what is really going on here, we had two fundamental questions:

1. How does learning to program with an AI code generator affect novices' performance, both while using it and afterward without it?
2. How do novices actually use AI code generators while learning, and do they become over-reliant on them?

A screenshot of the two research papers described in this post.

The full details of this research can be found in two papers: Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming (CHI'23) and How Novices Use LLM-Based Code Generators to Solve CS1 Coding Tasks in a Self-Paced Learning Environment (Koli Calling'23).

Methodology

To answer these questions, we conducted a controlled study over 10 sessions in August 2022 with 69 students (ages 10-17) who had no prior Python programming experience. In the first session, students were taught basic programming concepts such as variables, conditionals, and loops using Scratch. Students were then immediately evaluated on their high-level computational thinking skills using 25 Scratch code-tracing questions.

For the next seven sessions, students were divided into two groups: the Baseline group and the Codex group. During these sessions, both groups worked on 45 two-part tasks using Coding Steps, the tool we developed for the study. Coding Steps included novice-friendly documentation and allowed remote TAs to provide real-time feedback on students' submissions. The first part of each task was a code-authoring task, in which students wrote code from the provided instructions; the second part was a code-modifying task, in which students modified their correct solution from the first part to satisfy additional requirements (a hypothetical example is sketched below). Only students in the Codex group had access to the LLM code generator, and only (optionally) during the code-authoring part.
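For illustration, here is a hypothetical two-part task in the spirit of the study's tasks (not taken from the papers):

    # Part 1 (code-authoring, hypothetical task): "Ask the user for a
    # number and print whether it is even or odd."
    n = int(input("Enter a number: "))
    if n % 2 == 0:
        print("even")
    else:
        print("odd")

    # Part 2 (code-modifying, hypothetical follow-up): "Modify your solution
    # to keep asking for numbers until the user enters 0, then print how
    # many even numbers were entered."
    evens = 0
    n = int(input("Enter a number (0 to stop): "))
    while n != 0:
        if n % 2 == 0:
            evens += 1
        n = int(input("Enter a number (0 to stop): "))
    print("You entered", evens, "even numbers")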

A screenshot of the Coding Steps tool. It shows the student the task description, an example output, their code, the program output, and the AI output.

The code generator was based on OpenAI Codex and allowed students to type a natural language description of a program, which it would then convert into Python code.
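As a rough illustration of how such a generator can be wired up (this is our sketch, not the study's actual implementation; it assumes OpenAI's legacy Completion API and the now-retired code-davinci-002 Codex model):

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    def generate_python(description: str) -> str:
        """Convert a student's natural language description into Python code."""
        # Codex responds well to the description framed as a docstring.
        prompt = '"""\n' + description + '\n"""\n'
        response = openai.Completion.create(
            model="code-davinci-002",
            prompt=prompt,
            max_tokens=256,
            temperature=0,    # deterministic output
            stop=['"""'],     # stop before the model invents a new prompt
        )
        return response.choices[0].text

    print(generate_python("print the first 10 square numbers"))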

Lastly, the final two sessions evaluated students' performance without access to Codex or any other assistance. Both evaluation sessions included ten coding tasks and 40 multiple-choice questions. The first post-test was conducted the day after the seven training sessions, and the second a week later, using similar but slightly modified tasks.

Learning performance with and without AI

First, let's look at the results from the seven training sessions. See Majeed's CHI'23 paper for more details.

Second, let's look at the results from the two evaluation post-tests that were conducted in the last two sessions.

Learning with AI: over-reliance vs. self-regulation

We performed a thematic analysis on 1,666 uses of the AI code generator by students in the Codex group across the seven training sessions. We focused on how they used the tool, what prompts they wrote, and how they verified and used the AI-generated code. We discovered various signs of over-reliance as well as signs of self-regulation. See our Koli Calling'23 paper for more details.
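As a hypothetical illustration of the contrast (these prompts are ours, not verbatim from the study data):

    # Hypothetical prompts, for illustration only.

    # A sign of over-reliance: pasting the entire task description into the
    # generator and submitting whatever comes back, without testing it.
    overreliant_prompt = (
        "Write a program that asks the user for a list of numbers, "
        "computes their average, and prints it with two decimal places."
    )

    # A sign of self-regulation: attempting the task first, then asking the
    # AI about one small subgoal the student is stuck on, and verifying the
    # result before using it.
    selfregulated_prompt = "how do I round a number to 2 decimal places in Python"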

Future tool designers and educators should promote opportunities for self-regulated use of LLM code generators while discouraging unregulated use.


For more details, check out the papers, Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming (CHI'23) and How Novices Use LLM-Based Code Generators to Solve CS1 Coding Tasks in a Self-Paced Learning Environment (Koli Calling'23), led by Majeed Kazemitabaar.

Stay tuned to hear about the tools we built based on these findings!

Special thanks to the other co-authors: Justin Chow, Carl Ka To Ma, Xinying Hou, Barbara Ericson, David Weintrop, and Tovi Grossman.