Austin Z. Henley

Associate Teaching Professor
Carnegie Mellon University



CodeAid: A classroom deployment of an LLM-based programming assistant

5/19/2024

The CodeAid interface. It shows the student's code, an area to ask questions, and buttons to ask the AI for specific types of help.

This post was co-written with Majeed Kazemitabaar, who led this project. Majeed is a PhD student in CS at the University of Toronto who has been researching the educational impact and utility of LLMs in computing education. We summarize our recent CHI'24 paper, "CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs". See the paper for more details.

See the discussion of this post on Hacker News.


LLM-powered tools like ChatGPT can assist students who need help in programming classes by explaining code and coding concepts, generating fixed versions of incorrect code, providing examples, suggesting areas of improvement, and even writing entire code solutions.

However, the productivity-driven, direct nature of these AI responses is concerning in educational settings. Many instructors prohibit such tools in introductory programming classes to avoid academic integrity issues and students' over-reliance on AI.

In this research, we explored the design and evaluation of a "pedagogical" LLM-powered coding assistant to scale up instructional support in educational settings.

We iteratively designed a programming assistant, CodeAid, that provides help to students without revealing code solutions. Majeed developed it as a web app that uses GPT-3.5 to power an assortment of AI features. We then deployed CodeAid in a C programming course with 700 students as an optional resource, similar to office hours and Q/A forums, for the entire 12-week semester.
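To make the general recipe concrete, here is a minimal sketch of a pedagogical LLM call, assuming today's OpenAI Python client and the gpt-3.5-turbo model. The prompt wording and the ask_codeaid helper are illustrative only, not CodeAid's actual implementation.

```python
# Minimal sketch of a "pedagogical" LLM call, assuming the OpenAI Python
# client and the gpt-3.5-turbo model. The prompt wording and function name
# are illustrative, not CodeAid's actual implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a teaching assistant for an introductory C programming course. "
    "Help the student understand concepts and debug their code, but never "
    "write out a complete, copy-pasteable solution. Prefer explanations, "
    "guiding questions, and line-by-line pseudo-code."
)

def ask_codeaid(question: str, student_code: str = "") -> str:
    """Send a student query, plus optional code context, to the model."""
    user_message = question
    if student_code:
        user_message += "\n\nHere is my code:\n" + student_code
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        temperature=0.2,  # keep answers focused and consistent
    )
    return response.choices[0].message.content

print(ask_codeaid(
    "Why does my loop never terminate?",
    'for (int i = 10; i >= 0; i++) { printf("%d\\n", i); }',
))
```

The point of the sketch is that the pedagogical behavior is encoded up front, in the system prompt and response design, rather than left to the student's own prompting.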

Overall, we collected data from students' usage logs, surveys, and interviews with students and educators.

During the deployment, 372 students used CodeAid and asked 8,000 queries. We thematically analyzed 1,750 of the queries and CodeAid's responses to understand students' usage patterns and types of queries (RQ1) and CodeAid's response quality in terms of correctness, helpfulness, and directness (RQ2). Furthermore, we qualitatively analyzed data from the interviews and surveys to understand the perspectives of students (RQ3) and educators (RQ4) about CodeAid.

CodeAid's features

CodeAid provides five main features, which were iteratively refined during the deployment based on student feedback: General Question, Question from Code, Explain Code, Help Fix Code, and Help Write Code.

The illustration below shows these features in action:

CodeAid allows students to ask five types of coding questions: General Question, Question from Code, Explain Code, Help Fix Code, and Help Write Code. In response, CodeAid uses LLMs to generate pedagogical answers that do not contain direct code solutions. When asked a general question or asked to help write code, it provides a natural-language response along with interactive pseudo-code: students can hover over each line to see an explanation of what it does. Responses also include relevant function documentation, retrieved from a database approved by the course educators, to ensure factual accuracy. When asked to help fix incorrect code, CodeAid does not display the fixed code; instead, it highlights the incorrect parts of the student's code and suggests fixes.
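As a hypothetical sketch of how the Help Fix Code behavior could be implemented (not CodeAid's actual code), the model can be asked for structured annotations rather than a corrected program, which the frontend then renders as highlights on the student's code. The JSON schema and helper below are assumptions.

```python
# Hypothetical sketch of "Help Fix Code": ask the model for structured
# annotations (line, issue, hint) instead of a corrected program, so the UI
# can highlight lines without revealing a solution. The JSON schema and
# helper name are assumptions, not CodeAid's actual implementation.
import json
from openai import OpenAI

client = OpenAI()

FIX_PROMPT = (
    "The student's C code below is incorrect. Do NOT output corrected code. "
    "Return only a JSON array of objects with fields 'line' (1-based line "
    "number), 'issue' (what is wrong), and 'hint' (how to approach a fix)."
)

def help_fix_code(student_code: str) -> list:
    """Request line-level annotations for the given (incorrect) code."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": FIX_PROMPT},
            {"role": "user", "content": student_code},
        ],
        temperature=0,
    )
    # A real deployment would validate this; the model may not always
    # return clean JSON.
    return json.loads(response.choices[0].message.content)

buggy = (
    "int sum(int *a, int n) {\n"
    "    int s;\n"
    "    for (int i = 0; i <= n; i++)\n"
    "        s += a[i];\n"
    "    return s;\n"
    "}\n"
)

for note in help_fix_code(buggy):
    # The frontend would highlight note["line"] and show the hint on hover.
    print(f'line {note["line"]}: {note["issue"]} ({note["hint"]})')
```

Returning line-level annotations instead of a rewritten program is what keeps the feature from becoming a solution generator.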

In short, CodeAid's unique properties are its scaffolded query types, its interactive pseudo-code, its inline annotations on the student's own code rather than complete solutions, and its reliance on educator-approved documentation.

Results

From our 12-week deployment, surveys, and interviews, we aim to answer our four research questions.

RQ1: Students' Usage Patterns and Type of Queries

First, let's look into the high-level statistics of students' usage of CodeAid:

A chart showing daily usage of CodeAid over time. There are spikes at each assignment and exam due date. Peak usage was 400 questions asked by 50 users in one day.

The thematic analysis revealed four types of queries submitted to CodeAid:

  1. Asking Programming Questions (36%)
  2. Debugging Code (32%)
  3. Writing Code (24%)
  4. Explaining Code (6%), such as explaining the starter code provided in assignments.

RQ2: CodeAid's Response Quality

The thematic analysis showed that about 80% of the responses were technically correct. The General Question, Explain Code, and Help Write Code features responded correctly about 90% of the time, while Help Fix Code and Question from Code were correct about 60% of the time.

In terms of directness, CodeAid almost never revealed complete code solutions. Instead, it generated conceptual explanations, interactive pseudo-code, and line-level suggestions anchored to the student's own code.

RQ3: Students' Perspectives and Concerns

Based on the student interviews and surveys:

RQ4: Educators' Perspectives and Concerns

Design considerations for future educational AI assistants

We synthesized our findings into four major design considerations for future educational AI assistants, positioned within four main stages of a student's help-seeking process.

A chart of the four design considerations with additional trade-offs.

There is still a long way to go before we understand how best to use AI in the classroom to support both instructors and students. Maybe one day it will provide just the right information at just the right time to keep students optimally engaged and learning, while identifying opportunities for the instructor to intervene.


Special thanks to the other co-authors of this work: Runlong Ye, Xiaoning Wang, Paul Denny, Michelle Craig, and Tovi Grossman.

CodeAid is open source. The full details of the design and evaluation are in our paper, CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs. You might also be interested in: