STAR_SP24_LLMgroup_Added gpt3.5 feedback question example #14

Open · wants to merge 4 commits into base: master
Binary file modified elements/.DS_Store
Binary file not shown.
6 changes: 3 additions & 3 deletions infoCourse.json
@@ -1,7 +1,7 @@
 {
-  "uuid": "REPLACE ME WITH A VALID UUID by running uuidgen in a shell",
-  "name": "CS170",
-  "title": "Efficient Algorithms and Intractable Problems",
+  "uuid": "EA537719-4B59-4855-B9A9-A3F23BC79766",
+  "name": "CS777",
+  "title": "Test Course",
   "options": {
     "useNewQuestionRenderer": true
   },
Binary file modified questions/.DS_Store
Binary file not shown.
67 changes: 67 additions & 0 deletions questions/gpt3.5_feedback_example_1/README.md
@@ -0,0 +1,67 @@
# Instructions to Create a PL Question with GPT-3.5 Feedback

## Preface
The following instructions are condensed and written for advanced question-writers. If you're new to PrairieLearn, or would prefer a more comprehensive breakdown, please consult our [beginner-friendly documentation](https://docs.google.com/document/d/1-3SF2KoPc5EGUDPnR4VhSJAHUxXIDkKyWjJ1_9BdnII/edit?usp=sharing) :-)

## Prerequisites

You will need:
- Access to the OpenAI API with a valid API key; consult the instructions [here](#).
  - You will also need to purchase API tokens; instructions [here](#).
- A text editor.

## Directory Structure

The following instructions will help you write the necessary files for your question. Here’s the overall directory structure you should follow:
```
├── info.json
├── initial_code.py
├── question.html
└── tests/
    ├── ans.py
    ├── setup_code.py
    └── test.py
```
## Modifying `info.json`

This JSON file defines the metadata of your question. Make sure to use our custom Docker image and enable networking within `externalGradingOptions`, as follows:
```
{
    "uuid": "<AUTO-GENERATED>",
    "title": "<FIXME>",
    "topic": "<FIXME>",
    "tags": <FIXME>,
    "type": "v3",
    "singleVariant": true,
    "gradingMethod": "External",
    "externalGradingOptions": {
        "enabled": true,
        "image": "rosensh/custom-grader-python:latest",
        "entrypoint": "/python_autograder/run.sh",
        "timeout": 60,
        "enableNetworking": true
    }
}
```
Run `docker pull rosensh/custom-grader-python:latest` beforehand so the custom grader image is available locally.

## Modifying `test.py`

`test.py` is where the magic happens. A few things of note:

- We've written a GPT-3.5 prompt (leaning heavily on existing literature) that performed well in our alpha tests. We recommend sticking closely to it.
- Insert your own API key where indicated.
- Feel free to adjust the scoring logic; the example below is all-or-nothing, but awarding partial credit may call for more nuance (see the sketch after this list).

Please consult the reference file for more details.
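For instance, here is a minimal partial-credit sketch of the grading loop, assuming the same `Feedback` helpers and test cases as the reference `test.py`; it would replace the body of `test_1` there (the names `self.ref` and `self.st` come from `PLTestCase`):
```
# Hypothetical partial-credit variant of the all-or-nothing loop in test.py.
num_passed = 0
test_cases = [(0, 1), (1, 1), (4, 1), (4, 2)]
for n, I in test_cases:
    expected = self.ref.generate_partitions(n, I)
    user_val = Feedback.call_user(self.st.generate_partitions, n, I)
    if expected == user_val:
        num_passed += 1
# Score is the fraction of test cases passed, instead of 0 or 1.
Feedback.set_score(num_passed / len(test_cases))
```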

## Deleting `server.py`

Please remove this file — we’re using `test.py` for external grading.

## Starting up Docker

Run the following command, replacing `USERNAME` and the directory names with your own (use the full path if you're on a Mac):
```
docker run -it --privileged --rm -p 3000:3000 -v $PWD:/course -v /Users/USERNAME/ag-temp-pl:/jobs -e HOST_JOBS_DIR=/Users/USERNAME/ag-temp-pl -v /var/run/docker.sock:/var/run/docker.sock --platform linux/amd64 prairielearn/prairielearn
```
38 changes: 38 additions & 0 deletions questions/gpt3.5_feedback_example_1/question.html
@@ -0,0 +1,38 @@
<pl-question-panel>
  <p>
    A valid partition of a positive integer <tt>n</tt> is a sequence of positive integers in non-decreasing order that sums to <tt>n</tt>.
  </p>
  <p>Here are a few examples of valid and invalid partitions of 4:</p>
  <ul>
    <li>(4): Valid.</li>
    <li>(1, 3): Valid.</li>
    <li>(2, 2): Valid.</li>
    <li>(3, 1): Invalid; the numbers are not in non-decreasing order.</li>
    <li>(0, 4): Invalid; 0 is not positive.</li>
    <li>(2, 3): Invalid; 2 + 3 != 4.</li>
  </ul>
  <p>
    Write a Python function named <tt>generate_partitions</tt> that takes two arguments: <tt>n</tt>, a positive integer, and <tt>I</tt>, a positive integer with a default value of 1. The function should return a list of tuples of the valid partitions of <tt>n</tt> in which every integer is at least <tt>I</tt>.
  </p>
  <p>Here are some examples:</p>
  <ul>
    <li><tt>generate_partitions(4)</tt> should return <tt>[(1, 1, 1, 1), (1, 1, 2), (1, 3), (2, 2), (4,)]</tt>.</li>
    <li><tt>generate_partitions(4, 2)</tt> should return <tt>[(2, 2), (4,)]</tt>.</li>
    <li><tt>generate_partitions(0)</tt> should return <tt>[()]</tt>, since the empty partition is the only partition of 0.</li>
  </ul>

<pl-external-grader-variables params-name="names_from_user">
<pl-variable name="generate_partitions" type="Python function">Function to generate valid partitions of n</pl-variable>
</pl-external-grader-variables>
<pl-external-grader-variables params-name="names_for_user" empty="true"></pl-external-grader-variables>

<pl-file-editor file-name="user_code.py" ace-mode="ace/mode/python" source-file-name="initial_code.py">
</pl-file-editor>
</pl-question-panel>

<pl-submission-panel>
<pl-external-grader-results></pl-external-grader-results>
<pl-file-preview></pl-file-preview>
</pl-submission-panel>
78 changes: 78 additions & 0 deletions questions/gpt3.5_feedback_example_1/test.py
@@ -0,0 +1,78 @@
from code_feedback import Feedback
from pl_helpers import name, points
from pl_unit_test import PLTestCase
from openai import OpenAI
import textwrap

file_path = '/grade/run/user_code.py'
with open(file_path, 'r') as file:
    student_solution = file.read()

client = OpenAI(
    api_key='FIXME'  # FIXME: Add your API key here.
)

# FIXME: Replace this with your own problem-statement.
problem_statement = "A valid partition of a positive integer n is a sequence of positive integers in non-decreasing order that sums to n. Write a Python function named generate_partitions that takes two arguments: n, a positive integer, and I, a positive integer with a default value of 1. The function should return a list of tuples of the valid partitions of n in which every integer is at least I."


# FIXME: Replace this with your own solution.
solution = """
def generate_partitions(n, I=1):
    if n == 0:
        return [()]
    partitions = []
    for i in range(I, n + 1):
        for p in generate_partitions(n - i, i):
            partitions.append((i,) + p)
    return partitions
"""

# FIXME: Replace this with context about your own course in the first paragraph. Further, change the test-cases in the prompt.
prompt = f"""
You are a very talented tutoring bot for CS10, the intro CS class at UC Berkeley; you are helping students learn how to program.
A student has asked for help. The question they are trying to solve is:
{problem_statement}
Here's all the code the student has written so far:
{student_solution}
The student's code does not work. Here are the test cases: [generate_partitions(0), generate_partitions(1), generate_partitions(4), generate_partitions(4, 2)]
The correct answer is:
{solution}

Here's what you should consider, based on their code:
1. Do they understand the question?
2. Do they understand the concepts involved?
3. Do they have a plan? If not, help them generate one.

Do not give the student the answer or any code. If there's an obvious bug, direct them to its location. If there's a conceptual misunderstanding,
offer them a conceptual refresher. Limit your response to one or two sentences at most. Be as Socratic as possible, and be super friendly.

Your response should be addressed to the student (not me), and should ONLY include the suggestions you want to give them.
"""

# FIXME: You might wish to change the test cases and scoring logic for your assessment.
class Test(PLTestCase):
    @points(10)
    @name("Test various values")
    def test_1(self):
        test_cases = [(0, 1), (1, 1), (4, 1), (4, 2)]
        for n, I in test_cases:
            expected = self.ref.generate_partitions(n, I)
            user_val = Feedback.call_user(self.st.generate_partitions, n, I)
            if expected != user_val:
                # All-or-nothing: zero the score, then ask GPT-3.5 for
                # feedback on the first failing test case.
                Feedback.set_score(0)
                response = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    temperature=0,
                    messages=[
                        {
                            "role": "system",
                            "content": prompt,
                        },
                    ],
                )
                model_response = response.choices[0].message.content
                model_response = textwrap.fill(model_response, width=80)
                Feedback.add_feedback(model_response)
                return

        Feedback.set_score(1)
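As an aside, the OpenAI call above is easy to smoke-test outside the grader. A minimal sketch, assuming the `openai` Python package (1.x) is installed and `OPENAI_API_KEY` is set in your environment; the sample prompt is a hypothetical stand-in for the one built in `test.py`:
```
# Hypothetical local smoke test for the feedback prompt; not part of the grader.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = "You are a friendly CS tutor. A student's generate_partitions(4) returns [] instead of the five valid partitions. Give one Socratic hint."

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0,
    messages=[{"role": "system", "content": prompt}],
)
print(response.choices[0].message.content)
```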
7 changes: 7 additions & 0 deletions questions/my_first_question/info.json
@@ -0,0 +1,7 @@
{
    "uuid": "8b4891d6-64d1-4e89-b72d-ad2133f25b2f",
    "title": "Add two numbers",
    "topic": "Algebra",
    "tags": ["mwest", "fa17", "tpl101", "v3"],
    "type": "v3"
}
7 changes: 7 additions & 0 deletions questions/my_first_question/question.html
@@ -0,0 +1,7 @@
<pl-question-panel>
<p>Question provided by <strong>Rose Niousha</strong>:</p>
<p>There are two numbers, $a = {{params.a}}$ and $b = {{params.b}}$.</p>
<p>What is the value of $c = a + b$?</p>
</pl-question-panel>

<pl-number-input answers-name="c" comparison="sigfig" digits="3" label="$c=$"></pl-number-input>
16 changes: 16 additions & 0 deletions questions/my_first_question/server.py
@@ -0,0 +1,16 @@
import random

def generate(data):
    # Sample two random integers between 5 and 10 (inclusive)
    a = random.randint(5, 10)
    b = random.randint(5, 10)

    # Put these two integers into data['params']
    data["params"]["a"] = a
    data["params"]["b"] = b

    # Compute the sum of these two integers
    c = a + b

    # Put the sum into data['correct_answers']
    data["correct_answers"]["c"] = c
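For reference, `generate` mutates the `data` dict in place. A hypothetical local check (not part of the question) might look like:
```
# Hypothetical check: generate() fills data["params"] and data["correct_answers"].
data = {"params": {}, "correct_answers": {}}
generate(data)
print(data["params"]["a"], data["params"]["b"], data["correct_answers"]["c"])
# e.g. prints: 7 9 16
```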
16 changes: 16 additions & 0 deletions questions/test_cs1/info.json
@@ -0,0 +1,16 @@
{
    "uuid": "3551731f-b6b4-4afc-b1f8-a8a4f7f73bcc",
    "title": "Fibonacci function, in-browser editor, external grading",
    "topic": "Autograder",
    "tags": ["code"],
    "type": "v3",
    "singleVariant": true,
    "gradingMethod": "External",
    "externalGradingOptions": {
        "enabled": true,
        "image": "prairielearn/grader-python",
        "entrypoint": "/python_autograder/run.sh",
        "timeout": 60
    }
}

2 changes: 2 additions & 0 deletions questions/test_cs1/initial_code.py
@@ -0,0 +1,2 @@
def fib(n):
    pass
29 changes: 29 additions & 0 deletions questions/test_cs1/question.html
@@ -0,0 +1,29 @@
<pl-question-panel>
<p>
The Fibonacci numbers are 1, 1, 2, 3, 5, 8, ..., where each
number is the sum of the two before it. These are numbered $F_1, F_2, \ldots$, and
\[
F_n = F_{n-1} + F_{n-2}.
\]
</p>
<p>
Write a Python function <tt>fib</tt> that takes a number <tt>n</tt>
and returns the n<sup>th</sup> Fibonacci number, $F_n$.
</p>

<p>Your code snippet should define the following variables:</p>
<pl-external-grader-variables params-name="names_from_user">
<pl-variable name="fib" type="python function">Function to compute the $n^\text{th}$ Fibonacci number</pl-variable>
</pl-external-grader-variables>
<pl-external-grader-variables params-name="names_for_user" empty="true"></pl-external-grader-variables>

<pl-file-editor file-name="user_code.py" ace-mode="ace/mode/python" source-file-name="initial_code.py">
</pl-file-editor>

</pl-question-panel>

<pl-submission-panel>
<pl-external-grader-results></pl-external-grader-results>
<pl-file-preview></pl-file-preview>
</pl-submission-panel>

5 changes: 5 additions & 0 deletions questions/test_cs1/tests/ans.py
@@ -0,0 +1,5 @@
def fib(n):
    if n <= 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
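A quick, hypothetical sanity check of this reference solution against the sequence stated in the question:
```
# fib(1)..fib(6) should match the 1, 1, 2, 3, 5, 8 from the question statement.
assert [fib(n) for n in range(1, 7)] == [1, 1, 2, 3, 5, 8]
```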
Empty file.
38 changes: 38 additions & 0 deletions questions/test_cs1/tests/test.py
@@ -0,0 +1,38 @@
import numpy as np
from code_feedback import Feedback
from pl_helpers import name, points
from pl_unit_test import PLTestCase


class Test(PLTestCase):
    @points(1)
    @name("Check fib(1)")
    def test_1(self):
        user_val = Feedback.call_user(self.st.fib, 1)
        if Feedback.check_scalar("fib(1)", self.ref.fib(1), user_val):
            Feedback.set_score(1)
        else:
            Feedback.set_score(0)
            Feedback.add_feedback("fib(1) failed")

    @points(2)
    @name("Check fib(7)")
    def test_2(self):
        user_val = Feedback.call_user(self.st.fib, 7)
        if Feedback.check_scalar("fib(7)", self.ref.fib(7), user_val):
            Feedback.set_score(1)
        else:
            Feedback.set_score(0)

    @points(3)
    @name("Check random values")
    def test_3(self):
        # Count how many of the random test values the student passes.
        num_correct = 0
        num_tests = 10
        test_values = np.random.choice(np.arange(2, 30), size=num_tests, replace=False)
        for in_val in test_values:
            correct_val = self.ref.fib(in_val)
            user_val = Feedback.call_user(self.st.fib, in_val)
            if Feedback.check_scalar(f"fib({in_val})", correct_val, user_val):
                num_correct += 1
        Feedback.set_score(num_correct / num_tests)