
Streamline exam question generation via Large Language Models

Mohammadsaknini/Lecture2Exam


Creating exam questions is tedious for professors and often time-consuming, distracting them from their primary responsibilities of teaching and research. Developing a large language model (LLM) pipeline to generate exam questions from lecture notes aims to streamline this process and enhance efficiency and accuracy. This automated solution ensures comprehensive coverage of course material and produces high-quality, consistent questions. It allows customization to suit different question formats and difficulty levels. Ultimately, this pipeline should enable educators to focus more on student engagement and the quality of instruction rather than stressing about exam questions.

To evaluate the performance of our pipeline, we employed two different evaluation processes using three different LLMs, namely GPT-3.5 Turbo, GPT-4o, and ThePitbull-21B-v2.

First, we asked the models to evaluate themselves, mainly on the difficulty and relevance of the generated questions with respect to the lecture context. Notably, the models were not told that they had generated the questions themselves.
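The exact evaluation prompts are not reproduced here; a minimal sketch of such a self-evaluation step, assuming an OpenAI-compatible client and a 1-10 rating scale (the function name, prompt wording, and scale are illustrative assumptions, not the repository's actual code), could look like this:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def self_evaluate(question: str, lecture_context: str, model: str = "gpt-3.5-turbo") -> dict:
    """Ask a model to rate a generated question against the lecture context."""
    prompt = (
        "Rate the following exam question with respect to the lecture context.\n"
        'Return JSON like {"difficulty": <1-10>, "relevance": <1-10>}.\n\n'
        f"Lecture context:\n{lecture_context}\n\nQuestion:\n{question}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)
```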

[Evaluation plot: self-assessed difficulty vs. relevance for each model]

The takeaway from these results is that all models considered the generated questions relevant to the given context, but the difficulty ratings varied. The local model showed much higher variance than GPT-3.5 Turbo and GPT-4o, whereas GPT-4o tended to generate only difficult questions. Additionally, GPT-3.5 Turbo and the local model sometimes produced answers that could not be validated, indicated by points at the (0,0) position, while GPT-4o had no issues providing the correct format.
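A validation fallback of the kind implied by those (0,0) points might look like the following sketch (the function name and expected JSON keys are assumptions, not the repository's actual code):

```python
import json

def parse_scores(raw_answer: str) -> tuple[int, int]:
    """Parse a model's self-evaluation; fall back to (0, 0) if it cannot be validated."""
    try:
        data = json.loads(raw_answer)
        return int(data["difficulty"]), int(data["relevance"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return 0, 0  # answer did not match the expected format
```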

Additionally, we conducted a live survey in which we asked 29 participants to identify the human-written questions among four groups of questions. Three groups each contained three questions generated by one of the (anonymized) models mentioned above, and the fourth group contained human-written questions. Notably, the participants were all knowledgeable about LLMs, and the majority had taken part in the course from which the questions were generated.

The groups in the survey were as follows:

[Figure: the four groups of survey questions]

Before checking the results, try to identify the human-generated questions yourself!

Results

| Group 1 (Local) | Group 2 (GPT-4o) | Group 3 (GPT-3.5 Turbo) | Group 4 (Human) |
| --- | --- | --- | --- |
| 4 | 9 | 4 | 12 |

Based on these results, the LLM groups received 17 of the 29 votes (about 58%), while the human-written questions received 12 votes (about 42%). This suggests that the LLMs are able to generate questions that are hard to distinguish from human-written ones.

The human group received the most votes, followed by the GPT-4o group; the local model and GPT-3.5 Turbo received the fewest. After the participants completed the survey, we asked them why they chose the group they did. The most common reasons for choosing group 4 were that the questions felt more "human-like" and that the group contained a typo.


Usage

To use the pipeline, you need to follow these steps:

  1. Make sure the lectures are named as follows: {Number of lecture}-{Lecture Title}.pdf (a loader sketch illustrating this layout follows the list).
  2. Place the lectures in the data/lectures folder.
  3. To generate questions based on materials other than the lecture slides, such as assignments, place those files in the data/assignments folder.
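As a rough illustration of how the pipeline is expected to discover its inputs under this layout (the function names and the parsing regex are assumptions; the repository's actual entry point may differ):

```python
import re
from pathlib import Path

# Expected file name pattern: {Number of lecture}-{Lecture Title}.pdf
LECTURE_PATTERN = re.compile(r"^(?P<number>\d+)-(?P<title>.+)\.pdf$")

def list_lectures(lectures_dir: str = "data/lectures") -> list[tuple[int, str, Path]]:
    """Return (lecture number, title, path) triples for every lecture PDF."""
    lectures = []
    for pdf in sorted(Path(lectures_dir).glob("*.pdf")):
        match = LECTURE_PATTERN.match(pdf.name)
        if match is None:
            raise ValueError(f"Unexpected lecture file name: {pdf.name}")
        lectures.append((int(match.group("number")), match.group("title"), pdf))
    return lectures

def list_assignments(assignments_dir: str = "data/assignments") -> list[Path]:
    """Return any supplementary files (e.g. assignments), if the folder exists."""
    path = Path(assignments_dir)
    return sorted(path.glob("*")) if path.exists() else []
```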
