🎓 Master's Thesis: Leveraging Linguistic Insights for Uncertainty Calibration of LLMs and Evaluating Crowdsourced Annotations 🚀
Welcome to the repository for my master's thesis project! This research explores the potential of using a popular large language model, ChatGPT-4, to enhance the quality of crowdsourced data labels, particularly within the MAVEN dataset.
Assessing and enhancing the quality of crowdsourced annotations is essential for developing reliable AI systems, as noise and biases in the data can propagate and undermine the accuracy and fairness of downstream classification and other AI-driven processes. However, relying solely on human evaluators for this task has its own challenges: as datasets grow, the cost and time required to manually scrutinize crowd annotations also rise. To address this challenge, this thesis explores leveraging large language models (LLMs) to evaluate crowdsourced labels. Specifically, we investigate using the certainty expressed in LLM responses as a metric for the trustworthiness of LLM evaluations. We introduce a novel method that uses linguistic cues as indicators of an LLM's certainty level. By applying this method with ChatGPT-4 to evaluate a general-domain event annotation dataset, we found that ChatGPT-4 successfully identified 58% of the incorrect labels recognized by the human evaluators. However, a closer inspection of the discrepancies between the LLM and human evaluations revealed additional incorrect labels that the human evaluators missed or misidentified. These findings challenge the exclusive reliance on human experts for generating ground-truth labels and highlight the potential for improving data label quality through human-AI collaboration. We draw design implications for future annotation evaluation paradigms with LLMs in the loop, and discuss the ethical considerations of using an LLM as an evaluation tool for crowdsourced annotations.
Our findings show that ChatGPT-4 identified 58% of the incorrect labels flagged by human evaluators, demonstrating the promise of LLMs for evaluating crowdsourced data annotations. This approach can improve annotation accuracy and fairness while saving time and cost.
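To make the linguistic-cue idea above concrete, here is a minimal sketch, not the thesis implementation, of how hedging and boosting language in an LLM's free-text evaluation could be mapped to a certainty score and used to flag labels for human review. The cue lists, threshold, function names, and the `query_llm` callable are all illustrative assumptions.

```python
# Minimal illustrative sketch: score the linguistic certainty of an LLM's
# evaluation of a crowdsourced event label and flag uncertain cases for review.
# The cue lexicons, threshold, and `query_llm` callable are assumptions, not
# the thesis's actual implementation.

import re
from typing import Callable

# Hypothetical cue lexicons mapping boosting / hedging language to certainty.
HIGH_CERTAINTY_CUES = ["definitely", "clearly", "certainly", "without a doubt"]
LOW_CERTAINTY_CUES = ["might", "possibly", "perhaps", "not sure", "unclear", "could be"]


def certainty_from_response(response: str) -> float:
    """Return a certainty score in [0, 1] based on the cues found in the response."""
    text = response.lower()
    high = sum(bool(re.search(rf"\b{re.escape(c)}\b", text)) for c in HIGH_CERTAINTY_CUES)
    low = sum(bool(re.search(rf"\b{re.escape(c)}\b", text)) for c in LOW_CERTAINTY_CUES)
    if high + low == 0:
        return 0.5  # no cues found: treat the response as neutral certainty
    return high / (high + low)


def evaluate_label(sentence: str, trigger: str, label: str,
                   query_llm: Callable[[str], str],
                   threshold: float = 0.5) -> dict:
    """Ask the LLM whether a crowdsourced event label fits, then gauge its certainty."""
    prompt = (
        f'Sentence: "{sentence}"\n'
        f'The word "{trigger}" was annotated with the event type "{label}".\n'
        "Is this label correct? Answer yes or no and explain briefly."
    )
    response = query_llm(prompt)  # e.g., a ChatGPT-4 API call supplied by the caller
    verdict_correct = response.strip().lower().startswith("yes")
    certainty = certainty_from_response(response)
    return {
        "llm_says_correct": verdict_correct,
        "certainty": certainty,
        # Labels judged incorrect, or judged with low linguistic certainty,
        # are routed back to human evaluators.
        "needs_human_review": (not verdict_correct) or certainty < threshold,
    }
```

In this sketch, labels the LLM calls incorrect, or judges with low linguistic certainty, are passed back to human evaluators, mirroring the human-AI collaboration discussed above.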
Here are some visualizations from the study:
Feel free to reach out if you have any questions or feedback! You can contact me via:
- Email: [email protected]
- LinkedIn: LinkedIn/DivyaSree
A special thanks to those who supported and guided me throughout this project:
- Dr. Tianyi Li - Advisor
- Dr. Tatiana Ringenberg - Committee Member
- Dr. Julia Rayz - Committee Member
Thank you for visiting my project repository! If you found this work interesting, please consider leaving a star ⭐ to show your support.