OpenAI rolls out CriticGPT to spot errors in ChatGPT’s code output
Built on the GPT-4 family of large language models, CriticGPT examines code and flags potential errors, helping humans catch mistakes that might otherwise be overlooked.
Sam Altman-backed OpenAI has rolled out CriticGPT, a model designed to identify errors in ChatGPT's code output. According to a research paper titled 'LLM Critics Help Catch LLM Bugs', the AI tool can assist human trainers in reviewing the programming code generated by ChatGPT. Built on the GPT-4 family of LLMs (large language models), CriticGPT examines the code and points out potential errors, helping humans detect mistakes that might otherwise be overlooked.
The new model seeks to improve 'AI alignment', the process of ensuring that an AI system behaves according to human expectations, by utilising 'reinforcement learning from human feedback' (RLHF). The approach helps human reviewers improve the accuracy of LLM outputs.
Reinforcement learning from human feedback is a machine-learning technique in which human feedback, typically in the form of preference judgements between model outputs, is incorporated into the training process of an AI model.
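As a rough illustration of the idea (not OpenAI's actual training code), RLHF typically begins by fitting a reward model to pairs of responses that human raters have ranked. The toy Python snippet below sketches a Bradley-Terry-style preference loss that is small when the model scores the human-preferred answer higher than the rejected one; the scores and function names are invented for the example.

```python
import numpy as np

def preference_loss(score_preferred, score_rejected):
    """Bradley-Terry-style loss: the reward model should score the
    human-preferred response higher than the rejected one."""
    # -log(sigmoid(s_pref - s_rej)); shrinks as the preferred score pulls ahead
    return np.log1p(np.exp(-(score_preferred - score_rejected)))

# Hypothetical scores from a reward model for two answers to the same prompt,
# where human raters preferred the first answer.
print(f"model agrees with rater:    loss = {preference_loss(1.8, 0.3):.3f}")  # small
print(f"model disagrees with rater: loss = {preference_loss(0.3, 1.8):.3f}")  # large
```

Minimising a loss of this kind pushes the model's scores into line with human judgements, which is the sense in which human feedback is "incorporated" into training.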
“We found that when people get help from CriticGPT to review ChatGPT code, they outperform those without help 60% of the time. This is a step towards being able to evaluate outputs from advanced AI systems that can be difficult for people to rate without better tools,” reads OpenAI’s blog.
To train the model, human developers were asked to edit code generated by ChatGPT, deliberately inserting errors and writing sample feedback pointing them out. This data was then utilised to train CriticGPT to detect both common and novel coding errors.
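For illustration only, one record in such a dataset might look roughly like the sketch below; the field names, class name, and the example bug are invented for this article and are not taken from OpenAI's paper.

```python
from dataclasses import dataclass

@dataclass
class CritiqueTrainingExample:
    """Hypothetical shape of one training record: a trainer takes
    ChatGPT-written code, deliberately inserts a bug, and writes the
    critique the critic model should learn to produce."""
    prompt: str              # original programming task given to ChatGPT
    original_code: str       # code as generated by ChatGPT
    tampered_code: str       # the same code with a bug inserted by the trainer
    reference_critique: str  # trainer-written feedback pointing at the bug

example = CritiqueTrainingExample(
    prompt="Return the average of a list of numbers.",
    original_code="def mean(xs):\n    return sum(xs) / len(xs)",
    tampered_code="def mean(xs):\n    return sum(xs) / (len(xs) - 1)",  # off-by-one bug
    reference_critique="The divisor should be len(xs), not len(xs) - 1; "
                       "this overestimates the mean and fails for a one-element list.",
)
print(example.reference_critique)
```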
“We find that CriticGPT critiques are preferred by trainers over ChatGPT critiques in 63% of cases on naturally occurring bugs, in part because the new critic produces fewer 'nitpicks' (small complaints that are unhelpful) and hallucinates problems less often,” said the company.
However, OpenAI cautioned that the model still hallucinates, and trainers who see those hallucinated errors can be misled into labelling mistakes incorrectly. The firm also said that real-world errors can be spread across many parts of an answer, whereas its work so far focuses on mistakes that can be pinpointed in one specific place.
Edited by Swetha Kannan