LLM Evaluation — Everything You Need To Know!
In the last few years, LLMs have taken the field of AI by storm with their remarkable ability to generate human-like text. But the question remains: how good, or even how valid, is the generated text?
In this article, we primarily focus on the assessment of LLMs.
What is LLM evaluation?
LLMs are trained to generate text for a specific task, so we evaluate the generated text on that task using different metrics. These tasks can range from answering fact-based questions to summarizing a given document.
Example
Suppose I have an LLM fine-tuned for QA tasks on the topic of finance, and it is only supposed to answer finance questions.
For instance, if I ask the question, “What is a stock market?”, the LLM’s answer must talk about trading shares on a platform, or something close to it. But if the LLM answers about a “market where fruits are sold,” its answer relevance is poor. Similarly, there are many ways an LLM can go wrong, from getting fact-based questions wrong to generating harmful or policy-violating content.
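To make this concrete, here is a minimal sketch of how such a relevance check might be automated by comparing embeddings of the question and the generated answer. It assumes the sentence-transformers library, the all-MiniLM-L6-v2 model, and an illustrative 0.5 threshold; none of these choices are prescribed by any particular evaluation framework.

```python
# A minimal sketch of a topical relevance check via embedding similarity.
# Assumption: sentence-transformers is installed; model name and the
# 0.5 threshold are illustrative only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

question = "What is a stock market?"
good_answer = "A stock market is a platform where shares of companies are traded."
bad_answer = "A market is a place where fruits and vegetables are sold."

q_emb = model.encode(question, convert_to_tensor=True)

for answer in (good_answer, bad_answer):
    a_emb = model.encode(answer, convert_to_tensor=True)
    score = util.cos_sim(q_emb, a_emb).item()  # cosine similarity in [-1, 1]
    verdict = "relevant" if score >= 0.5 else "off-topic"
    print(f"{score:.2f} -> {verdict}: {answer}")
```

Cosine similarity is only a rough proxy; dedicated evaluation frameworks typically use an LLM judge or a trained scoring model for this kind of check.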
In conclusion, the evaluation metrics change based on the task!
LLM Evaluation Metrics
Unlike traditional machine learning, evaluating an LLM’s output is not straightforward. For instance, asking an LLM to write a story “about a dog and a cat” can yield a wide range of responses, and evaluating such a broad range of responses requires specialized custom metrics.
However, before we jump into custom metrics for LLM evaluation, let us talk about the common metrics used for evaluating LLMs (a small sketch of one such check follows the list):
- Answer relevancy
- Hallucination
- Context relevancy
- Ethical metrics
- Task-specific metrics
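As a taste of how one of these metrics could be approximated, the sketch below scores hallucination with a crude faithfulness proxy: the fraction of the answer’s content words that also appear in the source context. The helper functions, example texts, and the idea of using token overlap are assumptions for illustration; production metrics usually rely on NLI models or LLM-as-judge prompts instead.

```python
# A naive hallucination proxy: what fraction of the answer's content words
# are supported by the source context? Purely illustrative.
import re

def content_tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring very short (stop-like) words."""
    return {t for t in re.findall(r"[a-z']+", text.lower()) if len(t) > 3}

def faithfulness_score(answer: str, context: str) -> float:
    """Share of the answer's content words that also occur in the context (0.0 to 1.0)."""
    answer_tokens = content_tokens(answer)
    if not answer_tokens:
        return 1.0  # nothing substantive claimed, so nothing to hallucinate
    context_tokens = content_tokens(context)
    return len(answer_tokens & context_tokens) / len(answer_tokens)

context = ("A stock market is a venue where investors buy and sell "
           "shares of publicly listed companies.")
answer = "The stock market lets investors trade shares of listed companies."

score = faithfulness_score(answer, context)
print(f"faithfulness ~ {score:.2f}")  # a low score hints at unsupported (hallucinated) content
```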
Moreover, the metrics mentioned above are useful only if they satisfy the following…