Unit-Testing In CI/CD

In the previous section, we learned how to evaluate your LLM application using the evaluate() function. Unit-testing LLM applications in CI/CD pipelines isn't so different: it simply involves moving that workflow into Pytest-style test files, plus a YAML workflow file to run evaluations in your CI/CD pipeline.

This makes it very easy to set up, and you can reuse a lot of code from previous sections.

DID YOU KNOW?

deepeval originally got traction on GitHub due to its "Pytest for LLMs" positioning. No other framework does unit-testing as well as deepeval, which means you're in for a treat when using Confident AI in CI/CD pipelines.

We briefly touched on unit-testing in the datasets section when showing how to use datasets in CI/CD pipelines, but this page covers the topic far more comprehensively.

Create Your Test File

First, let's create a test file called test_llm_app.py. Note that we'll be reusing a lot of code from previous sections.

note

Your test file's name must start with test_; that's simply how Pytest discovers test files.

test_llm_app.py
import pytest
from hypothetical_chatbot import hypothetical_chatbot
from deepeval.test_case import LLMTestCase
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval import assert_test

# Use dataset from previous sections
dataset = EvaluationDataset()
dataset.pull(alias="QA Dataset")

# Create a list of LLMTestCase
test_cases = []
for golden in dataset.goldens:
    input = golden.input
    # Replace with your own LLM application
    actual_output, retrieval_context = hypothetical_chatbot(input)
    test_case = LLMTestCase(
        input=input,
        actual_output=actual_output,
        retrieval_context=retrieval_context
    )
    test_cases.append(test_case)

# Make dataset ready for evaluation
dataset.test_cases = test_cases

@pytest.mark.parametrize("test_case", dataset)
def test_llm_app(test_case: LLMTestCase):
    assert_test(test_case, [AnswerRelevancyMetric(), FaithfulnessMetric()])

caution

Never use the evaluate() function inside a test function! Stick to assert_test(): evaluate() wasn't built for unit-testing in CI/CD, so you'd miss out on the CI/CD-specific features assert_test() offers and risk introducing bugs into your codebase.

You can learn more about how to use assert_test() here.
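
For reference, hypothetical_chatbot is simply a placeholder for your own LLM application. If you want something runnable while wiring things up, a minimal stand-in module might look like this (purely illustrative; the module name, return values, and logic are assumptions you should replace with your real app):

hypothetical_chatbot.py
def hypothetical_chatbot(input: str):
    # Replace this with your actual RAG pipeline / LLM call.
    # It must return the generated answer plus the retrieved context chunks,
    # since both are needed to build an LLMTestCase.
    retrieval_context = ["An example context chunk relevant to the input."]
    actual_output = f"A placeholder answer to: {input}"
    return actual_output, retrieval_context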

Before moving on to the next step, sanity-check your setup by running deepeval test run once. To execute test_llm_app.py, run this command in the CLI:

deepeval test run test_llm_app.py -n 2

The optional -n flag spins up multiple processes to run assert_test() on multiple test cases at once, which is especially useful for speeding up the unit-testing process. To see the full list of available flags, click here.

You should see the same test run being created on Confident AI.

Set Up Your YAML File

To execute your test file in CI/CD pipelines, simply create a YAML file that runs deepeval test run on push and pull requests.

.github/workflows/unit-testing.yml
name: LLM Unit/Regression Testing

on:
  push:
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Install Dependencies
        run: poetry install --no-root

      - name: Login to Confident AI
        env:
          CONFIDENT_API_KEY: ${{ secrets.CONFIDENT_API_KEY }}
        run: poetry run deepeval login --confident-api-key "$CONFIDENT_API_KEY"

      - name: Run DeepEval Test Run
        run: poetry run deepeval test run test_llm_app.py

Now, each time you push or open a pull request on GitHub, your LLM application will be unit-tested.

Also note that you don't necessarily have to use Poetry for installation or follow each step exactly as presented. We're merely showing an example of what a YAML file that executes a deepeval test run might look like.
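
For example, if you install dependencies with pip instead of Poetry, the last few steps might look something like this (a sketch, assuming your dependencies live in a requirements.txt file):

      - name: Install Dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Login to Confident AI
        env:
          CONFIDENT_API_KEY: ${{ secrets.CONFIDENT_API_KEY }}
        run: deepeval login --confident-api-key "$CONFIDENT_API_KEY"

      - name: Run DeepEval Test Run
        run: deepeval test run test_llm_app.py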

tip

Don't forget to supply your Confident AI API key; otherwise you won't be able to pull your datasets or generate new testing reports on Confident AI.
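
If you use the GitHub CLI, one way to register the CONFIDENT_API_KEY secret referenced in the workflow above is sketched below (replace the placeholder value with your actual key; adding it through your repository's settings page works just as well):

gh secret set CONFIDENT_API_KEY --body "your-confident-api-key"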

Integrating With GitHub Actions

If you don't already have one, create a .github/workflows directory in your repository and place your unit-testing.yml file there, as shown below. Once you commit and push this change, GitHub Actions will automatically execute the workflow based on the specified on triggers.
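
For example, from the root of your repository (the commit message below is just illustrative):

mkdir -p .github/workflows
# save unit-testing.yml inside .github/workflows/, then:
git add .github/workflows/unit-testing.yml
git commit -m "Add LLM unit-testing workflow"
git push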

What About Logging Models, Prompts, And Others?

In the previous section, we also saw how to log hyperparameters such as models and prompts in the evaluate() function. To log them when using deepeval test run for unit-testing, simply add this to your test file:

test_llm_app.py
import deepeval

...

# You should aim to make these values dynamic
@deepeval.log_hyperparameters(model="gpt-4o", prompt_template="...")
def hyperparameters():
    # Return a dict to log additional hyperparameters.
    # You can also return an empty dict {} if there are no additional parameters to log
    return {
        "temperature": 1,
        "chunk size": 500
    }

This allows deepeval to associate evaluation results with these particular hyperparameters. Lastly, run deepeval test run again to check that they show up on Confident AI:

deepeval test run test_llm_app.py
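
As the comment in the snippet above suggests, you should aim to make these values dynamic rather than hardcoding them. One possible approach (a sketch; the MODEL_NAME environment variable and the prompt_template value are assumptions, not part of deepeval) is to read them from the same configuration your LLM application already uses:

import os
import deepeval

# Pull the values from wherever your application configures them,
# e.g. an environment variable or a shared config module.
model_name = os.getenv("MODEL_NAME", "gpt-4o")
prompt_template = "You are a helpful assistant. Answer using the provided context."

@deepeval.log_hyperparameters(model=model_name, prompt_template=prompt_template)
def hyperparameters():
    return {"temperature": 1, "chunk size": 500}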

In the next section, we'll show how you can enable no-code workflows to run evaluations on Confident AI directly.