Evaluation

Fine-tune your large language model with human feedback.

Overview

The LLM Evaluation project on Datasaur allows users to rate and refine the completions generated for various prompts, and you can set up an LLM Evaluation project in four easy steps. Completions rated 5 stars require no further user feedback, while those rated lower require user refinement. The project offers three export formats to support various fine-tuning processes for efficient LLM evaluation and training.

Please note that the LLM Evaluation project feature is not enabled for all users by default. To gain access, kindly reach out to support@datasaur.ai with your request, and we will gladly enable it for you.

LLM Evaluation supports two different CSV formats for organizing and processing data:

  1. Two-Column CSV Format

    • Columns: prompt, completion

    • Description: A simple structure that includes the prompt and its corresponding completion.

  2. Four-Column CSV Format

    • Columns: prompt_template, prompt, sources, completion

    • Description: A more detailed format containing:

      1. prompt_template: A system message specific to the LLM.

      2. prompt: The actual prompt or question.

      3. sources: A JSON list of the corresponding chunk(s) and source(s) related to the completion, such as a website or PDF (optional). For example: [{"content":"your corresponding chunk here", "url":"your website or PDF URL here"}, …]

      4. completion: The corresponding answer or completion.

Note

  • The two-column CSV format is suitable for straightforward evaluations where only the prompt and completion are required.

  • The four-column CSV format offers greater detail and flexibility, particularly when additional context or references are needed for the completion.
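For illustration, here is a minimal hypothetical sample of each format (the credit-card question, answer, and example.com URL are placeholders, not real data):

Two-column format:

  prompt,completion
  "What is the annual fee for the Platinum card?","The Platinum card has a $95 annual fee."

Four-column format (the JSON in the sources column is wrapped in quotes, with inner double quotes doubled, so the row remains valid CSV):

  prompt_template,prompt,sources,completion
  "You are a helpful banking assistant.","What is the annual fee for the Platinum card?","[{""content"": ""The Platinum card carries a $95 annual fee."", ""url"": ""https://example.com/platinum-card""}]","The Platinum card has a $95 annual fee."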

Creating an LLM Evaluation Project in Datasaur - A 4-Step Guide

  1. On your project homepage, click on the LLM Evaluation project template and upload your file.

Supported file format: .csv

Please make sure the column headers are named in this order: prompt_template, prompt, sources, completion (a quick header check is sketched after these steps).

  2. On the preview step, check the Convert first row as header box to turn the first row into the table header.

  3. The LLM Evaluation task is automatically selected once the uploaded file has been verified.

  4. Assign at least one labeler to the project.

Voilà! You've successfully set up the LLM Evaluation Project.
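If you would like to catch header mistakes before uploading, here is a minimal sanity-check sketch in Python (the file name llm_evaluation.csv is hypothetical; for the two-column format, change EXPECTED to ["prompt", "completion"]):

  import csv

  # Expected header order for the four-column format
  EXPECTED = ["prompt_template", "prompt", "sources", "completion"]

  # Read only the first row of the CSV and compare it to the expected headers
  with open("llm_evaluation.csv", newline="", encoding="utf-8") as f:
      header = next(csv.reader(f))

  if header != EXPECTED:
      raise ValueError(f"Unexpected headers {header}; expected {EXPECTED}")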

LLM Evaluation Labeling Process

Each prompt comes with a corresponding completion that requires a rating, from 1 star (worst) to 5 stars (best).

Completions rated 5 stars don't require user feedback. For ratings below 5 stars, refine the completion and save your feedback; when you are satisfied with the rating, submit it to proceed to the next prompt.

For your convenience, we automatically copy the completion before you make edits!

Exporting Your LLM Evaluation Labeling

We provide three different export file formats:

  1. LLM Evaluation (CSV): This comprehensive format includes all prompts, completions, expected completions, and ratings.

  2. Demonstration Dataset (CSV): Contains pairs of prompts and expected completions, ready for your supervised fine-tuning process.

  3. Demonstration Dataset (JSONL): Following OpenAI's JSONL format for fine-tuning, this also pairs prompts with expected completions.
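As a rough sketch, a line in the Demonstration Dataset (JSONL) export pairs a prompt with an expected completion along the lines of OpenAI's prompt-completion fine-tuning format (the exact schema may differ; the values below are hypothetical):

  {"prompt": "What is the annual fee for the Platinum card?", "completion": "The Platinum card has a $95 annual fee."}

Loading such a file for downstream processing is straightforward; a minimal Python sketch, assuming a hypothetical file name demonstration_dataset.jsonl:

  import json

  # Parse one JSON object per non-empty line
  with open("demonstration_dataset.jsonl", encoding="utf-8") as f:
      pairs = [json.loads(line) for line in f if line.strip()]

  print(pairs[0]["prompt"], "->", pairs[0]["completion"])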

Happy labeling!
