Ranking (RLHF)

Overview

The LLM Ranking project on Datasaur lets you rank multiple completions for each prompt through a simple four-step setup process. By dragging and dropping, you order the completions by preference, with your preferred completion at the top. The project produces a comparison dataset, which is well suited to training reward models.

Please note that the LLM Ranking project is not enabled for all users by default. To request access, reach out to support@datasaur.ai and we will gladly enable it for you.

LLM Ranking supports two different CSV formats for organizing and processing data:

  1. Prompt and Completion CSV Format

    • Columns: prompt, completion_1, completion_2, … , completion_10

    • Description: A simple structure that includes the prompt and its corresponding completions (minimum 2 completions, maximum 10).

  2. Prompt Template, Prompt, Completion, and Sources CSV Format

    • Columns: prompt_template, prompt, completion_1, completion_2, … , completion_10, sources

    • Description: A more detailed format containing:

      1. prompt_template: A system message specific to the LLM.

      2. prompt: The actual prompt or question.

      3. sources (optional): A JSON list of the chunk(s) and source(s) related to the completion, such as a website or PDF, in this form: [{"content":"your corresponding chunk here", "url":"your website or PDF URL here"}, …, {"content":"your corresponding chunk here", "url":"your website or PDF URL here"}]

      4. completion: The corresponding answer or completion.

Note

  • The Prompt and Completion CSV format is suitable for straightforward evaluations where only the prompt and completion are required.

  • The Prompt Template, Prompt, Completion, and Sources CSV format offers greater detail and flexibility, particularly when additional context or references are needed for the completion.
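As an illustration of the more detailed format, the following minimal sketch writes rows to CSV text using Python's standard csv and json modules. The function name build_ranking_csv and the input dictionary shape are hypothetical. The column order follows the order requested in the upload step, and leaving the sources cell empty when no sources exist is an assumption, not documented behavior.

```python
import csv
import io
import json


def build_ranking_csv(rows):
    """Return CSV text for rows shaped like
    {"prompt_template": str, "prompt": str, "sources": list, "completions": list}.
    (Hypothetical input shape, chosen for this sketch.)"""
    n = len(rows[0]["completions"])
    if not 2 <= n <= 10:
        raise ValueError("each prompt needs between 2 and 10 completions")

    header = ["prompt_template", "prompt", "sources"]
    header += [f"completion_{i}" for i in range(1, n + 1)]

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    for row in rows:
        if len(row["completions"]) != n:
            raise ValueError("every row must have the same number of completions")
        sources = row.get("sources")
        # sources is one cell holding a JSON list of {"content", "url"} objects;
        # an empty cell for missing sources is an assumption of this sketch.
        sources_cell = json.dumps(sources) if sources else ""
        writer.writerow(
            [row["prompt_template"], row["prompt"], sources_cell, *row["completions"]]
        )
    return buffer.getvalue()
```

The csv writer takes care of quoting, so the commas and double quotes inside the JSON sources cell do not break the column layout.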

Creating an LLM Ranking Project in Datasaur - A 4-Step Guide

  1. On your project homepage, click on the LLM Ranking project template and upload your file.

Supported file format: .csv

Please make sure to name the column headers in this order: prompt_template, prompt, sources, completion_1, completion_2, … , completion_10

  2. On the preview step, check the box (Convert first row as header) to use the first row as the table header.

  3. After the uploaded file is verified, the LLM Ranking task is selected automatically.

  4. Assign at least one labeler to the project.
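Before uploading, it can help to check the header row locally. The sketch below is a hypothetical helper (check_headers is not part of Datasaur) that verifies a prompt column exists and that there are between 2 and 10 completion columns. It also assumes the completion columns are numbered consecutively from completion_1, which the format description suggests but does not state outright.

```python
import re

COMPLETION_COLUMN = re.compile(r"completion_(\d+)")


def check_headers(header):
    """Return a list of problems found in a CSV header row (empty if none)."""
    problems = []
    if "prompt" not in header:
        problems.append("missing 'prompt' column")

    # Collect the numeric suffix of every completion_N column.
    numbers = []
    for name in header:
        match = COMPLETION_COLUMN.fullmatch(name)
        if match:
            numbers.append(int(match.group(1)))
    numbers.sort()

    if not 2 <= len(numbers) <= 10:
        problems.append("expected between 2 and 10 completion columns")
    elif numbers != list(range(1, len(numbers) + 1)):
        # Assumption: columns run completion_1, completion_2, ... without gaps.
        problems.append("completion columns should be numbered from completion_1 without gaps")
    return problems
```

To use it on a file, read the first row with csv.reader and pass that list to check_headers. An empty result means the header passed these checks, not that Datasaur is guaranteed to accept the file.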

LLM Ranking Labeling Process

Each prompt comes with its corresponding completions. Drag and drop the completions to order them by preference, with your preferred completion at the top.

Exporting Your LLM Ranking Labeling

We provide one export file format:

  1. Comparison Dataset (csv): A three-column CSV that contains a permutation of your ranked completions for each prompt. Example file: LLM Ranking - Credit card - Comparison Dataset.csv
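To see how a ranking could become comparison data for reward model training, here is a minimal sketch that expands one ranked list into pairwise comparisons. The three-field layout (prompt, preferred completion, less preferred completion) and the function name ranking_to_pairs are assumptions for illustration. The actual column names and row layout of the Datasaur export may differ.

```python
from itertools import combinations


def ranking_to_pairs(prompt, ranked_completions):
    """Expand a ranking (best first) into (prompt, preferred, less_preferred) rows.

    combinations() keeps the input order, so the first element of each pair
    is always ranked above the second.
    """
    return [
        (prompt, preferred, less_preferred)
        for preferred, less_preferred in combinations(ranked_completions, 2)
    ]
```

Under this assumption, a ranking of n completions yields n(n-1)/2 rows, so three completions produce three rows and ten completions produce forty-five.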

Happy labeling!
