Custom OCR

OCR Custom Text Extraction API

The custom text extraction API is a Datasaur feature that lets you create a custom OCR project backed by your own text extraction API.

Request from Datasaur

POST https://custom-text-extractor.com/text-extraction/example

Request headers

Accept

application/json, text/plain

Form Data Parameters

upload

Your document file (e.g., receipt.jpg)

Expected API Response

Datasaur processes the response according to the Content-Type header that your API returns.
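As a rough illustration of how an endpoint might choose between the two response types, the sketch below uses only the Python standard library. The names `encode_result`, `run_extraction`, and `ExtractionHandler` are hypothetical, the extraction step is a stub that returns fixed text, and encoding the body as UTF-8 is an assumption.

```python
import json
from http.server import BaseHTTPRequestHandler


def encode_result(result):
    """Map an extraction result to (Content-Type value, body bytes).

    A str is sent as text/plain. Anything else is assumed to be a dict that
    is already in Importable format and is sent as application/json.
    """
    if isinstance(result, str):
        return "text/plain", result.encode("utf-8")
    return "application/json", json.dumps(result).encode("utf-8")


def run_extraction(raw_request_body):
    # Hypothetical placeholder: a real service would parse the form data,
    # read the "upload" file, and run its own OCR or ASR model here.
    return "SHIHLIN TAIWAN\nSTREET SNACKS"


class ExtractionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the full request body sent by the caller.
        length = int(self.headers.get("Content-Length", 0))
        raw_body = self.rfile.read(length)

        content_type, body = encode_result(run_extraction(raw_body))

        # The Content-Type header tells the caller how to read the body.
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass `ExtractionHandler` to `http.server.HTTPServer` and call `serve_forever()`. A web framework would normally take care of parsing the uploaded file.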

Text response (Content-Type: text/plain)

SHIHLIN TAIWAN
STREET SNACKS
Grand Galaxy Park
DATE 26/02/20 15:53
CASHIER: Reny
No. Customer: 1

JSON response (Content-Type: application/json)

Datasaur uses the Importable format to process the API response.

{
  "cells": [
    {
      "content": "SHIHLIN TAIWAN",
      "index": 0,
      "line": 0,
      "metadata": [],
      "tokens": [
        "SHIHLIN",
        "TAIWAN"
      ]
    },
    {
      "content": "STREET SNACKS",
      "index": 0,
      "line": 1,
      "metadata": [],
      "tokens": [
        "STREET",
        "SNACKS"
      ]
    }
  ],
  "labelSets": [],
  "labels": [
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 0,
      "endCharIndex": 6,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 130,
      "y0": 154,
      "x1": 255,
      "y1": 154,
      "x2": 255,
      "y2": 186,
      "x3": 130,
      "y3": 186
    },
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 1,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 261,
      "y0": 154,
      "x1": 375,
      "y1": 154,
      "x2": 375,
      "y2": 186,
      "x3": 261,
      "y3": 186
    }
  ],
  "name": "receipt.jpg",
  "pages": [
    {
      "pageIndex": 0,
      "pageHeight": 619,
      "pageWidth": 551
    }
  ],
  "type": "BOUNDING_BOX"
}
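A payload shaped like the sample above could be assembled from per-word OCR output roughly as follows. This is a sketch, not a reference implementation: `build_ocr_importable` and its input layout are hypothetical, and the field semantics (inclusive `endCharIndex`, corners ordered top-left, top-right, bottom-right, bottom-left) are inferred from the sample values rather than from a specification.

```python
def build_ocr_importable(name, page_width, page_height, lines):
    """Build an Importable-style dict for a single-page OCR result.

    `lines` is a list of text lines. Each line is a list of
    (word, left, top, right, bottom) tuples in page pixel coordinates.
    """
    cells, labels = [], []
    for line_no, words in enumerate(lines):
        tokens = [w[0] for w in words]
        cells.append({
            "content": " ".join(tokens),
            "index": 0,
            "line": line_no,
            "metadata": [],
            "tokens": tokens,
        })
        # One bounding-box label per word, covering the whole token.
        for token_no, (word, left, top, right, bottom) in enumerate(words):
            labels.append({
                "startCellLine": line_no, "startCellIndex": 0,
                "startTokenIndex": token_no, "startCharIndex": 0,
                "endCellLine": line_no, "endCellIndex": 0,
                "endTokenIndex": token_no,
                # Assumed inclusive: "SHIHLIN" (7 chars) ends at 6 in the sample.
                "endCharIndex": len(word) - 1,
                "layer": 0, "counter": 0, "pageIndex": 0,
                "type": "BOUNDING_BOX", "nodeCount": 4,
                # Corner order inferred from the sample payload.
                "x0": left, "y0": top, "x1": right, "y1": top,
                "x2": right, "y2": bottom, "x3": left, "y3": bottom,
            })
    return {
        "cells": cells,
        "labelSets": [],
        "labels": labels,
        "name": name,
        "pages": [{"pageIndex": 0, "pageHeight": page_height,
                   "pageWidth": page_width}],
        "type": "BOUNDING_BOX",
    }
```

For example, `build_ocr_importable("receipt.jpg", 551, 619, [[("SHIHLIN", 130, 154, 255, 186), ("TAIWAN", 261, 154, 375, 186)]])` yields the first cell and both labels of the sample.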

Apply custom API

ASR Custom Text Extraction API

The custom text extraction API is a Datasaur feature that lets you create a custom audio project backed by your own text extraction API.

Request from Datasaur

POST https://custom-text-extractor.com/text-extraction/example

Request headers

Accept

application/json, text/plain

Form Data Parameters

upload

Your document file (e.g., audio2.flac)
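To exercise your own endpoint before connecting it to Datasaur, you could simulate the request described above. The sketch below assumes the form data is sent as multipart/form-data and that the file part is typed as application/octet-stream. Neither detail is stated here, and `build_upload_request` is a hypothetical helper.

```python
import urllib.request
import uuid


def build_upload_request(url, filename, file_bytes):
    """Build a POST request carrying the file in a form field named "upload"."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="upload"; filename="{filename}"\r\n'
        # Assumption: a generic binary type for the uploaded file part.
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + file_bytes + tail,
        method="POST",
        headers={
            # Same Accept header as listed under "Request headers".
            "Accept": "application/json, text/plain",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending the request with `urllib.request.urlopen(req)` and checking the Content-Type header of the response shows which of the two response types your API returned.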

Expected API Response

Datasaur processes the response according to the Content-Type header that your API returns.

Text response (Content-Type: text/plain)

Speaker 1: A quick brown fox jumps over a lazy dog
Speaker 2: A quick brown fox jumps over a lazy dog
Speaker 1: A quick brown fox jumps over a lazy dog
Speaker 2: A quick brown fox jumps over a lazy dog
Speaker 1: A quick brown fox jumps over a lazy dog
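If your ASR model returns speaker turns as structured data, a plain-text body in the style of the sample can be produced with a small helper. `format_transcript` is a hypothetical name, and the "Speaker N:" prefix simply mirrors the sample lines. How Datasaur interprets the prefix is not specified here.

```python
def format_transcript(turns):
    """Join (speaker, text) pairs into one "speaker: text" line per turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


# Two turns in the style of the sample response.
sample_body = format_transcript([
    ("Speaker 1", "A quick brown fox jumps over a lazy dog"),
    ("Speaker 2", "A quick brown fox jumps over a lazy dog"),
])
```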

JSON response (Content-Type: application/json)

Datasaur uses the Importable format to process the API response.

{
  "cells": [
    {
      "content": "SHIHLIN TAIWAN",
      "index": 0,
      "line": 0,
      "metadata": [],
      "tokens": ["SHIHLIN", "TAIWAN"]
    },
    {
      "content": "STREET SNACKS",
      "index": 0,
      "line": 1,
      "metadata": [],
      "tokens": ["STREET", "SNACKS"]
    }
  ],
  "labelSets": [],
  "labels": [
    {
      "id": 1,
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "startTimestampMillis": 1375,
      "endTimestampMillis": 4250,
      "type": "TIMESTAMP"
    },
    {
      "id": 2,
      "startCellLine": 1,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 1,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "startTimestampMillis": 4437,
      "endTimestampMillis": 8218,
      "type": "TIMESTAMP"
    }
  ],
  "name": "ASR API Response Sample",
  "type": "TIMESTAMP"
}
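The timestamped payload above could be generated from transcript segments along these lines. This is a sketch under assumptions: `build_asr_importable` is a hypothetical helper, each segment becomes one cell with one label spanning the whole line, ids start at 1, and `endCharIndex` is treated as inclusive, all inferred from the sample.

```python
def build_asr_importable(name, segments):
    """Build an Importable-style dict from (text, start_ms, end_ms) segments."""
    cells, labels = [], []
    for text, start_ms, end_ms in segments:
        tokens = text.split()
        if not tokens:
            continue  # skip empty segments; there is no text to label
        line_no = len(cells)
        cells.append({
            "content": " ".join(tokens),
            "index": 0,
            "line": line_no,
            "metadata": [],
            "tokens": tokens,
        })
        # One TIMESTAMP label from the first character of the first token
        # to the last character of the last token.
        labels.append({
            "id": line_no + 1,
            "startCellLine": line_no, "startCellIndex": 0,
            "startTokenIndex": 0, "startCharIndex": 0,
            "endCellLine": line_no, "endCellIndex": 0,
            "endTokenIndex": len(tokens) - 1,
            "endCharIndex": len(tokens[-1]) - 1,
            "layer": 0, "counter": 0,
            "startTimestampMillis": start_ms,
            "endTimestampMillis": end_ms,
            "type": "TIMESTAMP",
        })
    return {
        "cells": cells,
        "labelSets": [],
        "labels": labels,
        "name": name,
        "type": "TIMESTAMP",
    }
```

Calling `build_asr_importable("ASR API Response Sample", [("SHIHLIN TAIWAN", 1375, 4250), ("STREET SNACKS", 4437, 8218)])` reproduces the structure of the sample response.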

Apply custom API

Custom API capabilities are only supported in team workspaces. If you would like access, please email us at support@datasaur.ai.
