
Custom OCR

OCR Custom Text Extraction API

The custom text extraction API is a Datasaur feature that lets you create a custom OCR project backed by your own text extraction API.

Request from Datasaur

Request headers
  Accept: application/json, text/plain

Form Data Parameters
  upload: Your document file (e.g., receipt.jpg)
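To exercise your own endpoint before connecting it to Datasaur, it can help to send it a request of the same shape. The following is a minimal sketch that builds such a request with the Python standard library. The Accept header and the `upload` field name come from the description above; the POST method, the per-part `Content-Type`, and the helper name `build_upload_request` are assumptions made for illustration, not documented Datasaur behavior.

```python
import uuid


def build_upload_request(filename: str, data: bytes,
                         content_type: str = "application/octet-stream"):
    """Return (headers, body) for a multipart/form-data request
    carrying a single file in the form field named "upload"."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="upload"; filename="{filename}"\r\n'
        # Assumption: the content type of the file part is not documented.
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + data + tail
    headers = {
        "Accept": "application/json, text/plain",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body
```

The returned pair can be passed to `urllib.request.Request(url, data=body, headers=headers, method="POST")`, where `url` is the address of your own extraction API.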

Expected API Response

Datasaur processes the response according to the Content-Type header that your API returns.
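A small checker run against your own endpoint's output can confirm that a response falls into one of the two supported cases. This is only a sketch of the idea: the function name `classify_response` is hypothetical, UTF-8 encoding is an assumption, and Datasaur's actual parsing rules may be stricter or looser than what is shown.

```python
import json


def classify_response(content_type: str, body: bytes):
    """Return ("text", str) or ("json", dict) based on the Content-Type header."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/plain":
        # Assumption: the body is UTF-8 encoded.
        return "text", body.decode("utf-8")
    if media_type == "application/json":
        payload = json.loads(body.decode("utf-8"))
        if not isinstance(payload, dict):
            raise ValueError("JSON response must be an object")
        return "json", payload
    raise ValueError(f"unsupported Content-Type: {content_type!r}")
```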

Text response (Content-Type: text/plain)

SHIHLIN TAIWAN
STREET SNACKS
Grand Galaxy Park
DATE 26/02/20 15:53
CASHIER: Reny
No. Customer: 1

JSON response (Content-Type: application/json)

Datasaur uses the Importable format to process the API response.
{
  "cells": [
    {
      "content": "SHIHLIN TAIWAN",
      "index": 0,
      "line": 0,
      "metadata": [],
      "tokens": [
        "SHIHLIN",
        "TAIWAN"
      ]
    },
    {
      "content": "STREET SNACKS",
      "index": 0,
      "line": 1,
      "metadata": [],
      "tokens": [
        "STREET",
        "SNACKS"
      ]
    }
  ],
  "labelSets": [],
  "labels": [
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 0,
      "endCharIndex": 6,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 130,
      "y0": 154,
      "x1": 255,
      "y1": 154,
      "x2": 255,
      "y2": 186,
      "x3": 130,
      "y3": 186
    },
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 1,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 261,
      "y0": 154,
      "x1": 375,
      "y1": 154,
      "x2": 375,
      "y2": 186,
      "x3": 261,
      "y3": 186
    }
  ],
  "name": "receipt.jpg",
  "pages": [
    {
      "pageIndex": 0,
      "pageHeight": 619,
      "pageWidth": 551
    }
  ],
  "type": "BOUNDING_BOX"
}
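If your OCR engine returns words with pixel boxes, a response of the shape shown above could be assembled roughly as follows. This is a sketch under assumptions inferred from the sample only: corner points run clockwise from the top-left (x0/y0) to the bottom-left (x3/y3), `endCharIndex` is the inclusive index of the last character, each text line is a single cell with index 0, and the document has a single page. The names `Word` and `build_ocr_importable` are hypothetical.

```python
from typing import Dict, List, Tuple

# One recognized word: text plus box edges in pixels (left, top, right, bottom).
Word = Tuple[str, int, int, int, int]


def build_ocr_importable(name: str, page_width: int, page_height: int,
                         lines: List[List[Word]]) -> Dict:
    """Build a BOUNDING_BOX response with one cell per line and one label per word."""
    cells, labels = [], []
    for line_no, words in enumerate(lines):
        tokens = [word[0] for word in words]
        cells.append({
            "content": " ".join(tokens),
            "index": 0,
            "line": line_no,
            "metadata": [],
            "tokens": tokens,
        })
        for token_no, (text, left, top, right, bottom) in enumerate(words):
            labels.append({
                "startCellLine": line_no, "startCellIndex": 0,
                "startTokenIndex": token_no, "startCharIndex": 0,
                "endCellLine": line_no, "endCellIndex": 0,
                "endTokenIndex": token_no,
                # Assumption: inclusive index of the token's last character.
                "endCharIndex": len(text) - 1,
                "layer": 0, "counter": 0,
                "pageIndex": 0,  # assumption: single-page document
                "type": "BOUNDING_BOX", "nodeCount": 4,
                "x0": left, "y0": top,       # top-left
                "x1": right, "y1": top,      # top-right
                "x2": right, "y2": bottom,   # bottom-right
                "x3": left, "y3": bottom,    # bottom-left
            })
    return {
        "cells": cells,
        "labelSets": [],
        "labels": labels,
        "name": name,
        "pages": [{"pageIndex": 0, "pageHeight": page_height,
                   "pageWidth": page_width}],
        "type": "BOUNDING_BOX",
    }
```

Calling it with the two words of the first receipt line, for example `("SHIHLIN", 130, 154, 255, 186)` and `("TAIWAN", 261, 154, 375, 186)`, yields the two labels from the sample; serialize the result with `json.dumps` and return it with `Content-Type: application/json`.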

Apply custom API

ASR Custom Text Extraction API

The custom text extraction API is a Datasaur feature that lets you create a custom Audio project backed by your own text extraction API.

Request from Datasaur

Request headers
  Accept: application/json, text/plain

Form Data Parameters
  upload: Your audio file (e.g., audio2.flac)

Expected API Response

Datasaur processes the response according to the Content-Type header that your API returns.

Text response (Content-Type: text/plain)

A quick brown fox jumps over a lazy dog
Speaker 2: A quick brown fox jumps over a lazy dog
Speaker 1: A quick brown fox jumps over a lazy dog
Speaker 2: A quick brown fox jumps over a lazy dog
Speaker 1: A quick brown fox jumps over a lazy dog
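A plain-text body like the one above could be produced from a list of speaker turns with a helper along these lines. It is a sketch only: the function name `format_transcript` is hypothetical, and the source does not state whether Datasaur gives the "Speaker N:" prefix any special meaning, so the code merely mirrors the sample layout, one turn per line.

```python
from typing import List, Optional, Tuple


def format_transcript(turns: List[Tuple[Optional[str], str]]) -> str:
    """Join (speaker, text) turns into one line per turn.

    A turn whose speaker is None is written without a prefix,
    as in the first line of the sample response.
    """
    lines = []
    for speaker, text in turns:
        lines.append(f"{speaker}: {text}" if speaker else text)
    return "\n".join(lines)
```

Return the resulting string as the response body with `Content-Type: text/plain`.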

JSON response (Content-Type: application/json)

Datasaur uses the Importable format to process the API response.
{
  "cells": [
    {
      "content": "SHIHLIN TAIWAN",
      "index": 0,
      "line": 0,
      "metadata": [],
      "tokens": ["SHIHLIN", "TAIWAN"]
    },
    {
      "content": "STREET SNACKS",
      "index": 0,
      "line": 1,
      "metadata": [],
      "tokens": ["STREET", "SNACKS"]
    }
  ],
  "labelSets": [],
  "labels": [
    {
      "id": 1,
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "startTimestampMillis": 1375,
      "endTimestampMillis": 4250,
      "type": "TIMESTAMP"
    },
    {
      "id": 2,
      "startCellLine": 1,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 1,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "startTimestampMillis": 4437,
      "endTimestampMillis": 8218,
      "type": "TIMESTAMP"
    }
  ],
  "name": "ASR API Response Sample",
  "type": "TIMESTAMP"
}
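When a speech recognizer returns timed segments, a TIMESTAMP response of the shape shown above could be assembled as in the following sketch. The conventions are inferred from the sample rather than from a specification: one cell and one label per segment, label ids counting from 1, whitespace tokenization, and `endTokenIndex`/`endCharIndex` pointing at the last token and its last character (inclusive). The names `Segment` and `build_asr_importable` are hypothetical.

```python
from typing import Dict, List, Tuple

# One transcribed segment: text, start time and end time in milliseconds.
Segment = Tuple[str, int, int]


def build_asr_importable(name: str, segments: List[Segment]) -> Dict:
    """Build a TIMESTAMP response with one cell and one label per segment."""
    cells, labels = [], []
    for text, start_ms, end_ms in segments:
        tokens = text.split()  # assumption: whitespace tokenization
        if not tokens:
            continue  # skip empty segments so line numbers stay contiguous
        line_no = len(cells)
        cells.append({
            "content": " ".join(tokens),
            "index": 0,
            "line": line_no,
            "metadata": [],
            "tokens": tokens,
        })
        labels.append({
            "id": len(labels) + 1,
            "startCellLine": line_no, "startCellIndex": 0,
            "startTokenIndex": 0, "startCharIndex": 0,
            "endCellLine": line_no, "endCellIndex": 0,
            # The label spans the whole cell: last token, last character.
            "endTokenIndex": len(tokens) - 1,
            "endCharIndex": len(tokens[-1]) - 1,
            "layer": 0, "counter": 0,
            "startTimestampMillis": start_ms,
            "endTimestampMillis": end_ms,
            "type": "TIMESTAMP",
        })
    return {
        "cells": cells,
        "labelSets": [],
        "labels": labels,
        "name": name,
        "type": "TIMESTAMP",
    }
```

Passing the two segments `("SHIHLIN TAIWAN", 1375, 4250)` and `("STREET SNACKS", 4437, 8218)` reproduces the cells and labels of the sample response.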

Apply custom API

Custom API capabilities are only supported in team workspaces. If you would like access, please email us at [email protected].