Custom Text Extraction API

Custom text extraction API is a Datasaur feature that lets you create custom OCR projects backed by your own text extraction API: Datasaur sends each document to your API and uses the response as the extracted text.

Request from Datasaur

Datasaur sends a POST request to your text extraction endpoint, for example:

POST https://custom-text-extractor.com/text-extraction/example

Request headers

Accept: application/json, text/plain

Form Data Parameters

upload: the document file to extract text from (e.g., receipt.jpg)
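On the receiving side, your endpoint has to pull the `upload` field out of the request body. The sketch below assumes the form data arrives as multipart/form-data (the usual encoding for file uploads, though this page does not name it) and parses it with Python's standard `email` package; `read_upload` is a hypothetical helper, not part of any Datasaur API.

```python
from email import policy
from email.parser import BytesParser
from typing import Optional, Tuple


def read_upload(content_type: str, body: bytes) -> Tuple[Optional[str], bytes]:
    """Return (filename, file bytes) for the `upload` field of a form-data body.

    Hypothetical helper: assumes the request is multipart/form-data and that
    `content_type` is the request's full Content-Type header (with boundary).
    """
    # The MIME parser needs the Content-Type header (and its boundary) in front
    # of the body, so prepend it to form a complete message.
    message = BytesParser(policy=policy.default).parsebytes(
        b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    )
    for part in message.iter_parts():
        disposition = part["content-disposition"]
        if disposition is not None and disposition.params.get("name") == "upload":
            # decode=True returns the part's raw bytes; the parser has already
            # dropped the trailing CRLF, which belongs to the boundary.
            return disposition.params.get("filename"), part.get_payload(decode=True)
    raise ValueError("form data has no 'upload' field")
```

If your endpoint runs on a web framework, its built-in form parsing (for example Flask's `request.files["upload"]`) is usually the simpler choice; the sketch only shows what that parsing involves.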

Expected API Response

Datasaur processes the response differently depending on the Content-Type header your API returns.
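Because the Content-Type header decides how Datasaur reads your response, it can help to check what your endpoint actually returns before connecting it. The following sketch sends a request modeled on the one above (again assuming multipart/form-data encoding) with Python's `urllib.request` and reports the response's media type; `build_extraction_request`, `response_media_type`, and any localhost URL you pass are hypothetical.

```python
import urllib.request
import uuid


def build_extraction_request(url: str, filename: str, data: bytes,
                             file_type: str = "application/octet-stream") -> urllib.request.Request:
    """Build a POST request shaped like the documented one (hypothetical test helper)."""
    boundary = uuid.uuid4().hex  # random token, very unlikely to occur in the file bytes
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        f'Content-Disposition: form-data; name="upload"; filename="{filename}"\r\n'.encode("utf-8"),
        f"Content-Type: {file_type}\r\n\r\n".encode("ascii"),
        data,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json, text/plain",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def response_media_type(request: urllib.request.Request) -> str:
    """Send the request and return the response's media type, e.g. 'text/plain'."""
    with urllib.request.urlopen(request) as response:
        # response.headers is an email.message.Message; get_content_type() drops
        # parameters such as charset, and returns 'text/plain' if the header is missing.
        return response.headers.get_content_type()
```

Against a running endpoint, `response_media_type(...)` should come back as either `text/plain` or `application/json`. Since a missing header also reports as `text/plain`, inspect the raw headers if the result is surprising.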

Text response (Content-Type: text/plain)

Return the extracted text as the response body, for example:

```text
SHIHLIN TAIWAN
STREET SNACKS
Grand Galaxy Park
DATE 26/02/20 15:53
CASHIER: Reny
No. Customer: 1
```
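A minimal sketch of producing this kind of response, assuming your OCR step already yields one string per text line. The `text_plain_response` helper is hypothetical, and UTF-8 encoding is an assumption this page does not state.

```python
from typing import Dict, List, Tuple


def text_plain_response(lines: List[str]) -> Tuple[Dict[str, str], bytes]:
    """Build response headers and body for a plain-text extraction result."""
    # One extracted line per line of the body, as in the receipt example.
    body = "\n".join(lines).encode("utf-8")  # UTF-8 is an assumption
    headers = {
        # This header is what selects Datasaur's plain-text handling.
        "Content-Type": "text/plain",
        "Content-Length": str(len(body)),
    }
    return headers, body
```

Keeping the line breaks from the OCR output preserves the document's line structure in the response body.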

JSON response (Content-Type: application/json)

Datasaur expects the JSON body in its Importable format, for example:

```json
{
  "cells": [
    {
      "content": "SHIHLIN TAIWAN",
      "index": 0,
      "line": 0,
      "metadata": [],
      "tokens": [
        "SHIHLIN",
        "TAIWAN"
      ]
    },
    {
      "content": "STREET SNACKS",
      "index": 0,
      "line": 1,
      "metadata": [],
      "tokens": [
        "STREET",
        "SNACKS"
      ]
    }
  ],
  "labelSets": [],
  "labels": [
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 0,
      "endCharIndex": 6,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 130,
      "y0": 154,
      "x1": 255,
      "y1": 154,
      "x2": 255,
      "y2": 186,
      "x3": 130,
      "y3": 186
    },
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 1,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 261,
      "y0": 154,
      "x1": 375,
      "y1": 154,
      "x2": 375,
      "y2": 186,
      "x3": 261,
      "y3": 186
    }
  ],
  "name": "receipt.jpg",
  "pages": [
    {
      "pageIndex": 0,
      "pageHeight": 619,
      "pageWidth": 551
    }
  ],
  "type": "BOUNDING_BOX"
}
```
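The sample suggests a direct mapping from OCR output. Each text line becomes a cell whose tokens are its words. Each word gets a BOUNDING_BOX label that points at its cell line and token index, with four corners listed clockwise from the top-left. The sketch below builds that structure for a single-page document. It is a reading of this one example, not a specification of the Importable format. The `Word` input type, the axis-aligned boxes, the constant `index`, `layer`, `counter`, and `pageIndex` values, and the inclusive `endCharIndex` are all inferred from the sample above.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Word:
    """One OCR word with an axis-aligned box in page pixels (hypothetical input shape)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def to_importable(name: str, page_width: int, page_height: int,
                  lines: List[List[Word]]) -> dict:
    """Build an Importable-style document for a single page, following the sample."""
    cells = []
    labels = []
    for line_no, words in enumerate(lines):
        cells.append({
            "content": " ".join(w.text for w in words),
            "index": 0,
            "line": line_no,
            "metadata": [],
            "tokens": [w.text for w in words],
        })
        for token_no, w in enumerate(words):
            labels.append({
                "startCellLine": line_no,
                "startCellIndex": 0,
                "startTokenIndex": token_no,
                "startCharIndex": 0,
                "endCellLine": line_no,
                "endCellIndex": 0,
                "endTokenIndex": token_no,
                # The sample's end index is inclusive: "SHIHLIN" (7 chars) ends at 6.
                "endCharIndex": len(w.text) - 1,
                "layer": 0,
                "counter": 0,
                "pageIndex": 0,  # single-page assumption
                "type": "BOUNDING_BOX",
                "nodeCount": 4,
                # Corners clockwise from top-left: (x0, y0) top-left ... (x3, y3) bottom-left.
                "x0": w.left, "y0": w.top,
                "x1": w.right, "y1": w.top,
                "x2": w.right, "y2": w.bottom,
                "x3": w.left, "y3": w.bottom,
            })
    return {
        "cells": cells,
        "labelSets": [],
        "labels": labels,
        "name": name,
        "pages": [{"pageIndex": 0, "pageHeight": page_height, "pageWidth": page_width}],
        "type": "BOUNDING_BOX",
    }


def json_response(document: dict) -> Tuple[Dict[str, str], bytes]:
    """Serialize the document with the Content-Type that selects JSON handling."""
    return {"Content-Type": "application/json"}, json.dumps(document).encode("utf-8")
```

Passing `Word("SHIHLIN", 130, 154, 255, 186)` and `Word("TAIWAN", 261, 154, 375, 186)` as line 0 reproduces the sample's two labels. The sample only labels the first line; emitting a label for every word is an assumption of this sketch.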