Custom OCR
The custom text extraction API feature lets you create a custom OCR project in Datasaur that uses your own text extraction API.
Request headers | |
Accept | application/json, text/plain |
Form Data Parameters | |
upload | Your document file |
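The page lists the request headers and the single `upload` form field, but not the HTTP method or the server framework. The following is only a sketch: it assumes Datasaur sends a `multipart/form-data` request whose file part is named `upload`, and it uses Python's standard `email` package to pull that file out of the raw request body. `extract_upload` is a hypothetical helper, not part of any Datasaur SDK.

```python
from __future__ import annotations

from email.parser import BytesParser
from email.policy import default


def extract_upload(content_type: str, body: bytes) -> tuple[str | None, bytes]:
    """Return (filename, file bytes) of the `upload` part of a multipart/form-data body.

    Assumes the request is multipart/form-data; `content_type` is the request's
    full Content-Type header value, including the boundary parameter.
    """
    # Prepend the Content-Type header so the MIME parser sees a complete message.
    raw = b"Content-Type: " + content_type.encode("latin-1") + b"\r\n\r\n" + body
    message = BytesParser(policy=default).parsebytes(raw)
    for part in message.iter_parts():
        disposition = part["Content-Disposition"]
        # Each form-data part carries its field name as a Content-Disposition parameter.
        if disposition is not None and disposition.params.get("name") == "upload":
            return part.get_filename(), part.get_payload(decode=True)
    raise ValueError("request has no 'upload' form field")
```

In a production service you would usually let your web framework's multipart parser do this; the standard-library version only keeps the example dependency-free.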
Datasaur processes the response differently depending on the Content-Type header your API returns.
Example response (Content-Type: text/plain):
SHIHLIN TAIWAN
STREET SNACKS
Grand Galaxy Park
DATE 26/02/20 15:53
CASHIER: Reny
No. Customer: 1
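Because Datasaur branches on the `Content-Type` header, your API has to set it to match the body it returns. A minimal sketch, assuming a hypothetical helper `encode_response` that sends a string as `text/plain` and a dictionary as `application/json`; the UTF-8 encoding is an assumption, since the page does not specify one.

```python
import json


def encode_response(result):
    """Return (content_type, body_bytes) for an extraction result.

    Hypothetical helper: a str becomes a text/plain body, a dict becomes JSON.
    UTF-8 is an assumed encoding.
    """
    if isinstance(result, str):
        return "text/plain", result.encode("utf-8")
    if isinstance(result, dict):
        return "application/json", json.dumps(result).encode("utf-8")
    raise TypeError(f"unsupported result type: {type(result).__name__}")


if __name__ == "__main__":
    content_type, body = encode_response("SHIHLIN TAIWAN\nSTREET SNACKS")
    print(content_type)  # text/plain
```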
Example response (Content-Type: application/json):
{
"cells": [
{
"content": "SHIHLIN TAIWAN",
"index": 0,
"line": 0,
"metadata": [],
"tokens": [
"SHIHLIN",
"TAIWAN"
]
},
{
"content": "STREET SNACKS",
"index": 0,
"line": 1,
"metadata": [],
"tokens": [
"STREET",
"SNACKS"
]
}
],
"labelSets": [],
"labels": [
{
"startCellLine": 0,
"startCellIndex": 0,
"startTokenIndex": 0,
"startCharIndex": 0,
"endCellLine": 0,
"endCellIndex": 0,
"endTokenIndex": 0,
"endCharIndex": 6,
"layer": 0,
"counter": 0,
"pageIndex": 0,
"type": "BOUNDING_BOX",
"nodeCount": 4,
"x0": 130,
"y0": 154,
"x1": 255,
"y1": 154,
"x2": 255,
"y2": 186,
"x3": 130,
"y3": 186
},
{
"startCellLine": 0,
"startCellIndex": 0,
"startTokenIndex": 1,
"startCharIndex": 0,
"endCellLine": 0,
"endCellIndex": 0,
"endTokenIndex": 1,
"endCharIndex": 5,
"layer": 0,
"counter": 0,
"pageIndex": 0,
"type": "BOUNDING_BOX",
"nodeCount": 4,
"x0": 261,
"y0": 154,
"x1": 375,
"y1": 154,
"x2": 375,
"y2": 186,
"x3": 261,
"y3": 186
}
],
"name": "receipt.jpg",
"pages": [
{
"pageIndex": 0,
"pageHeight": 619,
"pageWidth": 551
}
],
"type": "BOUNDING_BOX"
}
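The JSON sample suggests a regular structure: one cell per text line, and one `BOUNDING_BOX` label per word that points back into its cell by line and token index, with four corner points listed clockwise from the top-left. The sketch below builds that structure from hypothetical OCR output (words with axis-aligned `(left, top, right, bottom)` pixel boxes on a single page). The field semantics are inferred from the sample, including the inclusive `endCharIndex` (`"SHIHLIN"` ends at 6), so verify them against Datasaur's own reference before relying on them. Unlike the sample, which labels only the first line, it emits a label for every word.

```python
from __future__ import annotations

import json


def word_label(line: int, token_index: int, word: str,
               box: tuple[int, int, int, int]) -> dict:
    """One BOUNDING_BOX label for a single word; box is (left, top, right, bottom)."""
    left, top, right, bottom = box
    return {
        "startCellLine": line, "startCellIndex": 0,
        "startTokenIndex": token_index, "startCharIndex": 0,
        "endCellLine": line, "endCellIndex": 0,
        # Inclusive end offset, inferred from the sample ("SHIHLIN" ends at 6).
        "endTokenIndex": token_index, "endCharIndex": len(word) - 1,
        "layer": 0, "counter": 0, "pageIndex": 0,
        "type": "BOUNDING_BOX", "nodeCount": 4,
        # Corners clockwise from top-left, matching the sample's x0..x3 / y0..y3.
        "x0": left, "y0": top, "x1": right, "y1": top,
        "x2": right, "y2": bottom, "x3": left, "y3": bottom,
    }


def build_ocr_response(name: str,
                       lines: list[list[tuple[str, tuple[int, int, int, int]]]],
                       page_width: int, page_height: int) -> dict:
    """Assemble a single-page BOUNDING_BOX response: one cell per line, one label per word."""
    cells, labels = [], []
    for line_no, words in enumerate(lines):
        tokens = [word for word, _ in words]
        cells.append({"content": " ".join(tokens), "index": 0, "line": line_no,
                      "metadata": [], "tokens": tokens})
        for token_index, (word, box) in enumerate(words):
            labels.append(word_label(line_no, token_index, word, box))
    return {
        "cells": cells,
        "labelSets": [],
        "labels": labels,
        "name": name,
        "pages": [{"pageIndex": 0, "pageHeight": page_height, "pageWidth": page_width}],
        "type": "BOUNDING_BOX",
    }


if __name__ == "__main__":
    receipt = [
        [("SHIHLIN", (130, 154, 255, 186)), ("TAIWAN", (261, 154, 375, 186))],
    ]
    print(json.dumps(build_ocr_response("receipt.jpg", receipt, 551, 619), indent=2))
```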
- In Step 2:
  - Select +Add new API... as the OCR method.
  - Enter your API name, API URL, and secret.
- After you click Save, the custom API is added to the list, and you can choose it as the OCR method.
- The interface appears side by side, with the PDF on the left and the transcription on the right.
Custom ASR
The custom ASR API feature lets you create a custom audio project in Datasaur that uses your own automatic speech recognition (ASR) API.
Request headers | |
Accept | application/json, text/plain |
Form Data Parameters | |
upload | Your audio file (e.g., audio2.flac) |
Datasaur processes the response differently depending on the Content-Type header your API returns.
Example response (Content-Type: text/plain):
A quick brown fox jumps over a lazy dog
Speaker 2: A quick brown fox jumps over a lazy dog
Speaker 1: A quick brown fox jumps over a lazy dog
Speaker 2: A quick brown fox jumps over a lazy dog
Speaker 1: A quick brown fox jumps over a lazy dog
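If your ASR engine returns speaker-attributed segments, the plain-text sample suggests one line per segment, optionally prefixed with `Speaker N:`. The page does not say how Datasaur interprets that prefix, so treat this formatter (a hypothetical `format_transcript`) only as a way to mirror the sample's layout.

```python
from __future__ import annotations


def format_transcript(segments: list[tuple[int | None, str]]) -> str:
    """Render (speaker, text) pairs as lines; a speaker of None gives a bare line."""
    lines = []
    for speaker, text in segments:
        # Mirrors the sample's "Speaker N: ..." prefix; its meaning to Datasaur is assumed.
        lines.append(text if speaker is None else f"Speaker {speaker}: {text}")
    return "\n".join(lines)
```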
Example response (Content-Type: application/json):
{
"cells": [
{
"content": "SHIHLIN TAIWAN",
"index": 0,
"line": 0,
"metadata": [],
"tokens": ["SHIHLIN", "TAIWAN"]
},
{
"content": "STREET SNACKS",
"index": 0,
"line": 1,
"metadata": [],
"tokens": ["STREET", "SNACKS"]
}
],
"labelSets": [],
"labels": [
{
"id": 1,
"startCellLine": 0,
"startCellIndex": 0,
"startTokenIndex": 0,
"startCharIndex": 0,
"endCellLine": 0,
"endCellIndex": 0,
"endTokenIndex": 1,
"endCharIndex": 5,
"layer": 0,
"counter": 0,
"startTimestampMillis": 1375,
"endTimestampMillis": 4250,
"type": "TIMESTAMP"
},
{
"id": 2,
"startCellLine": 1,
"startCellIndex": 0,
"startTokenIndex": 0,
"startCharIndex": 0,
"endCellLine": 1,
"endCellIndex": 0,
"endTokenIndex": 1,
"endCharIndex": 5,
"layer": 0,
"counter": 0,
"startTimestampMillis": 4437,
"endTimestampMillis": 8218,
"type": "TIMESTAMP"
}
],
"name": "ASR API Response Sample",
"type": "TIMESTAMP"
}
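The `TIMESTAMP` sample uses the same cell layout as the OCR response, but each label spans a whole cell and carries `startTimestampMillis` and `endTimestampMillis` instead of coordinates. A sketch that produces this shape from hypothetical `(text, start_ms, end_ms)` segments, assuming labels are numbered from 1 as in the sample and that `endCharIndex` is inclusive:

```python
from __future__ import annotations

import json


def build_asr_response(name: str, segments: list[tuple[str, int, int]]) -> dict:
    """Assemble a TIMESTAMP response: one cell and one whole-cell label per segment.

    segments are hypothetical (text, start_ms, end_ms) tuples from your ASR engine.
    """
    # Drop segments with no words so line numbers stay contiguous.
    kept = [(text.split(), start, end) for text, start, end in segments if text.split()]
    cells, labels = [], []
    for line_no, (tokens, start_ms, end_ms) in enumerate(kept):
        cells.append({"content": " ".join(tokens), "index": 0, "line": line_no,
                      "metadata": [], "tokens": tokens})
        labels.append({
            "id": line_no + 1,  # the sample numbers labels from 1
            "startCellLine": line_no, "startCellIndex": 0,
            "startTokenIndex": 0, "startCharIndex": 0,
            "endCellLine": line_no, "endCellIndex": 0,
            # The label ends on the last character of the last token (inclusive).
            "endTokenIndex": len(tokens) - 1, "endCharIndex": len(tokens[-1]) - 1,
            "layer": 0, "counter": 0,
            "startTimestampMillis": start_ms, "endTimestampMillis": end_ms,
            "type": "TIMESTAMP",
        })
    return {"cells": cells, "labelSets": [], "labels": labels, "name": name, "type": "TIMESTAMP"}


if __name__ == "__main__":
    sample = [("SHIHLIN TAIWAN", 1375, 4250), ("STREET SNACKS", 4437, 8218)]
    print(json.dumps(build_asr_response("ASR API Response Sample", sample), indent=2))
```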
- In Step 2:
  - Select +Add new API... as the ASR method.
  - Enter your API name, API URL, and secret.
- After you click Save, the custom API is added to the list, and you can choose it as the ASR method.
- The interface appears as shown in the screenshot below, with the audio at the top and the transcription at the bottom.
Custom API capabilities are only supported in team workspaces. If you would like access, please email us at [email protected].