Importable Format
Importable format is a JSON format used to import data into a Datasaur project.
An Importable JSON file may contain the following data structures:
    1. type: the Importable type, identified by the value "BOUNDING_BOX"
    2. cells: an array of cells, where each cell is the intersection of a row and a column
      1. content: the sentence in the cell
      2. index: the column index of the cell
      3. line: the row index of the cell
      4. metadata: additional information for the cell
      5. tokens: an array of strings that defines custom tokenization
    3. labelSets: the label sets used by the project
    4. labels: an array of labels
      1. Bounding box label type
        1. type: identified by the value "BOUNDING_BOX"
        2. startCellLine: line (row) position of the sentence where the label starts
        3. startCellIndex: column position of the cell where the label starts
        4. startTokenIndex: index of the token where the label starts
        5. startCharIndex: index of the character where the label starts, relative to the token (it restarts from 0 whenever the token index increments)
        6. endCellLine: line (row) position of the sentence where the label ends
        7. endCellIndex: column position of the cell where the label ends
        8. endTokenIndex: index of the token where the label ends
        9. endCharIndex: index of the character where the label ends
        10. layer: the layer where the token is positioned
        11. counter:
        12. pageIndex: index of the page, if the document contains multiple pages
        13. nodeCount: total number of points in the bounding box
        14. x0: x coordinate of the top-left corner of the bounding box
        15. y0: y coordinate of the top-left corner of the bounding box
        16. x1: x coordinate of the top-right corner of the bounding box
        17. y1: y coordinate of the top-right corner of the bounding box
        18. x2: x coordinate of the bottom-right corner of the bounding box
        19. y2: y coordinate of the bottom-right corner of the bounding box
        20. x3: x coordinate of the bottom-left corner of the bounding box
        21. y3: y coordinate of the bottom-left corner of the bounding box
    5. pages: an array of page information
      1. pageIndex: index of the page, if the document contains multiple pages
      2. pageHeight: original page height in pixels
      3. pageWidth: original page width in pixels
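
The corner fields x0 through y3 run clockwise from the top-left corner, which is easy to get wrong when converting from a left/top/right/bottom rectangle. The Python sketch below shows one possible conversion. The function name make_bounding_box_label and its parameters are hypothetical and not part of Datasaur. It also assumes that the label covers exactly one whole token, that y grows downward, that endCharIndex is inclusive, and that layer and counter can be 0. These assumptions are drawn from the values in the example on this page, not from a formal specification.

```python
def make_bounding_box_label(left, top, right, bottom,
                            cell_line, cell_index,
                            token_index, token_length,
                            page_index=0):
    """Build one BOUNDING_BOX label dict for a single whole token.

    Hypothetical helper: the field names come from the list above,
    but the defaults (layer=0, counter=0, nodeCount=4) are assumptions
    copied from the example on this page.
    """
    if right < left or bottom < top:
        raise ValueError("expected left <= right and top <= bottom")
    if token_length < 1:
        raise ValueError("token_length must be at least 1")

    return {
        # The label starts and ends in the same cell and the same token.
        "startCellLine": cell_line,
        "startCellIndex": cell_index,
        "startTokenIndex": token_index,
        "startCharIndex": 0,
        "endCellLine": cell_line,
        "endCellIndex": cell_index,
        "endTokenIndex": token_index,
        # Assumed inclusive: the last character of the token.
        "endCharIndex": token_length - 1,
        "layer": 0,
        "counter": 0,
        "pageIndex": page_index,
        "type": "BOUNDING_BOX",
        "nodeCount": 4,
        # Corners in clockwise order, starting at the top left.
        "x0": left, "y0": top,        # top left
        "x1": right, "y1": top,       # top right
        "x2": right, "y2": bottom,    # bottom right
        "x3": left, "y3": bottom,     # bottom left
    }


label = make_bounding_box_label(130, 154, 255, 186,
                                cell_line=0, cell_index=0,
                                token_index=0,
                                token_length=len("SHIHLIN"))
```

With these inputs the helper produces the same field values as the first label in the example on this page.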

Example (with bounding box label type)

{
  "cells": [
    {
      "content": "SHIHLIN TAIWAN",
      "index": 0,
      "line": 0,
      "metadata": [],
      "tokens": [
        "SHIHLIN",
        "TAIWAN"
      ]
    },
    {
      "content": "STREET SNACKS",
      "index": 0,
      "line": 1,
      "metadata": [],
      "tokens": [
        "STREET",
        "SNACKS"
      ]
    }
  ],
  "labelSets": [],
  "labels": [
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 0,
      "endCharIndex": 6,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 130,
      "y0": 154,
      "x1": 255,
      "y1": 154,
      "x2": 255,
      "y2": 186,
      "x3": 130,
      "y3": 186
    },
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 1,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 261,
      "y0": 154,
      "x1": 375,
      "y1": 154,
      "x2": 375,
      "y2": 186,
      "x3": 261,
      "y3": 186
    }
  ],
  "name": "receipt.jpg",
  "pages": [
    {
      "pageIndex": 0,
      "pageHeight": 619,
      "pageWidth": 551
    }
  ],
  "type": "BOUNDING_BOX"
}
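
To see how the positional fields relate to the cells, the Python sketch below resolves each label in a trimmed copy of the example back to the text it covers, and checks that its corners fall inside the page. The helpers label_text and box_within_page are hypothetical names, not part of any Datasaur tooling. The sketch assumes that endCharIndex is inclusive (as the example values suggest), that box coordinates use the same pixel units as pageWidth and pageHeight, and that tokens within a cell are joined by a single space.

```python
# Fields shared by both labels in the example.
COMMON = {"startCellLine": 0, "startCellIndex": 0, "startCharIndex": 0,
          "endCellLine": 0, "endCellIndex": 0, "layer": 0, "counter": 0,
          "pageIndex": 0, "type": "BOUNDING_BOX", "nodeCount": 4}

# Trimmed copy of the example above (first cell only).
document = {
    "cells": [{"content": "SHIHLIN TAIWAN", "index": 0, "line": 0,
               "metadata": [], "tokens": ["SHIHLIN", "TAIWAN"]}],
    "labels": [
        {**COMMON, "startTokenIndex": 0, "endTokenIndex": 0, "endCharIndex": 6,
         "x0": 130, "y0": 154, "x1": 255, "y1": 154,
         "x2": 255, "y2": 186, "x3": 130, "y3": 186},
        {**COMMON, "startTokenIndex": 1, "endTokenIndex": 1, "endCharIndex": 5,
         "x0": 261, "y0": 154, "x1": 375, "y1": 154,
         "x2": 375, "y2": 186, "x3": 261, "y3": 186},
    ],
    "pages": [{"pageIndex": 0, "pageHeight": 619, "pageWidth": 551}],
    "type": "BOUNDING_BOX",
}


def label_text(doc, label):
    """Return the text a label covers (single-cell labels only)."""
    start = (label["startCellLine"], label["startCellIndex"])
    end = (label["endCellLine"], label["endCellIndex"])
    if start != end:
        raise ValueError("labels spanning several cells are not handled here")

    # A cell is addressed by its row ("line") and column ("index").
    cell = next(c for c in doc["cells"] if (c["line"], c["index"]) == start)
    tokens = cell["tokens"]
    first, last = label["startTokenIndex"], label["endTokenIndex"]

    # Character indexes are relative to their token; end is assumed inclusive.
    if first == last:
        return tokens[first][label["startCharIndex"]:label["endCharIndex"] + 1]
    pieces = [tokens[first][label["startCharIndex"]:]]
    pieces.extend(tokens[first + 1:last])
    pieces.append(tokens[last][:label["endCharIndex"] + 1])
    return " ".join(pieces)  # assumption: tokens are separated by one space


def box_within_page(doc, label):
    """Check that every corner lies inside the page the label refers to."""
    page = next(p for p in doc["pages"] if p["pageIndex"] == label["pageIndex"])
    corners = [(label[f"x{i}"], label[f"y{i}"]) for i in range(label["nodeCount"])]
    return all(0 <= x <= page["pageWidth"] and 0 <= y <= page["pageHeight"]
               for x, y in corners)


for label in document["labels"]:
    print(label_text(document, label), box_within_page(document, label))
```

Running this prints SHIHLIN True and TAIWAN True, one per line. A check like this can catch off-by-one character indexes or boxes that overflow the page before a file is imported.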