Import Transformer
By using Import Transformer, you can import almost anything into Datasaur. Currently, we only accept files with the .csv, .txt, and .json extensions.
Your new import transformer will have this template:
/**
 * This function should be written as this template and correctly implements ImportFunction interface.
 */
(fileContent: string): SimpleDocument => {
  /// Implement import function here
  return {
    cells: [],
    labels: [],
  };
};
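As a minimal sketch of how the template might be filled in, the snippet below turns a plain .txt file into one cell per non-empty line. The name importPlainText and the local Cell and SimpleDocument declarations are assumptions made so the sketch runs on its own; inside Datasaur these types are expected to be provided for you, and the function would be written as an anonymous arrow function as in the template.

```typescript
// Local stand-ins (assumption) for types the Datasaur script editor provides.
interface Cell {
  line: number;
  index: number;
  content: string;
  tokens: string[];
}
interface SimpleDocument {
  cells: Cell[];
  labels: unknown[];
}

// Hypothetical transformer: one cell per non-empty line, tokens split on whitespace.
const importPlainText = (fileContent: string): SimpleDocument => {
  const cells: Cell[] = fileContent
    .split(/\r?\n/) // accept both CRLF and LF line endings
    .filter((text) => text.trim().length > 0) // skip blank lines
    .map((text, line) => ({
      line, // zero-based row
      index: 0, // single column, as required for token-based projects
      content: text,
      tokens: text.trim().split(/\s+/),
    }));
  return { cells, labels: [] };
};

const plainTextDoc = importPlainText("Hello world\n\nSecond line here\n");
console.log(plainTextDoc.cells.length); // 2
console.log(plainTextDoc.cells.map((cell) => cell.tokens));
```

Splitting tokens on whitespace is only one possible tokenization; use whatever matches how you want the text to be labeled.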
The Import Transformer is a function that takes fileContent as a string, decoded as UTF-8, and returns a SimpleDocument that is understood by Datasaur.
SimpleDocument is an object representation of a Document in Datasaur. It is a combined type that supports both token-based labeling and row-based labeling. Below is the structure of SimpleDocument:
- cells: an array of cells. A Datasaur document is stored in a tabular structure, and each cell represents a single cell in that table. For token-based projects, only a single-column table is supported at the moment. Every row/line of the document must have the same number of columns.
  - line: A zero-based number indicating the row.
  - index: A zero-based number indicating the column. For token-based projects, this value can only be set to 0.
  - content: The original content of the cell.
  - tokens: A tokenized version of the content. This field is used for token-based projects only.
  - metadata: an optional array of key-value data to be stored per cell.
    - key: The name of the metadata, as a string.
    - value: The value of the metadata, as a string.
    - Default: text/plain
    - Supported type:
      - text/plain: standard metadata as plain text.
      - text/html: to display metadata as HTML.
      - image/*: to display metadata as an image. The supported image formats depend on browser support.
      - audio/*: to display metadata as an audio player. The supported audio formats depend on browser support.
    - pinned: Boolean value indicating whether the metadata should be displayed at the top of each cell. Non-pinned metadata can be seen through the Metadata extension.
    - config: A customized configuration specifically for text/plain.
      - color: The text color of the metadata, as a string. Accepts any HTML color code or name.
      - backgroundColor: The background color of the metadata, as a string. Accepts any HTML color code or name.
      - borderColor: The border color of the metadata, as a string. Accepts any HTML color code or name.
- labels: an array of labels
  - common fields
    - id: a unique number that identifies the label. Arrow labels refer to it.
    - startCellLine: starting line (row) position
    - startCellIndex: starting column position
    - startTokenIndex: starting token index, relative to the cell
    - startCharIndex: starting character index, relative to the token
    - endCellLine: ending line (row) position
    - endCellIndex: ending column position
    - endTokenIndex: ending token index, relative to the cell
    - endCharIndex: ending character index, relative to the token
    - type: the type of the label. Accepts one of these values: "SPAN", "ARROW", "BOUNDING_BOX", "TIMESTAMP"
  - specific fields by type:
    - "SPAN" or "ARROW"
      - labelSetIndex: replaces layer. Configures how the label set items are grouped.
      - labelName: replaces labelSetItemId. The text provided here is displayed in the web UI.
    - "ARROW"
      - originId: id of the span label that is the arrow's origin.
      - destinationId: id of the span label that is the arrow's destination.
    - "BOUNDING_BOX"
      - pageIndex: page information for multi-page files, such as PDF and TIFF. Set this field to 0 for common image formats, such as JPG, PNG, BMP, etc.
      - nodeCount: number of nodes. This field exists for future polygon support; for now, only 4 nodes forming a rectangle are supported.
      - x0: the first node's x value in the screen coordinate system.
      - y0: the first node's y value in the screen coordinate system.
      - x1: the second node's x value in the screen coordinate system.
      - y1: the second node's y value in the screen coordinate system.
      - x2: the third node's x value in the screen coordinate system.
      - y2: the third node's y value in the screen coordinate system.
      - x3: the fourth node's x value in the screen coordinate system.
      - y3: the fourth node's y value in the screen coordinate system.
    - "TIMESTAMP"
      - startTimestampMillis: the starting timestamp in milliseconds.
      - endTimestampMillis: the ending timestamp in milliseconds.
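Two of the rules in the structure above are easy to break by accident: every line must have the same number of columns, and an arrow's originId and destinationId must point at span labels. The sketch below checks both before you return the document. The helper findStructureProblems and the trimmed-down CellRef and LabelRef types are hypothetical and exist only for this illustration; they are not part of Datasaur.

```typescript
// Trimmed-down stand-ins (assumption): only the fields this check needs.
interface CellRef {
  line: number;
  index: number;
}
interface LabelRef {
  id: number;
  type: "SPAN" | "ARROW" | "BOUNDING_BOX" | "TIMESTAMP";
  originId?: number;
  destinationId?: number;
}

// Hypothetical helper: returns human-readable problems, or an empty array.
function findStructureProblems(cells: CellRef[], labels: LabelRef[]): string[] {
  const problems: string[] = [];

  // Count the cells on each line; all lines must share the same count.
  const columnsPerLine = new Map<number, number>();
  for (const cell of cells) {
    columnsPerLine.set(cell.line, (columnsPerLine.get(cell.line) ?? 0) + 1);
  }
  const distinctCounts = new Set<number>();
  columnsPerLine.forEach((count) => distinctCounts.add(count));
  if (distinctCounts.size > 1) {
    problems.push("lines have differing column counts");
  }

  // Label ids must be unique; collect the ids of span labels on the way.
  const seenIds = new Set<number>();
  const spanIds = new Set<number>();
  for (const label of labels) {
    if (seenIds.has(label.id)) problems.push(`duplicate label id ${label.id}`);
    seenIds.add(label.id);
    if (label.type === "SPAN") spanIds.add(label.id);
  }

  // Each arrow must reference existing span labels at both ends.
  for (const label of labels) {
    if (label.type !== "ARROW") continue;
    for (const ref of [label.originId, label.destinationId]) {
      if (ref === undefined || !spanIds.has(ref)) {
        problems.push(`arrow ${label.id}: ${String(ref)} is not the id of a SPAN label`);
      }
    }
  }
  return problems;
}

const structureProblems = findStructureProblems(
  [{ line: 0, index: 0 }, { line: 1, index: 0 }],
  [
    { id: 1, type: "SPAN" },
    { id: 2, type: "ARROW", originId: 1, destinationId: 3 },
  ],
);
console.log(structureProblems); // ["arrow 2: 3 is not the id of a SPAN label"]
```

A check like this can be called at the end of your transformer while you develop it and removed once the output looks right.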
Suppose we want to label a subtitle file in .srt format and show each subtitle's timestamp as metadata. The file transformer for this is shown below.
/**
 * This function should be written as this template and correctly implements ImportFunction interface.
 */
(fileContent: string): SimpleDocument => {
  // Subtitle entries are separated by a blank line; accept both CRLF and LF line endings.
  const lines = fileContent.trim().split(/\r?\n\r?\n/);
  let currLine: number = 0;
  const cells: Cell[] = [];
  lines.forEach((line) => {
    // The first row of an entry is its sequence number (skipped), the second is its timestamp.
    const [, timestamp, ...subtitles] = line.split(/\r?\n/);
    // Every remaining row becomes its own cell, with the timestamp pinned as metadata.
    subtitles.forEach((subtitle) => {
      cells.push({
        index: 0,
        line: currLine,
        content: subtitle,
        tokens: subtitle.split(' '),
        metadata: [
          { key: "timestamp", value: timestamp, pinned: true, config: { color: "#3399cc", backgroundColor: "", borderColor: "#cc3399" } }
        ]
      });
      currLine += 1;
    });
  });
  const labels: SpanAndArrowLabel[] = [];
  let labelId = 0;
  // Label the first two tokens on the second line as "Example label".
  // Skip this when the second line does not exist or has fewer than two tokens.
  if (cells.length > 1 && cells[1].tokens.length > 1) {
    const secondTokenOnSecondLine = cells[1].tokens[1];
    labels.push({
      id: ++labelId,
      type: "SPAN",
      startCellLine: 1,
      startCellIndex: 0,
      startTokenIndex: 0,
      startCharIndex: 0,
      endCellLine: 1,
      endCellIndex: 0,
      endTokenIndex: 1,
      endCharIndex: secondTokenOnSecondLine.length - 1,
      labelSetIndex: 0,
      labelName: "Example label"
    });
  }
  // Label each occurrence of "Sherlock" as "Person's name".
  const sherlock = "sherlock";
  cells.forEach(cell => {
    cell.tokens.forEach((token, tokenIndex) => {
      if (token.toLowerCase() === sherlock) {
        labels.push({
          id: ++labelId,
          type: "SPAN",
          startCellLine: cell.line,
          startCellIndex: cell.index,
          startTokenIndex: tokenIndex,
          startCharIndex: 0,
          endCellLine: cell.line,
          endCellIndex: cell.index,
          endTokenIndex: tokenIndex,
          endCharIndex: token.length - 1,
          labelSetIndex: 0,
          labelName: "Person's name",
        });
      }
    });
  });
  return {
    cells,
    labels,
  };
};
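To see what the parsing step of this script produces, here is a standalone sketch of just that step, written so that it accepts both CRLF and LF line endings. The helper parseSrtEntries, the SubtitleEntry type, and the sample subtitle text are made up for this illustration and are not part of Datasaur or of the sample file.

```typescript
// Hypothetical shape for one parsed subtitle entry.
interface SubtitleEntry {
  timestamp: string;
  subtitles: string[];
}

// Hypothetical helper: split an .srt-style string into entries.
function parseSrtEntries(fileContent: string): SubtitleEntry[] {
  return fileContent
    .trim()
    .split(/\r?\n\r?\n/) // entries are separated by a blank line
    .map((block) => {
      // Row 1 is the sequence number (skipped), row 2 the timestamp range,
      // and the remaining rows are the subtitle text.
      const [, timestamp = "", ...subtitles] = block.split(/\r?\n/);
      return { timestamp, subtitles };
    });
}

// Made-up sample in .srt layout, joined with CRLF line endings.
const sampleSrt = [
  "1",
  "00:00:01,000 --> 00:00:03,000",
  "Holmes!",
  "",
  "2",
  "00:00:04,000 --> 00:00:06,500",
  "Where is",
  "Sherlock Holmes?",
].join("\r\n");

const srtEntries = parseSrtEntries(sampleSrt);
console.log(srtEntries.length); // 2
console.log(srtEntries.map((entry) => entry.subtitles.length)); // [1, 2]
```

Each subtitle row then becomes one cell, which is why the second entry in this sample would contribute two cells that share the same timestamp metadata.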
First, rename the subtitle file by appending .txt to its name, since .srt is not one of the accepted extensions. You can use the following sample file.
sherlock-holmes.txt
Sherlock Holmes (2009)
Click File Transformer, then copy and paste the script above.
After uploading the file, choose the Subtitle script from the Import File Transformer dropdown. Finish the project creation and launch the project.

Your project is ready!
- You need to add the Metadata extension to the project.
- If you want the metadata to be readable in the text editor, set pinned: true.
- Use HTML color codes for the text color, border color, and background color.
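The notes above can be folded into a small helper so that every metadata entry is pinned and styled the same way. This is only a sketch: the name pinnedMetadata and the local CellMetadata and MetadataConfig types are assumptions for illustration, and marking pinned and config as optional is a guess rather than something this page states.

```typescript
// Local stand-ins (assumption) for the metadata shape described on this page.
interface MetadataConfig {
  color?: string;
  backgroundColor?: string;
  borderColor?: string;
}
interface CellMetadata {
  key: string;
  value: string;
  pinned?: boolean;
  config?: MetadataConfig;
}

// Hypothetical helper: build a metadata entry that is shown at the top of its cell.
function pinnedMetadata(key: string, value: string, config: MetadataConfig = {}): CellMetadata {
  return { key, value, pinned: true, config };
}

const timestampTag = pinnedMetadata("timestamp", "00:00:01,000 --> 00:00:03,000", {
  color: "#3399cc", // HTML color code
  borderColor: "#cc3399",
});
console.log(timestampTag.pinned); // true
```

The returned object can be placed directly in a cell's metadata array.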
If you have any questions, please reach out to [email protected]