
Create Projects

How It Works

$ npm run start -- create-projects -h
Usage: robosaur create-projects [options] <configFile>
Create Datasaur projects based on the given config file
Options:
--dry-run Simulates what the script is doing without creating the projects
--without-pcw Use legacy Robosaur configuration (default: false)
--use-pcw Use the payload from Project Creation Wizard in Datasaur UI (default: true)
-h, --help display help for command
  • Robosaur will try to create a project for each folder inside the create.files folder. If the contents of quickstart/token-based/documents look like the example below, Robosaur will create two projects named Project 1 and Project 2, containing one document each: lorem.txt and ipsum.txt respectively. The create.files attribute can be a path on your local drive or in any supported object storage; the details can be seen here.
    $ ls -lR quickstart/token-based/documents
    total 0
    drwxr-xr-x 3 user group Project 1
    drwxr-xr-x 3 user group Project 2
    quickstart/token-based/documents/Project 1:
    total 8
    -rw-r--r-- 1 user group lorem.txt
    quickstart/token-based/documents/Project 2:
    total 8
    -rw-r--r-- 1 user group ipsum.txt
  • Every successfully created project is saved in the state configured by the projectState attribute. The next time you run the same command, no projects are duplicated; Robosaur only processes new projects and the ones that failed.
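The two behaviors above can be pictured with a short Python sketch. It is not Robosaur's actual code: plan_projects and the created set are hypothetical names, and the real format of the project state is not shown on this page. The sketch only assumes that each immediate subfolder becomes one project and that project names already recorded as created are skipped.

```python
import tempfile
from pathlib import Path


def plan_projects(files_dir, created):
    """Map each subfolder of files_dir to a project, skipping created ones.

    `created` stands in for whatever projectState has recorded as successful.
    """
    plan = {}
    for folder in sorted(Path(files_dir).iterdir()):
        if not folder.is_dir() or folder.name in created:
            continue
        # Every file inside the folder becomes a document of that project.
        plan[folder.name] = sorted(p.name for p in folder.iterdir() if p.is_file())
    return plan


def make_example_tree(root):
    """Recreate the folder layout from the listing above in a scratch directory."""
    for project, document in [("Project 1", "lorem.txt"), ("Project 2", "ipsum.txt")]:
        folder = Path(root) / project
        folder.mkdir(parents=True)
        (folder / document).write_text("sample text")


with tempfile.TemporaryDirectory() as root:
    make_example_tree(root)
    first_run = plan_projects(root, created=set())
    second_run = plan_projects(root, created={"Project 1"})

print(first_run)   # both projects are planned
print(second_run)  # only Project 2 is left to process
```

On the first run both folders are planned; once Project 1 is recorded as created, a second run only picks up Project 2.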
  1. Select a configuration example from the quickstart folder.
  2. Specify the create.files value. As mentioned above, this attribute is the data source of the projects.
  3. Open the app and select the team you want to work on by clicking your profile in the top right corner.
  4. Create a new project with the Project Creation Wizard (PCW) by clicking + Custom Project.
  5. Configure the kind of project you want to automate. Go through every step up to the last one, including choosing labelers and reviewers, then click <> View Script in the top right corner (see this video to help visualize the step).
  6. Copy the values.
  7. Paste the values directly into create.pcwPayload and make sure create.pcwPayloadSource is filled in properly. See the PCW Payload section below for details.
  8. Specify create.pcwAssignmentStrategy. The value can be ALL (default) or AUTO. See the Distribution section below for details.
  9. Run the command.
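Before the real run, the --dry-run option from the help output can be used to check the configuration without creating anything. The sketch below only assembles the command line in Python; build_command and the config path are hypothetical, and the call that would actually execute the command is left commented out because it needs a Robosaur checkout.

```python
import shlex


def build_command(config_file, dry_run=False):
    """Return the argv list for `npm run start -- create-projects`."""
    argv = ["npm", "run", "start", "--", "create-projects"]
    if dry_run:
        # Options go before the config file, as in the usage line.
        argv.append("--dry-run")
    argv.append(config_file)
    return argv


# Hypothetical path: point this at the configuration file you edited.
command = build_command("path/to/config.json", dry_run=True)
print(shlex.join(command))

# To execute from the Robosaur directory:
# import subprocess
# subprocess.run(command, check=True)
```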

PCW Payload

The payload copied from PCW can be provided in two ways.
  1. Directly in the configuration file, which is the recommended approach. Paste the payload into create.pcwPayload and make sure the value of create.pcwPayloadSource matches the example below.
    {
      ...
      "create": {
        ...
        "pcwPayloadSource": { "source": "inline" },
        "pcwPayload": <paste the values from PCW>
      }
      ...
    }
  2. From storage (a local file or any supported cloud storage). The example below uses GCS. Paste the values into a JSON file in your bucket and fill create.pcwPayload with the path to that file. The other attributes that must be filled are create.pcwPayloadSource and credentials. For other supported object storage, see here.
    {
      ...
      "credentials": {
        "gcs": { "gcsCredentialJson": "<path-to-JSON-service-account-credential>" }
      },
      "create": {
        ...
        "pcwPayloadSource": {
          "source": "gcs",
          "bucketName": "my-bucket-name"
        },
        "pcwPayload": "<path-to-the-payload-in-JSON-file>"
      }
      ...
    }
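As a rough aid, the requirements of the two payload sources described above can be checked before running Robosaur. The Python sketch below is not part of Robosaur; check_pcw_payload is a hypothetical helper, the example payload and file path are placeholders, and only the inline and gcs sources shown on this page are covered.

```python
def check_pcw_payload(config):
    """Return a list of problems with the PCW payload settings in `config`."""
    problems = []
    create = config.get("create", {})
    source = create.get("pcwPayloadSource", {}).get("source")
    payload = create.get("pcwPayload")

    if source == "inline":
        # Inline: the values copied from PCW are pasted as a JSON object.
        if not isinstance(payload, dict):
            problems.append("create.pcwPayload should be the pasted PCW object")
    elif source == "gcs":
        # GCS: the payload is a path in the bucket, plus bucket name and credentials.
        if not isinstance(payload, str):
            problems.append("create.pcwPayload should be a path to the JSON file")
        if not create["pcwPayloadSource"].get("bucketName"):
            problems.append("create.pcwPayloadSource.bucketName is missing")
        if not config.get("credentials", {}).get("gcs", {}).get("gcsCredentialJson"):
            problems.append("credentials.gcs.gcsCredentialJson is missing")
    else:
        problems.append(f"source {source!r} is not covered by this sketch")
    return problems


inline_config = {
    "create": {
        "pcwPayloadSource": {"source": "inline"},
        "pcwPayload": {"variables": {"input": {}}},  # placeholder for the pasted values
    }
}
gcs_config = {
    "create": {
        "pcwPayloadSource": {"source": "gcs", "bucketName": "my-bucket-name"},
        "pcwPayload": "path/to/payload.json",  # hypothetical path inside the bucket
    }
}

print(check_pcw_payload(inline_config))  # no problems reported
print(check_pcw_payload(gcs_config))     # the credentials block is missing
```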

Assignment

List of Assignees (Labelers and Reviewers)

There are two ways to specify the list.
  1. Use the labelers and reviewers that are already assigned in PCW. This is the default approach and requires no extra work, because the assignees are already included in the configuration you pasted from PCW.
  2. Specify the list on your own. Create a file and specify its path in the create.assignment attribute. The content of the file should look like the example below.
    • If useTeamMemberId is true, fill both labelers and reviewers with teamMemberId values.
    • If useTeamMemberId is false, fill both labelers and reviewers with emails.
    {
      "labelers": [...], // list of emails
      "reviewers": [...], // list of emails
      "useTeamMemberId": false
    }
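The assignment file can also be generated instead of written by hand. The following Python sketch assumes nothing beyond the three fields shown above; build_assignment is a hypothetical helper, the addresses are placeholders, and the email check is a simple sanity test rather than anything Robosaur itself enforces.

```python
import json


def build_assignment(labelers, reviewers, use_team_member_id=False):
    """Build the content of the assignment file described above."""
    if not use_team_member_id:
        # With useTeamMemberId false, every entry is expected to be an email.
        for entry in list(labelers) + list(reviewers):
            if "@" not in entry:
                raise ValueError(f"expected an email, got {entry!r}")
    return {
        "labelers": list(labelers),
        "reviewers": list(reviewers),
        "useTeamMemberId": use_team_member_id,
    }


assignment = build_assignment(
    labelers=["labeler1@example.com", "labeler2@example.com"],
    reviewers=["reviewer@example.com"],
)
print(json.dumps(assignment, indent=2))
```

Writing the result with json.dump to the path referenced by create.assignment would produce a file in the format shown above.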

Distribution

Currently, two assignment distributions are supported.
  1. Across documents (default approach). You only need to specify the create.pcwAssignmentStrategy value. The supported values are:
    • AUTO: distribute documents to labelers using a round-robin algorithm, i.e. each document is assigned to exactly one labeler.
    • ALL: every labeler is assigned to all documents.
    Please note that the reviewers will be assigned to all projects and documents.
  2. Across projects. To use this approach, you have to specify the labelers and reviewers list on your own, as described in the List of Assignees section. Follow the steps listed after the example below.
    {
      ...
      "create": {
        ...
        "assignment": {
          "source": "local",
          "path": "quickstart/token-based/config/assignment.json",
          "by": "PROJECT",
          "strategy": "AUTO"
        },
        // remove pcwAssignmentStrategy
        // remove documentAssignments from pcwPayload
        ...
      }
    }
    1. Create the assignment file and specify it in create.assignment.
    2. Fill PROJECT as the value of the create.assignment.by attribute.
    3. Select the assignment strategy by filling create.assignment.strategy. Two strategies are supported.
      1. AUTO: distribute both labelers and reviewers using round-robin, so each project gets only one labeler and one reviewer.
      2. ALL: all reviewers and labelers are assigned to each project.
    4. Remove the create.pcwAssignmentStrategy attribute, and remove the documentAssignments attribute from pcwPayload.
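To make the difference between AUTO and ALL concrete, here is a minimal Python sketch of round-robin assignment. It illustrates the idea only; the assign function is hypothetical, the names and addresses are placeholders, and the exact order in which Robosaur walks through documents, projects and assignees is an assumption.

```python
from itertools import cycle


def assign(items, people, strategy="ALL"):
    """Assign people to items (documents or projects) using ALL or AUTO."""
    if strategy == "ALL":
        # Everyone is assigned to every item.
        return {item: list(people) for item in items}
    if strategy == "AUTO":
        if not people:
            raise ValueError("AUTO needs at least one person")
        # Round-robin: take people in order and wrap around at the end.
        turn = cycle(people)
        return {item: [next(turn)] for item in items}
    raise ValueError(f"unknown strategy {strategy!r}")


documents = ["a.txt", "b.txt", "c.txt"]
labelers = ["ann@example.com", "bob@example.com"]
reviewers = ["rev@example.com"]
projects = ["Project 1", "Project 2"]

# Across documents: only labelers are distributed.
by_document = assign(documents, labelers, strategy="AUTO")

# Across projects: labelers and reviewers are both distributed.
by_project = {
    "labelers": assign(projects, labelers, strategy="AUTO"),
    "reviewers": assign(projects, reviewers, strategy="AUTO"),
}

print(by_document)
print(by_project)
```

With three documents and two labelers, the third document wraps around to the first labeler. With a single reviewer, that reviewer ends up on every project even under AUTO.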

Tagging Projects

Newly created projects from Robosaur can be tagged automatically.
Starting from the PCW payload that you copied using the recommended approach from the previous section (directly in the config file), add a new field called tagNames under create.pcwPayload.variables.input and specify the tags for the projects. Tags that do not exist yet will be created for you.
{
  ...
  "create": {
    ...
    "pcwPayload": {
      ...
      "variables": {
        ...
        "input": {
          ...
          "tagNames": ["TAG 1", "TAG 2"]
        }
      }
    }
  }
  ...
}
Alternatively, if the PCW payload is in an external file (local or in cloud storage), add the tagNames field under variables.input and specify the tags for the projects.
{
  ...
  "variables": {
    ...
    "input": {
      ...
      "tagNames": ["TAG 1", "TAG 2"]
    }
  }
}
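If the payload is edited by a script rather than by hand, the same change can be sketched in Python. add_tag_names is a hypothetical helper and existingField is a placeholder for whatever PCW generated; the only thing taken from this page is the location of tagNames under variables.input.

```python
import copy


def add_tag_names(pcw_payload, tags):
    """Return a copy of the PCW payload with tagNames set in variables.input."""
    updated = copy.deepcopy(pcw_payload)
    # Create the intermediate objects only if the payload lacks them.
    updated.setdefault("variables", {}).setdefault("input", {})["tagNames"] = list(tags)
    return updated


# Placeholder payload: a real one contains the values copied from PCW.
payload = {"variables": {"input": {"existingField": "kept as-is"}}}
tagged = add_tag_names(payload, ["TAG 1", "TAG 2"])
print(tagged)
```

For a payload pasted inline, the same function would be applied to the object stored under create.pcwPayload in the config file.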

ML-Assisted Labeling

Automate the labeling process on the newly created projects using ML-assisted labeling.
In the config file, add the autoLabel field under create and fill in the required fields. The target API requires the project to have a label set in order to work properly.
{
  ...
  "create": {
    ...
    "autoLabel": {
      "enableAutoLabel": true,
      "labelerEmail": "<EMAIL>", // use your Datasaur account email
      "targetApiEndpoint": "<API_ENDPOINT>", // your custom API model
      "targetApiSecretKey": "<API_SECRET>", // if needed
      "numberOfFilesPerRequest": 1
    }
  }
  ...
}
With this configuration, ML-assisted labeling is triggered every time a project is created, and labels are applied to the new project according to the response of your custom API model.
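The autoLabel block can be assembled programmatically as well. The Python sketch below uses only the field names from the example above; build_auto_label is a hypothetical helper, the email and endpoint are placeholders, and leaving out targetApiSecretKey when no secret is needed is an assumption based on the "if needed" note.

```python
def build_auto_label(labeler_email, endpoint, secret_key=None, files_per_request=1):
    """Build the create.autoLabel block from the fields shown above."""
    if files_per_request < 1:
        raise ValueError("numberOfFilesPerRequest must be at least 1")
    block = {
        "enableAutoLabel": True,
        "labelerEmail": labeler_email,
        "targetApiEndpoint": endpoint,
        "numberOfFilesPerRequest": files_per_request,
    }
    if secret_key:
        # Assumption: the key is only included when the target API needs one.
        block["targetApiSecretKey"] = secret_key
    return block


# Placeholder values: replace with your own account email and API endpoint.
auto_label = build_auto_label("me@example.com", "https://example.com/predict")
print(auto_label)
```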