AWS S3

Using your own S3 bucket for Datasaur projects

File Key

The file key tells Datasaur which file to use when you create a project. It is the part of the S3 URI that follows the bucket name. See the example below.

  • Bucket name: datasaur-test

  • S3 URI: s3://datasaur-test/some-folder/image.png

  • File key: /some-folder/image.png
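As a minimal sketch of the rule above, the following Python snippet splits an S3 URI into the bucket name and the file key. The function name `split_s3_uri` is hypothetical and not part of Datasaur; the leading slash in the key is kept only because the example above shows it that way.

```python
def split_s3_uri(uri: str):
    """Split an S3 URI into (bucket_name, file_key).

    Hypothetical helper for illustration only. The file key keeps its
    leading slash to match the example in this document.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the object path.
    bucket, _, rest = uri[len(prefix):].partition("/")
    if not bucket or not rest:
        raise ValueError(f"S3 URI must contain a bucket and a path: {uri!r}")
    return bucket, "/" + rest


bucket, file_key = split_s3_uri("s3://datasaur-test/some-folder/image.png")
print(bucket)    # datasaur-test
print(file_key)  # /some-folder/image.png
```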

Setup

By integrating your bucket into Datasaur, you can create projects using files directly from your S3 bucket.

1. Set up External Object Storage Integration in Datasaur Team Settings

Let's begin by setting up an Integration in Team Settings. By default, Datasaur uses its own storage to manage your projects. By adding another one, we can use your preferred storage provider when creating projects.

  1. Open your team page, then go to Settings > Integrations.

  2. Click on "Add External Object Storage". A new window will pop up. Do not close it: the External ID it shows is needed in a later step, and a new one is generated each time you close the form.

  3. Start by filling in the name. It is used to reference this external object storage and to tell it apart from others.

We'll get back to this window later. Let's leave it for now.

2. Set up CORS for your S3 bucket

This step allows Datasaur to access resources in your bucket.

  1. Log into your AWS account, then go to the S3 management console.

  2. Click on your preferred bucket. We also highly recommend enabling a lifecycle policy that removes objects under the temp/ and export/ prefixes after 7 days.

  3. Open Permissions. Edit the Cross-origin resource sharing (CORS) section, and paste the following configuration.

[
  { 
    "AllowedHeaders": ["*"], 
    "AllowedMethods": [
      "GET",
      "PUT",
      "POST",
      "HEAD",
      "DELETE"
    ],
    "AllowedOrigins": ["https://app.datasaur.ai"],
    "ExposeHeaders": []
  }
]

Back in the Datasaur form from the first step, fill in the bucket details:

  • Bucket name: the name of the bucket that you just set the CORS for.

  • Bucket prefix: a path prefix inside the bucket under which files are grouped according to your needs, e.g. test refers to /{bucket-name}/test.
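The lifecycle recommendation in this step (removing objects under temp/ and export/ after 7 days) could be expressed as an S3 lifecycle configuration. The sketch below only builds that configuration as a Python dictionary; the function name and the rule IDs are illustrative assumptions, and if you set a bucket prefix, the actual prefixes in your bucket may differ.

```python
import json


def build_lifecycle_configuration(prefixes=("temp/", "export/"), days=7):
    """Build an S3 lifecycle configuration that expires objects by prefix.

    Hypothetical helper: the rule IDs are arbitrary labels chosen here.
    """
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.rstrip('/')}-after-{days}-days",
                "Filter": {"Prefix": prefix},   # apply only to this prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},   # delete objects after N days
            }
            for prefix in prefixes
        ]
    }


lifecycle = build_lifecycle_configuration()
print(json.dumps(lifecycle, indent=2))
```

The resulting structure matches what the S3 `PutBucketLifecycleConfiguration` API accepts, so it can be saved as JSON for the AWS CLI or passed to an SDK; setting the same rules in the S3 console under Management works equally well.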

3. Create a policy for the Datasaur role in AWS

You need to create a policy that grants access to your S3 bucket. If you have already set up a policy for accessing the bucket, feel free to skip this step.

  1. In your AWS IAM management console, go to Policies, then click on Create Policy.

  2. Choose the JSON tab, and paste the following configuration. Don't forget to replace <your-bucket-name> in the resource with your bucket name. The write permission is used to upload the selected files to your bucket, whereas s3:GetBucketLocation is used to configure requests based on your bucket's region.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": [
            "s3:ListBucket",
            "s3:ListBucketVersions",
            "s3:PutObjectAcl",
            "s3:PutObject",
            "s3:GetObjectAcl",
            "s3:GetObject",
            "s3:DeleteObjectVersion",
            "s3:DeleteObject",
            "s3:GetBucketLocation"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::<your-bucket-name>/*",
            "arn:aws:s3:::<your-bucket-name>"
          ]
        }
      ]
    }

  3. Click on Next: Tags. Tags are not required, but you can add them here if you want.

  4. Click on Next: Review. Input a name for the AWS Policy, a description (optional), and click on Create Policy.
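If you manage several buckets, substituting the bucket name by hand is easy to get wrong. As a rough sketch, the policy above can be generated for a given bucket name; the function name `build_bucket_policy` is hypothetical, and the action list is copied from the JSON in this step.

```python
import json


def build_bucket_policy(bucket_name: str) -> dict:
    """Return the IAM policy from this step with the bucket name filled in.

    Hypothetical helper for illustration; the actions mirror the JSON above.
    """
    if not bucket_name:
        raise ValueError("bucket_name must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "s3:ListBucket",
                    "s3:ListBucketVersions",
                    "s3:PutObjectAcl",
                    "s3:PutObject",
                    "s3:GetObjectAcl",
                    "s3:GetObject",
                    "s3:DeleteObjectVersion",
                    "s3:DeleteObject",
                    "s3:GetBucketLocation",
                ],
                "Effect": "Allow",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",  # objects in the bucket
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                ],
            }
        ],
    }


# "datasaur-test" is the example bucket name used earlier in this document.
print(json.dumps(build_bucket_policy("datasaur-test"), indent=2))
```

The printed JSON can be pasted into the JSON tab of the policy editor.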

4. Create a role for Datasaur

After we've created a policy for your S3 bucket, we need to attach it to a role which will be assumed by Datasaur to access your bucket.

  1. Back on the IAM management console, go to Roles, then click on Create role.

  2. Choose AWS account in the trusted entity type section.

  3. Select the Another AWS account radio button, then enter the Datasaur AWS Account ID (682361690817), which you can copy from the form in the first step.

  4. Check Require external ID, then paste the External ID from the form in the first step. After that, click on Next.

  5. In the Add permissions section, select the policy created in the previous step. Then, click on Next.

  6. Input a name and, optionally, a description, then click on Create role.

  7. After that, back on the Roles page, click on your newly created role.

  8. Copy the Role ARN from the page and paste it into the form on the Datasaur Team Settings page.
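For reference, choosing another AWS account with a required external ID normally results in a role trust policy of the following shape. This is a sketch based on the standard IAM cross-account pattern rather than on Datasaur's documentation; the function name is hypothetical and the external ID shown is a placeholder for the value in your Datasaur form.

```python
import json

# Datasaur AWS Account ID as given in this guide.
DATASAUR_ACCOUNT_ID = "682361690817"


def build_trust_policy(external_id: str) -> dict:
    """Build a cross-account trust policy that requires an external ID.

    Hypothetical helper; compare its output with the Trust relationships
    tab of the role you created.
    """
    if not external_id:
        raise ValueError("external_id must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{DATASAUR_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                # Only requests that present the matching external ID may assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# "example-external-id" is a placeholder, not a real value.
print(json.dumps(build_trust_policy("example-external-id"), indent=2))
```

If the connection check fails, comparing the role's trust relationship against this shape is a quick way to spot a wrong account ID or a stale external ID.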

5. Check connection

Before you create the integration, run a connection check to make sure your setup is correct. If it succeeds, you can continue to create the external object storage.

6. Good to go!

Now you can create projects using files directly from your S3 bucket, and change the Default Storage option to whichever storage you prefer from the Team Settings page.

If you have any questions or comments, please let us know, and we'll be happy to support you.
