With IRSA

Using your own S3 bucket for Datasaur projects with IRSA delegated permission

For self-hosted deployments from AWS Marketplace, permission is delegated through IAM Roles for Service Accounts (IRSA). The overall approach is almost the same as the original one described on the parent page, but this page keeps the two apart so that the steps are clear for users who self-host through AWS Marketplace. The main difference is in creating the IAM role, specifically in step 4.

File Key

You use this attribute when creating a project to tell Datasaur which file to use. The file key is the path that follows the bucket name in the S3 URI. See the example below.

  • Bucket name: datasaur-test

  • S3 URI: s3://datasaur-test/some-folder/image.png

  • File key: /some-folder/image.png
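As a minimal sketch, the file key can be derived from an S3 URI as follows. The function name `file_key_from_s3_uri` is hypothetical and not part of Datasaur; the leading slash is kept because the example above shows it.

```python
def file_key_from_s3_uri(s3_uri: str, bucket_name: str) -> str:
    """Return the path that follows the bucket name in an S3 URI.

    Hypothetical helper for illustration; the leading slash is kept
    to match the file key format shown in the example above.
    """
    prefix = f"s3://{bucket_name}/"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"{s3_uri!r} is not an S3 URI in bucket {bucket_name!r}")
    return "/" + s3_uri[len(prefix):]


print(file_key_from_s3_uri("s3://datasaur-test/some-folder/image.png", "datasaur-test"))
# prints: /some-folder/image.png
```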

Setup

By integrating your bucket into Datasaur, you can create projects using files directly from your S3 bucket.

1. Set up External Object Storage Integration in Datasaur Team Settings

Let's begin by setting up an Integration in Team Settings. By default, Datasaur uses its own storage to manage your projects. By adding another one, we can use your preferred storage provider when creating projects.

  1. Open your team page, then go to Settings > Integrations.

  2. Click on "Add External Object Storage". A new window will pop up. Do not close the pop-up: we will use the External ID it shows, and a new one is generated each time you close the form.

  3. Start by filling in the name attribute. It is used to reference this external object storage and to tell it apart from others.

We'll get back to this window later. Let's leave it for now.

2. Set up CORS for your S3 bucket

This step allows Datasaur to access resources in your bucket.

  1. Log into your AWS account, then go to S3 management console.

  2. Click on your preferred bucket. We also highly recommend enabling a lifecycle policy that removes objects under both the temp/ and export/ prefixes after 7 days.

  3. Open Permissions. Edit the Cross-origin resource sharing (CORS) section, and paste the following configuration.

[
  { 
    "AllowedHeaders": ["*"], 
    "AllowedMethods": [
      "GET",
      "PUT",
      "POST",
      "HEAD",
      "DELETE"
    ],
    "AllowedOrigins": ["<FILL_THIS_WITH_YOUR_DOMAIN>"],
    "ExposeHeaders": []
  }
]
  • Bucket name: Fill with the name of the bucket that you just set the CORS for.

  • Bucket prefix: A prefix added to the start of paths inside the bucket so that you can group files according to your needs, e.g. test will refer to /{bucket-name}/test.

  • Allowed origins: Replace <FILL_THIS_WITH_YOUR_DOMAIN> in the CORS configuration with the domain of your self-hosted Datasaur app.
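The recommended lifecycle policy for the temp/ and export/ prefixes could look like the sketch below. It builds a configuration in the shape that S3 accepts for bucket lifecycle rules (for example through `aws s3api put-bucket-lifecycle-configuration`). The function name and the rule IDs are assumptions made for illustration; only the two prefixes and the 7-day period come from this page.

```python
import json


def build_lifecycle_configuration(prefixes=("temp/", "export/"), days=7):
    """Build an S3 lifecycle configuration that expires objects under
    each prefix after `days` days. Rule IDs are illustrative only."""
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.rstrip('/')}-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
            for prefix in prefixes
        ]
    }


# Print the JSON so it can be saved to a file or pasted where needed.
print(json.dumps(build_lifecycle_configuration(), indent=2))
```

The same rules can also be created by hand in the Management tab of the bucket in the S3 console.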

3. Create a policy for Datasaur role in AWS

You need to create a policy that grants access to your S3 bucket. If you have already set up a policy for accessing the bucket, feel free to skip this step.

  1. In your AWS IAM management console, go to Policies, then click on Create Policy.

  2. Choose the JSON tab, and paste the following configuration. Don't forget to replace <your-bucket-name> in both resource ARNs with your bucket name. The write permissions are used to upload the selected files to your bucket, whereas s3:GetBucketLocation is used to configure requests based on your bucket's region.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": [
            "s3:ListBucket",
            "s3:ListBucketVersions",
            "s3:PutObjectAcl",
            "s3:PutObject",
            "s3:GetObjectAcl",
            "s3:GetObject",
            "s3:DeleteObjectVersion",
            "s3:DeleteObject",
            "s3:GetBucketLocation"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::<your-bucket-name>/*",
            "arn:aws:s3:::<your-bucket-name>"
          ]
        }
      ]
    }
  3. Click on Next: Tags. We don't require tags to be added, but you can add tags here if you want.

  4. Click on Next: Review. Input a name for the AWS Policy, a description (optional), and click on Create Policy.
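To avoid editing the placeholders by hand, the policy from step 2 can be generated for a given bucket. This is a minimal sketch: `build_bucket_policy` is a hypothetical helper name, while the actions and the two resource ARNs are copied from the policy above.

```python
import json

# Actions copied from the policy document shown in step 2.
ACTIONS = [
    "s3:ListBucket",
    "s3:ListBucketVersions",
    "s3:PutObjectAcl",
    "s3:PutObject",
    "s3:GetObjectAcl",
    "s3:GetObject",
    "s3:DeleteObjectVersion",
    "s3:DeleteObject",
    "s3:GetBucketLocation",
]


def build_bucket_policy(bucket_name: str) -> dict:
    """Return the IAM policy document with the bucket name filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ACTIONS,
                "Effect": "Allow",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",  # objects in the bucket
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                ],
            }
        ],
    }


# "datasaur-test" is the example bucket name used earlier on this page.
print(json.dumps(build_bucket_policy("datasaur-test"), indent=2))
```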

4. Create a role for Datasaur

After we've created a policy for your S3 bucket, we need to attach it to a role which will be assumed by Datasaur to access your bucket.

  1. Back on the IAM management console, go to Roles, then click on Create role.

  2. Choose AWS account in the trusted entity type section.

  3. Click on Custom trust policy under the trusted entity type, then paste the configuration below.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<DATASAUR_AWS_ACCOUNT_ID>:role/<IRSA_ROLE_NAME>"
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "<YOUR_EXTERNAL_ID>"
            }
          }
        }
      ]
    }
  4. Replace the placeholders for the AWS account ID, IRSA role name, and external ID accordingly. Use the displayed AWS Account ID. You can define your own external ID; just be sure to update the value in the external object storage form so that the two match.

  5. In the Add permissions section, pick the policy that we've just created from the previous step. Then, click on Next.

  6. Enter a name and, optionally, a description, then click on Create role.

  7. After that, back on the Roles page, click on your newly created role.

  8. Copy the Role ARN from the page and paste it into the Datasaur Team Settings page.
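The trust policy from step 3 can be filled in the same way. The sketch below assumes a hypothetical helper, `build_trust_policy`, and uses made-up example values for the account ID, IRSA role name, and external ID. Replace them with the values from your own setup.

```python
import json


def build_trust_policy(account_id: str, irsa_role_name: str, external_id: str) -> dict:
    """Return the custom trust policy with all three placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:role/{irsa_role_name}"
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    # Must match the External ID in the external object storage form.
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


# All three arguments below are placeholder examples, not real values.
policy = build_trust_policy("123456789012", "datasaur-irsa-role", "example-external-id")
print(json.dumps(policy, indent=2))
```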

5. Check connection

Before you create the integration, check the connection to make sure your setup is correct. If the check succeeds, you can continue to create the external object storage.

6. Good to go!

You can now create projects using files directly from your S3 bucket, and change the Default Storage option to whichever storage you prefer from the Team Settings page.

If you have any questions or comments, please let us know, and we'll be happy to support you.
