Automatically Copy Files from Amazon S3 to Amazon Redshift

This article shows how to use Amazon Redshift S3 event integration to automatically copy files from an S3 bucket into a Redshift table. When a new file is added to the S3 bucket, Redshift automatically runs a COPY command to load the data into the target table.

About this Use Case

Amazon S3 stores structured and unstructured data. Amazon Redshift is a managed data warehouse for storing, processing, and analyzing structured data.

While Xtract Universal offers both an Amazon S3 destination and an Amazon Redshift destination for writing SAP data to either target environment, there are benefits to using S3 event integration:

  • Only extracting SAP data once reduces load on the SAP source system.
  • Data in S3 is accessible to multiple AWS services (Redshift, Athena, EMR, Glue, Lake Formation).
  • If Redshift is unavailable or under maintenance, auto-copy resumes when Redshift is ready.
  • The S3 event integration tracks loaded files and handles errors automatically.

Set Up Access Policies in AWS

To allow Redshift to access data in S3 buckets:

  1. Add an S3 bucket policy that allows Redshift to read and update notification settings and read metadata on your S3 bucket:

    Bucket Policy
    {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "Auto-Copy-Policy-01",
            "Effect": "Allow",
            "Principal": { "Service": "redshift.amazonaws.com" },
            "Action": [
                "s3:GetBucketNotification",
                "s3:PutBucketNotification",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::amzn-s3-demo-bucket",
            "Condition": {
                "StringLike": {
                    "aws:SourceArn": "arn:aws:redshift:us-east-1:123456789012:integration:*",
                    "aws:SourceAccount": "123456789012"
                }
            }
        }]
    }
    

    Note

    Make sure that the S3 bucket and Redshift S3 event integration use the same AWS Region.
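
    Alternatively, you can apply the bucket policy with the AWS CLI (a minimal sketch; the file name bucket-policy.json is a placeholder for the policy above saved locally):

    AWS CLI
    aws s3api put-bucket-policy \
    --bucket amzn-s3-demo-bucket \
    --policy file://bucket-policy.json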

  2. Attach the following permissions to the Redshift cluster's IAM role:

    IAM Policy
    {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AutoCopyReadId",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket",
                "arn:aws:s3:::amzn-s3-demo-bucket/*"
            ]
        }]
    }
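
    You can attach these permissions as an inline policy with the AWS CLI (a minimal sketch; the role name myRedshiftRole, the policy name AutoCopyRead, and the file name iam-policy.json are placeholders):

    AWS CLI
    aws iam put-role-policy \
    --role-name myRedshiftRole \
    --policy-name AutoCopyRead \
    --policy-document file://iam-policy.json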
    
  3. Verify the IAM trust relationship, so Redshift can assume the role:

    Trust Relationship
    {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": { "Service": ["redshift.amazonaws.com"] },
            "Action": "sts:AssumeRole"
        }]
    }
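
    To create or update the trust policy from the command line, you can use the AWS CLI (a minimal sketch; the file name trust-policy.json is a placeholder for the trust relationship above):

    AWS CLI
    aws iam update-assume-role-policy \
    --role-name myRedshiftRole \
    --policy-document file://trust-policy.json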
    
  4. If the S3 bucket and Redshift use different accounts, a resource policy is required. If the S3 bucket and Redshift cluster are in the same account, skip this step.

    1. Create a resource policy (rs-rp.json) that lets Redshift create and authorize inbound data integrations from the S3 bucket:

      rs-rp.json
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": { "Service": "redshift.amazonaws.com" },
                  "Action": "redshift:AuthorizeInboundIntegration",
                  "Resource": "arn:aws:redshift:us-east-1:123456789012:namespace/cc4ffe56-ad2c-4fd1-a5a2-f29124a56433",
                  "Condition": {
                      "StringEquals": {
                          "aws:SourceArn": "arn:aws:s3:::amzn-s3-demo-bucket"
                      }
                  }
              },
              {
                  "Effect": "Allow",
                  "Principal": { "AWS": "arn:aws:iam::123456789012:role/myRedshiftRole" },
                  "Action": "redshift:CreateInboundIntegration",
                  "Resource": "arn:aws:redshift:us-east-1:123456789012:namespace/cc4ffe56-ad2c-4fd1-a5a2-f29124a56433",
                  "Condition": {
                      "StringEquals": {
                          "aws:SourceArn": "arn:aws:s3:::amzn-s3-demo-bucket"
                      }
                  }
              }
          ]
      }
      
    2. Apply the resource policy using the AWS command-line interface:

      AWS CLI
      aws redshift put-resource-policy \
      --policy file://rs-rp.json \
      --resource-arn "arn:aws:redshift:us-east-1:123456789012:namespace/cc4ffe56-ad2c-4fd1-a5a2-f29124a56433"
      
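      To confirm that the resource policy was applied, you can read it back with the matching get call (a minimal sketch; same placeholder namespace ARN as above):

      AWS CLI
      aws redshift get-resource-policy \
      --resource-arn "arn:aws:redshift:us-east-1:123456789012:namespace/cc4ffe56-ad2c-4fd1-a5a2-f29124a56433"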

Redshift is now allowed to access data from S3 buckets.

Create an S3 Event Integration

To set up an S3 event integration that automatically copies files from your S3 bucket into a Redshift target table:

  1. Open the Amazon Redshift console.
  2. In the S3 event integrations menu, click Create Amazon S3 event integration.
  3. Follow the steps in the setup wizard.
  4. Once the S3 event integration is created, click [Create autocopy job]. The Redshift Query Editor opens.
    [Screenshot: Amazon S3 event integration]
  5. Create a target table in Redshift. Make sure that column names and data types of the target table match the structure of the expected SAP data in the S3 bucket.

    Target Table
    CREATE TABLE public.makt1 (
        mandt CHAR(3),
        matnr VARCHAR(40),
        spras CHAR(1),
        maktx VARCHAR(40),
        maktg VARCHAR(40)
    );
    
  6. Create the auto-copy job:

    COPY Job
    COPY "public"."makt1"
    FROM 's3://amzn-s3-demo-bucket/staging-folder'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyLoadRoleName'
    FORMAT AS PARQUET -- the extraction step below writes Parquet files
    JOB CREATE makt_copy_job_new
    AUTO ON;
    

    [Screenshot: COPY job in the Redshift Query Editor]

This job detects new files in the S3 bucket and loads them automatically into the target table.
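
To monitor the auto-copy job, you can query Redshift's system views (a minimal sketch; SYS_COPY_JOB and SYS_LOAD_ERROR_DETAIL are the documented system views for copy jobs and load errors, though available columns may vary by Redshift release):

    Monitoring
    -- List the auto-copy job created above, including its state
    SELECT * FROM sys_copy_job WHERE job_name = 'makt_copy_job_new';

    -- Check for files that failed to load
    SELECT * FROM sys_load_error_detail LIMIT 10;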

Run SAP Data Extractions

To write your SAP data to the S3 bucket and, via the auto-copy job, to your Redshift target table:

  1. Run an Xtract Universal extraction that writes a Parquet file to your S3 bucket. For more information, see Documentation: Amazon S3 Destination.
  2. Verify that the file is created in the S3 bucket.
    [Screenshot: extracted file in the Amazon S3 bucket]
  3. Verify that the data was auto-copied into your Redshift target table, e.g. with the queries shown below.
    [Screenshot: query result in Redshift]
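
A minimal verification sketch against the target table created earlier:

    Verification
    -- Row count after the auto-copy job has run
    SELECT COUNT(*) FROM public.makt1;

    -- Sample of the loaded material texts
    SELECT * FROM public.makt1 LIMIT 10;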

Tip

For information on how to schedule and automate your Xtract Universal extractions, see Documentation: Call via Scheduler.



Last update: August 11, 2025
Written by: Bharath Gorapalli, Valerie Schipka