Automate AWS RedShift Snapshot And Restore

Amazon Redshift handles massive data warehouse workloads. I used to manage a few Redshift clusters. Whenever the developers or I wanted to test something on Redshift, we would take a snapshot and launch a new cluster from it, or launch one from an automated snapshot. This is fine for ad hoc workloads. But if your developers want to run their sample queries against updated data every day, doing this by hand becomes a headache for AWS admins. So I prepared a shell script to automate the process; it also sends an email alert when any step fails. This script will help you automate AWS Redshift snapshot and restore.

How does this work?

I created this shell script, which uses the AWS CLI. The flow of the process is:

  • Remove the old Dev/Test cluster (which was created yesterday).
  • Take a snapshot of the current Prod cluster.
  • Wait for the snapshot to complete.
  • Launch a new cluster from the snapshot.
  • Wait for the cluster creation to complete.
  • Delete the snapshot, older than one day, that this script created.
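
The flow above can be sketched as a small wrapper that runs each step in order and reports the first failure. This is only a sketch: `run_step` and `notify_failure` are hypothetical helper names, not part of the original script, and the real steps would be the AWS CLI commands shown later in this post.

```shell
#!/bin/bash
# Hypothetical helper: report a failed step (here it only prints to stderr;
# the real script sends an email instead).
notify_failure() {
  echo "FAILED: $1" >&2
}

# Hypothetical helper: run one named step.
# Prints "OK: <name>" on success; on failure it reports the step and returns 1.
run_step() {
  local name="$1"
  shift
  if "$@"; then
    echo "OK: $name"
  else
    notify_failure "$name"
    return 1
  fi
}
```

Each step would then be called as `run_step "Take snapshot" aws redshift create-cluster-snapshot ... || exit 1`, so the script stops as soon as one step fails.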

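Prerequisites:

  1. Create an IAM user with access and secret keys, then attach the policy below. It covers the Redshift calls the script makes (delete, snapshot, restore, describe) and SES for the email alerts.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ses:SendEmail",
                "ses:SendRawEmail",
                "redshift:CreateClusterSnapshot",
                "redshift:DeleteClusterSnapshot",
                "redshift:DescribeClusterSnapshots",
                "redshift:CreateCluster",
                "redshift:DeleteCluster",
                "redshift:DescribeClusters",
                "redshift:RestoreFromClusterSnapshot"
            ],
            "Resource": "*"
        }
    ]
}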

  2. Install the AWS CLI.
Follow the AWS documentation to configure the AWS CLI.
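
One way to hand the IAM user's keys to the AWS CLI is through its standard environment variables. This is a sketch and an assumption about how the script is run, not necessarily how the original script does it; the values are placeholders that you replace with your own.

```shell
# Standard AWS CLI environment variables; the values are placeholders.
export AWS_ACCESS_KEY_ID='YOUR_ACCESS_KEY'
export AWS_SECRET_ACCESS_KEY='YOUR_SECRET_KEY'
# Default region for commands that do not pass --region.
export AWS_DEFAULT_REGION='REDSHIFT-REGION'
```

After setting real values, `aws sts get-caller-identity` should print the IAM user, which confirms the credentials are picked up.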

Before running the script:

Replace the placeholder values listed below with your own.

YOUR_ACCESS_KEY : IAM user’s Access Key
YOUR_SECRET_KEY : IAM user’s Secret Key
prod-cluster : Prod/Main cluster name
dev-cluster : New Test/Dev cluster name
REDSHIFT-REGION : Region where your cluster is located
ses-region : Region for your SES
[email protected] : From address for SES (this must be a verified address)
[email protected],[email protected] : Recipients of the email notification
default.redshift-1.0 : If you use a custom parameter group, replace this with its name
"sg-id1" "sg-id2" : Security group IDs to attach to the Redshift cluster

Parameters

Snapdate=`date +%Y-%m-%d-%H-%M-%S`
SourceRedshift='prod-cluster'
DestRedshift='dev-cluster'
Region='REDSHIFT-REGION'

Drop old TEST/DEV Cluster:

aws redshift delete-cluster \
--region "$Region" \
--cluster-identifier "$DestRedshift" \
--skip-final-cluster-snapshot
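
The delete call returns immediately, while the cluster takes some time to disappear, and a new cluster cannot reuse the name until it is gone. A minimal sketch of a delete that blocks until the cluster is removed, using the AWS CLI waiter `aws redshift wait cluster-deleted`, could look like this. The function name `drop_cluster` and its arguments (region, cluster name) are hypothetical, and it assumes the IAM user is also allowed to call `redshift:DescribeClusters`.

```shell
# Hypothetical helper: delete a cluster and wait until it is gone.
# Usage: drop_cluster <region> <cluster-identifier>
drop_cluster() {
  local region="$1" cluster="$2"
  # describe-clusters fails when the cluster does not exist,
  # e.g. on the very first run; treat that as "nothing to do".
  if ! aws redshift describe-clusters --region "$region" \
        --cluster-identifier "$cluster" >/dev/null 2>&1; then
    echo "No cluster named $cluster, skipping delete"
    return 0
  fi
  aws redshift delete-cluster --region "$region" \
    --cluster-identifier "$cluster" --skip-final-cluster-snapshot >/dev/null || return 1
  # Poll until the cluster no longer exists (the waiter gives up after a fixed number of checks).
  aws redshift wait cluster-deleted --region "$region" --cluster-identifier "$cluster"
}
```

With the parameters above it would be called as `drop_cluster "$Region" "$DestRedshift"`.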

Initiate the Snapshot of PROD/MAIN cluster

aws redshift create-cluster-snapshot \
--region "$Region" \
--cluster-identifier "$SourceRedshift" \
--snapshot-identifier "$SourceRedshift-refresh-snap-$Snapdate"
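
The snapshot has to finish before it can be restored. As a sketch of the "wait for snapshot" step, the AWS CLI waiter `aws redshift wait snapshot-available` can block until the snapshot is ready. The function name `take_snapshot` is hypothetical; it follows the same naming scheme for the snapshot identifier as the command above and prints the identifier so a later step can use it.

```shell
# Hypothetical helper: snapshot a cluster, wait for it, and print the snapshot name.
# Usage: take_snapshot <region> <cluster-identifier>
take_snapshot() {
  local region="$1" cluster="$2"
  local snapshot="$cluster-refresh-snap-$(date +%Y-%m-%d-%H-%M-%S)"
  aws redshift create-cluster-snapshot --region "$region" \
    --cluster-identifier "$cluster" --snapshot-identifier "$snapshot" >/dev/null || return 1
  # Poll until the snapshot status is "available"; the waiter exits non-zero
  # if it gives up first, which can happen with very large snapshots.
  aws redshift wait snapshot-available --region "$region" \
    --snapshot-identifier "$snapshot" || return 1
  echo "$snapshot"
}
```

A caller could capture the name with `Snapshot=$(take_snapshot "$Region" "$SourceRedshift")` and pass it to the restore step.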

Restore the Snapshot

aws redshift restore-from-cluster-snapshot \
--region "$Region" \
--cluster-identifier "$DestRedshift" \
--snapshot-identifier "$SourceRedshift-refresh-snap-$Snapdate" \
--cluster-subnet-group-name reshiftsubnet \
--cluster-parameter-group-name default.redshift-1.0 \
--vpc-security-group-ids "sg-id1" "sg-id2"
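
The restore command also returns before the new cluster is usable. A sketch of the "wait for creation" step, using the AWS CLI waiter `aws redshift wait cluster-available`, is shown below. The function name `wait_for_cluster` and the retry count are assumptions; the retry loop is there because a single waiter call gives up after a fixed number of polls, and restoring a large cluster may take longer than that.

```shell
# Hypothetical helper: wait until a cluster is available, retrying the waiter.
# Usage: wait_for_cluster <region> <cluster-identifier> [max-waiter-runs]
wait_for_cluster() {
  local region="$1" cluster="$2" tries="${3:-3}"
  local attempt=1
  while [ "$attempt" -le "$tries" ]; do
    # The waiter returns 0 once the cluster status is "available".
    if aws redshift wait cluster-available --region "$region" \
         --cluster-identifier "$cluster"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, `wait_for_cluster "$Region" "$DestRedshift" 5` would run the waiter up to five times before reporting failure.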

Delete the old snapshot (created by this script):

# Identifier prefix of the snapshot this script took yesterday (GNU date syntax)
Deldate=$SourceRedshift-refresh-snap-`date -d "1 day ago" +%Y-%m-%d`
# List snapshot identifiers in the Redshift region, keep the first match, strip JSON quotes, commas and spaces
Delsnap=$(aws redshift describe-cluster-snapshots --region $Region --query 'Snapshots[].SnapshotIdentifier' --output json | grep $Deldate | sed -n '1p' | sed 's|[", ]||g')
aws redshift delete-cluster-snapshot \
--region $Region \
--snapshot-identifier $Delsnap
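
The commands above remove a single snapshot. If the script ran more than once on the same day, several snapshots share the same date prefix. As an alternative sketch, the filtering can be done by the CLI itself with a JMESPath `starts_with` query, followed by a loop over every match. The function name `delete_old_snapshots` is hypothetical.

```shell
# Hypothetical helper: delete every manual snapshot whose identifier starts with a prefix.
# Usage: delete_old_snapshots <region> <identifier-prefix>
delete_old_snapshots() {
  local region="$1" prefix="$2" ids id
  # --output text prints the matching identifiers separated by whitespace.
  ids=$(aws redshift describe-cluster-snapshots --region "$region" \
          --snapshot-type manual \
          --query "Snapshots[?starts_with(SnapshotIdentifier, '$prefix')].SnapshotIdentifier" \
          --output text) || return 1
  for id in $ids; do
    aws redshift delete-cluster-snapshot --region "$region" \
      --snapshot-identifier "$id" >/dev/null || return 1
    echo "Deleted $id"
  done
}
```

It would be called with the same prefix as above, for example `delete_old_snapshots "$Region" "$Deldate"`.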

The complete script with email alert:
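
The full script is not reproduced here. As a rough sketch of the alert part only, a failure email can be sent with `aws ses send-email`. The helper name `send_alert`, the variable names `SesRegion`, `MailFrom` and `MailTo`, and the example.com addresses are all assumptions for illustration; use your SES region, your verified sender, and your own recipients.

```shell
# Placeholder settings (assumed names); replace with your own values.
SesRegion='ses-region'
MailFrom='sender@example.com'
MailTo='first@example.com,second@example.com'

# Hypothetical helper: send a plain-text alert through SES.
# Usage: send_alert <subject> <body>
send_alert() {
  local subject="$1" body="$2"
  # --to expects space-separated addresses, so turn the commas into spaces
  # and leave the expansion unquoted on purpose.
  aws ses send-email \
    --region "$SesRegion" \
    --from "$MailFrom" \
    --to $(echo "$MailTo" | tr ',' ' ') \
    --subject "$subject" \
    --text "$body"
}
```

Each step of the flow can then be guarded, for example `aws redshift delete-cluster ... || { send_alert "Redshift refresh failed" "Deleting the Dev cluster failed"; exit 1; }`.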
