Redshift
Amazon Redshift is a data warehouse product which forms part of the larger cloud-computing platform Amazon Web Services. Amazon Redshift is based on an older version of PostgreSQL and Redshift has made changes to that version.An initial preview beta was released in November 2012 and a full release was made available on February 15, 2013.
Sprinkle now integrates with Redshift and moves your data to your preferred data warehouse.
Now lets see how users can configure Redshift in the Sprinkle platform.
First the users need to go on Admin -> Drivers.
Admin tab consists of a number of subcategories of which Driver is one. The drivers tab routes the user to Storage, Warehouse and Compute tabs.
On clicking the Create new warehouse, the pop up displays a number of data warehouses. In this case, Redshiftis selected.
This will take the users to a new page wherein he needs to provide the details as requested.
Copy the jdbc url for your Redshift cluster with the following steps. More documentation at https://docs.aws.amazon.com/redshift/latest/mgmt/configuring-connections.html#connecting-connection-string.
1. Redshift console -> Clusters -> Choose the cluster
2. Copy the JDBC url from General Information. It will be of the following form jdbc:redshift://xxxxx.xxxxxxxxxx.<region>.redshift.amazonaws.com:5439/<databasename>
The user is required to specify a master user name, master user password and spectrum schema.
The spectrum schema can be created with the following steps. For more detailed information users can visit https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html
- Create IAM role with following steps
IAM console -> Create role
Choose AWS service ->Redshift -> Redshift-Customizable
Choose AmazonS3FullAccess and AmazonAthenaFullAccess policies. If you want to use granular permissions, create a new Policy with following permissions.
{ "Effect": "Allow", "Action": [ "glue:BatchCreatePartition", "glue:UpdateDatabase", "glue:CreateTable", "glue:DeleteDatabase", "glue:GetTables", "glue:GetPartitions", "glue:BatchDeletePartition", "glue:UpdateTable", "glue:BatchGetPartition", "glue:DeleteTable", "glue:GetDatabases", "glue:GetTable", "glue:GetDatabase", "glue:GetPartition", "glue:CreateDatabase", "glue:BatchDeleteTable", "glue:CreatePartition", "glue:DeletePartition", "glue:UpdatePartition" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:ListBucketMultipartUploads", "s3:ListBucket", "s3:DeleteObject", "s3:GetBucketLocation", "s3:ListMultipartUploadParts" ], "Resource": [ "arn:aws:s3:::bucketname/*", "arn:aws:s3:::bucketname" ] }
Create role by specifying name : SprinkleSpectrumRole
Copy the role ARN arn:aws:iam::
294374035562:role/SprinkleSpectrumRole
- Associate IAM role with Redshift cluster
- Redshift console -> Clusters -> Choose the cluster
- Actions -> Manage IAM roles
- Choose the IAM role created in Step 1 -> Done
- Create external schema named sprinkle_spectrum using following command , replace iam_role with ARN copied in step 1
create external schema sprinkle_spectrum
from data catalog
database 'dev'
iam_role 'arn:aws:iam::~~294374035562~~:role/SprinkleSpectrumRole'
create external database if not exists;
Once these details are provided users need to click on “Create”and a new Redshift data warehouse will be created, provided all the details provided are correct. If a user wants to check if the details are correct, he can click on “Test Connection” before clicking on “Create”.