Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.

Sprinkle Data integrates with Amazon Athena as a serverless warehouse. Athena runs standard SQL, is built on Presto, and provides fast, interactive performance even on large data sets.

Let’s dive into configuring the Amazon Athena data warehouse in Sprinkle.

Admin -> Drivers

The Drivers tab routes the user to the Storage, Warehouse, and Compute tabs.

Driver list

Create new warehouse -> Amazon Athena

On clicking Create new warehouse, the screen displays a number of data warehousing options; Amazon Athena is selected in this case.

Create Athena Warehouse

Athena Setup

If you are setting up Athena in your AWS account for the first time, follow these steps (a programmatic sketch follows the list):

  1. Open the Athena service on the console for the region of your interest, for example https://ap-south-1.console.aws.amazon.com/athena/home?region=ap-south-1.
  2. Set up the ‘Query result location’ for storing query outputs in Athena, for example s3://query-results-bucket/folder/. (Don’t forget the trailing / (slash); the path won’t be recognized otherwise.) Note down this path for later use.
  3. Create a database using a ‘CREATE DATABASE <dbname>’ query in the Query editor.
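
For reference, steps 2 and 3 can also be done programmatically. This is a minimal sketch using boto3 (an assumption, not part of the Sprinkle setup); the region, bucket path, and database name are placeholders to replace with your own.

import boto3

# Athena client for the region of interest (placeholder region)
athena = boto3.client('athena', region_name='ap-south-1')

# Create the database, writing query output to the result location from step 2
response = athena.start_query_execution(
    QueryString='CREATE DATABASE IF NOT EXISTS my_sprinkle_db',
    ResultConfiguration={'OutputLocation': 's3://query-results-bucket/folder/'},
)
print(response['QueryExecutionId'])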

Purging of S3 output location

To keep the query result location from growing indefinitely, create an S3 lifecycle rule that purges old query outputs (a boto3 sketch follows the list):

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
  2. In the Buckets list, choose the name of the bucket that you want to create a lifecycle rule for.
  3. Choose the Management tab, and choose Create lifecycle rule.
  4. In Lifecycle rule name, enter a name for your rule. The name must be unique within the bucket.
  5. Set the scope of the lifecycle rule to Limit the scope of this rule using one or more filters. To limit the scope by prefix, in Prefix, enter the prefix (the folder name specified in the Athena Query result location).
  6. Under Lifecycle rule actions, choose the following actions for this rule:
  • Expire current versions of objects
  • Permanently delete previous versions of objects
  7. To expire current versions of objects, under Expire current versions of objects, in Number of days after object creation, enter 1.
  8. To permanently delete previous versions of objects, under Permanently delete previous versions of objects, in Number of days after objects become previous versions, enter 1.
  9. Verify the rule once more and choose Create rule.
  10. The created rule will appear under Lifecycle rules.
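
The same lifecycle rule can be created programmatically. A minimal boto3 sketch, assuming the bucket and prefix from the Athena setup above:

import boto3

s3 = boto3.client('s3')

# One rule scoped to the Athena query result prefix, expiring objects after a day
s3.put_bucket_lifecycle_configuration(
    Bucket='query-results-bucket',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'expire-athena-query-results',
                'Filter': {'Prefix': 'folder/'},
                'Status': 'Enabled',
                'Expiration': {'Days': 1},
                'NoncurrentVersionExpiration': {'NoncurrentDays': 1},
            }
        ]
    },
)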

Sprinkle Driver configuration

Athena warehouse configuration on Sprinkle requires the following parameters (a connectivity check sketch follows the list):

  1. Athena JDBC URL: Configure the URL as 'jdbc:awsathena://athena.<region>.amazonaws.com:443/', replacing <region> with the region in which Athena is set up on your AWS account.
  2. Provide the Access key and Secret key for accessing Athena. The access key should have the AmazonS3FullAccess and AmazonAthenaFullAccess permissions/policies enabled. Detailed documentation on creating keys is available at https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys.
  3. For access with limited permissions, refer to Database Level Access in Athena below.
  4. S3 output location: The path of the Amazon S3 location where you want to store query results, prefixed by s3://. This is the same path given in your Athena setup.
  5. Schema: The name of the database to use in Sprinkle when a schema is not explicitly specified in a query.
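
Before saving the driver, you can optionally verify that the keys, output location, and schema work outside Sprinkle. A minimal sketch using the PyAthena library (an assumption, not something Sprinkle requires); all values are placeholders:

from pyathena import connect

# Connect with the same values you will enter in the Sprinkle driver
conn = connect(
    aws_access_key_id='<ACCESS_KEY>',
    aws_secret_access_key='<SECRET_KEY>',
    s3_staging_dir='s3://query-results-bucket/folder/',  # S3 output location
    region_name='<region>',
    schema_name='<dbname>',  # default schema
)

cursor = conn.cursor()
cursor.execute('SELECT 1')
print(cursor.fetchall())  # [(1,)] confirms the configuration works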

Database Level Access in Athena

Steps to be followed

  1. Create new database in Athena
  2. Create IAM Policy
  3. Create IAM User
  4. Create Access key
  5. Update Sprinkle driver

Create Athena Database

  1. Log in to the AWS console and go to Athena.
  2. Run the following command in the Query editor, updating the database name:

CREATE DATABASE <DB_NAME>

Create IAM Policy

  1. Log in to the AWS console, go to IAM, and select Policies from the left pane.
  2. Click Create policy and go to the JSON tab.
  3. Paste the following policy into the editor, updating the region, database name, account number, and S3 bucket name.

Access Policy

" {
   "Version": "2012-10-17",
   "Statement": [
       {
           "Sid": "VisualEditor0",
           "Effect": "Allow",
          "Action": [
              "glue:GetDatabase",
              "glue:GetDatabases",
              "glue:GetPartition",
              "glue:CreateTable",
              "glue:GetTables",
              "glue:GetPartitions",
              "glue:CreateDatabase",
              "glue:UpdateTable",
              "glue:DeleteTable",
              "glue:CreatePartition",
              "glue:DeletePartition",
              "glue:UpdatePartition",
              "glue:GetTable",
              "athena:StartQueryExecution",
              "athena:GetQueryExecution",
              "athena:GetQueryResults",
              "athena:GetQueryResultsStream"
          ],


           "Resource": [
               "arn:aws:glue:<REGION>:<ACCOUNT_NO>:catalog",
               "arn:aws:glue:<REGION>:<ACCOUNT_NO>:database/<DB_NAME>",
               "arn:aws:glue:<REGION>:<ACCOUNT_NO>:table/<DB_NAME>/*",
               "arn:aws:athena:<REGION>:<ACCOUNT_NO>:workgroup/primary"
           ]
       },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:GetBucketLocation",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::<BUCKET_NAME>/*",
                "arn:aws:s3:::<BUCKET_NAME>"
            ]
        }
   ]
} "

Create IAM user

  1. Log in to the AWS console, go to IAM, and select Users from the left pane.
  2. Click Add user, provide a user name, and select the access type as Programmatic access.
  3. Attach the policy created for Athena access via Attach existing policies directly.

Create Access key

  1. Go to the IAM users list and select the newly created user.
  2. Go to the Security credentials tab and click Create access key.
  3. This creates the access key. Download the CSV file and click Show secret to copy the secret. (The policy, user, and key steps can also be scripted; see the sketch below.)
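
For teams that prefer scripting, here is a minimal boto3 sketch of the policy, user, and access key steps. The local policy file and the user and policy names are assumptions; the secret is returned only once, so store it safely:

import boto3

iam = boto3.client('iam')

# Create the policy from the JSON document above, saved locally as athena_policy.json
with open('athena_policy.json') as f:
    policy = iam.create_policy(
        PolicyName='sprinkle-athena-access',
        PolicyDocument=f.read(),
    )

# Create the user and attach the policy
iam.create_user(UserName='sprinkle-athena-user')
iam.attach_user_policy(
    UserName='sprinkle-athena-user',
    PolicyArn=policy['Policy']['Arn'],
)

# Create the access key pair to paste into the Sprinkle driver
key = iam.create_access_key(UserName='sprinkle-athena-user')
print(key['AccessKey']['AccessKeyId'])
print(key['AccessKey']['SecretAccessKey'])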

import requests
from requests.auth import HTTPBasicAuth

# Stream an explore's result from the Sprinkle API using basic auth
auth = HTTPBasicAuth('<API_KEY>', '<API_SECRET>')
response = requests.get("https://<hostname>/api/v0.4/explore/streamresult/<EXPLORE_ID>", auth=auth)

print(response.content)

library('httr')

# Stream an explore's result from the Sprinkle API using basic auth
username = '<API KEY>'
password = '<API SECRET>'

temp = GET("https://<hostname>/api/v0.4/explore/streamresult/<EXPLORE ID>",
           authenticate(username, password, type = "basic"))

# Parse the CSV response body into a data frame
temp = content(temp, 'text')
temp = textConnection(temp)
temp = read.csv(temp)

/* Download the data */

filename resp temp;
proc http
   url="https://<hostname>/api/v0.4/explore/streamresult/<EXPLORE ID>"
   method="GET"
   webusername="<API KEY>"
   webpassword="<API SECRET>"
   out=resp;
run;

/* Import the data into a CSV dataset */
proc import
   file=resp
   out=csvresp
   dbms=csv;
run;

/* Print the data */
proc print data=csvresp;
run;

import requests
import json

# Create a new CSV data source in Sprinkle by uploading a file
url = 'http://<hostname>/api/v0.4/createCSV'

username = 'API_KEY'
password = 'API_SECRET'

files = {'file': open('FILE_PATH.csv', 'rb')}
values = {'projectname': 'PROJECT_NAME', 'name': 'CSV_DATASOURCE_NAME'}

r = requests.post(url, files=files, data=values, auth=(username, password))

res_json = json.loads(r.text)

print(res_json['success'])

import requests
import json

# Update an existing CSV data source by uploading a replacement file
url = 'http://<hostname>/api/v0.4/updateCSV'

username = 'API_KEY'
password = 'API_SECRET'

files = {'file': open('FILE_PATH.csv', 'rb')}
values = {'projectname': 'PROJECT_NAME', 'name': 'CSV_DATASOURCE_NAME'}

r = requests.post(url, files=files, data=values, auth=(username, password))

res_json = json.loads(r.text)

print(res_json['success'])

import requests

# Stream an explore's result as CSV text using basic auth
url = 'https://<hostname>/api/v0.4/explore/streamresult/<EXPLORE ID>'

username = 'API_KEY'
password = 'API_SECRET'

r = requests.get(url, auth=(username, password))
print(r)
print(r.text)

import requests
import pandas as pd

# List the explores in a space (folder) as a DataFrame
url = 'https://<hostname>/api/v0.4/explores/infoByFolder/<SPACE_ID>'

username = 'API_KEY'
password = 'API_SECRET'

r = requests.get(url, auth=(username, password)).json()
df = pd.DataFrame(r)
print(df)

import requests
import pandas as pd

# List the organization's folders (spaces) with their names and ids
url = 'https://<hostname>/api/v0.4/folders/byOrgName/<ORG_NAME>'

username = 'API_KEY'
password = 'API_SECRET'

r = requests.get(url, auth=(username, password)).json()
df = pd.DataFrame(r)
print(df.loc[:, ['name', 'id']])

import requests
import pandas as pd
import io

# Stream an explore's result using SprinkleUserKeys header authentication
url = 'https://<hostname>/api/v0.4/explore/streamresult/<EXPLORE ID>'

secret = 'API_SECRET'

r = requests.get(url, headers={'Authorization': 'SprinkleUserKeys ' + secret})

# Load the CSV response into a DataFrame
df = pd.read_csv(io.StringIO(r.text), sep=',')

import requests
import pandas as pd
import io

# Stream a segment's result using SprinkleUserKeys header authentication
url = 'https://<hostname>/api/v0.4/segment/streamresult/<SEGMENT ID>'

secret = 'API_SECRET'

r = requests.get(url, headers={'Authorization': 'SprinkleUserKeys ' + secret})

# Load the CSV response into a DataFrame
df = pd.read_csv(io.StringIO(r.text), sep=',')

import requests
import json

# Create a CSV data source using SprinkleUserKeys header authentication
url = 'http://<hostname>/api/v0.4/createCSV'

files = {'file': open('path/file.csv', 'rb')}
values = {'projectname': 'PROJECT_NAME', 'name': 'csv_datasource_name/table_name'}

secret = 'API_SECRET'

r = requests.post(url, files=files, data=values, headers={'Authorization': 'SprinkleUserKeys ' + secret})

res_json = json.loads(r.text)

import requests
import json

# Update a CSV data source using SprinkleUserKeys header authentication
url = 'http://<hostname>/api/v0.4/updateCSV'

files = {'file': open('path/file.csv', 'rb')}
values = {'projectname': 'PROJECT_NAME', 'name': 'csv_datasource_name/table_name'}

secret = 'API_SECRET'

r = requests.post(url, files=files, data=values, headers={'Authorization': 'SprinkleUserKeys ' + secret})

res_json = json.loads(r.text)