Handling Ingestion


How does Sprinkle handle ingestion when there is a schema change in the client DB?

**Adding a new column to the table**

If a new column is added to a table in the client DB and the table is under incremental or complete ingestion mode, ingestion will not fail.

In Sprinkle, if the table is under complete loading, ingestion succeeds and the data is reflected exactly as in the client DB, without any failure.

If the table is under incremental ingestion with an updated_at time column, the change is reflected in Sprinkle without any failure: when a row is updated in the table, its updated_at column is also set to the latest time, so Sprinkle automatically detects that row and re-ingests it.
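The updated_at mechanism above can be sketched as follows. This is a hypothetical illustration, not Sprinkle's actual internals: each incremental run fetches only rows modified after a watermark (the last successful run), so any UPDATE that refreshes updated_at makes the row eligible for re-ingestion.

```python
from datetime import datetime

def incremental_query(table, watermark):
    """Build the extraction query for one incremental run (illustrative only).

    Only rows whose updated_at is newer than the watermark are pulled,
    which is why an updated row is automatically picked up again.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE updated_at > '{watermark.isoformat()}'"
    )

print(incremental_query("orders", datetime(2023, 1, 1)))
```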

However, if the source is Cosmos DB, MongoDB, or another data source where a manual schema is given, the new key will not appear in the table. In that case the user needs to drop the table and re-ingest it with the desired schema.

If the table is under incremental ingestion with some other time column, or under new_rows incremental ingestion, the new column will be null for previously ingested rows. The user needs to drop and recreate the table to backfill it.
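A small hypothetical illustration of that behavior: rows ingested before the column existed keep NULL (None here) for the new column, while rows ingested afterwards carry values.

```python
# Rows ingested before the 'discount' column was added to the source:
old_rows = [{"id": 1, "amount": 10}]
# Rows ingested after the column was added:
new_rows = [{"id": 2, "amount": 20, "discount": 5}]

# The warehouse table now has all three columns; missing values stay NULL.
columns = ["id", "amount", "discount"]
merged = [{c: row.get(c) for c in columns} for row in old_rows + new_rows]
print(merged)
```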

In Sprinkle, even if column positions are interchanged in the source, the schema remains unchanged and the column values are unaffected.

**Dropping a column from the table**

If a column is dropped from the table in the client DB and the table is under incremental or complete ingestion mode, ingestion will not fail.

If the table is under incremental loading with updated_at or some other time column, the dropped column contains null values in new rows, and it continues to appear until the next complete ingestion runs (for example, nightly or weekly). If the source is Cosmos DB, MongoDB, or another data source where a manual schema is given, the dropped column is likewise null for new rows, and the user needs to drop the table and re-ingest it with the desired schema.

**Change in the data type of a column in the table**

If the data type of a column changes in the client DB table and the table is under incremental or complete ingestion mode, ingestion will not fail.

In Sprinkle, whether the table is under complete or incremental loading, the column's data type changes to match the client DB. However, if the source is Cosmos DB, MongoDB, or another data source where a manual schema is given, data is ingested according to that schema, so the user needs to drop the table and re-ingest it with the desired schema.

**Change of column case in the table**

For a table in a Hive warehouse's client database, changing a column name from lowercase to uppercase (or vice versa) causes no change in Sprinkle, for both completely and incrementally ingested tables: column names are always stored in lowercase.
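Why a case-only rename is a no-op can be seen with a short sketch. This is an illustration of the lowercasing behavior described above, not Sprinkle code: if every incoming column name is normalized to lowercase, a case-only rename in the source maps to the same warehouse column.

```python
# Column names before and after a case-only rename in the source DB:
source_columns_before = ["OrderID", "Customer_Name"]
source_columns_after = ["ORDERID", "customer_name"]

# Lowercase normalization makes both versions identical in the warehouse.
normalize = lambda cols: [c.lower() for c in cols]
print(normalize(source_columns_before) == normalize(source_columns_after))
```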

**Fetching an Explore result — Python (basic auth)**

```python
import requests
from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth('<API_KEY>', '<API_SECRET>')
# Pass the credentials via the auth keyword argument.
response = requests.get(
    "https://<hostname>/api/v0.4/explore/streamresult/<EXPLORE_ID>",
    auth=auth,
)

print(response.content)
```

**Fetching an Explore result — R**

```r
library('httr')

username = '<API_KEY>'
password = '<API_SECRET>'

temp = GET("https://<hostname>/api/v0.4/explore/streamresult/<EXPLORE_ID>",
           authenticate(username, password, type = "basic"))

temp = content(temp, 'text')
temp = textConnection(temp)
temp = read.csv(temp)
```

**Fetching an Explore result — SAS**

```sas
/* Download the data */
filename resp temp;
proc http
   url="https://<hostname>/api/v0.4/explore/streamresult/<EXPLORE_ID>"
   method="GET"
   webusername="<API_KEY>"
   webpassword="<API_SECRET>"
   out=resp;
run;

/* Import the data into a CSV dataset */
proc import
   file=resp
   out=csvresp
   dbms=csv;
run;

/* Print the data */
proc print data=csvresp;
run;
```

**Creating a CSV data source — Python (basic auth)**

```python
import requests
import json

url = 'http://<hostname>/api/v0.4/createCSV'

username = 'API_KEY'
password = 'API_SECRET'

files = {'file': open('FILE_PATH.csv', 'rb')}
values = {'projectname': 'PROJECT_NAME', 'name': 'CSV_DATASOURCE_NAME'}

r = requests.post(url, files=files, data=values, auth=(username, password))

res_json = json.loads(r.text)

print(res_json['success'])
```

**Updating a CSV data source — Python (basic auth)**

```python
import requests
import json

url = 'http://<hostname>/api/v0.4/updateCSV'

username = 'API_KEY'
password = 'API_SECRET'

files = {'file': open('FILE_PATH.csv', 'rb')}
values = {'projectname': 'PROJECT_NAME', 'name': 'CSV_DATASOURCE_NAME'}

r = requests.post(url, files=files, data=values, auth=(username, password))

res_json = json.loads(r.text)

print(res_json['success'])
```

**Fetching an Explore result as raw text — Python (basic auth)**

```python
import requests

url = 'https://<hostname>/api/v0.4/explore/streamresult/<EXPLORE_ID>'

username = 'API_KEY'
password = 'API_SECRET'

r = requests.get(url, auth=(username, password))
print(r)
print(r.text)
```

**Fetching an Explore result into pandas — Python (API-key header)**

```python
import requests
import pandas as pd
import io

url = 'https://<hostname>/api/v0.4/explore/streamresult/<EXPLORE_ID>'

secret = 'API_SECRET'

r = requests.get(url, headers={'Authorization': 'SprinkleUserKeys ' + secret})

df = pd.read_csv(io.StringIO(r.text), sep=',')
```

**Fetching a Segment result into pandas — Python (API-key header)**

```python
import requests
import pandas as pd
import io

url = 'https://<hostname>/api/v0.4/segment/streamresult/<SEGMENT_ID>'

secret = 'API_SECRET'

r = requests.get(url, headers={'Authorization': 'SprinkleUserKeys ' + secret})

df = pd.read_csv(io.StringIO(r.text), sep=',')
```

**Creating a CSV data source — Python (API-key header)**

```python
import requests
import json

url = 'http://<hostname>/api/v0.4/createCSV'

files = {'file': open('path/file.csv', 'rb')}
values = {'projectname': 'PROJECT_NAME', 'name': 'csv_datasource_name/table_name'}

secret = 'API_SECRET'

r = requests.post(url, files=files, data=values,
                  headers={'Authorization': 'SprinkleUserKeys ' + secret})

res_json = json.loads(r.text)
```

**Updating a CSV data source — Python (API-key header)**

```python
import requests
import json

url = 'http://<hostname>/api/v0.4/updateCSV'

files = {'file': open('path/file.csv', 'rb')}
values = {'projectname': 'PROJECT_NAME', 'name': 'csv_datasource_name/table_name'}

secret = 'API_SECRET'

r = requests.post(url, files=files, data=values,
                  headers={'Authorization': 'SprinkleUserKeys ' + secret})

res_json = json.loads(r.text)
```