Google Cloud Platform (GCP) is Google's suite of cloud computing services, running on the same infrastructure that Google uses internally for its end-user products. Google Cloud Storage (GCS), the object storage service within GCP, can be used as a data source in Sprinkle.
Sprinkle supports a wide range of data sources. Clicking the “+” sign brings up a list of data sources. Select Google Cloud Storage, then name and create the new data source.
After the data source is named, the Connection tab asks for the Bucket Name and the Private Key JSON. Test the connection to check that the credentials are valid before updating.
Bucket Name is the name of the storage bucket created in GCP, for example, twx-bucket (GCS bucket names use lowercase letters, digits, dashes, underscores, and dots).
To access the storage, provide the private key in JSON format. Follow the documentation at https://bit.ly/39tmz9H to generate a private key.
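Before pasting the key into the connection tab, it can help to check locally that the file is well-formed. The following is a minimal sketch, not part of Sprinkle: the function name `check_private_key_json` is hypothetical, and the field list assumes the usual layout of a GCP service account key file (`type`, `project_id`, `private_key`, `client_email`). It does not contact GCP or prove that the key has access to the bucket.

```python
import json

# Fields normally present in a GCP service account key file (assumed layout).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_private_key_json(text):
    """Return a list of problems found in the key JSON (empty if none found)."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value must be an object"]

    # Report every required field that is absent or empty.
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)
    ]
    # A service account key declares its type explicitly.
    if key.get("type") and key["type"] != "service_account":
        problems.append('field "type" should be "service_account"')
    return problems
```

An empty list only means the file looks structurally complete. Testing the connection in Sprinkle remains the real check.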
In the Datasets tab, specify a table name and select the type of ingestion: complete or incremental. Complete ingestion loads the entire data set at once, irrespective of any pre-existing data, which takes significant time when the data is large. Incremental ingestion loads only the data that is new since the previous load.
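The difference between the two ingestion modes can be illustrated with a small sketch. It is only a conceptual model: how Sprinkle actually detects new data is not described here, so the use of a file modification timestamp, the function name `select_files`, and the `last_ingested_at` checkpoint are all assumptions made for illustration.

```python
def select_files(files, mode, last_ingested_at=None):
    """Pick which files to load.

    files: list of (path, modified_at) tuples; modified_at is any comparable
    value (an int or a datetime). Returns the list of paths to ingest.
    """
    if mode == "complete":
        # Complete ingestion: load everything, ignoring what was loaded before.
        return [path for path, _ in files]
    if mode == "incremental":
        if last_ingested_at is None:
            # First run: nothing has been loaded yet, so take everything.
            return [path for path, _ in files]
        # Later runs: only files changed after the previous load.
        return [path for path, ts in files if ts > last_ingested_at]
    raise ValueError(f"unknown ingestion mode: {mode!r}")
```

With three files modified at times 1, 2, and 3 and a checkpoint of 2, the complete mode selects all three files while the incremental mode selects only the last one.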
After selecting the ingestion mode, select the File Type: ORC, JSON, CSV, or PARQUET. Optionally, define a directory path so that all the files under that path are pulled, for example: gs://test-sprinkle-bucket/sprinkle//bigquery/datasource/big
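A directory path of this form consists of a bucket name followed by a prefix inside the bucket. The sketch below shows one way to split such a path and to pick the object names that fall under the directory. The helper names `split_gcs_path` and `files_under` are hypothetical, and the prefix matching is an assumption about what "all the files in that path" means, not a description of Sprinkle's internals.

```python
def split_gcs_path(path):
    """Split 'gs://bucket/dir/sub' into ('bucket', 'dir/sub')."""
    scheme = "gs://"
    if not path.startswith(scheme):
        raise ValueError(f"not a gs:// path: {path!r}")
    bucket, _, directory = path[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in: {path!r}")
    return bucket, directory


def files_under(object_names, directory):
    """Return the object names located under the given directory prefix."""
    # An empty directory means "the whole bucket".
    prefix = directory.rstrip("/") + "/" if directory else ""
    return [name for name in object_names if name.startswith(prefix)]
```

Appending the trailing slash before matching keeps a directory such as `data` from also matching a sibling such as `database`.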
In the Ingestion Jobs tab, the concurrency (the number of tables that can run in parallel, up to a maximum of 7) can optionally be set before running the job. Once the job completes, its status is updated in the tab below. Jobs can also be set to run automatically by enabling Autorun. By default, the frequency is set to every night. To change the frequency, click More --> Autorun --> Change Frequency.
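The effect of the concurrency setting can be pictured as a worker pool that processes at most that many tables at once. The following is a rough sketch under that assumption. `run_ingestion` and `ingest_table` are hypothetical names, and the thread pool is a stand-in rather than Sprinkle's actual scheduler. Only the upper limit of 7 comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 7  # upper limit stated in the documentation


def run_ingestion(tables, ingest_table, concurrency=1):
    """Run ingest_table(table) for each table, at most `concurrency` at a time.

    Returns a dict mapping each table name to the result of its ingestion.
    """
    if not 1 <= concurrency <= MAX_CONCURRENCY:
        raise ValueError(f"concurrency must be between 1 and {MAX_CONCURRENCY}")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map preserves the input order of the tables.
        results = list(pool.map(ingest_table, tables))
    return dict(zip(tables, results))
```

A higher value shortens the total run when many tables are configured, at the cost of more simultaneous load on the source and the warehouse.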
Sprinkle supports several delimiters for CSV ingestion from GCS. When CSV is chosen as the file type, drop-downs specific to CSV files appear.
The delimiter drop-down offers comma, tab, pipe, dash, or other.
If OTHER_CHARACTER is chosen as the CSV delimiter, one more field appears in which the user can enter the delimiter symbol.
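To preview how a file will be split under a given delimiter choice, a few lines can be parsed locally. This is a minimal sketch using Python's standard `csv` module. The option names `COMMA`, `TAB`, `PIPE`, and `DASH` are assumed labels for the drop-down entries (only `OTHER_CHARACTER` appears in the text above), and the one-character restriction comes from Python's `csv` module, not necessarily from Sprinkle.

```python
import csv
import io

# Assumed labels for the drop-down entries, mapped to their characters.
DELIMITERS = {"COMMA": ",", "TAB": "\t", "PIPE": "|", "DASH": "-"}


def resolve_delimiter(choice, other_character=None):
    """Translate a drop-down choice into the delimiter character."""
    if choice == "OTHER_CHARACTER":
        # Python's csv module accepts exactly one delimiter character.
        if not other_character or len(other_character) != 1:
            raise ValueError("OTHER_CHARACTER needs exactly one character")
        return other_character
    if choice not in DELIMITERS:
        raise ValueError(f"unknown delimiter choice: {choice!r}")
    return DELIMITERS[choice]


def parse_csv(text, choice, other_character=None):
    """Parse CSV text into a list of rows using the chosen delimiter."""
    delimiter = resolve_delimiter(choice, other_character)
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```

For instance, `parse_csv("id;name\n1;x\n", "OTHER_CHARACTER", ";")` splits each line on the semicolon, which is a quick way to confirm the right symbol before configuring the dataset.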