Best Practices in Data Source
- If a table is transactional, log-based, or event-tracking, it can become very large, because such tables may generate millions of records every day. Ingest these tables in incremental mode instead of doing a complete load. This decreases the run time of the ingestion.
- For a better understanding of incremental ingestion, see http://docs.sprinkledata.com/docs/feature_data_source_ingestion/#incremental-ingestion-mode
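The general idea behind incremental loading can be sketched as follows. This is only an illustration of the watermark technique, not Sprinkle's actual implementation: the `SOURCE_ROWS` data, the `updated_at` column, and the `incremental_pull` function are all hypothetical names chosen for the example.

```python
from datetime import datetime

# Hypothetical source table: each row carries a last-modified timestamp.
SOURCE_ROWS = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 10, 0)},
    {"id": 3, "updated_at": datetime(2024, 1, 2, 8, 30)},
]


def incremental_pull(rows, watermark):
    """Return the rows changed after `watermark` and the updated watermark.

    A watermark of None means nothing has been ingested yet, so the
    first run behaves like a complete load.
    """
    new_rows = [
        row for row in rows
        if watermark is None or row["updated_at"] > watermark
    ]
    if new_rows:
        # Remember the newest timestamp seen so the next run can skip
        # everything up to and including it.
        watermark = max(row["updated_at"] for row in new_rows)
    return new_rows, watermark


# First run: no watermark yet, so every row is pulled.
first_batch, wm = incremental_pull(SOURCE_ROWS, None)

# A new record arrives in the source before the next scheduled run.
SOURCE_ROWS.append({"id": 4, "updated_at": datetime(2024, 1, 3, 12, 0)})

# Second run: only the row newer than the stored watermark is pulled.
second_batch, wm = incremental_pull(SOURCE_ROWS, wm)
print(len(first_batch), len(second_batch))
```

The second run touches one row instead of four, which is why incremental mode shortens run time on tables that grow by millions of records a day.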
Create different data sources based on schedule frequency. For example, if tables need to be pulled in real time, add them to a real-time scheduled data source. If tables need to be pulled on an hourly basis, add them to an hourly scheduled data source.
Avoid adding the same table to different data sources.
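As a rough planning aid, the two rules above (one data source per schedule frequency, and no table in more than one data source) can be checked with a small script before configuring anything. This is a sketch under assumptions: the table names, the frequency labels, and the helper functions `group_by_schedule` and `find_duplicate_tables` are hypothetical and are not part of any Sprinkle API.

```python
from collections import defaultdict

# Hypothetical mapping of table name -> required pull frequency.
TABLE_SCHEDULES = {
    "orders": "real_time",
    "payments": "real_time",
    "inventory": "hourly",
    "customers": "daily",
}


def group_by_schedule(table_schedules):
    """Group tables so that each schedule frequency gets one data source."""
    groups = defaultdict(list)
    for table, frequency in table_schedules.items():
        groups[frequency].append(table)
    return dict(groups)


def find_duplicate_tables(data_sources):
    """Return the tables that appear in more than one data source."""
    seen = defaultdict(list)
    for source, tables in data_sources.items():
        for table in tables:
            seen[table].append(source)
    return {table: sources for table, sources in seen.items() if len(sources) > 1}


plan = group_by_schedule(TABLE_SCHEDULES)
print(plan)

# Example of a layout that breaks the rule: "orders" is listed twice.
bad_layout = {"real_time": ["orders", "payments"], "hourly": ["orders", "inventory"]}
print(find_duplicate_tables(bad_layout))
```

Because the plan is built from a one-to-one mapping of table to frequency, it cannot place a table in two data sources; the duplicate check is useful for auditing a layout that was assembled by hand.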
Increase concurrency if you are ingesting multiple tables. For example, if you are ingesting 5 tables, you can set concurrency to 5 so that all five tables are ingested in parallel. For the best performance, the max concurrency you can use is