Apache Kafka is an open-source publish/subscribe messaging system originally developed at LinkedIn. Streaming large volumes of data in real time is demanding and time-consuming, but with Kafka, data is streamed to the server as it arrives.
Kafka is often used as an alternative to traditional log aggregation tools, but it offers a cleaner abstraction, treating log and event data as a stream of messages. On the whole, it compares favorably with most message brokers.
Sprinkle supports a wide range of data sources. Clicking the “+” sign brings up a list of data sources; in this case, select Kafka. Then name and create the new Kafka data source.
After the data source is named, the Configure tab requires the user to choose between “zookeeper” and “bootstrap server.” If Zookeeper is selected, its connection string must be entered.
If Bootstrap is selected, its server id must be entered.
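The two options expect differently shaped values, so it can help to check a value before pasting it into the form. The sketch below is illustrative only and is not part of Sprinkle. It assumes the Bootstrap field takes the usual Kafka comma-separated host:port list and that the Zookeeper field takes a standard ZooKeeper connection string with an optional chroot path. The function names, the host names, and the default ports (2181 for ZooKeeper, 9092 for Kafka, both conventional) are assumptions. IPv6 literals are not handled.

```python
from typing import List, Tuple


def parse_host_ports(value: str, default_port: int) -> List[Tuple[str, int]]:
    """Split a comma-separated 'host:port' list into (host, port) pairs."""
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.partition(":")
        if not host:
            raise ValueError(f"missing host in {entry!r}")
        # int() raises ValueError if the port is empty or not numeric
        pairs.append((host, int(port) if sep else default_port))
    if not pairs:
        raise ValueError("no hosts given")
    return pairs


def parse_zookeeper(value: str) -> Tuple[List[Tuple[str, int]], str]:
    """Parse 'host1:2181,host2:2181/chroot' into (hosts, chroot)."""
    hosts, slash, chroot = value.partition("/")
    return parse_host_ports(hosts, 2181), ("/" + chroot if slash else "")


def parse_bootstrap(value: str) -> List[Tuple[str, int]]:
    """Parse 'broker1:9092,broker2:9092' into a list of (host, port)."""
    return parse_host_ports(value, 9092)


# Hypothetical values, for illustration only
zk_hosts, zk_chroot = parse_zookeeper("zk1.example.com:2181,zk2.example.com:2181/kafka")
brokers = parse_bootstrap("broker1.example.com:9092, broker2.example.com")
```

A value that fails to parse here would most likely be rejected by the Kafka or ZooKeeper client as well, so a quick check of this kind can catch typos before a job is run.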
In Add Tables, the user must name the topic and then choose between automatic and manual schema. Sprinkle specializes in the automatic schema feature, i.e. creating tables with an automatically generated warehouse schema. If “No” is selected for the automatic schema feature, the user must fill in the warehouse schema manually.
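To give a feel for what an automatic schema involves, the sketch below infers column types from a handful of JSON messages. It is a simplified illustration, not Sprinkle's actual inference logic, which is not described here. The type names (BIGINT, DOUBLE, BOOLEAN, STRING), the function names, and the sample messages are assumptions, and nested values simply fall back to STRING.

```python
import json
from typing import Dict, Iterable


def infer_type(value) -> str:
    """Map a decoded JSON value to a hypothetical warehouse column type."""
    # bool must be checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "STRING"  # strings, and nested objects/arrays stored as text


def infer_schema(messages: Iterable[str]) -> Dict[str, str]:
    """Infer a column -> type mapping from JSON-encoded topic messages."""
    schema: Dict[str, str] = {}
    for raw in messages:
        record = json.loads(raw)
        for key, value in record.items():
            if value is None:
                continue  # a null says nothing about the column type
            new = infer_type(value)
            old = schema.get(key)
            if old is None or old == new:
                schema[key] = new
            elif {old, new} == {"BIGINT", "DOUBLE"}:
                schema[key] = "DOUBLE"  # widen integers to floating point
            else:
                schema[key] = "STRING"  # conflicting types fall back to text
    return schema


# Hypothetical sample messages from a topic
sample = [
    '{"id": 1, "price": 9.5, "name": "pen", "in_stock": true}',
    '{"id": 2, "price": 10, "name": null}',
]
schema = infer_schema(sample)
```

With a manual schema, the user supplies the equivalent column-to-type mapping directly instead of relying on inference.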
In the Run and Schedule tab, the concurrency (the number of tables that can run in parallel, up to a maximum of 7) can be set as preferred before running the job. The status of the job is updated in the tab below once it completes. Jobs can also be set to run automatically by enabling autorun. By default, the frequency is set to every night; it can be changed by clicking More --> Autorun --> Change Frequency.
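The concurrency setting can be pictured as a bounded worker pool: at most that many tables are processed at once, and the rest wait their turn. The sketch below models this with Python's standard library. It is an analogy, not Sprinkle's scheduler. The names run_job and ingest are hypothetical; only the limit of 7 comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

MAX_CONCURRENCY = 7  # upper limit stated for the Run and Schedule tab


def run_job(tables: List[str], ingest: Callable[[str], str],
            concurrency: int = 3) -> Dict[str, str]:
    """Ingest each table, running at most `concurrency` tables in parallel."""
    if not 1 <= concurrency <= MAX_CONCURRENCY:
        raise ValueError(f"concurrency must be between 1 and {MAX_CONCURRENCY}")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in the same order as `tables`
        statuses = list(pool.map(ingest, tables))
    return dict(zip(tables, statuses))


def fake_ingest(table: str) -> str:
    """Stand-in for the real per-table work (hypothetical)."""
    return f"{table}: done"


statuses = run_job(["orders", "clicks", "payments"], fake_ingest, concurrency=2)
```

A higher concurrency shortens the overall run when there are many tables, at the cost of more simultaneous load on the source and the warehouse.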