mPokket success story with Sprinkle
How mPokket scaled its data pipelines using Sprinkle
“Sprinkle is a one-stop solution for all our data engineering needs. By using Sprinkle, the data teams can leverage their limited time and concentrate on aspects that are core to the business. Sprinkle's support team has been working closely with us. We derive a lot of value from Sprinkle's solutions.”
mPokket began in 2016 with the goal of providing loans to students and young professionals in India who cannot access traditional financing. Depending on their profile and credit history, consumers can take instant cash loans ranging from Rs. 500 to Rs. 20,000, which are transferred directly into their bank accounts or Paytm accounts. Repayment plans are flexible, with borrowers having the option of paying back the money over 3 months at a nominal interest rate. With over 1,200 employees, mPokket is headquartered in Kolkata, and the tech team is based in Bengaluru.
We interviewed Ajay Garga, VP of Engineering at mPokket. He is responsible for driving mPokket's technology strategy and roadmap, and he oversees the development and delivery of products and initiatives from the engineering side. The 12-member data team consists of data engineers and analysts. It is a relatively small team, but its role is pivotal to the business: it develops statistical and predictive models and creates reports for business users and other teams. A few of its key models analyze credit limits and the behaviour of users on the mobile application.
mPokket is a growing startup, and its teams are actively developing applications and infrastructure. They follow a microservices, cloud-based architecture. Data from different microservices, such as user onboarding, user activation and activity, and loan repayments, is stored in separate transactional databases built on MySQL and CockroachDB. These same transactional databases were also used for analytics: the teams would query them directly for their reporting tasks.
The data team was facing challenges on multiple fronts. Since the same database was being used for both the mobile application and reporting tasks, scalability suffered. As the number of users on the application grew, so did the size of the database, and a growing transactional database meant increased load on the server during reads. The open-source version of CockroachDB they were using didn’t support read replicas, further limiting its use for analytics. Thus, the team felt the need for a clear segregation between the analytical and the transactional databases.
The other problem at hand was to build an archival view of the historical data collected on the application. Historical data was needed to build the predictive models for credit limit analysis and to draw insights into user behaviour from interactions on the mPokket application. Building such a historical database was not possible with the transactional systems.
The team initially tried to build the data pipeline with open-source tools, using R for scripting and Hive for data storage, but ran into both functional and performance issues in the process. It was then that they decided to look for a commercial solution that could help them quickly and scale according to their needs.
The data team contacted four vendors for solutions. The problem at hand was to isolate the analytical and transactional databases, and then to build a data warehouse and a complete data pipeline to ingest data into the warehouse for analytical use.
A PoC was done with the Sprinkle team using Sprinkle’s Data Engineering solution. The PoC continued for a couple of months, during which the teams built for real use cases: providing reporting pipelines to the analytics and business teams. The data team at mPokket had gained expertise in using Hive as the data warehouse and thus wanted to continue with it. The Sprinkle team helped build the initial data ingestion and ETL pipeline with Hive as the data warehouse.
After benchmarking the vendors across parameters like functionality, performance and cost, the data team decided to go ahead with Sprinkle’s solution. Once the team gained expertise in building pipelines using Sprinkle, they developed pipelines to import data from the MySQL database into Hive. The Sprinkle team also helped them build the data pipeline for CockroachDB, by exporting data into an S3 bucket and then ingesting it into Hive using Sprinkle.
Presently, over 400 tables in MySQL and over 150 tables in CockroachDB are being ingested into Hive using Sprinkle. Sprinkle is being used for multiple use cases. Data from the production databases is pushed to Hive periodically. Tables from various microservices are joined to build tables covering important metrics, which are tracked hourly to ensure the smooth running of the business. Metrics like the number of new users onboarded, repayments and modes of repayment are scheduled on Sprinkle.
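To make the idea of joining tables from different microservices into an hourly metrics table more concrete, here is a minimal sketch in Python. It is not mPokket's or Sprinkle's actual pipeline: the table names (users, repayments), the column names, the sample rows and the repayment modes are all hypothetical, and an in-memory SQLite database stands in for Hive so that the example runs on its own. The date-bucketing function strftime is SQLite-specific; a Hive query would use its own date functions.

```python
import sqlite3


def make_demo_connection():
    """Create an in-memory database with two hypothetical microservice tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER, created_at TEXT);
        CREATE TABLE repayments (
            repayment_id INTEGER, user_id INTEGER, mode TEXT, paid_at TEXT
        );
        """
    )
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [
            (1, "2024-01-01 10:05:00"),
            (2, "2024-01-01 10:40:00"),
            (3, "2024-01-01 11:10:00"),
        ],
    )
    conn.executemany(
        "INSERT INTO repayments VALUES (?, ?, ?, ?)",
        [
            (1, 1, "UPI", "2024-01-01 11:20:00"),
            (2, 2, "CARD", "2024-01-01 11:45:00"),
            (3, 1, "UPI", "2024-01-01 12:05:00"),
        ],
    )
    return conn


def build_hourly_metrics(conn):
    """Return (hour, new_users, repayments, upi_repayments) rows, one per hour."""
    query = """
        WITH onboarded AS (
            SELECT strftime('%Y-%m-%d %H:00', created_at) AS hour,
                   COUNT(*) AS new_users
            FROM users
            GROUP BY 1
        ),
        repaid AS (
            SELECT strftime('%Y-%m-%d %H:00', paid_at) AS hour,
                   COUNT(*) AS repayments,
                   SUM(CASE WHEN mode = 'UPI' THEN 1 ELSE 0 END) AS upi_repayments
            FROM repayments
            GROUP BY 1
        ),
        hours AS (
            -- Every hour that appears in either source table.
            SELECT hour FROM onboarded
            UNION
            SELECT hour FROM repaid
        )
        SELECT h.hour,
               COALESCE(o.new_users, 0),
               COALESCE(r.repayments, 0),
               COALESCE(r.upi_repayments, 0)
        FROM hours h
        LEFT JOIN onboarded o ON o.hour = h.hour
        LEFT JOIN repaid r ON r.hour = h.hour
        ORDER BY h.hour
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in build_hourly_metrics(make_demo_connection()):
        print(row)
```

Building the list of hours from both tables keeps hours that have repayments but no new users (and the reverse), so the metric table has no silent gaps. In a scheduled setup, a query of this shape would simply be re-run each hour against the warehouse.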
Sprinkle is also used to pull data from multiple tables for ad-hoc analytical queries, such as tracking the number of requests across branches or checking seasonal spikes. The product team uses Sprinkle for A/B testing of different functionality, for example, tracking user behaviour and business impact after increasing the credit limit of a certain section of users. Additionally, the analytics capabilities of Sprinkle are used to create reports that track the KPIs of different business verticals. The normalized tables created on Sprinkle are also used in Tableau for analysis.
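The interview does not describe how the product team evaluates its A/B tests. As one common approach, and purely as an illustrative assumption, the sketch below compares a conversion-style rate (for instance, the share of users who repay on time) between a control group and a group with an increased credit limit, using a two-proportion z-test. The function name, the choice of metric and the numbers are hypothetical.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare two observed rates; return (z, two-sided p-value).

    Group A is the control and group B the variant. A positive z means
    the variant's rate is higher than the control's.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one user")
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("rates are all 0 or all 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: 40 of 100 control users vs 60 of 100 variant users.
    z, p = two_proportion_z_test(40, 100, 60, 100)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

The counts fed into such a test would come from the joined user and loan tables in the warehouse. The normal approximation behind this test is only reasonable for fairly large groups, so small experiments would call for a different method.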
With ready-to-use connectors available for data sources, the data team can now quickly set up ingestion jobs. Less tech-savvy users and advanced users alike have been able to derive value from the Sprinkle platform.
Ajay says, “Getting data into Hive and setting the ingestion frequency is simple. Sprinkle manages these jobs and we need not be bothered about it. ETL is as simple as drag and drop and even non-tech savvy users can set it up without the need of knowing the complexities of data warehouse and data querying concepts. Advanced users are able to develop pipelines using SQL and Python.”
In this data journey, mPokket's data team and Sprinkle's team have been working together. While the data team works continuously to integrate more tables into the flow, the Sprinkle team's expertise enables mPokket to do it more efficiently.