The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, narrative text, equations, and visualizations.
It is commonly used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, and machine learning.
- Jupyter supports over 40 programming languages, including Python and R; Spark is available through PySpark, the Python API for Spark.
- Your code can produce rich, interactive output: HTML, images, videos, LaTeX, and custom MIME types.
- Leverage big data tools, such as Apache Spark, from Python (including PySpark) and R. Explore that same data with pandas, scikit-learn, ggplot2, and TensorFlow.
Notebooks is a new feature in Sprinkle. Clicking it takes you to a screen where you can create a new notebook. Select the notebook's name and type before creating it.
To begin working in a notebook and run your scripts, click the “Start” button. Once the notebook has started, you can import libraries.
How to import data from Sprinkle’s Explore and Segment reports into the notebook?
Sprinkle provides a library named “sprinkleSdk” for importing data from reports.
Use the script below to import the library and load report data into a data frame.
Import the Sprinkle SDK:
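# Import the SDK entry point under the short alias "sp"
from sprinkleSdk import SprinkleSdk as sp
# Load a Segment report into a data frame (replace <segment_id> with your report's ID)
df = sp.read_segment('<segment_id>')
# Or load an Explore report instead (note that this reassigns df)
df = sp.read_explore('<explore_id>')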
Once the data is imported, you can run descriptive, diagnostic, predictive, or prescriptive analyses on it.
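As a rough illustration of a first descriptive pass, the sketch below summarises a small data frame with pandas. It assumes the imported data behaves like a pandas DataFrame, which this page does not confirm. The sample frame and its columns ("region", "orders", "revenue") are invented stand-ins for a real report, so the snippet runs without a Sprinkle connection.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by sp.read_segment('<segment_id>');
# the column names and values are invented for illustration only.
df = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east"],
        "orders": [12, 7, 9, 4],
        "revenue": [240.0, 105.0, 198.0, 60.0],
    }
)

# Descriptive statistics for the numeric columns (count, mean, std, quartiles).
summary = df[["orders", "revenue"]].describe()
print(summary)

# Totals per region, highest revenue first.
by_region = (
    df.groupby("region")[["orders", "revenue"]]
    .sum()
    .sort_values("revenue", ascending=False)
)
print(by_region)
```

In a notebook, you would drop the sample frame and run the same two steps on the df loaded from your report, adjusting the column names to match it.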
How to create a table and update an existing table in Sprinkle post-analysis?
Create table in warehouse using dataframe:
Update existing table in warehouse:
How to work with Spark session operations?
Get a Spark session with the default configuration:
spark = sp.getOrCreate()
Change the Spark app name while creating the default Spark session:
spark = sp.appName('some-name').getOrCreate()
Get a Spark session that you can customise with your own configuration:
spark = sp.sparkBuilder()
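One possible way to keep custom settings in one place is sketched below. It assumes that sp.sparkBuilder() returns an object with a config(key, value) method, as the standard PySpark session builder does; this page does not confirm that. The helper apply_spark_conf and the RecordingBuilder class are hypothetical names introduced only so the sketch runs without a Spark cluster. The two property names are standard Spark settings, and the values are examples.

```python
def apply_spark_conf(builder, conf):
    """Call builder.config(key, value) for each property and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


class RecordingBuilder:
    """Hypothetical stand-in for a Spark builder; it only records the options."""

    def __init__(self):
        self.options = {}

    def config(self, key, value):
        self.options[key] = value
        return self


# Example values only; tune them for your own workload.
custom_conf = {
    "spark.executor.memory": "2g",
    "spark.sql.shuffle.partitions": "64",
}

builder = apply_spark_conf(RecordingBuilder(), custom_conf)
print(builder.options)

# In a Sprinkle notebook, if the assumption above holds, this might become:
# spark = apply_spark_conf(sp.sparkBuilder(), custom_conf).getOrCreate()
```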