Step-by-step process on how to move data from MongoDB to BigQuery


Are you looking to move your data from MongoDB to BigQuery but unsure where to start?

In this blog post, we will guide you through transferring your data seamlessly and efficiently with two methods. Whether you are a developer, data analyst, or business owner, understanding how to migrate your data between these two powerful databases can help improve your data management and analytics skills.

What is BigQuery and What Are Its Benefits?


BigQuery is a powerful cloud-based data warehousing tool that allows users to analyze large datasets quickly and efficiently. Google developed it as a part of its Cloud Platform suite of services. BigQuery uses SQL queries to process and manipulate data, making it accessible to users with various levels of technical expertise.
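
For instance, assuming the bq command-line tool is installed (see Method 2 below), a standard SQL query can be run against one of Google's public sample datasets like this:

```
# Run a standard SQL query from the command line against a public dataset
bq query --use_legacy_sql=false \
  'SELECT name, SUM(number) AS total
   FROM `bigquery-public-data.usa_names.usa_1910_2013`
   GROUP BY name
   ORDER BY total DESC
   LIMIT 10'
```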

There are many benefits to using BigQuery for data analysis, including: 

  • BigQuery can handle massive amounts of data, making it ideal for companies with large datasets. 
  • BigQuery processes queries incredibly quickly, allowing users to get results in seconds or minutes rather than hours. 
  • With on-demand pricing, users pay primarily for the volume of data their queries process, making it a cost-effective solution for businesses of all sizes. 
  • BigQuery seamlessly integrates with other tools and services, such as Google Analytics and Google Sheets. 
  • With BigQuery, users can run real-time analytics on streaming data sources. 
  • BigQuery automatically replicates your data and retains a history of table changes, helping protect against accidental loss of important information. 

What is MongoDB and What Are Its Benefits?


MongoDB is an open-source NoSQL database management system that is popular due to its flexibility, scalability, and ease of use. Unlike traditional relational databases, MongoDB stores data in a flexible JSON-like format called BSON (Binary JSON), which allows for easier integration with contemporary web development technologies.
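
As a small illustration (the database, collection, and field names here are made up), inserting and reading back one of these JSON-like documents from the mongosh shell looks like this:

```
# Insert a document into the "users" collection and read it back
mongosh "mongodb://localhost:27017/your_database" --eval '
  db.users.insertOne({ name: "Asha", email: "asha@example.com", tags: ["admin", "beta"] });
  printjson(db.users.findOne({ email: "asha@example.com" }));
'
```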

With its document-oriented data model, MongoDB offers several benefits that make it an attractive choice:

  • MongoDB is designed to scale horizontally across multiple servers, making it easy to store large volumes of unstructured data.
  • MongoDB's schema-less design allows for the storage of diverse types of data without the need for complex joins or migrations. 
  • MongoDB's native support for indexing and sharding ensures fast query performance, even on large datasets. 
  • MongoDB supports automatic failover and replica sets, ensuring that your data remains accessible even in the event of hardware failures. 
  • MongoDB offers robust security features such as authentication, authorization, encryption at rest, and auditing capabilities. 
  • MongoDB's powerful aggregation framework allows for complex queries and analytics operations on your data. 
  • With its support for change streams and triggers, MongoDB enables real-time monitoring and analysis of both live and historical data. 

How to Move Data from MongoDB to a BigQuery Table?

In this section, we will guide you through the process of importing MongoDB data to BigQuery using two methods.  

Method 1: Using Sprinkle Data

Sprinkle Data simplifies the data integration journey by consolidating data from multiple data sources.

To move your data from MongoDB to BigQuery using Sprinkle Data, follow these steps:

Step 1: Integrating BigQuery

1. Create a Service Account

The first step is to create a service account that Sprinkle will use to connect to BigQuery. The steps are listed below, followed by a gcloud sketch of the same process:

  • Create a Service Account, and provide any name like “sprinkle”.
  • Grant the service account the BigQuery Admin role.
  • Create a JSON key for this service account, then download it and keep it safe.
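
If you prefer the command line, the same setup can be done with gcloud. This is a minimal sketch; the project ID, service account name, and key file name are placeholders to replace with your own values.

```
# Create a service account named "sprinkle" (replace your-gcp-project with your project ID)
gcloud iam service-accounts create sprinkle --project=your-gcp-project

# Grant it the BigQuery Admin role on the project
gcloud projects add-iam-policy-binding your-gcp-project \
  --member="serviceAccount:sprinkle@your-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.admin"

# Create and download a JSON key for the service account
gcloud iam service-accounts keys create sprinkle-key.json \
  --iam-account=sprinkle@your-gcp-project.iam.gserviceaccount.com
```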

2. Create a BigQuery dataset

Create a BigQuery dataset, and provide any name like “sprinkle_dataset” (BigQuery dataset names may contain only letters, numbers, and underscores). All the tables created in Sprinkle Data will be within this dataset.
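
If you are working from the command line, the dataset can also be created with the bq tool. This is a sketch; the location, project ID, and dataset name are placeholders.

```
# Create the dataset in your project (adjust the location to where you want your data stored)
bq --location=US mk --dataset your-gcp-project:sprinkle_dataset
```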

3. Configure BigQuery Connection

  • Log into the Sprinkle Data account and navigate to Admin -> Warehouse -> New Warehouse Connection
  • From the list of options select BigQuery as the new warehouse option.
  • Provide all the mandatory details
    • Distinct Name: Name to identify this connection
    • Project Id: GCP Project Id
    • Private JSON key: Copy and paste the contents of the JSON key downloaded while creating a service account.
    • Dataset: Name of the BigQuery dataset created above.
  • Click on the test connection button and then click on Create.

Step 2: Create a Cloud Bucket

To create a cloud bucket and connect it to Sprinkle Data, follow these steps:

1. Create a Cloud bucket

Create a Google Cloud Storage bucket in the same GCP project and in the same location/region as your BigQuery dataset, and provide any name like “sprinkle”.

2. Grant the Service Account access to the Cloud Bucket

This bucket must be accessible by both BigQuery and Sprinkle Data, so carefully configure access for the service account created in the BigQuery setup above:

Bucket -> Add Permissions -> Add Principal (provide the name of the service account created in the BigQuery setup above) -> Add Role: Storage Admin
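
The bucket creation and permission grant can also be done from the command line. This is a sketch with placeholder bucket, project, and service account names that you should replace with your own:

```
# Create the bucket in the same project and location as the BigQuery dataset
gcloud storage buckets create gs://sprinkle-bucket --project=your-gcp-project --location=US

# Give the service account Storage Admin access on the bucket
gcloud storage buckets add-iam-policy-binding gs://sprinkle-bucket \
  --member="serviceAccount:sprinkle@your-gcp-project.iam.gserviceaccount.com" \
  --role="roles/storage.admin"
```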

3. Configure the GCP Cloud bucket connection in Sprinkle

  • Log into the Sprinkle Data account and navigate to Admin -> Warehouse -> New Warehouse Connection -> Add Storage
  • From the given options, select GCP as the storage option.
  • Provide all the mandatory details
    • Distinct Name: Name to identify this connection
    • Private Key JSON: Copy and paste the contents of the JSON key downloaded from the service account created in the BigQuery setup
    • Bucket Name: Name of the bucket created above
  • Click on the test connection button and then create.

Step 3: Create a MongoDB connector with BigQuery as the Destination Warehouse

1. Allow MongoDB to accept connections from Sprinkle

  • If the MongoDB server is on a public network, accessible over a public IP, allow inbound connections on the MongoDB port (default is 27017) from the Sprinkle IPs (34.93.254.126, 34.93.106.136).
  • If the MongoDB server is on a private network, configure an SSH Tunnel in Advanced Settings.
  • Create a read-only user, providing any name like "sprinkle" (a mongosh sketch follows this list).
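
A read-only user can be created from the mongosh shell. This is a minimal sketch; the admin user, host, database name, and password are placeholders:

```
# Connect as an administrative user and create a read-only user named "sprinkle"
mongosh "mongodb://admin_user@your-mongo-host:27017/admin" --eval '
  db.getSiblingDB("your_database").createUser({
    user: "sprinkle",
    pwd: "choose-a-strong-password",   // replace with a real password
    roles: [ { role: "read", db: "your_database" } ]
  })
'
```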

2. Configure MongoDB connection

  • In the Sprinkle Data application, navigate to the ingest tab, click on data sources, and click on the setup source button.
  • Select the MongoDB card, give the connector a name, and click on the Create button.
  • Provide all the mandatory details
    • Distinct Name: Name to identify this connection
    • Host: Provide the IP address or hostname.
    • Port: Provide the port number.
    • Database: Provide a database name if there is one; it must be an existing database.
    • Username: Provide the username.
    • Password: Provide the password.
    • Advanced Settings (if Yes):
      • Connection Scheme: Enter the connection scheme.
      • Connection Properties: You can provide optional connection properties, e.g. key1=value1&key2=value2. The most common property used for MongoDB is authSource=admin.
      • Binlog Replication: If enabled, the MongoDB binlog is used to fetch changed/new rows. This setting cannot be changed once selected.
      • Connect via SSH Host (if Yes):
        • SSH Host: IP address or hostname of the SSH server.
        • SSH Public Key: Add this public key to the ~/.ssh/authorized_keys file on the SSH host machine.
        • SSH Login Username
  • Click on the Test Connection button and, once the test connection has passed, click on Create.

3. Configure Mongo data import

  • In the Sprinkle Application, navigate to Ingest -> Data Imports Tab -> Setup Sources
  • Select the Mongo database source, provide the name, and then click on Create.
  • Connection Tab:
    • From the drop-down, select the name of the connection created above and click on Update.

4. Create a Dataset

Add a Dataset for each collection that you want to replicate, providing the following details:

  • Database Name (Required): Database name to select the collection from
  • Collection Name (Required): Collection to ingest
  • Destination Schema (Required): Data warehouse schema where the table will be ingested into
  • Destination Table name (Required): It is the table name to be created in the warehouse.
  • Destination Create Table Clause: Provide additional clauses to warehouse-create table queries such as clustering, partitioning, and more, useful for optimizing DML statements.
  • Ingestion Mode (Required):
    • Complete: Ingest full data from the source table in every ingestion job run.
    • Incremental: Ingest only the changed or inserted rows in every ingestion job run. A Time Column needs to be provided here:
      • Time Column Type: The type can be one of three data types: String, Date, or Numeric.
      • Time Column Name: Give the name of the time column.
  • Periodic Full Fetch (Required): If yes, the full table will be periodically fetched from the source.
    • Full Fetch Interval (Required): The complete table will be re-fetched at the selected interval.
  • Automatic Schema (Required):
    • Yes: Schema is automatically discovered by Sprinkle (Recommended)
      • Flatten Level (Required): Select from One Level or Multi Level.
    • No: The warehouse schema must be provided. The format for the warehouse schema is: Col1 datatype, Col2 datatype, Col3 datatype. Datatypes should be warehouse-specific.
  • Filter: For a Date column filter, select the Date type and give the value as a string (format: yyyy-mm-dd) or in long format. For other data types, the filter value must be given in the proper format.
    • Date Type Filter / Other Data Type Filter:
      • Filter JSON: Provide the filter JSON enclosed in {}, e.g. {"key1":{"$gte":value1},"key2":{"$gt":value2}}.
    • No Filter: Select this if no filter is required.

Step 4: Run and Schedule Ingestion

In the Ingestion Jobs tab:

  • Trigger the job using the Run button.
  • To schedule, enable Auto-Run. The frequency can also be changed as needed.

Method 2: Using the mongoexport and bq load commands

Another method to move data from MongoDB to BigQuery is by exporting the data from MongoDB using mongoexport and then loading it into BigQuery using the bq load command. 

Step 1: Export MongoDB data 

First, export the desired collection from MongoDB using the mongoexport command. For example, if you want to export the "users" collection, you can run the following command: 

  • mongoexport --uri="mongodb://localhost:27017/your_database" --collection=users --out=users.json 

This command will export the "users" collection from your MongoDB database and save it as a JSON file named users.json. 

Step 2: Install Google Cloud SDK 

Next, you need to install the Google Cloud SDK on your machine. You can download and install it by following the official Google Cloud website instructions. 

Steps to install the Google Cloud SDK using your system's command-line tool:

  • Download the Google Cloud SDK installer from the official website and follow the installation instructions to install it.
  • Once the installation is complete, open a terminal window and run the following command to initialize the SDK: 

```
gcloud init
```

  • Follow the prompts to authenticate with your Google account and configure the SDK settings. 

Step 3: Authenticate with Google Cloud 

After installing the Google Cloud SDK, you need to authenticate with your Google Cloud account by running the following command in your terminal: 

```
gcloud auth login
```

This will open a browser window prompting you to log in with your Google account and grant permission for the gcloud tool to access your Google Cloud resources.

Step 4: Create a BigQuery Dataset 

Before importing data into BigQuery, you need to create a dataset where the data will be stored. You can create a dataset by running the following command: 

```
bq mk your_dataset
```

Replace your_dataset with your desired dataset name.

Step 5: Load data into BigQuery 

Next, load the exported data into BigQuery using the bq load command. The command below reads the file from a Cloud Storage bucket, so copy users.json to your bucket first (a sketch follows below); alternatively, bq load can read the local file directly. Make sure you have set up authentication credentials for your Google Cloud account before running this command.

  • bq load --source_format=NEWLINE_DELIMITED_JSON your_dataset.your_table gs://your_bucket/users.json schema.json 

Replace "your_dataset", "your_table", and "gs://your_bucket/users.json" with your actual dataset name, table name, and JSON file location respectively. 

Conclusion:

You can efficiently move data from MongoDB to BigQuery with either of the two methods discussed above. The mongoexport and bq load method can be complex and time-consuming, especially for new users or small teams. To overcome this, you can use Sprinkle Data.

Click here to get started with the platform.

Frequently Asked Questions (FAQs)

How to transfer data from MongoDB to BigQuery? 

To transfer data from MongoDB to BigQuery, you can use tools like Sprinkle Data or Stitch that provide connectors for both MongoDB and BigQuery, making the data transfer process easier. 

How to migrate data from MongoDB to SQL? 

Migrating data from MongoDB to SQL involves exporting the data from MongoDB into a common format like JSON or CSV and then importing it into your SQL database using tools like mongoexport or third-party ETL tools like Sprinkle Data.

How to fetch data from MongoDB using query? 

To fetch data from MongoDB using a query, you can use the find() method in the MongoDB shell or a programming language like Node.js with the native MongoDB driver.
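
For example, a simple find() query run from the mongosh shell (the collection and field names here are illustrative):

```
# Return up to 10 active users, projecting only the name and email fields
mongosh "mongodb://localhost:27017/your_database" --eval '
  db.users.find({ status: "active" }, { name: 1, email: 1 }).limit(10)
'
```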

How to export data from collection in MongoDB? 

Exporting data from a collection in MongoDB can be done using the mongoexport utility provided by MongoDB. This tool allows you to export collections as JSON or CSV files locally or remotely.
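
For instance, to export selected fields as CSV (the field names are placeholders):

```
# Export three fields of the "users" collection to a CSV file
mongoexport --uri="mongodb://localhost:27017/your_database" \
  --collection=users --type=csv --fields=name,email,created_at --out=users.csv
```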

How do I extract data from MongoDB to excel? 

To extract data from MongoDB to Excel, you can export your MongoDB data as JSON/CSV files and then import them into Excel using its import feature.

How to fetch data fast from MongoDB? 

To fetch data fast from MongoDB, you should consider creating indexes on frequently queried fields in your collections. Use the explain() method to analyze query execution plans and optimize index usage accordingly. 
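
A quick sketch of both steps from mongosh (the database and field names are illustrative):

```
# Index a frequently queried field, then inspect the query plan
mongosh "mongodb://localhost:27017/your_database" --eval '
  db.users.createIndex({ email: 1 });
  printjson(db.users.find({ email: "someone@example.com" }).explain("executionStats"));
'
```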

How to get data from MongoDB to MySQL? 

For getting data from MongoDB to MySQL you can use third-party ETL tools like Sprinkle Data that offer connectors for both databases for seamless data transfer.  

How to sync data from MongoDB to SQL Server? 

Syncing data from MongoDB to SQL Server can be achieved through custom scripts utilizing libraries like pyodbc for Python connectivity between both databases.

How to get CSV data load from MongoDB to PostgreSQL? 

Loading data from MongoDB to PostgreSQL involves similar steps to transferring it to other relational databases: export the MongoDB collections as CSV files, then load them into PostgreSQL tables via the COPY command (through PostgreSQL's psql client) or the pgAdmin GUI tool for efficient bulk insertion of records.
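
A minimal sketch of that flow, with placeholder table, column, and connection details:

```
# Export the MongoDB collection as CSV
mongoexport --uri="mongodb://localhost:27017/your_database" \
  --collection=users --type=csv --fields=name,email --out=users.csv

# Bulk-load the CSV into an existing PostgreSQL table using psql's \copy
psql -h localhost -U postgres -d your_pg_database \
  -c "\copy users(name, email) FROM 'users.csv' CSV HEADER"
```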

Written by
Rupal Sharma
