How to import data from CSV to postgres

BlogsData Engineering

With the advancement of technology, the amount of data created and collected is rising exponentially as information becomes more accessible through the internet and other digital channels. To store and manage data, many different types of data sources are used: Postgres is one such data source. It is an open-source relational database widely used by many organizations to manage their data effectively.

CSV stands for Comma Separated Values. It is a file format that holds tabular data, numbers, and text in plain text form, separated by commas commonly used to store and exchange data between applications.

What is Postgres?

Postgres is an open-source object-relational database management system (ORDBMS) widely used in web applications and data warehouses. It is a robust and scalable database management system for data storage and management. Postgres offers features like data integrity, scalability, security, and flexibility and supports a wide range of programming languages and tools for application development. It is an excellent choice for mission-critical applications.

Key features of Postgres

  1. Object-Relational Database: An object-relational database system allows users to store data and objects such as images, sounds, and videos. 
  2. Robust Data Integrity: Postgres uses advanced features like transactions, triggers, and foreign keys to ensure data integrity. 
  3. Replication: Postgres supports multiple types of replication, including master-slave, multi-master, and cascading replication. 
  4. Security: Postgres offers many built-in security features, including authentication, authorization, encryption, and auditing. 
  5. Scalability: Postgres can quickly scale up to handle large datasets and complex queries, with the ability to add additional nodes or clusters. 
  6. Flexible Data Types: Postgres supports many data types, including JSON, XML, and custom types. 
  7. Cross-Platform Support: Postgres can be deployed on various operating systems, including Linux, Windows, and macOS.

What is CSV?

CSV (comma-separated values) is a file format that holds tabular data in which numbers and text are stored in a plain-text form separated by commas. The format is straightforward and is generally used to exchange data between different applications. It is supported by almost all spreadsheet and database applications and is used to store data in a table format. 

Benefits of CSV

  1. Readability: CSV is a simple data format that is easy to use and read. 
  2. Data Transfer: It provides easy data transfer between different programs and applications and can be used with various databases. 
  3. Compact: CSV files are often more compact than other standard data formats, such as JSON, XML, and YAML
  4. Data Storage Capacity: CSV files can store large amounts of data in a relatively small file size, making them ideal for large datasets.
  5. Data Sharing: CSV files are easy to import and export, making them a great option for data sharing.
  6. Accessibility: CSV files can be opened by any text editor, making them accessible to anyone with basic computer skills.

Methods to Import Data from CSV to Postgres 

Ways to import data from CSV to Postgres:

  1. Importing CSV data to Postgres using copy command.
  2. Importing CSV data to Postgres using pgAdmin.
  3. Importing CSV data to POstgres using Sprinkle Data.

In this article, we will go through these three techniques in detail that how data can be moved from CSV to Postgres.

Method-I How to Import Data from CSV to Postgres Using Copy Command.

The first and foremost step to move data from CSV to Postgres is to create the table that you want to import the CSV. Then use the following COPY command to import the CSV into Postgres:

COPY table_name FROM 'file_path_to_csv' DELIMITER ',' CSV HEADER;

The copy command has the following important factors: 

  1. Put in the name of the table in which you want to import the CSV data instead of table_name and replace '/path/to/csv/file' with the actual path to the CSV file. Make sure to include the file name and extension in the path. 
  2. The DELIMITER option is used to specify the separator used in the CSV file. The default value is ‘,’ (comma). If your CSV file uses a different separator, you can specify it accordingly. 
  3. The CSV HEADER option tells Postgres that the first row of the CSV file contains the column names. 

Once you’ve specified the options, run the command and the CSV file will be imported into the specified table.

Method-II How to Import Data From CSV to Postgres Using PgAdmin

PgAdmin is a graphical user interface for managing PostgreSQL databases. It provides multiple powerful features for creating, editing, and managing databases, roles, tables, and other objects. It also includes a SQL query and scripting tool to help users interact with their databases. 

Before importing a CSV file, one must make sure that the table is already created, if not, you can create a new table either from the GUI or use the SQL query scripting tool. After the table, follow the steps below:

  1. Open pgAdmin and connect to the database you want to import your CSV file into.
  2. Create the table in that database.
  3. Right-click on the table name and select "Import/Export" from the options. 
  1. On the next screen, select the import toggle button, select the format as CSV, and specify the file path. Select the appropriate encoding for the data as well.

  1. Set the appropriate delimiter and click "Next". 
  1. The CSV file will now be imported into the database.

Method-III: How to Import Data From CSV to Postgres Using Sprinkle Data

Sprinkle data is a no-code data pipeline and analytics platform organizations use to unify their data in a single place. It also allows importing data from CSV and Excel files, giving users flexibility. After creating data pipelines, Sprinkle also provides many features to maintain them, helping keep all your data up to date. 

With sprinkle, data can be easily moved by CSV to Postgres by following the steps mentioned below:

Integrating Postgres

STEP-1: Authorize Sprinkle connection

  1. Postgres database should be accessible on public IP.
  2. Allow inbound connection on Postgres port (default is 5432) from Sprinkle IPs (34.93.254.126, 34.93.106.136).
  3. Create a Read-Only user, providing any name like "sprinkle"

STEP-2: Configure Postgres Connection

  • Log into the Sprinkle Data application.
  • Navigate to Admin -> Warehouse -> New Warehouse Connection.
  • Select Postgres as the data warehouse.
  • Provide all the mandatory details
  • Distinct Name: Name to identify this connection
  • JDBC URL: Provide the URL in the format: jdbc:postgresql://<IP_address>:<port>/<database_name>
  • Username: Enter the Postgres account username
  • Password: Enter the Postgres account password

SSH Tunnel in Postgres:

  • Connect via SSH Host: Click on "Yes" to configure SSH Tunnel.
  • SSH Host*: Enter the IP address or hostname of the SSH server
  • SSH Public Key*: Copy the SSH public key and paste it into the ~/.ssh/authorized_keys file of the SSH user on the ssh host machine
  • SSH Login Username*: Provide the SSH Login Username where you added the public key. Further, verify the provided details by testing the connection.

Networking rules from the user’s end:

  • Make sure SSH Host has public IP
  • Whitelist Sprinkle IPs (34.93.254.126, 34.93.106.136) in your SSH Host. SSH host should be able to accept traffic from these Sprinkle IPs.
  • Make sure SSH port 22 is open.

  • Test Connection: Once this setting is completed you can test the connection and if successful, new changes can be updated.
  • Test the connection and then click on create.

Creating a Cloud Bucket

Sprinkle requires a Cloud Bucket to store intermediate data and report caches. Sprinkle supports creating a bucket in AWS, GCP, or Azure. In this section, we will build a GCP storage bucket.

STEP-1: To Create a Cloud bucket

Create a Cloud bucket in the same GCP project, and provide any name in the same location/region as your BigQuery project.

STEP-2: Provide Cloud Bucket access to Service Account Storage

This bucket should be accessible by BigQuery as well as the Sprinkle application. So configure the access for the service account.

Bucket -> Add Permissions -> Add Principal (provide the name of the service account created in Bigquery setup)-> Add Role Storage Admin

STEP-3: Configure GCP Cloud bucket connection in Sprinkle

  • Log into the Sprinkle application
  • Navigate to Admin -> Warehouse -> New Warehouse Connection -> Add Storage
  • Select GCP
  • Provide all the mandatory details
  • Distinct Name: Give a unique name to identify this connection
  • Private Key JSON: Copy and paste the contents of the JSON key downloaded from the service account created in the BigQuery setup
  • Bucket Name: Give the name of the bucket created above
  • Click on Test Connection

Importing Data from CSV to Sprinkle:

Once the destination warehouse is created you can easily assemble data in Sprinkle from multiple CSV files by following the steps mentioned below:

Step-I: Create a File Upload:

  • In sprinkle navigate to Ingest tab and click on file uploads.
  • Click on the setup files button to create a new file upload.

  • Give the file a unique name and click on create.
  • Click on the plus icon. You can either drag and drop the file or browse it.
  • Once the file is uploaded you need to provide the following details:
  • Destination table name: Enter a unique name with which you want to create the file
  • Skip before Header: Here you can specify the number of rows to skip.
  • Exclude columns: Here you can specify the columns that you want to exclude
  • In the preview section, you can easily edit the column names and alter the data type according to your convenience.
  • Once all the details are filled in, click on proceed and the table will be created.

Conclusion

You can quickly move data from CSV to Postgres by following any of the three methods discussed in the article. Importing CSV data using the COPY command and pgAdmin can sometimes require high skill and be time-consuming. So to overcome this problem data pipeline tools should be used. Sprinkle data is one such tool that helps to unify data not only from CSV but has support for more than 100+ data connectors, thus helping organizations to keep their data centralized.

Written by
Soham Dutta

Blogs

How to import data from CSV to postgres