As you know, Python is one of the widely used Programming languages for the data analysis, data science and machine learning. When analyzing data using Python, you will use Numpy and Pandas extensively. In this article, we will check how to create Redshift table from DataFrame in Python. Here DataFrame is actually referred to pandas not Spark. I will write another article on how to create tables out of Spark DataFrame, but for now let us stick to pandas df.
Python Pandas DataFrame
The pandas DataFrame’s are really very useful when you are working on the non-numeric values. You can view your data in the form of rows and columns just like relational database and that allows you to view data in a more structured format.
For example, consider below pandas dataFrame.
df = pd.DataFrame([{'Name': 'Ram', 'Age': 50},
{'Name': 'Bhim', 'Age': 23},
{'Name': 'Shyam', 'Age': 25}])
print(df)
Age Name
0 50 Ram
1 23 Bhim
2 25 Shyam
As you can see, data is organized into rows and columns.
Create Redshift Table from DataFrame using Python
As mentioned in the previous section, Pandas DataFrame organize your data into rows and column format.
Now, the question is there any way we can create a table out of pandas dataFrame?
Yes, you can dump pandas data frame into relational database table. For simplicity, we will be using Redshift as a relational database for our demonstration.
Pandas data from provides many useful methods. One of such methods is to_sql, you can use to_sql to push dataFrame data to a Redshift database. In this article, we have used SQLAlchemy engine to create an active database connection to Redshift.
Note that, you might have to install psycopg2 to connect to Redshift.
Pandas DataFrame to_sql options
Below are the some of the important to_sql options that you should take care of.
- Set index = False
- if_exists = ‘replace’ – The table will be created if it doesn’t exist, and you can specify if you want you call to replace the table, append to the table, or fail if the table already exists.
Example to Create Redshift Table from DataFrame using Python
Below is a working example that will create Redshift table from pandas DataFrame.
from sqlalchemy import create_engine
import pandas as pd
conn = create_engine('postgresql://username:password@redshift-cluster.com:5439/database')
df = pd.DataFrame([{'Name': 'Ram', 'Age': 50},
{'Name': 'Bhim', 'Age': 23},
{'Name': 'Shyam', 'Age': 25}])
df.to_sql('test_table', conn, index=False, if_exists='replace')
Now, verify in your Redshift cluster for created table.
databse=# select * from test_table;
age | name
-----+-------
50 | Ram
23 | Bhim
25 | Shyam
(3 rows)
Related Articles
- Connect Redshift using Python and Jdbc Driver- Example
- Connect Redshift using Python Pyodbc – ODBC Driver Example
- How to Export Spark DataFrame to Redshift Table
Hope this helps 🙂