At my workplace, we decided to use the open-source business intelligence tool Superset instead of a licensed BI product. We are still at the beginning of this work, and early on we hit an obstacle that, as a long-time Power BI user, genuinely surprised me. We run several applications, each with its own database, and we needed to combine data from all of them in a single dashboard. Unfortunately, Superset does not offer this out of the box: it can connect to many different databases, but despite all our research we found no way to make it join data across them. Our solution was to use Python to connect to the PostgreSQL, Oracle and SQL Server databases and consolidate their data into a SQL Server database installed on our own server. Superset then reads from this consolidated database, and building the dashboard becomes easy. In this article, you can see how to combine data from several databases with Python and transfer it into SQL Server (any other target database would work the same way).
In the example below, records are fetched with queries against the PostgreSQL, Oracle and SQL Server databases, merged, and written to the target SQL Server database. A dashboard can then be built on this combined data in the business intelligence tool, which is our main goal.
Python libraries we will use:
import pandas as pd
##Pandas is a data-analysis library, used here to read query results into DataFrames and write them back to a database.
from sqlalchemy import create_engine
##SQLAlchemy is a Python library that provides a unified way of accessing relational databases (e.g. PostgreSQL, Oracle, SQL Server, MySQL, SQLite). Only create_engine is needed here.
import pyodbc
##The pyodbc library allows you to connect to a database, send queries and get results using an ODBC-compliant driver.
import os
##The os module is one of Python's core libraries and is often used to perform various operating system operations.
os.environ["PYTHONIOENCODING"] = "UTF-8"
# PYTHONIOENCODING controls the character encoding (UTF-8) of Python's standard streams; note that it normally needs to be set before the interpreter starts to take effect
#Databases connection settings are entered as follows:
# PostgreSQL connection
pg_engine = create_engine('postgresql://user_name:password@host/database_name', echo=True)
# Oracle connection
oracle_engine = create_engine('oracle://user_name:password@host:port/database_name', echo=True)
# SQL Server connection
sql_server_connection = pyodbc.connect('DRIVER={SQL Server};SERVER=host;DATABASE=database_name;UID=user_name;PWD=password')
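A practical note on connection strings: if the password contains characters such as `@`, `:` or `/`, pasting it directly into a URL like the ones above will break parsing. SQLAlchemy's `URL.create` builds the URL safely from its parts. A minimal sketch (all values are placeholders):

```python
from sqlalchemy.engine import URL

# Build a SQL Server URL programmatically; special characters in the
# password are escaped automatically instead of breaking the URL.
url = URL.create(
    "mssql+pyodbc",
    username="user_name",
    password="p@ss:word",  # placeholder with characters that would break a plain URL
    host="host",
    database="database_name",
    query={"driver": "SQL Server"},
)
print(url.drivername)  # mssql+pyodbc
```

The resulting `url` object can be passed straight to `create_engine(url)` in place of a hand-written string.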
# Database queries are entered as follows:
# Oracle SQL query
query_oracle = """
select id, name from table_name
"""
# PostgreSQL query
query_postgres = """
select id, number from table_name
"""
# SQL Server query
query_sql_server = """
select id, address from table_name
"""
# Run each database query and load the result into a Pandas DataFrame:
# Get PostgreSQL data
pg_df = pd.read_sql_query(query_postgres, pg_engine)
# Get Oracle data
oracle_df = pd.read_sql_query(query_oracle, oracle_engine)
# Get SQL Server data (pandas accepts the raw pyodbc connection here, though it warns that only SQLAlchemy connectables are officially supported)
sql_server_df = pd.read_sql_query(query_sql_server, sql_server_connection)
# Merging the data: the id column is common to all three result sets, so the DataFrames are joined on id with a left join.
merged_df = pd.merge(oracle_df, sql_server_df, on='id', how='left')
merged_df = pd.merge(merged_df, pg_df, on='id', how='left')
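To see what the two left joins do, here is a minimal, self-contained sketch with toy DataFrames (the column names mirror the queries above; the values are made up):

```python
import pandas as pd

# Toy stand-ins for the three query results
oracle_df = pd.DataFrame({"id": [1, 2, 3], "name": ["Ali", "Ayse", "Mehmet"]})
sql_server_df = pd.DataFrame({"id": [1, 2], "address": ["Ankara", "Izmir"]})
pg_df = pd.DataFrame({"id": [1, 3], "number": [100, 300]})

# Same two-step left join as above: every row of oracle_df is kept,
# and ids with no match in the right-hand table get NaN.
merged = pd.merge(oracle_df, sql_server_df, on="id", how="left")
merged = pd.merge(merged, pg_df, on="id", how="left")
# merged has 3 rows and 4 columns (id, name, address, number);
# id 3 has no address and id 2 has no number, so those cells are NaN
```

Note that a left join keeps all rows of the left (Oracle) DataFrame; rows that exist only in the PostgreSQL or SQL Server results are dropped. If every row from every source must survive, `how='outer'` would be the alternative.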
# After the data is merged, a connection to the target SQL Server database is set up. Note that pandas' to_sql expects an SQLAlchemy connectable rather than a raw pyodbc connection, so an engine is created here:
target_engine = create_engine('mssql+pyodbc://user_name:password@host/database_name?driver=SQL+Server')
# Import the `merged_df` data frame into the `Customers` table. I prefer to delete the records on every transfer and reload from scratch, so if_exists='replace' drops and recreates the table each run; use 'append' instead if existing rows should be kept:
merged_df.to_sql('Customers', target_engine, if_exists='replace', index=False, chunksize=1000)
# Release the database connections held by the engine
target_engine.dispose()
print("Data has been successfully imported into Microsoft SQL Server database.")
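The whole pipeline can be tried without any database servers. The sketch below substitutes a single in-memory SQLite database for all of the PostgreSQL, Oracle and SQL Server connections purely for illustration; the pandas calls (`read_sql_query`, `merge`, `to_sql`) are the same ones used above, and the table names are made up:

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for all the real servers in this sketch
engine = create_engine("sqlite://")

# Seed the "source" tables with toy data
pd.DataFrame({"id": [1, 2], "name": ["Ali", "Ayse"]}).to_sql("src_oracle", engine, index=False)
pd.DataFrame({"id": [1, 2], "address": ["Ankara", "Izmir"]}).to_sql("src_mssql", engine, index=False)
pd.DataFrame({"id": [1], "number": [100]}).to_sql("src_pg", engine, index=False)

# Extract, merge and load exactly as in the article
oracle_df = pd.read_sql_query("select id, name from src_oracle", engine)
mssql_df = pd.read_sql_query("select id, address from src_mssql", engine)
pg_df = pd.read_sql_query("select id, number from src_pg", engine)

merged_df = pd.merge(oracle_df, mssql_df, on="id", how="left")
merged_df = pd.merge(merged_df, pg_df, on="id", how="left")
merged_df.to_sql("Customers", engine, if_exists="replace", index=False, chunksize=1000)

rows = pd.read_sql_query("select count(*) as n from Customers", engine)["n"].iloc[0]
print(rows)  # 2
```

Swapping the SQLite engine for the real PostgreSQL, Oracle and SQL Server engines turns this sketch into the production script.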
In my next article, we will transfer data from Excel files to a SQL database with Python.
AHMET GÜNOĞLU
ERP Consultant and Business Intelligence Expert
https://www.ahmetgunoglu.com/blog
https://medium.com/@ahmetgunnnnn