How to convert a SQL query result to a Pandas data structure?
Any help on this problem will be greatly appreciated.
So basically I want to run a query against my SQL database and store the returned data as a Pandas data structure.
I have attached the code for the query.
I am reading the documentation on Pandas, but I have trouble identifying the return type of my query.
I tried to print the query result, but it doesn't give any useful information.
Thanks!!!!
from sqlalchemy import create_engine
engine2 = create_engine('mysql://THE DATABASE I AM ACCESSING')
connection2 = engine2.connect()
dataid = 1022
resoverall = connection2.execute("""
    SELECT
        sum(BLABLA) AS BLA,
        sum(BLABLABLA2) AS BLABLABLA2,
        sum(SOME_INT) AS SOME_INT,
        sum(SOME_INT2) AS SOME_INT2,
        100 * sum(SOME_INT2) / sum(SOME_INT) AS ctr,
        sum(SOME_INT2) / sum(SOME_INT) AS cpc
    FROM daily_report_cooked
    WHERE campaign_id = '%s'
""" % dataid)
So I want to understand what the format/datatype of my variable "resoverall" is and how to put it into a Pandas data structure.
Solution 1:
Here's the shortest code that will do the job:
from pandas import DataFrame
df = DataFrame(resoverall.fetchall())
df.columns = resoverall.keys()
You can go fancier and parse the types as in Paul's answer.
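If you prefer a single step, the rows and column names can be passed to the constructor together. A minimal sketch, reusing the resoverall result from the question above:
from pandas import DataFrame
# Same idea in one call: fetched rows plus column names together
df = DataFrame(resoverall.fetchall(), columns=list(resoverall.keys()))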
Solution 2:
Edit: Mar. 2015
As noted below, pandas now uses SQLAlchemy to both read from (read_sql) and insert into (to_sql) a database. The following should work
import pandas as pd
df = pd.read_sql(sql, cnxn)
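For completeness, here is a minimal sketch of that round trip with a SQLAlchemy engine; the connection string and table names are placeholders, not taken from the question:
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string -- substitute your own credentials
engine = create_engine('mysql://user:password@localhost/mydb')

# Read a query result straight into a DataFrame
df = pd.read_sql("SELECT * FROM my_table", engine)

# Write the DataFrame back out to a (new) table
df.to_sql('my_table_copy', engine, if_exists='replace', index=False)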
Previous answer: Via mikebmassey from a similar question
import pyodbc
import pandas.io.sql as psql
cnxn = pyodbc.connect(connection_info)
cursor = cnxn.cursor()
sql = "SELECT * FROM TABLE"
df = psql.frame_query(sql, cnxn)
cnxn.close()
Solution 3:
If you are using SQLAlchemy's ORM rather than the expression language, you might find yourself wanting to convert an object of type sqlalchemy.orm.query.Query to a Pandas data frame.
The cleanest approach is to get the generated SQL from the query's statement attribute, and then execute it with pandas's read_sql() method. E.g., starting with a Query object called query:
df = pd.read_sql(query.statement, query.session.bind)
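As a fuller sketch, assuming a declarative model User and an in-memory SQLite engine (both purely illustrative; in SQLAlchemy 1.4+ declarative_base also lives in sqlalchemy.orm):
import pandas as pd
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Build an ORM query, then hand its compiled SELECT to pandas
query = session.query(User).filter(User.id > 0)
df = pd.read_sql(query.statement, query.session.bind)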
Solution 4:
Edit 2014-09-30:
pandas now has a read_sql function. You definitely want to use that instead.
Original answer:
I can't help you with SQLAlchemy -- I always use pyodbc, MySQLdb, or psycopg2 as needed. But when doing so, a function as simple as the one below tends to suit my needs:
import datetime
import decimal
import pyodbc  # just corrected a typo here
import numpy as np
import pandas
cnn, cur = myConnectToDBfunction()
cmd = "SELECT * FROM myTable"
cur.execute(cmd)
dataframe = __processCursor(cur, dataframe=True)
def __processCursor(cur, dataframe=False, index=None):
    '''
    Processes a database cursor with data on it into either
    a structured numpy array or a pandas dataframe.

    input:
    cur - a pyodbc cursor that has just received data
    dataframe - bool. if false, a numpy record array is returned
                if true, return a pandas dataframe
    index - list of column(s) to use as index in a pandas dataframe
    '''
    # Build a numpy dtype from the cursor's column descriptions
    datatypes = []
    colinfo = cur.description
    for col in colinfo:
        if col[1] == unicode:  # Python 2 only; use str on Python 3
            datatypes.append((col[0], 'U%d' % col[3]))
        elif col[1] == str:
            datatypes.append((col[0], 'S%d' % col[3]))
        elif col[1] in [float, decimal.Decimal]:
            datatypes.append((col[0], 'f4'))
        elif col[1] == datetime.datetime:
            datatypes.append((col[0], 'O'))
        elif col[1] == int:
            datatypes.append((col[0], 'i4'))

    data = []
    for row in cur:
        data.append(tuple(row))

    array = np.array(data, dtype=datatypes)

    if dataframe:
        output = pandas.DataFrame.from_records(array)
        if index is not None:
            output = output.set_index(index)
    else:
        output = array

    return output
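If you want the resulting frame indexed by one of its columns, the index argument handles it. For illustration, assuming myTable has an id column:
cur.execute(cmd)  # re-run the query, since the cursor was consumed above
dataframe = __processCursor(cur, dataframe=True, index=['id'])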
Solution 5:
MySQL Connector
For those who work with the mysql connector, you can use this code as a start. (Thanks to @Daniel Velkov)
Used refs:
- Querying Data Using Connector/Python
- Connecting to MYSQL with Python in 3 steps
import pandas as pd
import mysql.connector
# Setup MySQL connection
db = mysql.connector.connect(
    host="<IP>",            # your host, usually localhost
    user="<USER>",          # your username
    password="<PASS>",      # your password
    database="<DATABASE>"   # name of the database
)
# You must create a Cursor object. It will let you execute all the queries you need
cur = db.cursor()
# Use all the SQL you like
cur.execute("SELECT * FROM <TABLE>")
# Put it all to a data frame
sql_data = pd.DataFrame(cur.fetchall())
sql_data.columns = cur.column_names
# Close the session
db.close()
# Show the data
print(sql_data.head())
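Alternatively, pandas can do the fetching and column naming for you. read_sql officially prefers a SQLAlchemy connectable, so newer pandas versions may warn when handed a raw mysql.connector connection, but a sketch like the one below generally works:
import pandas as pd
import mysql.connector

# Same placeholder credentials as above
db = mysql.connector.connect(host="<IP>", user="<USER>",
                             password="<PASS>", database="<DATABASE>")

# One call replaces execute/fetchall and the manual column assignment
sql_data = pd.read_sql("SELECT * FROM <TABLE>", db)
db.close()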