How to import data from mongodb to pandas?
I have a large amount of data in a collection in mongodb which I need to analyze. How do i import that data to pandas?
I am new to pandas and numpy.
EDIT: The mongodb collection contains sensor values tagged with date and time. The sensor values are of float datatype.
Sample Data:
{
"_cls" : "SensorReport",
"_id" : ObjectId("515a963b78f6a035d9fa531b"),
"_types" : [
"SensorReport"
],
"Readings" : [
{
"a" : 0.958069536790466,
"_types" : [
"Reading"
],
"ReadingUpdatedDate" : ISODate("2013-04-02T08:26:35.297Z"),
"b" : 6.296118156595,
"_cls" : "Reading"
},
{
"a" : 0.95574014778624,
"_types" : [
"Reading"
],
"ReadingUpdatedDate" : ISODate("2013-04-02T08:27:09.963Z"),
"b" : 6.29651468650064,
"_cls" : "Reading"
},
{
"a" : 0.953648289182713,
"_types" : [
"Reading"
],
"ReadingUpdatedDate" : ISODate("2013-04-02T08:27:37.545Z"),
"b" : 7.29679823731148,
"_cls" : "Reading"
},
{
"a" : 0.955931884300997,
"_types" : [
"Reading"
],
"ReadingUpdatedDate" : ISODate("2013-04-02T08:28:21.369Z"),
"b" : 6.29642922525632,
"_cls" : "Reading"
},
{
"a" : 0.95821381,
"_types" : [
"Reading"
],
"ReadingUpdatedDate" : ISODate("2013-04-02T08:41:20.801Z"),
"b" : 7.28956613,
"_cls" : "Reading"
},
{
"a" : 4.95821335,
"_types" : [
"Reading"
],
"ReadingUpdatedDate" : ISODate("2013-04-02T08:41:36.931Z"),
"b" : 6.28956574,
"_cls" : "Reading"
},
{
"a" : 9.95821341,
"_types" : [
"Reading"
],
"ReadingUpdatedDate" : ISODate("2013-04-02T08:42:09.971Z"),
"b" : 0.28956488,
"_cls" : "Reading"
},
{
"a" : 1.95667927,
"_types" : [
"Reading"
],
"ReadingUpdatedDate" : ISODate("2013-04-02T08:43:55.463Z"),
"b" : 0.29115237,
"_cls" : "Reading"
}
],
"latestReportTime" : ISODate("2013-04-02T08:43:55.463Z"),
"sensorName" : "56847890-0",
"reportCount" : 8
}
Solution 1:
pymongo
might give you a hand, followings are some codes I'm using:
import pandas as pd
from pymongo import MongoClient
def _connect_mongo(host, port, username, password, db):
""" A util for making a connection to mongo """
if username and password:
mongo_uri = 'mongodb://%s:%s@%s:%s/%s' % (username, password, host, port, db)
conn = MongoClient(mongo_uri)
else:
conn = MongoClient(host, port)
return conn[db]
def read_mongo(db, collection, query={}, host='localhost', port=27017, username=None, password=None, no_id=True):
""" Read from Mongo and Store into DataFrame """
# Connect to MongoDB
db = _connect_mongo(host=host, port=port, username=username, password=password, db=db)
# Make a query to the specific DB and Collection
cursor = db[collection].find(query)
# Expand the cursor and construct the DataFrame
df = pd.DataFrame(list(cursor))
# Delete the _id
if no_id:
del df['_id']
return df
Solution 2:
You can load your mongodb data to pandas DataFrame using this code. It works for me. Hopefully for you too.
import pymongo
import pandas as pd
from pymongo import MongoClient
client = MongoClient()
db = client.database_name
collection = db.collection_name
data = pd.DataFrame(list(collection.find()))
Solution 3:
Monary
does exactly that, and it's super fast. (another link)
See this cool post which includes a quick tutorial and some timings.
Solution 4:
As per PEP, simple is better than complicated:
import pandas as pd
df = pd.DataFrame.from_records(db.<database_name>.<collection_name>.find())
You can include conditions as you would working with regular mongoDB database or even use find_one() to get only one element from the database, etc.
and voila!
Solution 5:
import pandas as pd
from odo import odo
data = odo('mongodb://localhost/db::collection', pd.DataFrame)