Python and MongoDB
Kevin Swingler
Why?
• Python is becoming the scripting language of
choice in big data
• It has a library for connecting to a MongoDB:
PyMongo
• Nice mapping betwen Python data structures
and JSON objects
• Reduces relational impedance as database
Python Dictionary
b = {'one': {'a':'A','b':'B'}, 'two': 2, 'three': [3,2,1]}
b= 'two'=2 'one'= 'a'='A' 'b'='B' 'three'= [3,2,1] b['one']['b']='B' b['two']=2 b['three'][3]=1 dict list dict
PyMongo
• Let’s jump right in
# connect and authorise (see later) d=db.col.find_one()
for k,v in one.iteritems():
# Do something with k (the key) # and v (the value)
Connect and Authenticate
<%
import pymongo
from pymongo import MongoClient
client = MongoClient(“mqr1.cs.stir.ac.uk") client.kms.authenticate(username,pwd)
db = client.name # Database colc=db.colcname # Collection %>
Python Server Pages (PSP)
• You can use Python to drive the output of a
web page
• This can connect to a MongoDB • Much like using PHP and MySQL
• Many methods exist from simple scripts to
large frameworks
Simple Form to DB
• Make an HTML form
• Run the Python script when the form is
submitted
• Extract the values and put them into a
dictionary object
• Put the dictionary object into the database
First, the form in HMTL:
:<form action=form.psp method=post> <input type=text name=key>
<input type=text name=val><br> <input type=submit value=Add> </form>
Extract Values and Add to DB
<%key=form.getfirst('key')# Get key value val=form.getfirst('val')# Get row value
if key and val: # If both filled in
dct[key] = val # Add to dictionary if len(dct) > 0:
colc.insert(dct) # Add to db
%>
Search and Display
<%f=colc.find() # Find every entry
for a in f:
# Each key,value pair
for k,v in a.iteritems():
if k != "_id": # Ignore _id
req.write(“%s = %s<br>”%(k,v)) %>
Some Details
d=db.col.find_one()Returns a single document – the first match
f=db.col.find()Returns a cursor which iterates through documents for example:
for d in f:
Specific Search
db.col.find({Name:"Bill"}) # Find all documents where Name = "Bill"
db.col.find({Name: "Bill", Age: 18}) # Find all documents where Name= "Bill " AND Age=18
is the same as
db.col.find({$and: [{Name: "Bill"}, {Age: 18}]})
but we must be explicit if we want to use OR:
db.col.find({$or: [{Name: "Bill"}, {Age: 17}]}) $Notdoes as you'd expect and$normeans not and not.
Regular Expression Search
Find names that contain "ev" anywherefind({Name: /ev/})
Names that start with "K", ignoring case (K or k)
find({Name: /^K/i})
Names that contain "a" or "b" or "c" anywhere
find({Name: /[abc]/})
Find names that end with some numbers
find({Name: /\w{1,}\d{1,}/})
Bulk Inserts
• You can use colc.insert(docs)where docs is
iterable
• That is, it is a list of dictionaries that can be
iterated
names=[{"Name":"Bill", " Age":18}, {"Name":"Ted", "Age":18}]
Counting
colc.count(query)
Sorting
Aggregation
• In SQL we use Count(), Sum() .. GROUP BY • In MongoDB there are a number of
aggregation options:
–Aggregate() method –Map Reduce
–Group() method
Aggregate
db.things.insert({"x": 1, "tags":["dog", "cat"]}) db.things.insert({"x": 2, "tags":["cat"]})
db.things.insert({"x": 2, "tags":["mouse", "cat", "dog"]}) db.things.insert({"x": 3, "tags":[]})
Then
from bson.son import SON db.things.aggregate([
{"$unwind": "$tags"},
{"$group": {"_id": "$tags", "count": {"$sum": 1}}}, {"$sort": SON([("count", -1), ("_id", -1)])}
])
In Detail -
unwind
{"$unwind": "$tags"}
• $unwind operator turns a single document
with and array value into several documents, each with one value from the array in place of the array
unwind({"x": 1, "tags":["dog", "cat"]})
• Produces {"x": 1, "tags":"dog"}, {"x": 1, "tags":"cat"}
In Detail -
group
{"$group": {"_id":"$tags", "count":{"$sum":1}}}• Means create documents like:
In Detail -
sort
{"$sort": SON([("count", -1), ("_id", -1)])}
• Means sort all the documents descending by
count.
• Python dictionaries don’t maintain order, so
we use SONto sort
• SON is ‘Serialized Ocument Notation’ – like a
dictionary, but ordered
Map Reduce
• PyMongo’s API supports all of the features of
MongoDB’s map/reduce engine
map_reduce(map, reduce, out, full_response=False, **kwargs)
• map: map function (as a JavaScript string) • reduce: reduce function (as a JavaScript string)
• out: output collection name or out object(dict). SON can be used to specify multiple options. e.g.
SON([(‘replace’, <collection name>), (‘db’, <database name>)])
• full_response(optional): if True, return full response to this command - otherwise just return the result collection
• **kwargs(optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method
Group
• Write the reducer code in Javascriptreducer = Code(""" function(obj, prev){
prev.count++; }""")
• Run a query providing the key to group by, initial value for the reducer, and
the name of the reducer function
results = db.things.group(key={"x":1}, condition={}, initial={"count": 0}, reduce=reducer)
• Iterate through the documents in the result
for doc in results: print doc
{'count': 1.0, 'x': 1.0}{'count': 2.0, 'x': 2.0}{'count': 1.0, 'x': 3.0}
Cursors
• You might not want your program to retrieve
every match to a query at once
• Perhaps you want to show 20 per page, for
example
• Cursors are how to do this
db.test.find()[50] – returns an entry
db.test.find()[20:25] – returns a cursor (contains
Cursors
• Find() returns a cursor:
cur = db.test.find()
cur.sort(‘field’:1)
PyDev
• Eclipse plug-in
• In all machines in CS labs
• Python development environment • Includes interactive console
PyDev Interactive Console
• Nice alternative to the Mongo shell