• No results found

Python and MongoDB. Why?

N/A
N/A
Protected

Academic year: 2021

Share "Python and MongoDB. Why?"

Copied!
15
0
0

Loading.... (view fulltext now)

Full text

(1)

Python and MongoDB

Kevin Swingler

Why?

• Python is becoming the scripting language of

choice in big data

• It has a library for connecting to a MongoDB:

PyMongo

• Nice mapping betwen Python data structures

and JSON objects

• Reduces relational impedance as database

(2)

Python Dictionary

b = {'one': {'a':'A','b':'B'}, 'two': 2, 'three': [3,2,1]}

b= 'two'=2 'one'= 'a'='A' 'b'='B' 'three'= [3,2,1] b['one']['b']='B' b['two']=2 b['three'][3]=1 dict list dict

PyMongo

• Let’s jump right in

# connect and authorise (see later) d=db.col.find_one()

for k,v in one.iteritems():

# Do something with k (the key) # and v (the value)

(3)

Connect and Authenticate

<%

import pymongo

from pymongo import MongoClient

client = MongoClient(“mqr1.cs.stir.ac.uk") client.kms.authenticate(username,pwd)

db = client.name # Database colc=db.colcname # Collection %>

Python Server Pages (PSP)

• You can use Python to drive the output of a

web page

• This can connect to a MongoDB • Much like using PHP and MySQL

• Many methods exist from simple scripts to

large frameworks

(4)

Simple Form to DB

• Make an HTML form

• Run the Python script when the form is

submitted

• Extract the values and put them into a

dictionary object

• Put the dictionary object into the database

First, the form in HMTL:

:

<form action=form.psp method=post> <input type=text name=key>

<input type=text name=val><br> <input type=submit value=Add> </form>

(5)

Extract Values and Add to DB

<%

key=form.getfirst('key')# Get key value val=form.getfirst('val')# Get row value

if key and val: # If both filled in

dct[key] = val # Add to dictionary if len(dct) > 0:

colc.insert(dct) # Add to db

%>

Search and Display

<%

f=colc.find() # Find every entry

for a in f:

# Each key,value pair

for k,v in a.iteritems():

if k != "_id": # Ignore _id

req.write(“%s = %s<br>”%(k,v)) %>

(6)

Some Details

d=db.col.find_one()Returns a single document – the first match

f=db.col.find()Returns a cursor which iterates through documents for example:

for d in f:

Specific Search

db.col.find({Name:"Bill"}) # Find all documents where Name = "Bill"

db.col.find({Name: "Bill", Age: 18}) # Find all documents where Name= "Bill " AND Age=18

is the same as

db.col.find({$and: [{Name: "Bill"}, {Age: 18}]})

but we must be explicit if we want to use OR:

db.col.find({$or: [{Name: "Bill"}, {Age: 17}]}) $Notdoes as you'd expect and$normeans not and not.

(7)

Regular Expression Search

Find names that contain "ev" anywhere

find({Name: /ev/})

Names that start with "K", ignoring case (K or k)

find({Name: /^K/i})

Names that contain "a" or "b" or "c" anywhere

find({Name: /[abc]/})

Find names that end with some numbers

find({Name: /\w{1,}\d{1,}/})

Bulk Inserts

• You can use colc.insert(docs)where docs is

iterable

• That is, it is a list of dictionaries that can be

iterated

names=[{"Name":"Bill", " Age":18}, {"Name":"Ted", "Age":18}]

(8)

Counting

colc.count(query)

Sorting

(9)

Aggregation

• In SQL we use Count(), Sum() .. GROUP BY • In MongoDB there are a number of

aggregation options:

–Aggregate() method –Map Reduce

–Group() method

Aggregate

db.things.insert({"x": 1, "tags":["dog", "cat"]}) db.things.insert({"x": 2, "tags":["cat"]})

db.things.insert({"x": 2, "tags":["mouse", "cat", "dog"]}) db.things.insert({"x": 3, "tags":[]})

Then

from bson.son import SON db.things.aggregate([

{"$unwind": "$tags"},

{"$group": {"_id": "$tags", "count": {"$sum": 1}}}, {"$sort": SON([("count", -1), ("_id", -1)])}

])

(10)

In Detail -

unwind

{"$unwind": "$tags"}

• $unwind operator turns a single document

with and array value into several documents, each with one value from the array in place of the array

unwind({"x": 1, "tags":["dog", "cat"]})

• Produces {"x": 1, "tags":"dog"}, {"x": 1, "tags":"cat"}

In Detail -

group

{"$group": {"_id":"$tags", "count":{"$sum":1}}}

• Means create documents like:

(11)

In Detail -

sort

{"$sort": SON([("count", -1), ("_id", -1)])}

• Means sort all the documents descending by

count.

• Python dictionaries don’t maintain order, so

we use SONto sort

• SON is ‘Serialized Ocument Notation’ – like a

dictionary, but ordered

Map Reduce

• PyMongo’s API supports all of the features of

MongoDB’s map/reduce engine

map_reduce(map, reduce, out, full_response=False, **kwargs)

• map: map function (as a JavaScript string) • reduce: reduce function (as a JavaScript string)

• out: output collection name or out object(dict). SON can be used to specify multiple options. e.g.

SON([(‘replace’, <collection name>), (‘db’, <database name>)])

• full_response(optional): if True, return full response to this command - otherwise just return the result collection

**kwargs(optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method

(12)

Group

• Write the reducer code in Javascript

reducer = Code(""" function(obj, prev){

prev.count++; }""")

• Run a query providing the key to group by, initial value for the reducer, and

the name of the reducer function

results = db.things.group(key={"x":1}, condition={}, initial={"count": 0}, reduce=reducer)

• Iterate through the documents in the result

for doc in results: print doc

{'count': 1.0, 'x': 1.0}{'count': 2.0, 'x': 2.0}{'count': 1.0, 'x': 3.0}

Cursors

• You might not want your program to retrieve

every match to a query at once

• Perhaps you want to show 20 per page, for

example

• Cursors are how to do this

db.test.find()[50] – returns an entry

db.test.find()[20:25] – returns a cursor (contains

(13)

Cursors

• Find() returns a cursor:

cur = db.test.find()

cur.sort(‘field’:1)

PyDev

• Eclipse plug-in

• In all machines in CS labs

• Python development environment • Includes interactive console

(14)

PyDev Interactive Console

• Nice alternative to the Mongo shell

(15)

References

Related documents