When to use GridFS - MongoDB Docs

This page is under construction

When to use GridFS

Lots of files. GridFS tends to handle large numbers (many thousands) of files better than many file systems.

User-uploaded files. When users upload files, you tend to end up with a lot of them, and you want them replicated and backed up. GridFS is a good place to store them because you can then manage them the same way you manage your data. You can also query by user, upload date, and so on directly in the file store, without a layer of indirection.

Files that often change. If certain files change a lot, it makes sense to store them in GridFS so that you can modify them in one place and all clients get the updates. This can also be better than storing them in the source tree, because you don't have to deploy the application to update the files.

When not to use GridFS

Few small static files. If you just have a few small files for a website (JS, CSS, images), it's probably easier just to use the file system.

Note that if you need to update a binary object atomically, and the object is under the document size limit for your version of MongoDB (16MB for 1.8), then you might consider storing the object manually within a single document. This can be accomplished using the BSON bindata type. Check your driver's docs for details on using this type.

Indexes

Indexes enhance query performance, often dramatically. It's important to think about the kinds of queries your application will need so that you can define relevant indexes. Once that's done, actually creating the indexes in MongoDB is relatively easy.

Indexes in MongoDB are conceptually similar to those in RDBMSes like MySQL. You will want an index in MongoDB in the same sort of situations where you would have wanted an index in MySQL.

Basics
Creation Options
The _id Index
Indexing on Embedded Keys ("Dot Notation")
Documents as Keys
Compound Keys Indexes
Indexing Array Elements
Sparse Indexes
Unique Indexes
Unique Indexes and Missing Keys
dropDups
Background Index Creation
Dropping Indexes
ReIndex
Additional Notes on Indexes
Keys Too Large To Index
Index Performance
Using sort() without an Index
Presentations
Building indexes with replica sets
Index Versions
Geospatial Indexing
Indexing as a Background Operation
Multikeys
Indexing Advice and FAQ

Basics

An index is a data structure that collects information about the values of the specified fields in the documents of a collection. This data structure is used by Mongo's query optimizer to quickly sort through and order the documents in a collection. Formally speaking, these indexes are implemented as "B-Tree" indexes.

In the shell, you can create an index by calling the ensureIndex() function, and providing a document that specifies one or more keys to index.

Referring back to our examples database from Mongo Usage Basics, we can index on the 'j' field as follows:

db.things.ensureIndex({j:1});

The ensureIndex() function only creates the index if it does not exist.

Once a collection is indexed on a key, random access on query expressions that match the specified key is fast. Without the index, MongoDB has to go through each document, checking the value of the specified key in the query:

db.things.find({j:2}); // fast - uses index

db.things.find({x:3}); // slow - has to check all because 'x' isn't indexed

You can run

db.things.getIndexes()

in the shell to see the existing indexes on the collection. Run

db.system.indexes.find()

to see all indexes for the database.

ensureIndex creates the index only if it does not already exist. A standard index build blocks all other database operations. If your collection is large, the build may take many minutes or hours to complete. If you must build an index on a live MongoDB instance, we suggest that you build it in the background using the background : true option. This keeps your database responsive while the index is being built.

Note, however, that background indexing may still affect performance, particularly if your collection is large.

If you use replication, background index builds will block operations on the secondaries. To build new indexes on a live replica set, follow the steps described on the Building indexes with replica sets page.

In many cases, not having an index at all can impact performance almost as much as the index build itself. If this is the case, we recommend that the application code check for the index at startup, using the chosen MongoDB driver's getIndex() function, and terminate if the index cannot be found.

A separate indexing script can then be explicitly invoked when safe to do so.
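The startup check described above could be sketched roughly as follows. This is a minimal illustration in plain JavaScript, not driver code: hasIndex and checkRequiredIndexes are hypothetical helpers, and the existing array is sample data standing in for whatever list of index documents your driver returns (the name and key fields follow the shape printed by getIndexes() in the shell).

```javascript
// Hypothetical helper: true when `indexes` contains an index whose key
// pattern matches `keyPattern` exactly (field order and direction included).
function hasIndex(indexes, keyPattern) {
  const wanted = JSON.stringify(keyPattern);
  return indexes.some((ix) => JSON.stringify(ix.key) === wanted);
}

// Hypothetical startup check: throws if any required index is missing,
// so the application can terminate instead of running unindexed queries.
function checkRequiredIndexes(indexes, required) {
  const missing = required.filter((k) => !hasIndex(indexes, k));
  if (missing.length > 0) {
    throw new Error("missing indexes: " + JSON.stringify(missing));
  }
}

// Sample data (an assumption) standing in for the driver's index listing.
const existing = [
  { name: "_id_", key: { _id: 1 } },
  { name: "j_1", key: { j: 1 } },
];

checkRequiredIndexes(existing, [{ j: 1 }]); // passes silently
```

Comparing the serialized key patterns keeps field order significant, which matters because {a: 1, b: 1} and {b: 1, a: 1} are different indexes.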

Creation Options

The second argument for ensureIndex is a document/object representing the options. These options are explained below.

option      values                                                     default
background  true/false                                                 false
dropDups    true/false                                                 false
unique      true/false                                                 false
sparse      true/false                                                 false
v           index version. 0 = pre-v2.0, 1 = smaller/faster (current)  1 in v2.0. Default is used except in unusual situations.

name is also an option, but it need not be specified and will be deprecated in the future. The name of an index is generated by concatenating the names of the indexed fields and their directions (i.e., 1 or -1 for ascending or descending). Index names (including their namespace/database) are limited to 128 characters.
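As a rough illustration of how a default index name is put together, the sketch below joins each field name and its direction. defaultIndexName is a hypothetical helper written in plain JavaScript; the underscore separator is an assumption based on names such as j_1 that the shell reports, and the 128-character limit is not checked here.

```javascript
// Hypothetical helper: builds a name from the key pattern by joining
// "<field>_<direction>" pieces with underscores.
function defaultIndexName(keys) {
  return Object.entries(keys)
    .map(([field, direction]) => field + "_" + direction)
    .join("_");
}

console.log(defaultIndexName({ j: 1 }));           // j_1
console.log(defaultIndexName({ j: 1, name: -1 })); // j_1_name_-1
```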

The _id Index

For all collections except capped collections, an index is automatically created for the _id field. This index is special and cannot be deleted. The _id index enforces uniqueness for its keys (except for some situations with sharding).

_id values are invariant.

Indexing on Embedded Keys ("Dot Notation")

With MongoDB you can even index on a key inside of an embedded document. Reaching into sub-documents is referred to as Dot Notation. For example:

db.things.ensureIndex({"address.city": 1})
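To make the idea of "reaching into" a sub-document concrete, here is a small sketch in plain JavaScript of how a dotted path can be resolved against a document. getByPath is a hypothetical helper and the sample document is invented for illustration; it is not how the server extracts index keys.

```javascript
// Hypothetical helper: follows a dotted path such as "address.city"
// one key at a time, returning undefined if any step is missing.
function getByPath(doc, path) {
  return path.split(".").reduce(
    (value, key) =>
      value !== null && typeof value === "object" ? value[key] : undefined,
    doc
  );
}

// Invented sample document.
const thing = { name: "xyz", address: { city: "Boston", zip: "02101" } };

console.log(getByPath(thing, "address.city"));    // Boston
console.log(getByPath(thing, "address.country")); // undefined
```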

Documents as Keys

Indexed fields may be of any type, including (embedded) documents:

db.factories.insert( { name: "xyz", metro: { city: "New York", state: "NY" } } );

db.factories.ensureIndex( { metro : 1 } );

// this query can use the above index:

db.factories.find( { metro: { city: "New York", state: "NY" } } );

// this one too, as {city:"New York"} < {city:"New York",state:"NY"}

db.factories.find( { metro: { $gte : { city: "New York" } } } );

// this query does not match the document because the order of fields is significant

db.factories.find( { metro: { state: "NY" , city: "New York" } } );

An alternative to documents as keys is to create a compound index:

db.factories.ensureIndex( { "metro.city" : 1, "metro.state" : 1 } );

// these queries can use the above index:

db.factories.find( { "metro.city" : "New York", "metro.state" : "NY" } );

db.factories.find( { "metro.city" : "New York" } );

db.factories.find().sort( { "metro.city" : 1, "metro.state" : 1 } );

db.factories.find().sort( { "metro.city" : 1 } )

There are pros and cons to the two approaches. When using the entire (sub-)document as a key, the comparison order is predefined: ascending key order, in the order the keys occur in the BSON document. With compound indexes that reach into the sub-document, you can mix ascending and descending keys, and the query optimizer can also use the index for queries on only the first key(s) in the index.
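The ordering behaviour of documents used as keys can be modelled with a short comparison function. This is a simplified sketch in plain JavaScript, assuming string values only; compareDocs is a hypothetical helper, and the real BSON comparison also takes value types into account.

```javascript
// Hypothetical, simplified comparison of two embedded documents:
// walk the fields in stored order, compare key names, then values;
// if one document is a prefix of the other, the shorter one sorts first.
function compareDocs(a, b) {
  const ea = Object.entries(a);
  const eb = Object.entries(b);
  const n = Math.min(ea.length, eb.length);
  for (let i = 0; i < n; i++) {
    const [ka, va] = ea[i];
    const [kb, vb] = eb[i];
    if (ka !== kb) return ka < kb ? -1 : 1;
    if (va !== vb) return va < vb ? -1 : 1;
  }
  return ea.length - eb.length;
}

const stored = { city: "New York", state: "NY" };

console.log(compareDocs({ city: "New York" }, stored) < 0);               // true
console.log(compareDocs({ state: "NY", city: "New York" }, stored) === 0); // false
```

The second result shows why the reordered query document in the example above does not match: with field order significant, the two documents are not equal.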

Compound Keys Indexes

In addition to single-key basic indexes, MongoDB also supports multi-key "compound" indexes. Just like basic indexes, you use the ensureIndex() function in the shell to create the index, but instead of specifying only a single key, you can specify several:

db.things.ensureIndex({j:1, name:-1});

When creating an index, the number associated with a key specifies the direction of the index, so it should always be 1 (ascending) or -1 (descending). Direction doesn't matter for single key indexes or for random access retrieval but is important if you are doing sorts or range queries on compound indexes.
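To illustrate why direction matters for sorts on compound indexes, here is a simplified model in plain JavaScript. It assumes that an index can be walked forwards or backwards, so it can serve a sort whose directions all match the index or are all reversed. canServeSort is a hypothetical helper, and the real query optimizer considers more than this.

```javascript
// Hypothetical model: can an index with key pattern `indexSpec` return
// documents already ordered for `sortSpec`?
function canServeSort(indexSpec, sortSpec) {
  const ix = Object.entries(indexSpec);
  const so = Object.entries(sortSpec);
  if (so.length === 0 || so.length > ix.length) return false;
  // The sort fields must be the leading index fields, in the same order.
  if (!so.every(([field], i) => ix[i][0] === field)) return false;
  // Either every direction matches (forward scan) or every one is flipped
  // (backward scan); a mixture of the two cannot be served.
  const forward = so.every(([, dir], i) => dir === ix[i][1]);
  const backward = so.every(([, dir], i) => dir === -ix[i][1]);
  return forward || backward;
}

const index = { j: 1, name: -1 };

console.log(canServeSort(index, { j: 1, name: -1 })); // true
console.log(canServeSort(index, { j: -1, name: 1 })); // true
console.log(canServeSort(index, { j: 1, name: 1 }));  // false
```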

If you have a compound index on multiple fields, you can use it to query on the beginning subset of fields. So if you have an index on

a,b,c

you can use it to query on

a

a,b

a,b,c

New in 1.6+

Now you can also use the compound index to service any combination of equality and range queries from the constituent fields. If the first key of the index is present in the query, that index may be selected by the query optimizer. If the first key is not present in the query, the index will only be used if hinted explicitly. While indexes can be used in many cases where an arbitrary subset of indexed fields is present in the query, as a general rule the optimal indexes for a given query are those in which queried fields precede any non-queried fields.
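The "beginning subset" rule for compound indexes can be expressed as a small check. The sketch below, in plain JavaScript, only models that general rule; isIndexPrefix is a hypothetical helper and does not reproduce the optimizer's actual selection logic or the hinted case.

```javascript
// Hypothetical helper: true when the queried fields are exactly the first
// N fields of the index (in any order within the query itself).
function isIndexPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  if (wanted.size === 0 || wanted.size > indexFields.length) return false;
  return indexFields.slice(0, wanted.size).every((f) => wanted.has(f));
}

const abc = ["a", "b", "c"];

console.log(isIndexPrefix(abc, ["a"]));           // true
console.log(isIndexPrefix(abc, ["b", "a"]));      // true
console.log(isIndexPrefix(abc, ["a", "b", "c"])); // true
console.log(isIndexPrefix(abc, ["b"]));           // false
console.log(isIndexPrefix(abc, ["a", "c"]));      // false
```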

Indexing Array Elements

When a document's stored value for an index key field is an array, MongoDB indexes each element of the array. See the Multikeys page for more information.
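As a rough picture of what indexing each array element means, the following plain JavaScript sketch lists the index keys one document would contribute. indexKeysFor is a hypothetical helper and the sample document is invented; the real index stores B-Tree entries rather than objects like these.

```javascript
// Hypothetical helper: one index key per array element, or a single key
// when the stored value is not an array.
function indexKeysFor(doc, field) {
  const value = doc[field];
  return Array.isArray(value)
    ? value.map((element) => ({ [field]: element }))
    : [{ [field]: value }];
}

// Invented sample document.
const article = { title: "Indexing", tags: ["mongodb", "index", "btree"] };

console.log(indexKeysFor(article, "tags").length);  // 3
console.log(indexKeysFor(article, "title").length); // 1
```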

Sparse Indexes

Current Limitations

A sparse index can only have one field. SERVER-2193

New in 1.7.4.

A "sparse index" is an index that only includes documents with the indexed field.

Any document that is missing the sparsely indexed field will not be stored in the index; the index is therefore sparse because documents without a value for the field are left out.

Sparse indexes, by definition, are not complete (for the collection) and behave differently than complete indexes. When using a "sparse index" for sorting (or in some cases just filtering) some documents in the collection may not be returned. This is because only documents in the index will be returned.

> db.people.ensureIndex({title : 1}, {sparse : true})

> db.people.save({name:"Jim"})

> db.people.save({name:"Sarah", title:"Princess"})

> db.people.find()

{ "_id" : ObjectId("4de6abd5da558a49fc5eef29"), "name" : "Jim" }

{ "_id" : ObjectId("4de6abdbda558a49fc5eef2a"), "name" : "Sarah", "title" : "Princess" }

> db.people.find().sort({title:1}) // only 1 doc returned because sparse

{ "_id" : ObjectId("4de6abdbda558a49fc5eef2a"), "name" : "Sarah", "title" : "Princess" }

> db.people.dropIndex({title : 1})

{ "nIndexesWas" : 2, "ok" : 1 }

> db.people.find().sort({title:1}) // no more index, returns all documents

{ "_id" : ObjectId("4de6abd5da558a49fc5eef29"), "name" : "Jim" }

{ "_id" : ObjectId("4de6abdbda558a49fc5eef2a"), "name" : "Sarah", "title" : "Princess" }

You can combine sparse with unique to produce a unique constraint that ignores documents with missing fields.

Note that MongoDB's sparse indexes are not block-level indexes. MongoDB sparse indexes can be thought of as dense indexes with a specific filter.
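The "dense index with a filter" view of sparse indexes can be modelled in a few lines of plain JavaScript. This is only a sketch of the behaviour shown in the shell session above: buildSparseIndex is a hypothetical helper, and a sorted array stands in for the B-Tree.

```javascript
// Hypothetical model: a sparse index keeps only documents that have the
// field, ordered by that field's value.
function buildSparseIndex(docs, field) {
  return docs
    .filter((d) => Object.prototype.hasOwnProperty.call(d, field))
    .sort((a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0));
}

const people = [{ name: "Jim" }, { name: "Sarah", title: "Princess" }];

// Sorting "through" the sparse index only sees documents that are in it.
const viaSparseIndex = buildSparseIndex(people, "title");

console.log(viaSparseIndex.length);  // 1
console.log(viaSparseIndex[0].name); // Sarah
```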

Unique Indexes

MongoDB supports unique indexes, which guarantee that no documents are inserted whose values for the indexed keys match those of an existing document. To create an index that guarantees that no two documents have the same values for both firstname and lastname you would do:

db.things.ensureIndex({firstname: 1, lastname: 1}, {unique: true});

Unique Indexes and Missing Keys

When a document is saved to a collection any missing indexed keys will be inserted with null values in the index entry. Thus, it won't be possible to insert multiple documents missing the same indexed key in a unique index.

db.things.ensureIndex({firstname: 1}, {unique: true});

db.things.save({lastname: "Smith"});

// Next operation will fail because of the unique index on firstname.

db.things.save({lastname: "Jones"});
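The effect of missing keys on a unique index, and how the sparse option changes it, can be sketched with a small model in plain JavaScript. UniqueIndexModel is a hypothetical class that handles primitive values only; it is meant to mirror the behaviour described in the text, not the server's implementation.

```javascript
// Hypothetical model of a single-field unique index.
class UniqueIndexModel {
  constructor(field, options = {}) {
    this.field = field;
    this.sparse = options.sparse === true;
    this.keys = new Set();
  }

  insert(doc) {
    const present = Object.prototype.hasOwnProperty.call(doc, this.field);
    if (!present && this.sparse) return; // sparse: document is left out of the index
    const key = present ? doc[this.field] : null; // missing key is indexed as null
    if (this.keys.has(key)) {
      throw new Error("duplicate key: " + JSON.stringify(key));
    }
    this.keys.add(key);
  }
}

const plainIndex = new UniqueIndexModel("firstname");
plainIndex.insert({ lastname: "Smith" }); // indexed under null
let rejected = false;
try {
  plainIndex.insert({ lastname: "Jones" }); // also null, so it collides
} catch (err) {
  rejected = true;
}
console.log(rejected); // true

const sparseIndex = new UniqueIndexModel("firstname", { sparse: true });
sparseIndex.insert({ lastname: "Smith" });
sparseIndex.insert({ lastname: "Jones" }); // both skipped, no collision
```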

dropDups

A unique index cannot be created on a key that has pre-existing duplicate values. If you would like to create the index anyway, keeping the first document the database indexes and deleting all subsequent documents that have duplicate values, add the dropDups option.

db.things.ensureIndex({firstname : 1}, {unique : true, dropDups : true})
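What dropDups does to the collection can be pictured with the following plain JavaScript sketch. dropDupsModel is a hypothetical helper, and array order is an assumption standing in for the order in which the database happens to index documents, which you should not rely on in practice.

```javascript
// Hypothetical model: keep the first document seen for each key value
// and discard every later document with the same value.
function dropDupsModel(docs, field) {
  const seen = new Set();
  return docs.filter((doc) => {
    const key = doc[field] === undefined ? null : doc[field];
    if (seen.has(key)) return false; // duplicate: this document is deleted
    seen.add(key);
    return true;
  });
}

// Invented sample data.
const things = [
  { firstname: "Ann", n: 1 },
  { firstname: "Bob", n: 2 },
  { firstname: "Ann", n: 3 },
];

console.log(dropDupsModel(things, "firstname").map((d) => d.n)); // [ 1, 2 ]
```

Because the discarded documents are deleted rather than merely skipped, it is worth checking which duplicates exist before building the index this way.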

Background Index Creation

By default, building an index blocks other database operations. Versions 1.4 and later have a background index build option; however, this option has significant limitations in a replicated cluster (see doc page).

Dropping Indexes

To delete all indexes on the specified collection:

db.collection.dropIndexes();

To delete a single index:

db.collection.dropIndex({x: 1, y: -1})

Running directly as a command without helper:

// note: command was "deleteIndexes", not "dropIndexes", before MongoDB v1.3.2

// remove index with key pattern {y:1} from collection foo

db.runCommand({dropIndexes:'foo', index : {y:1}})

// remove all indexes:

db.runCommand({dropIndexes:'foo', index : '*'})

ReIndex

The reIndex command will rebuild all indexes for a collection.

db.myCollection.reIndex()

See here for more documentation: reIndex Command

Additional Notes on Indexes

MongoDB indexes (and string equality tests in general) are case sensitive.

When you update an object, if the object fits in its previous allocation area, only those indexes whose keys have changed are updated.

This improves performance. Note that if the object has grown and must move, all index keys must then update, which is slower.

Index information is kept in the system.indexes collection; run db.system.indexes.find() to see example data.

Keys Too Large To Index

Index entries have a maximum size (the sum of the values), currently approximately 800 bytes. Documents whose field values (the key size, in index terminology) exceed this size cannot be indexed. You will see log messages similar to:

...Btree::insert: key too large to index, skipping...

Queries against this index will not return the unindexed documents. You can force a query to use another index, or no index at all, using this special index hint:

db.myCollection.find({<key>: <value too large to index>}).hint({$natural: 1})

This will cause the document to be used for comparison of that field (or fields), rather than the index.

This limitation will eventually be removed (see SERVER-3372 ).
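While the limit exists, an application might want a rough pre-check before saving documents whose indexed values can grow large. The sketch below is plain JavaScript and deliberately approximate: approxKeyBytes and likelyTooLarge are hypothetical helpers, the 800-byte figure is the approximate limit quoted above, and the estimate counts only the UTF-8 bytes of the values, not the BSON overhead of a real index key.

```javascript
const MAX_KEY_BYTES = 800; // approximate limit quoted in the text above

// Hypothetical helper: rough byte size of the values that would make up
// the index key for the given fields (missing fields count as zero).
function approxKeyBytes(doc, fields) {
  return fields.reduce((sum, field) => {
    const value = doc[field];
    return sum + (value === undefined ? 0 : Buffer.byteLength(String(value), "utf8"));
  }, 0);
}

// Hypothetical helper: flags documents that are likely to be skipped.
function likelyTooLarge(doc, fields) {
  return approxKeyBytes(doc, fields) > MAX_KEY_BYTES;
}

// Invented sample documents.
const small = { slug: "short-title" };
const big = { slug: "x".repeat(1000) };

console.log(likelyTooLarge(small, ["slug"])); // false
console.log(likelyTooLarge(big, ["slug"]));   // true
```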

Index Performance

Indexes make retrieval by a key, including ordered sequential retrieval, very fast. Updates by key are faster too as MongoDB can find the document to update very quickly.

However, keep in mind that each index created adds a certain amount of overhead for inserts and deletes. In addition to writing data to the base collection, keys must then be added to the B-Tree indexes. Thus, indexes are best for collections where the number of reads is much greater than the number of writes. For collections which are write-intensive, indexes, in some cases, may be counterproductive. Most collections are read-intensive, so indexes are a good thing in most situations.

Using sort() without an Index

You may use sort() to return data in order without an index if the data set to be returned is small (less than four megabytes). For these cases it is best to use limit() and sort() together.

Presentations

Indexing and Query Optimization - Presentation from MongoSV (December 2011)

More Presentations

Video introduction to indexing and the query optimizer

More advanced video and accompanying slides, with many examples and diagrams

Intermediate level webinar and accompanying slides

Another set of intermediate level slides
