(1)

Things Made Easy: One Click CMS Integration with Solr & Drupal

Peter M. Wolanin, Ph.D.

Momentum Specialist (principal engineer), Acquia, Inc.

Drupal contributor drupal.org/user/49851

Co-maintainer of the Drupal Apache Solr Search Integration module

May 10, 2012

(2)

What is Drupal?

What Apache Solr features are integrated with Drupal?

Why is Drupal plus Apache Solr better than starting from scratch?

What elements of the search can you configure in the UI without code?

(3)

You are starting a new website project?

You are wondering how hard it is to actually integrate Apache Solr with a website?

You already use Drupal but not with Apache Solr?

You like things that are easy yet powerful?

(4)

Drupal: Web Application Framework + CMS == Social Publishing Platform

[Diagram: Content Mgmt Systems and Social Software Tools, labeled with blogs / wikis, forums / comments, social ranking, social tagging, users, social networks, workflow, taxonomy, semantic web, RSS, content, analytics]

Drupal “… is as much a Social Software platform as it is a web content management system.”

(5)

Drupal + Solr Provides Immediate Access to Rich Search Features

Dynamic content requires dynamic navigation, which is provided by an effective search

Search facets mean no dead ends

Solr provides better keyword relevancy in results

Much faster searches for sites with lots of content

By avoiding database queries, Drupal with Solr scales better

(6)

DEMO: A Drupal 7 partial copy of the conference site with Apache Solr integration

(7)
(8)

Drupal Has User Accounts, Roles & Permissions

Define custom roles

Set granular access controls by role

Configure user behavior:

– Registration

– Email

– Profiles

– Pictures

(9)

Drupal Modules Add Functionality

“There’s a module for that”

More than 4100 Drupal 7 community modules

Often controlled by role-based permissions

Drupal core and modules are GPL v2+, and have a huge, active community

(10)

Drupal is Written in PHP, Which Makes for Easy Customization

The Drupal architecture encourages and provides many avenues for customization by writing modules but not patching Drupal core

Drupal has a huge community of users.

Approximately 10,000 sites report to Drupal.org that they use the Apache Solr Search Integration module.

(11)
(12)

Drupal Entities are Content + Data

[Diagram: grid of nodes, Node 1 through Node 9]

Nodes are the basic entity used for text content

The entity system is extensible - can represent any data

Examples of data stored within Drupal entities

– Text

– geographic location

(13)

Entity Types are Enriched With User-configurable Data Fields

Define new data fields on a node using the Field API module.

– Text, images, integers, date, reference, etc.

Flexible and configurable in the UI

No programming required (many existing modules)

(14)
(15)

A Strong Framework for Content Classification

Core taxonomy system

Modules provide taxonomy-based appearance, access control

Standard input options include free tagging, flat-controlled, and

(16)

Drupal + Solr Search for Business, Government and NGOs

http://www.mattel.com/search/apachesolr_search/

http://www.hrw.org/en/search/apachesolr_search/

http://www.restorethegulf.gov/search/apachesolr_search/

http://www.nypl.org/search/apachesolr_search/

http://www.mylifetime.com/community/search/apachesolr_search/

http://opensource.com/search/apachesolr_search/

https://www.ethicshare.org/publications/

http://www.poly.edu/search/apachesolr_search/

https://www.eff.org/search/site/

http://www.whitehouse.gov/search/site/

http://www.emporia.edu/search/site/

(17)

Drupal Has Already Solved Many Solr Integration Challenges

The most important - content indexing.

Facets, sorting, and highlighting of results.

Immediate integration with the More Like This and spell-check handlers.

Included sub-module integrates content access permissions by indexing to and filtering Solr

(18)

Easy Content Recommendation!

Uses the MLT handler

(19)

The Module Has a Pipeline for Indexing Drupal Content to Solr

Drupal entities are processed into one (or more) document objects. Each document object is converted to XML and sent to Solr.

Node object (title, nid, type) -> Drupal functions -> Document object (entity_type, label, entity_id, bundle) -> XML string

<doc>
  <field name="entity_type">node</field>
  <field name="label">Hello Drupal</field>
  <field name="entity_id">101</field>
  <field name="bundle">session</field>
</doc>
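
The document-to-XML step can be sketched in plain PHP. This is only an illustration of the idea, not the module's own code: solr_doc_to_xml() is a hypothetical helper, and it assumes PHP's DOM extension is available.

```php
<?php
// Sketch only: a plain-PHP stand-in for the document-to-XML step.
// solr_doc_to_xml() is a hypothetical helper, not part of the module.
function solr_doc_to_xml(array $fields) {
  $dom = new DOMDocument('1.0', 'UTF-8');
  $doc = $dom->createElement('doc');
  $dom->appendChild($doc);
  foreach ($fields as $name => $values) {
    // A field may carry several values; emit one <field> element per value.
    foreach ((array) $values as $value) {
      $field = $dom->createElement('field');
      $field->setAttribute('name', $name);
      // createTextNode() escapes &, < and > in the value.
      $field->appendChild($dom->createTextNode((string) $value));
      $doc->appendChild($field);
    }
  }
  // Passing the node to saveXML() omits the XML declaration.
  return $dom->saveXML($doc);
}

$xml = solr_doc_to_xml(array(
  'entity_type' => 'node',
  'label' => 'Hello Drupal',
  'entity_id' => 101,
  'bundle' => 'session',
));
echo $xml, "\n";
```

Building the string through DOM rather than by concatenation means titles containing characters such as & are escaped automatically.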

(20)

Entity Meta-data Gives Automatic Facets!

Content types

Taxonomy terms per vocabulary

Content authors

Posted and modified dates

Text and numbers selected via select list/radios/check boxes

(21)

Drupal Modules Implement hooks to Control Indexing and Display

HOOK_apachesolr_index_document_build($document, $entity, $entity_type, $env_id)

By creating a Drupal module (in PHP), you can implement module and theme “hooks” to extend or alter Drupal behavior.

Change or replace the data normally indexed.

Modify the search results and their appearance.
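
A minimal sketch of such a hook implementation follows. Several things here are assumptions made for illustration: the module name "mymodule", the "room" property on the node, the "ss_session_room" Solr field (the "ss_" prefix is assumed to match a dynamic string field in schema.xml), and an addField() method on the document object. SketchSolrDocument is a stand-in class so the example runs outside Drupal; on a real site the module supplies the document object.

```php
<?php
// Stand-in for the module's document class so the sketch runs outside Drupal.
// The addField() method is assumed, not confirmed by the slides.
class SketchSolrDocument {
  public $fields = array();

  public function addField($key, $value) {
    $this->fields[$key][] = $value;
  }
}

/**
 * Implements hook_apachesolr_index_document_build().
 *
 * "mymodule", the "room" property and "ss_session_room" are hypothetical.
 */
function mymodule_apachesolr_index_document_build($document, $entity, $entity_type, $env_id) {
  // Only touch session nodes that actually carry a room value.
  if ($entity_type == 'node' && $entity->type == 'session' && !empty($entity->room)) {
    $document->addField('ss_session_room', $entity->room);
  }
}

// Usage outside Drupal, with a fake node object and environment ID.
$node = (object) array('nid' => 101, 'type' => 'session', 'room' => 'Ballroom A');
$document = new SketchSolrDocument();
mymodule_apachesolr_index_document_build($document, $node, 'node', 'solr');
print_r($document->fields);
```

Because the extra field is added in a module of its own, neither Drupal core nor the search module needs to be patched.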

(22)

Updates to an Entity or Related Meta-data Cause Reindexing

Drupal entities are indexed during Drupal cron (typically invoked via *nix cron).

By using a specialized tracking table, content can automatically be queued for reindex when changed, and subsets of content can potentially be sent to different Solr indexes.

Entities include many ID-based reference fields (e.g. the User ID of the author). Changes to the referenced data are also watched.

(23)

Indexing Tracking Tables Maintain Order

+-------------+-----------+-------------+--------+------------+
| entity_type | entity_id | bundle      | status | changed    |
+-------------+-----------+-------------+--------+------------+
| node        |        36 | session     |      1 | 1336520756 |
| node        |        37 | session     |      1 | 1336510489 |
| node        |        38 | session     |      1 | 1336510456 |
| node        |        39 | session     |      1 | 1336510456 |
| node        |        40 | speaker_bio |      1 | 1336510456 |
+-------------+-----------+-------------+--------+------------+

When a node is updated, the “changed” timestamp is updated.

The indexing pipeline tracks the largest timestamp and entity_id that have been indexed.
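
How a (timestamp, entity_id) position can select the next batch is sketched below with plain PHP arrays. next_index_batch() is a hypothetical helper written for this illustration; the module presumably does the equivalent with a database query against the tracking table.

```php
<?php
// Rows from the tracking table on this slide, reduced to the two ordering columns.
$rows = array(
  array('entity_id' => 36, 'changed' => 1336520756),
  array('entity_id' => 37, 'changed' => 1336510489),
  array('entity_id' => 38, 'changed' => 1336510456),
  array('entity_id' => 39, 'changed' => 1336510456),
  array('entity_id' => 40, 'changed' => 1336510456),
);

/**
 * Returns up to $limit rows that come after the last indexed position.
 * Hypothetical helper, for illustration only.
 */
function next_index_batch(array $rows, $last_changed, $last_entity_id, $limit) {
  // Keep rows changed later, or changed at the same second with a higher ID.
  $pending = array_filter($rows, function ($row) use ($last_changed, $last_entity_id) {
    return $row['changed'] > $last_changed
      || ($row['changed'] == $last_changed && $row['entity_id'] > $last_entity_id);
  });
  // Order by changed, then entity_id, so the position only moves forward.
  usort($pending, function ($a, $b) {
    if ($a['changed'] != $b['changed']) {
      return $a['changed'] < $b['changed'] ? -1 : 1;
    }
    return $a['entity_id'] - $b['entity_id'];
  });
  return array_slice($pending, 0, $limit);
}

// Suppose node 38 (changed 1336510456) was the last entity indexed.
$batch = next_index_batch($rows, 1336510456, 38, 2);
$ids = array();
foreach ($batch as $row) {
  $ids[] = $row['entity_id'];
}
echo implode(',', $ids), "\n"; // prints "39,40"
```

Tracking the entity_id alongside the timestamp matters because several rows can share the same "changed" value, as nodes 38 to 40 do here.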

(24)

Example: Taxonomy Term Classifying a Node is Changed

[Diagram: the term “Grapefruit” is changed to “Citrus fruit”]

All nodes classified with this term are queued to be re-indexed by setting the “changed” column to the current time.

Thus you will correctly match ‘Citrus’ instead of ‘Grapefruit’ for those documents.

(25)

When Unpublished, Content is Purged

Drupal core includes a simple editorial workflow where content may be toggled between published (visible) and unpublished (incomplete, removed, spam, etc).

The module immediately removes content from the index when unpublished, and also tracks it for future removal in case the Solr server is

(26)

Search Using Dismax Query Parsing & Boosting Features

Dynamic fields in schema.xml used to index standard and custom entity data fields

Dismax (or EDismax) handler used for keyword searching across multiple fields and per-field boosts

Query-time boosting options available in the UI
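
Per-field boosts reach Solr as a "qf" parameter of the form field^boost. The sketch below builds such a request in plain PHP; the field names and boost values are made up for illustration and are not the module's defaults.

```php
<?php
// Hypothetical field names and boost values, chosen for illustration only.
$boosts = array(
  'label' => '5.0',
  'content' => '1.0',
  'taxonomy_names' => '2.0',
);

// Build the qf value: space-separated "field^boost" pairs.
$qf = array();
foreach ($boosts as $field => $boost) {
  $qf[] = $field . '^' . $boost;
}

$params = array(
  'defType' => 'edismax',     // select the Extended DisMax query parser
  'q' => 'solr drupal',       // the user's keywords
  'qf' => implode(' ', $qf),  // fields to search, with per-field boosts
  'wt' => 'json',             // ask for a JSON response
);

// URL-encode the parameters for a GET request to a Solr select handler.
$query_string = http_build_query($params);
echo $query_string, "\n";
```

A match in a heavily boosted field such as the title then contributes more to the score than the same match in the body text.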

(27)

A Query Object Is Used to Prepare and Run Searches

$query->setParam('hl.fl', $field);

$keys = $query->getParam('q');

$response = $query->search();

(28)

More Modules Available to Add More Features

ApacheSolr Attachments

Apache Solr Multisite Search

Apache Solr Organic Groups Integration

Apachesolr User indexing

Apachesolr Commerce

(29)

To Wrap Up!

Drupal has extensive Apache Solr integration already, and is highly customizable.

The Drupal platform is widely adopted, and the Drupal community drives rapid innovation.

Acquia provides Enterprise Drupal support and a network of partners.

Acquia includes a secure, hosted Solr index with every support subscription.

(30)

What is Drupal?

What Apache Solr features are integrated with Drupal?

Why is Drupal plus Apache Solr better than starting from scratch?

What elements of the search can you configure in the UI without code?

(31)

Other PHP Integration Tools

http://www.solarium-project.org/

http://php.net/solr

http://pecl.php.net/package/solr

http://code.google.com/p/solr-php-client/

Caveat: don’t use serialized PHP response format in a custom integration - use JSON writer.
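
A minimal sketch of reading a JSON response in a custom integration follows. The response body is a made-up, trimmed example of the shape Solr returns with wt=json. The slide does not give its reasoning, but one likely motivation is that json_decode() only ever produces plain data, whereas unserialize() on data from another service can instantiate objects.

```php
<?php
// A trimmed example of a Solr response requested with wt=json (values are made up).
$body = '{"responseHeader":{"status":0,"QTime":3},'
  . '"response":{"numFound":1,"start":0,'
  . '"docs":[{"entity_id":101,"label":"Hello Drupal"}]}}';

// Decode to nested associative arrays.
$data = json_decode($body, TRUE);
if ($data === NULL) {
  throw new RuntimeException('Solr response was not valid JSON');
}

// Each entry in response.docs is one matching document.
foreach ($data['response']['docs'] as $doc) {
  echo $doc['entity_id'], ': ', $doc['label'], "\n";
}
```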

(32)

Do you love Drupal, Solr, the LAMP stack, DevOps or anything related, and working at a fast-growing and successful startup?

Boston and Portland area U.S. offices.

Some remote opportunities as well.

Come talk to me!

[email protected]

pwolanin in IRC #drupal or #solr

(33)

Resources ... Questions?

http://drupal.org/project/apachesolr

http://drupal.org/project/apachesolr_attachments

http://archive.org/details/drupalconchi_day2_attain_apache_solr_coding_chops

http://www.acquia.com/tags/apachesolr

http://groups.drupal.org/lucene-nutch-and-solr
