Contents lists available at ScienceDirect
Research Policy
journal homepage: www.elsevier.com/locate/respol
Crowd science: The organization of scientific research in open collaborative projects
Chiara Franzoni a,∗, Henry Sauermann b
a Department of Management, Economics and Industrial Engineering (DIG), Politecnico di Milano, P. Leonardo da Vinci 32, Milan 20133, Italy
b Georgia Institute of Technology, Scheller College of Business, 800 W. Peachtree St., Atlanta, GA 30308, USA
Article info
Article history: Received 15 November 2012; received in revised form 12 July 2013; accepted 13 July 2013; available online 14 August 2013
Keywords: Crowd science; Citizen science; Crowdsourcing; Community-based production; Problem solving; Open innovation; Funding
Abstract
A growing amount of scientific research is done in an open collaborative fashion, in projects sometimes referred to as “crowd science”, “citizen science”, or “networked science”. This paper seeks to gain a more systematic understanding of crowd science and to provide scholars with a conceptual framework and an agenda for future research. First, we briefly present three case examples that span different fields of science and illustrate the heterogeneity concerning what crowd science projects do and how they are organized. Second, we identify two fundamental elements that characterize crowd science projects – open participation and open sharing of intermediate inputs – and distinguish crowd science from other knowledge production regimes such as innovation contests or traditional “Mertonian” science. Third, we explore potential knowledge-related and motivational benefits that crowd science offers over alternative organizational modes, and potential challenges it is likely to face. Drawing on prior research on the organization of problem solving, we also consider for what kinds of tasks particular benefits or challenges are likely to be most pronounced. We conclude by outlining an agenda for future research and by discussing implications for funding agencies and policy makers.
© 2013 The Authors. Published by Elsevier B.V.
1. Introduction
For the last century, scientific activity has been firmly placed in universities or other academic organizations, government laboratories, or in the R&D departments of firms. Sociologists and economists, in turn, have made great progress understanding the functioning of this established system of science (Merton, 1973; Zuckerman, 1988; Dasgupta and David, 1994; Stephan, 2012). The last few years, however, have witnessed the emergence of projects that do not fit the mold of traditional science and that appear to follow distinct organizing principles. Foldit, for example, is a large-scale collaborative project involving thousands of participants who advance our understanding of protein folding at an unprecedented speed, using a computer game as their platform. Galaxy Zoo is a project involving over 250,000 volunteers who help with the collection of astronomical data, and who have contributed to the discovery of new classes of galaxies and a deeper understanding of the universe. Finally, Polymath involves a colorful mix of Fields Medalists and non-professional mathematicians who collectively solve problems that have long eluded the traditional approaches of mathematical science.
∗ Corresponding author. Tel.: +39223994823; fax: +39223992730. E-mail addresses: [email protected] (C. Franzoni), [email protected] (H. Sauermann).
While a common term for these projects has yet to be found, they are variously referred to as “crowd science”, “citizen science”, “networked science”, or “massively-collaborative science” (Young, 2010; Nielsen, 2011; Wiggins and Crowston, 2011). Even though there is significant heterogeneity across projects, they are largely characterized by two important features: participation in a project is open to a wide base of potential contributors, and intermediate inputs such as data or problem solving algorithms are made openly available. What we will call “crowd science” is attracting growing attention from the scientific community, but also policy makers, funding agencies and managers who seek to evaluate its potential benefits and challenges.1 Based on the experiences of early crowd science projects, the opportunities are considerable. Among others, crowd science projects are able to draw on the effort and knowledge inputs provided by a large and diverse base of contributors, potentially expanding the range of scientific problems that can be addressed at relatively low cost, while also increasing the speed at which they can be solved. Indeed, crowd science projects have resulted in a number of high-profile publications in scientific outlets such as Nature, Proceedings of the National Academy of Sciences (PNAS), and Nature Biotechnology. At the same time, crowd science projects face important challenges in areas such as attracting contributors or coordinating the contributions of a large number of participants. A deeper understanding of these various benefits and challenges may allow us to assess the general prospects of crowd science, but also to conjecture for which kinds of scientific problems crowd science may be more – or less – suitable than alternative modes of knowledge production.
1 For example, scientific journals have published special issues on citizen science, the topic has been discussed in managerial outlets such as the Sloan Management Review (Brokaw, 2011), national funding agencies in the US and other countries actively fund crowd science projects, and the Library of Congress is discussing how crowd science artifacts such as blogs and datasets should be preserved and curated.
0048-7333 © 2013 The Authors. Published by Elsevier B.V. Open access under CC BY-NC-ND license.
http://dx.doi.org/10.1016/j.respol.2013.07.005
Despite the growing number of crowd science projects in a wide range of fields (see Table 1 for examples), scholarly work on crowd science itself is largely absent. We address this lack of research in several ways. First, we introduce the reader to crowd science by briefly presenting three case studies of crowd science projects in biochemistry, astronomy, and mathematics. These case studies illustrate the heterogeneity concerning what crowd science projects do and how they are organized. Second, we identify organizational features that are common to crowd science projects while also distinguishing them from projects in “traditional science” and other emerging organizational paradigms such as crowdsourcing and innovation contests. Third, we discuss potential benefits and challenges crowd science projects are likely to face, how challenges may be addressed, and for what kinds of tasks the benefits and challenges may be most pronounced. In doing so, we build upon organizational theories of problem solving as well as prior work in areas such as open innovation and open source software development. We then outline an agenda for future research on crowd science and also discuss how crowd science may serve as a unique setting for the study of problem solving more generally. Finally, we discuss potential implications for funding agencies and policy makers.
2. Examples of crowd science projects
2.1. Foldit
By the 1990s, scientists had developed significant insights into the biochemical composition of proteins. However, they had a very limited understanding of protein structure and shapes. Shape is important because it explains the way in which proteins function and interact with cells, viruses or proteins of the human body. For example, a suitably shaped protein could block the replication of a virus. Or it could stick to the active site of a biofuel and catalyze a chemical reaction. Conventional methods to determine protein shapes included X-ray crystallography, nuclear magnetic resonance spectroscopy, and electron microscopy. Unfortunately, these methods were extremely expensive, costing up to 100,000 dollars for a single protein, with millions of protein structures yet to be determined. In 2000 David Baker and his lab at the University of Washington, Seattle, received a grant from the Howard Hughes Medical Institute to work on shape determination with computational algorithms. Researchers believed that in principle, proteins should fold such that their shape employs the minimum level of energy to prevent the protein from falling apart. Thus, computational algorithms should be able to determine the shape of a protein based on the electric charges of its components. However, each portion of a protein sequence is composed of multiple atoms and each atom has its own preferences for bonding with or standing apart from other atoms, resulting in a large number of degrees of freedom in a single molecule, making computational solutions extremely difficult. Baker and his lab developed an algorithm called Rosetta that combined deterministic and stochastic techniques to compute the level of energy of randomly chosen protein shapes in search of the best result. After several years of improvements, the algorithm worked reasonably well, especially on partially determined shapes. Because the computation was extremely intensive, in the fall of 2005 the team launched Rosetta@home, a grid system that allowed volunteers to make available the spare computational capacity of their personal computers. A critical feature of Rosetta@home was a visual interface that showed proteins as they folded. Although the volunteers were meant to contribute only computational power, looking at the Rosetta screensavers, some of them posted comments suggesting better ways to fold the proteins than what they saw the computer doing. These otherwise naïve comments inspired graduate students at the Computer Science and Engineering department and post-docs at Baker’s lab. They began to wonder if human visual ability could complement computer power in areas where the computer intelligence was falling short. They developed a web-based game called Foldit that enabled players to model the structure of proteins with the move of the mouse. Players could inspect a template structure from different angles. They could then move, rotate or flip chain branches in search of better structures. The software automatically computed the level of energy of new configurations, immediately showing improvement or worsening. Certain common structural problems, such as the existence of clashes or vacuums in the protein, were highlighted in red, so that the player could quickly identify areas of improvement. A menu of automatic tools borrowed from Rosetta enabled easy local adjustments that the computer could do better than humans. For example, proteins could be wiggled or side chains shaken with a simple click of the mouse. These features, combined with a few online tutorials, allowed people to start folding proteins without knowing virtually anything about biochemistry. Most interestingly, the software was designed as a computer game and included a scoreboard that listed players’ performance. As such, players competed to climb up in the rankings and they could also set up teams and share strategies to compete against other teams. Fig. 1 provides an impression of the Foldit interface.2
The game was initially launched in May 2008. By September of the same year it had already engaged 50,000 users. The players were initially given known protein structures so that they could see the desired solution. After a few months of practice, several players had worked their way to shapes very close to the solution and in several cases had outperformed the best structures designed by Rosetta (Cooper et al., 2010). There was much excitement at the lab and the researchers invited a few top players to watch them play live. From these observations, it became clear that human intuition was very useful because it allowed players to move past the traps of local optima, which created considerable problems for computers. One year after launch there were about 200,000 active Foldit players.
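The “traps of local optima” described above can be illustrated with a toy example. The sketch below is not Rosetta's or Foldit's actual algorithm; it assumes a hypothetical one-dimensional energy landscape (the list `landscape`) and two helper functions, `greedy_descent` and `restart_search`, that are introduced here purely for illustration. A strictly local search that only ever moves to a lower-energy neighbor can stop in a shallow minimum, whereas trying very different starting configurations, loosely analogous to a player making a bold rearrangement, can reach the deeper minimum.

```python
def greedy_descent(energy, start):
    """Step to the lowest-energy neighbour until no neighbour improves."""
    pos = start
    while True:
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(energy)]
        best = min(neighbours, key=lambda p: energy[p])
        if energy[best] >= energy[pos]:
            return pos  # no neighbour is better: a (possibly local) minimum
        pos = best


def restart_search(energy, starts):
    """Run greedy descent from several starting points and keep the best result."""
    finishes = [greedy_descent(energy, s) for s in starts]
    return min(finishes, key=lambda p: energy[p])


# Hypothetical landscape: a shallow minimum at index 1 (energy 3)
# and a deeper one at index 6 (energy 0).
landscape = [5, 3, 4, 6, 7, 2, 0, 1, 8]

stuck = greedy_descent(landscape, 0)                           # stops at index 1
best = restart_search(landscape, range(len(landscape)))        # reaches index 6
```

Starting from index 0, the local search settles in the shallow minimum and never sees the deeper one; only a search that explores distant configurations finds it. Real protein energy landscapes have vastly more dimensions, so this is an analogy for the difficulty rather than a model of it.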
In the following months, the development of Rosetta and that of Foldit proceeded in combination. Some proteins where Rosetta was failing were given to Foldit players to work on. In exchange, players suggested additional automatic tools that they thought the computer should provide for them. Meanwhile, players set up teams with names such as “Another Hour Another Point” or “Void Crushers” and used chats and forums to interact. Some players also began to elaborate their own “recipes”, encoded strategies that could be compared to those created in the lab. For example, some strategies involved wiggling a small part of a protein, rather than the entire structure, others were based on fusing protein parts or
2 The Foldit case study is based on the following web sources: http://fold.it; http://www.youtube.com/watch?v=2adZW-mpOk; http://www.youtube.com/watch?v=PE048WhCCA; http://www.youtube.com/watch?v=nfxGnCcx9Ag;
C. Franzoni, H. Sauermann / Research Policy 43 (2014) 1– 20 3
Table 1. Examples of crowd science projects.
Name | URL | Hosting platform | Field | Primary tasks
Ancient Lives | http://ancientlives.org | Zooniverse | Archeology | Inspect fragments of Egyptian papyri and transcribe content
Argus | http://argus.survice.com/ | None | Cartography | Measure sea bed depth during navigation
Bat Detective | http://www.batdetective.org/ | Zooniverse | Biology | Listen to bat calls and classify them
Connect2Decode (C2D)* | https://sites.google.com/a/osdd.net/c2d-01/ | None | Genetics | Annotate genes of Mycobacterium tuberculosis
Connect2Decode: Cloning, Expression and Purification of Proteins | http://c2d.osdd.net/ | Connect2Decode | Genetics | Clone genes in Mycobacterium tuberculosis
Connect2Decode: Cheminformatics and Chemical Data Mining | http://c2d.osdd.net/home/cheminformatics | Connect2Decode | Genetics | Use chemical descriptors and data mining to identify and annotate molecules with desirable properties
Crowd Computing for Cheminformatics | http://vinodscaria.rnabiology.org/2C4C | CSIR Network project on ncRNA biology | Bioinformatics | Develop predictive models for biological activities of chemical molecules
Cheminformatics Crowd Computing for Tuberculosis Drug Discovery | http://vinodscaria.rnabiology.org/cheminformatics-crowd-computing-for-tuberculosis | CSIR Network project on ncRNA biology | Bioinformatics | Annotate dataset of active anti-tubercular molecules, by using existing or improved prioritization algorithms, to identify a small subset of drug candidates
Open Dinosaur Project | http://opendino.wordpress.com/about/ | None | Paleontology | Search in published literature and report dinosaur limb length and/or measure and report from museum specimens
Cyclone Center | http://www.cyclonecenter.org/ | Zooniverse | Climatology | Watch satellite images of cyclones and classify storms
Discover life | http://www.discoverlife.org/ | None | Biology | Observe and report images and location of wild species
eBird | http://www.ebird.org/ | Cornell Lab of Ornithology | Biology | Observe and report images and location of wild birds
EteRNA | http://eterna.cmu.edu/ | None | Biochemistry | Game. Modify the shape of RNA bases combination to fit a folded structure
Foldit | http://www.fold.it | None | Biochemistry | Game. Modify a visual 3D model of protein to optimize its shape
Galaxy Zoo | http://www.galaxyzoo.org/ | Zooniverse | Astronomy | Inspect and classify images of galaxies
Great Backyard Bird Count | http://birdsource.org/gbbc/ | Cornell Lab of Ornithology | Biology | Report frequency of bird sightings occurring during 15 min periods
Great Sunflower Project | http://www.greatsunflower.org/ | None | Biology | Grow flowers that attract bees and other pollination insects, observe and report frequency of insects visits
Ice Hunters* | http://www.icehunters.org | Zooniverse | Astronomy | Inspect telescope images and identify possible targets for New Horizons mission
Milkyway Project | http://www.milkywayproject.org | Zooniverse | Astronomy | Inspect telescope images, identify clouds and bubbles
Moon Zoo | http://www.moonzoo.org/ | Zooniverse | Astronomy | Inspect satellite images of the Moon and report craters
Nestwatch | http://nestwatch.org | Cornell Lab of Ornithology | Biology | Observe nest activities (eggs, young, fledglings) and report count
Notes from Nature | http://www.notesfromnature.org/ | Zooniverse | Biology | Inspect images of specimens digitized from natural history museums and transcribe their descriptions
Old Weather | http://www.oldweather.org | Zooniverse | Climatology | Inspect images of logbooks of Royal Navy ships or Arctic exploration ships and transcribe climate and location reports
Patientslikeme | http://www.patientslikeme.com/ | None | Medicine | Input personal data about symptoms and treatment of diseases
Pigeon Watch* | http://www.birds.cornell.edu/pigeonwatch | Cornell Lab of Ornithology | Biology | Observe pigeon species and report images and/or location of observation
Phylo | http://phylo.cs.mcgill.ca | None | Genetics | Game. Inspect sequence of colored shapes and move shapes to optimize their alignment
Planet Hunters | http://www.planethunters.org | Zooniverse | Astronomy | Inspect star light curves registered by the Kepler spacecraft and report possible planet transits
Polymath1. Density Hales–Jewett* | http://gowers.wordpress.com/2009/02/01/a-combinatorial-approach-to-density-hales-jewett/ | None | Mathematics | Solve a mathematical problem
Polymath2. Banach Spaces* | http://gowers.wordpress.com/2009/02/17/must-an-explicitly-defined-banach-space-contain-c0-or-ellp/ | None | Mathematics | Solve a mathematical problem
Polymath3: Polynomial Hirsch Conjecture* | http://gilkalai.wordpress.com/2009/07/17/the-polynomial-hirsch-conjecture-a-proposal-for-polymath3/ | None | Mathematics | Prove a mathematical conjecture
Polymath4: Deterministic way to find primes* | http://polymathprojects.org/2009/07/27/proposal-deterministic-way-to-find-primes/ | Polymathprojects | Mathematics | Solve a mathematical problem
Polymath5: Erdos Discrepancy* | http://gowers.wordpress.com/2009/12/17/erdoss-discrepancy-problem/ | Polymathprojects | Mathematics | Solve a mathematical problem
Polymath6: Improving the bounds for Roth’s theorem | http://polymathprojects.org/2011/02/05/polymath6-improving-the-bounds-for-roths-theorem/ | Polymathprojects | Mathematics | Solve a mathematical problem
Polymath7: Hot Spots Conjecture | http://polymathprojects.org/2012/06/03/polymath-proposal-the-hot-spots-conjecture-for-acute-triangles/ | Polymathprojects | Mathematics | Prove a mathematical conjecture
Polymath8: Bounded gaps between primes | http://polymathprojects.org/2013/06/04/polymath-proposal-bounded-gaps-between-primes/ | Polymathprojects | Mathematics | Solve a mathematical problem
Project FeederWatch | http://www.birds.cornell.edu/pfw/ | Cornell Lab of Ornithology | Biology | Install a bird feeder in backyard, observe and report frequency of bird visits
Seafloor Explorer | http://www.seafloorexplorer.org/ | Zooniverse | Biology | Inspect seafloor images, identify and report target species
Setilive | http://setilive.org/ | Zooniverse | Astronomy | Inspect live radio frequency signals broadcasted by the SETI Institute’s Allen Telescope. Consistent simultaneous reports of potential extraterrestrial signals prompt a second telescope inspection within minutes
Secchi App | https://www1.plymouth.ac.uk/marine/secchidisk/Pages/default.aspx | None | Biology | Measure the turbidity of seawater using a Secchi disk and use a mobile app to report the corresponding concentration of phytoplankton
Snapshot Serengeti | http://www.snapshotserengeti.org/ | Zooniverse | Biology | Inspect pictures of the Serengeti Lion Project and classify images of wildlife
Solar Stormwatch | http://www.solarstormwatch.com | Zooniverse | Astronomy | Watch videos of solar activity and report inception and length of solar explosions
Space NEEMO* | https://www.zooniverse.org/lab/neemo | Zooniverse | Biology | Inspect underwater images, identify and classify marine life
Spacewraps | https://www.spacewraps.org | Zooniverse | Astronomy | Inspect telescope images and report possible gravitational lenses
Stardust@home | http://stardustathome.ssl.berkeley.edu/ | None | Astronomy | Inspect magnified images of pristine space material and report possible stardust particles
Sun Lab | http://www.pbs.org/wgbh/nova/labs/lab/sun/ | NovaLabs | Astronomy | Inspect images of sun, report and classify sun spots
Whale Song | http://whale.fm | Zooniverse | Zoology | Listen to underwater recordings of whale chants from ships and report if two chants are similar
What’s the score | http://www.whats-the-score.org/ | Zooniverse | Music | Inspect pictures of musical score covers from the Bodleian Libraries and transcribe elements in covers
Yardmap Network | http://www.yardmap.org/ | Cornell Lab of Ornithology | Biology | Report frequency of bird sightings in own backyard; report backyard characteristics and conditions
* Projects completed or inactive as of July, 2013.
banding chains. Complex strategies are now often embedded in tools (“recipes”) that can subsequently be downloaded and used by others.3
The results were striking. A player strategy called “Bluefuse” completely displaced Rosetta “Classic Relax”, and outperformed “Fast Relax”, a piece of code that the Rosetta developers had worked on for quite a long time. These results were published in PNAS and the players of Foldit were co-authors under a collective pseudonym (Khatib et al., 2011a). In December 2010, encouraged by these results, Firas Khatib, Frank DiMaio in Baker’s lab and Seth Cooper in Computer Science and Engineering, among others working on Foldit, thought that the players were ready for a real-world challenge. Consulting with a group of experimentalists, they chose a monomeric retroviral protease, a protein known to be critical for monkey-virus HIV whose structure had puzzled crystallographers for over a decade. Three players from the “Contenders” Foldit team came to a fairly detailed solution of the protein structure in less than three weeks (published as Khatib et al., 2011b). As of September 2012, Foldit players were co-authors of four publications in top journals.
2.2. Galaxy Zoo
In January 2006, a Stardust Spacecraft capsule landed in the Utah desert with precious samples of interstellar dust particles after having encountered the comet Wild 2. Particles in the sample were as tiny as a micron, and NASA took 1.6 million automated scanning microscope images of the entire surface of the collector. Because computers are not particularly good at image detection, NASA decided to upload the images online and to ask volunteers to visually inspect the material and report candidate dust particles. The experiment, known as Stardust@Home, had a large echo in the community of astronomers, where the inspection of large collections of images is a common problem.4
3 http://foldit.wikia.com/wiki/Recipes; retrieved March 29, 2013.
4 The Galaxy Zoo description is based on Nielsen (2011) and on the following web sources: http://www.galaxyzoo.org/story; http://supernova.galaxyzoo.org/; http://mergers.galaxyzoo.org/; http://www.youtube.com/watch?v=jzQIQRr1Bo&playnext=1&list=PL5518A8D0F538C1CC&feature=resultsmain; http://data.galaxyzoo.org/; http://zoo2.galaxyzoo.org/; http://hubble.galaxyzoo.org/; http://supernova.galaxyzoo.org/about#supernovae; retrieved September 24, 2012.
Fig. 1. Foldit user interface. Source: http://fold.it/portal/info/about.
In 2007, Kevin Schawinski, then a PhD student in Chris Lintott’s group at the University of Oxford, thought about using the same strategy, although the group’s problem was different. In order to understand the formation of galaxies, they were searching for those elliptical galaxies that had formed stars most recently. In the spring of 2007 their insights were based on a limited sample of galaxies that Schawinski had coded manually, but more data were needed to really prove their point. A few months earlier, the Sloan Digital Sky Survey (SDSS) had made available 930,000 pictures of distant galaxies that could provide them with just the raw material they needed for their work. To be able to process this large amount of data, the researchers created the online platform Galaxy Zoo in the summer of 2007. Volunteers were asked to sign up, read an online tutorial, and then code six different properties of astronomical objects visible in SDSS images. Participation quickly became viral, partly because the images were beautiful to look at, and partly because the BBC publicized the initiative in its blog. Before the project started, the largest published study was based on 3000 galaxies. Seven months after the project was launched, about 900,000 galaxies had been coded and multiple classifications of a given galaxy by different volunteers were used to reduce the incidence of incorrect coding, for a total of roughly 50 million classifications. For an individual scientist, 50 million classifications would have required more than 83 years of full-time effort.5 The Galaxy Zoo data allowed Lintott’s team to successfully complete the study they had initially planned. However, this was just the beginning of Galaxy Zoo’s contributions to science, partly because participants did not simply code galaxies – they also developed an increasing curiosity for what they saw. Consider the case of Hanny van Arkel, a Dutch schoolteacher who in the summer of 2007 spotted an unusual “voorwerp” (Dutch for “thing”) that appeared as an irregularly-shaped green cloud hanging below a galaxy. After her first observation, astronomers began to point powerful telescopes toward the cloud. It turned out that the cloud was a unique object that astronomers now explain as being an extinguished quasar whose light echo remains visible. Zooites also reported other unusual galaxies for their color or shape, such as very dense green galaxies that they named “Green pea galaxies” (Cardamone et al., 2009). A keyword search for “Hanny’s Voorwerp” in Web of Knowledge as of January 2013 showed eight published papers, and a search for “Green pea galaxies” gave six.6
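The arithmetic behind the 83-year figure, and the idea of using multiple classifications per galaxy to reduce coding errors, can be sketched as follows. This is a minimal illustration rather than the Galaxy Zoo team's actual pipeline: the function names (`years_of_effort`, `majority_label`), the example votes, and the simple majority-vote rule are assumptions introduced here. The text states only that multiple classifications were used to reduce incorrect coding and that Schawinski estimated a maximum rate of 50,000 coded galaxies per month.

```python
from collections import Counter


def years_of_effort(total_classifications, per_month):
    """Full-time years one person would need at a fixed monthly rate."""
    return total_classifications / per_month / 12


def majority_label(votes):
    """Return the most common label and the share of votes it received."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


# 50 million classifications at 50,000 per month is 1,000 months,
# i.e. a little over 83 years.
effort = years_of_effort(50_000_000, 50_000)

# Hypothetical votes from five volunteers on a single galaxy image.
label, share = majority_label(
    ["elliptical", "spiral", "elliptical", "elliptical", "merger"]
)
```

A production system would probably need to handle ties and might weight volunteers by reliability; this sketch does neither, and it should be read only as one plausible way redundant classifications can dampen individual errors.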
The coded Galaxy Zoo data were made publicly available for further investigations in 2010. As of September 2012, there were 141 scientific papers that quote the suggested citation for the data release, 36 of which are not coauthored by Lintott and his group. After the success of the first Galaxy Zoo project, Galaxy Zoo 2 was launched in 2009 to look more closely at a subset of 250,000 galaxies. Galaxy Zoo Hubble was launched to classify images of galaxies made available by NASA’s Hubble Space Telescope. Other projects looked at supernovae and at merging galaxies. Three years after Galaxy Zoo started, 250,000 Zooites had completed about 200 million classifications, and contributors are currently working on the latest and largest release of images from the SDSS.
The success of Galaxy Zoo sparked interest in various areas of science and the humanities. In 2009, Lintott and his team established a cooperation with other institutions in the UK and USA (the Citizen Science Alliance) to run a number of projects on a common platform “The Zooniverse”. The Zooniverse platform received financial support from the Alfred P. Sloan Foundation in 2011 and currently hosts projects in fields as diverse as astronomy, marine biology, climatology and medicine. Recent projects have also involved participants in a broader set of tasks and in closer interaction with machines. For example, contributors to the Galaxy Supernovae project were shown potentially interesting targets identified by computers at the Palomar Telescope. The contributors screened the large number of potential targets and selected a smaller subset that seemed particularly promising for further observation. This iterative process permitted astronomers to save precious observation time at large telescopes. Lintott thinks that in the future volunteers will be used to provide real-time judgments when computer predictions are unreliable, combining artificial and human intelligence in the most efficient way.7 In a new project (Galaxy Zoo Quench), contributors participate in the whole research process by classifying images, analyzing data, discussing findings, and collectively writing an article for submission.
5 Schawinski estimated his maximum inspection rate as being 50,000 coded galaxies per month. http://www.youtube.com/watch?v=jzQIQRr1Bo&playnext=1&list=PL5518A8D0F538C1CC&feature=resultsmain; retrieved September 21, 2012.
6 This is an incomplete count because many articles refer to these objects by different names. A citation search of the Cardamone et al. (2009) paper in the SAO/NASA Astrophysics Data System reported over 40 entries.
2.3. Polymath
Timothy Gowers is a British mathematician and a 1998 Fields Medal recipient for his work on combinatorics and functional analysis.8 An eclectic personality and an active advocate of openness in science, he keeps a regular blog that is well read by mathematicians. On January 29, 2009 he posted a comment on his blog saying that he would like to try an experiment to collectively solve a mathematical problem. In particular, he stated:
The ideal outcome would be a solution of the problem with no single individual having to think all that hard. [...] So try to resist the temptation to go away and think about something and come back with carefully polished thoughts: just give quick reactions to what you read [...], explain briefly, but as precisely as you can, why you think it is feasible to answer the question.9
In the next few hours, several readers commented on his idea. They were generally in favor of trying the experiment and began to discuss practical issues like whether or not a blog was the ideal format for the project, and if the outcome should be a publication with a single collective name, or rather with a list of contributors. Encouraged by the positive feedback, Gowers posted the actual problem on February 1st: a combinatorial proof to the density version of the Hales–Jewett theorem. The discussion that followed spanned 6 weeks (see Fig. 2 for an excerpt).
7 http://www.youtube.com/watch?v=jzQIQRr1Bo&playnext=1&list=PL5518A8D0F538C1CC&feature=resultsmain; retrieved September 21, 2012.
8 The Polymath case study is based on Nielsen (2011) as well as the following sources: http://gowers.wordpress.com/2009/01/27/is-massively-collaborative-mathematics-possible/; http://gowers.wordpress.com/2009/01/30/background-to-a-polymath-project/; http://gowers.wordpress.com/2009/02/01/a-combinatorial-approach-to-density-hales-jewett/; http://gowers.wordpress.com/2009/03/10/problem-solved-probably/; http://mathoverflow.net/questions/31482/the-sensitivity-of-2-colorings-of-the-d-dimensional-integer-lattice; http://gilkalai.wordpress.com/2009/07/17/the-polynomial-hirsch-conjecture-a-proposal-for-polymath3/; http://en.wordpress.com/tag/polymath4/; http://gowers.wordpress.com/2010/01/06/erdss-discrepancy-problem-as-a-forthcoming-polymath-project/; http://gowers.wordpress.com/2009/12/28/the-next-polymath-project-on-this-blog/; retrieved September, 2012.
Among the contributors were several university professors, including Terry Tao, a top-notch mathematician at UCLA and also a Fields Medalist, as well as several school teachers and PhD students. After a few days, the discussion had branched out into several threads and a wiki was created to store arguments and ideas. Certain contributors were more active than others, but significant progress was coming from various sources. On March 10, Gowers announced that the problem was probably solved. He and a few colleagues took on the task of verifying the work and drafting a paper, and the article was eventually published in the Annals of Mathematics under the pseudonym “D.H.J. Polymath” (Polymath, 2012).
Thrilled by the success of the original Polymath project, several of Gowers’ colleagues launched similar projects, though with varying degrees of success. In June 2009, Terence Tao organized a collaborative entry to the International Mathematical Olympiad taking place annually in the summer. Similar projects have been successfully completed every year and are known as Mini-Polymath projects. Scott Aaronson began a project on the “sensitivity of 2-colorings of the d-dimensional integer lattice”, which was active for over a year, but did not get to a final solution. Gil Kalai started a project on the “Polynomial Hirsch conjecture” (Polymath3), again with inconclusive results. Terence Tao launched a project for finding primes deterministically (Polymath4), which was successfully completed and led to a publication in Mathematics of Computation (Tao et al., 2012).10 In January 2010 Timothy Gowers and Gil Kalai began coordinating a new Polymath project on the Erdos Discrepancy Problem (known as Polymath5). An interesting aspect of this project is that the particular problem to be solved was chosen through a public discussion on Gowers’ blog. Several Polymath projects are currently running. Despite the growing number of projects, however, the number of contributors to each particular project remains relatively small, typically not exceeding a few dozen.11
It is interesting to note that over time, Polymath projects have developed organizational practices that facilitate collective problem solving. In particular, a common challenge is that when the discussion develops into hundreds of comments, it becomes difficult for contributors to understand which tracks are promising, which have been abandoned, and where the discussion really stands. The chronological order of comments is not always informative because people respond to different posts and the problems branch out in parallel conversations, making it difficult to continue discussions in a meaningful way. To overcome these problems, Gowers, Tao and other project leaders started to take on the role of moderators. When the comments on a topic become too many or too unfocused, they synthesize the latest results in a new post or open separate threads. Polymath also inspired mathoverflow.net, a useful tool for the broader scientific community. Launched in the fall of 2009, this platform allows mathematicians to post questions or problems and to provide answers or rate others’ answers in a commented blog-style discussion. In some cases, the discussion develops in ways similar to a small Polymath project and answers have also been cited in scholarly articles.
9 http://gowers.wordpress.com/2009/01/27/is-massively-collaborative-mathematics-possible/, retrieved September, 2012.
10 While originally submitted under the pseudonym D.H.J. Polymath, this paper ended up being published with a conventional set of authors at the insistence of the journal editors, who felt that the pseudonym would conflict with bibliographical record-keeping (email communication with Terence Tao, May 22, 2013).
11 http://michaelnielsen.org/blog/the-polymath-project-scope-of-participation/; retrieved September 28, 2012.
2.4. Overview of additional crowd science projects
Table 1 shows additional examples of crowd science projects. The table specifies the primary scientific field of a project, illustrating the breadth of applications across fields such as astronomy, biochemistry, mathematics, or archeology. We also indicate what kind of task is performed by the crowd, e.g., the classification of images and sounds, the collection of observational data, or collective problem solving. This column shows the variety of tasks that can be accomplished in crowd science projects. Finally, Table 1 indicates which projects are part of larger crowd science platforms. We will draw on the earlier cases and the examples listed in Table 1 throughout our subsequent discussion.
3. Characterizing crowd science and exploring heterogeneity across projects
Examples such as those discussed in the prior section provide fascinating insights into an emerging way of doing science and have intrigued many observers. However, while there is agreement that these projects are somehow “different” from traditional science, a systematic understanding of the concrete nature of these differences is lacking. Similarly, it seems important to consider the extent to which these projects differ from other emerging approaches to producing knowledge such as crowdsourcing or innovation contests.
In the following Section 3.1, we identify two key features that distinguish crowd science from other regimes of knowledge production: openness in project participation and openness with respect to the disclosure of intermediate inputs such as data or problem solving approaches. Not all projects share these features to the same degree; however, these dimensions tend to distinguish crowd science from other organizational forms, while also having important implications for opportunities and challenges crowd science projects may face. In the subsequent Section 3.2, we will delve more deeply into heterogeneity among crowd science projects, reinforcing the notion that “crowd science” is not a well-defined type of project but rather an emerging organizational mode of doing science that allows for significant experimentation, as well as considerable scope in the types of problems that can be addressed and in the types of people who can participate.
3.1. Putting crowd science in context: Different degrees of openness
A first important feature of crowd science that marks a difference from traditional science is that participation in projects is open to a large number of potential contributors who are typically unknown to each other or to project organizers at the beginning of a project. Any individual who is interested in a project is invited to join. Recall that Fields Medalist Timothy Gowers’ invitation to join Polymath was accepted, among others, by Terence Tao, another Fields Medalist working at UCLA, as well as a number of other less famous colleagues, school teachers, and graduate students. Open participation is even more salient in Galaxy Zoo, which recruits participants on its website and boasts a contributor base of over 250,000. Note that the emphasis here is not simply on a large number of project participants (large team size is becoming increasingly common even in traditional science, see Wuchty et al. (2007)). Rather, open participation entails virtually unrestricted “entry” by any interested and qualified individual, often based on self-selection in response to a general call for participation.
Fig. 2. Excerpt from Polymath discussion. Source: http://terrytao.wordpress.com/2009/02/05/upper-and-lower-bounds-for-the-density-hales-jewett-problem/.
A second feature that tends to be common is that crowd science projects openly disclose a substantial part of the intermediate inputs used in the knowledge production such as data sets or problem solving approaches. The Whale Song Project, for example, has uploaded a database of audio recordings of whale songs, and Galaxy Zoo has made publicly available the classifications made by volunteer contributors (Lintott et al., 2010). As noted earlier, many Foldit players also share their tricks to facilitate certain operations, and complicated strategies are encoded in downloadable scripts and recipes. Thus, much of the process-related knowledge that has traditionally remained tacit in scientific research (Stephan, 1996) is codified and openly shared within and outside a particular project. We also subsume under the term “intermediate inputs” records of the actual process through which project participants develop, discuss, and evaluate problem solving strategies since the resulting insights can facilitate future problem solving. Consider, for example, the following discussion of a problem solving approach from the Polymath 4 blog:
Anonymous (August 9, 2009 at 5:48 pm): Tim’s ideas on sumsets of logs got me to thinking about a related, but different approach to these sorts of “spacing problems”: say we want to show that no interval [n, n + log n] contains only (log n)^100-smooth numbers. If such strange intervals exist, then perhaps one can show that there are loads of distinct primes p in [(log n)^10, (log n)^100] that each divide some number in this interval [...]
Terence Tao (August 9, 2009 at 6:00 pm): I quite like this approach – it uses the entire set S of sums of the 1/p_i, rather than a finite sumset of the log-integers or log-primes, and so should in principle get the maximal boost from additive combinatorial methods. It’s also using the specific properties of the integers more intimately than the logarithmic approach, which is perhaps a promising sign.12
Fig. 3. Knowledge production regimes with different degrees of openness.
We suggest that these two dimensions – openness in participation and openness with respect to intermediate inputs – not only describe common features of crowd science projects but also differentiate crowd science projects from other types of knowledge production.13 In particular, Fig. 3 shows how differences along these two dimensions allow us to distinguish, in an abstract way, crowd science from three other knowledge production regimes.
In Fig. 3, crowd science contrasts most strongly with the bottom-left quadrant, which captures projects that limit contributions to a relatively small and pre-defined set of individuals and do not openly disclose intermediate inputs. We call this quadrant “traditional science” because it captures key features of the way science has been done over the last century. Of course, traditional science is often called “open” science because final results are openly disclosed in the form of publications (Murray and O’Mahony, 2007; David, 2008; Sauermann and Stephan, 2013). However, while traditional science is “open” in that sense, it is largely closed with respect to the dimensions captured in our framework. In particular, researchers retain exclusive use of their key intermediate inputs, such as the data and the heuristics used to solve problems. While some of these inputs are disclosed in methodology sections of papers, most of them remain tacit or are shared only among the members of the research team (Stephan, 1996).
12 http://polymathprojects.org/2009/08/09/research-thread-ii-deterministic-way-to-find-primes/; retrieved 29 March 2012.
13 Open participation and open disclosure of inputs are also characteristic of open source software (OSS) development, although the goal of the latter is not the production of scientific knowledge but of software artifacts. Given these similarities, our discussion will take advantage of prior work that has examined the organization of knowledge production in OSS development.
This lack of openness follows from the logic of the reward system of the traditional institution of science. As emphasized by Merton in his classic analysis, traditional science places one key goal above all others: gaining recognition in the community of peers by being the first to present or publish new research results (Merton, 1973; Stephan, 2012). While scientists may also care about other goals, publishing and the resulting peer recognition are critical because they can yield indirect benefits such as job security (tenure), more resources for research, more grants, more students, and so on (Latour and Woolgar, 1979; Sauermann and Roach, 2013). Since most of the recognition goes to the person who is first in discovering and publishing new knowledge, the reward system of science induces scientists to expend great effort and to disclose research results as quickly as possible. At the same time, this system encourages scientists to seek ways to monopolize their area of expertise and to build a competitive advantage over rivals. As such, the traditional reward system of science discourages scientists from helping contenders, explaining why data, heuristics, and problem solving strategies are often kept secret (Dasgupta and David, 1994; Walsh et al., 2005; Haeussler et al., 2009).
Recognizing that secrecy with respect to intermediate inputs may slow down the progress of science, funding agencies increasingly ask scientists to openly disclose data and logs as a condition of funding. The National Institutes of Health and the Wellcome Trust, for example, require that data from genetic studies be made openly available. Similarly, more and more journals including the flagship publications Science and Nature require authors to publicly deposit data and materials such that any interested scientist can replicate or build upon prior research.14 As such, an increasing number of projects have moved from quadrant 4 – “traditional science” – into quadrant 3 (bottom right). Even though projects in this quadrant are more open by disclosing intermediate inputs after the project is finished, participation in a given project remains closed to outsiders.
14 http://www.sciencemag.org/site/feature/contribinfo/prep/geninfo.xhtml#dataavail; http://www.nature.com/authors/policies/availability.html; retrieved October 20, 2011.
Let us finally turn to projects in quadrant 1 (top left), which solicit contributions from a wide range of participants but do not publicly disclose intermediate inputs. Moreover, while not explicitly reflected in Fig. 3, projects in this cell differ from the other three cells in that even final project results are typically not openly disclosed. Examples of this organizing mode include Amazon’s Mechanical Turk (which pays individuals for performing tasks such as collecting data from websites) as well as increasingly popular innovation contest platforms such as Kaggle or Innocentive. In these innovation contests, “seekers” post their problems online and offer monetary prizes for the best solutions. Anybody is free to compete by submitting solutions, although many contests require that “solvers” first accept confidentiality and intellectual property agreements. Once a winner is determined, he or she will typically transfer the property right of the solution to the seeker in return for the prize (Jeppesen and Lakhani, 2010). Thus, the projects in quadrant 1 are open to a large pool of potential contributors, but they do not disclose results and data. A main reason for the latter is that project sponsors are often private organizations that seek to gain some sort of a competitive advantage by maintaining unique access to research results and new technologies. Along with crowd science, however, innovation contests and crowdsourcing platforms are highly dynamic, witnessing the rapid entry of new models and significant experimentation with different organizational designs. As such, our classification focuses on broad differences between quadrants and on common characteristics of examples as of the writing of this article. There are many interesting nuances in the designs of crowdsourcing platforms and innovation contests and some are more “open” than others (i.e., some may be located closer to quadrant 2).15 Thus, a deeper understanding of the implications of crowd science’s high degrees of openness may be useful for future work examining heterogeneity and changes in openness among contests and crowdsourcing platforms as well.
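The two-dimensional framework of Fig. 3 can be summarized in a few lines of code. The following is only an illustrative sketch: it treats both dimensions as binary, whereas the text stresses that openness is a matter of degree, and the `Project` class, the `regime` function, and the label used for quadrant 3 are hypothetical names introduced here rather than terms taken from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """A knowledge production project described by the two openness dimensions."""
    name: str
    open_participation: bool          # can any interested individual join?
    open_intermediate_inputs: bool    # are data, logs, and heuristics disclosed?


def regime(project: Project) -> str:
    """Map a project to a quadrant of Fig. 3 (binary simplification)."""
    if project.open_participation and project.open_intermediate_inputs:
        return "quadrant 2: crowd science"
    if project.open_participation:
        return "quadrant 1: innovation contests and crowdsourcing"
    if project.open_intermediate_inputs:
        # Label assumed for illustration; the text only gives the quadrant number.
        return "quadrant 3: traditional science with disclosed inputs"
    return "quadrant 4: traditional science"


examples = [
    Project("Galaxy Zoo", True, True),
    Project("Innovation contest", True, False),
    Project("Funded genetic study with data deposit", False, True),
    Project("Conventional lab project", False, False),
]
for p in examples:
    print(f"{p.name}: {regime(p)}")
```

A graded version could replace the two booleans with scores between 0 and 1, which would better reflect the observation that some contests sit closer to quadrant 2 than others.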
While openness with respect to project participation and with respect to intermediate inputs are by no means the only interesting aspects of crowd science projects, and while not all projects reflect these features to the same degree, these two dimensions highlight prominent qualitative differences between crowd science and other regimes of knowledge production. Moreover, we focus on these two dimensions because they have important implications for our understanding of potential benefits and challenges of crowd science. As highlighted by prior work on the organization of problem solving, the first dimension – open participation – is important because it speaks to the labor inputs and knowledge a project can draw on, and thus to its ability to solve problems and to generate new knowledge.16 Openness with respect to intermediate inputs – our second dimension – is an important requirement for distributed work by a large number of crowd participants. At the same time, our discussion of the role of secrecy in traditional science suggests that this dimension may have fundamental implications for the kinds of rewards project contributors are able to appropriate and thus for the kinds of motives and incentives projects can rely on in attracting participants. We provide a more detailed discussion of the benefits and challenges resulting from open participation and open disclosure of intermediate inputs in Sections 4 and 5.
3.2. Heterogeneity within: Differences in the nature of the task and in contributor skills
Section 3.1 highlighted two important common features of crowd science projects. However, there is also considerable heterogeneity among projects. In seeking deeper insights into this heterogeneity, we focus on two dimensions that may have particularly important implications for the benefits and challenges crowd science projects face. As reflected in Fig. 4, these dimensions include the nature of the task that is performed by the crowd (horizontal axis), and the skills that project participants need in order to make meaningful contributions (vertical axis).17 We discuss each dimension in turn.
15 For example, the Netflix prize competition involved only a non-exclusive license (http://www.netflixprize.com/faq#signup, accessed July 5, 2013).
16 The literature on open innovation and the governance of problem-solving activities focuses on the decision to locate problem solving within a given organizational boundary (“closed”) versus outside the organization (“open”) (Nickerson and Zenger, 2004; Jeppesen and Lakhani, 2010; Afuah and Tucci, 2012; Felin and Zenger, 2012). While this prior work centers on firm boundaries, many of the resulting insights are useful in our context if we conceptualize the relevant organizational unit not as the firm but as the “core” team of researchers that would work on a problem in traditional science but now has the option to invite participation by the larger crowd.
17 As we will discuss in Section 5.1.3, most projects have dedicated project organizers who often initiate the projects and closely resemble the principal investigators in traditional scientific projects. Our focus here is on the typically much larger group of contributors outside of the immediate “core” of project organizers.
In a general sense, all crowd science projects ultimately have the goal of gaining a deeper understanding of the natural world and finding solutions to scientific problems. As part of a project, the crowd performs certain types of tasks such as the coding of astronomical data, the optimization of protein structures, or the development of complex mathematical proofs. Note that we conceptualize the “task” in an aggregate sense at the level of the crowd, with each individual making distinct contributions to that task by engaging in certain subtasks. The nature of the tasks outsourced to the crowd differs in many ways, but prior research on the organization of problem solving suggests that it is particularly useful to consider differences in two related characteristics: the complexity of the task and the task structure (Simon, 1962, 1973; Felin and Zenger, 2012).
In our context, task complexity is best conceptualized as the degree of interdependency between the individual subtasks that participants perform when contributing to a project. In tasks that are not complex, the best solution to a subtask is independent of other subtasks, allowing contributors to work independently. For example, the correct classification of an image in Galaxy Zoo does not depend on the classifications of other images. In complex tasks, however, the best solution to a subtask depends on other subtasks and, as such, a contributor needs to consider other contributions when working on his/her own subtask. The term task structure captures the degree to which the overall task that is outsourced to the crowd is “well-structured” versus “ill-structured”. Well-structured tasks involve a clearly defined set of subtasks, the criteria for evaluating contributions are well understood, and the “problem space” is essentially mapped out in advance (Simon, 1973). In ill-structured tasks, the specific subtasks that need to be performed are not clear ex ante, contributions are not easily evaluated, and the problem space becomes clearer as the work progresses and contributions build on each other in a cumulative fashion. While task complexity and structure are distinct concepts, they are often related in that complex tasks tend to also be more ill-structured, partly due to our limited understanding of the interactions among different subtasks (Felin and Zenger, 2012). Considering differences across projects with respect to the nature of the task is particularly important because complex and ill-structured tasks provide fewer opportunities for the division of labor and may face distinct organizational challenges. We will discuss these challenges as well as potential mechanisms to address them in Section 5.
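The notion of task complexity as interdependency between subtasks can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration, not a measure proposed in the text: the `interdependency` function and the subtask names are hypothetical, and the share of dependent subtasks is only one of many conceivable ways to quantify interdependency.

```python
def interdependency(dependencies: dict[str, set[str]]) -> float:
    """Share of subtasks whose best solution depends on at least one other subtask.

    `dependencies` maps each subtask to the set of subtasks it depends on.
    A value of 0.0 means all subtasks can be worked on independently.
    """
    if not dependencies:
        return 0.0
    dependent = sum(1 for deps in dependencies.values() if deps)
    return dependent / len(dependencies)


# Image classification: each image can be coded without regard to the others.
classification_task = {"image_1": set(), "image_2": set(), "image_3": set()}

# Collaborative proof: later steps build on earlier contributions.
proof_task = {
    "lemma_a": set(),
    "lemma_b": {"lemma_a"},
    "main_proof": {"lemma_a", "lemma_b"},
}

print(interdependency(classification_task))
print(interdependency(proof_task))
```

In this toy setting, the classification task scores zero and can be divided among contributors working in parallel, while most of the proof subtasks depend on earlier ones and must be tackled sequentially.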
As Fig. 4 illustrates, a large share of crowd science projects involves primarily tasks of low complexity that tend to be well-structured. For example, the coding of images for the Galaxy Zoo project involves a large number of individual contributions, but subtasks are independent and contributors can work in parallel without regard to what other contributors are doing.18 At the other extreme are projects that involve highly complex and ill-structured tasks requiring contributors to build on each other’s work in a sequential fashion, and to develop a collective understanding of the problem space and of possible solutions over time. Polymath projects exhibit these characteristics and Fig. 2 illustrates the interactive problem solving process. Finally, projects located in the middle of the horizontal axis in Fig. 4 involve tasks that are of moderate complexity and relatively well-structured, while still allowing contributors to collaborate and to explore different approaches to solve the problem. For example, Foldit players can find the best protein structure using very different approaches that are often discovered only in the process of problem solving. Moreover, while some players work independently, many of the most successful solutions have been found by members of larger teams who developed solutions collaboratively by building on each other’s ideas.
18 One may think that these coding tasks are so simple that their performance should not be considered as a serious scientific contribution at all. However, data collection is an integral part of scientific research and much of the effort (and money) in traditional research projects is expended on data collection, often carried out by PhDs and post-docs. In fact, recall that Galaxy Zoo was initiated by a PhD student who was overwhelmed by the task of coding a large number of galaxies. Reflecting this important function, contributions in the form of data collection or the provision of data are often explicitly rewarded with co-authorship on scientific papers (Haeussler and Sauermann, 2013).
Fig. 4. Heterogeneity among crowd science projects.
Let us now consider the vertical axis in Fig. 4, which reflects different kinds of skills that are required to make meaningful contributions to a project (Erickson et al., 2012).19 As noted earlier, many “citizen science” projects such as Galaxy Zoo ask for contributions that require only skills that are common in the general human population. Other projects typically require expert skills in specific scientific domains, as evidenced by the prevalence of trained mathematicians among Polymath participants, or experts in bioinformatics in the Connect2Decode project. Projects located in the middle of the vertical axis require skills that are specialized and less common but that are not tied to a particular scientific domain. For example, the Argus project relies on captains of ships to collect measures of seabed depth using sonar equipment, requiring specialized nautical skills. Similarly, success in protein folding requires the ability to visualize and manipulate three-dimensional shapes in space, a special cognitive skill that has little to do with the traditional study of biochemistry.20
4. Potential benefits of crowd science
19 We focus on the skills required to make a meaningful contribution to a project and abstract from the significant heterogeneity in skills among contributors to a given project. In particular, it is likely that some contributors benefit from higher levels of skills or access to a broader range of skills than others.
20 It is interesting to note that Foldit effectively uses sophisticated software to transform a task that traditionally required scientific training into a task that can be performed by amateurs using a different set of skills.
In Section 3.1, we compared crowd science to other knowledge production regimes and suggested that crowd science projects tend to be more open with respect to participation as well as the disclosure of intermediate inputs. We now discuss how this high degree of openness can provide certain advantages with respect to the creation of new knowledge as well as the motivation of project participants. Section 5 considers potential challenges openness may pose and how these challenges may be addressed. Throughout these discussions, we will consider how the benefits and challenges may differ across different types of projects, focusing primarily on the distinctions drawn in Section 3.2.
4.1. Knowledge-related benefits
The scholarly literature has developed several different conceptualizations of the science and innovation process, emphasizing different problem solving strategies such as the recombination of prior knowledge, the search for extreme value outcomes, or the systematic testing of competing hypotheses (see Kuhn, 1962; Fleming, 2001; Weisberg, 2006; Boudreau et al., 2011; Afuah and Tucci, 2012). Given the wide spectrum of problems crowd science projects seek to solve, it is useful to draw on multiple conceptualizations of problem solving in thinking about the benefits projects may derive from openness in participation and from the open disclosure of intermediate inputs.
4.1.1. Benefits from open participation
Some projects such as Galaxy Zoo or Old Weather benefit primarily from a larger quantity of labor inputs, using thousands of volunteers to complement scarce material resources such as telescopes but also the unique expertise of highly trained project leaders. Could projects rely on powerful computers to accomplish these tasks without the involvement of a large number of people? It turns out that humans continue to be more effective than computers in several realms of problem solving, including image and sound detection (Malone and Klein, 2007). Humans can also rely on intuition to improve optimization processes, whereas computer algorithms often use random trial-and-error search and may get trapped in local optima. Humans also have a superior capacity to narrow the space of solutions, which is useful in problems like protein folding, where too many degrees of freedom make complete computation infeasibly long (Cooper et al., 2010). Another advantage of humans is their ability to watch out for the unexpected. This skill is essential in many realms of science, as evidenced by the discovery of new types of galaxies and astronomical objects in the course of Galaxy Zoo’s operation. Benefits in terms of access to a larger quantity of labor are likely to be greatest for projects involving tasks that are well-structured and require only common human skills, allowing them to draw on a very large pool of potential contributors. This rationale may explain why many existing crowd science projects are located in the bottom-left corner of Fig. 4.
Other projects benefit from open participation because it allows them to access rare and specialized skills. Projects such as Argus or eBird (which asks contributors to identify birds in their neighborhoods) can effectively “broadcast” the skill requirement to a large number of individuals in the crowd, enhancing the chances to find suitable contributors. As highlighted by recent work on innovation contests, broadcast search may also identify individuals who already possess pre-existing superior solutions (Jeppesen and Lakhani, 2010; Nielsen, 2011).
Finally, crowd science projects may also benefit from the diversity in the knowledge and experience possessed by project participants, especially when problem solutions require the recombination of different pieces of knowledge (Fleming, 2001; Uzzi and Spiro, 2005). Knowledge diversity is likely to be high because project contributors tend to come from different demographic backgrounds, organizations, and even scientific fields (Raddick et al., 2013). Moreover, some projects also benefit from diversity with respect to contributors’ geographic location, including efforts such as eBird, Argus, or the Great Sunflower Project, which seek to collect comprehensive data across a large geographic space. In an effort to collect data on the bee population across the United States, for example, the Great Sunflower Project asks participants to grow a special type of flower and to count the number of bee visits during 15-minute daily observations. The benefits of geographic diversity are likely to accrue primarily to projects requiring observational data where geography itself plays a key role.
4.1.2. Benefits from the open disclosure of intermediate inputs
Crowd science relies on the participation of many and often temporary participants. Making intermediate inputs widely accessible enables geographically dispersed individuals to join and to participate via web interfaces. Moreover, codified intermediate inputs such as Polymath blogs or Foldit recipes provide a memory for the project, ensuring that knowledge is not lost when contributors leave and enabling new contributors to quickly catch up and become efficient (see Von Krogh et al., 2003; Haefliger et al., 2008). Artifacts such as discussion logs may also embody and convey social norms and standards that facilitate the cooperation and collaboration among participants (Rullani and Haefliger, 2013). Thus, there is an interesting connection between the two dimensions of openness in that the open disclosure of intermediate inputs allows projects to take full advantage of the benefits of open participation (Section 4.1.1).
Transparency of the research process and availability of data may also facilitate the verification of results (see Lacetera and Zirulia, 2011). While peer review in traditional science does not typically entail a detailed examination of intermediate inputs, the openness of logs and data in crowd science projects allows observers to follow and verify the research process in considerable detail. By way of example, the relatively large number of “eyes” following the Polymath discussion ensured that mistakes were quickly detected (see Fig. 2). Of course, there are limits to this internal verification since contributors who are not trained scientists may not have the necessary background to assess the accuracy of more sophisticated data or methods. As such, their involvement in verification will be limited to tasks similar to the ones they would be able to perform.21
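Footnote 21 mentions that some projects institutionalize verification by asking multiple participants to code the same piece of data and comparing the results. A minimal sketch of such a comparison is a majority vote with an agreement threshold. The `consensus` function, the 0.8 threshold, and the example labels are assumptions made for illustration and do not describe the procedure of any particular project.

```python
from collections import Counter


def consensus(labels, min_agreement=0.8):
    """Return (label, share) if enough coders agree, otherwise (None, share).

    `labels` holds the classifications that different volunteers assigned to
    the same item; `min_agreement` is a hypothetical acceptance threshold.
    """
    if not labels:
        raise ValueError("at least one classification is required")
    label, votes = Counter(labels).most_common(1)[0]
    share = votes / len(labels)
    if share >= min_agreement:
        return label, share
    return None, share


# Five volunteers classify the same galaxy image.
print(consensus(["spiral", "spiral", "spiral", "spiral", "elliptical"]))

# Volunteers disagree, so the item would be flagged for further review.
print(consensus(["spiral", "elliptical", "spiral", "elliptical"]))
```

Items without sufficient agreement could be routed to additional volunteers or to project organizers, which reflects the limits of verification by untrained contributors noted above.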
21 Some projects such as the Open Dinosaur project or Galaxy Zoo institutionalize this verification process by explicitly asking multiple participants to code the same piece of data and comparing the results.
A final knowledge-related benefit of open disclosure of intermediate inputs emerges not at the level of a particular project but for the more general progress of science. Scholars of science have argued that open disclosure allows future scientists to build upon prior work in a cumulative fashion (Nelson, 2004; Sorenson and Fleming, 2004; Jones, 2009). While these discussions typically refer to the disclosure of final project results, additional benefits may accrue if other scientists can build on the intermediate inputs produced by a project (Dasgupta and David, 1994; Haeussler et al., 2009). Such benefits are particularly large for data, which are typically costly to collect but can often be used to examine multiple research questions. Consider, for example, the Sloan Digital Sky Survey, the Human Genome data, or Census data in the social sciences that have all resulted in many valuable lines of research. Not only data, but also logs and discussion archives can be enormously helpful for future research if they provide insights into successful problem solving techniques. As illustrated in Fig. 2, for example, problem solving approaches and intermediate solutions developed in one Polymath discussion have been re-used and adapted in another.
4.2. Motivational benefits
Successful scientific research requires not only knowledge inputs but also scientists who are motivated to actually exert effort toward solving a particular problem. Due to openness in participation, crowd science projects are able to rely on contributors who are willing to exert effort without pay, potentially allowing projects to take advantage of human resources at lower financial cost than would be required in traditional science.22 In the following, we discuss some of the non-pecuniary payoffs that may matter to crowd science contributors, as well as mechanisms that projects use to address and reinforce these types of motivations. As we will see, some of these mechanisms require considerable infrastructure, suggesting that even though volunteers are unpaid, their help is not necessarily costless.
Scholars have for a long time emphasized the role of intrinsic motives, especially in the context of science and innovation (Ritti, 1968; Hertel et al., 2003; Sauermann and Cohen, 2010; Raasch and Von Hippel, 2012). Intrinsically motivated people engage in an activity because they enjoy the intellectual challenge of a task, because they find it fun, or because it gives them a feeling of accomplishment (Amabile, 1996; Ryan and Deci, 2000). Such intrinsic motives appear to be important also in crowd science projects, as illustrated in the following post of a Galaxy Zoo participant:
Ok, I’m completely jealous of the person who got that particular image, its amazing. You need to warn people just how addictive this is! Its dangerous! [...].
After doing a couple hundred I was starting to burn out... suddenly there was a kelly-green star in the foreground. Whoa! [...] being the first to see these things: who *knows* what you might find? Hooked!23
22 A related argument has been made in the context of user innovation (Raasch and Von Hippel, 2012). While there is no estimate of the cost savings crowd science projects may derive from relying on non-financial motives, there are estimates of the cost savings achieved in OSS development. In particular, one estimate pegs the value of unpaid contributions to the Linux system at over 3 billion dollars (http://linuxcost.blogspot.com/2011/03/cost-of-linux.html; retrieved November 9, 2011).
Intrinsic motivation may be easy to achieve for tasks that are inherently interesting and challenging. However, even simple and potentially tedious tasks – such as coding large amounts of data or systematically trying different configurations of molecules – can become intrinsically rewarding if they are embedded in a game-like environment (Prestopnik and Crowston, 2011). As such, an increasing number of crowd science projects employ “gamification” to increase project participation. Fig. 1, for example, shows how Foldit uses group and player competitions to keep contributors engaged.
Project participants may also feel good about being part of a particular project community and may enjoy “social” benefits resulting from personal interactions (Hars and Ou, 2002). While some types of projects – especially those involving collaborative problem solving – will naturally provide a locus for social interaction, many projects also actively stimulate interaction. For example, the Great Sunflower project nominates group leaders who operate as facilitators in particular neighborhoods, communities or schools.24 Other projects promote social interactions by providing dedicated IT infrastructure. This has been the strategy employed by Foldit, where project logs and discussion forums allow participants to team up to exchange strategies and compete in groups, fostering not just enjoyment from gaming, but also a collegial and “social” element. A particularly interesting aspect of intrinsic and social benefits is that they may be non-rival, i.e., the benefits to one individual are not diminished simply because another individual also derives these benefits (Bonaccorsi and Rossi, 2003).
Another important nonpecuniary motive for participation is an interest in the particular subject matter of the project. To wit, when asked about their participation motives, one of the reasons most often mentioned by contributors to Galaxy Zoo was an interest in astronomy and amazement about the vastness of the universe (Raddick et al., 2013). While this motive is intrinsic in the sense that it is not dependent on any external rewards, it is distinct from challenge and play motives in that it is specific to a particular topic, potentially limiting the range of projects a person will be willing to participate in. Recognizing the importance of individuals’ interest in particular topics, some platforms such as Zooniverse enrich the work by providing scientific background information, by regularly informing participants about new findings or by allowing participants to create collections of interesting objects (Fig. 5).
While it is not surprising that projects in areas with a large number of hobbyists – such as astronomy or ornithology – have been able to recruit large numbers of volunteers, projects in less interesting areas, or projects addressing very narrow and specific questions are more likely to face challenges in trying to recruit volunteers. At the same time, reaching out to a large number of potential contributors may allow even projects with less popular topics to identify a sufficient number of individuals with matching interests. As such, while prior work has emphasized that broadcast search can match problems to individuals holding unique knowledge or solutions (Jeppesen and Lakhani, 2010), we suggest that broadcast search can also allow the matching of particular topics to individuals with related interests. By way of example, bats are probably not popular animals among most people, but the project Bat Detective has been quite successful in finding those individuals who are interested in this animal and are willing to listen to sound recordings and to identify different types of bat calls.
An especially powerful version of interest in a particular problem is evident in the growing number of projects that are devoted to the understanding of particular diseases or to the development of cures (Årdal and Røttingen, 2012). Many of these projects involve patients and their relatives who – as potential “users” – have a very strong personal stake in the success of the project, leading them to make significant time commitments (see Von Hippel, 2006; Marcus, 2011). To these participants, it may be particularly important that projects also quickly disclose intermediate inputs if they believe that disclosure speeds up the progress toward a solution by allowing others to build upon these inputs.
24 http://www.greatsunflower.org/garden-leader-program; retrieved October 3, 2012.
Finally, many crowd science contributors also value the opportunity to contribute to science, especially citizen scientists, who lack the formal training to participate in traditional science (Raddick et al., 2013). As one Galaxy Zoo contributor commented in a blog discussion:
Dang... this is addictive. I put on some music and started classifying, and the next thing you know it’s hours later. But I’m contributing to science!25
Like some of the other motives discussed above, this motive may be particularly beneficial for crowd science to the extent that individuals are willing to participate in a broad range of different projects and are even willing to help with “boring” and tedious tasks as long as these are perceived to be important for the progress of science.
5. Challenges and potential solutions
Openness with respect to project participation and intermediate inputs can lead to considerable benefits, but the same characteristics may also create certain challenges. We now highlight some of these challenges and draw on the broader organizational literature to point toward organizational and technical tools that may be useful in addressing them.
5.1. Organizational challenges
5.1.1. Matching projects and people
One key feature of crowd science projects is their openness to the contributions of a large number of individuals. However, there is a large and increasing number of projects, and the population of potential contributors is vast. Thus, organizational mechanisms are needed to allow for the efficient matching of projects and potential contributors with respect to both skills and interests. One potential approach makes it easier for individuals to find projects by aggregating and disseminating information on ongoing or planned projects. Websites such as scistarter.com, for example, offer searchable databases of a wide range of projects, allowing individuals to find projects that fit their particular interests and levels of expertise.
We expect that an efficient alternative mechanism may be crowd science platforms that host multiple projects and allow them to utilize a shared pool of potential contributors as well as a shared technical infrastructure. Especially if projects are similar with respect to the field of science, types of tasks, or skill requirements, individuals who contributed to one project are also likely to be suitable contributors to another project on the same platform. Indeed, multi-project crowd science platforms may enjoy considerable network effects by simultaneously attracting more projects looking for contributors and contributors looking for projects to join. Such platforms are common in open source software development (e.g., sourceforge.com and GitHub) and are also emerging in the crowd science realm (see Table 1). For example, the Galaxy Zoo project has evolved into the platform Zooniverse, which currently hosts over fifteen projects that cover different areas of science but involve very similar kinds of tasks. When a new project is initiated, Zooniverse routinely contacts its large and growing member base to recruit participants, and many individuals contribute to multiple Zooniverse projects.
25 http://blogs.discovermagazine.com/badastronomy/2009/04/02/a-million-galaxies-in-a-hundred-hours/#.UU8TPVs4Vxg.
Fig. 5. Galaxy Zoo object with selected user comments. Source: http://talk.galaxyzoo.org/objects/AGZ0002bw2.
5.1.2. Division of labor and integration of contributions
As discussed in section 3.2, projects differ with respect to the complexity and structure of the task that is outsourced to the crowd. The more complex and ill-structured the task, the more contributors have to interact and build on each other’s contributions, limiting the number of people who can work on a given project at the same time. While our discussion in section 3.2 took the nature of the task as given, project organizers can try to reduce interdependencies and structure tasks such that a larger number of individuals can participate, effectively moving projects toward the left of Fig. 4. Open source software development has developed sophisticated modularization techniques to achieve similar goals. The basic idea of modularization is that a large problem is divided into many smaller problems, plus a strategy (the architecture) that specifies how the modules fit together. The goal is to design modules that have minimum interdependencies with one another, allowing for a greater division of labor and parallel work (Von Krogh et al., 2003; Baldwin and Clark, 2006). Modularization has already been used in many crowd science projects. For example, Galaxy Zoo and Old Weather keep the design of the overall project centralized and provide individual contributors with a small piece of the task so that they can work independently and at their own pace. Of course, not all problems can be easily modularized; we suspect, for example, that mathematical problems are less amenable to modularization. However, while the nature of the problem may set limits to modularization in the short term, advances in information technology and project management knowledge are likely to increase project leaders’ ability to modularize a given problem over time (see Simon, 1973).
Just as important as distributing tasks is the effective integration of individuals’ contributions to find the overall problem solution. In highly modularized data collection and coding tasks such as Old Weather or eBird, individual contributions can easily be integrated into larger datasets. Similarly, in some problem solving tasks such as Foldit, individuals or teams generate stand-alone solutions that can be evaluated using standard criteria and that require little integration (see Jeppesen and Lakhani, 2010). The biggest challenge is the integration of contributions in collaborative problem solving tasks such as Polymath, where the contributors seek to develop a single solution in an interactive fashion, e.g., through an ongoing discussion. In such cases, much of the integration is done informally as participants read each other’s contributions. However, as the amount of information that needs to be read and processed increases, informal mechanisms may miss important contributions while also imposing large time costs on project participants (Nielsen, 2011). Filtering and sorting mechanisms may lower these costs to some extent, but difficulties in integrating the contributions of a larger number of participants are likely to impose limits upon the optimal size of collaborative problem solving projects such as Polymath.
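The highly modularized end of this spectrum can be illustrated with a minimal sketch. It assumes a hypothetical classification task in which a large set of items is split into independent work units and several contributors may label the same item. Combining labels by majority vote is an illustrative assumption, not a description of how Old Weather, eBird, or Galaxy Zoo actually integrate contributions, and all function names are hypothetical.

```python
from collections import Counter


def split_into_modules(items, module_size):
    """Divide a large task into independent work units (hypothetical helper).

    Each unit holds at most `module_size` items and can be handed to a
    contributor without reference to any other unit.
    """
    return [items[i:i + module_size] for i in range(0, len(items), module_size)]


def integrate(labels_by_item):
    """Merge independent contributions into one dataset.

    `labels_by_item` maps an item id to the list of labels submitted for it.
    The most frequent label is kept for each item (assumed integration rule).
    """
    return {
        item: Counter(labels).most_common(1)[0][0]
        for item, labels in labels_by_item.items()
    }


# Seven items split into work units of up to three items each.
modules = split_into_modules(list(range(7)), 3)

# Labels submitted independently by several contributors.
merged = integrate({
    "obj1": ["spiral", "spiral", "elliptical"],
    "obj2": ["elliptical"],
})
```

Because neither function needs information about other work units or other contributors, the number of participants can grow without added coordination. A collaborative task such as Polymath has no comparable mechanical integration step, which is the contrast drawn in the paragraph above.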
5.1.3. Project leadership
Most crowd science projects require a significant amount of project leadership. Depending on the nature of the problem, leaders are fundamental in framing the scientific experiment, modularizing the task, securing access to financial and technical resources, or making decisions regarding how to proceed at critical junctures of a project (Mateos-Garcia and Steinmueller, 2008; Wiggins and Crowston, 2011).
Open source software projects such as Linux illustrate that leadership in the form of architecture and kernel design can be performed by a relatively small and tightly knit group of top-notch programmers, while a large number of people at all levels of skills can execute smaller and well-defined modules (Shah, 2006). Emerging crowd science projects such as Galaxy Zoo or Foldit show a similar pattern: These projects are typically led by well-trained scientists who formulate important research questions and design