Community Online Resource for Statistical
Seismicity Analysis
J. Douglas Zechar,1,2 Jeanne L. Hardebeck,3 Andrew J. Michael,3 Mark Naylor,4 Sandy Steacy,5 Stefan Wiemer,1 Jiancang Zhuang,6 and the CORSSA Working Group7
INTRODUCTION
Statistical analysis of seismicity is critical for understanding earthquake observations, testing proposed prediction and forecast methods, and assessing seismic hazard. Unfortunately, despite its importance to seismology, especially to those studies that potentially impact public policy, statistical seismology is mostly ignored in the education of seismologists, and there has been no central repository for relevant software. To remedy these deficiencies, and with the broader goal of enhancing the quality of statistical seismology research, we have begun building the Community Online Resource for Statistical Seismicity Analysis (CORSSA). CORSSA is an educational platform that is designed to be authoritative, up-to-date, prominent, and useful. We anticipate an audience that ranges from beginning graduate students to experienced researchers.
Every co-author of this article has served as a referee for at least one seismology manuscript in which the author(s) made a questionable or incorrect application or interpretation of statistics. We suspect that most readers have had a similar experience. This is not a matter of stupidity (not even the important kind championed by Schwartz, 2008) but rather a lack of understanding and/or awareness of sometimes sophisticated mathematical concepts and how they should be applied to uncertain data. We seek to fill this gap in knowledge, understanding, and application, and to promote excellence in statistical seismology, by providing the information and resources necessary to understand and implement the best practices, with the hope that readers will apply these methods to their own research.
Given that seismology is a field of applied physics, it is reasonable that, starting with only Hooke’s law, students are taught to derive the wave equation, Snell’s law, reflection/refraction coefficients, and the behavior of surface waves. But seismology is also increasingly becoming a field of applied statistics, and few seismology students are taught even the most basic statistical methods, let alone the underlying theory. For instance, while most seismology texts mention the Gutenberg-Richter magnitude distribution, few include Aki’s (1965) demonstration and Weichert’s (1980) additional treatment suggesting that one should use maximum likelihood to estimate the model parameters (the a- and b-values). Such disregard for statistics in seismology texts might be explained by the fact that seismology evolved from physics and had an early emphasis on theoretical understanding. Relying on supplementary statistical courses is an imperfect solution for seismology students: Even with some basic training, it is rarely simple to apply textbook statistical procedures to problems of seismicity, where clustering undermines the common assumption of independent data, and issues of data quality related to seismic networks are unique.
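To make the maximum-likelihood point concrete, the following is a minimal sketch of the b-value estimate for the Gutenberg-Richter relation log N = a − bM, assuming continuous (unbinned) magnitudes at or above a known minimum magnitude. The function name, the synthetic catalog, and the first-order standard error b/√N are illustrative assumptions and are not taken from CORSSA material.

```python
import math
import random


def aki_b_value(magnitudes, m_min):
    """Maximum-likelihood b-value for magnitudes >= m_min.

    Assumes continuous (unbinned) magnitudes that follow the
    Gutenberg-Richter relation above m_min. Returns the estimate and a
    first-order standard error, b / sqrt(N).
    """
    sample = [m for m in magnitudes if m >= m_min]
    if len(sample) < 2:
        raise ValueError("need at least two events at or above m_min")
    mean_excess = sum(sample) / len(sample) - m_min
    if mean_excess <= 0:
        raise ValueError("mean magnitude must exceed m_min")
    b = math.log10(math.e) / mean_excess
    return b, b / math.sqrt(len(sample))


# Synthetic catalog (hypothetical data): magnitudes above 2.0 drawn from an
# exponential distribution, which corresponds to a true b-value of 1.0.
rng = random.Random(42)
true_b = 1.0
catalog = [2.0 + rng.expovariate(true_b * math.log(10)) for _ in range(5000)]

b_hat, b_err = aki_b_value(catalog, m_min=2.0)
print(f"b = {b_hat:.3f} +/- {b_err:.3f}")
```

Real catalogs usually report binned magnitudes and are incomplete at small magnitudes, so this sketch illustrates the estimator only and is not a recipe for catalog analysis.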
Because statistics is so little emphasized in seismology texts, the audience that stands to benefit from CORSSA is quite varied. CORSSA material can serve undergraduate students as a starting point to understand the issues, and it should serve graduate students as a resource for their own research. Moreover, it should serve experienced researchers from outside the statistical seismology community, and even researchers within that group, as a point of reference to enhance the quality of their work. To serve this diverse audience, CORSSA covers a wide variety of material, which we categorize using the following seven themes:
I. Introductory material
II. Basic features of seismicity
III. Statistical foundations
IV. Understanding seismicity catalogs
V. Models and techniques for analyzing seismicity
VI. Earthquake predictability and related hypothesis testing
VII. Data standards
The thematic structure was devised to make it easy for readers to focus on their personal requirements to get an introduction to statistical seismology (Theme I), or to learn about the basics of earthquakes (Theme II), statistics (Theme III), and/or the intricacies of seismicity catalogs (Theme IV) before moving on to applications found in Theme V and Theme VI. Theme VII provides information about data formats and standardized datasets that can be used for testing software.
Electronic Seismologist
1. Swiss Seismological Service, ETH Zurich, Zurich, Switzerland
2. Lamont-Doherty Earth Observatory, Columbia University, Palisades, New York, U.S.A.
3. U.S. Geological Survey, Menlo Park, California, U.S.A.
4. School of Geosciences, University of Edinburgh, Edinburgh, Scotland
5. School of Environmental Sciences, University of Ulster, Northern Ireland
6. Institute of Statistical Mathematics, Tokyo, Japan
7. See http://www.corssa.org/about/community
Each of these themes comprises a series of articles. Articles act as tutorials and rely on previously published, peer-reviewed literature. Each article deals with a specific task or topic and includes some subset of the following: discussion of why the topic is useful for research; a brief review of theory; a list of methods and software that address this topic; a discussion of trade-offs between analysis choices; pitfalls to be aware of; example results; examples of applications in scientific journals; recommendations for further reading; and next steps for the reader to take.
FEATURES
CORSSA is a collection of review articles related to statistical seismicity analysis, organized by a few thematic elements, and supplemented by software packages, data, a glossary, news items, and discussion forums. To more fully understand this project, it is useful to compare it with three common contemporary research outlets: peer-reviewed journals, textbooks, and wikis.
As with textbooks and unlike wikis and regular issues of journals, a comprehensive design guides CORSSA development. But unlike a book and similar to a wiki, individual CORSSA articles and other content are made available immediately once they are deemed ready, rather than waiting for everything to be completed. Moreover, given that technology now allows a more accurate representation of the ever-evolving, incremental nature of scientific advancement, the concept of a final state of knowledge is obsolete. In short, CORSSA articles and content can be revised and updated, and version information can be included when appropriate. Also like a wiki, large datasets can be curated and presented in the context of an article and as standalone resources. Because CORSSA is primarily an educational resource, its articles will not contain new interpretive science; on the contrary, and given that content can be updated, CORSSA will feature “living” review articles.
We believe that identifying authors and using an optionally anonymous peer-review system provides an authority that is sometimes missing in anonymous wiki entries (and Web pages in general). Therefore, as with journals, CORSSA authors are clearly identified, and articles are peer-reviewed and subject to editorial approval. By identifying authors, we also acknowledge their efforts, which are crucial to CORSSA’s existence. CORSSA articles can be cited in much the same way as traditional peer-reviewed journal articles. Although we categorize the articles by theme, the relatively small number of articles allows us a simple citation scheme without specifying a volume or issue number: Articles are cited by author(s), year, and a unique digital object identifier (DOI). CORSSA is not a traditional journal, so its articles are not indexed in the Web of Science databases. But because these articles have DOIs, the Web of Science Cited Reference Search and other tools such as Google Scholar can track citations.
Recognizing that the portable document format (PDF) is the current standard for research articles, we present articles as PDF files. Nevertheless, readers can search the text of all articles directly via the Web interface, rather than having to open each article file. The PDF also allows authors to easily include long equations and in-line vector graphics, an advantage over most Web-based content, which tends to present equations and figures as low-resolution images. Authors can also provide standalone, high-quality graphics that are appropriate for presentations. Because these articles are more educative than most research articles, we anticipate that authors will include illustrative examples and accompanying code. To accommodate this need, the CORSSA system allows authors to link an article with software, data, and accompanying explanatory text (Figure 1).
Like most textbooks and wikis, CORSSA maintains a glossary of relevant terms. If one of these terms is used in a CORSSA article, its first occurrence within the article is linked directly to the definition in the glossary (similar to the electronic version of the New York Times). The glossary includes community-developed definitions that are specific to statistical analysis of seismicity, but the definitions are general enough to be shared by multiple articles, much like a wiki.
Perhaps the most important feature for readers, and unlike most textbooks and journals, is that CORSSA content is free to all.
BUILDING CORSSA
In May 2010, 24 scientists from 11 nations attended a workshop in Zürich, Switzerland, to flesh out a plan for CORSSA and begin drafting an initial set of articles and accompanying material (Figure 2). This was a workshop in the literal sense: The majority of the time was dedicated to working in small groups, designing the contents of each thematic section. During the workshop, the authors of this article formed a CORSSA executive committee; by volunteering for this committee, we pledged our commitment to implement and publicize CORSSA, including sharing editorial and administrative responsibilities.
After the conclusion of the Zürich workshop, CORSSA participants continued drafting articles; soon thereafter, two article templates were designed and distributed: one for authors who prefer Microsoft Word and another for authors who prefer LaTeX. These templates provide a consistent look for each article with minimal typesetting.
The workshop participants agreed to use the Silva content management system for the CORSSA Web presence. Silva allows us to quickly add and edit all CORSSA content without requiring detailed knowledge of Web development technology. Silva has an open source license and was a natural choice because we rely on the technical support of the Swiss Seismological Service IT group, which was already familiar with using and supporting Silva.
We worked with colleagues at the ETH-Bibliothek (http://www.doi.ethz.ch) to obtain DOIs for CORSSA content. Serendipitously, we discovered that ETH-Bibliothek is a member of the DataCite consortium (http://datacite.org), which is one of only seven DOI registration agencies worldwide. This drastically reduced the administrative overhead and cost for DOIs. We reserved the prefix doi:10.5078/corssa, to which we append a unique eight-digit number for each article. We note that we could also register DOIs for CORSSA datasets, software, and other content in the same way, but we have not yet chosen to do so.
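As an illustration of this identifier scheme, the following minimal sketch checks and builds CORSSA-style DOI strings. The hyphen between the prefix and the eight-digit number is inferred from the DOIs in the reference list of this article (e.g., doi:10.5078/corssa-93722864); the function names are hypothetical and are not part of any CORSSA or DataCite software.

```python
import re

# Pattern inferred from the DOIs listed in the references:
# reserved prefix, a hyphen, then exactly eight digits.
CORSSA_DOI = re.compile(r"doi:10\.5078/corssa-(\d{8})")


def parse_corssa_doi(doi):
    """Return the eight-digit article number, or None if the string is malformed."""
    match = CORSSA_DOI.fullmatch(doi.strip())
    return match.group(1) if match else None


def format_corssa_doi(article_number):
    """Build a DOI string from an article number, keeping leading zeros."""
    digits = str(article_number).zfill(8)
    if not (digits.isdigit() and len(digits) == 8):
        raise ValueError("article number must fit in eight digits")
    return "doi:10.5078/corssa-" + digits


print(parse_corssa_doi("doi:10.5078/corssa-93722864"))
print(format_corssa_doi(180805))
```

Treating the article number as a zero-padded string rather than an integer matters here, because some of the listed DOIs begin with zeros.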
With an initial set of DOI-registered articles and accompanying content, the CORSSA Web presence was officially publicized to attendees of the European Seismological Commission in Montpellier, France, in September 2010.
CURRENT STATE
In this section, we describe CORSSA as it existed at the time of this writing; because it is a living resource, we don’t expect that the description will remain exactly accurate in the future, but this section provides the reader with an informative snapshot. We encourage the reader to visit http://www.corssa.org for current information.
At the time of this writing, CORSSA includes seven published articles across five themes. In Theme I, introductory material, Michael and Wiemer (2010) described the motivation for, and some historical development related to, the CORSSA project. Vere-Jones (2010) adapted his keynote presentation from the 2007 International Statistical Seismology (StatSei) conference, suggesting how statistical tools can aid seismicity analyses and how students of seismology can obtain effective statistical training. As part of Theme III, statistical foundations, Naylor et al. (2010) mentioned some of the difficulties that a new researcher may face when attempting exploratory data analysis with earthquake catalogs, and they provided several practical exercises and code snippets. Husen and Hardebeck (2010) contributed a review of an important topic that many researchers neglect (accuracy and precision of earthquake locations) to Theme IV, understanding seismicity catalogs. They outlined in clear language how events are usually located or relocated; they also reported typical assumptions and highlighted the coupled nature of seismic velocity models and earthquake location estimates. For Theme V, models and techniques for analyzing seismicity, Hainzl et al. (2010) reviewed work related to spatiotemporal seismicity models based on rate-and-state friction and Coulomb stress transfer; they supplemented a brief theoretical treatment with discussion of numerical algorithms for parameter value estimation. Marsan and Wyss (2011) described the challenges in robustly identifying and understanding seismicity rate changes. In Theme VI, earthquake predictability and related hypothesis testing, Zechar (2010) compared various methods for evaluating earthquake predictions and earthquake forecasts, noting advantages and disadvantages for each strategy. He also contributed several software implementations and practical applications to accompany the article.
An additional six articles exist in various states of draft. Gulia et al. (under review) discuss methods for investigating the quality of a seismic catalog, including techniques for deblasting, or identifying non-tectonic events. Mignan and Woessner (under review) comprehensively review methods used to estimate catalog completeness, that is, the magnitude level above which all earthquakes are believed to be reliably reported. Woessner et al. (under review) tell the story of how a seismicity catalog is generated and maintained. Zhuang et al. (forthcoming) provide a broad overview of several statistical models used to describe seismicity distributions. Iwata (under review) discusses earthquake triggering caused by forces other than tectonic loading, e.g., tides and passing seismic waves. The important issue of declustering (identifying and removing aftershock sequences from catalogs) is reviewed by van Stiphout et al. (under review).
Moreover, several articles that were planned during the initial CORSSA workshop have not yet been drafted. The complete list of envisioned articles is maintained at http://www.corssa.org/articles/draft_toc.pdf and, while this list associates potential authors with potential articles, we would happily consider additional volunteer authors or article suggestions (e-mail [email protected]).
The CORSSA glossary contains 69 terms. The list of terms was originally compiled during the organizational meeting in Zürich, primarily by eavesdropping on the discussions of the individual working groups; this list was then augmented by combing through the submitted articles for frequently used terms. The linking between articles and the glossary is automated with a few simple scripts: one implemented as a macro for articles drafted using the Word template, and another implemented as a Java function that operates on LaTeX articles. Although the glossary is closely linked to the articles, it can also be used as a standalone resource for readers seeking seismicity-related definitions. The purpose of the CORSSA glossary is to provide a concise, contextual definition of each term, but we also provide links to more comprehensive treatments; for example, pointing to relevant research articles, U.S. Geological Survey Web pages for earthquake-specific terms, and Wikipedia for statistical terms. We also link some glossary terms to other glossary terms with which they can be contrasted; for example, “moment magnitude” is in this way linked with “local magnitude.”
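The kind of automated linking described above, in which the first occurrence of each glossary term in an article points to its definition, can be sketched as follows. This is not the CORSSA script (which, as noted, is a Word macro and a Java function); it is an illustrative Python sketch that assumes LaTeX source using the hyperref \href command, a hypothetical glossary URL, and a hypothetical anchor naming convention.

```python
import re

# Hypothetical base URL; the real glossary address and anchor scheme are not
# given in this article.
GLOSSARY_URL = "https://example.org/glossary"


def link_first_occurrences(text, terms, base_url=GLOSSARY_URL):
    """Wrap the first occurrence of each glossary term in a LaTeX \\href."""
    # Longest terms first, so "moment magnitude" wins over "magnitude".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    seen = set()

    def replace(match):
        key = match.group(0).lower()
        if key in seen:
            return match.group(0)  # later occurrences stay unlinked
        seen.add(key)
        anchor = key.replace(" ", "-")
        return "\\href{%s#%s}{%s}" % (base_url, anchor, match.group(0))

    # A single pass over the text, so inserted links are never re-scanned.
    return pattern.sub(replace, text)


sample = (
    "The moment magnitude differs from local magnitude. "
    "Moment magnitude saturates less; magnitude matters."
)
linked = link_first_occurrences(
    sample, ["moment magnitude", "local magnitude", "magnitude"]
)
print(linked)
```

Scanning the text once with a single combined pattern avoids a subtle problem with term-by-term substitution: a short term such as "magnitude" could otherwise match inside a link that was just inserted for a longer term.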
All contributors to CORSSA are acknowledged on the Web presence; the list of contributors includes workshop attendees, article authors, referees, and individuals who shared software.
Because we want CORSSA articles to be as useful and accurate as is practical, we host forums that allow open communication between the authors of each article and the readers; these forums also allow communication among readers. At the time of this writing, these forums have been little used, which may indicate that the forums are too new, that our reader community is too small to merit this functionality, or perhaps that readers are not accustomed to this type of interaction. We note that journals such as Nature and Nature Geoscience allow similar functionality in the form of online comments, and these too are often unused.
As a service to readers, we maintain a minimal news section that includes a list of recent and upcoming relevant meetings and a growing list of relevant journal articles. We suspect that these manually curated lists will serve as a convenient central point to access the latest information related to statistical seismicity research advances. Such a resource is increasingly useful; as David Foster Wallace pointed out in 1996, as the amount of information that we daily receive continues to grow, we need some method for filtering what is important (Lipsky 2010, 38). We intend for this news section to be sparingly used to announce calls for papers for special issues of journals or conference sessions.
▲ Figure 2. Photographs from initial CORSSA workshop in Zürich, during which participants organized themes, began drafting arti
We have received anecdotal positive feedback regarding CORSSA articles. The content has been effective for introducing new and continuing graduate students to complex topics, and article reprints that we brought to conferences have been very popular souvenirs.
CONCLUSIONS
Less than one year after work began on CORSSA, we have made tremendous progress in building a resource that we believe will educate students and researchers.
When designing CORSSA, we made a deliberate decision to limit the initial scope of the project; as none of the participants had done something quite like this before, and because we were mostly reliant on volunteer efforts, we resisted the temptation to make an excessively ambitious plan. It was primarily for this reason that we chose to emphasize statistical seismicity analysis rather than the broader field of statistical seismology. Moreover, seismicity analysis has tended to dominate the recent StatSei meetings (e.g., Schorlemmer and Jackson 2009).
Nevertheless, nothing about the design of CORSSA precludes us from expanding to cover other topics within statistical seismology. As research interests evolve, so too can CORSSA, provided that a sufficiently energetic community persists. We suspect that many other subfields would benefit from a resource similar to what we have designed and implemented, and because so many of the features of CORSSA are not knowledge domain specific, we hope that it can serve as a blueprint for others.
ACKNOWLEDGMENTS
Portions of this article appear in slightly different form in the CORSSA article by Michael and Wiemer (2010). We thank an anonymous referee for many insightful comments and useful suggestions. We thank Benno Luthiger and Philipp Kästli for general technical assistance. We thank Angela Gastl and Francesco Croci for assistance with DOIs. We thank the following organizations for supporting CORSSA: Network of Research Infrastructures for European Seismology (NERIES), the Swiss Seismological Service, Southern California Earthquake Center, and the U.S. Geological Survey. JDZ was partially supported by NSF grant EAR-0944202. We especially thank Mietta Petronio for her patient and energetic support of this work.
REFERENCES
Aki, K. (1965). Maximum-likelihood estimate of b in the formula log N = a − bM and its confidence limits. Bulletin of the Earthquake Research Institute 45, 237–239.
Gulia, L., S. Wiemer, and M. Wyss (under review). Catalog artifacts and quality control. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-93722864.
Hainzl, S., S. Steacy, and D. Marsan (2010). Seismicity models based on Coulomb stress calculations. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-32035809.
Husen, S., and J. L. Hardebeck (2010). Earthquake location accuracy. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-55815573.
Iwata, T. (under review). Earthquake triggering caused by the external oscillation of stress/strain changes. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-65828518.
Lipsky, D. (2010). Although of Course You End Up Becoming Yourself: A Road Trip with David Foster Wallace. New York: Broadway, 352 pp.
Marsan, D., and M. Wyss (2011). Seismicity rate changes. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-25837590.
Michael, A. J., and S. Wiemer (2010). CORSSA: The Community Online Resource for Statistical Seismicity Analysis. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-39071657.
Mignan, A., and J. Woessner (under review). Completeness magnitude in earthquake catalogs. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-00180805.
Naylor, M., K. Orfanogiannaki, and D. Harte (2010). Exploratory data analysis: Magnitude, space, and time. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-92330203.
Schorlemmer, D., and D. D. Jackson (2009). Seismologists and statisticians establish new research targets. Eos, Transactions, American Geophysical Union 90 (43); doi:10.1029/2009EO430008.
Schwartz, M. A. (2008). The importance of stupidity in scientific research. Journal of Cell Science 121, 1,771.
van Stiphout, T., J. Zhuang, and D. Marsan (under review). Seismicity declustering. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-52382934.
Vere-Jones, D. (2010). How to educate yourself as a statistical seismologist. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-17728079.
Weichert, D. H. (1980). Estimation of the earthquake recurrence parameters for unequal observation periods for different magnitudes. Bulletin of the Seismological Society of America 70 (4), 1,337–1,346.
Woessner, J., J. L. Hardebeck, and E. Hauksson (under review). What is an instrumental seismicity catalog? Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-38784307.
Zechar, J. D. (2010). Evaluating earthquake predictions and earthquake forecasts: A guide for students and new researchers. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-77337879.
Zhuang, J., M. J. Werner, D. Harte, S. Hainzl, and S. Zhou (forthcoming). Basic models of seismicity. Community Online Resource for Statistical Seismicity Analysis; doi:10.5078/corssa-47845067.
Swiss Seismological Service, ETH Zurich
NO H3
Sonneggstrasse 5
8092 Zurich, Switzerland
[email protected] (J. D. Z.)