Big Data has been called the ‘fourth paradigm’ of scientific research, making it impossible to overlook the symmetry with computer simulations. For this reason, in this chapter I will draw out some similarities and differences between the two, in the hope of understanding their scope and limits.
6.1 The new paradigms
The designation of computer simulations as a third paradigm of research originates with the advocates of Big Data studies. In fact, proponents of computer simulations have never referred to them in this way, despite strong arguments in favor of their scientific and philosophical novelty.
Conceiving of computer simulations as a paradigm of research – regardless of their ordinal position – adds some pressure to the expectations that researchers and the general public have of them. Physics is conceived as a paradigm for all naturally occurring phenomena because it provides insight into such phenomena. Electromagnetic theory, for instance, is capable of explaining all electric and magnetic phenomena, and general relativity generalizes special relativity and Newton’s law of universal gravitation, providing a unified description of gravity as a geometric property of space and time. Calling computer simulations and Big Data paradigms, one might think, implies that they have an epistemic import similar to that of physics, both in terms of the insight they provide and in their role as an epistemic authority. It is therefore important to briefly discuss the extent of such a (meritorious) title.
Jim Gray suggested calling computer simulations and Big Data new paradigms of research¹ in a talk given to the National Research Council and to the Computer Science and Telecommunications Board in Mountain View, CA in January 2007. During his talk, Gray connected computer simulations and Big Data with a chief concern in contemporary research, namely, that scientific and engineering practice is being affected by a ‘data deluge.’ The metaphor was intended to call attention to genuine concerns that many researchers have regarding the large amount of data stored, rendered, gathered, and manipulated by science and engineering practitioners. To give a simple example, the Australian Square Kilometer Array Pathfinder (ASKAP) consists of 36 antennas, each 12 meters in diameter, spread out over 4,000 square meters, and working together as a single instrument rendering as much as 700 TB/second of data.² Thus understood, the data deluge stems from an excess in the production and collection of data that no single team of researchers can process, select, and understand without further aid from a computational system. This is the principal reason why Big Data requires special algorithms that help to sort out which data are important and which are not (e.g., noise, redundancies, incomplete data, etc.).
¹ For the most part, I am going to ignore the ordinal position of the paradigm. In this respect, I will leave unanswered the question of whether there is some presupposed hierarchy among the paradigms. As we saw in chapter 3, the positivist would hold that experimentation plays a secondary, confirmatory role with respect to theory, which is more important. After the flaws of positivism were exposed, a new experimentalist wave swept through the literature, showing the vast universe of experimentation and its philosophical importance. Should advocates of computer simulations and Big Data claim that there is a new, improved way of doing science and engineering, and that theory and experimentation are reserved only a minor role, they would be walking down the same dangerous road as the positivist.
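To make the idea of sorting important data from unimportant data slightly more concrete, the following is a minimal sketch in Python. It is an illustration only, under assumptions of my own: the record layout, the function name `curate`, and the fixed `max_deviation` threshold are all hypothetical, and real Big Data pipelines rely on far more sophisticated statistical and distributed methods than this toy filter.

```python
from statistics import median


def curate(readings, max_deviation=5.0):
    """Keep readings that are complete, not repeated, and not obviously noisy.

    `readings` is a list of dicts with the (hypothetical) keys
    "source", "time", and "value".
    """
    # 1. Incomplete data: drop records with a missing value.
    complete = [r for r in readings if r["value"] is not None]

    # 2. Redundancies: keep only the first record per (source, time) pair.
    seen = set()
    unique = []
    for r in complete:
        key = (r["source"], r["time"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Noise: drop values lying further than max_deviation from the median.
    #    (A fixed threshold is an assumption made for illustration.)
    if not unique:
        return []
    centre = median(r["value"] for r in unique)
    return [r for r in unique if abs(r["value"] - centre) <= max_deviation]


readings = [
    {"source": "antenna-1", "time": 0, "value": 10.2},
    {"source": "antenna-1", "time": 0, "value": 10.2},   # redundant copy
    {"source": "antenna-2", "time": 0, "value": None},   # incomplete
    {"source": "antenna-3", "time": 0, "value": 250.0},  # noise spike
    {"source": "antenna-4", "time": 0, "value": 9.8},
]
kept = curate(readings)
print([r["source"] for r in kept])
```

Even in this toy case, what counts as ‘noise’ depends on a threshold chosen by the researcher, which suggests that the selection of important data is not free of prior assumptions.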
In this context, it is important to elucidate the notion of ‘paradigm’ as used by Gray and his supporters because, so far, no definitions are available. In particular, this is a term that cannot be taken lightly, especially if it has implications for the status – cultural, epistemological, social – of a discipline and for the way the public will accept it. In philosophy, ‘paradigm’ is a theoretical term that comes with specific assumptions. Clarifying the meaning of ‘paradigm’ in the context of computer simulations and Big Data is, then, our first task.
Before we begin, one caveat must be acknowledged. Earlier, we discussed hybrid scenarios where computer simulations integrate with laboratory experimentation. In Big Data studies we face a similar situation. The Large Hadron Collider (LHC) is a good example of such an integration, as it exquisitely combines cutting-edge science and technology with large amounts of data including, of course, experimentation, theory, interdisciplinary work, and computer simulations. Many simulations at the LHC are meant to optimize the computing resources needed to model the complexity of detectors and sensors, as well as the physics (Rimoldi 2011). Others, like state-of-the-art Monte Carlo simulations, calculate the Standard Model Higgs boson signal and any relevant background processes. These simulations are used to optimize the selection of events, to evaluate their acceptance, and to assess systematic uncertainties (Chatrchyan et al. 2014). They produce large amounts of data that eventually need to be carefully curated, selected, and classified. Thus understood, computer simulations and Big Data are deeply interwoven in the process of rendering data, classifying it, and using it, among other activities. In this book, I have intentionally avoided discussing hybrid scenarios such as the one the LHC exemplifies. This decision is based on a very simple reason: before we are able to grasp computer simulations as hybrid systems, it is important to first understand them individually. Whereas hybrid scenarios provide a richer view, they also obscure important aspects of the epistemological and methodological analysis. In this respect, when discussing the third and fourth paradigms of research, I will be addressing computer simulations and Big Data in a non-hybrid scenario.
Thomas Kuhn was the first philosopher to analyze in depth the idea of a ‘paradigm’ in scientific research. When discussing how paradigms function in science, Kuhn notes that “one of the things a scientific community acquires with a paradigm is a criterion for choosing problems that, while the paradigm is taken for granted, can be assumed to have a solution. To a great extent these are the only problems that the community will admit as scientific or encourage its members to undertake” (Kuhn 1962, 37). Calling computer simulations and Big Data a paradigm of scientific research has specific implications for the way researchers carry out their practice, the problems worth solving, and the right methods to pursue such solutions.
² Although there are several projects in science and engineering relying on Big Data, its presence
What is, then, a ‘paradigm’? According to Kuhn, any mature science (e.g., physics, chemistry, astronomy, etc.) experiences alternating phases of normal science (e.g., Newtonian mechanics) and revolutions (e.g., Einsteinian relativity). During periods of normal science, a constellation of commitments is fixed, including theories, instruments, values, and assumptions. These make up a ‘paradigm,’ that is, the consensus on what constitutes exemplary instances of good scientific research. The function of a paradigm, then, is to supply puzzles for scientists to solve and to provide the tools for their solution (Bird 2013). As examples, Kuhn cites the chemical balance found in the Traité élémentaire de chimie by Antoine Lavoisier, the mathematization of the electromagnetic field by James Clerk Maxwell, and the invention of calculus in Principia Mathematica by Isaac Newton (Kuhn 1962, 23). Each one of these books contains not only the key theories, laws, and principles of Nature, but also – and this is what makes them paradigms – guides on how to apply those theories to the solution of important problems (Bird 2013). Furthermore, they also provide new experimental and mathematical techniques for the solution of such problems. Examples of this sort have already been mentioned: the chemical balance for the Traité élémentaire de chimie and the calculus for the Principia Mathematica. A crisis in science arises when the scientists’ confidence in a paradigm is lost due to its inability or failure to solve a particularly worrying puzzle. These are the ‘anomalies’ that emerge in times of normal science. Such crises are typically followed by a scientific revolution, in which the existing paradigm is superseded by a rival. During a scientific revolution, the disciplinary matrix (i.e., the constellation of shared commitments) undergoes revision, sometimes even shaking to the core the corpus of beliefs and the world view.
Such revolutions typically emerge from the need to find new solutions to anomalies and disturbing new phenomena that coexisted with theories in periods of normal science. The classical example is the precession of the perihelion of Mercury, which served as confirming evidence for general relativity over Newtonian mechanics.³ Revolutions, however, do not necessarily impede scientific progress, mainly because the new paradigm must retain at least some core aspects of its predecessor, especially the power to solve quantitative problems (Kuhn 1962, 160ff.). It is possible, however, that the new paradigm may lose some qualitative and explanatory power (Kuhn 1970, 20). In any case, we can say that in periods of revolution, there is an overall increase in puzzle-solving power, in the number and significance of the puzzles, and in the anomalies solved by the revised paradigm (Bird 2013).
A paradigm, then, informs scientists about the scope and limits of their scientific domain, while guaranteeing that all legitimate problems can be solved within its own terms. Thus understood, it seems that neither computer simulations nor Big Data fit this description. For starters, they are methods that implement theories and models, not theories in themselves, and therefore they are not suited to promote a scientific crisis. Could they yield a theory that questions our basic understanding of, say, biology? Probably, but in this case they would not have a different status than any of the experiments used for debunking theories about the spontaneous generation of complex life from inanimate matter.⁴ One could of course speculate that computer simulations and Big Data might become, in and by themselves, theories of some sort – or means for a theory. It is true that some philosophers have declared the ‘end of theory’ brought about by Big Data, but there is little evidence that research practice is actually heading in that direction. Moreover, a change of paradigm requires the new paradigm to retain the explanatory and predictive force of the old one, while also taking care of the anomalies that led to the crisis in the first place. Big Data, as we will see in this chapter, not only shows little concern for ‘accumulating’ from previous paradigms, but openly rejects many of their triumphs. Most evident is the rejection of the need for explanations of phenomena of any kind. As many advocates of Big Data admit, with Big Data it is not possible to explain why real-world phenomena happen, but only to show that they happen.
³ The precession of the perihelion of Mercury was explained by general relativity in 1915 – with successive and more accurate measurements starting in 1959 – although it was an ‘anomalous’ phenomenon already known back in 1859.
What, then, did Gray have in mind when he called computer simulations and Big Data the third and fourth paradigms, respectively? Let us begin by noting his division of scientific paradigms into four historical moments, namely,
1. Thousand of years ago, science was empirical describing natural phenomena;
2. Last few hundred years theoretical branch using models, generalizations;
3. Last few years a computational branch simulating complex phenomena;
4. Today: data exploration eScience unify theory, experiment, and simulation
   – data captured by instruments or generated by simulator,
   – processed by hardware,
   – information/knowledge stored in computer,
   – Scientists analyses database/files using data management and statistics.
Gray (2009, xviii)
Under this interpretation, a ‘paradigm’ is not so much a technical term in the sense given by Kuhn as it is the set of coherent research practices – including methods, assumptions, and terminology – that a community of scientists and engineers share among themselves. Such research practices neither require a scientific revolution nor promote one. In fact, since computer simulations and Big Data make use of current scientific and engineering standards, theories, and the like, they seem to be already embedded in a paradigm. The hallmark of computer simulations and Big Data, however, is the mechanized and automated handling of data by computers, which is clearly lacking in the first two paradigms. This means that the methods used and offered in the third and fourth paradigms are significantly different from experimenting with phenomena and theorizing about the world. I will refer to computer simulations and Big Data as ‘technological paradigms’ in an attempt to put some distance from the philosophical interpretation of ‘paradigm’ presented by Kuhn. Let us now see if we can make sense of these new technological paradigms.
⁴ Louis Pasteur showed that the apparent spontaneous generation of microorganisms was actually