There are several tools specialized for the analysis of time series. R is statistics software with extensive features for analyzing time series data. In the Open Source community, there are two popular tools: opentsdbr [10] and the StatsD OpenTSDB publisher backend [11]. Both tools use the OpenTSDB HTTP/JSON API to query data from OpenTSDB. This API is only useful for small-scale analysis: its non-distributed implementation creates performance bottlenecks for real-world applications, it requires a large amount of memory to store time-series data on the client side, and it is time consuming because the data must be transferred over the network interface. For visual analysis, both systems use third-party R packages to display high-dimensional time series data. Some of the most common time-series analysis tools are GRETL (GNU Regression, Econometrics and Time-series Library) [12], TimeSearcher [13], Calendar-Based Visualisation [14] and Spiral [15], but they are not specialised for real-world time-series analysis. These tools are not designed for a distributed programming model: they work on a single node, so tasks are not distributed. Statistical analysis of massive amounts of data with these tools can therefore take days.
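For context, a query against the OpenTSDB HTTP/JSON API is a single POST of a JSON body to the `/api/query` endpoint of a TSD; everything after that (decoding, joining, analysis) happens in client memory, which is the bottleneck described above. A minimal sketch of building such a request body in Python (the metric name, tag filter, and time range are illustrative placeholders, not taken from the tools cited):

```python
import json

def build_opentsdb_query(metric, start="24h-ago", aggregator="avg", tags=None):
    """Build a request body for OpenTSDB's HTTP /api/query endpoint.

    A client would POST this JSON to http://<tsd-host>:4242/api/query and
    receive every matching data point back in one response.
    """
    return {
        "start": start,
        "queries": [{
            "aggregator": aggregator,
            "metric": metric,
            "tags": tags or {},
        }],
    }

# Illustrative metric/tags; any real deployment would use its own names.
body = build_opentsdb_query("sys.cpu.user", tags={"host": "*"})
print(json.dumps(body))
```

Because the whole result set arrives as one JSON document, both the transfer time and the client-side memory footprint grow linearly with the amount of queried data, which is the scaling limit the paragraph above points out.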


The idea of smooth transition regression models is based on the observation that many economic variables are sluggish and will not move until some state variable exceeds a certain threshold. For example, price arbitrage in markets will only set in once the expected profit of a trade exceeds the transaction cost. This observation has led to the development of models with fixed thresholds that depend on some observable state variable. Smooth transition models allow for the possibility that this transition occurs not all of a sudden at a fixed threshold, but gradually, as one would expect in time series data that have been aggregated across many market participants. A simple example is the smooth-transition AR(1) model:
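The model equation itself does not appear in this excerpt; a standard logistic smooth-transition AR(1), written here as an assumption about the intended form, is:

```latex
y_t \;=\; \phi_0 + \phi_1 y_{t-1}
        + \bigl(\theta_0 + \theta_1 y_{t-1}\bigr)\, G\!\left(y_{t-d};\,\gamma, c\right)
        + \varepsilon_t,
\qquad
G(z;\gamma,c) \;=\; \frac{1}{1 + \exp\!\bigl(-\gamma (z - c)\bigr)} .
```

As γ → ∞ the transition function G approaches a step function at the threshold c, recovering the fixed-threshold model; a small γ yields the gradual transition described above.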


where d < 1/2 and d ≠ 0. Such ‘long memory’ models may be estimated by the two-step procedure of Geweke and Porter-Hudak (1983) or by maximum likelihood (Sowell, 1992; Baillie, Bollerslev, and Mikkelsen, 1996). A detailed discussion including extensions to the notion of fractional co-integration is provided by Baillie (1996). Long memory may arise, for example, from infrequent stochastic regime changes (Diebold and Inoue, 2001) or from the aggregation of economic data (Granger, 1980; Chambers, 1998). Perhaps the most successful application of long-memory processes in economics has been work on modeling the volatility of asset prices and powers of asset returns, yielding new insights into the behavior of markets and the pricing of financial risk.
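As a rough illustration of the two-step Geweke and Porter-Hudak procedure, the sketch below simulates a fractionally integrated series with d = 0.3 and recovers d by regressing the log periodogram on log(4 sin²(ω_j/2)) at low Fourier frequencies. The simulation parameters (series length, MA truncation lag, bandwidth m = √T) are our choices for illustration, not taken from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an ARFIMA(0, d, 0) series with d = 0.3 by truncating the
# infinite MA representation of (1 - L)^(-d): psi_j = psi_{j-1}*(j-1+d)/j.
d_true, n_lags, T = 0.3, 2000, 4096
psi = np.empty(n_lags)
psi[0] = 1.0
for j in range(1, n_lags):
    psi[j] = psi[j - 1] * (j - 1 + d_true) / j
eps = rng.standard_normal(T + n_lags)
x = np.convolve(eps, psi, mode="full")[n_lags : n_lags + T]

# GPH step: log-periodogram regression over the first m Fourier frequencies.
m = int(np.sqrt(T))
j = np.arange(1, m + 1)
w = 2 * np.pi * j / T
I = np.abs(np.fft.fft(x)[1 : m + 1]) ** 2 / (2 * np.pi * T)  # periodogram
X = np.log(4 * np.sin(w / 2) ** 2)
slope = np.polyfit(X, np.log(I), 1)[0]
d_hat = -slope                       # the slope estimates -d
print(f"GPH estimate of d: {d_hat:.3f}")  # should land near 0.3
```

The asymptotic standard error of the GPH estimator shrinks like 1/√m, so the estimate is only as good as the chosen bandwidth; that trade-off is a standard caveat of the two-step procedure.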


This study provides further evidence that exposure to tobacco smoke is an independent risk factor which increases the risk of IMD. Of the four countries studied, the most complete dataset for IMD, smoking, ILI and household crowding was obtained from Norway. Over a 34-year period, a 5.2–6.9% increase in IMD in children under 5 years of age was observed for every 1% rise in prevalence of smoking in adults aged between 25 and 49 years. Taken together with previous case–control studies showing smoking as a risk factor for contracting IMD, the reduction in smoking prevalence that has occurred in Norway during this period is likely to have made a significant contribution to the concurrent reduction in incidence of IMD. The proportion of IMD cases under 5 years of age in the total population that could be attributed to active smoking in Norway was found to be 11.4%, which is far lower than that estimated in other studies for young children [22, 37]. The lack of demonstrable associations between incidence of IMD and prevalence of smoking, after adjustment for the same confounding variables, in Denmark, Sweden and the Netherlands may in part be ascribed to the limited datasets available. The absence of statistically significant associations is hence difficult to interpret, although unadjusted analysis showed positive associations between IMD in children and older smokers in Sweden and the Netherlands. In contrast, negative associations were found for younger smokers in Sweden. These mixed patterns of associations may indicate that not all biologically relevant confounding factors were accounted for.


This chapter presents an introduction to the branch of statistics known as time series analysis. Often the data we collect in environmental studies are collected sequentially over time – this type of data is known as time series data. For instance, we may monitor wind speed or water temperatures at regularly spaced time intervals (e.g. every hour or once per day). Collecting data sequentially over time induces a correlation between measurements, because observations near each other in time tend to be more similar, and hence more strongly correlated, than observations made further apart in time. Often in data analysis we assume our observations are independent, but with time series data this assumption is often false, and we would like to account for this temporal correlation in our statistical analysis.
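The temporal correlation described above is easy to see numerically. In the sketch below, a hypothetical AR(1) process stands in for, say, hourly temperature readings: adjacent observations are strongly correlated, and randomly shuffling the very same values, which destroys the time ordering, removes that correlation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(1) series: each observation is strongly tied to the
# previous one, as happens with data collected sequentially over time.
T, phi = 2000, 0.8
x = np.empty(T)
x[0] = rng.standard_normal()
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.standard_normal()

def lag1_autocorr(y):
    """Sample lag-1 autocorrelation of a series."""
    y = y - y.mean()
    return np.dot(y[:-1], y[1:]) / np.dot(y, y)

print(lag1_autocorr(x))         # close to phi = 0.8
shuffled = rng.permutation(x)   # same values, temporal ordering destroyed
print(lag1_autocorr(shuffled))  # close to 0
```

Treating the original series as independent observations (as the shuffled version effectively is) would understate the uncertainty of any estimate computed from it, which is why time series methods model this correlation explicitly.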


This study will use quarterly time series data covering 2000 to 2018 for the aggregate-level data and quarterly data covering 2005 to 2018 for the commodity-level trade balance. This period was chosen because the study focuses on the period after the Asian financial crisis, which caused Indonesia’s exchange rate regime to shift to a floating regime. Further information regarding the data is described as follows:


The mainstream is not without its problems, however. Kreps (ibid) notes the difficulties of game theory, for example: on what basis is an equilibrium chosen if there are multiple equilibria? And what if players make moves which run counter to theory? Day (1993) points out that the founders of classical economics, including Adam Smith himself, were well aware that not all of human behaviour was rooted in balance and rationality. Atkinson (1969, in Ormerod 2005, p. 21) states that it may take over 100 years for economic growth equilibria to stabilise, meaning that the systems we observe are largely in disequilibrium in any case. And in the latter half of the 20th century, the advent of chaos theory undermined the idea that even the simplest behavioural foundations would necessarily result in an analytically tractable outcome. The sentiment is succinctly expressed by Strogatz (1994): “If you listen to your two favourite songs at the same time, you won’t get double the pleasure!”2


The first is detailed in Chapter 2 and is motivated by estimating the power spectrum of HRV time series in a way that provides insight into the workings of the autonomic nervous system (Malik et al., 1996). Because HRV is a nonstationary time series, it poses a specific challenge in that the frequency characteristics of its power spectrum can vary over time (Priestley, 1965). Furthermore, since time-varying power is estimated as a three dimensional surface, often clinicians use summarizing measures in their research, such as power within a band of frequencies. Our method hopes to provide an alternative by aiding in the interpretation of these structures by reframing the typical locally stationary Fourier estimate of the time varying spectrum in a penalized reduced rank regression setting. This allows for the power spectrum to be broken up into multiple unit-rank layers that are formed by multiplying an “importance” singular value, a left singular “time” vector, and a right singular “frequency” vector together. An adaptive sparse fused lasso penalty is imposed on these vectors that introduces sparsity and smoothness into the estimate. These layers can then be examined individually for patterns and the singular vectors provide a parsimonious representation of the time- and frequency-varying characteristics of the power spectrum.
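In symbols (our notation, chosen to mirror the description above rather than taken from the chapter itself), the layered estimate has the form of a low-rank expansion of the time-varying spectrum:

```latex
\hat{f}(t,\omega) \;\approx\; \sum_{k=1}^{K} d_k \, u_k(t)\, v_k(\omega),
```

where each unit-rank layer is the product of an “importance” singular value \(d_k\), a left singular “time” vector \(u_k\), and a right singular “frequency” vector \(v_k\), with the adaptive sparse fused lasso penalty applied to the \(u_k\) and \(v_k\) to induce sparsity and smoothness.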


In the intra-brain neural networks, improvisation was found to have triggered a more widely distributed network than composed music [216]. The distribution of intra-brain neural information flows expands from the back of the brain (pianist) or the right of the brain (listener) to the entire brain when composed music is changed to improvisation. The frontal (attention and executive control) and central (motor cortex) regions became activated when musicians played the improvisations. This may be because either performing or listening to improvisations demands more widespread functional coordination between large brain regions [216]. Also, the intra-brain causality values were found to be significantly greater in composed music than in improvisation, particularly for the listeners, where the neural information flows separately began and ended in the left frontal and the right frontal regions in composed music, and in the reverse directions when composed music is changed to improvisation [216]. Similarly, the differences between strict mode and “let-go” mode can also be found in the frontal activities and in the inversion of information flows when the strict mode is changed to the “let-go” mode. These results agree with early studies [22] [164] showing that the frontal regions (a more general area that covers the dorsal prefrontal regions), especially the right frontal region, play an important role in free improvisation of melodies and rhythms; these are the key regions that distinguish the brain activities between composed music and improvisation, and between strict mode and “let-go” mode [22] [164]. Moreover, in the contrast intra-brain neural networks, the central regions tend to act as transit hubs that transport the neural information flows, while the temporal and parietal regions also respond differently to different experimental conditions [216].
Finally, the differences between experimental conditions are robust and independent of the significance thresholds (Remark 12.3.1) [216].


Predefined pattern detection from time series is an interesting and challenging task. In order to reduce its computational cost and increase effectiveness, a number of time series representation methods and similarity measures have been proposed. Most of the existing methods focus on full sequence matching, that is, sequences with clearly defined beginnings and endings, where all data points contribute to the match. These methods, however, do not account for temporal and magnitude deformations in the data and prove ineffective in several real-world scenarios where noise and external phenomena introduce diversity in the class of patterns to be matched. In this paper, we present a novel pattern detection method based on the notions of templates, landmarks, constraints and trust regions. We employ the Minimum Description Length (MDL) principle in the time series preprocessing step, which helps to preserve all the prominent features and prevents the template from overfitting. Templates are provided by common users or domain experts, and represent interesting patterns we want to detect from time series. Instead of utilising templates to match all the potential subsequences in the time series, we translate the time series and templates into landmark sequences, and detect patterns from the landmark sequence of the time series. By defining constraints within the template landmark sequence, we effectively extract all the landmark subsequences from the time series landmark sequence, and obtain a number of landmark segments (time series subsequences or instances). We model each landmark segment by scaling the template in both the temporal and magnitude dimensions.
To suppress the influence of noise, we introduce the concept of a trust region, which not only helps to achieve an improved instance model, but also helps to catch the accurate boundaries of instances of the given template. Based on the similarities derived from the instance models, we use the probability density function to calculate a similarity threshold. The threshold can be used to judge whether a landmark segment is a true instance of the given template. To evaluate the effectiveness and efficiency of the proposed method, we apply it to two real-world datasets. The results show that our method is capable of detecting patterns with temporal and magnitude deformations with competitive performance.
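As a toy illustration of the landmark idea only (not the authors' algorithm, which additionally involves MDL preprocessing, constraints and trust regions), the sketch below reduces a series to the indices where its direction of movement changes, i.e. its local peaks and troughs:

```python
import numpy as np

def extract_landmarks(x):
    """Return indices of local extrema (peaks and troughs) of x.

    A landmark sits wherever the slope changes sign; matching can then
    operate on this short landmark sequence instead of every subsequence.
    """
    s = np.sign(np.diff(x))
    return [i + 1 for i in range(len(s) - 1) if s[i] != s[i + 1]]

t = np.linspace(0, 4 * np.pi, 200)
landmarks = extract_landmarks(np.sin(t))
print(len(landmarks))  # 4 extrema: two peaks and two troughs
```

The payoff of this representation is exactly the one claimed above: a 200-point series collapses to a handful of landmarks, and temporal or magnitude stretching of the pattern moves the landmarks without destroying their order.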


There is a clear seasonal effect present in the series, but the size of the seasonal effects seems to be increasing as the level of the series increases. The number of passengers is clearly increasing with time, with the number travelling in July and August always being roughly 50% greater than the number travelling in January and February. This kind of proportional variability suggests that it would be more appropriate to examine the series on a log scale. Figure 5.18 shows the data plotted in this way. On that scale the series shows a consistent level of seasonal variation across time. It seems appropriate to analyse this time series on the log scale.
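The effect described here can be reproduced on synthetic data. The sketch below builds a hypothetical series with a rising level and multiplicative seasonality (the trend slope and seasonal amplitude are invented for illustration, not the airline figures), then compares the yearly peak-to-trough range on the raw and log scales:

```python
import numpy as np

t = np.arange(120)                  # 10 hypothetical "years" of monthly data
level = 100 + 5 * t                 # rising level
y = level * np.exp(0.3 * np.sin(2 * np.pi * t / 12))  # multiplicative season

def yearly_range(x, year):
    """Peak-to-trough range within one 12-month block."""
    chunk = x[12 * year : 12 * (year + 1)]
    return chunk.max() - chunk.min()

raw_ratio = yearly_range(y, 9) / yearly_range(y, 0)
log_ratio = yearly_range(np.log(y), 9) / yearly_range(np.log(y), 0)
print(raw_ratio)   # well above 1: the swing grows with the level
print(log_ratio)   # much closer to 1: near-constant swing on the log scale
```

On the raw scale the last year's seasonal swing is several times the first year's; after the log transform the swings are nearly equal, which is precisely why the log scale is the appropriate one for proportional variability.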


Abstract: We present an earth observation based approach to detect aquaculture ponds in coastal areas with dense time series of high spatial resolution Sentinel-1 SAR data. Aquaculture is one of the fastest-growing animal food production sectors worldwide, contributes more than half of the total volume of aquatic foods in human consumption, and offers a great potential for global food security. The key advantages of SAR instruments for aquaculture mapping are their all-weather, day and night imaging capabilities, which apply particularly to cloud-prone coastal regions. The different backscatter responses of the pond components (dikes and enclosed water surface) and aquaculture’s distinct rectangular structure allow for separation of aquaculture areas from other natural water bodies. We analyzed the large volume of free and open Sentinel-1 data to derive and map aquaculture pond objects for four study sites covering major river deltas in China and Vietnam. SAR image data were processed to obtain temporally smoothed time series. Terrain information derived from DEM data and accurate coastline data were utilized to identify and mask potential aquaculture areas. An open source segmentation algorithm supported the extraction of aquaculture ponds based on backscatter intensity, size and shape features. We were able to efficiently map aquaculture ponds in coastal areas with an overall accuracy of 0.83 for the four study sites. The approach presented is easily transferable in time and space, and thus holds the potential for continental and global mapping.
Keywords: aquaculture; SAR; Sentinel-1; time series; image segmentation; remote sensing; ponds; coastal zone; river delta


- The Dickey-Fuller test statistics for the joint hypotheses are computed in the same way as the usual F-test statistics.
- Reject the null hypothesis if the test statistic is too large.
- The critical values are not the quantiles of the F-distribution; there are tables with the correct critical values.
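A minimal sketch of this computation, assuming the Φ3-type joint null β = γ = 0 in the trend regression Δy_t = α + βt + γy_{t-1} + ε_t (the specific regression is our assumption; the excerpt does not state which joint hypothesis is meant):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate under the null: a pure random walk.
y = np.cumsum(rng.standard_normal(250))
dy, ylag = np.diff(y), y[:-1]
T = len(dy)
trend = np.arange(1, T + 1, dtype=float)

# Unrestricted regression: dy on constant, trend, lagged level.
X_u = np.column_stack([np.ones(T), trend, ylag])
# Restricted regression under beta = gamma = 0: drift only.
X_r = np.ones((T, 1))

def rss(X):
    """Residual sum of squares from an OLS fit of dy on X."""
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    return np.sum((dy - X @ beta) ** 2)

q = 2  # number of restrictions tested jointly
stat = ((rss(X_r) - rss(X_u)) / q) / (rss(X_u) / (T - X_u.shape[1]))
print(stat)  # compare with Dickey-Fuller Phi_3 tables, NOT F quantiles
```

The statistic is assembled exactly like an ordinary F statistic from restricted and unrestricted residual sums of squares; only the reference distribution differs, which is why the Dickey-Fuller tables, rather than F quantiles, supply the critical values.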


The first objective of the experiments is to study the performance of the proposed method with regard to various data parameters and query parameters (Section 4.3). As explained earlier, the exhaustive technique, which calculates all the correlation coefficients in (18), is infeasible, since its time complexity is O(N^n), where N is the number of observed values at each timestamp and n is the dimension (length) of the uncertain time series. Thus, to make the similarity search feasible in different settings, similar to [DAL12], we reduced the input data by truncating the dataset to 50 time series of dimension 6 with 3 observed values at each timestamp. For example, given a correlation threshold c, probability threshold p, and SDR r, the exhaustive technique needs over 26.5 million calculations (with 50 time series), and in total over 15.7 billion calculations (with 9 SDRs, 6 correlation thresholds, and 11 probability thresholds; Section 4.3). This shows that even for a small uncertain time series dataset, the exhaustive technique requires an excessive amount of processing time.
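The quoted counts can be reproduced under one plausible counting scheme (our reconstruction: each series of dimension 6 with 3 observed values per timestamp admits 3⁶ possible realizations, and a query compares 3⁶ × 3⁶ realization pairs against each of the 50 series):

```python
# Sanity-check of the operation counts quoted above.
n_series, n, N = 50, 6, 3            # 50 series, dimension 6, 3 values/timestamp

per_query = n_series * (N ** n) * (N ** n)  # realization pairs per query
print(per_query)                             # 26_572_050  (~26.5 million)

settings = 9 * 6 * 11                # SDRs x corr. thresholds x prob. thresholds
print(per_query * settings)          # 15_783_797_700  (~15.7 billion)
```

Both numbers match the "over 26.5 million" and "over 15.7 billion" figures in the text, which makes the exponential blow-up of the exhaustive technique concrete.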


Although the selection of window length is an important issue for SSA (Hassani et al., 2012; Hassani and Mehmoudvand, 2013), this paper chooses the same window length (L = 120) as that in Schoellhamer (2001) in order to compare the performance of the proposed method with that of Schoellhamer. Using the synthetic time series we compute the lagged correlation matrix and the variances of each mode. The first four modes contain the periodic components, which account for 72.3 % of the total variance; in particular, the first mode contains 50.2 % of the total variance. In order to evaluate the accuracies of the reconstructed PCs from the time series with different percentages of missing data, following the approach of Shen et al. (2014), we compute the relative errors of the first four modes derived by ISSA and SSAM with the following expression:
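The expression itself is not reproduced in this excerpt; a conventional definition of the relative error of a reconstructed principal component, stated here as an assumption about the intended form, is:

```latex
e_i \;=\; \frac{\bigl\lVert \widehat{\mathrm{PC}}_i - \mathrm{PC}_i \bigr\rVert}
               {\bigl\lVert \mathrm{PC}_i \bigr\rVert},
\qquad i = 1, \dots, 4,
```

where \(\mathrm{PC}_i\) denotes the i-th principal component obtained from the complete series and \(\widehat{\mathrm{PC}}_i\) its reconstruction from the series with missing data.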

with T → ∞; see Gijbels &amp; Peng (2000) for a consideration of similar estimators. For the estimation of the bounds a(τ) and b(τ) we take advantage of our assumption that a(τ) and b(τ) are smooth, twice continuously differentiable bounds. This allows us to estimate these bounds consistently as T → ∞ even if n(T) remains bounded. However, the scatter plot in Figure 2.2 suggests a nonstandard assumption on the shape of a(τ) and b(τ), which excludes the usage of classical boundary estimators such as free disposal hull (FDH) estimators or data envelope estimators [see, e.g., Deprins et al. (1984) and Kneip et al. (1998)]. Instead, we use nonparametric local linear regression to estimate the bounds a(τ) and b(τ). On the one hand, this allows us to estimate arbitrary smooth boundary functions; on the other hand, it seamlessly fits our unifying nonparametric regression problem in Eq. (2.15). We use the deterministic frontier regression model proposed by Martins-Filho &amp; Yao (2007), which can be formulated for our case as


Generating workloads for data series indexes. Our fourth contribution is motivated by the fact that up to this point very little attention had been paid to how to properly evaluate data series index structures. Most previous work relied solely on randomly selecting data series, with or without adding noise, which were then used as queries. A hardness analysis of these queries was always omitted; instead, index performance was measured as the average query answering time across a large number of queries. On the contrary, in the context of relational databases, various benchmark workload generation techniques have been proposed through the years. Such techniques included methods for generating queries with specific properties, carefully designed to stress different parts of the database stack. In this thesis, we argue that apart from creating novel data structures for data series, there is also a need for carefully generating a query workload such that these structures are stressed at appropriate levels. To solve this problem, in Chapter 6, we define measures that capture the characteristics of queries, and we propose a method for generating workloads with the desired properties, that is, for effectively evaluating and comparing data series summarizations and indexes. In our experimental evaluation, with carefully controlled query workloads, we shed light on key factors affecting the performance of nearest neighbor search in large data series collections.


This is indeed obtained under the proposed new recursions, and so the respective correlations at points of time where there are gaps are 0.633 (at t = 24), 0.779 (at t = 43), 0.812 (at t = 75) and 0.809 (at t = 86); the mean of these correlations is 0.792, which is close to the real 0.8 under the simulation experiment.


We analyze the tea price data of three regions, NI and SI among them, and fit ARIMA models for these data. Time series plots of the three types of data (Figure 1) revealed that the data are not stationary but show an upward trend. To make the series stationary, successive differences are taken to create new series. We then look at the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series to determine the order of the most appropriate model; the principal tools used in identifying model parameters are the ACF and PACF. First we analyze the data of NI. The differenced time series appears stationary. For the NI region the ACF shows cosine waves and each value is highly significant. The PACF (Figure 3) is significant at lags 1, 5

Contrasting the trends displayed in Figures 19 and 20 with those displayed in Figures 16 and 17 highlights the inherent challenge in assessing the fractal properties of time-series structures that suffer from limited total length and/or limited resolution/spectral content. Indeed, accommodating the impact of a minimum feature size that is significantly in excess of the trace’s resolution limit generally necessitates restricting a fractal analysis to length scales larger still than even this observed minimum feature size. This in turn often restricts an analysis of scaling properties to a consideration of relatively few orders of magnitude in length. For example, performing a fractal analysis of a 512-point Fourier filtered trace using analysis cutoffs corresponding to 10 data points and 1/5 of the trace length corresponds to an analysis of the scaling behavior over barely more than one order of magnitude in length scale; attempting to increase the accuracy of the measurement by raising the fine-scale cutoff to 20 data points further reduces the scaling range to 0.71 orders of magnitude.
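The scaling-range figures quoted here follow directly from the cutoffs, as the short check below shows (the scaling range in decades is the base-10 logarithm of the ratio of the coarse cutoff to the fine cutoff):

```python
import math

# Scaling range, in orders of magnitude, for a 512-point trace analyzed
# between a fine-scale cutoff and a coarse cutoff of 1/5 the trace length.
trace_len = 512
coarse = trace_len / 5               # 102.4 points

for fine in (10, 20):
    decades = math.log10(coarse / fine)
    print(f"fine cutoff {fine}: {decades:.2f} orders of magnitude")
# fine cutoff 10 -> 1.01 decades; fine cutoff 20 -> 0.71 decades
```

This matches the "barely more than one order of magnitude" and "0.71 orders of magnitude" figures in the text, and makes the trade-off explicit: doubling the fine-scale cutoff to improve accuracy removes log10(2) ≈ 0.30 decades from an already narrow scaling range.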
