We presented the first truly streaming CDC system, showing that significantly better performance is achieved by using an informed sampling technique that takes into account a notion of streaming discourse. We are able to get to within 5% of an exact system's performance while using only 30% of the memory required. Instead of improving performance by using an uninformed sampling technique and doubling the available memory, similar performance can be achieved by using the same amount of memory and an informed sampling technique. Further work could look at improving the similarity metric used, applying these sampling techniques to other streaming problems, or adding a mention compression component.
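To make the idea of memory-bounded informed sampling concrete, the sketch below keeps a fixed-size reservoir of mentions biased toward higher scores, using the standard A-Res weighted reservoir scheme. It is only an illustration under assumed inputs: the `mentions` list and the per-mention `scores` (which in the system above would come from a discourse-aware model) are hypothetical, and this is not the paper's exact sampling procedure.

```python
import heapq
import random

def informed_reservoir_sample(mentions, scores, k):
    """Keep a fixed-size sample of mentions, biased toward higher scores.

    Uses the A-Res weighted reservoir scheme: each item receives the key
    u**(1/w) with u ~ Uniform(0, 1), and the k largest keys are retained,
    so higher-weight items survive more often while memory stays O(k).
    """
    reservoir = []  # min-heap of (key, mention)
    for mention, w in zip(mentions, scores):
        key = random.random() ** (1.0 / max(w, 1e-9))
        if len(reservoir) < k:
            heapq.heappush(reservoir, (key, mention))
        elif key > reservoir[0][0]:
            heapq.heapreplace(reservoir, (key, mention))
    return [m for _, m in reservoir]

# Hypothetical usage: scores stand in for a discourse-informed relevance signal.
mentions = [f"mention_{i}" for i in range(1000)]
scores = [random.uniform(0.1, 1.0) for _ in mentions]
sample = informed_reservoir_sample(mentions, scores, k=100)
print(len(sample))
```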
Weighted uniform hypergraphs have been of particular interest in computer vision since hypergraphs provide a natural way to represent multi-way similarities. Yet it is computationally expensive to compute the entire affinity tensor of the hypergraph. As a consequence, several tensor sampling strategies have come into existence. We provide the first theoretical analysis of such sampling techniques in the context of uniform hypergraph partitioning. Our result suggests that consistency can be achieved even with very few sampled edges, provided that one assigns a higher sampling probability to edges with larger weight. The derived sampling rate is much lower than that known in the tensor literature (Bhojanapalli and Sanghavi, 2015; Jain and Oh, 2014), and our analysis also justifies the superior performance of popular sampling heuristics (Chen and Lerman, 2009; Duchenne et al., 2011). We finally propose an iteratively sampled variant of TTM, and empirically demonstrate the potential of this method in subspace clustering and motion segmentation applications.
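The following is a minimal sketch of the core idea that edges with larger weight should receive higher sampling probability, applied to a small hypothetical 3-uniform hypergraph. It illustrates weight-proportional hyperedge sampling only; it is not the paper's iteratively sampled TTM variant, and the edge list and weights are invented for the example.

```python
import numpy as np

def sample_hyperedges(edges, weights, num_samples, rng=None):
    """Sample hyperedge indices with probability proportional to edge weight.

    `edges` is a list of vertex tuples and `weights` their affinities;
    heavier edges are drawn more often, matching the analysis above.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(edges), size=num_samples, replace=True, p=p)
    return [edges[i] for i in idx]

# Hypothetical 3-uniform hypergraph on 6 vertices.
edges = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
weights = [0.9, 0.1, 0.7, 0.3]
print(sample_hyperedges(edges, weights, num_samples=5))
```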
The efficiency and generality of the developed methods are very promising and have been well received by the academic community through conference presentations and journal articles. Uncertainty quantification represents the fundamental tool for risk analysis in any area of engineering application. Beyond the philosophical digressions aimed at identifying the most appropriate mathematical framework, this doctoral study has been oriented towards providing a practical answer to the need for quantifying uncertainty in all its complexity. To date, Imprecise Probability appears to be the most general theory for quantifying such uncertainty, as it can be used to describe a vast range of different situations; for example, in decision theory, game theory, financial mathematics, risk, reliability and availability analysis, model updating, surrogate modelling, reliability-based optimisation, non-stationary stochastic processes, power spectra estimation, and many more. Imprecise Probability extends the notion of classical probability to include set-valued descriptors. However, applications of Imprecise Probability to real-scale engineering problems have so far been possible only in very limited cases. Among the reasons for this, efficiency is certainly predominant. In this doctoral study, the use of Advanced Sampling techniques has allowed this issue to be surmounted, thus linking Imprecise Probability ever closer to the community of practitioners.
Applied statistics research plays a pivotal role in diverse problems of the social sciences, agricultural sciences, health sciences, and business research. Many investigations are conducted by survey research. The technique of sampling and the determination of sample size have a crucial role in survey-based research problems in applied statistics. Specific sampling techniques are used for specific research problems because one technique may not be appropriate for all problems. Similarly, an inappropriate sample size may lead to erroneous conclusions. The present paper gives an overview of some commonly used terms and quantities, such as sample, random sampling, stratified random sampling, power of the test, and confidence interval, that need to be specified for a sample size calculation, together with some techniques for determining sample size. It also describes some sampling methods, such as purposive random sampling, random sampling, stratified random sampling, systematic random sampling and quota sampling, for specific research purposes.
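As an illustration of the kind of sample size calculation mentioned above, the sketch below implements the textbook formula for estimating a proportion, n0 = z^2 p(1-p) / e^2, with an optional finite-population correction. It is a generic formula, not one taken from this paper, and the confidence level, margin of error and population size in the example are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(margin_of_error, confidence=0.95, p=0.5, population=None):
    """Textbook sample size for estimating a proportion.

    n0 = z^2 * p * (1 - p) / e^2, with an optional finite-population
    correction n = n0 / (1 + (n0 - 1) / N).  p = 0.5 is the most
    conservative (largest) choice when the true proportion is unknown.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)

print(sample_size_proportion(0.05))                   # about 385
print(sample_size_proportion(0.05, population=2000))  # about 323 after correction
```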
Note: convenience sampling is not always a mutually exclusive category of non-probability sampling techniques; rather, it is used within various other types. For instance, suppose a researcher wants to investigate differences in aesthetic sense among people belonging to different educational domains. A quota is made for every domain of the arts and science faculties in a university. The researcher reaches each department in the morning and assesses the students who are sitting free on the lawn. In this way the sample is chosen at the convenience of the investigator, since he was free in the morning. Moreover, he approached the students who were free at that time and did not have any classes.
This paper intends to look into the application of statistical sampling techniques to auditing. As voluminous data require extensive testing, conventional techniques may not be as adequate and competent as statistical methods. The users of the data, especially the financial statements, require more stringent and concrete evidence to evaluate the status of their investment. The objectivity and calculated sampling risk of the statistical method assure a higher degree of confidence in the auditor's opinion and more defensible results. In contrast, the Bayesian approach, which suggests that the auditor's subjective estimate of the population be involved in the evaluation, is also discussed.
Given that 47.3% of all input segments are NA, as shown in Table 1, it is unsurprising that their inclusion significantly impacted training time and results. We find that this simple form of Negative Sampling yields non-trivial improvements on MRQA (see Table 2). We hypothesize this is primarily because a vaguely relevant span of tokens amid a completely irrelevant NA segment would monopolize the predicted probabilities. Meanwhile, the actual answer span likely appears in a segment that may contain many competing spans of relevant text, each attracting some probability mass. As we would expect, the improvement this technique offers is magnified where the context is much longer than M. To our knowledge this technique is still not prevalent in purely extractive question answering, though Alberti et al. (2019) cite it as a key contributor to their strong baseline on Google's Natural Questions.
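A minimal sketch of this form of negative sampling is shown below: answer-bearing segments are always kept, while only a fraction of no-answer (NA) segments survive. The `is_na` flag, the `keep_ratio` value and the toy data are assumptions for illustration; this is not the exact preprocessing used in the paper or by Alberti et al. (2019).

```python
import random

def downsample_na_segments(segments, keep_ratio=0.1, seed=0):
    """Negative sampling for extractive QA training data: keep every
    answer-bearing segment but only a random fraction of NA segments,
    so irrelevant spans do not dominate the training signal."""
    rng = random.Random(seed)
    kept = []
    for seg in segments:
        if not seg["is_na"] or rng.random() < keep_ratio:
            kept.append(seg)
    return kept

# Hypothetical data: roughly half of the segments are NA, as in Table 1.
segments = [{"id": i, "is_na": i % 2 == 0} for i in range(20)]
print(len(downsample_na_segments(segments)))
```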
First of all, in our current investigation, we compared the ISMOTE algorithm with a single decision tree and the SMOTE algorithm [15] for performance assessment. This is principally because these techniques are single-model based learning algorithms. Statistically, ensemble-based learning algorithms can enhance the accuracy and robustness of learning performance; hence, as a future research direction, the ISMOTE algorithm can be extended for combination with ensemble-based learning algorithms. To do this, one would use a bootstrap sampling scheme to sample the original training data sets, and then apply ISMOTE to each sampled set to train a hypothesis. Finally, a weighted combination voting rule such as AdaBoost.M1 [35][36] can be used to combine all decisions from the various hypotheses into the final predicted outputs. In such a setting, it would be interesting to compare the performance of such a boosted ISMOTE algorithm with those of SMOTEBoost [16], DataBoost-IM [17] and other ensemble-based imbalanced learning algorithms.
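The sketch below outlines the ensemble extension described above: bootstrap resampling of the training set, an oversampling step standing in for ISMOTE, one hypothesis per resample, and a weighted vote. It is a simplified illustration, not the authors' method: ISMOTE is passed in as a placeholder function, the weighting uses plain resubstitution accuracy rather than the AdaBoost.M1 rule, and the data are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boosted_resampling_ensemble(X, y, oversample, n_rounds=10, seed=0):
    """Bootstrap the training set, rebalance each resample with an
    oversampling routine (e.g. ISMOTE, supplied as `oversample`), train a
    hypothesis on it, and combine hypotheses by a weighted vote.  Binary
    labels {0, 1} are assumed."""
    rng = np.random.default_rng(seed)
    hypotheses, weights = [], []
    n = len(y)
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)             # bootstrap sample
        Xb, yb = oversample(X[idx], y[idx])          # rebalance the resample
        h = DecisionTreeClassifier(random_state=0).fit(Xb, yb)
        acc = (h.predict(X) == y).mean()             # simple accuracy weight
        hypotheses.append(h)
        weights.append(acc)

    def predict(Xnew):
        votes = np.array([w * (h.predict(Xnew) == 1)
                          for h, w in zip(hypotheses, weights)])
        return (votes.sum(axis=0) >= 0.5 * sum(weights)).astype(int)

    return predict

# `oversample` stands in for ISMOTE; here it is an identity placeholder.
X = np.random.default_rng(1).normal(size=(200, 5))
y = np.r_[np.zeros(180), np.ones(20)].astype(int)    # imbalanced toy labels
predict = boosted_resampling_ensemble(X, y, oversample=lambda Xb, yb: (Xb, yb))
print(predict(X[:5]))
```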
The theory presented in section 2.2 can be used to investigate the structure of the closed-loop transfer function matrices of the class of tracking systems incorporating controllable and …
In many situations the cost of computing the value of a function at some point is very high, either because the analytic expression of the function is extremely complex, or because the value is the result of an experiment. Therefore, due to budget restrictions, the function can be computed only at a finite number of points. Often the object of interest is not the whole graph of the function, but only some functional of it. Monte Carlo estimation of functionals such as the maximum or the integral of a real-valued function f is the subject of a very large number of papers. In most cases some regularity of the function f is assumed; see, for example, Novak (1988) or Zhigljavsky and Chekmasov (1996). Moreover, in much of the literature, estimators are compared in terms of a given loss function, which may be arbitrary. The emphasis in this paper is on showing that more general comparisons of estimation methods are possible in terms of suitable stochastic orders that imply comparisons for wide classes of loss functions. At the same time, we attempt to work with minimal assumptions on f. We will compare estimators with respect to different stratified sampling schemes, and will show that, generally speaking, refining the stratification leads to an improvement of the estimators.
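As a simple numerical illustration of the claim that stratification improves Monte Carlo estimators, the sketch below compares the empirical variance of a crude estimate and an equal-width stratified estimate of the integral of a smooth function on [0, 1]. The test function and sample sizes are arbitrary choices for the example; the paper's stochastic-order comparisons are not reproduced here.

```python
import numpy as np

def crude_mc(f, n, rng):
    """Plain Monte Carlo estimate of the integral of f over [0, 1]."""
    return f(rng.random(n)).mean()

def stratified_mc(f, n, strata, rng):
    """Stratified estimate: one equal-size sub-sample per equal-width stratum."""
    m = n // strata
    edges = np.linspace(0.0, 1.0, strata + 1)
    lo, hi = edges[:-1, None], edges[1:, None]
    u = lo + (hi - lo) * rng.random((strata, m))   # points inside each stratum
    return f(u).mean()

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)                    # true integral is 2 / pi
crude = [crude_mc(f, 1000, rng) for _ in range(200)]
strat = [stratified_mc(f, 1000, 10, rng) for _ in range(200)]
print(np.var(crude), np.var(strat))                # stratification reduces variance
```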
… each field the average soil test values for the single-core samples and the values for the composite samples generally agreed closely, but that because of fertility variability in the …
Clearly, specialist vessels are required to drill into rock, and heavy lifting winches or A-frames are required to deploy heavy gear such as vibro-cores. For all sampling types vessel stability is important. Bringing sampling equipment back on deck in rough weather can be hazardous, and all operations have a maximum sea-state (depending on the motion of the vessel and its configuration) beyond which deployment and especially recovery is no longer safe. Catamarans make good sampling vessels as they are stable. Sampling is always preferably done from side-winches located mid-ship, if available. For lifting heavier gear or recovering from deeper water (remember that the weight of the cable must also be factored in), it may be necessary to deploy and recover off a stronger A-frame over the stern. This means there will be more vessel pitch and potential vessel roll implications.
On plots 3 and 4 somewhat better agreement between actual and sampled data for three of the major species (switchgrass, splitbeard bluestem, and big bluestem) was …
We have been vague about the nature of the "statistical analysis" mentioned above, since a range of statistical problems can be tackled using the diffusion bridge simulation methodology we develop here: parameter estimation of discretely observed diffusions, on-line inference for diffusions observed with error, off-line posterior estimation of partially observed diffusions, etc. Additionally, a range of computational statistics tools can be combined with the simulation methodology to give answers to the aforementioned problems: the EM algorithm, simulated likelihood, Sequential Monte Carlo, Markov chain Monte Carlo, etc. Given the wide range of applications where diffusions are employed, it is not surprising that important methodological contributions in bridge simulation are published in scientific journals in statistics, applied probability, econometrics, computational physics, signal processing, etc. Naturally, there is a certain lack of communication across disciplines, and one of the aims of this chapter is to unify some fundamental techniques.
composition of a biomarker in tissues or blood depend on multiple processes such as digestion and absorption in the gastrointestinal tract, transport in the blood, uptake, distribution and metabolism in a variety of cells, and excretion via the kidney and gastrointestinal tract. All these processes involve multiple gene products, with polymorphisms potentially creating large individual variations [5]. Moreover, different physiological states, such as fasting, feeding, cold, warmth, rest, exercise, sex, menstrual cycle, pregnancy, lactation, and age, might have effects on the lipid spectrum. Finally, the nutrient composition of ingested food, endogenous production of different molecules, flux into and out of various compartments in the body, and sampling time points must be considered when omics data are interpreted. All these considerations make it likely that the rapid development of the biomarker measurements discussed in this review will represent an important addition to the information obtained by classical methods.
random and non-random, or probability and non-probability sampling, respectively. In random sampling each unit of the population has the same probability of being selected as part of the sample, hence it is also called probability sampling. In non-random sampling, by contrast, every unit of the population is not equally likely to be selected, and assigning a probability of selection is impossible. With random sampling it is possible to answer research questions and to achieve objectives that require us to estimate statistically the characteristics of the population from the sample. With non-probability sampling, it is impossible to answer research questions or to address objectives that require statistical inferences about the characteristics of the population. We may still be able to generalize from a non-probability sample to the population, but not on statistical grounds. Non-probability sampling includes methods such as quota, purposive or judgmental, convenience or haphazard, snowball and self-selection sampling, while probability sampling includes simple random, systematic, stratified and cluster sampling (see Figure 1). These are the most vital and widely used techniques of probability and non-probability sampling. For many research projects we need to use a variety of sampling techniques at different stages.
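The sketch below illustrates three of the probability sampling designs named above (simple random, systematic, and stratified with proportional allocation) applied to a list of population units. It is a generic illustration rather than anything from the paper; the population, strata labels and sample size are invented, and the stratified allocation is rounded, so the total drawn may differ slightly from n.

```python
import random
from collections import defaultdict

def simple_random_sample(units, n, seed=0):
    """Each unit has the same chance of selection (without replacement)."""
    return random.Random(seed).sample(units, n)

def systematic_sample(units, n, seed=0):
    """Take every k-th unit after a random start, with k = N // n."""
    k = len(units) // n
    start = random.Random(seed).randrange(k)
    return units[start::k][:n]

def stratified_sample(units, strata, n, seed=0):
    """Proportional allocation: sample from each stratum in proportion
    to its share of the population."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for u, s in zip(units, strata):
        groups[s].append(u)
    sample = []
    for members in groups.values():
        share = max(1, round(n * len(members) / len(units)))
        sample.extend(rng.sample(members, min(share, len(members))))
    return sample

population = list(range(100))
print(simple_random_sample(population, 10))
print(systematic_sample(population, 10))
print(stratified_sample(population, ["A" if u < 30 else "B" for u in population], 10))
```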
The conservation of polymer-based cultural heritage is a major concern for collecting institutions internationally. Collections include a range of different polymers, each with its own degradation processes and preservation needs; however, they are frequently unidentified in collection catalogues. Fourier transform infrared (FTIR) spectroscopy is a useful analytical tool for identifying polymers, which is vital for determining storage, exhibition, loan and treatment conditions. Attenuated Total Reflection (ATR) and External Reflection (ER) are proven effective FTIR sampling techniques for polymer identification and are beginning to appear in conservation labs. This paper evaluates and optimises the application of these two FTIR techniques to three-dimensional plastic objects in the museum context. Elements of the FTIR measurement process are investigated for 15 common polymers found in museum collections, using both authentic reference sheets and case study objects to model for surface characteristics, including: use of the ATR and ER modules, the difference between clamping and manually holding objects in contact with the ATR crystal, use of the Kramers–Kronig Transformation, signal-to-noise ratios for an increasing number of co-added scans, the resultant time taken to collect each measurement, associated professional, health and safety considerations, and the use and availability of reference materials for polymer identity verification. Utilising this information, a flowchart for applying FTIR spectroscopy to three-dimensional historic plastic objects during museum collection surveys is proposed to guide the conservation profession.
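On the trade-off between co-added scans, signal-to-noise ratio and measurement time mentioned above, the sketch below encodes only the general signal-averaging relationship (random noise averages down roughly as the square root of the number of scans, while acquisition time grows linearly). The base SNR and scan time are hypothetical values, not measurements from this study.

```python
import math

def coadd_tradeoff(base_snr, scan_time_s, n_scans):
    """Expected SNR and total time when co-adding n spectra: random noise
    averages down by sqrt(n), so SNR ~ base_snr * sqrt(n), while the
    measurement time grows linearly with n."""
    return base_snr * math.sqrt(n_scans), scan_time_s * n_scans

# Hypothetical single-scan SNR of 50 and 1.5 s per scan.
for n in (16, 32, 64, 128):
    snr, t = coadd_tradeoff(50, 1.5, n)
    print(f"{n:4d} scans: SNR ~ {snr:6.0f}, time ~ {t:5.0f} s")
```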
In this paper a simple linear regression model is considered with respect to samples taken from sampling techniques such as simple random sampling (SRS) and systematic sampling (SYS), including ranked set sampling (RSS). The method of estimation used in this paper is ordinary least squares (OLS). Also, the bivariate ranked set sample of Al-Saleh and Zheng (2002) is introduced. Finally, regression models based on the different sampling schemes are compared with each other using a validation technique, jackknifing, which is a sample reuse technique (Quenouille, 1949). A bivariate ranked set sample, as given by Al-Saleh and Zheng (2002), can be obtained as follows:
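The bivariate procedure of Al-Saleh and Zheng (2002) referenced above is not reproduced here; as background, the sketch below illustrates ordinary univariate ranked set sampling: in each cycle, m sets of m units are drawn, each set is ranked (here by the measured value itself for simplicity, whereas in practice ranking is usually done by a cheap auxiliary judgment), and the i-th ranked unit is retained from the i-th set. The population and set size are hypothetical.

```python
import random

def ranked_set_sample(population, m, cycles=1, seed=0):
    """Univariate ranked set sampling: in each cycle draw m sets of m
    units, rank each set, and keep the i-th ranked unit from the i-th set."""
    rng = random.Random(seed)
    sample = []
    for _ in range(cycles):
        for i in range(m):
            candidate_set = sorted(rng.sample(population, m))
            sample.append(candidate_set[i])   # i-th order statistic
    return sample

population = list(range(1, 101))
print(ranked_set_sample(population, m=4, cycles=3))
```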
Abstract: In this study we explore an estimator for the finite population total under the well-known prediction approach. This approach is compared with the design-based approach using simple random sampling and stratified random sampling techniques. It is shown that the estimators under the model-based approach give better estimates than the estimators under the design-based approach, both when using simple random sampling (s.r.s.) and stratified random sampling. The relative absolute error of both approaches is computed and is shown to be smaller under the super-population model than under the design-based approach. The approach is then applied to predict the total number of people living with HIV/AIDS in Nakuru Central district.
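To make the design-based versus model-based comparison concrete, the sketch below contrasts the SRS expansion estimator of a population total with a simple model-based prediction estimator under a ratio model with a known auxiliary variable, reporting the relative absolute error of each. The synthetic population, the auxiliary variable and the ratio model are illustrative assumptions, not the super-population model used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population: y roughly proportional to a known auxiliary x.
N = 1000
x = rng.uniform(10, 100, size=N)
y = 2.5 * x + rng.normal(0, 5, size=N)
true_total = y.sum()

# Simple random sample of n units.
n = 100
s = rng.choice(N, size=n, replace=False)

# Design-based expansion estimator under SRS: N times the sample mean.
design_total = N * y[s].mean()

# Model-based prediction estimator under the ratio model y_i = beta * x_i + e_i:
# observed y's plus predicted values beta_hat * x_i for the non-sampled units.
beta_hat = y[s].sum() / x[s].sum()
mask = np.ones(N, dtype=bool)
mask[s] = False
model_total = y[s].sum() + beta_hat * x[mask].sum()

for name, est in [("design-based", design_total), ("model-based", model_total)]:
    print(f"{name}: relative absolute error = {abs(est - true_total) / true_total:.4f}")
```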