The **relative entropy** is a measure of the distance between two distributions. In statistics, it arises as an expected logarithm of the likelihood ratio. The **relative entropy** D(p‖q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p. For example, if we knew the true distribution of the random variable, then we could construct a code with average description length H(p). If, instead, we used the code for a distribution q, we would need H(p) + D(p‖q) bits on the average to describe the random variable [1, p. 18].

Pruning is one approach to addressing this problem, where models are made more compact by discarding entries from the model based on additional selection criteria. The challenge in this task is to choose the entries that will least degrade the quality of the task for which the model is used. For language models, an effective algorithm based on **relative entropy** is described in (Seymore and Rosenfeld, 1996; Stolcke, 1998; Moore and Quirk, 2009). In these approaches, a criterion based on the KL divergence is applied, so that higher-order n-grams are only included in the model when they provide enough additional information, given the lower-order n-grams. Recently, this concept was applied to translation model pruning (Ling et al., 2012; Zens et al., 2012), and results indicate that this method yields a better trade-off between phrase table size and translation quality than previous methods, such as the well-known method of (Johnson et al., 2007), which uses Fisher's exact test to calculate how well a phrase pair is supported by the data.
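As a concrete illustration of this pruning criterion, the sketch below scores each higher-order n-gram by its contribution to the KL divergence between the original and the pruned model, and keeps only entries whose removal would cost more than a threshold. The function name, dictionary layout, and the omission of backoff-weight renormalization are simplifying assumptions of this sketch, not the exact algorithms of the cited papers.

```python
import math

def kld_pruning(ngram_probs, backoff_probs, history_prob, threshold):
    """Simplified relative-entropy pruning in the spirit of
    Seymore & Rosenfeld (1996) / Stolcke (1998).

    An n-gram (h, w) is kept only if replacing it by the lower-order
    estimate would raise the pruned model's divergence from the
    original by more than `threshold`.  A full implementation must
    also renormalize the backoff weights, which is skipped here.
    """
    kept = {}
    for (h, w), p_hw in ngram_probs.items():
        p_lower = backoff_probs[w]  # lower-order estimate for w
        # contribution of this entry to D(original || pruned), in bits
        delta = history_prob[h] * p_hw * math.log2(p_hw / p_lower)
        if delta > threshold:
            kept[(h, w)] = p_hw
    return kept

kept = kld_pruning({("the", "cat"): 0.4, ("the", "dog"): 0.1},
                   {"cat": 0.1, "dog": 0.09},
                   {"the": 0.2},
                   threshold=0.01)
```

In this toy run, the frequent, informative bigram ("the", "cat") survives while ("the", "dog"), which is nearly predicted by its lower-order estimate, is discarded.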


Based on this, we build KLD models for lemmas and POS trigrams to observe divergences at the lexical and grammatical levels, respectively. For both linguistic levels, we model all lemmas/POS trigrams that occur at least five times in a document. For each window, KLD models are created comparing the post period with the pre period. For the analysis we use 2-, 5-, and 10-year windows, inspecting 10- and 20-year ranges. Moreover, the individual contribution (discriminative power) of a feature to **relative entropy** allows us to observe which features are involved in change. The higher the KLD value of a feature (here: a lemma or POS trigram), the more discriminative the feature is for the post period (see Equation (3)).
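A minimal sketch of the per-feature contribution described above, assuming plain count dictionaries for the pre and post periods; the add-one smoothing of unseen pre-period features and all names are illustrative assumptions, not the paper's exact setup:

```python
import math

def feature_kld_contributions(post_counts, pre_counts, min_freq=5):
    """Rank features (lemmas or POS trigrams) by their individual
    contribution p(f) * log2(p(f) / q(f)) to D(post || pre).
    Features below `min_freq` in the post period are dropped; add-one
    smoothing of the pre period is a simplifying assumption."""
    vocab = {f for f, c in post_counts.items() if c >= min_freq}
    post_total = sum(post_counts[f] for f in vocab)
    pre_total = sum(pre_counts.values()) + len(vocab)
    contributions = {}
    for f in vocab:
        p = post_counts[f] / post_total
        q = (pre_counts.get(f, 0) + 1) / pre_total
        contributions[f] = p * math.log2(p / q)
    # highest contribution = most discriminative for the post period
    return sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
```

Features that are frequent after the cut-off but rare before it receive large positive contributions, matching the reading of Equation (3) above.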


boundary asymptote to geometries of constant negative curvature. In this setting the entropies (and other field-theoretic quantities we will be concerned with) translate to geometric objects. Based on papers [1, 2], we will see that the **relative entropy** associated to a spherical boundary region can be viewed as a form of quasi-local energy, in the precise sense defined by Wald. The constraints obeyed by **relative entropy** map to an infinite family of constraints on the bulk, which hold for arbitrary spacetimes away from the vacuum. In various perturbative regimes, these constraints reduce to the linearized Einstein equations around the vacuum, integrated positivity of the bulk stress-energy tensor, and positivity of canonical energy.


The presented method exposes three main hyperparameters to be set by the practitioner: the number of options O, the **entropy** bound κ, and the **relative entropy** bound. The number of options can usually be chosen generously; around 20 seems to be reasonable for a wide range of problems. The algorithm will prioritize more promising options, so that even if too many options are initialized, only options which yield high rewards will be sampled from after the first few iterations. The **entropy** bound κ is probably the most important parameter to consider, since it does not have a clear equivalent in existing approaches. However, our experiments showed that a value of 0.9 seems to work well in almost all cases and no major tuning was necessary. The **relative entropy** bound is probably the most tuning-intensive parameter in the proposed method, especially if the total number of episodes is crucial, e.g., in real robot experiments. In our experience, values between 0.5 and 1.5 are reasonable and, most often, we would start a new task with a value of 1. Changing these parameters certainly influences the learning speed of the proposed method. However, while sub-optimal settings may lead to slower convergence, they usually do not prevent successful learning. Thus, in our experience, the algorithm is generally robust in that even sub-optimal settings will lead to convergence.


To design a communication system with a specific message handling capability, we need a measure of the information content to be transmitted. The **entropy** of a random variable is a measure of the uncertainty of the random variable; it is a measure of the amount of information required on the average to describe the random variable. The **relative entropy** is a measure of the distance between two distributions. In statistics, it arises as the expectation of the logarithm of the likelihood ratio. The **relative entropy** D(p‖q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p. For example, if we knew the true distribution of the random variable, then we could construct a code with average description length H(p). If, instead, we used the code for a distribution q, we would need H(p) + D(p‖q) bits on the average to describe the random variable [6, p. 18]. Definition. (**Relative Entropy**) The **relative entropy**, or Kullback-Leibler distance, between two probability mass functions p(x) and q(x) is defined by

D(p‖q) = Σ_x p(x) log ( p(x) / q(x) ).
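The definition and the coding interpretation above can be checked with a few lines of code; this is a direct transcription of the formulas, with the usual conventions 0·log(0/q) = 0 and D = ∞ when p puts mass where q does not:

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x), in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def relative_entropy(p, q):
    """D(p || q) = sum_x p(x) log2(p(x)/q(x)), with 0*log(0/q) = 0
    and D = inf if p puts mass where q has none."""
    d = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue
        if qx == 0:
            return math.inf
        d += px * math.log2(px / qx)
    return d

# Coding interpretation: a code built for q spends, on average,
# H(p) + D(p || q) bits per symbol when the true distribution is p.
p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
print(entropy(p))                           # H(p) = 1.5 bits
print(entropy(p) + relative_entropy(p, q))  # average cost using q's code
```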


Several studies [22][23] indicate that the issues connected with the assumptions of CAPM (viz. the efficient market hypothesis) can be addressed using statistical methods based on Tsallis **entropy** [24], which is a generalization of Shannon **entropy** to non-extensive systems. These methods were originally proposed to study classical and quantum chaos, physical systems far from equilibrium such as turbulent (non-linear) systems, and long-range interacting Hamiltonian systems. However, in the last several years, there has been considerable interest in applying these methods to analyze financial market dynamics as well. Such applications fall into the category of econophysics [25]. The rest of the paper is organized as follows. Section 2 discusses Tsallis **relative entropy**, with the necessary background on Tsallis **entropy** and the 𝑞-Gaussian distribution, and derives a relationship between TRE and the parameters of a 𝑞-Gaussian distribution. Section 3 deals with the data and methodology for constructing risk-optimal portfolios and their results. The conclusions are given in Section 4.
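For reference, one common convention for the quantities discussed in Section 2 can be written as follows (natural logarithm, Boltzmann constant set to 1); the exact convention used in the paper may differ:

```latex
% Tsallis entropy, reducing to the Shannon entropy as q -> 1:
S_q(p) \;=\; \frac{1-\sum_i p_i^{\,q}}{q-1},
\qquad
\lim_{q\to 1} S_q(p) \;=\; -\sum_i p_i \ln p_i .

% Tsallis relative entropy between p and r, recovering the
% Kullback--Leibler divergence in the same limit:
D_q(p\,\|\,r) \;=\; \frac{1-\sum_i p_i^{\,q}\, r_i^{\,1-q}}{1-q},
\qquad
\lim_{q\to 1} D_q(p\,\|\,r) \;=\; \sum_i p_i \ln\frac{p_i}{r_i}.
```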


To be more precise, consider two random variables X and Y with a joint probability mass function r(x, y) and marginal probability mass functions p(x) and q(y), x ∈ X, y ∈ Y. The mutual information is the **relative entropy** between the joint distribution and the product distribution, that is,

I(X; Y) = D(r(x, y) ‖ p(x) q(y)) = Σ_{x,y} r(x, y) log ( r(x, y) / (p(x) q(y)) ).
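A small sketch of this identity, computing I(X; Y) directly as the relative entropy between the joint distribution r and the product of its marginals p and q; the dictionary-based representation is an implementation choice of this sketch:

```python
import math

def mutual_information(joint):
    """I(X;Y) as the relative entropy D(r(x,y) || p(x) q(y)) in bits,
    where `joint` maps (x, y) pairs to probabilities r(x, y)."""
    p = {}  # marginal of X
    q = {}  # marginal of Y
    for (x, y), rxy in joint.items():
        p[x] = p.get(x, 0.0) + rxy
        q[y] = q.get(y, 0.0) + rxy
    return sum(rxy * math.log2(rxy / (p[x] * q[y]))
               for (x, y), rxy in joint.items() if rxy > 0)

# Independent variables have zero mutual information:
indep = {(x, y): 0.25 for x in "ab" for y in "cd"}
print(mutual_information(indep))  # 0.0
```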

…be found using our quantum relative entropy with a suitable uniform prior density matrix.


(1999) measured the distance between two probability distributions by using **relative entropy** or Kullback-Leibler divergence (KLD), which is calculated by using equation (2). The Pearson correlation coefficient (Pearson, 1920), or simply correlation coefficient, measures the linear correlation between two samples and is calculated by equation (3). Since the task of IPD does not use any reference document, we require a robust method for comparing small sections of the document relative to the whole document under question. Measuring the **relative entropy** and correlation coefficient between a small section and the rest of the document are possible methods. We use the frequency profiles of n-grams generated from the individual text-window (X) and the complete suspicious document (Y) separately for calculating **relative entropy** and correlation coefficient. The probability distributions of n-gram frequencies (P and Q) are calculated from the n-gram frequency profiles (from X and Y) for measuring the **relative entropy**.
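A minimal sketch of this window-vs-document comparison, assuming character n-gram profiles and add-one smoothing of the document distribution; both are illustrative choices, as the cited work may use different n-gram types and smoothing:

```python
import math
from collections import Counter

def ngrams(text, n=3):
    """Character n-gram frequency profile of a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def window_kld(window, document, n=3):
    """D(P || Q) in bits between the n-gram distribution P of a text
    window X and Q of the complete document Y.  Add-one smoothing of
    Q avoids zero probabilities for n-grams unique to the window."""
    px, qy = ngrams(window, n), ngrams(document, n)
    vocab = set(px) | set(qy)
    x_total = sum(px.values())
    y_total = sum(qy.values()) + len(vocab)
    return sum((px[g] / x_total) *
               math.log2((px[g] / x_total) / ((qy[g] + 1) / y_total))
               for g in px)
```

A window whose n-gram profile diverges sharply from the rest of the document scores a high KLD, flagging it as a candidate for a different author.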

Investigations have shown very little difference in the entropy (approximately 0.5 bit/s) on various German motorways (with different topographic characteristics and different speed limits). The **entropy** of velocity-time profiles in city traffic is characterised by a wide range of segment-based **entropy** values. Many segments achieve nearly the theoretical maximum of **entropy** (e.g. acceleration after a stop at traffic lights), while others show very little **entropy** (e.g. slowly approaching traffic lights). Comparing the information **entropy** of motorways and city traffic shows significant differences in **relative entropy** but quite similar (absolute) **entropy** values.

By taking projective limits, it is possible to extend Theorems 1.2–1.3 to more general letter spaces. See, e.g., Deuschel and Stroock [6], Section 4.4, or Dembo and Zeitouni [5], Section 6.5, for background on (specific) **relative** **entropy** in general spaces. The following corollary will be proved in Section 8.


In Section II we have introduced the **relative entropy** and the cross **entropy** between two sources. Recently, a method has been proposed for the estimate of the cross **entropy** between two strings based on LZ77 [11]. Recalling that the cross **entropy** C(A|B) between two strings A and B is given by the **entropy** per character of B in the optimal coding for A, the idea is that of appending the two sequences and zipping the resulting file A + B. In this way the zipper "learns" the A file and, when it encounters the B subsequence, tries to compress it with a coding optimized for A. If B is not too long [20, 21], thus preventing LZ77 from learning it as well, the cross **entropy** per character can be estimated as

C(A|B) ≈ (L_{A+B} − L_A) / |B|,

where L_S is the length in bits of the zipped sequence S and |B| is the number of characters of B.
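The concatenation trick can be tried with any off-the-shelf zipper. The sketch below uses Python's zlib, whose DEFLATE format combines LZ77 with Huffman coding and a 32 KiB window, so it only approximates the plain LZ77 estimator of [11]; all names here are illustrative:

```python
import zlib

def cross_entropy_estimate(a: bytes, b: bytes) -> float:
    """Estimate C(A|B), the cross entropy per character of B under a
    coding optimized for A, by compressing A alone and then A + B.
    zlib's DEFLATE stands in for a plain LZ77 coder, so the numbers
    are rough approximations."""
    l_a = len(zlib.compress(a, 9))
    l_ab = len(zlib.compress(a + b, 9))
    return 8.0 * (l_ab - l_a) / len(b)  # bits per character of B

# Text stylistically close to A costs far fewer bits per character
# than high-entropy junk appended to the same A:
a = b"the quick brown fox jumps over the lazy dog " * 50
b_similar = b"the lazy dog jumps over the quick brown fox " * 5
b_junk = bytes((i * 131 + 7) % 256 for i in range(220))
print(cross_entropy_estimate(a, b_similar))
print(cross_entropy_estimate(a, b_junk))
```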


The selection of LDB subspaces is used to differentiate between classes [11]; it determines the classification accuracy that can be obtained. By using only one dissimilarity measure, the characteristics of certain classes may not be recognized [2]. Therefore, this research used the same dissimilarity measures as in [11], namely the normalized energy difference and the **relative entropy**, in order to attain high accuracy.
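A sketch of the two dissimilarity measures named above, applied to subband energy vectors; the exact normalizations used in [11] may differ, so treat these forms as illustrative assumptions:

```python
import math

def normalized_energy_difference(e1, e2):
    """L1 distance between two energy vectors after normalizing each
    to unit total energy (one plausible form of the measure)."""
    t1, t2 = sum(e1), sum(e2)
    return sum(abs(a / t1 - b / t2) for a, b in zip(e1, e2))

def energy_relative_entropy(e1, e2):
    """D(p1 || p2) in bits between the normalized energy
    distributions of two classes; zero-energy bins are skipped."""
    t1, t2 = sum(e1), sum(e2)
    return sum((a / t1) * math.log2((a / t1) / (b / t2))
               for a, b in zip(e1, e2) if a > 0 and b > 0)
```

Using both measures together captures complementary differences: the energy difference is symmetric and bounded, while the relative entropy emphasizes bins where one class concentrates energy and the other does not.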

In this section we investigate an important formula, recently put forward in [6], for the scaling of the **relative** **entropy** under a general constraint. The analysis in [6] allows for the possibility that not all the constraints (i.e., not all the components of the vector C ) are linearly independent. For instance, C may contain redundant replicas of the same constraint(s), or linear combinations of them. Since in the present paper we only consider the case where C is the degree sequence, the different components of C (i.e., the different degrees) are linearly independent.


Definition 1.2. Consider two random variables X and Y with a joint probability mass function p(x, y) and marginal probability mass functions p(x) and q(y). The mutual information is the **relative entropy** between the joint distribution and the product distribution, i.e.,

I(X; Y) = D(p(x, y) ‖ p(x) q(y)) = Σ_{x,y} p(x, y) log ( p(x, y) / (p(x) q(y)) ).


We presented a novel efficient apparatus for tracking the entire relaxation path of maximum **entropy** problems. We are currently studying natural language processing applications. In particular, we are in the process of devising homotopy methods for domain adaptation (Blitzer, 2008) and language modeling based on context tree weighting (Willems et al., 1995). We also examine a generalization of our approach in which the **relative entropy** objective is replaced with a separable Bregman (Censor and Zenios, 1997) function. Such a generalization is likely to distill further connections to other homotopy methods, in particular the least angle regression algorithm of Efron et al. (2004) and homotopy methods for the Lasso in general (Osborne et al., 2000). We also plan to study separable Bregman functions in order to derive entire-path solutions for less explored objectives such as the Itakura-Saito spectral distance (Rabiner and Juang, 1993) and distances especially suited for natural language processing.

Varga and Szathmary [23] demonstrated that the system (2) has a single internal, globally stable rest point for q < 1. This stable rest point corresponds to the "survival of everybody", in contrast to the Darwinian case, where survival of the fittest prevails, as realized in standard exponential models with q = 1. We give a constructive algorithm for solving system (2). The theorem of Varga and Szathmary follows immediately from this solution. We further show that the frequency distribution of individual types in the population (2) minimizes the Tsallis **relative entropy** at each moment of the "internal" time of the population.

Population of freely growing parabolic replicators

The dynamics of the size of a "freely growing" population is given by equation (1). The solution to this equation is


Abstract. **Entropy**, conditional **entropy**, and mutual information for discrete-valued random variables play important roles in information theory. The purpose of this paper is to present new bounds for the **relative entropy** D(p||q) of two probability distributions and then to apply them to simple **entropy** and mutual information. The **relative entropy** upper bound obtained is a refinement of a bound previously presented in the literature.