A more direct approach is to design a discriminative graphical model that models the conditional distribution P(Y|X) instead of the joint probability, as in generative models (McCallum et al., 2000; Lafferty et al., 2001). Conditional random fields (CRFs) are a typical example of this approach. The Maximum Margin Markov network (M3N) (Taskar et al., 2004) goes further by focusing on the discriminant function (defined as the log of the potential functions in a Markov network) and extends the SVM learning algorithm to structured prediction. While using a completely different learning algorithm, M3N is based on the same graphical modeling as the CRF and can be viewed as an instance of a CRF. Based on log-linear potentials, CRFs have been widely used for sequential data in natural language processing and biological sequence analysis (Altun et al., 2003; Sato & Sakakibara, 2005). However, CRFs with log-linear potentials reach only modest performance compared with non-linear models exploiting kernels (Taskar et al., 2004). Although it is possible to use kernels in CRFs (Lafferty et al., 2004), the resulting dense optimal solution generally makes them inefficient in practice. Moreover, kernel machines are well known to be less scalable.
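The conditional modeling idea above can be made concrete with a tiny linear-chain CRF: the conditional probability of a label sequence is the exponentiated log-linear score, normalized over all label sequences. This is a minimal sketch with invented potential tables (all names and values are illustrative, not from any cited system); the partition function is computed by brute-force enumeration, which is only feasible for toy sizes.

```python
import itertools
import math

# Toy linear-chain CRF over label set {0, 1} with hand-picked
# log-linear potentials (all values here are illustrative).
LABELS = (0, 1)
emit = {(0, "a"): 1.2, (1, "a"): 0.2, (0, "b"): 0.1, (1, "b"): 1.5}
trans = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.8}

def score(y, x):
    """Unnormalized log score of label sequence y given observation x."""
    s = sum(emit[(yi, xi)] for yi, xi in zip(y, x))
    s += sum(trans[(y[i], y[i + 1])] for i in range(len(y) - 1))
    return s

def conditional_prob(y, x):
    """P(y | x) by brute-force enumeration of all label sequences."""
    z = sum(math.exp(score(yp, x))
            for yp in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / z

x = ("a", "b", "b")
probs = {y: conditional_prob(y, x)
         for y in itertools.product(LABELS, repeat=len(x))}
assert abs(sum(probs.values()) - 1.0) < 1e-9  # a proper conditional distribution
```

Note that only P(Y|X) is ever defined; nothing in the model describes how the observations x are generated, which is exactly the discriminative/generative distinction the paragraph draws.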

The paper is organized as follows. We start by introducing conditional random fields. Then we describe a CRF architecture for dealing with multimodal classes and describe training algorithms for learning these models from partially labeled data. We also show how we exploit segmental features of the online handwriting signal. Finally, we provide experimental results on online handwritten character recognition, where we compare our conditional models with more standard Markovian models.

Abstract
Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixel-level labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects. To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling. To this end, we formulate mean-field approximate inference for Conditional Random Fields with Gaussian pairwise potentials as Recurrent Neural Networks. This network, called CRF-RNN, is then plugged in as a part of a CNN to obtain a deep network that has desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF modelling with CNNs, making it possible to train the whole deep network end-to-end with the usual back-propagation algorithm, avoiding offline post-processing methods for object delineation.
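The mean-field inference that CRF-RNN unrolls into recurrent steps can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the Gaussian pairwise term is stood in for by a precomputed dense N x N affinity matrix `K` (real systems use fast bilateral filtering instead), and `mu` is a Potts label-compatibility matrix; all sizes and values are toy assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_step(unary, Q, K, mu):
    """One mean-field update for a fully connected CRF.
    unary: N x L negative log-unary energies; Q: N x L current marginals;
    K: N x N pairwise affinities; mu: L x L label compatibility."""
    message = K @ Q            # filtering step: aggregate neighbour beliefs
    pairwise = message @ mu.T  # compatibility transform
    return softmax(-unary - pairwise, axis=1)

rng = np.random.default_rng(0)
N, L = 6, 3                      # 6 "pixels", 3 labels (toy sizes)
unary = rng.normal(size=(N, L))
K = np.exp(-rng.random((N, N)))  # toy affinities standing in for a Gaussian kernel
mu = 1.0 - np.eye(L)             # Potts model: penalize differing labels
Q = softmax(-unary, axis=1)      # initialization from the unaries alone
for _ in range(5):               # the iterations CRF-RNN unrolls as RNN steps
    Q = mean_field_step(unary, Q, K, mu)
```

Because each iteration is just matrix products and a softmax, the whole loop is differentiable, which is what allows the paper's end-to-end training with back-propagation.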


7 Conclusions
We have presented the use of virtual evidence as a principled way of incorporating prior knowledge into conditional random fields. A key contribution of our work is the introduction of a novel semi-supervised learning objective for training a CRF model integrated with VE. We also found it useful to create so-called collocation-based VE, assuming that tokens close to each other tend to have consistent labels. Our evaluation on the CLASSIFIEDS data showed that the learning objective presented here, combined with the use of collocation-based VE, yielded remarkably good accuracy. In the future, we would like to see the application of our approach to other tasks such as (Li et al., 2009).

One may approach this task as a sequence labeling problem and apply methods such as the linear-chain conditional random fields (CRFs) (Lafferty et al., 2001). However, this solution ignores a useful property of the task: the space of possible label sequences is much smaller than that enumerated by a linear-chain CRF. There are two implications. First, the normalization constant in the linear-chain CRF is too large because it also enumerates the impossible sequences. Second, the restriction to the correct space of label sequence per-

Finally, we conclude in Section 7.
2 Conditional random fields
CRFs are undirected graphical models which define a conditional distribution over a label sequence given an observation sequence. We use a CRF to model many-to-one word alignments, where each source word is aligned with zero or one target words, and therefore each target word can be aligned with many source words. Each source word is labelled with the index of its aligned target, or the special value null, denoting no alignment. An example word alignment is shown in Figure 1, where the hollow squares and circles indicate the correct alignments. In this example the French words une and autre would both be assigned the index 24 – for the English word another – when French is the source language. When the source language is English, another could be assigned either index 25 or 26; in these ambiguous situations we take the first index.
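The labelling scheme described above, including the first-index rule for ambiguous links, can be sketched as a small helper. The function name, the link format, and the example links are all invented for illustration; only the encoding convention (target index or null per source word, lowest index wins) comes from the text.

```python
# Encoding a many-to-one word alignment as a CRF label sequence: each
# source word gets the index of its aligned target word, or None for
# the special "null" label. Ambiguous links are resolved by taking the
# first (lowest) target index, as described in the text.
def alignment_labels(num_source, links):
    """links: list of (source_idx, target_idx) pairs, possibly with
    several targets per source word; returns one label per source word."""
    labels = [None] * num_source
    for s, t in sorted(links):
        if labels[s] is None:      # keep only the first target index
            labels[s] = t
    return labels

# Source word 2 is ambiguously linked to targets 25 and 26.
links = [(0, 3), (1, 3), (2, 25), (2, 26)]
print(alignment_labels(4, links))  # -> [3, 3, 25, None]
```

Because every source word carries exactly one label, the alignment problem becomes an ordinary sequence labeling task that a chain CRF can handle directly.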

(1/Z) ψ1(F|A) ψ2(F|B) ψ3(F|C) ψ4(F, G) ψ5(G|D).
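The factor product above can be evaluated directly for binary F and G once A, B, C, D are observed: multiply the five potentials for each (F, G) assignment and normalize by Z, the sum over all assignments. The potential tables below are illustrative stand-ins, not values from the text.

```python
import itertools

# Potentials with A, B, C, D fixed to their observed values, so each ψ
# depends only on F and/or G. All numbers are invented for illustration.
psi1 = {0: 2.0, 1: 1.0}   # psi1(F|A)
psi2 = {0: 1.0, 1: 3.0}   # psi2(F|B)
psi3 = {0: 1.5, 1: 1.0}   # psi3(F|C)
psi4 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}  # psi4(F, G)
psi5 = {0: 1.0, 1: 2.0}   # psi5(G|D)

def unnormalized(f, g):
    """Product of all five factors for the assignment (F=f, G=g)."""
    return psi1[f] * psi2[f] * psi3[f] * psi4[(f, g)] * psi5[g]

# Partition function: sum of the factor product over all assignments.
Z = sum(unnormalized(f, g) for f, g in itertools.product((0, 1), repeat=2))
p = {(f, g): unnormalized(f, g) / Z
     for f, g in itertools.product((0, 1), repeat=2)}
```

The normalizer Z is what makes the product a proper distribution; in larger models this sum is intractable to enumerate and must be computed by dynamic programming or approximated.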
The main focus of this investigation, namely conditional random fields (CRFs), is introduced in Chapter 4. These models can be represented by probabilistic graphical models. One of the most well-known special cases, the linear-chain CRF, is also a type of weighted finite state transducer. CRFs can be represented by conditional graphical models, which means that instead of having a graph that represents a joint distribution p(X), we directly model the distribution p(Y|X). Here, Y is the set of unknown random variables that we would like our model to predict, while X is the set of variables whose values we know because we can observe them directly. In Figure 1.3, an example of a CRF is shown.


1.6 Conclusion
Conditional random fields are a natural choice for many relational problems because they allow both graphically representing dependencies between entities and including rich observed features of entities. In this chapter, we have presented a tutorial on CRFs, covering both linear-chain models and general graphical structures. Also, as a case study in CRFs for collective classification, we have presented the skip-chain CRF, a type of general CRF that performs joint segmentation and collective labeling on a practical language understanding task.


Abstract
In this paper, we propose a novel approach for supervised classification of linguistic metaphors in open-domain text using Conditional Random Fields (CRF). We analyze a CRF-based classification model for metaphor detection using syntactic, conceptual, affective, and word-embedding-based features extracted from the MRC Psycholinguistic Database (MRCPD) and WordNet-Affect. We use word embeddings given by Huang et al. to capture information such as coherence and analogy between words. To tackle the bottleneck of limited coverage of psychological features in MRCPD, we employ synonymy relations from WordNet®. A comparison of our approach with previous approaches shows the efficacy of the CRF classifier in detecting metaphors. The experiments conducted on the VU Amsterdam metaphor corpus provide an accuracy of more than 92% and an F-measure of approximately 78%. Results show that the inclusion of conceptual features improves recall by 5%, whereas affective features do not have any major impact on metaphor detection in open text.


Nevertheless, in our notation, we will let factors depend on the whole observed entity x to denote that all of x can be accessed if necessary.
For our structure recognition task, the graphical model G exhibits the structure shown in figure 3, i.e., there are multiple connected chains of variables with factors defined over single-node cliques and two-node cliques within and between chains; the parameters of factors are tied across time. This corresponds to the factorial CRF structure described in Sutton and McCallum (2005). Structure recognition using conditional random fields then involves two separate steps: parameter estimation, or training, is concerned with selecting the parameters of a CRF such that they fit the given training data; prediction, or testing, determines the best label assignment for unknown examples.
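The prediction step mentioned above is typically solved with Viterbi decoding on each chain. This is a minimal sketch for a single linear chain (the factorial case runs richer inference over the coupled chains); the potential values are illustrative log-scores, not learned parameters.

```python
# Viterbi decoding for a linear-chain CRF: find the highest-scoring
# label assignment under emission and transition log-potentials.
def viterbi(emit, trans, n_labels, length):
    """emit[t][y]: log-potential of label y at position t;
    trans[y0][y]: log-potential of the transition y0 -> y."""
    best = list(emit[0])          # best score ending in each label at t=0
    back = []                     # backpointers for path recovery
    for t in range(1, length):
        prev, best = best, []
        back.append([])
        for y in range(n_labels):
            scores = [prev[y0] + trans[y0][y] for y0 in range(n_labels)]
            y_star = max(range(n_labels), key=scores.__getitem__)
            back[-1].append(y_star)
            best.append(scores[y_star] + emit[t][y])
    y = max(range(n_labels), key=best.__getitem__)
    path = [y]
    for ptr in reversed(back):    # walk backpointers to recover the path
        y = ptr[y]
        path.append(y)
    return path[::-1]

emit = [[1.0, 0.2], [0.1, 1.5], [0.1, 1.5]]
trans = [[0.5, 0.0], [0.0, 0.8]]
print(viterbi(emit, trans, 2, 3))  # -> [0, 1, 1]
```

Training, the other step, adjusts the potentials so that the gold labelings score highest; decoding itself is unchanged once the parameters are fixed.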


This motivated us to seek to incorporate such incomplete annotations into a state-of-the-art machine learning technique. One of the recent advances in statistical NLP is Conditional Random Fields (CRFs) (Lafferty et al., 2001), which evaluate the global consistency of the complete structures for both parameter estimation and structure inference, instead of optimizing the local configurations independently. This feature is suited to many NLP tasks that include correlations between elements in the output structure, such as the interrelation of part-of-speech (POS) tags in a sentence. However, conventional CRF algorithms require fully annotated sentences. To incorporate incomplete annotations into CRFs, we extend the structured output problem in Section 3. We focus on partial annotations and ambiguous annotations in this paper.

These results reveal that a radical resampling (leaving only 1000 negative instances), when using Conditional Random Fields, does not have a dramatic effect on performance. While recall increases by almost 10% (from 0.89 to 0.98), precision suffers a strong decrease, in this case 94% (from 0.97 to 0.03). With scores nearing or above 90% in precision, recall and F-measure, it seems safe to assume that using linguistic, statistical and structural features combined with CRF dramatically improves a DE system. In comparison with previous work in this field, where most datasets consisted of more structured text than interview transcripts, it also seems reasonable to claim that this method is better suited for more unstructured language.

2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237 Japan {jun, mcd, isozaki}@cslab.kecl.ntt.co.jp
Abstract
This paper proposes a framework for training Conditional Random Fields (CRFs) to optimize multivariate evaluation measures, including non-linear measures such as F-score. Our proposed framework is derived from an error minimization approach that provides a simple solution for directly optimizing any evaluation measure. Specifically focusing on sequential segmentation tasks, i.e. text chunking and named entity recognition, we introduce a loss function that closely reflects the target evaluation measure for these tasks, namely, segmentation F-score. Our experiments show that our method performs better than standard CRF training.

5 Factorial Conditional Random Fields
Extensions to the linear-chain CRF model have been proposed in previous research efforts to encode long-range dependencies. One such well-known extension is the semi-Markov CRF (semi-CRF) (Sarawagi and Cohen, 2005). Motivated by the hidden semi-Markov model, the semi-CRF is particularly helpful in text chunking tasks as it allows a state to persist for a certain interval of time steps. In practice this often leads to better modeling of chunks, since state transitions within a chunk need not precisely follow the Markov property as in the case of the linear-chain CRF. However, it is not clear how such a model can benefit our task, which requires word-level labeling in addition to sentence boundary detection and sentence type prediction.
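The semi-CRF idea of scoring whole segments rather than individual tokens can be illustrated with a brute-force search over segmentations. The function, the maximum segment length, and the toy scoring rule are all invented for illustration; real semi-CRFs use dynamic programming rather than recursion over all segmentations.

```python
# Semi-CRF-style decoding sketch: a label persists over a whole segment,
# and each segment (start, end) is scored as a unit.
def best_segmentation(n, max_len, seg_score):
    """Return (score, segments) for the highest-scoring way to split
    positions 0..n-1 into segments of length <= max_len."""
    best = (float("-inf"), [])

    def rec(start, acc, total):
        nonlocal best
        if start == n:
            if total > best[0]:
                best = (total, list(acc))
            return
        for end in range(start + 1, min(start + max_len, n) + 1):
            acc.append((start, end))
            rec(end, acc, total + seg_score(start, end))
            acc.pop()

    rec(0, [], 0.0)
    return best

# Toy segment score that prefers length-2 chunks.
score, segs = best_segmentation(4, 3, lambda s, e: 1.0 if e - s == 2 else 0.0)
print(segs)  # -> [(0, 2), (2, 4)]
```

The key contrast with a linear-chain CRF is visible in the signature of `seg_score`: it sees an entire segment at once, so chunk-level features (length, boundaries) can be used directly instead of being forced through token-level Markov transitions.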


on top of the generative model introduced in Lu et al. (2008) (referred to as the LNLZ08 system).
We first present a baseline model by directly “inverting” the LNLZ08 system, where an NL sentence is generated word by word. We call this model the direct inversion model. This model is unable to capture some long-range global dependencies over the entire NL sentence to be generated. To tackle several weaknesses exhibited by the baseline model, we next introduce an alternative, novel model that performs generation at the phrase level. Motivated by conditional random fields (CRF) (Lafferty et al., 2001), a different parameterization of the conditional probability of the hybrid tree is used, which enables the model to encode some longer-range dependencies among phrases and MRs. This novel model is referred to as the tree CRF-based model.


This paper reports how to build a Chinese Grammatical Error Diagnosis system based on conditional random fields (CRF). The system can find four types of grammatical errors in learners’ essays: redundant words, missing words, bad word selection, and disordered words. Our system achieved the best false positive rate in the 2015 NLP-TEA-2 CGED shared task, as well as the best precision rate at all three diagnosis levels.

b_t(s_t) = P(H_{1,1}, H_{1,2}, ..., H_{t,m−1}, H_{t,m})   (2)
where H_{t,m} is a binary variable indicating the truthfulness of the m-th hypothesis at turn t.
For each turn, the model takes into account all the slots on the N-best lists from the first turn up to the current one, and those slots predicted to be true are added to the dialog state. The graphical model is illustrated in figure 1. To predict the dialog state at turn t, the N-best items from turn 1 to t are all considered. Hypotheses assigned true labels are included in the dialog state. Compared to the DBN approach, the dialog states are built ‘jointly’. This approach is reasonable because what the tracker generates is just some combination of all N-best lists in a session, and there is no point guessing beyond SLU outputs. We leverage general structured Conditional Random Fields (CRFs) to model the probabilities of the N-best items, where factors are used to strengthen local dependency. Since the CRF is a discriminative model, arbitrary overlapping features can be added, which is commonly considered an advantage over generative models.

In this paper, we applied conditional random fields to the labelling of a segmented TV stream where video segments are described with robust descriptors. The TV stream was segmented with two different segmentation processes, each leading to a specific dataset: manual and automatic. Our goal was to identify five kinds of broadcasts in each dataset. We obtained interesting results on the manual dataset, where precision and recall reached up to 90%. Results are lower on the automatic dataset, especially in multiple labelling, where we noticed many confusions between labels. Nevertheless, the CRF’s results exceed those of other classification methods such as Hidden Markov Models, which is also a probabilistic graphical model. Indeed, the CRF’s capability to handle the sequential context between video segments makes it possible to separate different kinds of programs and breaks, even when they are described with very simple features. Of course, this approach chiefly relies on the quality of the stream pre-processing steps. Dealing with the automatically segmented data is thus more challenging, especially for the multiple labelling task, which leads to high confusion between certain labels (commercial vs. jingle, commercial vs. sponsorship, ...). This weakness can be explained by the over-segmentation of the automatic dataset: broadcasts are divided into many consecutive segments whose features are not informative enough to discriminate.



Our framework eliminates the problem of overfitting and offers the full advantages of a Bayesian treatment. Unlike the ML approach, we estimate the posterior distribution of the model parameters during training and average over this posterior during inference. We apply an extension of the EP method, the power EP method, to incorporate the partition function. For algorithmic stability and accuracy, we flatten the approximation structures to avoid two-level approximations. We demonstrate the superior prediction accuracy of BCRFs over conditional random fields trained with ML or MAP on synthetic and real datasets.
