In the current study we tested two possibilities for the relationship between labels and object representations using a neurocomputational model to capture recent empirical data (Twomey & Westermann, 2016). The target data showed that learned labels affect 10-month-old infants’ looking times in a silent familiarization phase, suggesting that knowing a label for an object directly affects its representation, even when that object is presented in silence. As noted by T&W, both the compound representations and labels-as-features accounts predict some effect of labels on object representations; however, the empirical data could not reveal which of these two accounts best explained the observed pattern of results. To untangle these two possibilities, we implemented both accounts in simple autoencoder models (cf. Mareschal & French, 2000; Twomey & Westermann, 2015). In the compound representations model we instantiated labels on the output layer. This model learned to associate labels with inputs over time, such that the presence of visual/haptic input for an object would consistently activate the label; nonetheless, label representations remained separate from visual and haptic object representations (Westermann & Mareschal, 2014). In the labels-as-features model, labels were represented on the input as well as the output layer, with the same status as the visual and haptic components of object representations (Gliozzi et al., 2009; Sloutsky & Fisher, 2004). Only the labels-as-features model captured the more rapid decrease in looking to the no-label stimulus exhibited by the infants in T&W’s empirical study.
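The labels-as-features manipulation can be sketched in a few lines. The following is a toy illustration, not the model reported in the study: the layer sizes, learning rate, and feature values are all invented. It shows how, once a label unit has been trained as part of the object representation, presenting the object in silence inflates network error, the quantity that such models link to looking time:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(pattern, hidden=8, lr=0.1, epochs=500):
    """Minimal one-hidden-layer autoencoder trained by backprop to
    reproduce its input on its output (illustrative only)."""
    n = pattern.shape[1]
    W1 = rng.normal(0.0, 0.5, (n, hidden))
    W2 = rng.normal(0.0, 0.5, (hidden, n))
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        h = sig(pattern @ W1)
        y = sig(h @ W2)
        delta = (y - pattern) * y * (1 - y)   # output-layer error signal
        dW2 = h.T @ delta
        dh = delta @ W2.T
        W2 -= lr * dW2
        W1 -= lr * pattern.T @ (dh * h * (1 - h))
    return W1, W2

def network_error(W1, W2, x, target):
    """Sum-squared output error, the model's analogue of looking time."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    y = sig(sig(x @ W1) @ W2)
    return float(np.sum((y - target) ** 2))

# Hypothetical encoding: 6 visual/haptic features plus 2 label units.
features = rng.uniform(0.2, 0.8, 6)
labeled = np.concatenate([features, [1.0, 0.0]])[None, :]  # labels-as-features pattern
silent = np.concatenate([features, [0.0, 0.0]])[None, :]   # same object, no label

W1, W2 = train_autoencoder(labeled)
e_trained = network_error(W1, W2, labeled, labeled)
e_silent = network_error(W1, W2, silent, silent)
print(e_trained, e_silent)  # label stripped at test: mismatch raises error
```

Because the label was learned as part of the object representation, dropping it at test produces the representation/reality mismatch the text describes, and the error (looking time) rises for the previously labeled stimulus only.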
RE in parietal cortex has been found in at least two other fMRI studies of visual object priming (Dolan et al., 1997; Eger et al., 2007). Although both studies differed methodologically from the present paradigm, their results reinforce an important role of parietal cortex in visual object priming. The findings most directly related to the present view-specific repetition effects in IPS come from a study by James et al. (2002). These authors found fMRI response reductions in both lateral occipital cortex and caudal IPS during blocks in which objects were repeated multiple times, but these reductions generalised over depth rotations only in lateral occipital cortex. Thus, IPS responded to repeated (intact) objects only when shown in the same view, as in our study, though this response was RS rather than RE, unlike in our study. This discrepancy between RS and RE may relate to differences between the two paradigms: the James et al. paradigm involved only passive viewing, and objects were repeated many times within a block (more typical of an “fMR adaptation” paradigm; e.g., Grill-Spector et al., 1999), such that their RS reflected a reduction in the mean response throughout a block. This reduced average response may be caused by reduced attentional demands across the block, as participants begin to expect the same visual image on every trial. Nonetheless, despite the different direction of repetition effects, James et al.'s conclusion that the dorsal stream codes object identity in a strictly view-based fashion is consistent with our proposal that the IPS maintains holistic object representations. Indeed, a metrically veridical representation (rather than an abstract part-based one) would make sense for guiding actions via the dorsal stream (see Introduction; Milner and Goodale, 1995).
Thus, although the exact nature of RS vs. RE in parietal cortex is difficult to establish, it is clear that both are view-specific and correlate with behavioural priming, suggesting a role in visual object recognition.
Forty-two consecutive patients with unilateral strokes were recruited from the rehabilitation ward of the Ospedali Riuniti in Trieste (age: mean = 64.2, Standard Deviation (SD) = 11.0 years; education: mean = 9.4, SD = 3.7 years). Only patients with no previous neurological history, and for whom CT or MRI scans were available, were included. Each patient’s lesion was mapped into standard space using MRIcro by a neuroradiologist (M.U.). Twenty-five neurologically healthy individuals, matched to the patient group for age and education (age: mean = 66, SD = 11; education: mean = 8.96, SD = 4.1), were recruited from patients’ and staff’s relatives, as well as from the rehabilitation ward of the Ospedali Riuniti in Trieste, where they were treated following orthopedic surgery. The Revised Standardized Difference Test (RSDT) was used to detect impaired performance in single patients relative to controls (Crawford & Garthwaite, 2006). The scatter plots in Figure 7 show modified t-values calculated with the procedure described in Crawford and Garthwaite (2006). Thirty-five of the patients completed the object identification and object use tasks using the same set of 29 real objects (bottle, cigarette, coffee mug, comb, dust cloth, eraser, fork, glass, gun, hammer, iron, jug, key, knife, ladle, lemon squeezer, light bulb, lipstick, match stick, paintbrush, pen, razor, saw, scissors, screwdriver, wrench, spoon, tennis racket, tooth brush). Seven patients completed the same tasks with a subset (n = 20) of those objects (excluding: bottle, dust cloth, fork, glass, knife, ladle, match stick, racket, tooth brush). Patients were compared to controls taking into account only the items common to patients and controls (control performance for all 29 items: naming: mean = 99.2, SD = 1.5; object use: mean = 96.3, SD = 4.4; control performance for 20 items: naming: mean = 99.6, SD = 1.4; object use: mean = 96.5, SD = 4.6).
No feedback (either positive or negative) was given to either patients or controls.
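The single-case comparison behind such analyses can be illustrated with the Crawford-Howell (1998) modified t-test, the building block that the RSDT of Crawford and Garthwaite (2006) extends to standardized differences between two tasks. The control scores below are hypothetical, chosen only to roughly match the naming statistics reported above (mean ≈ 99.2):

```python
from math import sqrt

def crawford_howell_t(case_score, control_scores):
    """Modified t comparing one patient's score against a small control
    sample (Crawford & Howell, 1998); df = n - 1."""
    n = len(control_scores)
    mean = sum(control_scores) / n
    sd = sqrt(sum((x - mean) ** 2 for x in control_scores) / (n - 1))
    t = (case_score - mean) / (sd * sqrt((n + 1) / n))
    return t, n - 1

# Hypothetical control naming scores (% correct) and a patient scoring 90%.
controls = [99, 100, 98, 100, 99, 100, 97, 100, 99, 100]
t_val, df = crawford_howell_t(90.0, controls)
print(round(t_val, 2), df)  # strongly negative t -> impaired relative to controls
```

The resulting t is referred to a t-distribution with n − 1 degrees of freedom; the `sqrt((n + 1)/n)` term corrects for the uncertainty of estimating the control mean and SD from a small sample.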
Some theorists (e.g. Caramazza, Hillis, Rapp & Romani, 1990) suggest that representations of items that share some meaning are clustered together in the semantic system, and that category-specific deficits reflect instances where the neural structures that hold the representations for only one cluster have been damaged. Other theorists (e.g. Humphreys & Forde, 2001) consider that a "semantic" deficit could reflect damage to the structural description system, because objects that have a similar meaning are also likely to share visual features. For example, the structural description of an animal is likely to include four legs, a tail and fur. Thus an inability to identify these features will disproportionately affect objects in the animal category, sparing the ability to describe fruit and vegetables. Capitani, Laiacona, Mahon and Caramazza (2003) reviewed 79 published reports of patients with category-specific deficits in an attempt to examine these two positions and concluded that the structural description system operates relatively independently of conceptual knowledge. That is, it appears that category-specific semantic deficits arise from damage to the semantic system and are not artefacts of damage to other knowledge stores.
The model’s behaviour on a specific task is determined by the schemas present in the schema network, the object representations present in the object representation network, and numerous parameters that control the flow of activation within and between the networks. Previous work has demonstrated that, with appropriate schemas, object representations and parameter settings, the model produces well-formed sequences of action in tasks such as preparing a cup of instant coffee and packing a child’s lunchbox. When the parameters are varied, however, for example by reducing the propagation of top-down excitation in the schema network or adding normally distributed random noise to activation levels in any of the networks, the model produces action errors ranging from slips and lapses similar to those of neurologically healthy individuals when distracted (Reason, 1979) to disorganised behaviour similar to that of patients who have suffered a closed head injury or unilateral stroke (cf. Cooper et al., 2005). Thus, increased noise in the schema network may lead to sequential errors when a schema becomes active either before its preconditions are met (an anticipation error) or after it has already been successfully completed (a perseverative error). Similarly, such noise may lead to a schema failing to be activated above threshold when it should be (an omission error), being activated above threshold when it is not appropriate to the task at hand (an action addition), or being deselected in favour of a related but currently inappropriate schema (a blend or capture error).
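The way noise produces these error classes can be illustrated with a toy schema-selection loop. This is an illustration of the principle only, not the Cooper and Shallice implementation: the activation increments, threshold, and noise levels are invented.

```python
import random

def run_trial(noise_sd, steps=50, threshold=0.6, rng=None):
    """The target schema receives steady top-down excitation; a competitor
    receives none. Noise can push the competitor over threshold first
    (a capture/anticipation-style error) or keep both below threshold
    for the whole trial (an omission)."""
    rng = rng or random.Random()
    target, competitor = 0.0, 0.0
    for _ in range(steps):
        target = min(1.0, max(0.0, target + 0.05 + rng.gauss(0, noise_sd)))
        competitor = min(1.0, max(0.0, competitor + rng.gauss(0, noise_sd)))
        if competitor >= threshold and competitor > target:
            return "capture"
        if target >= threshold:
            return "correct"
    return "omission"  # neither schema reached threshold in time

def error_rate(noise_sd, trials=500, seed=1):
    rng = random.Random(seed)
    return sum(run_trial(noise_sd, rng=rng) != "correct"
               for _ in range(trials)) / trials

rate_clean = error_rate(0.0)   # noise-free: well-formed behaviour
rate_noisy = error_rate(0.3)   # noisy: capture and omission errors appear
print(rate_clean, rate_noisy)
```

With zero noise the target schema always wins; as the noise SD grows, the proportion of trials ending in capture or omission errors grows with it, mirroring the model's shift from well-formed action to disorganised behaviour.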
information. In our LaF model, labels were represented on the input as well as on the output layer in exactly the same way as the visual and haptic components of object representations. Only the LaF model captured the longer looking to the previously labeled stimulus exhibited by the infants in Twomey and Westermann’s empirical study. These results offer converging evidence that labels may have a low-level, featural status in infants’ early representations. In line with recent computational work, we chose to explore such low-level accounts using a simple associative model that could account for the nuances of recent empirical data. Our LaF model offers a parsimonious account of Twomey and Westermann’s results, in which looking time differences emerge from a low-level novelty effect, without the need to specify qualitatively different, top-down representations. Specifically, as argued in earlier work, and as implemented in the LaF model, over background training the label is learned as part of the object representation. Thus, when the object appears without the label there is a mismatch between representation and reality. This mismatch leads to an increase in network error for the previously labeled stimulus only, which has been interpreted in the literature as a model of longer looking times. Further, these results delineate between the two possible explanations for infants’ behavior in the empirical task; specifically, our results support accounts of early word learning in which labels are initially encoded as low-level, perceptual features and integrated into object representations.
representation and in the other by two or more modality-specific object representations. According to the crossmodal model, each sense modality represents particular objects: the visual system represents particular objects, the auditory system represents particular objects, and there are principles that ensure that when the same distal object is represented both visually and auditorily, information about its non-modality-specific properties is integrated. The result of the integration is that, with respect to features that can be perceived with more than one sense, the visual-object representation and the auditory-object representation represent the object as having the same features, e.g., as being at the same location, occurring at the same time, and so on; but the visual-object representation will also represent the object as having features specific to vision, and the auditory-object representation will also represent it as having features specific to audition. Perception is crossmodal in the sense that information from other senses contributes to and helps to determine what is represented in any particular sense modality. But it is not amodal, because there are distinct object-representations of the same distal or environmental object in each of the sense modalities, and these distinct sense-specific object-representations explain our perceptual awareness of objects perceived with each of the senses.
Regarding recognition from still images, Ullman introduced the representation of 3D objects based on view exemplars [Ullman and Basri, 1991], and several recent approaches use a sample of appearance views deliberately taken to build a model [Rothganger et al., 2003, Ferrari et al., 2004, Savarese and Fei-Fei, 2007]. The authors of [Savarese and Fei-Fei, 2007] propose to learn object category models encoding shape and appearance from multiple images of the same object category by relating homographies between the same plane in multiple views. [Rothganger et al., 2003] extract 3D object representations based on local affine-invariant descriptors of their images and the spatial relationships between surface patches. To match them to test images, they apply appearance feature correspondences and a RANSAC procedure for selecting the inliers subject to the geometric constraints between candidate matching surface patches. In terms of models, Liebelt’s approach [Liebelt et al., 2008] is very close to ours, since it works with a view space of rendered views of 3D models. Appearance features are selected based on their discriminativity with respect to aspect as well as object category, and they are matched to single images in the standard benchmark datasets.
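The RANSAC inlier-selection step works the same way in any model-fitting setting. A generic sketch, in which 2-D line fitting stands in for the surface-patch geometry of the actual matching pipeline and all data are invented: a minimal sample is drawn repeatedly, a candidate model is fitted, and the model retaining the most inliers wins, discarding gross outliers (here, stand-ins for wrong feature correspondences).

```python
import random

def ransac_line(points, iters=200, tol=0.1, seed=0):
    """Generic RANSAC: fit y = a*x + b to random 2-point samples and keep
    the candidate model with the most inliers within tolerance `tol`."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # degenerate minimal sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# 20 points on y = 2x + 1 plus 5 gross outliers.
pts = [(x / 10, 2 * (x / 10) + 1) for x in range(20)]
pts += [(0.5, 9.0), (1.2, -4.0), (0.3, 7.5), (1.6, 0.0), (0.9, 8.2)]
(a, b), inliers = ransac_line(pts)
print(round(a, 2), round(b, 2), len(inliers))  # recovers the line; 20 inliers
```

In the matching context described above, the minimal sample would instead be a set of candidate surface-patch correspondences and the model a pose hypothesis, but the select-by-consensus logic is identical.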
The difficulty with this line of argument is that appeals to our ordinary conception of the self can cut both ways. We understand the self to be the bearer of our mental representations, so to that extent B fits the template of self-hood. However, we also understand the self to have the properties listed in the ‘Appearance’ column of the table above yet, as the illusion model itself concedes, neither B nor any other candidate for self-hood has these properties. I have used the fact that B meets a certain key criterion of self-hood as a reason to regard B as the self, but an advocate of VST can use the fact that B fails to meet certain criteria of self-hood as a reason not to regard B as the self, and indeed can use those criteria to conclude that no actual entity should be regarded as a self. Perhaps this line of argument can be resisted by claiming that some criteria are more important than others: if being the bearer of mental representations is a core criterion of self-hood, perhaps that gives us a reason to regard B as the self that trumps the other considerations raised. However, although I find it plausible that bearing mental representations is fundamental to our conception of self-hood, it is doubtful that a non-question-begging case could be made for such a conclusion. As such, we must adopt a different tack to show that the illusion model is preferable to the hallucination model.
(Nippold, 1995). As children get older there is a tendency to include more than one characteristic in their definitions. Moreover, prior to 7 years of age children’s definitions are simple, often focussing on perceptual or functional information (Benelli, Arcuri, & Marchesini, 1988; Storck & Looft, 1973) and lacking in superordinate terms (Watson, 1995). In contrast, children over the age of 7 produce definitions that are more precise, include conventional social information (Benelli et al., 1988; Litowitz, 1977) and gradually include superordinates (Snow, 1990; Watson, 1995). The inclusion of relationships between words, e.g. superordinate, subordinate or inclusion relationships, captures relationships between word meanings and is germane to hypotheses about WFDs. Such relationships provide the possibility of gaining insight into the organisation of children’s semantic categories. Furthermore, there is evidence to indicate that word class is a critical factor. Formal definitions of verbs are organised differently to those of objects (Gentner, 1978, 1982; Miller, 1991). There are fewer superordinates available for verbs and fewer verb superordinates produced by children, with little or no developmental change in the production of verb superordinates (Skwarchuk & Anglin, 1997). The few studies that have compared children’s skills at defining verbs and nouns have highlighted significant differences between the word classes (Anglin, 1985; Nelson, 1978). Thus, definitions meet the requirement to assess semantic representations, demonstrate developmental progress and provide word class effects.
We presented a method that learns word meanings from video paired with sentences. Unlike prior work, our method deals with realistic video scenes labeled with whole sentences, not individual words labeling hand-delineated objects or events. The experiment shows that it can correctly learn the meaning representations, in terms of HMM parameters for our lexical entries, from highly ambiguous training data. Our maximum-likelihood method makes use of only positive sentential labels. As such, it might require more training data for convergence than a method that also makes use of negative training sentences that are not true of a given video. Such negative labels can be handled with discriminative training, a topic we plan to address in the future. We believe that this will allow learning larger lexicons from more complex video without excessive amounts of training data.
To conclude, Figure 10 shows an example of trajectory-based video description using spatiotemporal object trajectories of two faces and the corresponding object prototypes (frontal and profile faces). Only the true tracks are computed by the proposed algorithm; false detections and associated tracks are filtered out using skin color segmentation and postprocessing. Video results are available at http://www.elec.qmul.ac.uk/staffinfo/andrea/detrack.html.
Entailment Relations Turkers were instructed to assign the lowest score when they could not understand the consequent of the entailment relation. As a baseline, 1000 randomly sampled implications that meet our patterns have a quality of 0.33. Figure 4b shows that extracting high-quality entailments is harder than extracting object-object relations, likely because supposition and consequent need to coordinate. Relations involving furniture are rated higher, and manual inspection revealed that many relations about furniture imply stative verbs or spatial terms. Generalized Relations To evaluate generalizations (Figure 4c), we also present users with definitions. As a baseline, 200 randomly sampled gen-
Finally, we come to the third type of 3-D model, which is based on surface representations. There are two types of surface primitive (or surface patch): planar patches and curved patches. Although there is no universal agreement about which is best, the planar patch approach is quite popular and yields polyhedral approximations of the object being modelled. This is quite an appropriate representation for man-made objects, which tend predominantly to comprise planar surfaces. It is not, however, a panacea for 3-D representational problems, and it would appear that many of the subtleties of 3-D shape description cannot be addressed with simplistic first-order planar representations. Nevertheless, it does have its uses and, even for naturally curved objects, it can provide quite a good approximation to the true shape if an appropriate patch size is used.
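A planar patch can be fitted to a point set by least squares, and the residual then quantifies how well a first-order representation approximates the true shape. A minimal sketch with invented data (the plane normal is taken from the smallest singular vector of the centred points, a standard total-least-squares construction):

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a 3-D point patch: centroid plus the
    normal given by the smallest singular vector of the centred points.
    Returns the centroid, the unit normal, and the worst residual."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]                        # direction of least variance
    residuals = (pts - centroid) @ normal  # signed orthogonal distances
    return centroid, normal, float(np.abs(residuals).max())

# A truly planar patch (z = 0) is fitted essentially exactly ...
flat = [(x, y, 0.0) for x in range(4) for y in range(4)]
_, _, err_flat = fit_plane(flat)

# ... while a curved patch (z = x^2/10) leaves a residual that shrinks
# as the patch size shrinks, as the text suggests.
curved = [(x, y, x * x / 10) for x in range(4) for y in range(4)]
_, _, err_curved = fit_plane(curved)
print(err_flat, err_curved)
```

The residual for the curved patch illustrates the trade-off in the text: a polyhedral (piecewise-planar) approximation of a curved object can be made as accurate as required by choosing a sufficiently small patch size.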
Together, these studies illustrate how VR can be applied to gain novel insights into the neural mechanisms of spatial navigation in rodents. However, there are important differences between rodent and primate spatial navigation (see Zhao, 2018, for a recent review). In humans, body-based cues may not always be critical for efficient navigation (e.g. Waller and Greenauer, 2007). A greater reliance on visual cues in humans and non-human primates during navigation may lead to differences between the representation of VR and real-world environments across species (Ekstrom, 2015). In rodents, the proportion of photoreceptors on the retina is substantially lower; rodents also have limited binocular overlap and are more dependent on head movements to cover their visual surroundings (Huberman and Niell, 2011). The primate visual system, in contrast, is able to extract spatial information over larger distances, and some spatial cell types observed in the primate MTL are primarily tuned to visual information (Ekstrom et al., 2003). Moreover, Killian et al. (2012) found evidence for grid cells in the entorhinal cortex of non-human primates during visual exploration of distant space, a finding that was subsequently replicated and extended in humans by means of functional magnetic resonance imaging (fMRI), suggesting that the representation of the visual search space in 2D may drive the signal (Julian et al., 2018; Nau et al., 2018b). These findings suggest that primates, in contrast to rodents, may entertain multiple representations of space in parallel when lying in the scanner or when navigating in VR, resulting in a
Low-dimensional vector representations of objects [1, 2, 3] are a key part of many computer systems. Object embedding makes it possible to perform calculations between discrete objects, so that mathematical methods can be used to find and quantify the relationships between them, including but not limited to computing cosine distances, computing clusters, and so on. This approach is widely used in user profiling and recommendation systems. In this paper, we discuss a novel method of neural embedding that uses the attributes of objects as prior information to obtain better object vector representations. Logs contain existing prior knowledge which can be used to find the relationships between objects and to build a model.
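The cosine calculation mentioned above is straightforward once objects are embedded. A minimal sketch with invented 4-dimensional embeddings (real systems would use learned vectors of much higher dimensionality):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors: 1 for identical
    directions, ~0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical embeddings: the first two objects appeared in similar
# attribute/log contexts during training, the third in unrelated ones.
item_a = [0.9, 0.1, 0.8, 0.2]
item_b = [0.8, 0.2, 0.9, 0.1]
item_c = [0.1, 0.9, 0.1, 0.9]
sim_ab = cosine_similarity(item_a, item_b)
sim_ac = cosine_similarity(item_a, item_c)
print(sim_ab, sim_ac)  # related pair scores much higher
```

Cosine distance is simply `1 - cosine_similarity(u, v)`; clustering and nearest-neighbour retrieval over the embedding space build on the same primitive.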
In this chapter, we propose a method to integrate multiple representations directly into the clustering algorithm. Our method is based on the density-based clustering algorithm DBSCAN [EKSX96], which provides several advantages over other algorithms, especially when analyzing noisy data. Since our method employs a separate feature space for each representation, it is not necessary to design a new suitable distance measure for each new application. Additionally, the handling of objects that do not provide all possible representations is integrated naturally, without defining dummy values to compensate for the missing representations. Last but not least, our method does not require a combined index structure, but benefits from each index that is provided for a single representation. Thus, it is possible to employ highly specialized index structures and filters for each representation. We evaluate our method for two example applications. The first is a data set consisting of protein sequences and text descriptions. Additionally, we applied our method to the clustering of images retrieved from the internet. For this second data set, we employed two different similarity models. The introduced solutions were published in [KKPS04a].
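The key idea (a separate feature space per representation, with missing representations skipped rather than padded with dummy values) can be sketched as a toy DBSCAN variant. This is an illustration of the principle only, not the published algorithm: the combination rule chosen here (neighbours must be eps-close in every representation both objects share) and the 1-D "representations" are invented.

```python
def neighbours(data, i, eps):
    """j is a neighbour of i if they are eps-close in every representation
    both objects actually provide; missing representations are skipped."""
    out = []
    for j in range(len(data)):
        shared = [r for r in data[i] if r in data[j]]
        if shared and all(abs(data[i][r] - data[j][r]) <= eps[r] for r in shared):
            out.append(j)
    return out

def dbscan_multi(data, eps, min_pts):
    """Plain DBSCAN over the multi-representation neighbourhood above."""
    labels = [None] * len(data)  # None = unvisited, -1 = noise
    cluster = 0
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        seeds = neighbours(data, i, eps)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] in (None, -1):
                labels[j] = cluster
                more = neighbours(data, j, eps)
                if len(more) >= min_pts:   # j is a core object: expand
                    seeds.extend(k for k in more if labels[k] is None)
        cluster += 1
    return labels

# Each object is a dict of 1-d "representations"; object 4 lacks "text"
# (e.g. a protein sequence without a text description).
data = [
    {"seq": 0.0, "text": 0.1}, {"seq": 0.1, "text": 0.0}, {"seq": 0.2, "text": 0.2},
    {"seq": 5.0, "text": 5.1}, {"seq": 5.1},              {"seq": 5.2, "text": 5.0},
]
result = dbscan_multi(data, eps={"seq": 0.5, "text": 0.5}, min_pts=2)
print(result)  # object 4 joins the second cluster via its one available space
```

Because each representation keeps its own distance and eps, no combined distance measure has to be designed, and the object with a missing representation is clustered using whatever spaces it does provide.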
decision maker would have had sufficient experience with the choice domain (in our example, movies) so as to build latent attribute representations of some dimensionality L for the various objects in the domain. When presented with a set of available objects at the time of choice (Toy Story or Star Wars), our framework predicts that the decision maker would first map the objects onto their latent attributes. These multiattribute representations would then be aggregated into choices using one of the decision rules we have specified above. In the case of WAD, each attribute of the two movies would be assigned a separate continuous weight and the decision maker would choose the object with the higher weighted additive utility; in the case of WP, the objects would be compared against each other on every attribute dimension, and these binary comparisons would be aggregated with continuous weights to determine the object with the highest weighted pros or cons; in the case of EW, the decision maker would use equal weights (+1 for desirable latent attributes or −1 for undesirable latent attributes) so as to give each attribute the same importance when computing utilities; for TAL, each latent attribute would be transformed into a binary representation prior to aggregation, so as to tally the total number of good vs. bad attributes; for LEX, only a single latent attribute would be used to calculate the relative desirabilities of the objects, and the object with the highest or lowest value on this attribute would be chosen; and for FFT, the decision maker would go through attributes sequentially, and choose one of the two objects if an exit condition on the attribute in consideration is satisfied.
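Several of these decision rules are compact enough to state directly as code. A schematic sketch of WAD, EW, TAL and LEX over invented latent-attribute vectors (the attribute values, weights, and signs are made up for illustration; WP and FFT are omitted for brevity):

```python
def wad(objects, weights):
    """Weighted additive: choose the object with the highest weighted utility."""
    return max(range(len(objects)),
               key=lambda i: sum(w * a for w, a in zip(weights, objects[i])))

def ew(objects, signs):
    """Equal weights: +1/-1 per attribute, each attribute equally important."""
    return max(range(len(objects)),
               key=lambda i: sum(s * a for s, a in zip(signs, objects[i])))

def tal(objects, signs, cut=0.5):
    """Tallying: binarise each attribute first, then count good minus bad."""
    return max(range(len(objects)),
               key=lambda i: sum(s * (1 if a > cut else 0)
                                 for s, a in zip(signs, objects[i])))

def lex(objects, best_attr, maximise=True):
    """Lexicographic: decide on the single most important latent attribute."""
    key = lambda i: objects[i][best_attr]
    ranked = max if maximise else min
    return ranked(range(len(objects)), key=key)

# Two movies described by three hypothetical latent attributes in [0, 1].
toy_story = [0.9, 0.2, 0.6]
star_wars = [0.4, 0.8, 0.7]
movies = [toy_story, star_wars]

choice_wad = wad(movies, weights=[1.0, 0.3, 0.2])
choice_ew = ew(movies, signs=[+1, -1, +1])
choice_tal = tal(movies, signs=[+1, -1, +1])
choice_lex = lex(movies, best_attr=1)
print(choice_wad, choice_ew, choice_tal, choice_lex)
```

Note how the rules can disagree on the same latent representations: here the compensatory and tallying rules favour the first movie, while the lexicographic rule keyed to the second attribute favours the other, which is exactly the kind of divergence the framework uses to distinguish decision strategies.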
Our argument could also go wrong in describing the nature of adults’ number concepts. If instead of arriving at There, children arrive at Elsewhere, then the problems we raised may be irrelevant. We assumed that children eventually attain a concept (i.e., mental representation) of the positive integers as distinct individuals that obey the Dedekind-Peano axioms – what Hodes calls numerical-individual concepts. But several commentators (see Table R1C) question this assumption, suggesting instead that even adults may have no more than probabilistically defined number concepts (Morris & Masnick) or metalinguistic concepts of numerals (Hodes). Others doubt our conjecture that adults’ number concepts are imposed top-down by schemas reflecting the axioms (Noël, Grégoire, Meert, & Seron [Noël et al.]). Perhaps such concepts are use-based (Smith) or reflect an underlying notion of equivalence between sets through one-to-one mapping of their elements (Decock; Pietroski & Lidz). We discuss these possibilities in section R5. The pigeonholing in Table R1 is one way to organize the commentaries, but it obviously does not capture their full import. Many commentators’ points fall into more than one of our categories. For example, if you believe that adults’ representations of the positive integers involve equivalence relations on sets of objects, then you presumably think that the starting point for understanding numbers is the ability to represent sets, and that the route from early to mature representations takes sets into equivalence classes of these sets. The likely There carries implications for Here and for the route between them. Our purpose is only to indicate the relative emphasis of the critiques, and we have located them in Table R1 at the position where we believe their points are most telling.
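For reference, the Dedekind-Peano axioms invoked here can be stated, in one standard second-order formulation over the positive integers with successor function $S$ and an arbitrary set $A$, as:

```latex
\begin{align*}
&\text{(P1)}\quad 1 \in \mathbb{N} \\
&\text{(P2)}\quad \forall n \in \mathbb{N}:\ S(n) \in \mathbb{N} \\
&\text{(P3)}\quad \forall n \in \mathbb{N}:\ S(n) \neq 1 \\
&\text{(P4)}\quad \forall m, n \in \mathbb{N}:\ S(m) = S(n) \rightarrow m = n \\
&\text{(P5)}\quad \bigl(1 \in A \wedge \forall n\,(n \in A \rightarrow S(n) \in A)\bigr) \rightarrow \mathbb{N} \subseteq A
\end{align*}
```

The induction axiom (P5) is the schema-like, top-down component: it is what forces the positive integers to be exactly the distinct individuals generated from 1 by the successor function.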
Within each of the three groups in Table R1, we have also ordered the commentaries roughly from those taking more conservative approaches (i.e., defending current theories) to those adopting novel proposals. This ordering suffers from the usual problem of having to collapse dimensions, and we hope readers won’t take our listing as more than an outline.
Eyetracking data were successfully collected in 16 of 18 subjects using an infrared video eyetracking system (iView X MRI, 50 Hz, SensoMotoric Instruments, Teltow, Germany). For each run, the horizontal eye movement data were low-pass filtered and drift corrections were performed. As a measure of fixation reliability, we computed the percentage of recorded eye gaze positions during stimulus presentation within a 1.93° visual angle circle around the center of the fixation cross. This radius corresponded to the eccentricity of the inner edges of the two stimulus-containing boxes (see Fig. 1A). In addition, we computed the number of saccades to the intact objects and the noise stimuli, separately for the attended and the unattended condition. Saccades were defined as events of at least three consecutive data points in velocity space exceeding a velocity criterion of 30°/s. Saccades were counted as object-directed or noise-directed saccades when their endpoint was located within the object-containing box or the noise-containing box, respectively.
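The stated saccade criterion (at least three consecutive samples exceeding 30°/s) can be implemented directly. The sketch below is illustrative rather than the authors' analysis code, and the 50 Hz gaze trace is synthetic:

```python
def detect_saccades(gaze_deg, hz=50, vel_thresh=30.0, min_samples=3):
    """Velocity-criterion saccade detection: report runs of at least
    `min_samples` consecutive samples whose point-to-point velocity
    exceeds `vel_thresh` (deg/s). Returns (start, end) sample indices."""
    dt = 1.0 / hz
    velocity = [abs(b - a) / dt for a, b in zip(gaze_deg, gaze_deg[1:])]
    saccades, run_start = [], None
    for i, v in enumerate(velocity):
        if v > vel_thresh:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_samples:
                saccades.append((run_start, i))
            run_start = None
    if run_start is not None and len(velocity) - run_start >= min_samples:
        saccades.append((run_start, len(velocity)))
    return saccades

# Synthetic 50 Hz horizontal trace: fixation, a 3.2° rightward saccade
# spread over 4 samples (0.8°/sample = 40°/s), then fixation again.
trace = [0, 0, 0, 0, 0.8, 1.6, 2.4, 3.2, 3.2, 3.2]
result = detect_saccades(trace)
print(result)  # one saccade spanning the high-velocity run
```

Classifying each detected saccade as object-directed or noise-directed would then only require checking whether the gaze position at the run's endpoint falls inside the object-containing or the noise-containing box.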