2 Background and Related Work
2.4 UCREL Semantic Analysis System
2.4.1 English Semantic Tagger
2.4.1.1 Semantic tagset
The categories representing different semantic fields are symbolized by codes referred to as "semantic tags", and together these tags form a "semantic tagset". The tagset employed by the USAS framework was loosely based on the categorization used in LLOCE (McArthur, 1981; see section 2.3.1.1.2). The UCREL team considered this the most appropriate thesaurus-type classification for the kind of sense analysis their semantic tagger was intended to perform. The tagset has since been expanded and amended in the light of practical tagging problems encountered in the course of the research (Archer, Wilson, & Rayson, 2002, p. 2).
The present USAS tagset is arranged into a hierarchy of 21 top-level semantic categories, which are further expanded into 232 subcategories. The tagset is designed to describe everything that exists in the world or can be imagined, whether concrete entities or abstract concepts. Each category contains words that are semantically related to each other. These words may be synonyms, antonyms, hyponyms, or meronyms, and they may belong to any part of speech. Table 1 below displays the top-level semantic categories of the hierarchy.
Table 1
Top-Level Semantic Categories of the USAS Semantic Tagset

Tag  Semantic category
A    General & Abstract Terms
B    The Body & The Individual
C    Arts & Crafts
E    Emotional Actions, States, & Processes
F    Food & Farming
G    Government & The Public Domain
H    Architecture, Building, Houses, & The Home
I    Money & Commerce
K    Entertainment, Sports, & Games
L    Life & Living Things
M    Movement, Location, Travel, & Transport
N    Numbers & Measurement
O    Substances, Materials, Objects, & Equipment
P    Education
Q    Linguistic Actions, States, & Processes
S    Social Actions, States, & Processes
T    Time
W    The World & Our Environment
X    Psychological Actions, States, & Processes
Y    Science & Technology
Z    Names & Grammatical Words
A list of all top-level categories and subcategories is presented in English in Appendix A and in Finnish in Appendix B. The reader is advised to consult these appendices whenever a semantic tag is neither explained nor clear from context.
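Because every tag begins with the letter of its top-level category, as described in the next paragraph, Table 1 can be read as a simple lookup from letter to field. The Python sketch below is an illustration only, not part of the USAS software. It encodes the table as a dictionary. The names `TOP_LEVEL` and `top_level_field` are hypothetical.

```python
# Top-level USAS categories as listed in Table 1 (letter -> field name).
# Illustrative sketch only; not taken from the USAS distribution.
TOP_LEVEL = {
    "A": "General & Abstract Terms",
    "B": "The Body & The Individual",
    "C": "Arts & Crafts",
    "E": "Emotional Actions, States, & Processes",
    "F": "Food & Farming",
    "G": "Government & The Public Domain",
    "H": "Architecture, Building, Houses, & The Home",
    "I": "Money & Commerce",
    "K": "Entertainment, Sports, & Games",
    "L": "Life & Living Things",
    "M": "Movement, Location, Travel, & Transport",
    "N": "Numbers & Measurement",
    "O": "Substances, Materials, Objects, & Equipment",
    "P": "Education",
    "Q": "Linguistic Actions, States, & Processes",
    "S": "Social Actions, States, & Processes",
    "T": "Time",
    "W": "The World & Our Environment",
    "X": "Psychological Actions, States, & Processes",
    "Y": "Science & Technology",
    "Z": "Names & Grammatical Words",
}


def top_level_field(tag: str) -> str:
    """Return the top-level field named by the first letter of a tag."""
    letter = tag[:1]
    if letter not in TOP_LEVEL:
        raise ValueError(f"unknown top-level category in tag {tag!r}")
    return TOP_LEVEL[letter]


print(len(TOP_LEVEL))             # 21
print(top_level_field("L3"))      # Life & Living Things
print(top_level_field("T1.1.3"))  # Time
```

The letters D, J, R, U and V are absent from Table 1, so the lookup rejects tags that begin with them.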
A semantic tag consists of various markers, as described in Archer et al. (2002, pp. 1–2). A tag always begins with an upper-case letter, which indicates the top-level semantic category. This letter is followed by a number, which indicates the first subdivision within the field. The simplest possible semantic tag therefore contains one upper-case letter and one number. For example, the tag for "sentimental" is E1 ("Emotional Actions, States and Processes: General"), and the tag for "daffodil" is L3 ("Plants"). If there are further subdivisions, one or two more numbers can be added, separated by full stops (e.g. the tag for the verb "reschedule" is T1.1 ("Time: General"), and the tag for "tomorrow" is T1.1.3 ("Time: General: Future")). According to Piao et al. (2005a), the depth of the semantic hierarchy is limited to a maximum of three layers, since this has been found to be the most feasible approach. In theory, it would be possible to keep adding layers of subdivision until no further subclassification of meaning is possible, but semantic field analysis schemes that are too complex may cause problems in practical analysis. That said, the existing semantic categories can be subdivided for a particular task if need be, since the hierarchical structure allows the system to be amended easily.
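The letter-and-number structure makes each tag's position in the hierarchy recoverable from the tag itself. The following Python sketch expands a tag into the chain of categories above it. It assumes, as the examples suggest, that at most three dot-separated numbers follow the letter and that a number may have more than one digit. The function name `hierarchy` is hypothetical. Plus/minus, gender and slash markers are deliberately left out.

```python
import re

# Assumed shape: one upper-case letter plus one to three dot-separated
# numbers, e.g. "E1", "T1.1", "T1.1.3". Numbers may have several digits.
CORE = re.compile(r"([A-Z])(\d+(?:\.\d+){0,2})")


def hierarchy(tag: str) -> list[str]:
    """Return the codes from the top-level letter down to `tag` itself."""
    m = CORE.fullmatch(tag)
    if m is None:
        # Also rejects tags deeper than three numeric layers.
        raise ValueError(f"not a letter-and-number USAS code: {tag!r}")
    letter, numbers = m.group(1), m.group(2).split(".")
    chain = [letter]
    for depth in range(1, len(numbers) + 1):
        chain.append(letter + ".".join(numbers[:depth]))
    return chain


print(hierarchy("L3"))      # ['L', 'L3']
print(hierarchy("T1.1.3"))  # ['T', 'T1', 'T1.1', 'T1.1.3']
```

A task-specific subdivision beyond three layers, such as a hypothetical "T1.1.3.1", would be rejected here. Supporting one would only require relaxing the `{0,2}` quantifier.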
In addition to letters and numbers, a semantic tag may sometimes require one, two, or even three plus or minus markers to indicate antonymous pairs or a positive or negative position on a semantic scale. For example, "old" is tagged T3+, whereas "young" is tagged T3-; "accessory" is tagged N5++ and "inferiority" A5.1--; "archaic" is tagged T3+++, whereas "avant-garde" is tagged T3---. Similarly, plus and minus markers are used to express inflected comparative and superlative forms of adjectives and adverbs. For example, the adjective "easy" has been assigned the semantic tag A12+, the comparative form "easier" A12++, and the superlative form "easiest" A12+++. Moreover, the markers "m" and "f" are used to indicate gender. For example, the semantic tag S4f is used for "aunt" and S4m for "bridegroom".
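The markers described above can be read as a signed position on a scale plus an optional gender. The Python sketch below is hypothetical and is not the USAS tagger's own code. It splits a single tag into its code, a polarity from +3 to -3, and a gender marker. It assumes that the plus or minus markers come directly after the code and that the gender marker, if any, comes last. The names `ParsedTag` and `parse_tag` are invented for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: core code, then up to three "+" or up to three "-",
# then an optional gender marker "m" or "f".
TAG = re.compile(r"([A-Z]\d+(?:\.\d+){0,2})(\+{1,3}|-{1,3})?([mf])?")


class ParsedTag(NamedTuple):
    code: str              # e.g. "A12"
    polarity: int          # +3 .. -3; 0 when no plus/minus marker
    gender: Optional[str]  # "m", "f" or None


def parse_tag(tag: str) -> ParsedTag:
    m = TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"unrecognised USAS tag: {tag!r}")
    code, signs, gender = m.groups()
    signs = signs or ""
    # The number of markers gives the strength; "+" is positive, "-" negative.
    polarity = len(signs) if signs.startswith("+") else -len(signs)
    return ParsedTag(code, polarity, gender)


for t in ["A12+", "A12++", "A12+++", "A5.1--", "T3---", "S4f"]:
    p = parse_tag(t)
    print(t, p.code, p.polarity, p.gender)
```

Representing the markers as a signed integer makes the comparative scale explicit: "easy", "easier" and "easiest" map to polarities 1, 2 and 3 on A12.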
As noted in section 2.3.1.3, not all words fall neatly into predefined semantic categories. Categories are better described as somewhat "fuzzy" sets, in which one word can belong to two or even three categories. In the USAS framework, such multiple membership is indicated by a "slash tag" (also known as a "portmanteau tag"). By way of illustration, "classroom" is tagged P1/H2, since it can be considered to belong both to the category "Education in General" (P1) and to the category "Parts of Buildings" (H2). "Neurotic" is tagged B2-/X1. Here B2 represents the category "Health and Disease", so B2- stands for ill health or disease, and X1 represents the category "Psychological Actions, States, and Processes in General". "Tattoo" is tagged C1/B1 ("Arts and Crafts" / "Anatomy and Physiology"). The verb "improve" is tagged A5.1+/A2.1, combining A5.1+ ("Evaluation: Good/Bad", with the plus marker indicating "good") and A2.1 ("Affect: Modify, Change"); the slash tag thus stands for "change into good". These markers will be discussed in more detail in section 3.4, with many Finnish-language examples from the equivalent semantic lexical resources for Finnish. In addition, the USAS tagset uses five other symbols (Archer et al., 2002, p. 2). These will not be discussed here, since they are relatively rare in the English semantic lexical resources and do not appear at all in the Finnish ones.
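Since a slash tag joins ordinary tags with "/", multiple category membership can be recovered by splitting on the slash. This Python sketch assumes that each component is a well-formed single tag that may carry plus/minus and gender markers. It does not enforce the usual limit of two or three components. The names `split_slash_tag` and `fields` are hypothetical.

```python
import re

# Assumed shape of one component: letter, one to three dot-separated
# numbers, optional plus/minus markers, optional gender marker.
SINGLE = re.compile(r"[A-Z]\d+(?:\.\d+){0,2}(?:\+{1,3}|-{1,3})?[mf]?")


def split_slash_tag(tag: str) -> list[str]:
    """Split a slash (portmanteau) tag into its component tags."""
    parts = tag.split("/")
    for part in parts:
        if SINGLE.fullmatch(part) is None:
            raise ValueError(f"unrecognised component {part!r} in {tag!r}")
    return parts


def fields(tag: str) -> set[str]:
    """Top-level categories (first letters) that the tagged word belongs to."""
    return {part[0] for part in split_slash_tag(tag)}


print(split_slash_tag("B2-/X1"))      # ['B2-', 'X1']
print(split_slash_tag("A5.1+/A2.1"))  # ['A5.1+', 'A2.1']
print(sorted(fields("P1/H2")))        # ['H', 'P']
```

A slash tag can also combine two subcategories of the same top-level field, as with "improve" (A5.1+/A2.1), so `fields` may return a single letter.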
Unlike many other present-day semantic taxonomies, the USAS semantic tagset is concept-driven rather than content-driven. It aims to provide a conception of the world that is as general as possible, instead of offering a semantic network for specific domains (Piao et al., 2005a). When a finer-grained taxonomy is needed for a certain task or purpose, the present system can be expanded relatively easily by adding new levels of subcategories or by using more specific slash tags.