While requirements expressed in natural language have the advantage of being intelligible to both clients and developers, they can of course also be ambiguous, vague and incomplete. Although formal languages could be used as an alternative that eliminates some of these problems, customers are rarely equipped with the mathematical and technical expertise for understanding highly formalised requirements. To benefit from the advantages of both natural language and formal representations, we propose to induce the latter automatically from text in a semantic parsing task. Given the software requirement in Example (1), for instance, we would like to construct a representation that explicitly specifies the types of the entities involved (e.g., object(account)) and that captures explicit and inferable relationships among them (e.g., owns(user, account)). We expect such formal representations to be helpful in detecting errors at an early stage of the development process (e.g., via logical inference and verification tools), thus avoiding the costs of finding and fixing problems at a later and hence more expensive stage (Boehm and Basili, 2001).
generates example natural language utterances for these logical forms. Specifically, the authors of the parser specify a seed lexicon with canonical phrase/predicate pairs for a particular domain, and subsequently a generic grammar constructs canonical utterances paired with logical forms. Because the canonical utterances may be ungrammatical or stilted, they are then paraphrased by crowd workers into more natural queries in the target language. We argue that this approach has three limitations when constructing semantic parsers for new domains: (1) the seed utterances may induce bias towards the language of the canonical utterance, specifically with regard to lexical choice, (2) the generic grammar suggested cannot be used to generate all the queries we may want to support in a new domain, and (3) there is no check on the correctness or naturalness of the canonical utterances themselves, which may not be logically plausible. This is problematic as even unlikely canonical utterances can be paraphrased fluently.
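The seed-lexicon plus generic-grammar pipeline described above can be sketched as follows. This is a minimal illustration of the idea, not the cited system: the lexicon entries, the predicate names (Article, PublicationDate) and the single grammar rule are hypothetical.

```python
# Hypothetical seed lexicon: canonical phrase -> domain predicate.
SEED_LEXICON = {
    "article": "Article",
    "published in": "PublicationDate",
}

def build_canonical(entity_phrase, relation_phrase, value):
    """Apply one generic grammar rule, '<entity> whose <relation> is
    <value>', to seed-lexicon entries, yielding a (canonical utterance,
    logical form) pair."""
    utterance = f"{entity_phrase} whose {relation_phrase} is {value}"
    logical_form = (
        f"{SEED_LEXICON[entity_phrase]}."
        f"{SEED_LEXICON[relation_phrase]}={value}"
    )
    return utterance, logical_form

pair = build_canonical("article", "published in", "2015")
# -> ("article whose published in is 2015", "Article.PublicationDate=2015")
```

The generated utterance "article whose published in is 2015" is exactly the kind of stilted canonical form that crowd workers would then paraphrase, e.g. to "articles from 2015".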
reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people. To enable automation, integration and reuse of data, resources on the web need to be encoded in structured, machine-readable descriptions of their contents, expressed using terms defined in a domain ontology. Ontologies are sets of knowledge terms, including the vocabulary, the semantic interconnections, and some simple rules of inference and logic for some particular topic (Gruber, 1993). In order to create, interpret and compare metadata annotations, ontologies, that is, explicit definitions of the vocabulary used in an information source, are needed. While metadata can often be generated from the content of an information source, interpreting and comparing these data requires the machine to understand the different ontologies involved, both new and existing ones.
FrameNet is a web-based, general-domain semantic lexicon that implements semantic frame theory. Initially started by Baker et al. (1998), it has continued to grow and now contains more than 1,200 semantic frames (Baker, 2017). For every semantic frame in FrameNet, the following information is given: frame title, definition, frame elements and lexical units (LUs). LUs are words that evoke the frame, represented as a combination of their lemmatised form and part-of-speech (POS) tag. The concept of keeping an object, for example, which is stored in FrameNet as a semantic frame entitled Storing, is evoked by the LU save.v, where v stands for verb, among other LUs. Its core frame elements, which are essential to understanding the meaning of the frame, include Agent, Location and Theme. FrameNet also catalogues non-core frame elements, which are used to enhance the understanding of a specific frame. For the Storing frame, Manner, Duration and Explanation are considered non-core elements.
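The record structure just described can be made concrete with a small sketch. The field names below are our own; the data for the Storing frame is taken directly from the text.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """Minimal stand-in for a FrameNet frame record."""
    title: str
    lexical_units: list      # lemma.pos strings that evoke the frame
    core_elements: list      # essential to the frame's meaning
    non_core_elements: list  # enhance understanding of the frame

storing = Frame(
    title="Storing",
    lexical_units=["save.v"],  # among other LUs
    core_elements=["Agent", "Location", "Theme"],
    non_core_elements=["Manner", "Duration", "Explanation"],
)
```

A frame-semantic parser would look up which frames a lemma.pos pair evokes, e.g. mapping the verb "save" to the Storing frame via the LU save.v.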
The study of connecting ontologies (CO) is a new research direction. Initial investigations examined original domain-level ontologies for heterogeneity and explored how to create a new ontology that covers the original ontologies collaboratively and consistently, drawing on ontology-grouping techniques such as ontology mapping, ontology alignment and ontology merging. In 2007, the paper by Shuaib Karim  presented a CO application framework that does not need to cover the original ontologies, focusing instead on transfer principles and intermediate concepts among the original ontologies. In 2008, Cregan Anne  proposed building semantic interoperability through CO and gave some CO examples from the Gene Ontology. However, Cregan Anne's paper does not address the manners of connecting in CO, the incentives for connecting, or the methods and critical content for building semantic interoperability. We also note that the Linked Open Data  initiative has become the existing foundation for a federated Web of Data.
a corresponding knowledge base K. The learning algorithm (Section 7.1) estimates the parameters of a linear model for ranking the possible entries in GEN(x, O). Unlike much previous work (e.g., Zettlemoyer and Collins (2005)), we do not induce a CCG lexicon. The lexicon is open domain, using no symbols from the ontology O for K. This allows us to write a single set of lexical templates that are reused in every domain (Section 5.1). The burden of learning word meaning is shifted to the second, ontology matching, stage of parsing (Section 5.2), and modeled with a number of new features (Section 7.2) as part of the joint model. Evaluation We evaluate on held-out question-answer pairs in two benchmark domains, Freebase and GeoQuery. Following Cai and Yates (2013a), we also report a cross-domain evaluation where the Freebase data is divided by topics such as sports, film, and business. This condition ensures that the test data has a large percentage of previously unseen words, allowing us to measure the effectiveness of the real-time ontology matching.
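To illustrate what a domain-independent lexical template might look like, the sketch below introduces, for each content word, a placeholder logical constant derived from the word itself rather than from any ontology symbol. The template shapes and names here are hypothetical, not those of the cited work; the point is only that the word-to-ontology mapping is deferred to a later matching stage.

```python
def apply_template(word, pos):
    """Return a CCG-style lexical entry whose logical constant is a
    placeholder built from the word itself, not from the ontology O.
    Only two toy templates are shown."""
    if pos == "NN":
        # Nouns become unary placeholder predicates.
        return (word, "N", f"lambda x.{word}(x)")
    if pos == "VB":
        # Transitive verbs become binary placeholder predicates.
        return (word, r"(S\NP)/NP", f"lambda y.lambda x.{word}(x,y)")
    raise ValueError(f"no template for POS {pos}")

entry = apply_template("border", "VB")
# The placeholder border(x,y) would later be matched against ontology
# relations (e.g. a borders/adjoins relation) by the second stage.
```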
Lastly, the technologies required for implementing the ontology-based compliance framework, based on a semantic rule-based approach, were discussed. The approach proposed in this research is an answer to the demand for Privacy by Design (PbD). PbD is a recently introduced concept in the domain of compliance with data protection laws. Its aim is to ensure that software products and business processes treat compliance with data protection and privacy as essential in the very early stages of their development, namely design. KN-SoPD is a useful conceptual model that can serve as a reference model for software developers or compliance officers, instructing them in a stage-by-stage process for making a system or organisation compliant. The advantage of this model is its unique umbrella of different resources, from requirements engineering to laws, and from standards to best practices. Although KN-SoPD has been developed here for compliance with privacy law, the general conceptual model is designed in a way that allows it to be used for compliance with any law. The automated tool designed here (AU-SoPD) provides a very easy-to-use environment for developers. The user interface only requires users to instantiate the defined ontological concepts with individuals from the system or business context and to fill in the ontological facts. The logical reasoner and the rules defined in this approach automatically determine where the application of a law is necessary in the system context and show the elicited requirements. The tool has been developed in such a way that all requirements, from laws to standards and best practices, are derived and shown at the same time. In addition, the query-based interface enables the user to trace any extracted requirement back to the parent compliance resource from which it was refined. AU-SoPD can be used as a huge repository and knowledge-representation environment for compliance with data protection.
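The rule-based elicitation idea, facts instantiating ontological concepts from the system context and rules mapping those facts to requirements, can be sketched as follows. This is a deliberately minimal illustration: the concept names, the facts and the single rule are invented for the example and are not taken from KN-SoPD or AU-SoPD.

```python
# Facts: instantiations of (hypothetical) ontological concepts with
# individuals from the system context.
facts = {
    ("OnlineShop", "processes"): "PersonalData",
    ("OnlineShop", "operates_in"): "EU",
}

# Rules: (condition over the facts, requirement elicited when it holds).
rules = [
    (lambda f: f.get(("OnlineShop", "processes")) == "PersonalData"
               and f.get(("OnlineShop", "operates_in")) == "EU",
     "Obtain explicit consent before processing personal data"),
]

def elicit(facts, rules):
    """Fire every rule whose condition holds over the facts and
    collect the elicited requirements."""
    return [req for cond, req in rules if cond(facts)]

requirements = elicit(facts, rules)
```

In the actual framework this role is played by an ontology reasoner over defined semantic rules; a plain-Python condition stands in for a rule here only to keep the sketch self-contained.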
Requirements engineers and software developers have to meet users' wishes in order to create new software products. Such descriptions of software functionalities can be expressed in different ways: for example, by using controlled languages or formal methods, clarity and completeness can be achieved. But non-experts can hardly apply them and therefore do not belong to the user group. For this reason, users are encouraged to express their individual requirements for the desired software application in NL in order to improve user acceptance and satisfaction (Firesmith, 2005). In general, software requirements are expressed through active verbs such as "to calculate" or "to publish" (Robertson and Robertson, 2012). In this work, we distinguish between requirements expressed in controlled and in uncontrolled NL.
With the rapid growth of internet technologies, the web has become the world's largest repository of knowledge. It is therefore a challenging task for webmasters to organise the contents of their websites to meet the needs of users. An optimised search engine is used to effectively search the keyword queries that users issue most frequently, based on their interests. The most frequently used keyword queries are encrypted using homomorphic encryption, keeping the search mechanism semantically secure. This paper presents a new framework for semantic-enhanced Web Page Recommendation (WPR) and a suite of enabling techniques, which include semantic network models of domain knowledge and Web usage knowledge, querying techniques, and Web-page recommendation strategies. A time-based ranking technique is used to calculate the time each user spends visiting the recommended web pages and to keep a record of the number of users. The privacy of the user is thus enhanced by using the clustered web pages on the search engine. The proposed technique also provides a set of clustered web pages that are effectively filtered through private keyword-query search, and provides better web-page features in order to satisfy the user's requirements.
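One plausible reading of the time-based ranking technique is sketched below: record each user's visit duration per recommended page, then rank pages by total time spent and number of distinct visitors. The data and the exact ranking key are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

# Hypothetical visit log: (user, page, seconds spent on the page).
visits = [
    ("u1", "/flights", 120),
    ("u2", "/flights", 90),
    ("u1", "/hotels", 30),
]

def rank_pages(visits):
    """Rank pages by total visit time, breaking ties by the number
    of distinct users who visited the page."""
    time_spent = defaultdict(int)
    users = defaultdict(set)
    for user, page, seconds in visits:
        time_spent[page] += seconds
        users[page].add(user)
    return sorted(time_spent,
                  key=lambda p: (time_spent[p], len(users[p])),
                  reverse=True)

ranking = rank_pages(visits)  # -> ["/flights", "/hotels"]
```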
In this section we describe our new sequence-to-sequence/tree parser using paraphrase attention. The motivation behind our parser is as follows. Existing attention-based sequence-to-sequence/tree parsers are slow to retrain, since both the encoder and the decoder need to be retrained simultaneously to achieve satisfactory accuracy, hindering real-time domain adaptation. While it may be possible to freeze the decoder parameters and finetune only the encoder, this is still slow, since the error gradients need to be propagated all the way back to the encoder. Freezing the encoder parameters and finetuning just the decoder results in poor performance (shown in the evaluation, Section 4), since the model fails to learn the proper encoder representation corresponding to new OOV words.
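The freezing trade-off can be illustrated with a toy model (not the authors' parser): one scalar "encoder" weight and one "decoder" weight trained by plain gradient descent on squared error. With the encoder frozen, the update touches only the decoder weight and no gradient flows back, which is cheap but limits what can be learned.

```python
def finetune(x, y, enc_w, dec_w, freeze_encoder, lr=0.01, steps=200):
    """Gradient descent on 0.5 * (dec_w * enc_w * x - y)^2.
    If freeze_encoder is True, enc_w is never updated and gradients
    stop at the decoder, mimicking decoder-only finetuning."""
    for _ in range(steps):
        h = enc_w * x        # "encoder" representation
        pred = dec_w * h     # "decoder" output
        err = pred - y
        dec_w -= lr * err * h            # decoder always updated
        if not freeze_encoder:
            enc_w -= lr * err * dec_w * x  # backprop into encoder
    return enc_w, dec_w

# Decoder-only finetuning: enc_w stays fixed at 1.0 while dec_w
# converges towards the value that fits the target.
enc_w, dec_w = finetune(x=2.0, y=6.0, enc_w=1.0, dec_w=0.5,
                        freeze_encoder=True)
```

In the toy case the frozen model can still fit the target, because a single weight suffices; the paragraph's point is that in a real parser a frozen encoder cannot produce useful representations for new OOV words, so no decoder adjustment can compensate.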
Abstract. Creating an unambiguous requirements specification with a precise domain vocabulary is crucial for capturing the essence of any software system, whether developing a new system or recovering knowledge from a legacy one. Software specifications usually maintain noun notions and include them in central vocabularies. Verb or adjective phrases are easily forgotten, and their definitions are buried inside imprecise paragraphs of text. This paper proposes a model-based language for the comprehensive treatment of domain knowledge, expressed through constrained natural language phrases that are grouped by nouns and include verbs, adjectives and prepositions. In this language, vocabularies can be formulated to describe behavioural characteristics of a given problem domain. Importantly, these characteristics can be linked from within other specifications, similarly to a wiki. The application logic can be formulated through sequences of imperative subject-predicate sentences containing only links to the phrases in the vocabulary. The paper presents an advanced tooling framework to capture application logic specifications, making them available for automated transformation down to code. The tools were validated through a controlled experiment.
When the semantics of a sentence are not representable in a semantic parser's output schema, parsing will inevitably fail. Detection of these instances is commonly treated as an out-of-domain classification problem. However, there is also a more subtle scenario in which the test data is drawn from the same domain. In addition to formalizing this problem of domain-adjacency, we present a comparison of various baselines that could be used to solve it. We also propose a new, simple sentence representation that emphasizes words which are unexpected. This approach improves the performance of a downstream semantic parser run on in-domain and domain-adjacent instances.
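One plausible instantiation of a representation that emphasizes unexpected words (ours for illustration, not necessarily the paper's) is to weight each word by its surprisal under an in-domain unigram model, with add-one smoothing so unseen words get the highest weight.

```python
import math
from collections import Counter

def surprisal_weights(sentence, domain_counts, vocab_size):
    """Weight each word by -log p(word) under an add-one-smoothed
    in-domain unigram model; rare or unseen words get large weights."""
    total = sum(domain_counts.values()) + vocab_size
    weights = {}
    for word in sentence.split():
        p = (domain_counts.get(word, 0) + 1) / total
        weights[word] = -math.log(p)
    return weights

# Tiny in-domain "corpus" for illustration.
domain_counts = Counter("show me flights from boston to denver".split())
w = surprisal_weights("book flights to denver", domain_counts,
                      vocab_size=10000)
# "book" is unseen in-domain, so it receives the largest weight.
```

These weights could then scale word embeddings before pooling, so that domain-adjacent sentences, whose unexpected words dominate, yield representations far from the in-domain region.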
drivers, such as security, availability, performance, usability, maintainability, and others. In total, there were some eighty-nine high-level requirements to contend with, which is sizeable. The requirements followed the organisational structure as found in . The requirements section was split into different sub-sections, each detailing the requirements for a given subsystem. Different properties of the requirements were also defined, such as attribute information that the overall system should have. Prior to the start of the project, the architects were given sessions in which the project was described, including the application domain and the structure of the requirements, and any questions or concerns were addressed. Prior to conducting the study, these requirements were validated by two people for general acceptability; the semantic content of the document was not altered. The result of this process was that a few grammatical fixes were made, along with the clarification of certain requirements.
Early semantic role labeling was span-based (Gildea and Jurafsky, 2002; Toutanova et al., 2008, inter alia), with spans corresponding to syntactic constituents. But, as in syntactic parsing, there are sometimes theoretical or practical reasons to prefer dependency graphs. To this end, Surdeanu et al. (2008) devised heuristics based on syntactic head rules (Collins, 2003) to transform PropBank (Palmer et al., 2005) annotations into dependencies. Hence, for PropBank at least, there is a very direct connection (through syntax) between spans and dependencies.
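The span-to-dependency conversion rests on picking a head word for each argument span. The sketch below shows the idea with a toy head-percolation table standing in for Collins' rules: collapse the span to its head token, turning a span-based role annotation into a single labelled dependency edge from the predicate.

```python
# Toy head-priority table (a stand-in for Collins' head rules):
# for each phrase label, POS tags to search for, in priority order.
HEAD_POS_PRIORITY = {"NP": ["NN", "NNS", "NNP"], "PP": ["IN"]}

def span_head(tokens, pos_tags, span, phrase_label):
    """Return the index of the head token of the span: the right-most
    token matching the highest-priority POS tag for the phrase label,
    falling back to the last token of the span."""
    start, end = span
    for pos in HEAD_POS_PRIORITY.get(phrase_label, []):
        for i in range(end, start - 1, -1):
            if pos_tags[i] == pos:
                return i
    return end

tokens = ["The", "new", "law", "passed"]
pos_tags = ["DT", "JJ", "NN", "VBD"]
head = span_head(tokens, pos_tags, (0, 2), "NP")
# The NP span "The new law" collapses to its head "law", so the
# span-based ARG annotation becomes the edge passed -ARG-> law.
```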
There has been other recent interest in similar datasets for sentence completion (Zweig et al., 2012) and machine reading (Hermann et al., 2015), but unlike other corpora our data is tied directly to Freebase and requires the execution of a semantic parse to correctly predict the missing entity. This is made more explicit by the fact that one third of the entities in our test set are never seen during training, so without a general approach to query creation and execution there is a limit on a system's performance.
Data For evaluation, we use the navigation task from MacMahon et al. (2006), which includes three environments and the SAIL corpus of instructions and follower traces. Chen and Mooney (2011) segmented the data, aligned traces to instructions, and merged traces created by different subjects. The corpus includes raw sentences, without any form of linguistic annotation. The original collection process (MacMahon et al., 2006) created many uninterpretable instructions and incorrect traces. To focus on the learning and interpretation tasks, we also created a new dataset that includes only accurate instructions labeled with a single, correct execution trace. From this oracle corpus, we randomly sampled 164 instruction sequences (816 sentences) for evaluation, leaving 337 (1863 sentences) for training. This simple effort allows us to measure the effects of noise on the learning approach and provides a resource for building more accurate algorithms. Table 1 compares the two sets.
Supervised training procedures for semantic parsing produce high-quality semantic parsers, but they have difficulty scaling to large databases because of the sheer number of logical constants for which they must see labeled training data. We present a technique for developing semantic parsers for large databases based on a reduction to standard supervised training algorithms, schema matching, and pattern learning. Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely supervised learning algorithm.
Our experiments with word clusters showed that word clusters derived from unlabelled domain texts consistently contribute to greater parsing accuracy, and that both the domain relevance of the unlabelled data and its quantity are major factors in the successful exploitation of word clusters. Experiments with a crowd-sourced PoS lexicon, however, were not as conclusive: whereas supplying the lexicon to the parser often resulted in certain accuracy gains, they were not as large as those for word clusters. This suggests that word clusters created automatically from relevant domain texts are a better tool for dealing with unknown vocabulary than a generic, hand-crafted, wide-coverage lexicon. Another interesting finding was that co-training was most effective when the evaluation parser was not used for creating extra training data (the so-called tri-training technique), and when very short sentences were removed from the automatically labelled data before re-training the evaluation parser.
This example also demonstrated the possibility of using the Level Reference to further facilitate the reading of the revealed results. The Level Reference indicates the degree of criticality of the application domain and the software requirements, showing on a simple color scale how critical they are: red indicates high criticality, yellow indicates average criticality, and green indicates low or no criticality. The Level Reference can, however, be configured in any way, according to the needs and characteristics of each project.
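The default color scale described above can be sketched as a simple mapping. The numeric thresholds below are illustrative assumptions, since the text notes that the Level Reference can be configured per project.

```python
def level_reference(criticality):
    """Map a criticality score in [0, 1] to the traffic-light scale
    described in the text; thresholds are hypothetical defaults."""
    if criticality >= 0.66:
        return "red"      # high criticality
    if criticality >= 0.33:
        return "yellow"   # average criticality
    return "green"        # low or no criticality

color = level_reference(0.8)  # -> "red"
```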