models work well in many NLP tasks (Chang et al., 2007; Mann and McCallum, 2008; Druck et al., 2008; Bellare et al., 2009; Ganchev et al., 2010). Such constraints can be used in a semi-supervised or unsupervised fashion. For example, Mann and McCallum (2008) show that using a CRF in conjunction with auxiliary constraints on unlabeled data significantly outperforms a traditional CRF in information extraction, and Druck et al. (2008) show that using declarative constraints alone for unsupervised learning achieves good results in text classification. We show that declarative constraints can be highly useful for identifying the information structure of scientific documents. In contrast with most previous work, we show that such constraints can improve the performance of a fully supervised model. The constraints are particularly helpful for identifying low-frequency information categories, while still yielding high performance on high-frequency categories.
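As an illustration, here is a minimal numpy sketch of one common form of such a declarative constraint, a generalized-expectation-style penalty: the KL divergence between a declaratively specified target label distribution and the model's expected label distribution on unlabeled data. The category counts, predictions and target prior below are hypothetical, not taken from the cited work.

```python
import numpy as np

def ge_penalty(pred_probs, target_dist):
    """KL(target || expected): penalises the model when its average
    predicted label distribution on unlabeled data drifts away from a
    declaratively specified target distribution."""
    expected = np.clip(pred_probs.mean(axis=0), 1e-12, 1.0)  # model expectation
    target = np.clip(target_dist, 1e-12, 1.0)
    return float(np.sum(target * np.log(target / expected)))

# Hypothetical: 5 unlabeled sentences, 3 information categories.
pred_probs = np.array([[0.7, 0.2, 0.1],
                       [0.6, 0.3, 0.1],
                       [0.2, 0.5, 0.3],
                       [0.1, 0.8, 0.1],
                       [0.3, 0.4, 0.3]])
target = np.array([0.5, 0.3, 0.2])  # assumed prior over categories
print(ge_penalty(pred_probs, target))
```

In training, a term like this would be added to the supervised objective so that unlabeled data can pull the model toward the declared expectations.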
Our experiments show that weakly-supervised learning can be used to identify AZ in scientific documents with good accuracy when only a limited amount of labeled data is available. This is helpful when considering real-world applications and the porting of the approach to different tasks and domains. To the best of our knowledge, no previous work has been done on weakly-supervised learning of information structure according to schemes of the type we have focused on (Teufel and Moens, 2002; Mizuta et al., 2006; Lin et al., 2006; Hirohata et al., 2008; Shatkay et al., 2008; Liakata et al., 2010).
In the past, many methods have been developed to utilise the massive amount of available scientific literature, including methods for measuring the prolificacy of authors and for detecting research trends. Many papers have applied data mining techniques to analyse the literature and summarise disciplines. Much attention has been paid to bibliographic data, leading to science landscapes, explorations of citation networks, or analyses of the inter-relationships of authors. CiteSpace II is an approach based on citation network analysis. IN-SPIRE by Wong et al. is a text analytics tool for identifying research topics over time; Heimerl et al. presented an interactive visual approach for scientific literature classification; and the PaperLens system is another system with similar features, which also shows the most often cited authors and papers for each year. CiteRivers focused on citation by presenting a new representation of citations based on community structure and the underlying topics. Examples of recently published work on visual approaches for scientific literature browsing and search based on topic exploration include the Action Science Explorer, which was designed to structure and analyse a collection of scientific documents for literature overview; Beck et al., which supported paper search and key paper identification through the structure of the citation network; ThoughFlow, which visualised literature collections using topic models to bridge the information gap between activities for research idea generation; and Cite2Vec, which presented citation-driven document exploration through word embeddings. However, very little work aims to facilitate joint analysis of contents, topics and citations, leaving a crucial analysis gap.
Machine Learning for Information Structure Nearly all previous work on automatic detection of information structure has relied on supervised algorithms and, consequently, on corpora consisting of thousands of manually annotated sentences (Teufel and Moens, 2002; Lin et al., 2006; Hirohata et al., 2008; Shatkay et al., 2008; Teufel et al., 2009; Guo and Korhonen, 2011). Recently, Varge et al. (2012) were the first to apply unsupervised learning to the information structure of scientific documents. They applied standard word-level LDA models to the IMRAD scheme for the biomedical domain (along with their own information structure scheme for the aerospace domain). This purely lexical approach ignores other important linguistic phenomena, such as discourse patterns and syntactic properties, which play a role in information structure. Their model's F-score of 35 indeed shows that there is much scope for improvement. Our model integrates a much wider range of linguistic knowledge about the task at both within-document (e.g. discourse patterns) and cross-document (e.g. sentence similarity) levels, and can be flexibly applied to both fully unsupervised and transductive learning scenarios, depending on how much prior knowledge about the task is actually available. Although the transductive learning scenario can realistically occur when developing new corpora or applications, it has not been addressed in previous work on our task.
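For contrast with our approach, the word-level LDA baseline described above can be approximated in a few lines. This is a minimal sketch assuming gensim and a toy sentence corpus, not the cited authors' exact setup:

```python
from gensim import corpora, models

# Toy corpus: each sentence is treated as one LDA document (hypothetical data).
sentences = [["we", "propose", "a", "novel", "method"],
             ["results", "show", "significant", "improvement"],
             ["previous", "work", "used", "supervised", "models"]]

dictionary = corpora.Dictionary(sentences)
bow = [dictionary.doc2bow(s) for s in sentences]

# Plain word-level LDA; one topic per information category is assumed.
lda = models.LdaModel(bow, num_topics=2, id2word=dictionary,
                      passes=10, random_state=0)

# Label each sentence with its most probable topic.
for s, doc in zip(sentences, bow):
    topic = max(lda.get_document_topics(doc), key=lambda t: t[1])[0]
    print(topic, " ".join(s))
```

Because the model sees only bags of words, nothing in it can capture discourse patterns or syntax, which is exactly the limitation noted above.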
We present a novel system providing summaries for Computer Science publications. Through a qualitative user study, we identified the most valuable scenarios for discovery, exploration and understanding of scientific documents. Based on these findings, we built a system that retrieves and summarizes scientific documents for a given information need, either in the form of a free-text query or by choosing categorized values such as scientific tasks, datasets and more. Our system ingested 270,000 papers, and its summarization module aims to generate concise yet detailed summaries. We validated our approach with human experts.
We presented a new framework for automatic induction of declarative knowledge and applied it to constraint-based modeling of the information structure analysis of scientific documents. Our main contribution is a topic-model based method for unsupervised acquisition of lexical, syntactic and discourse knowledge guided by the notion of topics and their key features. We demonstrated that the induced topics and key features can be used with two different unsupervised learning methods – a constrained unsupervised generalized expectation model and a graph clustering formulation. Our results show that this novel framework rivals more supervised alternatives. Our work therefore contributes to the important challenge of automatically inducing declarative knowledge that can reduce the dependence of ML algorithms on manually annotated data.
The article addresses how remnant or transformed colonialist structures continue to shape science and science education, and how that impact might be mitigated within a postcolonial environment in favor of the development of the particular community being addressed. Though cognizant of, and resistant to, the ongoing colonial impact globally and nationally (and any attempts at subjugation, imperialism, and marginalization), this article is not about anticolonial science. Indeed, it is recognized that the postcolonial state of science and science education is not simply defined, and may exist as a mix of the scientific practices of the colonizer and the colonized. The discussion occurs through a generic postcolonial lens and is organized into two main sections. First, the discussion of the postcolonial lens is eased through a consideration of globalization, which is held here to be the new colonialism. The article then uses this lens to interrogate conceptions of science and science education, and to suggest that the mainstream, standard account of what science is seems to represent a globalized (or arguably a Western, modern, secular) conception of science. This standard account of science can act as a gatekeeper to the indigenous ways of being, knowing, and doing of postcolonial populations. The article goes on to suggest that as a postcolonial response, decolonizing science and science education might be possible through practices that are primarily contextually respectful and responsive. That is, localization is suggested as one possible antidote to the deleterious effects of globalization. Trinidad, a postcolonial developing Caribbean nation, is used as an illustration.
The need to group search results by the published date of a web document (HTML or PDF) has increased manyfold, because the aptness or relevance of a result depends on its date. If we analyze research areas in any domain, we can easily understand why researchers and practitioners want to read only current journals, papers or blogs. The technology landscape changes rapidly, and within a year some innovations and findings turn out to be obsolete. Hence it is imperative for any content aggregator to sort documents by published date before providing results to its users. Researchers always need the latest updates in their research area, and hence are most interested in new journals and research papers on their topic. Similarly, for news articles everyone wants to read the latest news, but in the case of blogs both previous and current topics are required for complete details regarding a subject. We tried to extract the published date based on document type, because its meaning changes with document type. For example, the published date for journals, research and news articles is considered to be the last updated date, while for the
The methods used for the project are based on the available scientific literature, legal documents from the tourism sector and related sectors, as well as surveys through questionnaires focused on foreign tourists, with a sample of 100 questionnaires of which 96 were answered. Based on the questionnaires, we will see the tourists' perceptions regarding the development of sustainable tourism as well as the impact of tourism on the environment in the Rugova region. The countries the tourists mainly came from were Macedonia, Albania and Montenegro, but there were also tourists from other countries such as Germany, Switzerland and Italy.
Keyphrases are single or multi-word linguistic units that represent the salient aspects of a document. The task of ranked keyphrase extraction from scientific articles is of great interest to scientific publishers as it helps to recommend articles to readers, highlight missing citations to authors, identify potential reviewers for submissions, and analyze research trends over time (Augenstein et al., 2017). Due to its widespread use, keyphrase extraction has received significant attention from researchers (Kim et al., 2010; Augenstein et al., 2017). However, the task is far from solved, and present systems perform worse than those for many other NLP tasks (Liu et al., 2010). Some of the major challenges are the varied length of the documents to be processed, their structural inconsistency, and developing strategies that can perform well in different domains (Hasan and Ng, 2014).
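As a baseline illustration of ranked keyphrase extraction (not any of the cited systems), here is a minimal sketch that ranks unigram and bigram candidates by TF-IDF weight, using scikit-learn and a hypothetical toy corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus of article texts.
docs = ["keyphrase extraction helps recommend articles to readers",
        "citation networks reveal research trends over time",
        "keyphrase extraction from scientific articles is challenging"]

# Candidate phrases are unigrams and bigrams; scores are TF-IDF weights.
vec = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vec.fit_transform(docs)
terms = vec.get_feature_names_out()

# Rank candidates for the first document by descending score.
row = X[0].toarray().ravel()
ranked = sorted(zip(terms, row), key=lambda t: -t[1])
print([term for term, score in ranked if score > 0][:5])
```

Real systems add candidate filtering (e.g. noun-phrase chunking) and supervised ranking on top of such term statistics.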
In this paper, we described a study that explores the search space of extractive summaries across four different domains. For the news domain we generated all possible extracts of the given documents, and for the literary, scientific, and legal domains we followed a divide-and-conquer approach by chunking the documents into sections, handling each section independently, and combining the resulting scores at the end. We then used the distributions of the evaluation scores to generate the probability density functions (pdfs) for each domain. Various statistical properties of these pdfs helped us assess the difficulty of each domain. Finally, we introduced a new scoring scheme for automatic text summarization systems that can be derived from the pdfs. The new scheme calculates a percentile rank of the ROUGE-1 recall score of a system, which gives scores in the range [0-100]. This lets us see how far each system is from the upper bound, and thus make a better comparison among the systems. The new scoring scheme showed us that while there is a 20.1% gap between the upper bound and the lead baseline for the news domain, closing this gap is difficult, as the percentile rank of the lead baseline system, 99.99%, indicates that the system is already very close to the upper bound.
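The percentile-rank computation itself is straightforward. Below is a minimal sketch: given the ROUGE-1 recall scores of all possible extracts for a domain, a system's score is mapped to the fraction of extracts it matches or outperforms. The score distribution here is a synthetic stand-in, not the paper's data:

```python
import numpy as np

def percentile_rank(system_score, all_extract_scores):
    """Percentile rank of a system's ROUGE-1 recall within the score
    distribution of all possible extracts, on a 0-100 scale."""
    scores = np.asarray(all_extract_scores)
    return 100.0 * np.mean(scores <= system_score)

# Synthetic stand-in for the empirical score distribution of one domain.
rng = np.random.default_rng(0)
all_scores = rng.beta(2, 5, size=100_000)
print(percentile_rank(0.45, all_scores))  # rank of a score of 0.45
```

A rank near 100 means the system is close to the upper bound even when the raw ROUGE gap looks large, which is the paper's point about the lead baseline.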
Indeed, although the abductive concept was originally introduced by Aristotle, it was the American philosopher Charles Sanders Peirce (1839-1914) who developed this approach into an explicit inference theory. Peirce proposed that the traditional modes of inference, induction and deduction, must be supplemented by a third mode, abduction, which he claimed to be qualitatively different from the other two. Peirce's approach can be understood through the example of the study of oral interactions. According to Peirce, the study of oral interactions is generally explained in one of two ways: either communication theory or explicit interaction theory is used as a basis for analyzing actual examples of conversation, or conversation data is taken as the starting point from which to formulate new theoretical concepts and rules. The first, "deductive", model can be found in most linguistic pragmatic approaches, whereas the second, "inductive", model is more dominant in ethnomethodological conversation analysis; but in practical scientific studies, researchers cannot draw conclusions based only on pure deduction or induction. The essence of any scientific process is the inferential step from some fact which may have puzzled researchers from the outset, when setting out theoretical hypotheses to explain it later.
Subsequently, relevance of documents for each micro-topic is decided using two criteria: totality and directness. A cited resource is total if it contains all necessary information to derive the answer for the micro-topic and partial if it only addresses a special case. A cited resource is also said to be direct if the answer can be derived with little intellectual effort from its text, or indirect if the same information requires considerable effort (such as mathematical deduction or reasoning) for the information seeker to reproduce.
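These two binary criteria can be encoded directly as a small data structure; a minimal sketch (the class and field names below are our own illustration, not from the cited annotation scheme):

```python
from dataclasses import dataclass
from enum import Enum

class Totality(Enum):
    TOTAL = "total"      # contains all information needed to derive the answer
    PARTIAL = "partial"  # only addresses a special case

class Directness(Enum):
    DIRECT = "direct"      # answer derivable with little intellectual effort
    INDIRECT = "indirect"  # requires considerable effort to reproduce

@dataclass
class RelevanceJudgment:
    micro_topic: str
    cited_resource: str
    totality: Totality
    directness: Directness

print(RelevanceJudgment("hypothetical micro-topic", "doc-42",
                        Totality.TOTAL, Directness.INDIRECT))
```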
There are also sections of the Docs list that can help you find your documents based on how you view the document, such as which documents are Owned by me, Shared with me, Not in collections and so forth. Some documents may exist in more than one section, such as a document someone shared with you that you have opened.
Although Iran is scientifically advanced and has a well-experienced workforce in different fields, scientific cooperation is still in its early stage there (Harirchi, 2003). However, studies have shown that the production of science has grown very well in Iran, especially in recent years. At the same time, problems such as the limited interaction of Iranian scholars with the scientific community, in comparison with foreign scholars, remain dominant. Some cultural problems, such as the theft of ideas and the lack of mutual trust, have led authors and the executives of research projects to be very cautious and less inclined to work in groups (Osareh, 2005). Interaction and communication are considered the main components in the production of science, and mutual interaction and communication are the essence of scientific growth. Interaction and communication between professors are also among the most important areas of communication in higher education. Scientific communications and interactions mean the group relationships, collaborations and social or scientific interactions of academic teachers both inside and outside the workplace (Ghanei rad, 2006). Interactions, communication and collaborative activities are important in the professional development of faculty members; they are crystallized and made visible in members' participation in teamwork and in team-related community and academic networks such as scientific associations and other scientific communities, including editorial boards, scientific committees, research projects, and group and joint research projects (Nourshahi and Samiei, 2011). The existence of deep interactions and communication between the beneficiaries of education in universities indicates the seriousness of education and, more generally, the development of scientific disciplines. The weakness of academic relations is, conversely, linked to other problems in the field of science. In general, there are several elements such as scientists, universities, academic publications, books,
Academic entrepreneurship (AE) is expected to change public research institutions in a fundamental way to ensure they contribute to wealth and job creation. It is widely accepted that this will be achieved by new kinds of relations between societal actors (universities, governments, industry, NGOs and new intermediary institutions), formal and informal linkages and networks (Etzkowitz and Leydesdorff 2000; Link and Siegel 2007). Empirical studies have shown that academic entrepreneurship is a quite heterogeneous phenomenon: there are strong national and regional differences in how widespread, how intense and how institutionalised academic entrepreneurship is in different types of research organisations (e.g. Goel and Göktepe-Hulten 2017; Grimaldi et al. 2011). It also varies significantly across institutional environments in terms of the barriers and facilitators scientific entrepreneurs encounter (Davey et al. 2015) as well as the characteristics those entrepreneurs have (Werker et al. 2017). An "unevenness" has been observed in how scientific entrepreneurs commercialise, what their role is in commercialisation in different disciplines, and how research (and other) institutions accommodate AE (Kleinman and Osley-Thomas 2014; Tuunainen and Knuuttila 2009).
Articles accepted at full text will be included in the review, and the reported studies will be subject to critical appraisal according to their design and quality. Reviewers will assess the methodologies used in all studies reported in articles accepted at full text. Two reviewers will examine a random subset of at least 25 % of the included studies to assess the repeatability of the study quality assessment. Variation in the methodological and analytical quality of scientific studies will be checked [11, 21–23]. For each study, design elements that reduce susceptibility to bias will be recorded. Following the same approach applied by Sciberras et al., the articles will be categorised according to study design into: before after control impact (BACI) studies, control impact (CI) studies and before after (BA) studies; variation in spatial and temporal scale and habitat heterogeneity will be taken into account as the two main sources of bias. BACI studies, accounting for both spatial and temporal variability in
Abstract. To improve the search performance of retrieval methods using co-citation linkages, this study proposes a technique to enlarge a co-citation network by incorporating satellite documents. This technique specifies satellite documents via full-text searches for terms obtained from documents having co-citation linkages with a seed document; the appropriateness of each co-citation linkage is checked by using the strength of the co-citation context based on the results of parsing documents that cite the seed document. This study evaluates the search performance of the proposed technique with IR experiments. Specifically, the random walk with restart algorithm, which can compute similarities between the seed document and each document in the network, is applied to the enlarged and initial networks. Scores of the normalized discounted cumulative gain (nDCG@K) were then compared. The results indicate that the search performance of the retrieval method using the enlarged network outperforms that of a baseline method using the initial network.
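The random walk with restart (RWR) step is a standard algorithm and can be sketched compactly. Below is a minimal numpy version on a small hypothetical co-citation network; the restart probability and link weights are illustrative, not the paper's settings:

```python
import numpy as np

def random_walk_with_restart(A, seed, restart=0.15, tol=1e-8, max_iter=1000):
    """Stationary scores r = (1 - c) * P^T r + c * e, where P is the
    row-normalized adjacency and e restarts the walk at the seed."""
    P = A / A.sum(axis=1, keepdims=True)       # row-stochastic transitions
    e = np.zeros(A.shape[0]); e[seed] = 1.0    # restart vector at the seed
    r = e.copy()
    for _ in range(max_iter):
        r_new = (1 - restart) * P.T @ r + restart * e
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r

# Hypothetical 4-document co-citation network (symmetric link weights).
A = np.array([[0, 2, 1, 0],
              [2, 0, 1, 1],
              [1, 1, 0, 3],
              [0, 1, 3, 0]], dtype=float)
print(random_walk_with_restart(A, seed=0))  # similarity of each doc to doc 0
```

Ranking documents by these scores and comparing nDCG@K on the enlarged versus initial network reproduces the shape of the evaluation described above.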
The home page provides the user with three options: (1) view the database, (2) upload documents, and (3) search documents related to a specific keyword. The upload option lets the user upload files with the .txt extension and a maximum size of 500 KB to the Django database. Uploading files with any other extension results in file upload failure. The name of the document, being the primary key, is a compulsory input along with the document being uploaded. The name of the file entered by the user MUST NOT contain any spaces and can have a maximum length of 200 characters. As soon as the user uploads a document, RAKE is called and a collective database entry of the document, its entered title and the extracted keywords is made. A snapshot of the database is depicted ahead. The search UI enables the user to enter a keyword or a key phrase, which gets mapped to the keywords of uploaded documents in the database. Consequently, documents whose keywords match the entered keyword are displayed to the user, who then chooses the documents to be summarized. The view database option lets the user browse through all the documents in the database and summarize the documents of his/her choice.
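The upload-time keyword extraction step can be sketched with the rake-nltk package (an assumption: the system may use a different RAKE implementation; rake-nltk also needs NLTK's stopword data downloaded):

```python
from rake_nltk import Rake  # assumed implementation; requires nltk stopwords

def index_document(title, text):
    """Mirror the described flow: upload -> RAKE -> store (title, keywords).
    A real system would persist this entry in the Django database."""
    r = Rake()
    r.extract_keywords_from_text(text)
    return {"title": title, "keywords": r.get_ranked_phrases()[:10]}

entry = index_document("sample_doc",
                       "Automatic keyword extraction identifies the most "
                       "relevant phrases in an uploaded text document.")
print(entry["keywords"])
```

Search then reduces to matching the entered keyword against the stored keyword lists.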
As part of their ongoing research, the Swiss Seismological Service (SED) has developed a new Earthquake Catalogue of Switzerland (ECOS’09), which was made available to the project and will be published officially soon. Although updates to the earthquake catalogue were not identified as having a high potential to reduce the epistemic uncertainty in the hazard, swissnuclear decided to include the catalogue updates as part of the PRP to ensure the compatibility of the PRP results with the new generation of seismic hazard assessments for Switzerland, which will be based on the new catalogue. A new magnitude conversion/scaling relationship was used for the development of the updated Earthquake Catalogue of Switzerland. Using the geometries of the current SP1 source zones and following the procedures for computing the source parameters described by each expert team in the PEGASOS elicitation summaries, the new catalogue data have been used to calculate new activity rates (a-values), b-values, and Mmax values. A hazard feedback analysis based on the new ECOS and the new source parameters showed a significant effect on the hazard results, reducing the mean ground motions, especially at low annual probabilities of exceedance. Furthermore, extensive sensitivity studies on each SP1 parameter for each team were performed in order to identify the most relevant aspects of the SP1 logic trees, and it could be confirmed that the Mmax distributions of the source zones are the most important parameter controlling the hazard results. All other model parameters play only a secondary role with respect to their effect on the hazard results, even if they are scientifically justified to be part of a logic tree. On the other hand, it turned out that the SP1 logic tree of one expert team needed to be trimmed in order to be computationally achievable. Tree trimming in the PRP is done according to very strict quality assurance criteria. As a consequence, the project TFI (Technical Facilitator and Integrator) and the SP4 group had to build a (trimmed) SP1 “hazard logic tree” entering the hazard computation, based on the “scientific logic tree” defined by the experts. There might be some value in the future in letting the experts build a “hazard logic tree” themselves, even if this requires a longer iterative process and a good focus on only the parts of relevance for the hazard.