Source Code Clone Search

Why and How to Control Cloning in Software Artifacts. Elmar Juergens

The paper [107] by Jiang et al. introduces an approach that can be summarized as dynamic equivalence checking. The basic idea is that if two functions are different, they will return different results on the same random input with high probability. Their tool, called EQMiner, detects functionally equivalent functions in C code dynamically by executing them on random inputs. Using this tool, they find 32,996 clusters of similar code in a subset of about 2.8 million lines of the Linux kernel. Using their clone detector Deckard, they report that about 58% of the behaviorally similar code discovered is syntactically different. Since no systematic inspection of the clusters is reported, no precision numbers are available. Again, due to several practical limitations of the approach (e.g., randomization of return values to external API calls), the recall w.r.t. simion detection is unclear. In [1], Al-Ekram et al. search for cloning between different open-source systems using a token-based clone detector. They report that, to their surprise, they found little behaviorally similar code across different systems, although the systems offered related functionality. The clones they did find were typically in areas where the use of common APIs imposed a certain programming style,
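
The random-input idea described in this excerpt can be illustrated with a small sketch. It is not EQMiner, which works on C code fragments; the function names, the integer-only inputs, and the parameters `num_inputs` and `seed` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def cluster_by_behavior(functions, num_inputs=50, seed=0):
    """Group functions whose outputs agree on the same random integer inputs."""
    rng = random.Random(seed)
    inputs = [rng.randint(-1000, 1000) for _ in range(num_inputs)]
    clusters = defaultdict(list)
    for fn in functions:
        # The tuple of outputs serves as a behavioral signature (hypothetical scheme).
        signature = tuple(fn(x) for x in inputs)
        clusters[signature].append(fn.__name__)
    return sorted(clusters.values())


# Three toy candidates: two are behaviorally equal but syntactically different.
def double_by_add(x):
    return x + x


def double_by_mul(x):
    return 2 * x


def square(x):
    return x * x


groups = cluster_by_behavior([double_by_add, double_by_mul, square])
print(groups)
```

Agreement on random inputs only suggests equivalence; it does not prove it, which is one reason precision and recall are hard to state for such an approach.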

Automatic Code Review by Learning the Revision of Source Code

Recently, deep learning (Goodfellow, Bengio, and Courville 2016), a recent breakthrough in machine learning, has been applied in many areas. Software engineering is no exception. Yang et al. applied a Deep Belief Network (DBN) to learn higher-level features from a set of basic features extracted from commits (e.g., lines of code added, lines of code deleted, etc.) to predict buggy commits (Yang et al. 2015). Guo et al. use word embedding and a one- or two-layer Recurrent Neural Network (RNN) to link software subsystem requirements (SSRS) to their corresponding software subsystem design descriptions (SSDD) (Guo, Cheng, and Cleland-Huang 2017). Xu et al. applied word embedding and a convolutional neural network (CNN) to predict semantic links between knowledge units in Stack Overflow (i.e., questions and answers) to help developers better navigate and search the popular knowledge base (Xu et al. 2016). Lee et al. applied word embedding and a CNN to identify developers who should be assigned to fix a bug report (Lee et al. 2017). Mou et al. (Mou et al. 2016) applied a tree-based CNN on abstract syntax trees to detect code snippets of certain patterns. Lam et al. (Lam et al. 2015) combined a deep autoencoder model with an information-retrieval-based model, which shows good results for identifying buggy source code. Huo et al. (Huo, Li, and Zhou 2016; Huo and Li 2017) applied learned unified semantic features based on bug reports in natural language and source code in a programming language for bug localization tasks. Wei and Li (Wei and Li 2017) proposed an end-to-end deep feature learning framework for functional clone detection, which exploits lexical and syntactical information via an AST-based LSTM network.

Clone code detector using Boyer–Moore string search algorithm integrated with ontology editor

It is very difficult to determine where duplicated source code was copied from. Similar source code files are detected according to their methods, fields, properties, etc. These source code files come from various student assignments and projects. Duplicate code is especially hard to find in large projects such as ERP systems and accounting packages. In most applications, duplication arises because programmers copy code from various internet sources to meet their requirements. In this paper, our proposed system applies string search or method search algorithms based on the methods used in the program and the parameters used in each method.

Code clone detection using string based tree matching technique

Quite a number of works detect similarity by representing the code as a tree or graph, and some use string-based or semantic-based detection. Almost all clone detection techniques tend to detect syntactic similarity, and only some detect the semantic part of clones. Baxter, in his work (Baxter et al., 1998), proposes a technique to extract clone pairs of statements, declarations, or sequences of them from C source files. The tool parses source code to build an abstract syntax tree (AST) and compares its subtrees by characterization metrics (hash functions). The parser needs a “full-fledged” syntax analysis for C to build the AST. Baxter's tool expands C macros (define, include, etc.) to compare code portions written with macros. Its computational complexity is O(n), where n is the number of subtrees of the source files. The hash function enables one to do parameterized matching, to detect gapped clones, and to identify clones of code portions in which some statements are reordered. AST approaches are able to transform the source tree to a regular form, as we do in the transformation rules. However, AST-based transformation is generally expensive since it requires full syntax analysis and transformation.
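
As a rough illustration of bucketing AST subtrees by a hash, the sketch below uses Python's standard `ast` module as a stand-in for a C parser. It finds only exact structural matches at statement level; the helper name `find_exact_clones`, the `min_nodes` threshold, and the sample input are assumptions, and nothing here reproduces the parameterized or gapped matching attributed to Baxter's tool.

```python
import ast
from collections import defaultdict


def find_exact_clones(source, min_nodes=5):
    """Return groups of line numbers whose statement subtrees are structurally identical."""
    tree = ast.parse(source)
    buckets = defaultdict(list)
    for node in ast.walk(tree):
        if not isinstance(node, ast.stmt):
            continue
        # Skip tiny subtrees, which would otherwise match almost everywhere.
        if sum(1 for _ in ast.walk(node)) < min_nodes:
            continue
        # ast.dump omits line/column attributes by default, so the string
        # depends only on structure; the dict hashes it into a bucket.
        buckets[ast.dump(node)].append(node.lineno)
    return sorted(lines for lines in buckets.values() if len(lines) > 1)


SAMPLE = """
def a(xs):
    total = 0
    for x in xs:
        total = total + x
    return total

def b(ys):
    total = 0
    for x in ys:
        total = total + x
    return total
"""

print(find_exact_clones(SAMPLE))
```

Each subtree is visited once per enclosing statement here, so this sketch does not achieve the linear bound quoted in the excerpt; it only shows the bucketing step.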

A Combined Weighted Approach to Detect Code Cloning

In this paper, a multi-parameter weighted approach is defined to perform code clone detection. The approach combines textual analysis, token-based analysis, and statistical measures to identify code clones in a software system. This section explores the different approaches available for code clone detection and defines the properties associated with them. Section II discusses the work of earlier researchers. Section III explains the proposed model for code clone detection along with its algorithmic approach. Section IV presents the conclusions drawn from the work.

Z80 H Floating Point FROUND pdf

BMATHO OBJ CODE M STMT SOURCE STATEMENT... OBJ CODE M STMT SOURCE STATEMENT.

Scalable Source Code Similarity Detection in Large Code Repositories

Source code similarity is increasingly used in application development to identify clones, isolate bugs, and find copyright violations. Similar code fragments can be very problematic because errors in the original code must be fixed in every copy. Other maintenance changes, such as extensions or patches, must be applied multiple times. Furthermore, the diversity of coding styles and the flexibility of modern languages make it difficult and cost-ineffective to manually inspect large code repositories. Therefore, detection is only feasible with automatic techniques. We present an efficient and scalable approach for identifying similar code fragments based on fingerprinting of source code control flow graphs. The source code is processed to generate control flow graphs that are then hashed to create a unique fingerprint of the code, capturing semantic as well as syntactic similarity. The fingerprints can then be efficiently stored and retrieved to perform similarity search between code fragments. Experimental results from our prototype implementation support the validity of our approach and show its effectiveness and efficiency in comparison with other solutions.
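
The excerpt does not say how the control flow graphs are hashed, so the following is only one possible sketch of the idea, using an iterative neighbor-label refinement chosen here for illustration. The graph encoding (node id mapped to a kind and a successor list), the function name `cfg_fingerprint`, and the `rounds` parameter are all assumptions, not the authors' method.

```python
import hashlib


def cfg_fingerprint(cfg, rounds=3):
    """Hash a control flow graph so that node naming does not affect the result.

    cfg maps a node id to (kind, [successor ids]); this encoding is hypothetical.
    """
    labels = {node: kind for node, (kind, _) in cfg.items()}
    for _ in range(rounds):
        refined = {}
        for node, (_, successors) in cfg.items():
            # Combine a node's label with the sorted labels of its successors.
            neighborhood = ",".join(sorted(labels[s] for s in successors))
            text = f"{labels[node]}({neighborhood})"
            refined[node] = hashlib.sha256(text.encode()).hexdigest()[:16]
        labels = refined
    combined = "|".join(sorted(labels.values()))
    return hashlib.sha256(combined.encode()).hexdigest()


# Two loops with the same shape but different node names, plus a straight-line graph.
loop_a = {
    "entry": ("assign", ["test"]),
    "test": ("branch", ["body", "exit"]),
    "body": ("assign", ["test"]),
    "exit": ("return", []),
}
loop_b = {
    1: ("assign", [2]),
    2: ("branch", [3, 4]),
    3: ("assign", [2]),
    4: ("return", []),
}
straight = {"s": ("assign", ["t"]), "t": ("return", [])}

print(cfg_fingerprint(loop_a) == cfg_fingerprint(loop_b))
print(cfg_fingerprint(loop_a) == cfg_fingerprint(straight))
```

Because a fingerprint is a fixed-length string, it can be used as a key in an index or database, which is what makes lookup across a large repository cheap.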

Code clone detection using string based tree matching technique

CODE CLONE DETECTION USING STRING BASED TREE MATCHING TECHNIQUE. Contains confidential information under the Official Secret Act 1972*.

Software Refactoring Technique for Code Clone Detection of Static and Dynamic Website

Nowadays, cloning of code or programs by the developer or an authorized person is viewed positively, whereas cloning by an unauthorized person is viewed negatively. In recent years, many clone detection tools have been proposed. They produce an overwhelming volume of simple clones of data or structure [3]. Code clone detection measures the content similarity between programs or webpages. An attempt is made to design a method called “SD Code Clone Detection” for both static and dynamic webpages. It is based on Levenshtein's approach. The method comprises several steps: parsing and analysis, tree construction, code similarity measurement, and clone detection. Experiments are carried out with open-source websites and webpages created by volunteers. The recorded experimental results show a better detection rate.
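
For readers unfamiliar with the Levenshtein measure mentioned above, here is a minimal sketch of the standard edit-distance computation and a normalized similarity score derived from it. How “SD Code Clone Detection” applies the measure to its trees is not described in the excerpt; the `similarity` normalization and the tag-sequence example are assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map the distance to a score in [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Works on token sequences too, e.g. the tag order of two webpages.
page_a = ["html", "body", "p"]
page_b = ["html", "body", "div", "p"]
print(levenshtein("kitten", "sitting"), similarity(page_a, page_b))
```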

Windows, Linux, Mac Operating System and Decision Making

This paper presents a comparative study for choosing among Windows, Linux, and Mac, the three popular operating systems. It provides seven factors that need to be considered before choosing an operating system: convenience, capability, security, interface, recovery, booting time, and cost. These seven factors were derived from background study and analysis. Some important characteristics, such as dependency on hardware, user interface, and security, are also discussed. Moreover, the open-source and closed-source paradigms are discussed with respect to user support. Finally, based on user data, it is shown how these three operating systems have been used over the last couple of years.

Dissecting and digging application source code for vulnerabilities

In whitebox testing, we assume full access to the source code and use it as our target to identify possible vulnerabilities. We use OWASP’s WebGoat 2 as a sample application to illustrate the methodology, and AppCodeScan to scan the Java-based source code of WebGoat. AppCodeScan is a simple tool that assists in dissecting and digging through the source code; it is not an automated source code scanner.

Coherent clusters in source code

code lines to have nearly identical length and indentation. Inspection of the source code reveals that this block of code is a switch statement handling 234 cases. Further investigation shows that copia has 234 small functions that eventually call one large function, seleziona, which in turn calls the smaller functions, effectively implementing a finite state machine. Each of the smaller functions returns a value that is the next state for the machine and is used by the switch statement to call the appropriate next function. The primary reason for the high level of dependence in the program lies with the statement switch(next state), which controls the calls to the smaller functions. This causes what might be termed ‘conservative dependence analysis collateral damage’ because the static analysis cannot determine that when function f() returns the constant value 5 this leads the switch statement to eventually invoke function g(). Instead, the analysis makes the conservative assumption that a call to f() might be followed by a call to any of the functions called in the switch statement, resulting in a mutual recursion involving most of the program.

A Hybrid Approach of Module Sequence Generation using Neural Network for Software Architecture

Many software organizations adopt the solutions provided by different software life cycle models. The way these life cycle models refine the steps toward the final objective helps to produce quality products and to understand in advance the feasibility or acceptance of the final work done by the organization. But the main aim of an organization is to produce high-quality products in a timely and cost-effective manner [1]. To increase the utilization of available resources when taking mitigating actions, such as code inspection or refactoring, the ability to identify potentially referenced components would help. Predictive models have been a focus of research in empirical studies of software systems [20]. Life cycle models fulfill the quality criteria, but along with the quality criteria we have to develop and deliver the products on a timely basis. So at each and every step we have to improve or enhance the criteria for selecting development steps. A number of researchers work in this area of quantifying the implementation and structural design of systems [4, 30]. Their study observes that feature selection is the process of identifying a subset of features that improves a model’s specific performance [5]. Many researchers suggest using search-based optimization with testing, release date planning, and cost estimation [31, 32]. In this study we propose a Neural Network based algorithm as a search-based feature selection strategy in order to find a subset of source code metrics that will generate an implementation sequence that enhances and simplifies a predictive model of software quality [6, 10]. To build predictive models for generating an efficient sequence of modules for implementation, fit databases are required that contain instances of faulty and non-faulty modules. Preparing a balanced fit dataset is also not always possible when using industrial systems.
Many modules in the fit dataset are actually imbalanced, i.e. there exists a large difference between the number of fault

Recommendations for Datasets for Source Code Summarization

We observe a large “false” boost in BLEU score when split by function instead of split by project (see Figure 4). We consider this boost false because it involves placing functions from projects in the test set into the training set – an unrealistic scenario. An average of four runs when split by project was 17.41 BLEU, a result relatively consistent across the splits (maximum was 18.28 BLEU, minimum 16.10). In contrast, when split by function, the average BLEU score was 23.02, an increase of nearly one third as seen in Table 1. Our conclusion is that splitting by function is to be avoided during dataset creation for source code summarization. Beyond this narrow answer to the RQ, in general, any leakage of information from test set projects into the training or validation sets ought to be strongly avoided, even if the unit of granularity is smaller than a whole project. We reiterate from Section 1 that this is not a theoretical problem: many papers published using data-driven techniques for code summarization and other research problems split their data at the level of granularity under study.
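
A split by project, as recommended in this excerpt, can be sketched as follows. The data layout (a list of `(project, function_text)` pairs), the function name `split_by_project`, and the `test_fraction` and `seed` parameters are assumptions for illustration and are not taken from the paper.

```python
import random


def split_by_project(functions, test_fraction=0.2, seed=0):
    """Split (project, function_text) pairs so that no project spans both sets."""
    projects = sorted({project for project, _ in functions})
    rng = random.Random(seed)
    rng.shuffle(projects)
    # Hold out whole projects, at least one, rather than individual functions.
    n_test = max(1, round(len(projects) * test_fraction))
    test_projects = set(projects[:n_test])
    train = [f for f in functions if f[0] not in test_projects]
    test = [f for f in functions if f[0] in test_projects]
    return train, test


data = [
    ("projA", "f1"), ("projA", "f2"), ("projB", "g1"), ("projC", "h1"),
    ("projC", "h2"), ("projD", "k1"), ("projE", "m1"),
]
train, test = split_by_project(data)
print(sorted({p for p, _ in test}))
```

Shuffling the functions themselves and cutting the list would be a split by function, which is the setup the excerpt reports as inflating BLEU from 17.41 to 23.02.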

SYSTEM CONTROLLER. User Manual SPC-6000

The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying.

How To Manage Source Code With Git

Eli M. Dow is a software engineer in the IBM Test and Integration Center for Linux in Poughkeepsie, NY. He holds a B.S. degree in computer science and psychology and a master's of computer science from Clarkson University. He is an alumnus of the Clarkson Open Source Institute. His interests include the GNOME desktop, human-computer interaction, virtualization, and Linux systems programming. He is the coauthor of an IBM Redbook Linux for IBM System z9 and IBM zSeries.

Free Software - Source Code and Licensing

It used to be the case that ‘free’ GNU/Linux distributions weren’t as usable as their non-free counterparts. That could make switching difficult for non-technical users. That things have changed is partly thanks to what we’d consider the most popular GNU-centric distribution, Trisquel. Trisquel has been downloaded 344,786 times since its 2.0 release, and now uses Ubuntu as its foundation, making it an easy migration for millions of Ubuntu users. The latest release is a re-working of the 14.04 Long Term Support version of Ubuntu, which means you’ll get updates until 2019. There’s a wide variety of download choices, from a 3GB ISO that includes source code to a 25MB ISO that needs a network installation. We opted for the 1.5GB DVD image, which can also operate as a live desktop. Its Gnome-based installer looks amazing, and while it’s great that the Orca screen reader was

The generation of the source code for slot games

• Ways pay (payment across all possible paylines) is a method of checking the symbols on the reels in which all possible paylines are checked. ... module) is the name ...

Study of the compatibility mechanism. EUPL (European Union Public Licence) v. 1.0

Nevertheless, given the non-reciprocal character of compatibility, one must not forget the implications of creating a “compatible licence”. While its positive aspects are to grant flexibility in software development and to allow the merging of code from different origins, its negative aspects entail a loss of control over downstream developments. Indeed, ensuring compatibility amounts to accepting the fact that the project “switches” from the EUPL to another licence (with which the EUPL has been made compatible), which is itself potentially incompatible with the EUPL. In that sense, including a compatibility clause is an act of denial by which its drafter bears the risk that the licence may not apply to a later version of the software. As a consequence, such a compatibility clause also raises the issue of forking, which means that the same project could lead to the creation of various development branches, with the interactions between them being restricted as long as contributions are made on a copyleft basis. While forking would promote the diffusion of previously EUPLed code and improvements to the initial project, various (but not all) actors inside the FOSS community consider it a bad practice.

Volume 63: Software Clones 2014

Abstract: Code clone analysis is valuable because it can reveal reuse behaviours efficiently from software repositories. Recently, some code reuse analyses using clone genealogies and code clones over multiple projects were conducted. However, most conventional analyses do not consider individual differences among developers in their reuse behaviours. In this paper, we propose a method for code reuse analysis which takes particular note of the differences among individuals. Our analysis method clarifies who reused whose source code across multiple repositories. We believe the result might provide us with constructive insights, such as the characteristics of code reused by multiple developers and of developers who implement reusable code.
