Static Analysis In Software Security
Project Report for Summer Project at the Institute for Development and Research in Banking Technology
May 1 - June 30, 2013
Guide: Dr. V. Radha, Institute for Development and Research in Banking Technology, Hyderabad
By: Krishnendu Saha, Indian Institute of Technology, Kharagpur
CERTIFICATE OF COMPLETION
This is to certify that Mr. Krishnendu Saha has successfully completed the Summer Project on "Static Analysis in Software Security" under the guidance of Dr. V. Radha, IDRBT. The duration of this project was from May 1, 2013 to June 30, 2013.
Dr. V. Radha
Institute for Development and Research in Banking Technology (Guide)
Abstract
Security is sometimes treated as perimeter security, i.e. restricting attackers from reaching deep inside our enterprise. But to be truly secure, software must be free of weaknesses that can fail even from internal causes. So security should be a concern throughout the whole software development process. That is where static analysis tools are useful. They can find vulnerabilities just by looking at the source code at the time of coding itself, which saves testing time because the code that reaches testing is much less vulnerable. In this project I have built some code checkers that work as plugins of the Eclipse IDE for the C/C++ languages. Though C is a widely used language, many of its library functions are vulnerable.
CONTENTS
1. Introduction
1.1 Static Analysis in Context of Software Security
2. Static Analysis
2.1 Definition
2.2 Working Procedure
2.2.1 Build Model
2.2.2 Perform Analysis
2.2.3 Present Results
3. Ways of Implementing Static Analysis Tools
3.1 My Static Analysis Tool Implementation
4. Implementation Tools
4.1 Hardware Details
4.2 Software Details
4.2.1 Eclipse IDE
4.2.2 Codan
4.2.3 PDE
5. Implementation and Results
6. Future Work Scope
7. Limitations
8. Conclusions
1. Introduction:
In this digital age we use software in every part of our lives, from day-to-day articles to satellites in outer space. Software automates, totally or partially, the things we use, so a small mistake may lead to a disaster. Hence software should be reliable for our own safety. There are also people (hackers) who try to jeopardize the system, and cyber threats are a matter of huge concern these days. Software security is the practice of building software to be secure and to function properly under malicious attack. The traditional way of making software less vulnerable is to test it with different sets of inputs and so find its areas of weakness. But if we apply our knowledge of common vulnerabilities at the time of building it, much cost, effort and time can be saved.
Here lies the importance of static analysis.
After all, an attacker succeeds only if there is a weakness in the code. If the vulnerable points are reduced, we can expect our software to be much more fail-proof.
1.1. Static Analysis in Context of Software Security:
Software security means that software works correctly (gives correct outputs) under all possible situations, even under malicious attack (i.e. someone intentionally trying to find software weaknesses and exploit them). Software security is sometimes thought of as security features such as cryptographic ciphers, passwords, and access control mechanisms. But for a program to be secure, all portions of the program must be secure, not just the bits that explicitly address security. In many cases, security failings are not related to security features at all.
In the conventional and most widely used approach, software security is considered in the test and field phases of software building. But those are really efforts to make up for coding malpractices.
The root of security issues lies in coding malpractices and in the use of vulnerable library functions and APIs. So security issues must be considered during coding, wherever faulty library functions are used, and in the early stages of software development.
[Figure labels: Dynamic Analysis, Firewall, Virus Scanner, Penetration Detection, Intrusion Detection]
It is easier to fix problems in the development stage, while they are still simple. If a bug appears in the testing phase, it may require rechecking the whole program.
[Figure labels: Static Analysis, Architectural Risk Analysis, Security Requirements]
2. Static Analysis:
2.1. Definition:
Static analysis is the analysis of the source code of software without executing it.
2.2. Working Procedure:
It is divided into four steps, counting the gathering of security knowledge on which the analysis depends.
1. Build Model
2. Perform Analysis: performing analysis needs another basic step of gathering security knowledge.
2.1 Security Knowledge
3. Present Results
Overview of the working procedure (figure):
Build Model: 1) Lexed Tokens 2) Parse Tree 3) Abstract Syntax Tree 4) Control Flow Graph 5) Data Flow Diagram
Perform Analysis: 1) Lexical Analysis 2) Parse Tree and AST Analysis 3) Control Flow Graph Analysis 4) Data Flow Diagram Analysis 5) Taint Analysis 6) Value Range Propagation
Security Knowledge: 1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe/) 2) OWASP Honeycomb Project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project) 3) SAMATE group at NIST (http://samate.nist.gov)
Present Results: 1) error (severe threat) 2) warning (may or may not be a security bug, but obeying it is good practice) 3) info (good coding practice, but no threat)
2.2.1. Build Model:
For an analysis tool to understand the code, the code must be represented by data structures that closely reflect the property to be analysed. These basic data structures are the ones built by compilers, and static analysis tools borrow them. They are lexer tokens, the parse tree, the abstract syntax tree (AST), the control flow graph (CFG) and the data flow diagram (DFD). These models are built by compilers, by static analysis tools, or by both.
Models Used in Analysis:
Lexed Tokens: The source code is converted into a token stream, discarding unimportant whitespace and comments.
E.g.: Source Code:
if (ret)
    mat[x][y] = END_VAL;
This code produces the following sequence of tokens:
Lexer Output:
IF LPAREN ID(ret) RPAREN ID(mat) LBRACKET ID(x) RBRACKET LBRACKET ID(y) RBRACKET EQUAL ID(END_VAL) SEMI
Some tokens need an extra property, such as the name for an identifier (ID). This token stream is subsequently used to build the parse tree.
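The token stream above can be reproduced with a very small hand-written lexer. The following is only a minimal sketch: the function names lex and emit are hypothetical, only the token kinds of the example are recognised, and comments are not handled. A real compiler front end covers the full language.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Append one token name to out, separated by a space. Returns 0 if it fits. */
static int emit(char *out, size_t cap, const char *text)
{
    size_t used = strlen(out);
    size_t need = strlen(text) + (used ? 1 : 0);
    if (used + need + 1 > cap)
        return -1;
    if (used)
        strcat(out, " ");
    strcat(out, text);
    return 0;
}

/* Tokenize src into a space-separated token list written to out.
 * Returns 0 on success, -1 on an unknown character or a full buffer. */
int lex(const char *src, char *out, size_t cap)
{
    assert(src != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*src) {
        unsigned char c = (unsigned char)*src;
        if (isspace(c)) {                 /* whitespace is discarded */
            src++;
            continue;
        }
        if (isalpha(c) || c == '_') {     /* keyword or identifier */
            char word[64];
            char tok[80];
            size_t n = 0;
            while ((isalnum((unsigned char)*src) || *src == '_') &&
                   n < sizeof word - 1)
                word[n++] = *src++;
            word[n] = '\0';
            if (isalnum((unsigned char)*src) || *src == '_')
                return -1;                /* identifier too long for this sketch */
            if (strcmp(word, "if") == 0)
                strcpy(tok, "IF");
            else
                snprintf(tok, sizeof tok, "ID(%s)", word);
            if (emit(out, cap, tok) != 0)
                return -1;
            continue;
        }
        const char *name;
        switch (c) {                      /* single-character tokens */
        case '(': name = "LPAREN";   break;
        case ')': name = "RPAREN";   break;
        case '[': name = "LBRACKET"; break;
        case ']': name = "RBRACKET"; break;
        case '=': name = "EQUAL";    break;
        case ';': name = "SEMI";     break;
        default:  return -1;
        }
        if (emit(out, cap, name) != 0)
            return -1;
        src++;
    }
    return 0;
}
```

Calling lex() on the two-line source of the example fills the output buffer with the same token sequence as listed under Lexer Output.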
Parse Tree: A language parser uses a context-free grammar (CFG) to match the token stream. The grammar consists of a set of productions that describe the symbols (elements) of the language. The parser performs a derivation by matching the token stream against the production rules. If each symbol is
Control Flow Graph: It is a graphical representation of all the possible ways in which the flow of a program may proceed. Each node in the CFG represents a basic block, which has no branching or looping. The CFG yields the cyclomatic complexity of the code, a count of its independent paths and hence an indicator of how many places errors may occur. During dynamic analysis it also helps us to get exhaustive sets of test cases.
Source Code:
if (a > b) {
    nConsec = 0;
} else {
    s1 = getHexChar(1);
    s2 = getHexChar(2);
}
return nConsec;
CFG Builder output (figure): the node "if (a > b)" branches to the block "nConsec = 0;" and to the block "s1 = getHexChar(1); s2 = getHexChar(2);"; both blocks lead to the node "return nConsec;".
Data Flow Diagram: A Data Flow Diagram shows all the possible paths of data input and output and of data transfer between different entities within the software. It thus gives us the points where data should be validated and where trust boundaries should be set up.
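For the if/else example in the Control Flow Graph description, the cyclomatic complexity can be computed from the graph with McCabe's formula M = E - N + 2 for a single connected graph with E edges and N nodes. The sketch below is illustrative only: the struct and function names are hypothetical, and the node numbering is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* A directed edge between two basic blocks, identified by index. */
struct edge { int from; int to; };

/* McCabe's cyclomatic complexity for one connected CFG: M = E - N + 2. */
int cyclomatic_complexity(size_t node_count, size_t edge_count)
{
    assert(node_count > 0);
    return (int)edge_count - (int)node_count + 2;
}

/* Basic blocks of the if/else example:
 * 0: if (a > b)                 1: nConsec = 0;
 * 2: s1 = ...; s2 = ...;        3: return nConsec; */
static const struct edge example_edges[] = {
    {0, 1}, {0, 2}, {1, 3}, {2, 3}
};

/* Complexity of the example graph: 4 edges and 4 nodes. */
int example_complexity(void)
{
    size_t edges = sizeof example_edges / sizeof example_edges[0];
    return cyclomatic_complexity(4, edges);
}
```

The example graph has two independent paths, one through each branch of the if statement, which matches the value the formula gives.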
2.2.2. Perform Analysis: Analysis is performed on the tokens or on the nodes of the trees or graphs.
Lexical Analysis: The simplest of all the analysis techniques. It helps in checking syntactic errors and in most cases uses regular pattern matching. It is not useful for much more than detecting wrong identifier names or function names. Tools using lexical analysis techniques are ITS4, RATS, and Flawfinder.
Parse Tree and AST Analysis: These representations help us understand the semantics of the program. So they help us find deviations from the rules of the grammar and from security rules, for example that an if block should start and end with curly brackets. Most modern compilers do this kind of checking, and a grammar violation is reported as a parse error. Codan, the code analysis platform in CDT (C/C++ Development Tools, an Eclipse plugin), uses the AST to build checkers. Similar platforms such as PMD and Crystal (Eclipse plugins) use the AST for detecting errors in Java.
Control Flow Graph Analysis: Though the AST and parse tree are useful enough for detecting most rule violations, they fail for rules that depend on branching in the code. For example, an opened file or database should be closed once and only once along any control flow path of a program. The CFG is analysed in a number of stages, starting from a basic block, then a procedure (method or function), and then a bigger module such as a class. Examples: Fortify Source Code Analyzer, Klocwork.
Data Flow Diagram Analysis: A Data Flow Diagram (DFD) with security-specific annotations is used to describe how data enters, leaves and traverses the system: it shows data sources and destinations, the relevant processes that data goes through, and the trust boundaries in the system. A DFD has a fixed set of component types: Process, HighLevelProcess, DataStore and ExternalInteractor. A Process is the main concern of a DFD. A HighLevelProcess is represented by hierarchical, multi-level DFDs. A DataStore may be a database, a file, or the Registry. An ExternalInteractor represents an entity that exists outside the system being modelled and which interacts with the system at an entry point, typically a human. The data flows are represented by arrows. A DFD used in threat modelling often separates elements that have different privilege levels using a Boundary to describe locations where a privilege impersonation on the part of an adversary could occur, a machine or process boundary may be
Taint Analysis: The concept of tainting refers to marking data coming from an untrusted source as "tainted" and propagating its status to all locations where the data is used. A security policy specifies what uses of untrusted data are allowed or restricted. An attempt to use tainted data in violation of this policy is an indication of a vulnerability. Tainted data should not be used in any function which modifies files, directories and processes, or executes external programs. If this rule is violated, the program should be aborted.
1. Initialize all variables as NOT TAINTED.
2. Find all calls to functions that read data from an untrusted source. Mark the values returned by these functions as TAINTED.
3. Propagate the tainted values through the program. If a tainted value is used in an expression, mark the result of the expression as TAINTED.
4. Repeat step 3 until a fixed point is reached.
5. Find all calls to potentially vulnerable functions. If one of their arguments is tainted, report this as a vulnerability.
1 unsigned int n;
2 char src[10], dst[10];
3 n = read_int ();
4 if (n <= sizeof (dst))
5     memcpy (dst, src, n); /* n is <= sizeof (dst) */
6 else
7     memcpy (dst, src, n); /* n is > sizeof (dst) */
With taint analysis, the memcpy() calls on both line 5 and line 7 will be marked as vulnerabilities, whereas in fact line 5 is a false positive. So taint analysis has a high chance of giving false positives.
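The five steps of the taint algorithm can be sketched over a simplified program model. This is an illustrative sketch, not the method of any particular tool: the program is assumed to be reduced to assignments between numbered variables, the names assign, propagate_taint and count_tainted_sinks are hypothetical, and the analysis ignores statement order and branch conditions, which is exactly why it reports line 5 of the example as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 16

/* One assignment "dst = expr(src_a, src_b)"; use -1 for an unused operand. */
struct assign { int dst; int src_a; int src_b; };

/* Steps 1-4: start from the variables written by untrusted sources and
 * propagate taint through the assignments until nothing changes. */
void propagate_taint(const struct assign *code, size_t n_assign,
                     const int *sources, size_t n_sources,
                     bool tainted[MAX_VARS])
{
    for (int v = 0; v < MAX_VARS; v++)
        tainted[v] = false;                       /* step 1 */
    for (size_t i = 0; i < n_sources; i++) {
        assert(sources[i] >= 0 && sources[i] < MAX_VARS);
        tainted[sources[i]] = true;               /* step 2 */
    }
    bool changed = true;
    while (changed) {                             /* step 4: fixed point */
        changed = false;
        for (size_t i = 0; i < n_assign; i++) {   /* step 3 */
            const struct assign *a = &code[i];
            assert(a->dst >= 0 && a->dst < MAX_VARS);
            bool in = (a->src_a >= 0 && tainted[a->src_a]) ||
                      (a->src_b >= 0 && tainted[a->src_b]);
            if (in && !tainted[a->dst]) {
                tainted[a->dst] = true;
                changed = true;
            }
        }
    }
}

/* Step 5: count the sink arguments that carry tainted data. */
size_t count_tainted_sinks(const int *sink_args, size_t n_sinks,
                           const bool tainted[MAX_VARS])
{
    size_t hits = 0;
    for (size_t i = 0; i < n_sinks; i++) {
        assert(sink_args[i] >= 0 && sink_args[i] < MAX_VARS);
        if (tainted[sink_args[i]])
            hits++;
    }
    return hits;
}
```

Because the loop repeats until no variable changes state, the result does not depend on the order in which the assignments are listed.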
Value Range Propagation: Here each tainted variable also carries the range of its possible values. If a vulnerable function uses the variable, we can check whether its value range still makes the call vulnerable. Thus we may avoid some false positives.
1 unsigned int n;
2 char src[10], dst[10];
3 n = read_int ();
4 if (n <= sizeof (dst))
5     memcpy (dst, src, n); /* n is <= sizeof (dst) */
6 else
7     memcpy (dst, src, n); /* n is > sizeof (dst) */
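The way a value range removes the false positive in this example can be sketched as interval narrowing at the branch. This is a minimal sketch under simplifying assumptions: ranges are closed intervals of unsigned values, only the comparison against a constant limit is modelled, and the names range, narrow_le, narrow_gt and copy_is_safe are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Closed interval [lo, hi] of the values a variable may hold. */
struct range { size_t lo; size_t hi; };

/* Narrow r for the branch where "n <= limit" holds.
 * Returns false if that branch cannot be reached. */
bool narrow_le(struct range r, size_t limit, struct range *out)
{
    assert(out != NULL);
    if (r.lo > limit)
        return false;
    out->lo = r.lo;
    out->hi = r.hi < limit ? r.hi : limit;
    return true;
}

/* Narrow r for the branch where "n > limit" holds. */
bool narrow_gt(struct range r, size_t limit, struct range *out)
{
    assert(out != NULL);
    if (r.hi <= limit)
        return false;
    out->lo = r.lo > limit ? r.lo : limit + 1;
    out->hi = r.hi;
    return true;
}

/* Copying n bytes into a buffer of buf_size bytes is safe
 * when every possible value of n fits. */
bool copy_is_safe(struct range n, size_t buf_size)
{
    return n.hi <= buf_size;
}
```

With n starting at the full range of the input, the branch of line 5 narrows it to at most 10, so the copy there is accepted, while the range on line 7 starts at 11 and the call is still reported.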
2.2.2.1 Security Knowledge: The main logic behind these tools is to learn from our past mistakes and from attacks that exploited weaknesses in code, and to prevent them from happening again. There are many collections of common mistakes made by programmers:
1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe/)
2) OWASP Honeycomb Project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project)
3) SAMATE group at NIST (http://samate.nist.gov)
In these collections of numerous errors there are patterns and repeated errors, so the most generic problems can be categorized.
2.2.3. Presenting and Processing Results:
The reported security vulnerabilities are reviewed manually and then fixed. In some cases the analysers themselves suggest a solution for the problem. The problems are reported in different categories: error (severe threat), warning (may or may not be a security bug, but obeying it is good practice), and info (good coding practice, but no threat).
3. Ways of Implementing Static Analysis Tools:
Application of Static Analysis Tools in the Practical World:
I. Integration with compilers: Static analysis tools are part and parcel of modern compilers. They do all the basic checking such as type checking, style checking, parse errors, unused identifiers and many others. But this needs compilation of the whole program.
E.g. gcc compiler (C language)
II. Integration with IDEs: IDEs (Integrated Development Environments) use vulnerability checkers as add-ons or plugins to show coding errors in the editor while the code is being written. This is the most useful form of static analysis tool, as it not only shows errors but also offers quick fixes.
E.g. Eclipse, NetBeans
III. Stand-Alone Platforms: These tools are generally the most sophisticated ones and detect the most complicated problems. They are built exclusively for detecting software weaknesses.
E.g. Fortify Source Code Analyzer, Klocwork, Ounce.
3.1. My Static Analysis Tool Implementation:
I have built static analysis tools for an IDE (Integrated Development Environment).
The IDE chosen is Eclipse, one of the platforms most used by software developers. These tools are integrated into Eclipse as plugins and run in the background to find errors in code.
4.Implementation Tools :
4.1 Hardware Details:
Model : Dell PC
Processor : Intel(R) Core™2 Duo CPU
Installed memory (RAM) : 4GB
System type : 64-bit OS
4.2 Software Details:
Operating System : Windows 7 Basic
Software Used : JDK 1.6, MinGW compiler, Eclipse IDE
4.2.1. Eclipse IDE:
Eclipse is one of the most used IDEs for Java. But it also provides tools to build software in other languages. Here I have used CDT (C/C++ Development Tools), which comes as a plugin to Eclipse.
4.2.2. Codan (Code Analysis):
Codan is a light-weight static analysis framework in CDT (Eclipse's C/C++ Development Tools) that performs real-time analysis on the code to find common defects, violations of policies, etc. The framework contains common components and APIs that are shared between static analysis tools for C/C++, such as:
Profile Editor (Problem Preferences)
• We can enable or disable our checker.
• The severity of the problem is specified. We can change the severity of the problem.
• When we keep the cursor on the checker the description about
• How to build an AST of a C/C++ source: Window > Show View > Other > C/C++ > DOM AST
• How to get CDT: Help > Install New Software > Add the URL http://download.eclipse.org/releases/indigo and then select CDT.
4.2.3. PDE (Plugin Development Environment):
Eclipse provides a plugin development platform for developing Eclipse plugins.
The ways of building plugins may be seen in reference 4.
The basic steps of plugin development are:
1. First go to File > New Project > Plugin Project.
2. Then edit MANIFEST.MF in META-INF:
• Add Dependencies (i.e. the plugins that are needed to run this checker plugin).
• Add Runtime.
• Add Extensions (e.g. these checkers need the extension point org.eclipse.cdt.codan.core.checkers).
• Add a checker by right click > New > checker. Give the name of its source class as the class name.
• Under the checker, add a problem by right click > New > problem. Set there the message that should be shown when the error occurs, the default enablement, etc.
• On the Overview page, in the Exporting part, click Organize Manifests Wizard > Finish and Externalize Strings Wizard > Finish.
• At last, in the Export Wizard, give the name of your plugin under Archive file.
• Now this .zip archive may be placed in the Eclipse plugins folder to make the checker permanent in Codan.
3. To test the plugin, run it; another Eclipse window then opens. Write faulty code that your checker is supposed to catch. You can see the errors and messages in the editor.
5. Implementation and Results:
C (and so also C++) has a lot of vulnerable library functions which may be used to crash a program if their arguments are unchecked and unsanitized. So we may use checkers to make the programmer aware of such vulnerable functions and to check whether the argument values pose a potential threat.
Codan Plugins developed:
I. For the C function char *strncpy(char *dst, const char *src, size_t n): It is erroneous to give n a value greater than or equal to the size allocated for the destination (dst): a larger n can write past the end of dst, and an equal n can leave dst without a terminating null character. So this must be checked wherever this vulnerable function is used. It is an example of buffer overflow.
Algorithm Used:
1) First find a call to strncpy() by visiting every ICPPASTFunctionCallExpression and checking whether its first IASTNode is the string "strncpy". If so, get the string values of the next three nodes. The first string is the name of the destination and the third is the string form of the value of n.
2) Now we need the space allocated for the destination character pointer. For that we may visit all the IASTDeclarationStatements and find the declaration of the destination character pointer and the space allocated.
3) The last step is to compare the allocated space of the destination character pointer with the value of n.
Limitations and Inefficiency of the checker:
1) In the 2nd step of the algorithm the size of the space allocated for the destination is determined by walking the nodes to the proper position. But this method is inadequate, as space allocation may be done in two different ways. A solution may be to maintain a symbol table during static analysis, as is done during compilation.
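The rule this checker enforces can be illustrated directly in C. The sketch below is not the checker itself (which is an Eclipse plugin working on the AST); it only shows the property being checked and one way of writing a call that satisfies it. The names strncpy_bound_ok and copy_bounded are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The property the checker looks for: n must be smaller than the
 * destination size, so that one byte stays free for the terminator. */
bool strncpy_bound_ok(size_t dst_size, size_t n)
{
    return n < dst_size;
}

/* Bounded copy that always terminates dst.
 * Returns false if src had to be truncated. */
bool copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';   /* strncpy does not terminate on truncation */
    return strlen(src) < dst_size;
}
```

A call such as strncpy(buf, src, sizeof buf) is the kind of code the checker is meant to flag, since n equals the destination size; copy_bounded(buf, sizeof buf, src) keeps n one below it.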
II. For the C functions fopen(filename, "r") and fopen(filename, "w"):
When a file is opened in "r" mode, the file should already be present at the given path, and if the file is opened in write mode we should warn the programmer that the file will be overwritten. There should be a guarding block before every fopen() call with "r" or "w" mode that checks these conditions.
Algorithm Used:
1) First check all the ICPPASTFunctionDeclarations for fopen() function calls and, by accessing all their nodes, get the file name and the mode of opening.
2) Then search all ICPPASTIfStatements for a call that checks the conditions given above. In the read case this is the access() function, which should appear in an IASTUnaryExpression (negation), with a return statement inside the block. In the write case there should likewise be an access() call with an appropriate return statement.
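The guarded form of fopen() that this checker expects can be sketched as below. This is an assumption-laden illustration: it uses access() from the POSIX header unistd.h (on other platforms the header may differ), the wrapper names open_for_read and open_for_write are hypothetical, and the exact shape of the guard the plugin accepts may differ from this one. Note also that checking and then opening leaves a small window in which the file can change.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* access(), R_OK, F_OK; POSIX, assumed available */

/* Open for reading only if the file exists and is readable. */
FILE *open_for_read(const char *path)
{
    assert(path != NULL);
    if (access(path, R_OK) != 0)
        return NULL;              /* the guard the checker looks for */
    return fopen(path, "r");
}

/* Open for writing only if the file does not exist yet, unless the
 * caller explicitly allows overwriting. */
FILE *open_for_write(const char *path, int allow_overwrite)
{
    assert(path != NULL);
    if (!allow_overwrite && access(path, F_OK) == 0)
        return NULL;              /* would overwrite existing data */
    return fopen(path, "w");
}
```

Both wrappers return NULL instead of opening the file when the condition described above is not met, so the caller is forced to handle the case.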
III. For the C function printf() and its friends [fprintf(), sprintf(), vprintf()] as well as scanf() and its friends [sscanf(), fscanf(), vscanf()]: All these functions take a format string followed by the arguments named in the format string. Errors here may occur in three cases:
1) If the number of format specifiers in the format string is not equal to the number of arguments present.
2) If there is no format string.
3) If a format specifier and the corresponding argument are of two different types.
Algorithm:
1) Inside a function, take account of all the IASTDeclarations and the types of the variables. At the same time look at each IASTFunctionCallExpression. If the function called is printf or scanf, which may be known from its first node, then note its total number of nodes (let it be x).
2) The 2nd node is the format string. A regular expression analysis is done on this node (its string form, excluding the first and last characters), and the number of substrings matching the java.util.regex.Pattern (%[-+#0]?[(0-9)*]?[.(0-9)*]?[hlL]?[cdieEfgGosuxXp]) is noted. If it is equal to (x-2) it is okay, otherwise it is an error.
3) To check the third case we have to compare every format specifier with the type of the corresponding variable.
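The counting in step 2 can also be done without a regular expression. The following C sketch scans a format string by hand; it is an illustration rather than the plugin's code, the names count_specifiers and args_match are hypothetical, and it makes two assumptions that go beyond the pattern above: "%%" is treated as a literal percent sign, and a "*" width or precision is not supported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count conversion specifiers of the form
 * %[flags][width][.precision][hlL]<conversion> in a format string.
 * Returns -1 on a malformed specifier. */
int count_specifiers(const char *fmt)
{
    assert(fmt != NULL);
    int count = 0;
    for (const char *p = fmt; *p; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                        /* literal percent sign */
        while (*p && strchr("-+#0 ", *p))    /* flags */
            p++;
        while (*p >= '0' && *p <= '9')       /* width */
            p++;
        if (*p == '.') {                     /* precision */
            p++;
            while (*p >= '0' && *p <= '9')
                p++;
        }
        if (*p && strchr("hlL", *p))         /* length modifier */
            p++;
        if (*p == '\0' || !strchr("cdieEfgGosuxXp", *p))
            return -1;                       /* unknown or missing conversion */
        count++;
    }
    return count;
}

/* The first rule of the checker: the number of specifiers must equal
 * the number of arguments that follow the format string. */
int args_match(const char *fmt, int arg_count)
{
    return count_specifiers(fmt) == arg_count;
}
```

For a call with x nodes in total, the value passed as arg_count corresponds to (x-2) in the algorithm above.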
6. Future Work Scope:
1) Maintaining a symbol table for all the variables (i.e. type, allocated space, name) for easy access to them when needed. This may help in many problems.
E.g. to check that all the possible values of the switch argument variable are covered by the cases.
2) Building checkers for other problems. E.g.
i) using a variable without initialization.
ii) using a variable after freeing its space.
3) Building static analysis checkers for other languages.
7. Limitations of the Project:
1) To build static analysers there needs to be a platform; otherwise static analysis tool builders will have to build their own suitable compiler.
2) A checker platform is needed by which we can visit all the nodes of the CFG and analyse them individually, so that we can solve problems that involve understanding of the CFG.
E.g. a file or database that is opened should be closed once and only once along any flow path of the CFG from start to end.
8. Conclusion:
Alan Turing, as part of his conception of a general purpose computing machine, showed that algorithms cannot be used to solve all problems. In particular, Turing posed the halting problem, the problem of determining whether a given algorithm terminates (reaches a final state). The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out. That means that a static analysis tool alone is not enough to find out whether an algorithm can successfully handle a problem. The only way to do this is dynamic analysis.
9. References:
1. Secure Programming with Static Analysis (by Brian Chess and Jacob West, Addison-Wesley)
2. Compilers (by Aho, Sethi and Ullman)
3. Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security (by Daniel Wang and Peter Torr)
4. Control Flow Graph Generator (by Aldi Alimucaj)
5. How to Write Your Own Eclipse Plug-ins, presentation (by Beth Tibbits, IBM)
6. ITS4: A Static Vulnerability Scanner for C and C++ Code (by John Viega, J.T. Bloch, Tadayoshi Kohno and Gary McGraw)
7. Static Analysis Tools (University of Toronto)
8. http://wiki.eclipse.org/CDT/designs/StaticAnalysis
9. http://www.eclipse.org/articles/Article-PDE-does-plugins/PDE-intro.html