Spring 2015: C Programming for chemists
(with elements of C++)
Davide Ceresoli (davide.ceresoli@istm.cnr.it)
Instituto di Scienze e Tecnologie Molecolari (CNR-ISTM)
Outline
1 Course logistics
2 The way to program
3 The C(++) programming language
Introduction
Dr Davide Ceresoli, Researcher at CNR-ISTM office: bld. 5, ground floor, “corpo C”
phone: 02-503-14276
email: davide.ceresoli@istm.cnr.it
background: laureain Materials Science, PhD in Physics
activity: ab-initio DFT calculations, computational chemistry, High Performance Computing (HPC), code development
Course organization
6 CFU, 48 hours (16 lecture + 32 computer lab) lectures (settore didattico via Celoria):
fridays 14:30-16:30, room 206
thursdays 13:30-17:30, room 309 (computer lab)
course website: https://sites.google.com/site/dceresoli/t/
Course objectives
No previous knowledge of programming is assumed! By the end of the course, you will:
Understand fundamental concepts of programming imperative languages
Design algorithms to solve simple problems Learn the C(++) programming language
Textbook and reference material
Allen B. Downey, How to Think Like a Computer Scientist in C++, available free at:
http://www.greenteapress.com/thinkcpp/index.html
MIT IAP2010, Practical Programming in C, http://ocw.mit.edu/
courses/electrical-engineering-and-computer-science/ 6-087-practical-programming-in-c-january-iap-2010/
MIT IAP2011, Introduction to C++, http://ocw.mit.edu/
courses/electrical-engineering-and-computer-science/ 6-096-introduction-to-c-january-iap-2011/
Programming tools
On Windows machines: GCC 4.8.1 compiler bundled with Notepad++
Available free at: https://code.google.com/p/pocketcpp/
Policies and grading
Lectures: can be interactive, with questions and interactive problem solving
Labs: mandatory recommended attendance
Each lab session has a practical programming assignments to be done individually in class
In the last lab session you will start the programming assignment for the final exam
Modern computer architecture
Take-home message
The CPU
The CPU reads the both code and data from RAM
The code is executed one instruction at the time, data is moved to/from memory
Conditionally, the CPU jumps to a different location in the code The operating system (OS) takes care of:
scheduling execution ofN processes onmCPU cores
Low-level language
The CPU understands only low-levelAssemblyinstructions
This are very simple instructions such as: move bytes from RAM to CPU registers and back do arithmetic operations on CPU registers
advance to the next instruction, or jump to an other location each CPU has a different set of instructions!
D i s a s s e m b l y of s e c t i o n . t e x t :
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 < d i s t a n c e >:
0: 55 p u s h % rbp
1: 48 89 e5 mov % rsp ,% rbp
4: 48 83 ec 20 sub $0x20 ,% rsp 8: f2 0 f 11 45 f8 m o v s d % xmm0 , -0 x8 (% rbp ) d : f2 0 f 11 4 d f0 m o v s d % xmm1 , -0 x10 (% rbp ) 12: f2 0 f 11 55 e8 m o v s d % xmm2 , -0 x18 (% rbp ) 17: f2 0 f 10 45 f8 m o v s d -0 x8 (% rbp ) ,% x m m 0 1 c : 66 0 f 28 c8 m o v a p d % xmm0 ,% x m m 1 20: f2 0 f 59 4 d f8 m u l s d -0 x8 (% rbp ) ,% x m m 1 25: f2 0 f 10 45 f0 m o v s d -0 x10 (% rbp ) ,% x m m 0 2 a : f2 0 f 59 45 f0 m u l s d -0 x10 (% rbp ) ,% x m m 0 2 f : f2 0 f 58 c8 a d d s d % xmm0 ,% x m m 1 33: f2 0 f 10 45 e8 m o v s d -0 x18 (% rbp ) ,% x m m 0 38: f2 0 f 59 45 e8 m u l s d -0 x18 (% rbp ) ,% x m m 0 3 d : f2 0 f 58 c1 a d d s d % xmm1 ,% x m m 0 41: e8 00 00 00 00 c a l l q 46 < d i s t a n c e +0 x46 > 46: f2 0 f 11 45 e0 m o v s d % xmm0 , -0 x20 (% rbp ) 4 b : 48 8 b 45 e0 mov -0 x20 (% rbp ) ,% rax 4 f : 48 89 45 e0 mov % rax , -0 x20 (% rbp ) 53: f2 0 f 10 45 e0 m o v s d -0 x20 (% rbp ) ,% x m m 0
58: c9 l e a v e q
High-level languages
For us, it is easier to program in higher-level languages
The previous block of bytes and assembly instructions, is equivalent to:
# i n c l u d e < m a t h . h >
d o u b l e d i s t a n c e (d o u b l e x , d o u b l e y , d o u b l e z ) {
r e t u r n s q r t ( x * x + y * y + z * z ); }
This code runs on almost every CPU
From source code to CPU instructions
Compiled languages
Source code is analyzed entirely and checked for errors
Then, the source code is compiled (aka translated) into assembly instructions
Possibility to perform deep code optimizations
Finally, multiple code units arelinkedtogether and with librariesof functions
The results is theexecutable, that can be run by the operating system
at the maximum performance
From source code to CPU instructions
Interpreted languages
Source code is processed line-by-line and checked for errors
The interpreter runsthe source code, also interactively
Possibility to debug and inspect the code while writing Usually very easy and fun to program with
Difficult to optimize, performance much lower than compiled languages
From source code to CPU instructions
In the middle: just-in-time compiled languages
Source code is processed line-by-line and checked for errors, also interactively
The code is compiledon-the-fly to CPU instructions
Performance is between compiled and interpreted languages
Price to pay is longer startup time and need of support run-time libraries
The development of the C(++) programming language
The origin of C is tied to the development of the UNIX operating system
Not an academic language, created at Bell labs in 1978 by Brian Kernighan and Dennis Ritchie (K & R)
C is a “high-level” assembler, used to portUNIX on different CPUs Standardized by ANSI in 1990 (C90), revised in 1999 (C99) and in 2011 (C11)
C++ created in 1979 by Bjarne Stroustrup as asuperset of C (C-with-classes)
Available modern compilers
Gnu Compiler Collection (GCC),http://gcc.gnu.org
free and open source
runs almost on every combination of CPU and OS default on Linux
Clang, http://clang.llvm.org
free and open source
runs on few combinations of CPU and OS default on Mac OSX and BSD
Intel compiler ($$$): for High Performance Computing
The C language: pros
Imperative, structured (make use of “functions”)
Static type system (i.e. variables must be typed and declared)
Low-level access to memory and CPU
Synthetic: only a handful of keywords
Thelanguage to program operating systems (i.e. the Linux Kernel)
From tiny micro-controllers to supercomputers
The C language: cons
Poor error detection, run-time memory corruption
Built-in data structures too simple: scalars and vectors
Difficult to program!
How NOTto program in C
m a i n ( _ ){ _ ^ 4 4 8 & & m a i n ( -~ _ );
p u t c h a r ( - - _ % 6 4 ? 3 2 | - ~ 7 [ _ _ T I M E _ _ - _ / 8 % 8 ]
[" > ’ t x i Z ^(~ z ? "-48] > >" ; ; ; = = = = ~ $ : : 1 9 9 "[ _ * 2 & 8 | _ / 6 4 ] /( _ & 2 ? 1 : 8 ) % 8 & 1 : 1 0 ) ; }
C(++) programming style I
/* *
* \ f i l e cg . cc
* C o n j u g a t e - g r a d i e n t e n e r g y m i n i m i z a t i o n ( i m p l e m e n t a t i o n ). */
# i n c l u d e <i n c l u d e/ a m o r p h y . h >
b o o l
CG :: run ( o s t r e a m & os , o s t r e a m & err ) {
c o n s t u i n t m a x i t e r = 5 0 0 0 0 ;
os . s e t f ( i o s _ b a s e :: s c i e n t i f i c , i o s _ b a s e :: f l o a t f i e l d ); u i n t o l d _ p r e c = os . p r e c i s i o n ( 6 ) ;
d o u b l e e n e _ o l d ;
os < < " C o n j u g a t e - g r a d i e n t m i n i m i z a t i o n : " < < e n d l ; os < < " d e l t a : " < < d e l t a < < e n d l ;
os < < " f t o l : " < < f t o l < < e n d l ;
os < < " N u m b e r of p a i r s : " < < v l i s t . u p d a t e () < < e n d l ; c a l c _ p o t e n t i a l (f a l s e);
os < < " P o t e n t i a l e n e r g y : " < < t o z e r o ( e p o t * e n e r g y _ u n i t ) < < " eV " < < e n d l ;
e n e _ o l d = e p o t ;
// i n i t i a l i z a t i o n
u i n t n a t o m s = c o n f . g e t _ n a t o m s ();
vector < vector3 > g ( n a t o m s ) , h ( n a t o m s ) , xi ( n a t o m s );
for ( u i n t i = 0; i < n a t o m s ; i ++) {
C(++) programming style II
d o u b l e fnorm , dg , dgg , g a m m ; u i n t i t e r = 1;
w h i l e (t r u e) {
f n o r m = 0 . 0 ;
for ( u i n t i = 0; i < n a t o m s ; i ++) {
c o n f [ i ]. r += d e l t a * xi [ i ]; xi [ i ] = - c o n f [ i ]. f ; f n o r m += c o n f [ i ]. f . n o r m 2 (); }
f n o r m = s q r t ( f n o r m )/ c o n f . g e t _ n a t o m s ();
dg = 0.0 , dgg = 0 . 0 ;
for ( u i n t i = 0; i < n a t o m s ; i ++) {
dg += g [ i ]. n o r m 2 ();
dgg += dot ( xi [ i ] + g [ i ] , xi [ i ]); }
g a m m = dgg / dg ;
for ( u i n t i = 0; i < n a t o m s ; i ++) {
g [ i ] = - xi [ i ];
xi [ i ] = h [ i ] = g [ i ] + g a m m * h [ i ]; }
if ( i t e r % 100 == 0) {
C(++) programming style III
< < s e t w ( 1 2 ) < < e p o t < < " f n o r m = " < < s e t w ( 1 2 ) < < f n o r m < < e n d l ; }
if ( f n o r m < f t o l ) b r e a k;
v l i s t . u p d a t e (); c a l c _ p o t e n t i a l (f a l s e); e n e _ o l d = e p o t ;
if ( i t e r ++ >= m a x i t e r ) b r e a k; }
os < < " c o n v e r g e n c e r e a c h e d in " < < i t e r < < " i t e r a t i o n s " < < e n d l ; os < < " d e l t a _ E = " < < s e t w ( 1 2 ) < < e n e _ o l d - e p o t
< < " f n o r m = " < < s e t w ( 1 2 ) < < f n o r m < < e n d l ; os < < " P o t e n t i a l e n e r g y : " < < t o z e r o ( e p o t * e n e r g y _ u n i t )
< < " eV " < < e n d l ;
os . p r e c i s i o n ( o l d _ p r e c ); os < < e n d l ;
Template C++ source file
// f i l e n a m e . cc - author , d a t e
# i n c l u d e < i o s t r e a m >
# i n c l u d e < cmath >
# i n c l u d e < string >
// m o r e i n c l u d e s h e r e ...
u s i n g n a m e s p a c e std ; // don ’ t c h a n g e t h i s l i n e
// the e n t r y p o i n t of y o u r p r o g r a m
int m a i n (int argc , c h a r * a r g v [])
{
// e n t e r h e r e y o u r c o d e
...
Hello World in C++
// h e l l o . cc - D a v i d e C e r e s o l i , 0 3 / 1 2 / 2 0 1 4 # i n c l u d e < i o s t r e a m >
# i n c l u d e < cmath >
# i n c l u d e < string >
u s i n g n a m e s p a c e std ;
int m a i n (int argc , c h a r * a r g v []) {
c o u t < < " H e l l o w o r l d ! " < < e n d l ;
r e t u r n 0; }
cout is the standard output (i.e. the terminal window) strings must be enclosed in double quotes
Hello World in C++ (2)
// h e l l o 2 . cc - D a v i d e C e r e s o l i , 0 3 / 1 2 / 2 0 1 4 # i n c l u d e < i o s t r e a m >
# i n c l u d e < cmath >
# i n c l u d e < string >
u s i n g n a m e s p a c e std ;
int m a i n (int argc , c h a r * a r g v []) {
c o u t < < " W h a t ’ s y o u r n a m e ? "; s t r i n g n a m e ;
cin > > n a m e ;
c o u t < < " Hello , " < < n a m e < < e n d l ;
r e t u r n 0; }
cinis the standard input (i.e. your keyboard)
string namecreates a variable nameof type string
Let’s try it
Using C(++) as a calculator
// s p h e r e _ v o l u m e . cc - D a v i d e C e r e s o l i , 0 3 / 1 2 / 2 0 1 4 # i n c l u d e < i o s t r e a m >
# i n c l u d e < cmath >
# i n c l u d e < string >
u s i n g n a m e s p a c e std ;
int m a i n (int argc , c h a r * a r g v []) {
c o u t < < " E n t e r the r a d i u s of the s p h e r e : ";
d o u b l e r a d i u s ; cin > > r a d i u s ;
d o u b l e v o l u m e = 4 . 0 / 3 . 0 * M _ P I * pow ( radius , 3 . 0 ) ; c o u t < < " The v o l u m e is : " < < v o l u m e < < e n d l ;
r e t u r n 0; }
double radiuscreates a floating point variableradius
Let’s try it
Errors
Syntax errors
Detected at compile time and prevents compilation
Typo, missing ;, wrong number of brackets, “open” string
Logic errors
The code compiles correctly but doesn’t work as expected
For instance: 4/3vs4.0/3.0
Uninitialized variables, overflow in computations, undefined behavior
Difficult to debug
Runtime errors
The code compiles correctly but it crashes
Mathematical domain errors (sqrt(-2.0)), write past arrays boundary, file not found, forbidden OS operations
Very difficult to debug
Recap
Low-level vs high-level languages
Compilers vs interpreters
The first program in C(++)
Writing to output, reading from input
Variables declaration and use Simple mathematical operations
More of this, next Thursday...