CS 294-73
Software Engineering for
Scientific Computing
http://www.cs.berkeley.edu/~colella/CS294
Lecture 25: Mixed Language
Different languages
• Technical, historical, and cultural differences mean that code you would like to use is often written in a language other than your own.
• Several options are available when you wish to invoke such code as a third-party library:
- Re-implement the functions you want in your own language
  - While this sounds ridiculous, it is actually the dominant means of communicating algorithms to this day.
- Translate the code using a tool
  - f2c is a classic in this category
  - creates code that lives in your repo.
    - You “branch” from the reference implementation
- Perform inter-language procedure calls, compilation and linking.
- Implement a client-server or Service Oriented Architecture (SOA)
Re-implement
• Cons
- You’ll probably not have as much testing and robustness as the widely used
package
- You might get the routine wrong.
- infeasible past a relatively simple level of complexity
- You miss out on advances others make in the state-of-the-art.
• Pros
- If you are re-implementing something from a language with slower execution (Python or Java), it will probably run faster.
- Most tools are happiest when working with a single-language code environment
  - profilers, build systems, debuggers, revision control, documentation.
- Nothing special is needed to call code written in your own language.
• Good practice
- Keep an active regression test system that compares your implementation with
the reference package
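The good practice above can be sketched as a small regression check. This is only a minimal Python sketch: `my_sqrt` is a hypothetical re-implemented routine (Newton's method) standing in for your own code, `math.sqrt` stands in for the reference package, and the test inputs and tolerance are illustrative assumptions.

```python
import math

def my_sqrt(x, iterations=60):
    """Hypothetical re-implementation: square root by Newton's method."""
    if x < 0:
        raise ValueError("my_sqrt requires a non-negative argument")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess

def regression_failures(inputs, rel_tol=1e-12):
    """Return the inputs where the re-implementation disagrees with the reference."""
    return [x for x in inputs
            if not math.isclose(my_sqrt(x), math.sqrt(x),
                                rel_tol=rel_tol, abs_tol=1e-300)]

if __name__ == "__main__":
    cases = [0.0, 1e-8, 0.25, 1.0, 2.0, 10.0, 12345.678, 1e12]
    print("disagreements:", regression_failures(cases))
```

Running a check like this on every commit keeps the re-implementation from silently drifting away from the reference as either one changes.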
Client-Server Models (or SOA)
• Problem:
- Code written in language A
- Code written in language B
- One needs an operation to be performed by the other
• Solution: protocol written in language C!
- Example: HTTP (HyperText Transfer Protocol)
  - clients written in any language (browsers, crawlers, etc.)
  - servers written in any language (Apache, PHP, Perl, C)
• While this is network-centric in practice, the network is not an essential element.
- I can hook up programs with Unix pipes and STDIN/STDOUT
• For long-term viability and flexibility this is a good way to insulate your development
- It transfers all discussion of interoperability to just your own user community, as expressed in your transfer protocol.
- Communication protocols are much more forgiving than language semantics, but also a terror to debug.
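The pipe-based variant mentioned above can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: the one-line text protocol (`ADD a b` answered by `OK n` or `ERR ...`) and the names `SERVER_SOURCE` and `call_server` are invented for illustration. The "server" is Python here only to keep the example self-contained; any language that reads STDIN and writes STDOUT could take its place.

```python
import subprocess
import sys

# The "server": a separate process that reads one request per line on STDIN
# and writes one reply per line on STDOUT.
SERVER_SOURCE = r'''
import sys
for line in sys.stdin:
    parts = line.split()
    if len(parts) == 3 and parts[0] == "ADD":
        print("OK", int(parts[1]) + int(parts[2]))
    else:
        print("ERR bad request")
'''

def call_server(requests):
    """Send text requests through a pipe and return the text replies."""
    result = subprocess.run(
        [sys.executable, "-c", SERVER_SOURCE],
        input="\n".join(requests) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()

if __name__ == "__main__":
    print(call_server(["ADD 2 3", "MUL 2 3"]))
```

Neither side needs to know the other's language, compiler, or calling convention; the only shared contract is the text protocol.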
C++ calling C
• Despite the name, C++ is not C.
• For linking purposes, the main difference is how symbols are named
- A symbol is the text string used to label a function in an object file
• C compilers are all compatible in their convention for how symbols are named.
- I can compile with different C compilers and usually successfully link the results together into a common executable.
• C++ compilers have traditionally all had their own proprietary naming conventions.
• Easiest to show with an example
f1.cpp vs f1.c
void f1(int a_a, int a_b, int a_c)
{
  int temp = a_a;
  double b = a_b;
}
> g++ -c -o f1.o f1.cpp
> gcc -c -o f1c.o f1.c
Name Mangling
Symbols in f1.o (compiled with g++):
0000000000000020 s EH_frame1
0000000000000000 T __Z2f1iii
0000000000000040 S __Z2f1iii.eh
Symbols in f1c.o (compiled with gcc):
0000000000000020 s EH_frame1
0000000000000000 T _f1
0000000000000038 S _f1.eh
• Since C does not have overloading or classes, symbol names do not require any form of mangling
• Mangling is the process of making up unique string names for member functions and overloaded functions.
• You can look at various object files for classes and
Linking
• So what are the consequences of linking your C++ program to C-compiled code?
#include "f1.h"
int main()
{
  f1(1,2,2);
  return 0;
}
> g++ -c -o f1test.o f1test.cpp
> g++ f1test.o f1c.o
Undefined symbols for architecture x86_64:
  "f1(int, int, int)", referenced from:
> nm f1test.o
0000000000000020 s EH_frame1
                 U __Z2f1iii
                 U ___gxx_personality_v0
0000000000000000 T _main
0000000000000040 S _main.eh
• The linker was looking for __Z2f1iii
• But f1c.o contains _f1
• C++ and C name their functions differently
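To make the mangled name less mysterious, here is a small Python sketch of how a name like `_Z2f1iii` is built. It assumes the Itanium C++ ABI scheme (the one used by g++ and clang++ on most Unix-like systems) and covers only free functions in the global namespace whose parameters are a few built-in types; classes, namespaces, pointers, and templates need more rules that are deliberately left out. The helper name `mangle` is invented for illustration. The extra leading underscore seen in the `nm` output above is added by the platform, not by the mangling scheme.

```python
# One-letter codes for a few built-in types in the Itanium C++ ABI.
BUILTIN_CODES = {
    "void": "v", "bool": "b", "char": "c", "short": "s",
    "int": "i", "long": "l", "float": "f", "double": "d",
}

def mangle(name, param_types):
    """Mangle a free function taking only built-in parameter types.

    The result is "_Z", the length of the name, the name itself, and one
    code per parameter. An empty parameter list is encoded as "v".
    """
    params = "".join(BUILTIN_CODES[t] for t in param_types) or "v"
    return "_Z" + str(len(name)) + name + params

if __name__ == "__main__":
    print(mangle("f1", ["int", "int", "int"]))
    print(mangle("f1", ["double"]))
```

Because the parameter types are part of the symbol, two overloads of `f1` get two different symbols, which is exactly what a C compiler never needs to do.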
extern “C”
extern "C" {
#include "f1.h"
}
int main()
{
  f1(1,2,2);
  return 0;
}
0000000000000020 s EH_frame1
                 U ___gxx_personality_v0
                 U _f1
0000000000000000 T _main
0000000000000040 S _main.eh
We can tell the C++ compiler NOT to mangle a function name, and to use the default C naming convention instead. Also, I should mention that the #ifdef __cplusplus guard used in the portable header on the next slide is called “conditional compilation”.
Writing a portable C header: f1.h
#ifdef __cplusplus
extern "C" {
#endif
void f1(int a_a, int a_b, int a_c);
#ifdef __cplusplus
}
#endif
__cplusplus is a macro that is defined by all C++ compilers
Fortran 2003 C Interoperability
• More cumbersome than you would imagine
subroutine f1(a, b, c) BIND(C)
  USE ISO_C_BINDING
  integer (C_INT) a, b, c
  return
end
• As with the C++ compiler, I can now tell the Fortran compiler to create a symbol with a C naming convention.
• To call this function from C++ I would need a declaration
extern "C" {
  void f1(int* a, int* b, int* c);
}
C as the typical default LCD (lowest common denominator)
• Most languages will provide a means to make their
functionality accessible to a C interface.
• In general, this creates the two-step process shown for Fortran 2003 and C++
1. library code is instructed to create a C named symbol
2. calling code is instructed to look for a C named symbol
- Then the linker can find everything and hook it all up
• Some languages, like C++, embed their strong typing in their naming convention, so a mismatched call fails at link time.
• Others, like C and F77, do not: the linker will not catch the error of calling a function with the wrong number or type of arguments.
• We’ll stop here with compiled languages. The approach
for the rest is the same, but you will rarely encounter the other compiled languages.
Interpreted languages
• Interpreted languages (Java, Python, MATLAB, etc.)
• These are not linkable (no .o objects)
• The procedure is quite different depending on which direction you
want to go.
• Interpreted languages execute inside a virtual machine.
- It is like an abstraction of a computer, but is in reality just another executing
process
Python calling C++: Extending
• Recall, most languages provide a mechanism to create a C binding
for their functions.
- Write wrapper code that includes Python.h
- This includes parsing your string inputs
- compile to a C binding
- link into a shared library
Simple wrapped function
#include "Python.h"
static PyObject * toy_system(PyObject *self, PyObject *args)
{
  const char *command;
  int sts;
  /* Parse the single string argument passed in from Python. */
  if (!PyArg_ParseTuple(args, "s", &command))
    return NULL;
  sts = system(command);
  /* Convert the C int status back into a Python integer. */
  return Py_BuildValue("i", sts);
}
Not done yet. Also need a method table (a vtable of sorts)
static PyMethodDef ToyMethods[] =
{ {"system", toy_system, METH_VARARGS, "Execute a shell command."},
  {NULL, NULL, 0, NULL} /* Sentinel */ };
• ...and still not done yet
static struct PyModuleDef toymodule = {
  PyModuleDef_HEAD_INIT,
  "toy", /* name of module */
  NULL, /* module documentation, may be NULL */
  -1, /* size of per-interpreter state of the module,
         or -1 if the module keeps state in global variables. */
  ToyMethods
};
…and still not done yet: Init function
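/* PyMODINIT_FUNC declares the return type as PyObject* and, when compiled
   as C++, gives the function extern "C" linkage so that the interpreter
   can find the unmangled symbol PyInit_toy. */
PyMODINIT_FUNC PyInit_toy(void)
{
  PyObject* res = PyModule_Create(&toymodule);
  if (!res) return NULL;
  return res;
}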
Compiling and linking a python module
> g++ -c -fPIC -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/ -o toy.o toy.cpp
- -fPIC creates “Position Independent Code”. It means the generated code does not depend on being loaded at a fixed absolute address: references to code and data are made relative to the current instruction or through an offset table. This makes it relocatable.
> g++ -shared -L/Library/Frameworks/Python.framework/Versions/Current/lib -lpython3.2 -o toy.so toy.o
Using a Python Module
> python
Python 3.2.2 (v3.2.2:137e45f15c0b, Sep 3 2011, 17:28:59)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import toy
>>> toy.system("ls -la");
total 40
drwxr-xr-x 5 bvs bvs 170 Nov 21 17:15 .
drwxr-xr-x 102 bvs bvs 3468 Nov 21 16:39 ..
-rw-r--r-- 1 bvs bvs 831 Nov 21 17:03 toy.cpp
-rw-r--r-- 1 bvs bvs 1872 Nov 21 17:03 toy.o
-rwxr-xr-x 1 bvs bvs 8748 Nov 21 17:06 toy.so
Why would we do this ?
• Python is nice and fun to use. Good for rapidly
prototyping new ideas.
• The interpreter can make your code quite slow.
• You can link in optimized and compiled code for the
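The cost of the interpreter can be seen without writing any extension at all. The following is a rough Python sketch, assuming CPython, where the built-in `sum` is implemented in compiled C; `python_sum` and `time_both` are names invented for illustration, and the actual timings depend entirely on your machine and Python version.

```python
import timeit

def python_sum(values):
    """Sum with an explicit loop executed step by step by the interpreter."""
    total = 0
    for v in values:
        total += v
    return total

def time_both(n=100_000, repeats=20):
    """Return (interpreted_seconds, compiled_seconds) for summing n integers."""
    data = list(range(n))
    interpreted = timeit.timeit(lambda: python_sum(data), number=repeats)
    compiled = timeit.timeit(lambda: sum(data), number=repeats)
    return interpreted, compiled

if __name__ == "__main__":
    interpreted, compiled = time_both()
    print("interpreted loop: %.4f s, built-in sum: %.4f s" % (interpreted, compiled))
```

Both functions compute the same result; the difference in running time is the kind of gap that a compiled extension module is meant to close for your own inner loops.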
Why would we not do this ?
• Good bye debugging
• Good bye profiling
• To get those things back you end up re-implementing
your code base in the compiled language
Debugging Python Modules
> gdb python
(gdb) run
>>> import toy
Reading symbols for shared libraries . done
>>>
Program received signal SIGINT, Interrupt.
0x00007fff8a3ad932 in select$DARWIN_EXTSN ()
(gdb) break toy_system
Breakpoint 1 at 0x100669e66
(gdb) cont
Continuing.
toy.system("ls -la");
Breakpoint 1, 0x0000000100669e66 in toy_system ()
(gdb) where
#0 0x0000000100669e66 in toy_system ()
#1 0x00000001000b31f4 in PyEval_EvalFrameEx ()
#2 0x00000001000b41ba in PyEval_EvalCodeEx ()
#3 0x00000001000b44cf in PyEval_EvalCode ()
#4 0x00000001000db16e in PyRun_InteractiveOneFlags ()
#5 0x00000001000db43e in PyRun_InteractiveLoopFlags ()
#6 0x00000001000dbc71 in PyRun_AnyFileExFlags ()
#7 0x00000001000f0982 in Py_Main ()
#8 0x0000000100000e5f in dyld_stub_strlen ()
C++ calling Python: Embedding
#include <Python.h>
int main(int argc, char *argv[])
{
  Py_Initialize();
  PyRun_SimpleString("from time import ctime\n"
                     "print(ctime())\n");
  Py_Finalize();
  return 0;
}
> g++ -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/ -L/Library/Frameworks/Python.framework/Versions/3.2/lib -lpython3.2 simple.cpp
> ./a.out
To get fancier, you need to use the Python C API (the Py* interface)
• Catching the return values from functions as PyObject
• Dynamic casting of return values to their derived types
• Turning your arguments into strings that get handed through the Python parser.
• This gets ugly very quickly
- It also changes syntax as you move through minor Python version numbers (?!?!) yes, that is a “bad thing”
• You end up parsing a lot of PyList objects in the raw
Tools
• SIP and SWIG
- Automate much of the mechanical part of generating wrapper code given existing C or C++ code. You just have to follow certain coding conventions
- Still not really automatic
- Not all language semantics have an analogue between the languages.
• Boost.Python
Boost.Python
#include <boost/python.hpp>
char const* greet()
{ return "hello, world"; }
BOOST_PYTHON_MODULE(hello_ext)
{ using namespace boost::python;
  def("greet", greet);
}
I’ll skip the complicated compilation…
>>> import hello_ext
>>> print(hello_ext.greet())
Java works in a similar way
• The two languages are contemporaries really
• Java has been a bit ambivalent about its C interface.
- Native C interfaces make it very hard to make your program certifiably secure.
- Native C code makes your Java code non-portable, non-“webbish”
• Still, you have to either provide *everything* in your
language, or provide an interface to C.
- Not too many device drivers get written in Java. (Java’s support for
Real Time execution is still pretty new and brittle).
• The Java Native Interface is specified in <jni.h>
HelloWorld.h
#include <jni.h>
#ifndef _Included_HelloWorld
#define _Included_HelloWorld
extern "C" {
JNIEXPORT void JNICALL Java_HelloWorld_print (JNIEnv *, jobject);
}
#endif
HelloWorld.cpp
#include <jni.h>
#include <stdio.h>
#include "HelloWorld.h"
JNIEXPORT void JNICALL Java_HelloWorld_print(JNIEnv *env, jobject obj)
{
  printf("Hello World!\n");
  return;
}
HelloWorld.java
class HelloWorld
{
  private native void print();
  public static void main(String[] args)
  {
    new HelloWorld().print();
  }
  static
  { System.loadLibrary("HelloWorld"); }
}
Building it all and running
> javac HelloWorld.java
> g++ -shared HelloWorld.cpp -o libHelloWorld.so
> java HelloWorld
Hello World!
• Now, there are also tools to help you get further along
- javah
  - reads a Java class file and generates a header file stub for you
- SWIG
  - can parse your C/C++ code and generate shadow classes and wrappers to help you with JNI as well.
• Microsoft thinks they know better, or they just want Java
Java Embedding
#include <jni.h>
#include <stdio.h>
JNIEnv* create_vm(JavaVM ** jvm)
{
  JNIEnv *env;
  JavaVMInitArgs vm_args;
  JavaVMOption options;
  /* Class path handed to the JVM; backslashes are escaped in the string literal. */
  options.optionString = (char*)"-Djava.class.path=D:\\Java Src\\TestStruct";
  vm_args.version = JNI_VERSION_1_6;
  vm_args.nOptions = 1;
  vm_args.options = &options;
  vm_args.ignoreUnrecognized = 0;
  int ret = JNI_CreateJavaVM(jvm, (void**)&env, &vm_args);
  if(ret < 0) printf("\nUnable to Launch JVM\n");
  return env;
}
What do you do with a JavaVM ?
• Much the same as you would do with a Python VM
• Build up strings to pass to Java functions
• Handle Java Objects as return types.
• Most useful if you have a large GUI written in Java already set up and working, but want to call it from your code.
- This doesn’t come up very often
• Mostly just wanted to show that interpreted languages
have two kinds of interoperability, and there is a virtual machine.