In our work we have modified JikesRVM to build a prototype of a distributed Java Virtual Machine. Rather than running the VM on top of page-based DSM software that is responsible for object allocation and manages cache coherence, we decided to implement our Shared Object Space inside JikesRVM. Our object-based DSM can exploit the abundant runtime information available in the JVM, which opens up opportunities for further optimizations. With the lazy detection scheme for shared objects, we only move objects into the Shared Object Space if they are reachable from at least two threads located on two different nodes. We added a faulting scheme so that a copy of a shared object can be requested from its home node if the object is not locally available. Our cache coherence protocol handles synchronization on objects correctly as specified by the JMM: upon an acquire operation the protocol guarantees that the latest data of a shared object is fetched, and a release operation triggers a write-back of all previously written objects to their home nodes.

By separating objects into shared and node-local objects, we only have to treat shared objects specially; node-local objects, for example, can be reclaimed by the local garbage collector. Furthermore, synchronization on node-local objects does not trigger a distributed cache flush, which is quite expensive as shown in Section 6.3. We have also added a mechanism for distributed classloading that uses a centralized classloader to replicate the class definitions to all worker nodes. We implemented a distributed scheduler with a load balancing function that decides where to allocate and start a Java application thread by choosing the least loaded node. Additionally, by redirecting all I/O operations to the master node we hide the underlying distribution from the programmer and the Java application itself and gain full transparency. Since we do not introduce any additional Java language constructs, we achieve a complete SSI.
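To make the acquire/release behaviour concrete, the following sketch shows in plain Java where the two protocol actions conceptually take place around a monitor region. The SharedObjectSpace methods used here are hypothetical names chosen purely for illustration and are not the actual entry points of our implementation.

    // Minimal sketch of the acquire/release actions around a monitor.
    // The SharedObjectSpace API below is hypothetical and only illustrates the idea.
    final class SharedObjectSpace {
        // acquire: drop cached copies of non-home shared objects so that the
        // next access faults in the latest version from the home node
        static void acquire() { invalidateCachedCopies(); }

        // release: write back all locally modified shared objects to their
        // home nodes before the lock becomes available to other nodes
        static void release() { writeBackDirtyObjects(); }

        private static void invalidateCachedCopies() { /* protocol message exchange */ }
        private static void writeBackDirtyObjects()  { /* protocol message exchange */ }
    }

    final class Account {
        private int balance;

        void deposit(int amount) {
            synchronized (this) {            // monitorenter corresponds to acquire
                SharedObjectSpace.acquire();
                balance += amount;
                SharedObjectSpace.release(); // performed before monitorexit (release)
            }
        }
    }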
We have shown in our benchmarks section that our Shared Object Space works correctly. Shared objects are faulted in when they are needed, and the lazy detection scheme ensures that objects reachable from threads located on different nodes become shared objects. The distributed classloading guarantees that each node has a consistent view of the class definitions. Our cache coherence protocol handles synchronization correctly as long as monitors are used. However, to run our DJVM in a production environment, a garbage collector component needs to be implemented to deal with shared objects. Furthermore, for performance reasons, as seen with the synchronization benchmarks, it is advisable to develop a mechanism for migrating an object's home node.
As mentioned in Section 7.2.8, additional checks for volatile fields need to be inserted by the compiler in order to completely cover the thread synchronization constructs used in Java. Furthermore, the java.lang.Thread wrapper class needs to be adjusted to delegate all method calls correctly to its corresponding VM-internal representation; for the benchmarks we have only rewritten the join() and start() methods. As far as classloading is concerned, we left out annotation support since the meta-objects created for annotations are treated differently. We also did not consider the case where the application creates its own classloader. In this case the classloader object should be allocated on the master node and a proxy that performs the redirection should be generated on the worker nodes.
We experienced several problems during the development of the DJVM. First, we encountered deadlocks. At some points we have to block the execution of a thread until a reply message arrives; for example, when a worker node redirects classloading to the master node, the requesting thread must wait until all class definitions have arrived. Since the requesting thread may already hold a lock, this resulted in classic deadlock situations. Due to the lack of proper debugger support it was difficult to pinpoint where a deadlock happened. We also had problems with the separation of VM and application objects mentioned in Section 7.1. We had to introduce exception cases so that VM objects do not become shared objects. Because several static methods in the Java class library are synchronized, calling them results in a costly cache flush of non-home shared objects even if the call originates from VM code. These problems can be solved by defining a clear boundary between VM and application objects.
A.1 DJVM Usage
This section describes how to run the DJVM and lists the additional command line options we introduced for our distributed runtime system. Note that one machine in the cluster must be configured as the master node and started first; the other cluster nodes should be configured as worker nodes.
• -X:nodeId=x: The node’s ID within the cluster. x must be 0 on the master node.
• -X:totalNodes=x: The total number of nodes within the cluster. Currently, we support up to four cluster nodes.
• -X:nodeAddress=x: The IP or MAC address of the node.
• -X:useTCP=x: Set x to true to use TCP communication instead of raw sockets.
• -X:nodeListeningPort=x: The listening port where the other nodes can connect to establish a communication channel. x should have the same value on all nodes.
• -X:masterNodeAddress=x: The IP or MAC address of the master node. This command line option is only needed on the worker nodes.
A sample configuration to run a machine as the master node using TCP communication within a cluster consisting of two nodes could look like this:
rvm -X:nodeId=0 -X:totalNodes=2 -X:nodeAddress=129.132.50.8 -X:useTCP=true -X:nodeListeningPort=60000
The worker node could be started with the following arguments:
rvm -X:nodeId=1 -X:totalNodes=2 -X:nodeAddress=129.132.50.9 -X:useTCP=true -X:nodeListeningPort=60000 -X:masterNodeAddress=129.132.50.8
Note that the use of raw sockets requires root privileges.
A.2 DJVM Classes
In this section we give a short description of the most important classes used by the DJVM.
A.2.1 CommManager
The CommManager is the first component of the DJVM that is booted. It sets up the cluster by establishing connections to all cluster nodes and by starting the MessageDispatcher thread, which runs in a loop receiving incoming messages. The CommManager also provides methods for sending and receiving messages over TCP or raw sockets. A pool of MessageBuffers is created when the CommManager is booted. A MessageBuffer can be obtained by calling the method getMessageBuffer() and should be returned with returnMessageBuffer().
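A typical use of the buffer pool could look as follows. Only getMessageBuffer() and returnMessageBuffer() are taken from the actual API; the Comm interface, clear(), and send() are simplified stand-ins introduced for this sketch.

    // Sketch of the intended MessageBuffer life cycle (simplified stand-in types).
    interface MessageBuffer { void clear(); }

    interface Comm {                                     // stand-in for the CommManager API
        MessageBuffer getMessageBuffer();
        void returnMessageBuffer(MessageBuffer buf);
        void send(int targetNode, MessageBuffer buf);    // hypothetical send signature
    }

    class BufferPoolUsage {
        static void sendRequest(Comm comm, int targetNode) {
            MessageBuffer buf = comm.getMessageBuffer(); // obtain a pooled buffer
            try {
                buf.clear();
                // ... serialize the message payload into buf ...
                comm.send(targetNode, buf);              // transmit over TCP or raw sockets
            } finally {
                comm.returnMessageBuffer(buf);           // always return the buffer to the pool
            }
        }
    }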
A.2.2 Message
Figure A.1 shows the message class hierarchy of our message model. All classes shown are abstract; concrete instances inherit from one of the abstract classes, e.g. a message that deals with classloading inherits from MessageClassLoading, whereas synchronization messages inherit from MessageObjectLock. The methods that must be implemented by the subclasses are described in Section 5.1.
[Figure A.1 depicts Message with its abstract subclasses MessageClassLoading, MessageIO, MessageObjectAccess, MessageObjectLock, MessageScheduling, MessageStatics, and MessageSystem.]
Figure A.1: Message class hierarchy.
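A minimal skeleton of this hierarchy could be written as follows. The two abstract methods are placeholders for the methods described in Section 5.1 and do not reflect the real signatures.

    // Skeleton of the message hierarchy from Figure A.1; the abstract methods
    // are placeholders for the real ones described in Section 5.1.
    abstract class Message {
        abstract byte[] serialize();             // placeholder signature
        abstract void deserialize(byte[] data);  // placeholder signature
    }

    abstract class MessageClassLoading extends Message { }
    abstract class MessageIO           extends Message { }
    abstract class MessageObjectAccess extends Message { }
    abstract class MessageObjectLock   extends Message { }
    abstract class MessageScheduling   extends Message { }
    abstract class MessageStatics      extends Message { }
    abstract class MessageSystem       extends Message { }

    // A concrete message, e.g. a class definition request, would extend
    // MessageClassLoading and implement the abstract methods.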
A.2.3 SharedObjectManager
The SharedObjectManager is responsible for all operations executed on shared objects. It provides methods for sharing an object, faulting an object in, computing and applying diffs, invalidating all non-home shared objects, updating invalid shared objects, etc.
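The following hypothetical interface merely paraphrases the responsibilities listed above; the actual method names and signatures of the SharedObjectManager differ.

    // Responsibilities of the SharedObjectManager, expressed as a hypothetical
    // interface; names and signatures are illustrative only.
    interface SharedObjectOperations {
        void shareObject(Object obj);            // move an object into the Shared Object Space
        Object faultIn(long guid);               // request a copy from the object's home node
        byte[] computeDiff(Object cachedCopy);   // delta of local modifications
        void applyDiff(Object homeCopy, byte[] diff);
        void invalidateNonHomeObjects();         // acquire: drop stale cached copies
        void updateInvalidObjects();             // refetch invalidated objects
    }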
A.2.4 GUIDMapper
The GUIDMapper contains the GUID table as a member and provides methods for finding an object given its GUID. The method findOrCreateGUIDObj() is the only way to find or create a GUID object. All GUID objects are additionally stored in a set. This is necessary because our dangling pointers are actually references to GUID objects whose last bit is set to 1; since such a tagged pointer is not a valid reference, the garbage collector would not recognize it and could collect the GUID object if the GUIDMapper did not hold a valid reference to it.
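Conceptually, the dangling-pointer encoding corresponds to the following bit manipulation. Inside the VM this is performed on raw address words (e.g. via the vmmagic Address type), so the long-based sketch below is only an illustration.

    // Conceptual sketch of the dangling-pointer encoding; inside the VM this
    // operates on raw address words rather than Java longs.
    final class DanglingPointer {
        private static final long TAG = 0x1L;        // low bit marks a GUID reference

        static long tag(long guidObjectAddress) {    // encode a dangling pointer
            return guidObjectAddress | TAG;
        }

        static boolean isDangling(long reference) {  // does this word refer to a GUID object?
            return (reference & TAG) != 0;
        }

        static long untag(long reference) {          // recover the GUID object address
            return reference & ~TAG;
        }
    }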
A.2.5 LockManager
The LockManager implements the functionality for locking and unlocking shared and node-local objects. Additionally, it is responsible for the mapping between a MonitorThread and the remote thread acquiring or releasing the lock on an object. Due to a name space conflict, the implementations for waiting and notifying on objects have been moved into the VM_Thread class.
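A possible shape of this mapping is sketched below; the key and value types are assumptions that only illustrate the bookkeeping, not the actual LockManager data structures.

    // Hypothetical sketch of the MonitorThread-to-remote-thread mapping
    // maintained by the LockManager; types and names are illustrative only.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class MonitorMapping {
        // key: id of the remote thread requesting the lock,
        // value: the local MonitorThread acting on its behalf
        private final Map<Integer, Thread> monitorThreads = new ConcurrentHashMap<>();

        void register(int remoteThreadId, Thread monitorThread) {
            monitorThreads.put(remoteThreadId, monitorThread);
        }

        Thread monitorFor(int remoteThreadId) {
            return monitorThreads.get(remoteThreadId);
        }
    }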
A.2.6 DistributedClassLoader
All methods for replicating classes are implemented in the DistributedClassLoader class. It contains all serialization and deserialization methods for the class definitions such as serializeAtoms, serializeTypeRefs, serializeFields, etc. During boot time, the flag redirectedToMaster is set on the worker nodes so that all subsequent classloading attempts are forwarded to the master node.
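The redirection decision can be pictured as follows; the method names and the byte-array return type are illustrative only, and the message handling is elided.

    // Sketch of the classloading redirection decision; only the flag
    // redirectedToMaster is taken from the actual implementation.
    class DistributedLoadingSketch {
        static boolean redirectedToMaster;   // set on worker nodes during boot

        static byte[] loadClassDefinition(String className) {
            if (redirectedToMaster) {
                // worker node: build a MessageClassLoading request, send it to the
                // master node, and block until the serialized definitions arrive
                return requestFromMaster(className);
            }
            // master node: resolve the class locally and replicate it to the workers
            return resolveLocally(className);
        }

        private static byte[] requestFromMaster(String className) { return new byte[0]; }
        private static byte[] resolveLocally(String className)    { return new byte[0]; }
    }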
A.2.7 DistributedScheduler
The functionality for distributing a thread to another node is implemented in the DistributedScheduler class. A native method is declared as a member to call the load average library. The logic of the distributed scheduling is also implemented in this class: if a thread is allocated and started on a remote node, the counter for that node is incremented, and it is decremented when the thread terminates.
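A sketch of the per-node counters and the node selection could look like this; the class and method names are hypothetical, and the native load average input is omitted.

    // Hypothetical sketch of the per-node thread counters used for load balancing;
    // the real scheduler additionally consults the native load average library.
    import java.util.concurrent.atomic.AtomicIntegerArray;

    final class LoadBalancerSketch {
        private final AtomicIntegerArray threadsPerNode;

        LoadBalancerSketch(int totalNodes) {
            threadsPerNode = new AtomicIntegerArray(totalNodes);
        }

        // pick the node currently running the fewest distributed threads
        int chooseLeastLoadedNode() {
            int best = 0;
            for (int node = 1; node < threadsPerNode.length(); node++) {
                if (threadsPerNode.get(node) < threadsPerNode.get(best)) best = node;
            }
            return best;
        }

        void threadStarted(int node)    { threadsPerNode.incrementAndGet(node); }
        void threadTerminated(int node) { threadsPerNode.decrementAndGet(node); }
    }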
A.2.8 IORedirector
The methods declared in the IORedirector intercept the calls executed in the VMChannel class if the node is a worker node. In this case the corresponding redirection method constructs a MessageIO message that is sent to the master node for further processing.
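The interception pattern can be sketched as follows; the method names and return values are placeholders and do not mirror the real VMChannel signatures.

    // Sketch of the IORedirector interception pattern; the MessageIO handling
    // and the real I/O call are simplified placeholders.
    class IORedirectorSketch {
        static boolean isWorkerNode;   // true on every node except the master

        static int write(int fd, byte[] data) {
            if (isWorkerNode) {
                // wrap the call into a MessageIO and let the master node perform it
                return sendIORequestToMaster(fd, data);
            }
            return performLocalWrite(fd, data);   // master node executes the real I/O
        }

        private static int sendIORequestToMaster(int fd, byte[] data) { return data.length; }
        private static int performLocalWrite(int fd, byte[] data)     { return data.length; }
    }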