Actors - Basic Language Structure - A linguistic approach to concurrent, distributed, and adapt

3.1 Basic Language Structure

3.1.1 Actors

In Ensemble, an actor is a first class entity which represents a single locus of control with encapsulated state. An actor is defined with a name and one or more user defined interfaces (Section 3.1.3), as seen in Listing 3.1, lines 5 and 14. The interface is used to define the channels which may be used in the body of the actor. The following line references refer to Listing 3.1.

Figure 3.2: Ensemble Channel Configurations

An actor definition has three sections. The first section is used to define any actor-specific state constructs (line 6). The entities defined in this section are only available within the enclosing actor. The second section is used to define one or more constructors for the given actor (lines 7 & 15). Any actor-defined state may be referenced from a constructor. The third section is used to define the actor’s behaviour clause (lines 8-11 & 16-20). The behaviour clause contains the logic of the actor, and is repeated indefinitely until the actor is explicitly told to stop. Telling an actor to stop will not kill it immediately; instead, the actor will complete the current path through the behaviour clause before stopping. Killing an actor immediately would likely interfere with the overall application logic, because other actors expect to interact with it over their channel connections. This choice simplifies reasoning about stopping an actor.

Actors may create other actors; however, actors may not stop other actors. To do so would violate the encapsulation of actors, and undermine the simple flow of logic offered by per-actor behaviour clauses. An actor must stop itself. However, one actor may send a message to another actor via the channel mechanism, requesting that the receiving actor stop itself.
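The stopping rule can be illustrated with a rough Python analogue. This is not Ensemble code: the `Actor` and `Counter` classes and the `run_counter` helper are hypothetical names introduced only for this sketch, and a Python thread stands in for an actor's locus of control. The point it shows is that a stop request is only honoured between passes of the behaviour clause, so the pass in progress always completes.

```python
import threading


class Actor(threading.Thread):
    """Hypothetical Python stand-in for an actor (illustration only)."""

    def __init__(self):
        super().__init__()
        self._stop_requested = threading.Event()

    def behaviour(self):
        """One pass through the behaviour clause; subclasses override this."""
        raise NotImplementedError

    def stop(self):
        # Intended to be called by the actor itself, from inside behaviour().
        self._stop_requested.set()

    def run(self):
        # The flag is checked only between passes, so the current pass
        # through the behaviour clause always runs to completion.
        while not self._stop_requested.is_set():
            self.behaviour()


class Counter(Actor):
    def __init__(self, limit):
        super().__init__()
        self.count = 0
        self.limit = limit
        self.log = []

    def behaviour(self):
        self.count += 1
        if self.count == self.limit:
            self.stop()              # stop requested part-way through the pass
        self.log.append(self.count)  # still executes: the pass completes


def run_counter(limit):
    actor = Counter(limit)
    actor.start()
    actor.join()
    return actor.log
```

In this sketch, `run_counter(3)` logs the third value even though the stop was requested before the append, mirroring the rule that an actor finishes its current path before stopping.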

3.1.2 Channels

In Ensemble, communication between actors is achieved by passing messages along typed, unidirectional channels. Channels are first class entities in the language and are arranged into two sets: in channels which consume data, and out channels which produce data. This distinction is necessary as a channel represents one half of a connection; two channels must be connected together before data may be sent between actors. The two channels being connected must be of opposite direction (in + out), and convey the same data type. It is a compile-time error (and logically wrong) to connect two channels of the same direction or different types together. Channels convey data of any language type or user defined type, including channels. The ability to send channels between actors enables dynamic runtime topologies of actors. Figure 3.2 shows the possible configurations of channel connections at runtime.
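The two connection rules (opposite direction, same data type) can be sketched in Python. This is an illustrative analogue, not Ensemble: the `Channel` class and `connect` function are hypothetical, and the checks run when `connect` is called, whereas Ensemble rejects such programs at compile time.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    direction: str   # "in" or "out"
    data_type: type  # type of the data the channel conveys
    # Channels this one is connected to; excluded from repr/eq to avoid recursion.
    peers: list = field(default_factory=list, repr=False, compare=False)


def connect(a, b):
    """Bind two channel halves together, enforcing the two rules above."""
    if {a.direction, b.direction} != {"in", "out"}:
        raise TypeError("channels must be of opposite direction (in + out)")
    if a.data_type is not b.data_type:
        raise TypeError("channels must convey the same data type")
    a.peers.append(b)
    b.peers.append(a)
```

A channel may appear in several connections, which is why `peers` is a list rather than a single reference.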

Many actor languages choose to use implicit actor mailboxes to facilitate communication, so that messages are sent directly to actors. Ensemble uses multiple channels instead, as this decouples actors from each other and enables the reconfiguration of channel connection topologies at runtime. It also enables channels to be used without concern for whether or not they are connected, as explained below.

Blocking-Rendezvous Channel Communication

By default, communication between actors in Ensemble follows the blocking-rendezvous model. This means that data is only passed between actors when both the sender and the receiver are explicitly trying to communicate.

For example, consider actor A trying to send a message on channel output, which is connected to channel input in actor B. If actor A executes a send on output, the execution of actor A will block until actor B executes a receive on channel input. Equally, if actor B executes a receive on channel input, it will block until actor A executes a send on channel output. In this way, both actors must rendezvous before messages are sent.

By default, a deep copy is made when sending data from one actor to another. This is done to enforce the shared-nothing semantics of the actor model, and ensures that each actor has a distinct copy of the data without requiring immutable state; hence, race conditions on shared data are not possible. When the data conveyed by a channel is another channel, a duplicate of the channel, including any existing connections, is created and sent to the receiving actor. Once received, the channel is adopted by the receiving actor. This ensures that the sending and receiving actors have distinct channels with identical connections, which guarantees that one actor cannot use a channel which is owned by another actor. Deep-copying in this manner is not always efficient, and is discussed further in Section 3.2.
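A minimal Python sketch of these two properties (blocking rendezvous and deep copy on send) follows. It is an analogue under stated assumptions rather than the Ensemble runtime: `RendezvousChannel` and `demo` are hypothetical names, threads stand in for actors, and the special handling of channels sent over channels is not modelled.

```python
import copy
import queue
import threading


class RendezvousChannel:
    """Hypothetical one-to-one connection with rendezvous semantics."""

    def __init__(self):
        self._items = queue.Queue(maxsize=1)  # hands the value to the receiver
        self._acks = queue.Queue(maxsize=1)   # tells the sender it was taken
        self._send_lock = threading.Lock()    # one send in flight at a time

    def send(self, value):
        with self._send_lock:
            # The receiver gets its own copy, so no state is shared.
            self._items.put(copy.deepcopy(value))
            self._acks.get()  # block until a receiver has taken the value

    def receive(self):
        value = self._items.get()  # block until a sender arrives
        self._acks.put(None)       # release the blocked sender
        return value


def demo():
    chan = RendezvousChannel()
    original = {"values": [1, 2, 3]}
    sender = threading.Thread(target=chan.send, args=(original,))
    sender.start()
    sender_blocked = sender.is_alive()  # send cannot finish before receive
    received = chan.receive()
    sender.join()
    received["values"].append(4)        # mutates the receiver's copy only
    return sender_blocked, original, received
```

Because `send` waits for the acknowledgement, the sending thread cannot complete until `receive` has run, and mutating the received value leaves the sender's data untouched.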

There is no requirement in the language for a channel to be connected before a communication action is performed on it. In the previous example, should actor A attempt to send and actor B attempt to receive when their channels are not connected, they will both block indefinitely. Any subsequent connect operation on these channels will bind them together, facilitate the message being sent from A to B, and unblock A and B. This feature is useful in decoupling the overall application design and individual actor logic. Here the actor simply waits for input, without having to worry about the overall topology. This is common in distributed and event-driven applications, where logic will wait for input.

As well as single channel operations, it is also possible for an actor to receive data from multiple channels in a single action via the select statement (see lines 30-40 in Listing 3.2).

 1  type Isender is interface(out integer output)
 2  type Ireceiver is interface(in integer input)
 3  type Iselector is interface(in integer input1;
 4                              in integer input2)
 5  stage home{
 6    actor sender presents Isender{
 7      val = 0;
 8      constructor(integer init){
 9        val := init;
10      }
11      behaviour{
12        send val on output;
13        val := val + 1;
14      }
15    }
16
17    actor receiver presents Ireceiver{
18      constructor(){}
19      behaviour{
20        receive val from input;
21        printInt(val);
22        printString("\n was received");
23      }
24    }
25
26    actor selector presents Iselector{
27      example = 42;
28      constructor(){}
29      behaviour{
30        select{
31          receive val from input1 where example > 39: {
32            // do stuff
33          }
34          receive val from input2 : {
35            // do stuff
36          }
37          default : {
38            // do other stuff
39          }
40        }
41      }
42    }
43    boot{
44      send1 = new sender(-100);
45      send2 = new sender(0);
46      send3 = new sender(100);
47
48      recv = new receiver();
49      connect send1.output to recv.input;
50
51      sel = new selector();
52      connect send2.output to sel.input1;
53      connect send3.output to sel.input2;
54    }
55  }

A select statement is used to non-deterministically choose between one or more in channels based on the state of the channels and an optional boolean guard (where clause). The boolean guard values are used to decide which channels of the select statement may be chosen from, and form a set of eligible channels. These expressions are evaluated at runtime, and may contain any literal, expression, variable or procedure which is in scope. Once the set of eligible channels has been determined, the channels are examined to determine if any are ready to provide data immediately. If there is more than one such channel, a non-deterministic choice is made and the data is retrieved from that channel with the same logic as receive. If there are one or more eligible channels, but none is ready to provide data, the actor will block on all eligible channels unless a default clause is provided, in which case the default clause will fire. A blocked actor will awaken when data is pushed to one of the channels upon which it has blocked.
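The selection steps described above (filter by guard, find ready channels, choose one at random, otherwise fire the default or block) can be sketched in Python. This is an illustrative analogue only: `InChannel`, `push` and `select` are hypothetical names, each channel simply holds a queue of values that are ready to be received, and raising an error when there is no eligible channel and no default clause is an assumption of this sketch, since the text does not specify that case.

```python
import random
import threading
from collections import deque

# One condition variable shared by every channel keeps the sketch small.
_activity = threading.Condition()


class InChannel:
    """Hypothetical in channel holding values that are ready to be received."""

    def __init__(self):
        self.pending = deque()

    def push(self, value):
        with _activity:
            self.pending.append(value)
            _activity.notify_all()  # wake any actor blocked in select()


def select(alternatives, default=None):
    """alternatives is a list of (channel, guard, handler) triples."""
    # Step 1: the guards decide which channels are eligible.
    eligible = [(ch, handler) for ch, guard, handler in alternatives if guard()]
    if not eligible and default is None:
        # Assumption of this sketch; the text leaves this case unspecified.
        raise RuntimeError("no eligible channel and no default clause")
    with _activity:
        while True:
            # Step 2: which eligible channels can provide data immediately?
            ready = [(ch, handler) for ch, handler in eligible if ch.pending]
            if ready:
                # Step 3: non-deterministic choice among the ready channels.
                ch, handler = random.choice(ready)
                value = ch.pending.popleft()
                break
            if default is not None:
                return default()    # nothing ready: the default clause fires
            _activity.wait()        # block until data is pushed to a channel
    return handler(value)
```

A guard is passed as a zero-argument callable so that it is evaluated when `select` runs, in the same way that a where clause is evaluated at runtime.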

The select statement is one way in which user level timeouts can be implemented. By selecting between a set of channels conveying useful data, as well as one registered with the timer actor (Section 3.1.8) to deliver data after a set time, the actor will only block as long as the timeout. Conceptually, select is similar to the select function in the sockets networking API [100].

Buffering

In practice, always blocking on single channels can lead to channel deadlock, where a number of actors are blocked awaiting messages from each other. This can be particularly problematic where messages are sent in non-deterministic order from actors which interact with the outside world; consider an actor waiting for a timeout. It is also a problem for actors which consume and produce data across channels at different rates.

For this reason, in channels may optionally be defined with buffers. Here, the communication semantics are modified as follows. An actor always sends on an out channel; a message will be buffered if the in channel to which it is bound has buffer space available. If there is no space in the buffer, the sending actor will block as before. When the receiving actor invokes a receive on the in channel, it will either retrieve any data in the buffer, or default to the blocking-rendezvous semantics. For the purpose of simplicity, only in channels may have buffers. The use of buffering enables asynchronous communication and hence provides a way to avoid deadlock. The correct provision of buffer sizes is an application specific detail.
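The modified semantics can be approximated in Python with a bounded queue. This is a sketch under stated assumptions, not Ensemble's implementation: `BufferedInChannel` and `send_would_block` are hypothetical names, and the rendezvous fallback for an empty buffer is reduced to the receiver blocking until the next send.

```python
import copy
import queue


class BufferedInChannel:
    """Hypothetical in channel with a fixed-size buffer."""

    def __init__(self, capacity):
        if capacity < 1:
            # queue.Queue treats maxsize <= 0 as unbounded, which is not
            # the behaviour described here, so reject it explicitly.
            raise ValueError("capacity must be at least 1")
        self._buffer = queue.Queue(maxsize=capacity)

    def send(self, value):
        # Completes immediately while there is buffer space; blocks when full.
        self._buffer.put(copy.deepcopy(value))

    def receive(self):
        # Returns buffered data if any; otherwise blocks until a send occurs.
        return self._buffer.get()

    def send_would_block(self):
        return self._buffer.full()
```

With a capacity of two, a sender can complete two sends with no receiver present, and a third send would block until a receive frees a slot. Messages come out in the order they were sent.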

It is important to note that the completion of a send operation guarantees that the message has been conveyed to an actor, but it does not guarantee that the actor has processed the message. If required, such guarantees should be implemented within the application. Also, communication guarantees ordering per connection, but not between actors. Consider the actors connected as shown in c) of Figure 3.2. In this case, all data sent from S1 will arrive in order relative to S1, and equally for S2. However, there is no causal ordering of data delivery between S1 and S2. Section 4.4.2 provides a discussion of how this guarantee is provided at runtime.

Another option to relieve channel deadlock would have been to perform analysis to determine if the current configuration of channel connections would result in deadlock, as is done in the Go language [101]; however, this was not feasible for two reasons. Firstly, in Ensemble, all channel connections are dynamic and occur at runtime. This precludes the use of static analysis at compile time, as the topology is not static. Secondly, unlike Go, Ensemble’s channels facilitate location transparency (Section 3.4), which would require distributed analysis of the connection graph. While this is possible, it is beyond the scope of this dissertation.

3.1.3 Interfaces

An interface is a language construct which is used to define one or more channels (Listing 3.2, lines 1-4). An interface is a static entity, consisting of only the channels expressed in its definition. Any channel defined in an interface is available within an actor which presents that interface.

In order to preserve the encapsulation semantics of the language, there are limitations on the actions possible on channels which are dereferenced from an interface. This is because such channels represent the actual channels of the actor referenced by the interface, rather than the channels of an actor which are connected to such channels. Consequently, channels dereferenced from interfaces may only be bound to, or sent along, a channel. It is not possible to disconnect, send along, receive from, or select across such a channel. To do so would violate the isolation of the referenced actor, as one actor would be able to manipulate another actor’s state directly. Strictly speaking, the connect operation is also such a violation; however, it is necessary to enable useful work between this actor’s channels and the remote actor’s channels. It is allowed because it does not modify any existing connections of the channel. In contrast, a disconnect could remove connections between the remote actor and third party actors, and is a clear violation of the semantics of the language. This does not prevent the local actor from performing a disconnect on the local channel which was bound to the interface channel, nor the actor referenced by the interface from performing a disconnect on its own local channels.

3.1.4 Stages

A stage represents a memory space within which actors operate. Conceptually, there may be many stages per physical computer, although currently there is only one per physical location; this is a limitation of the runtime, rather than the language or model (Section 4.4.2). A stage is defined with a name, which is used as an identifier at runtime (Listing 3.2, line 5). Note that this name is not necessarily universally unique; however, there are other mechanisms which can be used to determine uniqueness (see Section 3.4). An actor may reference the current stage it occupies with the here keyword.

Within a stage there are two main sections. The first section is used to define language constructs to be used within this stage. This section usually contains actor definitions, but may also include query and procedure definitions (Listing 3.2, lines 6-42). Equally, this section may be left empty. Defining an empty stage is useful when creating an Ensemble environment across multiple physical locations for use with adaptation (Section 3.4).

The second section is the boot clause, Listing 3.2 lines 43-54. The boot clause acts as the main for this stage and will be invoked upon its creation. Any statement may be invoked within the boot clause, except actor specific actions, such as publishing properties (Section 3.4.1). Again, this clause may be left empty. The boot clause is primarily used to instantiate actors, and connect channels. Any actors instantiated within this boot clause will be created and execute within the context of this stage.

Whereas actors are entirely controlled from within the language, stages are controlled from the command line, outwith the language. This is discussed further in Section 3.4.6.

3.1.5 Types

Listing 3.3 shows the default types available in the language, as well as how to define new types. All types in Ensemble must be initialised with legal values when declared; there is no NULL type in the language. This is done to make the language safer and remove the possibility of invalid values. Although it is still possible to create incorrect code from a logical perspective, there will be no runtime type errors or illegal references.

Memory is allocated from the heap using the new keyword. As Ensemble uses automatic garbage collection to simplify memory management, there is no explicit action to return memory to the heap.

The Any Type

The any type is an infinite union type, which may be used to abstract any other type. Although a particular type may be cast to an any type, the original type can only be recovered via the use of the project statement, as shown in Listing 3.4.

type Itype_examples is interface(out integer output)
type my_struct is struct(integer a)
type my_enum is enum(alpha, beta, gamma, delta)

stage home{
  actor type_examples presents Itype_examples {
    int_val = 1;
    uint_val = 1u;
    long_val = 1L;
    double_val = 0.0;
    str_val = "hello";
    bool_val = true;
    struct_val = new my_struct(int_val);
    array_val = new my_struct[100] of struct_val;
    constructor() {}
    behaviour {
      ...
    }
  }
  boot{
    t = new type_examples();
  }
}

Listing 3.3: Declaration of Ensemble Types

The project statement ensures that the any type can only be decoded to types which are defined within the current scope. This provides safety in the language as an any type cannot be arbitrarily cast to any other type. Should the underlying type of the any not be one of the types specified in the project clause, the mandatory default clause will be selected. In this case, the any type can still be used; for example, the value can be sent along a channel or used with another project statement.
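As a rough Python analogue of this behaviour (not Ensemble code), a value of unknown type can be decoded only against a fixed list of known types, with a mandatory fallback branch. The `ComplicatedStruct` class and the `project` function below are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ComplicatedStruct:
    alpha: int
    beta: str


def project(any_val):
    """Decode any_val against the types known in this scope."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(any_val, int) and not isinstance(any_val, bool):
        return f"Got an Integer! {any_val}"
    if isinstance(any_val, ComplicatedStruct):
        return f"Got a complicated_struct_type! {any_val.alpha} {any_val.beta}"
    # Mandatory default: the value is not decoded, but it remains usable,
    # for example it could be forwarded to another actor unchanged.
    return "Don't recognise the type"
```

The branch list plays the role of the project clause: a value whose underlying type is not listed can never be treated as one of the listed types, it can only fall through to the default.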

One advantage of the any type is in combination with channels. By only receiving messages of the any type over a single channel, the receiving actor can either successfully decode the type and process the data, or discard the data. This enables logic more similar to traditional actor-based approaches which do not use explicit channels, but rather send messages to actors directly as proposed by Hewitt [4].

Collections

Currently, the only language-specified collection type is an array. Ensemble arrays are fixed length, although this size can be specified as either a literal value, or expression. Arrays may contain references but all elements must be of the same type. However, an array with elements of the any type is allowed.

type Iexample is interface(in any input)
type complicated_struct_type is struct(integer alpha;
                                       string beta)
...
behaviour{
  receive any_val from input;

  project any_val as mesg{
    integer : {
      printString("Got an Integer!\n");
      printInt(mesg);
    }
    complicated_struct_type : {
      printString("Got a complicated_struct_type!\n");
      printInt(mesg.alpha);
      printString(mesg.beta);
    }
    default : {
      printString("Don't recognise the type\n");
    }
  }
}

Listing 3.4: The Any Type and Project Statement

Given the types available in the language, higher level collections can be created. For example, it is possible to create user-defined linked lists using the struct and any types; linked lists are used within the draughts example, discussed in Section 5.3.1. Although it would have been useful to have a more complete set of collections, it was not the focus of this work.
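One way such a list might be structured is sketched below in Python. This is an assumption-laden illustration, not the draughts implementation: it assumes that the link field is of the any type so that it can hold either another node or an end marker (since the language has no NULL), and the names `Node`, `End`, `from_values` and `to_values` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class End:
    """Marker for the end of the list, standing in for a missing NULL."""


@dataclass
class Node:
    value: int
    next: Any  # holds either another Node or an End marker


def from_values(values):
    """Build a linked list from a Python list, last element first."""
    head = End()
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_values(head):
    """Walk the list, inspecting the type of each link before following it."""
    out = []
    while isinstance(head, Node):  # plays the role of a project statement
        out.append(head.value)
        head = head.next
    return out
```

Every link is initialised with a legal value, so traversal never encounters an invalid reference; it simply stops at the marker.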

3.1.6 Security

Security is not represented within the language as it was beyond the scope of the work. This said, Section 4.7 discusses security within the runtime.

3.1.7 Failure Model

The failure model in Ensemble uses a combination of explicit and implicit error handling. This is done to simplify the programming model, while retaining the option of fine-grained control.

Explicit Error Handling

Explicit error handling is enabled via exceptions, which are defined by the language; it is not possible to create a user-defined exception. This choice was made to ensure that exceptions are used for truly exceptional events, as opposed to being used as flags. Using exceptions in this way minimises the complexity of code. The primary purpose of exceptions is to report on failures in the distributed features of Ensemble. Table 3.1 describes the possible exceptions in the language, and Listing 3.5 shows an example of explicit exception handling.

Exception                      Description
OutOfMemoryException           All heap memory allocated
NullPointerException           There was a NULL pointer found in the runtime
IndexOutOfBoundsException      Trying to access outwith the bounds of an array
DivisionByZeroException        Trying to divide by zero
DuplicatePropertyException     Trying to publish an array of properties where there are two properties with the same key
StageNotFoundException         The supplied stage was not accessible
ChannelNotFoundException       The supplied channel was not accessible
ConnectionFailureException     A network error occurred
SpawnException                 An error occurred with the spawn process
MigrationException             An error occurred with the migration process
ActorNotCompatibleException    The targeted actor may not be replaced by the one provided

Table 3.1: Description of Ensemble Exceptions

Implicit Error Handling

Implicit error handling is influenced by the let it fail model, as first introduced with Erlang [102]. In this model, programmers are advised to let their applications crash, rather
