A data-centric event-oriented RTOS for MCUs


Simplify multithreaded programming and boost performance

Introduction

This paper presents a fundamentally new approach to real-time programming that has resulted in a new type of real-time operating system. The data-centric event-oriented RTOS is compared to the typical approach of traditional RTOSes. Some of the ideas have already been presented in my article “A data-centric OS for MCUs – a real-time publisher-subscriber-mechanism”, published in May 2006 in “Embedded Systems Europe”. Since then, with some input from Willert Software Tools, I have developed the concept further and implemented the DCEO-RTOS. By “data-centric” I mean that the focus of programming is shifted from code to data flow; by “event-orientation” I mean that every change of data is treated as an event.

The key benefits are faster reaction for high priority tasks, highly improved use of processor performance, easier – and hence safer – multithreaded programming and finally easier implementation of more flexible systems.

Throughout this presentation I will:

1. explain the foundation stones of the concept,

2. show how they are put together to form an RTOS,

3. partly explain its interface and configuration, and

4. show an example project with working code.

Foundation - REACT

REACT using interrupt service routines (ISRs). That’s what processors provide interrupts for: true and fast reaction. So why not use them? Do not wait for the task-scheduler to come around and cut the next time-slice, let the system react immediately.

Reduce jitter

The task scheduler is the central piece of code in a traditional RTOS. It introduces jitter by cutting time into slices, and the magnitude of this jitter is on the order of one time slice. Compared to this, an ISR obviously reacts faster and with less jitter.

Do not switch tasks

Reacting from inside a SW-ISR also removes the need for a task switch. In traditional RTOSes the scheduler is a high-priority interrupt that saves the state of the currently running task and restores the state of the next task to run. This saving and restoring is expensive in terms of processor time: all register contents have to be saved and the task stacks have to be swapped. Not having to do this saves processor time and makes the reaction more deterministic.

void main(void)
{
    CreateTask(MyTask);
    ...
    while (TRUE);
}

void MyTask(void)
{   // implements reaction
    while (TRUE)
    {
        WaitForEvent(&myEvent);
        // react to myEvent here
        reaction = g_input * xxx;
    }
}

void MyISR(void) __irq
{   // do some HW stuff to
    // retrieve input
    g_input = HW;
    SetEvent(&myEvent);
}

Listing 1: a task waits for an event that is set by an ISR


Reduce programming overhead

Using a traditional RTOS, the burden of realizing a reaction is entirely on the programmer's side: he has to program the reaction himself. A very common way to do this is shown in listing 1. Something happens that causes an initial interrupt. The ISR sets an event. A task that waits for the event is eventually resumed by the RTOS. Very often this task then accesses the global data (filled in by the ISR), calculates and realizes a reaction. To me this “ISR-setting-an-event-to-trigger-some-other-task-waiting-for-the-event” appears to be a long-winded way to achieve a simple thing, namely to react.

Also, putting a task to sleep when it starts waiting and waking it up again involves two task switches and thereby produces overhead again.

Simplify Synchronization

The next interrupt might occur while MyTask is in the middle of reading the global input data, so the data used for calculating the reaction might be corrupted: if MyTask has already read part of g_input when the interrupt occurs again and overwrites g_input, MyTask will calculate with inconsistent data. Two asynchronous processes that access the same piece of global data therefore have to synchronize their access to it. I know, I am being obvious again. The point I am trying to make is not to explain synchronization; it is the fact that synchronization is expensive. Synchronization will be discussed in more detail further down.
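To make the corruption concrete, here is a minimal host-side simulation. It assumes a 16-bit data path, so that a 32-bit g_input is written in two halves, and it hard-codes the moment at which the task reads instead of relying on a real interrupt. All names (Shared32, simulate_torn_read, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 32-bit variable that a 16-bit bus can only
   access as two separate 16-bit halves. */
typedef struct {
    volatile uint16_t lo;
    volatile uint16_t hi;
} Shared32;

/* Each helper stands for one bus access; an interrupt may occur
   between any two of them. */
static void write_low_half(Shared32 *s, uint32_t v)  { s->lo = (uint16_t)(v & 0xFFFFu); }
static void write_high_half(Shared32 *s, uint32_t v) { s->hi = (uint16_t)(v >> 16); }

static uint32_t read_both_halves(const Shared32 *s)
{
    return ((uint32_t)s->hi << 16) | s->lo;
}

/* Simulates: the ISR has written only the low half of newValue when the
   lower-priority task reads the variable. Returns what the task sees. */
static uint32_t simulate_torn_read(uint32_t oldValue, uint32_t newValue)
{
    Shared32 shared;
    write_low_half(&shared, oldValue);
    write_high_half(&shared, oldValue);

    write_low_half(&shared, newValue);           /* ISR: first bus access      */
    uint32_t seen = read_both_halves(&shared);   /* task reads in between      */
    write_high_half(&shared, newValue);          /* ISR: second bus access     */

    assert(read_both_halves(&shared) == newValue);  /* consistent afterwards   */
    return seen;
}
```

With an old value of 0x0000FFFF and a new value of 0x00010000, the task sees 0x00000000, a value that was never written. If only the low half changes, for example from 5 to 7, the read happens to be consistent, which is exactly why such errors tend to slip through testing.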

Traditional RTOSes come at the cost of jitter, performance overhead due to task-switches, programming overhead, and the overhead and added jitter due to synchronization – not to mention the difficulty of identifying the necessity of synchronization in more complex systems.

OK, I have been bashing the “traditional RTOS” a bit. I’m afraid you’ll rightly expect me to come up with a good alternative now. I’ll try my best.

Use SW-interrupts

SW-interrupts are the same as HW-interrupts; the only difference is their cause. While a HW-interrupt is caused, for example, by the completion of an AD conversion or by an external pin, a SW-interrupt is caused by code setting its trigger bit. The advantages of using a SW-interrupt are:

• no jitter due to cutting time into slices

• no performance overhead due to task-switching

• pre-emptive by nature

Have a look at listing 2 and compare it to listing 1. The main() function only configures the interrupts and then loops doing nothing. ‘MyTask’ again implements the reaction, but this time it is an ISR, namely the ISR of a SW-interrupt. Note that this task is not created (CreateTask) and does not loop: it ends when the reaction is finished, and it is called again when the interrupt is triggered again. It also does not “wait for an event”. Instead of “setting an event”, the ‘real’ ISR triggers the SW-interrupt request.
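The pre-emption that the processor's interrupt logic provides can be modelled on a PC. The sketch below is a hypothetical host-side simulation, not DCEO-RTOS code: one pending bit per priority level, with lower numbers meaning higher priority as on the LPC2138's VIC. On the real chip, setting the trigger bit would presumably be a write to the VIC's software interrupt register (VICSoftInt); here sw_trigger() stands in for it and immediately runs any handler that out-ranks the code currently running. The demo handlers follow listing 2, with MyISR standing in for the HW ISR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_LEVELS 16          /* e.g. the 16 vectored priority slots of a VIC */
#define IDLE_LEVEL NUM_LEVELS  /* main() runs below every interrupt level     */

typedef void (*IsrHandler)(void);

static IsrHandler g_handlers[NUM_LEVELS];
static uint32_t   g_pending;                 /* bit n set: level n requested */
static int        g_currentLevel = IDLE_LEVEL;

/* Run all pending handlers with a higher priority (numerically lower level)
   than 'level', highest priority first, like a prioritized interrupt
   controller would. */
static void run_pending_above(int level)
{
    for (;;) {
        int next = -1;
        for (int n = 0; n < level; n++) {
            if (g_pending & (1u << n)) { next = n; break; }
        }
        if (next < 0)
            return;

        int saved = g_currentLevel;
        g_pending &= ~(1u << next);          /* acknowledge the request     */
        g_currentLevel = next;               /* "enter" the ISR             */
        g_handlers[next]();
        g_currentLevel = saved;              /* "return from interrupt"     */
    }
}

/* Counterpart of setting a SW-interrupt trigger bit: the handler runs at
   once if it out-ranks the code currently running, otherwise later. */
static void sw_trigger(int level)
{
    assert(level >= 0 && level < NUM_LEVELS && g_handlers[level] != NULL);
    g_pending |= 1u << level;
    run_pending_above(g_currentLevel);
}

/* Demo modeled on listing 2: MyISR (level 8) plays the HW ISR and
   triggers the SW-interrupt that runs the reaction MyTask. */
static char g_trace[8];
static int  g_input, g_reaction, g_reactionLevel;

static void trace(char c)
{
    size_t n = strlen(g_trace);
    if (n + 1 < sizeof g_trace) { g_trace[n] = c; g_trace[n + 1] = '\0'; }
}

static void MyTask(void) { trace('T'); g_reaction = g_input * 2; }
static void MyISR(void)  { trace('i'); g_input = 21; sw_trigger(g_reactionLevel); trace('I'); }

static void install_demo(int reactionLevel)
{
    memset(g_trace, 0, sizeof g_trace);
    g_reactionLevel = reactionLevel;
    g_handlers[8] = MyISR;
    g_handlers[reactionLevel] = MyTask;
}
```

After install_demo(3) and sw_trigger(8) the trace reads "iTI": the reaction pre-empts the ISR that triggered it. With the reaction at level 12 it reads "iIT": the reaction waits until the ISR has returned. Neither case involves a scheduler or an event object.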


Comparing the lines-of-code of this very simple example you can see that the programming overhead is a little less, too.

By using a SW-interrupt for the reaction, the real work has been shifted from the OS kernel (task management and event handling) to the processor hardware. If you consider this use of interrupts unsafe or tricky programming, ask yourself how long an ISR has to be before it becomes unsafe. This sort of coding has been running very efficiently for years in many projects I have been involved in. Letting user code run in interrupts is no more scary than running user code in main.

What does it cost? These advantages are not for free. You pay by sacrificing a few interrupt sources. But using the DCEO-RTOS you DO NOT need an interrupt for each task! Very few will suffice. Most processors provide an abundance of available interrupt sources and no single project uses them all.

Synchronization

The benefit of multithreaded or multitasked programming is that important things can be done first and pre-empt the less important ones. This is great, but there is a downside: multithreaded programming is a lot more complicated, and many timing-related errors occur never or only very rarely during testing. With errors of this kind I am usually quite happy when I am able to reproduce them and can start debugging (in real time, of course, which works very well, for example, with KEIL’s ULINK). The problem is to identify which variables need synchronization. Does a 32-bit data item need to be synchronized when it is accessed from two processes? If processor, bus and RAM are all 32 bits wide, the answer is no for plain reads and writes. If any of processor, bus or RAM is narrower than 32 bits, the answer is yes. In a complete 32-bit system a 32-bit data item is written or read in a single access, so reading and writing cannot interrupt each other and corrupt the data. (Read-modify-write sequences such as an increment are a different matter: they take several accesses and need protection even then.)

You need to synchronize whenever you access a data item that must stay consistent and is wider than the narrowest data path in your system. A systematic way of identifying synchronization requirements is to look at all global resources, find every use of each in the source code and then determine in which tasks, interrupts and threads that code may be executed. A good call graph generated by the compiler or linker helps here.
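The rule above is simple enough to write down as a function. This is a hypothetical helper for reasoning about a design, not part of any RTOS; it assumes naturally aligned data items and only covers plain reads and writes.

```c
#include <assert.h>
#include <stdbool.h>

/* Width in bits of the narrowest part of the access path. */
static int narrowest(int cpuBits, int busBits, int ramBits)
{
    int w = cpuBits;
    if (busBits < w) w = busBits;
    if (ramBits < w) w = ramBits;
    return w;
}

/* A shared data item needs synchronization if it is wider than the
   narrowest part of the path (processor, bus or RAM), because it is then
   read or written in more than one access. Assumes aligned items and plain
   reads/writes; read-modify-write sequences need protection regardless. */
static bool needs_sync(int itemBits, int cpuBits, int busBits, int ramBits)
{
    assert(itemBits > 0 && cpuBits > 0 && busBits > 0 && ramBits > 0);
    return itemBits > narrowest(cpuBits, busBits, ramBits);
}
```

A 32-bit item on a 32-bit processor with a 16-bit external bus needs synchronization; a 16-bit item on the same system does not; a 64-bit item (for example a struct of two 32-bit values that must stay consistent) needs it even on a complete 32-bit system.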

Synchronization cannot be avoided altogether if the decision in favor of parallel processes has been made. In the next passage I’ll quickly examine the fundamental approaches to synchronization and discuss their pros and cons.

Here are two substantially different ways to deal with synchronization:

1. Mutually exclude all threads from simultaneously accessing the same variable.

2. Use double buffers or ring buffers to avoid the need for process locking.

Mutex is short for mutual exclusion. One thread enters a mutex. When another, higher-priority thread wants to enter the mutex while the first still owns it, the owning thread is resumed and carries on at the waiting thread's priority until it leaves the mutex again. Then the waiting task continues, as shown in figure 1. The situation of a high-priority task being blocked by a lower-priority one is called priority inversion; raising the owner to the waiter's priority is known as priority inheritance.
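The bookkeeping behind raising the owner's priority (priority inheritance) fits in a few lines. This hypothetical model does not schedule anything; it only shows how the effective priority of the owner changes at steps 3 and 4 of figure 1. In this sketch larger numbers mean higher priority, and a task holds at most one mutex at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a mutex with priority inheritance.
   Larger numbers mean higher priority in this sketch. */
typedef struct {
    int basePrio;       /* priority the task was created with */
    int effPrio;        /* priority it currently runs at      */
} Task;

typedef struct {
    Task *owner;        /* NULL when the mutex is free        */
} Mutex;

/* Returns 1 if 'self' now owns the mutex, 0 if it has to wait. While it
   waits, the owner inherits the waiter's priority if that is higher, so
   the owner can finish its critical section quickly. */
static int mutex_try_enter(Mutex *m, Task *self)
{
    if (m->owner == NULL) {
        m->owner = self;
        return 1;
    }
    assert(m->owner != self);                 /* no recursive locking here */
    if (self->effPrio > m->owner->effPrio)
        m->owner->effPrio = self->effPrio;    /* priority inheritance      */
    return 0;
}

/* Leaving the mutex drops the owner back to its own priority. */
static void mutex_leave(Mutex *m, Task *self)
{
    assert(m->owner == self);
    self->effPrio = self->basePrio;
    m->owner = NULL;
}
```

Replaying figure 1: Task 1 (priority 1) enters the mutex, Task 2 (priority 5) fails to enter it and Task 1 runs on at priority 5; when Task 1 leaves, it drops back to priority 1 and Task 2 can enter.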

void main(void)
{
    // setup ISRs
    while (TRUE);
}

void MyTask(void) __irq
{   // implements reaction
    // react to myEvent here
    reaction = g_input * xxx;
}

void MyISR(void) __irq
{   // do some HW stuff to
    // retrieve input
    g_input = HW;
    myTaskTrigger = 1;
}

Listing 2: the reaction implemented as the ISR of a SW-interrupt


Priority inversion occurs only if the timing of the tasks is such that the higher priority task is trying to enter an already entered mutex. It involves two task switches at 3 and 4. This overhead occurs only sporadically and adds to jitter in the reaction of both tasks 1 and 2.

Buffering is another way to deal with synchronization. Double buffers or ring buffers store the data while the real resource is blocked. When the resource becomes available again, it checks the buffer for waiting items. A thread that wants to send a byte via an RS232 port that is currently busy just puts the data into the next slot of the ring buffer. The task of synchronization is thus reduced to “obtaining the next free slot”: only the incrementing of the ring buffer index needs protection.
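A minimal ring buffer along these lines might look like the sketch below; all names are hypothetical. It assumes a single producer (a task) and a single consumer (the transmit ISR) on a single-core MCU, where each index is written by only one side and is accessed in one step. With several producers at different priorities, obtaining the next free slot would itself need protection, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-producer/single-consumer ring buffer for a UART.
   One slot always stays empty to tell "full" from "empty". */
#define RB_SIZE 8u                        /* capacity: RB_SIZE - 1 bytes */

typedef struct {
    volatile uint8_t  data[RB_SIZE];
    volatile uint32_t head;               /* next slot to write (producer only) */
    volatile uint32_t tail;               /* next slot to read  (consumer only) */
} RingBuffer;

static bool rb_put(RingBuffer *rb, uint8_t byte)
{
    uint32_t next = (rb->head + 1u) % RB_SIZE;
    if (next == rb->tail)
        return false;                     /* full: caller decides what to do */
    rb->data[rb->head] = byte;            /* fill the slot first ...          */
    rb->head = next;                      /* ... then make it visible         */
    return true;
}

static bool rb_get(RingBuffer *rb, uint8_t *byte)
{
    assert(byte != 0);
    if (rb->tail == rb->head)
        return false;                     /* empty */
    *byte = rb->data[rb->tail];
    rb->tail = (rb->tail + 1u) % RB_SIZE;
    return true;
}
```

The order inside rb_put() matters: the consumer only sees a slot after head has advanced, so it never reads a slot that is still being filled.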

Other means of synchronization, like disabling interrupts (the brute-force solution) or lock-free algorithms (buffering taken to the extreme), are beyond the scope of this paper.

Publisher – Subscriber – Mechanism

The DCEO-RTOS has a publisher-subscriber mechanism built into its core. Once a data-object is created, any part of the project can subscribe to changes of that data-object. When the data changes, the core calls each subscribing callback at a predefined priority and passes a safe copy of the data into the callback. As a result the programmer of the callback doesn't have to worry about synchronization at all. This makes the project much easier to implement, and errors become less likely.

Fig 2: Publisher - Subscriber Mechanism (a data publisher changes a data object in the DCEO-RTOS, which passes a safe data copy into each subscriber callback function, from Subscriber 1 at low priority to Subscriber n at priority m)

Fig 1: Priority inversion with a mutex (process flow of low-priority Task 1 and high-priority Task 2; task states: inactive, running, suspended). 1: Task 1 enters Mutex A (critical section). 2: the OS calls higher-priority Task 2. 3: Task 2 tries to enter Mutex A; this causes a priority inversion and Task 1 is resumed at HIGH priority. 4: Task 1 leaves Mutex A and is suspended to low priority again; Task 2 is resumed. 5: Task 2 leaves Mutex A and finishes, then Task 1 finishes, leaving spare processor time ;-)


Cyclic tasks are provided, too. They are callbacks, similar to the data-subscriber callbacks, but they are called when a cyclic timer or a timeout has expired. They can be freely reprogrammed at run-time.
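A cyclic mini-task table driven by the time-base tick could look like the hypothetical sketch below; cyclic_add(), cyclic_remove() and cyclic_tick() are illustrative names loosely modelled on TaskT_CallBackCyclicAt. The int parameter is handed back to the callback, so one callback can serve several timers. Periods count time-base ticks, and here the callbacks are called directly from the tick rather than at a task-group priority.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*CyclicCB)(int param);

typedef struct {
    CyclicCB cb;          /* NULL = slot unused                       */
    int      param;       /* passed back to cb: tells it which timer  */
    unsigned period;      /* in time-base ticks                       */
    unsigned remaining;   /* ticks until the next call                */
} CyclicEntry;

#define MAX_CYCLIC 8
static CyclicEntry g_cyclic[MAX_CYCLIC];

static CyclicEntry *cyclic_add(unsigned period, CyclicCB cb, int param)
{
    assert(period > 0 && cb != NULL);
    for (int i = 0; i < MAX_CYCLIC; i++) {
        if (g_cyclic[i].cb == NULL) {
            g_cyclic[i] = (CyclicEntry){ cb, param, period, period };
            return &g_cyclic[i];
        }
    }
    return NULL;                                /* table full */
}

static int cyclic_remove(unsigned period, CyclicCB cb, int param)
{
    assert(cb != NULL);
    for (int i = 0; i < MAX_CYCLIC; i++) {
        CyclicEntry *e = &g_cyclic[i];
        if (e->cb == cb && e->param == param && e->period == period) {
            e->cb = NULL;                       /* free the slot */
            return 1;
        }
    }
    return 0;
}

/* Called once per time-base tick. */
static void cyclic_tick(void)
{
    for (int i = 0; i < MAX_CYCLIC; i++) {
        CyclicEntry *e = &g_cyclic[i];
        if (e->cb != NULL && --e->remaining == 0) {
            e->remaining = e->period;           /* re-arm before calling */
            e->cb(e->param);
        }
    }
}

/* Demo callback: one function serving two timers. */
static int g_timerCalls[2];
static void OnTimer(int i) { assert(i >= 0 && i < 2); g_timerCalls[i]++; }
```

Registering OnTimer with periods of 3 and 5 ticks and parameters 0 and 1 yields 5 and 3 calls respectively over 15 ticks; removing the first timer at run-time stops its calls while the second keeps running.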

A task group is a collection of all those timer callbacks or data-subscriber callbacks (the mini-tasks) that run at the same interrupt priority. None of these mini-tasks can interrupt another, and threads that cannot interrupt each other need no synchronization. Tasks within the same group can therefore safely share common resources, e.g. access a display. Drawing a user interface thus boils down to creating many subscribers (at the same priority) to all the data that is to be shown on the display. They can run at a fairly low priority. When some data changes, the appropriate subscriber gets called and redraws its corner of the display.
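One way to guarantee that mini-tasks of the same group never interrupt each other is to queue them and let the group's interrupt handler run them one after another. The sketch below assumes this design and is a hypothetical host-side model (GroupQueue, group_post() and group_handler() are illustrative names); it omits triggering the group's SW-interrupt, and posting from several priorities at once would need the queue's head update to be protected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*MiniTask)(int param);

#define GROUP_QUEUE_LEN 8

typedef struct {
    MiniTask task[GROUP_QUEUE_LEN];
    int      param[GROUP_QUEUE_LEN];
    unsigned head, tail;          /* FIFO indices                       */
    int      running;             /* 1 while the group handler executes */
} GroupQueue;

static int group_post(GroupQueue *g, MiniTask t, int param)
{
    unsigned next = (g->head + 1u) % GROUP_QUEUE_LEN;
    if (next == g->tail)
        return 0;                             /* queue full */
    g->task[g->head] = t;
    g->param[g->head] = param;
    g->head = next;
    return 1;
}

/* Body of the group's interrupt handler: run queued mini-tasks one after
   another, each to completion. */
static void group_handler(GroupQueue *g)
{
    assert(!g->running);                      /* a group never pre-empts itself */
    g->running = 1;
    while (g->tail != g->head) {
        MiniTask t = g->task[g->tail];
        int p = g->param[g->tail];
        g->tail = (g->tail + 1u) % GROUP_QUEUE_LEN;
        t(p);
    }
    g->running = 0;
}

/* Demo: mini-tasks of one low-priority group sharing a "display" without locks. */
static GroupQueue g_low;
static char       g_display[32];

static void DrawChar(int c)
{
    size_t n = strlen(g_display);
    if (n + 1 < sizeof g_display) { g_display[n] = (char)c; g_display[n + 1] = '\0'; }
}

static void DrawTwice(int c)
{
    DrawChar(c);
    (void)group_post(&g_low, DrawChar, c);    /* queued, not run inside DrawTwice */
    DrawChar('|');
}
```

Posting DrawTwice('a') and DrawChar('b') and then running the handler leaves "a|ba" on the display: the mini-task posted from inside DrawTwice ran after DrawTwice had finished, not in the middle of it.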

After having touched on a number of problems of multithreaded SW-development now is the right time to put the pieces back together and illustrate the concept by use of an example.

Example

The project's objectives: perform cyclic AD conversions with an adjustable cycle period at which the conversions are triggered, and react by setting an output frequency. Report changes in AD values via RS232 (which represents the user interface for this example). Also implement a simple clock and report the time to RS232 every cycle. The period for AD conversions is to be set by an RS232 command.

In a first attempt to meet these objectives I leave out the clock and provide only a placeholder for the reaction. I start the SW design by thinking up the modules, the mini-tasks and the interrupt levels they should run at, and the data-objects that flow through the project.

Here’s the list of code modules I plan to implement or use:

• AscString for sending and receiving complete strings via RS232

• ADC for the AD conversion

Here is a list of the priority levels (interrupt priorities) the project needs, from high to low:

• Subscriber to new ADC values implementing the HW reaction – highest priority

• ADC ISR – the ISR has its own interrupt level

• Trigger next AD conversion – cyclic mini-task at medium priority

• (the RS232 module's own ISRs)

• All use of RS232 communication on one level, so that it needs no synchronization: AD result reporting now (a subscriber) and the clock later (a cyclic mini-task). This is to be defined as a low priority task group.

To list the data-objects for the implementation, one should ask oneself “What are the items the application reacts to?” rather than “Which variables will I need?” Here are the data-objects for this first step of the demo project:

• AD conversion result


Three priorities for the task groups have been identified, as listed above and shown in figure 3. Apart from that there are two ISRs which have their own priority. The DCEO-RTOS provides a central place to define task groups and the interrupt resources for user-implemented interrupts. Listing 3 shows how the task groups are defined to use interrupt levels 7, 11 and 14 and channels 25, 26 and 28. (The vectored interrupt controller of the ARM7-based LPC2138 has 32 interrupt request channels; lower priority values mean higher priority.)

The user interrupts fit in at interrupt levels 8 and 12 with channels 18 and 7, which the LPC2138 assigns to AD0 and UART1 respectively.

The one remaining interrupt shown in listing 3 is the timer interrupt used for the time base. So if the application programmer needs something more important and more exact than the time base of the RTOS, he is free to lower the priority of the time base a little and fit his own interrupt in above it. However, the time base must keep a higher priority than every task group.

void main(void)
{
    Task_Init();
    AscInit(115200);
    AscStrInit(0, '\r');
    AdcInit();

    // enter callbacks for task-group priorities
    DataAddNotifyCB(OnAdValChangedHigh, obIdAdcVal, 0);
    TaskT_CallBackCyclicAt(30, OnCyclicAdcConvTimer, 1, 1);
    DataAddNotifyCB(OnRxStrRecv, obIdRecvString, 2);
    DataAddNotifyCB(OnAdValChangedLow, obIdAdcVal, 2);

    while (TRUE);
}

Listing 4: register mini-tasks with the DCEO-RTOS

// define your Task Groups here
const TaskGroup g_taskGroups[] =
{   // ilvl  vicChannel
    {  7,    25 },   // high
    { 11,    26 },   // medium
    { 14,    28 }    // low
};
...

// define your own interrupts here
//                       ILVL / VIC channelNo
#define ADC_ILVL          8
#define ADC_CHANNEL      18
#define ASC_ILVL         12
#define ASC_CHANNEL       7
#define TASK_T_ILVL       4
#define TASK_T_CHANNEL    5

Listing 3: Define the interrupt resources for Task-Groups and own interrupts
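The rules for these tables (each interrupt level and channel used only once, and the time base above every task group) can be checked mechanically. The function below is a hypothetical host-side check, not part of the DCEO-RTOS; it assumes that the interrupt levels are the 16 vectored slots of the LPC2138's VIC and that the channels range from 0 to 31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Interrupt level (vectored slot) and VIC channel of one interrupt. */
typedef struct { int ilvl; int vicChannel; } IrqCfg;

#define NUM_VECT_SLOTS 16   /* lower slot number = higher priority */
#define NUM_CHANNELS   32
#define MAX_IRQS       NUM_CHANNELS

/* True if every level and channel is in range and used only once, and the
   time base out-ranks every task group (lower level than each of them). */
static bool interrupt_config_ok(const IrqCfg *groups, size_t numGroups,
                                const IrqCfg *own, size_t numOwn,
                                IrqCfg timeBase)
{
    IrqCfg all[MAX_IRQS];
    size_t n = 0;

    if (numGroups + numOwn + 1 > MAX_IRQS)
        return false;
    for (size_t i = 0; i < numGroups; i++) {
        if (timeBase.ilvl >= groups[i].ilvl)
            return false;                    /* time base must pre-empt groups */
        all[n++] = groups[i];
    }
    for (size_t i = 0; i < numOwn; i++)
        all[n++] = own[i];
    all[n++] = timeBase;

    for (size_t i = 0; i < n; i++) {
        if (all[i].ilvl < 0 || all[i].ilvl >= NUM_VECT_SLOTS ||
            all[i].vicChannel < 0 || all[i].vicChannel >= NUM_CHANNELS)
            return false;
        for (size_t j = i + 1; j < n; j++)
            if (all[i].ilvl == all[j].ilvl || all[i].vicChannel == all[j].vicChannel)
                return false;
    }
    return true;
}

/* The values of listing 3. */
static const IrqCfg g_taskGroupIrqs[] = { { 7, 25 }, { 11, 26 }, { 14, 28 } };
static const IrqCfg g_ownIrqs[]       = { { 8, 18 },    /* ADC_ILVL, ADC_CHANNEL */
                                          { 12, 7 } };  /* ASC_ILVL, ASC_CHANNEL */
static const IrqCfg g_timeBaseIrq     = { 4, 5 };       /* TASK_T_ILVL, TASK_T_CHANNEL */
```

The configuration of listing 3 passes. Moving the time base to level 9, below the high-priority task group at level 7, fails the check, as does giving a user interrupt a channel that a task group already uses.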

Fig 3: Relations between priorities, modules and data-objects (shown: ADC ISR, ASC ISR, the high-priority reaction, trigger conversion, write to UI, and the AD value data-object passed to subscribers)


At this point it becomes obvious that the DCEO-RTOS uses up one interrupt for each task group.

The mini-tasks are initialized as shown in listing 4. Setting up the system is straightforward: you register your own callback functions (which are the mini-tasks) with the RTOS. Here the subscriber functions (OnAdValChangedxxx) are registered. The last parameter of the registration function is the priority at which the subscriber task is to be called.

Now we’re nearly done. Let’s just have a look at how to set a data-object (Listing 5), an example of a timer-task (Listing 6) and a subscriber task (Listing 7).

Note that any data-object can be set safely from any priority without any need for synchronization.

Note that OnCyclicAdcConvTimer in listing 6 receives a parameter int i. It contains the same value as was passed to the registration function. This gives you the freedom to write a single callback and register it with two or more timers, and the task still knows which timer it is being called from.

The function OnAdValChangedHigh gets a data pointer with the new data passed into it. You can safely use the data pointed to by pData and not worry about synchronization.

void adcIsr(void) __irq
{
    WORD adVal;
    ...
    adVal = (AD0DR & 0xffc0) >> 6;   // Read A/D Data Register
    if (adVal != g_lastAdVal)
    {   // if the value differs from the last, set the new one
        DataSetObject((BYTE*)&adVal, sizeof(adVal), obIdAdcVal);
        g_lastAdVal = adVal;
    }
    ...
}

Listing 5: the AD ISR writes the conversion result to a data-object.

void OnCyclicAdcConvTimer(int i)
{   // start AD-conversion
    AdcConversionStart();
}

Listing 6: example of a timed task

void OnAdValChangedHigh(BYTE* pData, DWORD objectId)
{   // do something with the data pointed to by pData
    WORD adcValue = *(WORD*)pData;
    ...
}

Listing 7: example of a subscriber task


The running example project

Figure 4 shows a performance measurement of the running project. Channel 1 toggles when a new conversion is triggered. Channel 2 shows the activity of the cyclic medium-priority task that triggers the conversions. Channel 3 shows the activity of the ADC ISR. Channel 4 represents the activity of the subscriber task for the high-priority reaction. Note that the reaction interrupts the ISR itself.

Channel 5 shows the activity of the low-priority subscriber that converts the data to a string and writes it to the RS232 port.

Channel 7 shows the activity of the time base (100µs resolution), and channel 8 measures all processor activity other than idle. Two more generally relevant measures can be read from this snapshot. The context-switch time is the time needed to switch from one task to another. In figure 4 I have measured it from leaving the high-priority reaction task (channel 6, second time high) to continuing the ADC ISR (channel 6, second time low). The measured 4.25µs compares well with other RTOSes running on an ARM7 at 60MHz (internal clock).

Two other times are relevant. The time from channel 7 low to channel 1 high (7.75µs) measures the overhead of a timer callback (from the time-base tick to the start of the mini-task, including the context switch). The time from the first channel 6 high to the first channel 6 low measures the overhead from setting a data-object to its subscriber task being activated (15.75µs, again including the context switch). This allows you to create projects with a total reaction time (external event to external HW reaction) as short as about 20µs.

Next Step: add a clock to the project

The first step has been completed and we still have to add a clock output to the UI.

To illustrate the flexibility I have added UI-commands to turn the clock on and off. “clock on” turns the clock on, “clock off” turns it off. I also added the commands “ad report on” and “ad report off”. So the user can now configure the UI that he or she wants to see. The majority of the required code for this change is shown in listing 8. A mini-task that increments and writes the clock count (OnCyclicClock) has been added as well as the reaction to the new user commands. The latter simply adds or removes a subscriber-task and the timed mini-tasks at run-time.

Because this example uses the publisher-subscriber mechanism only in a limited way, I’d like to dwell a bit more on its impact on the SW development process.


Cause-effect relations: one to one, one to many, many to one

The above example only illustrates a one-to-many relation between cause and effect. The AD value changed and there were two reactions at two different priorities: one at high priority, generating some HW output, the other at low priority, reporting the changed value to the user. This is a one-to-many relation: two reactions for one cause.

The many-to-one relation is also useful. Imagine a case where a control loop takes an input and also has an amplification parameter. You’d like to react to changes of the input value, but if the amplification changes and the input value does not, you’d certainly want your output to be adjusted immediately. So the project needs to react to changes of either the input or the amplification. You could write two subscribers, each calling the same reaction function.

WORD g_clockCount;

void OnCyclicClock(int i)
{
    char clockString[32];
    BYTE len;

    g_clockCount++;
    sprintf(clockString, "clock count = %u\r\n", (unsigned)g_clockCount);
    len = strlen(clockString);
    AscStrSend(clockString, len);
}

void OnRxStrRecv(BYTE* pData, DWORD objectId)
{
    ...
    switch (CiGetCommand(pRecvStr->string, pRecvStr->len))
    {
    case CmdPeriod:
        ... unchanged
    case CmdAdReportOn:
        if (DataAddNotifyCB(OnAdValChangedLow, obIdAdcVal, 2))
            ... write response
        break;
    case CmdAdReportOff:
        if (DataDelNotifyCB(OnAdValChangedLow, obIdAdcVal, 2))
            ... write response
        break;
    case CmdClockOn:
        g_clockCount = 0;
        if (TaskT_CallBackCyclicAt(10000, OnCyclicClock, 1, 2) != NULL)
            ... write response
        break;
    case CmdClockOff:
        if (TaskT_CallBackRemoveCyclicAt(10000, OnCyclicClock, 1, 2))
            ... write response
        break;
    ...
    }
}

Listing 8: the clock mini-task and the reaction to the new user commands
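The period values in listings 4 and 8 read naturally if the first parameter of TaskT_CallBackCyclicAt counts ticks of the 100µs time base. That unit is an assumption, since the text does not state it. Under that assumption the conversion is a one-liner each way (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the time base ticks every 100 us and cyclic periods are
   given in those ticks. */
#define TICK_US 100u

static uint32_t us_to_ticks(uint32_t us)
{
    assert(us % TICK_US == 0);     /* period must be a whole number of ticks */
    return us / TICK_US;
}

static uint32_t ticks_to_us(uint32_t ticks)
{
    return ticks * TICK_US;
}
```

So the clock in listing 8 (10000 ticks) would advance once per second, and the AD conversion of listing 4 (30 ticks) would be triggered every 3 ms.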


But it’s easier, faster and less code to react using one subscriber task and register it with two data-objects containing input and amplification. The subscriber-task knows which data-object has changed by looking at the ID of the data-object passed in as a parameter.
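Such a many-to-one subscriber might look like the following sketch. The object IDs obIdInput and obIdGain and the output variable are hypothetical; the callback signature follows listing 7. It would be registered twice at the same priority, e.g. DataAddNotifyCB(OnControlDataChanged, obIdInput, 0) and DataAddNotifyCB(OnControlDataChanged, obIdGain, 0), so that the two calls never pre-empt each other and g_input and g_gain need no locks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t  BYTE;
typedef uint32_t DWORD;

/* Hypothetical object IDs; in the DCEO-RTOS they would be published in the
   header files of the modules that own the data. */
enum { obIdInput = 1, obIdGain = 2 };

static int32_t g_input;      /* last known input             */
static int32_t g_gain = 1;   /* last known amplification     */
static int32_t g_output;     /* stands in for the HW output  */

/* One subscriber registered with both data-objects: objectId tells it which
   one changed, and either change recomputes the output at once. */
static void OnControlDataChanged(BYTE *pData, DWORD objectId)
{
    int32_t v;
    memcpy(&v, pData, sizeof v);   /* copy out: avoids alignment assumptions */

    switch (objectId) {
    case obIdInput: g_input = v; break;
    case obIdGain:  g_gain  = v; break;
    default:        assert(0 && "unexpected data-object"); return;
    }
    g_output = g_input * g_gain;
}
```

A new input of 10 with the initial gain of 1 gives an output of 10; a gain change to 3 alone is enough to update the output to 30.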

Effect on SW-Architecture: module dependencies, reusability and testability

In a publisher-subscriber mechanism the dependencies between modules only go in one direction. In this example the ADC module does not depend on any other module. It publishes the data-object ID of the AD value in its interface (i.e. its header file). Without the DCEO-RTOS, the provider and the users of this data would have to agree on how the data is passed on (global data or message queue item), how access to it is synchronized, which event is set when new data is available, etc. This creates a mutual dependency between the modules involved: one module cannot be reused as it is without the other. For the same reason, reusing the module in specifically designed stress-testing projects is not as simple.

Concept summary

The DCEO-RTOS provides a framework for creating highly responsive systems that fulfil hard real-time requirements. A high degree of flexibility can be achieved by changing subscribers at run-time. In most cases the need to synchronize disappears, thanks to properly designed task groups whose mini-tasks can freely share resources. Safe pointers to copies of the content of data-objects are passed into subscriber mini-tasks. Task switches and the task scheduler, which burden performance, have been replaced altogether by the processor's prioritized interrupt logic, which does the same job faster. Code for transporting data is partly replaced by flexibly subscribing to data-objects, so that data changes trigger their own reactions throughout the project. However, some overhead remains for the implementation of the publisher-subscriber mechanism and the cyclic tasks.

Feature summary:

• Performance: react to changed data in subscribers, with fresh data being passed into the mini-task. No performance loss due to synchronization of data access.

• Simplicity: short mini-tasks, no programmed waiting for and setting of events.

• Safety: safe data passed into mini-tasks, no worries about synchronization or asynchronous effects.

• Flexibility: freely change subscribers, cyclic and time-out mini-tasks at run-time.

• Reusability: the unidirectional dependency between provider and subscriber allows software modules to be reused unchanged in other projects by sharing code across projects with the help of a source code versioning system.

• Testability: easily create designated projects to apply stress tests to code modules.

• Compatibility: for those who like to carry on programming the way they are used to, there is a compatibility switch. Turn it on if you still want events, mutexes and real context switches where every task has its own stack. It can be turned off at a later stage to take full advantage of the DCEO-RTOS.

Dirk Braun graduated from King’s College, University of London. His SW experience started with the development of PC programs for industrial automation and databases and moved on to electronics, OO programming and SW design for embedded systems. He has taught various courses on C, C++, Java and SW design.