3.4 Discussion on modelling with Computational Causal Behaviour Models

3.4.2 Challenges during modelling

The process of developing a model is a complex one: the experimental settings, the experiment planning, the correctness of the ground truth, the model implementation, and the inference mechanism all influence the final product. Moreover, experience showed that problems in any one of these components could cause unexpected and unwanted model behaviour or evaluation results. For that reason, some of the most important issues are discussed below.

3.4.2.1 Problems with the experiment planning and settings

Planning an experiment might seem a straightforward task: decide on the scenario, decide on the participants and their agenda, decide on the sensors, and provide the appropriate infrastructure. However, small details such as the step-by-step execution of the actions or the constraints a participant should observe during the experiment also have to be planned carefully. For example, during the planning of the long meeting experiment, a discussion after each presentation was planned. However, it was not explicitly explained to the participants that the discussion could take place only after a presentation and not during it. Thus, when the meeting started and each of them presented a real topic, some of the participants asked their questions during the presentation, which made it impossible to distinguish between presentation and discussion with the given sensor infrastructure. Another example is the cooking task, where the participants knew what the task at hand was and what its different phases were, but each of them executed the actions within a given phase in a completely different order. That made it impossible to introduce locking mechanisms that could considerably decrease the state space.
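The effect of such ordering locks on the size of the state space can be illustrated with a toy count. The following Python sketch is not part of the original models: it assumes that a phase consists of n independent steps and describes a state only by the set of steps already executed (the function `reachable_states` and its parameters are hypothetical). Real CCBM states contain many more predicates, so the numbers only indicate the trend.

```python
from collections import deque


def reachable_states(n_actions, ordered):
    """Count the states reachable when a phase consists of n_actions steps.

    A state is the frozenset of steps already executed. If ordered is True,
    a hypothetical lock allows step i only after steps 0..i-1 are done.
    """
    start = frozenset()
    seen = {start}
    queue = deque([start])
    while queue:
        done = queue.popleft()
        for step in range(n_actions):
            if step in done:
                continue
            if ordered and any(prev not in done for prev in range(step)):
                continue  # locked: an earlier step of the phase is still missing
            nxt = done | {step}
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


for n in (3, 6, 10):
    print(n, reachable_states(n, ordered=False), reachable_states(n, ordered=True))
```

Without a lock every subset of executed steps is reachable (2^n states), whereas a strict order leaves only the n + 1 prefixes, so the gap grows exponentially with the number of steps per phase.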

One could argue that such variability is expected in real-world scenarios. However, when the goal of the experiment is to show that a given approach works at all, the experiment settings and the execution of the tasks should be carefully controlled.

Additionally, one should always consider problems with the sensor infrastructure, such as failing sensors or missing records. Before the experiment, the damage caused by such problems can be reduced by checking several times that the sensors and the whole infrastructure are working, by checking the sensors' batteries, or by having backup sensors. After the experiment the damage can no longer be repaired, but one should still check the collected data for missing or faulty values, as these could cause unexpected model behaviour.

3.4.2.2 Problems with the ground truth

Conducting the experiment is actually the easier part; obtaining an appropriate and correct annotation can turn into a real challenge. The annotation is the ground truth against which the activities estimated by the model are compared in order to evaluate how well the model performs. It is obtained by defining natural language descriptions of the actions that take place during the experiment, together with the exact time each action started and how long it lasted. The annotation is created by human annotators, usually with the help of annotation tools, e.g. [84].

To evaluate the performance of causal models, the annotation itself has to be causally correct. For machine learning approaches it would not be a big issue if a person suddenly teleported from one place to another, but for causal models this is simply impossible. Additionally, publicly available datasets such as those in the CMU Multi-Modal Activity Database [145] contain gaps in the annotation, annotations that do not match the activities shown in the corresponding video, etc. The effect of incorrectly annotated activities depends on their length: short ones cause only a small deviation in the accuracy computation, whereas long ones could cause a considerable drop in the estimated model performance.
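Assuming that accuracy is computed per time step (an assumption made here for illustration), the different impact of short and long annotation errors can be shown with hypothetical label sequences. In the sketch, the model output is taken to be correct and only the annotation is faulty.

```python
def timeslice_accuracy(truth, estimate):
    """Fraction of time steps where the estimated label equals the ground truth."""
    assert len(truth) == len(estimate), "sequences must cover the same time steps"
    hits = sum(t == e for t, e in zip(truth, estimate))
    return hits / len(truth)


# Hypothetical 100-step sequence: the model recognises the real activities perfectly.
model = ["prepare"] * 30 + ["cook"] * 50 + ["eat"] * 20

# Ground truth containing a short mislabelled segment (3 steps) ...
short_error = list(model)
short_error[40:43] = ["eat"] * 3
# ... and ground truth containing a long mislabelled segment (40 steps).
long_error = list(model)
long_error[30:70] = ["eat"] * 40

print(timeslice_accuracy(short_error, model))  # 0.97
print(timeslice_accuracy(long_error, model))   # 0.6
```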

Depending on the approach used for activity recognition, gaps in the annotation, or objects and persons that suddenly appear in another place without having been moved or manipulated, could cause serious problems. In causal approaches like CCBM, the model is then unable to causally explain the given annotation. This could lead, first, to decreased model accuracy and, second, to the inability to generate useful observations from the annotation. For example, in the cooking task the first version of the annotation contained some gaps as well as objects that jumped from one place to another without being moved. As a result, the model was unable to explain the observations generated from the annotation, not because the model was wrong but because the annotation was causally incorrect.
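Some of these annotation defects can be detected automatically before observations are generated. The sketch below assumes a hypothetical annotation format in which each segment names the manipulated object together with where the object must be at the start of the action and where it is afterwards; it reports gaps, overlaps, and objects that jump between places. It does not use CCBM itself, so it only catches this narrow class of causal inconsistencies.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: int   # first time step of the annotated action
    end: int     # time step at which the action ends (exclusive)
    action: str
    obj: str     # object manipulated by the action
    src: str     # where the object must be when the action starts
    dst: str     # where the object is after the action


def check_annotation(segments, initial_locations):
    """Return human-readable descriptions of gaps, overlaps and object jumps."""
    problems = []
    location = dict(initial_locations)
    ordered = sorted(segments, key=lambda s: s.start)
    previous_end = ordered[0].start if ordered else 0
    for seg in ordered:
        if seg.start > previous_end:
            problems.append(f"gap {previous_end}-{seg.start} before '{seg.action}'")
        elif seg.start < previous_end:
            problems.append(f"overlap at {seg.start} for '{seg.action}'")
        known = location.get(seg.obj)
        if known is not None and known != seg.src:
            # The object is required somewhere it was never moved to.
            problems.append(f"'{seg.obj}' jumps from {known} to {seg.src} at {seg.start}")
        location[seg.obj] = seg.dst
        previous_end = seg.end
    return problems


annotation = [
    Segment(0, 10, "take", "pot", "cupboard", "hand"),
    Segment(10, 20, "put", "pot", "hand", "stove"),
    Segment(25, 40, "fill", "pot", "sink", "sink"),  # starts late; pot never moved to sink
]
for problem in check_annotation(annotation, {"pot": "cupboard"}):
    print(problem)
```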

3.4.2.3 Problems with the models

Although CCBM uses straightforward precondition-effect pairs for describing the action templates, it can sometimes be a challenge to create a template that reflects the needs of the problem. This is mainly because the incorrect use of a single predicate or logical expression causes the model to act unexpectedly. Such problems can easily be tracked in simple models like the team model for the meeting scenario or the office scenario. However, in a more complex model like the cooking task, it is almost impossible to find the problem without some kind of backtracking mechanism for model implementations. The cooking task model therefore led to the need to record, after each model change, what was changed and why.
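To illustrate how a single wrong predicate changes the behaviour, the following sketch uses a plain Python stand-in for precondition-effect templates in the STRIPS style (add and delete lists). This is an assumption for illustration and not CCBM's own template syntax; all predicates and action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # predicates that must hold before execution
    add: frozenset     # predicates made true by the action
    delete: frozenset  # predicates made false by the action

    def applicable(self, state):
        return self.pre <= state

    def apply(self, state):
        return (state - self.delete) | self.add


def applicable_actions(state, actions):
    return [a.name for a in actions if a.applicable(state)]


state = frozenset({"at(alice,door)", "free(seat1)"})
walk = Action("walk(alice,door,seat1)",
              pre=frozenset({"at(alice,door)"}),
              add=frozenset({"at(alice,seat1)"}),
              delete=frozenset({"at(alice,door)"}))
sit = Action("sit(alice,seat1)",
             pre=frozenset({"at(alice,seat1)", "free(seat1)"}),
             add=frozenset({"seated(alice)"}),
             delete=frozenset({"free(seat1)"}))
# Faulty copy of the sit template: one predicate has the wrong argument.
sit_bug = Action("sit_bug(alice,seat1)",
                 pre=frozenset({"at(alice,seat)", "free(seat1)"}),
                 add=sit.add, delete=sit.delete)

after_walk = walk.apply(state)
print(sorted(after_walk))
print(applicable_actions(after_walk, [walk, sit, sit_bug]))
```

The faulty template differs from the correct one only in the argument `seat` instead of `seat1`, yet it is never applicable. In a larger model such an action silently disappears from the reachable state space, which is why the fault is hard to locate without a record of the changes.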

Furthermore, the models were initially implemented with the idea of creating a general domain description. Yet practice showed that such a model is extremely difficult to cope with, as it has a high degree of freedom and causes problems with some of the action selection heuristics (e.g. the goal distance, which can be calculated only after the whole state space has been analysed).
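Why the goal distance requires the full state space can be seen from a generic way of computing it: a breadth-first search running backwards from all goal states over the complete transition graph. The sketch assumes unit cost per action and an already expanded graph given as a dictionary; it is not CCBM's implementation, and the state names are hypothetical. With a very general domain description this graph grows quickly, and so does the cost of the analysis.

```python
from collections import deque


def goal_distances(transitions, is_goal):
    """Smallest number of actions from every state to the nearest goal state.

    transitions maps each state to the states reachable by one action. The
    whole graph must already be expanded, because the search runs backwards
    from the goal states over reversed edges.
    """
    reverse = {}
    for src, targets in transitions.items():
        for dst in targets:
            reverse.setdefault(dst, []).append(src)
    states = set(transitions) | set(reverse)
    dist = {s: 0 for s in states if is_goal(s)}
    queue = deque(dist)
    while queue:
        s = queue.popleft()
        for pred in reverse.get(s, []):
            if pred not in dist:
                dist[pred] = dist[s] + 1
                queue.append(pred)
    return dist


transitions = {"start": ["a", "b"], "a": ["goal"], "b": ["c"], "c": ["goal"], "goal": []}
print(goal_distances(transitions, lambda s: s == "goal"))
```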

Another issue was caused by the action durations. Choosing an appropriate duration and the corresponding probability distribution can be crucial for the correctness of the inferred action. Actions with durations that are too short or too long tend either to cause the model to select another action although in reality the current action is still running, or to keep estimating the same action although it has already ended.

However, the opposite, namely assigning too exact durations or allowing too little variation in behaviour, leads to model overfitting and to the model's inability to cope with new data from the given scenario. For example, in the cooking task scenario it was necessary to find a middle ground between a large state space and the ability to explain new data. Another example of overfitting was the office dataset, where a relatively exact duration was assigned for each action and user. This would probably yield good results for a more restricted model; however, the high degree of freedom caused the wrong duration to be applied in most cases.
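The trade-off between too exact and more tolerant durations can be made concrete by looking at the probability that an action ends in the next time step, given how long it has already been running. The sketch assumes normally distributed durations measured in discrete time steps; the mean of 20 steps and the two standard deviations are hypothetical values, and the distributions actually used in the models may differ.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def stop_probability(t, mu, sigma):
    """P(action ends in step t | it has already lasted t - 1 steps)."""
    survived = 1.0 - normal_cdf(t - 1, mu, sigma)
    if survived <= 0.0:
        return 1.0  # numerically, the action should already have ended
    return (normal_cdf(t, mu, sigma) - normal_cdf(t - 1, mu, sigma)) / survived


MEAN = 20  # hypothetical mean duration in time steps
for t in (15, 20, 23):
    narrow = stop_probability(t, MEAN, sigma=1.0)  # "too exact" duration
    wide = stop_probability(t, MEAN, sigma=8.0)    # more tolerant duration
    print(f"t={t:2d}  narrow={narrow:.4f}  wide={wide:.4f}")
```

With the narrow distribution, an action that in reality ends after 15 steps is almost never allowed to stop, and after 23 steps it is very likely to be terminated; the wider distribution tolerates both deviations at the price of a less specific prediction.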

A serious issue during model development and evaluation was traceability. It was extremely difficult to discover the reasons behind past changes, as there was no substantial documentation of them. Although the model changes were under version control, detailed information about the reasons behind them was missing. The same problem was observed when attempting to reproduce given results. This had several causes: the random number generator was seeded with the current time; inference parameters had been changed without being documented; the model had been changed after the first results were obtained without documentation, etc. To avoid this, a script was later introduced that generates a documentation file. This file provides information about all parameters and scripts used for producing a given result. The script also creates copies of all scripts and files involved in the inference and evaluation process.
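A documentation script of this kind could look roughly like the following Python sketch. The function names, the manifest layout, and the file names are hypothetical; the sketch only shows the idea of recording all parameters (including an explicit random seed instead of one derived from the clock), hashing the input files, and archiving copies of them next to the results.

```python
import hashlib
import json
import random
import shutil
import sys
import tempfile
import time
from pathlib import Path


def record_run(run_dir, params, files):
    """Write a manifest describing one inference run and archive its inputs."""
    run_dir = Path(run_dir)
    (run_dir / "inputs").mkdir(parents=True, exist_ok=True)
    hashes = {}
    for f in map(Path, files):
        hashes[f.name] = hashlib.sha256(f.read_bytes()).hexdigest()
        shutil.copy2(f, run_dir / "inputs" / f.name)  # archived copy of the input
    manifest = {
        "created": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "python": sys.version,
        "parameters": params,
        "input_sha256": hashes,
    }
    (run_dir / "manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def seeded_rng(params):
    """Create the random number generator from the documented seed, not the clock."""
    return random.Random(params["seed"])


# Minimal usage with a throw-away model file (hypothetical name and content).
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp, "model.txt")
    model_file.write_text("action templates ...\n")
    params = {"seed": 12345, "particles": 10000}
    manifest = record_run(Path(tmp, "run-001"), params, [model_file])
    archived = Path(tmp, "run-001", "inputs", "model.txt").exists()
    print(manifest["parameters"], archived)
```

Rerunning with the same manifest then reproduces the same random stream, because `seeded_rng` depends only on the recorded seed.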

3.4.2.4 The influence of the inference mechanism on the model behaviour

As the particle filter provides an approximation of the estimated state, it depends on the random numbers used for sampling the state space, and therefore on the random seed that initialises the random number generator. In the particle filter, at each time step the state space is approximated with a weighted set of samples (particles), where the weight of each particle is proportional to its probability. This, however, leads to the problem that at some point many of the samples have very low weights and only a few of them have significant weight. In other words, the number of effective particles drops significantly over time. To cope with this problem and increase the number of effective particles, resampling is performed. During resampling, a new set of particles of the original size is drawn with replacement from the discrete approximation of the sample distribution. In this new set all particles have the same weight. However, particles with larger weights are drawn more often than those with low weights, which means that the diversity of the particles decreases after resampling. To mitigate this, the particles are drawn at random rather than deterministically, so that particles with low weights still have a chance of being selected; as a consequence, the outcome of resampling depends on the random seed. For more details on the particle filter and resampling see e.g. [68].
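The effective sample size and the resampling step described above can be sketched in a few lines of Python. The sketch assumes multinomial resampling (drawing with replacement proportionally to the weights, as described in the text), triggered when the effective sample size falls below half the number of particles; the threshold, state names, and weights are hypothetical, and the resampling scheme of the actual inference engine may differ.

```python
import random


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) computed on normalised weights."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


def resample(particles, weights, rng):
    """Multinomial resampling: draw len(particles) particles with replacement,
    each with probability proportional to its weight; all new weights are equal."""
    n = len(particles)
    new_particles = rng.choices(particles, weights=weights, k=n)
    return new_particles, [1.0 / n] * n


def maybe_resample(particles, weights, rng, threshold=0.5):
    """Resample only when the ESS drops below threshold * N."""
    if effective_sample_size(weights) < threshold * len(particles):
        return resample(particles, weights, rng)
    return particles, weights


rng = random.Random(42)  # fixed, documented seed
particles = ["s1", "s2", "s3", "s4"]
weights = [0.97, 0.01, 0.01, 0.01]
print(round(effective_sample_size(weights), 2))  # 1.06
new_particles, new_weights = maybe_resample(particles, weights, rng)
print(new_particles, new_weights)
```

Which of the low-weight states survive the draw depends on the random stream, so changing only the seed can change which hypotheses remain available in later time steps.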

The random seed can therefore influence the system in different ways. For example, the correct hypothesis may not be available because it was never sampled. It is also possible that the particle representing the correct state has too low a weight and is therefore never selected during resampling. In both cases the wrong state will be estimated. Another possibility is that there are not enough particles to represent the density distribution. This is especially true in problems with large state spaces like the cooking problem, where the state space contains several hundred thousand states.
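How strongly a low weight reduces the chance of keeping the correct hypothesis can be estimated with a short calculation. It assumes multinomial resampling with N particles, in which a particle with normalised weight w is missed in each independent draw with probability 1 - w; the weight w = 10^-5 and the particle counts are hypothetical values chosen for illustration.

```latex
P(\text{particle not drawn}) = (1 - w)^{N} \approx e^{-wN}, \qquad w \ll 1

\begin{aligned}
w = 10^{-5},\; N = 10^{3}: &\quad (1 - 10^{-5})^{10^{3}} \approx e^{-0.01} \approx 0.990 \\
w = 10^{-5},\; N = 10^{5}: &\quad (1 - 10^{-5})^{10^{5}} \approx e^{-1} \approx 0.368
\end{aligned}
```

Under these assumptions, a correct hypothesis with such a low weight is lost in about 99% of resampling steps with a thousand particles, and even a hundred times more particles still lose it in roughly a third of the cases.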

Additional problems can be caused by the action selection heuristics. For example, a model using the goal distance as a heuristic will have problems with a subject who is not acting in an obviously goal-oriented manner. Each of the action selection heuristics, when used inappropriately, can lead to the wrong hypothesis being selected.
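The effect of the goal distance heuristic on a subject who is not acting in a goal-oriented way can be sketched with a log-linear action selection model, in which the probability of an action decreases exponentially with the goal distance of its successor state. This form, the weight values, and the action and state names are assumptions for illustration and not the exact CCBM formulation.

```python
import math


def action_probabilities(candidates, goal_distance, weight):
    """Log-linear action selection using only a goal-distance feature.

    candidates maps action name -> successor state; goal_distance maps a state
    to the number of actions still needed to reach the goal. A larger weight
    makes goal-directed actions exponentially more likely.
    """
    scores = {a: math.exp(-weight * goal_distance[s]) for a, s in candidates.items()}
    total = sum(scores.values())
    return {a: v / total for a, v in scores.items()}


goal_distance = {"stirred": 2, "hands_washed": 3, "away": 4}  # hypothetical states
candidates = {"stir": "stirred", "wash_hands": "hands_washed", "walk_away": "away"}
for weight in (2.0, 0.0):
    probs = action_probabilities(candidates, goal_distance, weight)
    print(weight, {a: round(p, 3) for a, p in probs.items()})
```

With weight 2 the detour action `wash_hands` receives only about 12% of the probability mass, so a subject who often makes such detours is systematically explained poorly; with weight 0 the heuristic is switched off and all actions become equally likely.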

Furthermore, the actions may have been assigned an inappropriate probability distribution, or the chosen parameter values may not reflect reality. This could result either in the action terminating earlier than in reality, or in the inference engine believing that the action is still being executed even though a new action has already started.

The above issues do not mean that the causal model is incorrect; they only indicate that the combination of causal modelling and probabilistic inference can lead to unwanted consequences. What can be done in this case on the causal level is to introduce artificial constraints that do not improve the system model but provide a mechanism for coping with problems on the inference level. Other options are a more appropriate use of the action selection heuristics, a better random number generator algorithm, and, of course, a careful choice of the probability distribution describing the action durations.

In document Methods for engineering symbolic human behaviour models for activity recognition (Page 105-108)