We introduced two novel approaches for feature-based taxi request prediction. To our knowledge this is the first work that specifically aims at using background
CONCLUSIONS AND FUTURE WORK 127
information for making predictions and to have the ability to make predictions in regions for which no historic data is available. Our experiments suggest that models that combine contextual features and recent observations outperform the models which take only one of these two into account. These models also perform significantly better in the task of making predictions for unseen regions in the same city. For the task of making predictions for unseen regions in a different city, none of the studied methods produced satisfactory results. We also contributed a method for analyzing data of this type by showing that non-negative matrix factorization can be used to discover behavioral patterns in the data. The main advantage of this method is that it offersinterpretable
predictions.
Prediction of taxi trips is usually a component of a larger pipeline that solves an optimization problem (e.g. routing or scheduling problems). For future work, we want to investigate whether our predictions will improve the quality of the final solutions in such a pipeline. Another extension is to use methods such as
Poisson dependency networks for modeling the dependencies among Poisson variables. Another interesting direction is the use of more relevant features. For example, it would be interesting to combine our work with that of Chang et al. (Chang et al. 2010) which focuses on detecting landmark in a city based on taxi data. By using those techniques we can develop a model about which type of landmarks draw most traffic and use that information as features in our prediction model.
Chapter 8
Conclusions and Future Work
8.1
Summary and Conclusions
Probabilistic inference and constraint satisfaction and optimization have been studied extensively for decades. The two fields are known to have connections, and their intersection has been studied before. However, these connections have not been fully exploited in solving problems that involve both constraint satisfaction and probabilistic inference. An example of such problems is maximization of the expected utility which is a natural extension of deterministic CSPs to a situation where there is only probabilistic knowledge about some of the problem parameters. Another class of such problems are those that involve constraining or optimizing the probability values themselves. Our contribution in this regard was to present two mechanisms for solving these two classes of problems. Both these methods build on existing constraint programming solvers. This means that a lot of facilities of these solvers (e.g. the complex constraints that they support) can be used by our methods.
The second contribution of this thesis was motivated by the recent interest in integrating data mining with constraint satisfaction and optimization. In particular, we formulated and solved two clustering problems as integer linear programming models.
Advances in optimization under uncertainty allow the user to formulate more accurate models by specifying a distribution over random variables. A problem that is rarely addressed in existing work on stochastic optimization is how to accurately approximate the real-world uncertainties using a probability distribution. This question has been extensively studied in the statistical
130 CONCLUSIONS AND FUTURE WORK
machine learning domain. Our third contribution was to use techniques from this domain to learn probability distributions for taxi passenger demand. In the first chapter we introduced the three main research question in this thesis. We will now review these questions again and summarize the answers that this thesis provides for them:
Q1. How can we combine principles of constraint satisfaction and probabilistic
inference to solve problems that involve both tasks?
We introduced two mechanisms for combining probabilistic inference and constraint satisfaction and optimization. The first mechanism models the computational steps of a probabilistic inference engine in terms of constraints. The second mechanism uses a novel depth-first search algorithm and calls an external probabilistic inference engine.
In the first approach, we compile a Bayesian network into a d-DNNF, and formulate and reason over this structure using a constraint programming solver. We show that using this approach we can support a wide range of queries and constraints in a flexible and declarative manner. We used this approach for pattern mining in Bayesian networks.
In the second approach, we presented a new stochastic constraint programming method. Existing works on stochastic constraint programming made at least one of the following assumptions: 1) the random variables are independent, 2) the probability distribution should be first converted to a list of possible worlds. Instead, we assumed a non-trivial factored joint distribution over the random variables. We introduced an And-Or search algorithm to combine constraint satisfaction and probabilistic inference. We introduced a novel bound that works directly on this tree. To compute this bound we used a state-of-the-art probabilistic inference engine. We implemented this mechanism within a generic constraint solver. So our method supports existing complex constraints.
Q2. What are the potentials of formulating constrained clustering as integer
linear programming?
We also formulated two clustering problems using integer linear programming: constrained minimum sum-of-squares (MSS) clustering and constrained graph clustering. To solve the first problem we used a formulation in which each possible cluster is represented by a variable. This leads to an exponential number of variables. We used a column-generation algorithm to incrementally add a significant subset of these variables to the model. We presented a novel branch and bound algorithm to solve thepricing subproblem, i.e. the problem of finding the next variable to be added to the model. This hybrid approach allows us to take care of the constraints when solving the subproblem. Using
DISCUSSION AND FUTURE WORK 131
this approach we obtained the optimal solution for a number of constraint clustering instances for the first time.
We also presented two formulations for a biologically-inspired graph clustering problem. The first formulation was based on enumerating all simple paths in the graph. In dense graphs, this formulation leads to a large number of constraints. The second formulation also included a worst-case exponential number of constraints. But we used a cutting-plane algorithm to include only a sufficient subset of these constraints in the model. We also introduced a bi-objective Pareto optimization method to balance the two components of the objective function. Our experiments showed that each formulation performs better on a certain type of problems. When the graph is sparse and the total number of simple paths is low, the first formulation is more efficient. For denser graphs, the overhead of solving the cutting-plane subproblem pays off and the second formulation performs better.
Q3How can we use data mining techniques to learn the distribution of passenger
requests from records of taxi trips?
We introduced two novel approaches for learning distributions for taxi passenger demand. The main novelty of our method is using background information for making predictions and to have the ability to make predictions in regions for which no historic data is available. Our experiments suggest that models that combine contextual features and recent observations outperform the models which take only one of these two into account. These models also perform significantly better in the task of making predictions for unseen regions in the same city.