Last modified: 28 March 2017

#### Abstract

In the past few decades, Artificial Neural Networks (ANNs) have been used to identify and model choice behavior in a wide variety of fields (e.g., Bishop, 1995). In the field of travel behavior, for example, ANNs have been applied to model commuter mode choice and car ownership (e.g., Hensher & Ton, 2000; Mohammadian & Miller, 2002). ANNs aim to efficiently recognize patterns in the data without being explicitly programmed where to look. A key feature of ANNs is their capability to approximate any Data Generating Process (DGP), provided that sufficient processing units are available; this property is established by the Universal Approximation Theorem (Hornik et al., 1989). However, despite their strong pragmatic appeal, ANNs have been criticized for being too ‘data driven’ and ‘theory poor’, in effect presenting the analyst with a black-box model of the DGP. This limitation has hampered their use by discrete choice modelers and travel behavior researchers.

In several ways, Discrete Choice Theory (DCT), the dominant approach to modeling choice behavior in the travel behavior research community, is the mirror image of ANNs. In contrast to ANNs, DCT presupposes a particular decision rule (DGP) and estimates a model based on that rule from choice data. In addition to the classical linear-in-parameters utility maximization rule, several alternative decision rules have been proposed more recently; see Leong & Hensher (2012) and Chorus (2014) for overviews. Clear advantages of the DCT approach are that it allows for the extraction of deep behavioral insights from choice data (McFadden, 2001) and for rigorous conclusions concerning the welfare effects of policies (Small & Rosen, 1981). However, despite recent work that allows for a more flexible treatment of decision rules in discrete choice models (e.g., Hess et al., 2012; Van Cranenburgh et al., 2015), DCT can still be considered a relatively rigid approach to modeling choice data, compared to ANNs.

Our paper sets out to explore in more depth the advantages and disadvantages (relative to DCT) of using ANNs as a framework to analyze choice data. We focus in particular on the ability of ANNs to learn which decision rule best represents the DGP in a discrete choice context.

To this end, we perform three types of analyses:

- We analytically explore to what extent the Universal Approximation Theorem for ANNs applies to a discrete choice context, with particular emphasis on the role of the size of the training data set and the number of nodes of the ANN (Vapnik & Chervonenkis, 2015).
- Using synthetic datasets, we explore to what extent ANNs are able to explain choice behavior when the decision rule underlying the DGP is known to the analyst. We focus on standard Random Utility Maximization and Random Regret Minimization (Chorus, 2010) rules. Particular attention is paid to the role of error term variance and the size of the training data set.
- Using real datasets, we explore to what extent ANNs are able to explain choice behavior when the decision rule underlying the DGP is unknown to the analyst (as is usually the case). A range of comparisons is made between different types of ANNs and DCT-based choice models.
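
The synthetic-data setup in the second analysis can be illustrated with a minimal sketch. It assumes a linear-in-parameters RUM rule, the regret function of Chorus (2010) for RRM, and i.i.d. Gumbel errors; the function names, attribute ranges, parameter values, and the `scale` argument (which sets the weight of the systematic part relative to the error term) are hypothetical choices for illustration, not the settings used in the paper.

```python
import numpy as np

def rum_utilities(X, beta):
    """Linear-in-parameters utilities; X is (N, J, M), beta is (M,)."""
    return X @ beta

def rrm_regrets(X, beta):
    """Systematic regret R_i = sum_{j != i} sum_m ln(1 + exp(beta_m * (x_jm - x_im)))."""
    # diff[n, i, j, m] = x_jm - x_im for observation n
    diff = X[:, None, :, :] - X[:, :, None, :]
    pairwise = np.logaddexp(0.0, diff * beta)  # numerically stable ln(1 + exp(z))
    # The j == i terms each equal ln(2); remove them so the sum runs over j != i only.
    self_terms = X.shape[2] * np.log(2.0)
    return pairwise.sum(axis=(2, 3)) - self_terms

def simulate_choices(systematic, rng, scale=1.0):
    """Pick the alternative with the highest systematic value plus Gumbel error."""
    eps = rng.gumbel(size=systematic.shape)
    return np.argmax(scale * systematic + eps, axis=1)

# Hypothetical design: 1000 choice tasks, 3 alternatives, 2 attributes (e.g. time, cost).
rng = np.random.default_rng(seed=0)
X = rng.uniform(0.0, 5.0, size=(1000, 3, 2))
beta = np.array([-1.0, -0.5])

y_rum = simulate_choices(rum_utilities(X, beta), rng)  # utility maximization
y_rrm = simulate_choices(-rrm_regrets(X, beta), rng)   # regret minimization
```

Raising `scale` lowers the error variance relative to the systematic part, which is one way to vary the noise level in such experiments.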

With reference to the above-mentioned types of analysis, our preliminary results can be summarized as follows:

- The Universal Approximation Theorem applies to a discrete choice context.
- Out-of-sample analysis shows that the performance of the ANN improves as the size of the training data set increases, and that it approaches the performance of a discrete choice model whose decision rule matches that of the DGP when the training data set is sufficiently large; see Figures 1 and 2 in the supplementary files. More specifically, the fitted power functions (of the form y = a·x^b + c) reveal asymptotic behavior that is in line with theoretical expectations. Furthermore, the parameters of the fitted power functions show that ANNs converge faster when the DGP is RUM than when the DGP is RRM.
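
To make the role of the power function concrete, the sketch below fits y = a·x^b + c to a learning curve. The data points are synthetic and noise-free, generated from assumed parameter values purely for illustration; they are not the results reported in the supplementary figures. The fitting routine is also an assumption: it profiles over a grid of exponents b and solves for a and c by linear least squares, whereas a general nonlinear least-squares routine would be a common alternative.

```python
import numpy as np

def fit_power_curve(x, y, b_grid):
    """Fit y = a * x**b + c by scanning b and solving for (a, c) linearly."""
    best = None
    for b in b_grid:
        # For a fixed exponent b the model is linear in a and c.
        design = np.column_stack([x ** b, np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], b, coef[1])
    _, a, b, c = best
    return a, b, c

# Hypothetical learning curve: out-of-sample performance vs. training set size.
train_sizes = np.array([100, 200, 500, 1000, 2000, 5000, 10000], dtype=float)
true_a, true_b, true_c = -2.0, -0.5, 0.70  # assumed values, for illustration only
performance = true_a * train_sizes ** true_b + true_c

a_hat, b_hat, c_hat = fit_power_curve(
    train_sizes, performance, b_grid=np.linspace(-2.0, -0.05, 391)
)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, asymptote c = {c_hat:.3f}")
```

When b is negative, the term a·x^b vanishes as x grows, so c is the asymptotic performance level, and a more negative b means faster convergence toward it.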

These conclusions pave the way for a more informed debate about the potential and limitations of using ANNs to explain and predict choice data; they also provide clues as to how ANN and DCT may be combined to capitalize on their respective strengths.

**References:**

Bishop, C. M. (1995). *Neural networks for pattern recognition*. Oxford University Press.

Chorus, C. G. (2010). A new model of Random Regret Minimization. *European Journal of Transport and Infrastructure Research*, *10*(2), 181-196.

Chorus, C. G. (2014). Capturing alternative decision rules in travel choice models: A critical discussion. Chapter 13 (pages 290-310) in Hess, S. & Daly, A. (Eds.) *Handbook of Choice Modelling*. Edward Elgar.

Hensher, D. A., & Ton, T. T. (2000). A comparison of the predictive potential of artificial neural networks and nested logit models for commuter mode choice. *Transportation Research Part E*, *36*(3), 155-172.

Hess, S., Stathopoulos, A., & Daly, A. (2012). Allowing for heterogeneous decision rules in discrete choice models: an approach and four case studies. *Transportation*, *39*(3), 565-591.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. *Neural Networks*, *2*(5), 359-366.

Leong, W., & Hensher, D. A. (2012). Embedding decision heuristics in discrete choice models: A review. *Transport Reviews*, *32*(3), 313-331.

McFadden, D. (2001). Economic choices. *The American Economic Review*, *91*(3), 351-378.

Mohammadian, A., & Miller, E. (2002). Nested logit models and artificial neural networks for predicting household automobile choices: comparison of performance. *Transportation Research Record*, (1807), 92-100.

Small, K. A., & Rosen, H. S. (1981). Applied welfare economics with discrete choice models. *Econometrica*, *49*(1), 105-130.

van Cranenburgh, S., Guevara, C. A., & Chorus, C. G. (2015). New insights on random regret minimization models. *Transportation Research Part A*, *74*, 91-109.

Vapnik, V. N., & Chervonenkis, A. Y. (2015). On the uniform convergence of relative frequencies of events to their probabilities. In *Measures of Complexity* (pp. 11-30). Springer.