Introduction

Robo-advisory firms promise to provide low-cost access to diversified portfolios built following the academic literature on normative portfolio choice. Their competitive advantage is based on the ability to provide cheap access to diversified and customized beta (in modern words: financial inclusion). Customization should come at little marginal cost for a web-based platform. Traditional financial advisors have a poor track record for taking client characteristics into account. Foerster et al. (2017) find that only 12% of the cross-sectional variation in advice (across clients) arises from differences in client characteristics such as risk aversion, wealth, experience, occupation or time horizon. Mullainathan et al. (2012) show that advisors are systematically biased against passive investments and even ignore stated client preferences. Traditional financial advice suffers from agency conflicts and behavioural biases.Footnote 1 It is also costly (high fixed costs) and might not be available to investors with little wealth. This is often viewed as a major reason for household non-participation in financial markets.

All the above favours Robo-advice over traditional advice. However, Robo-advisor firms suffer from one key vulnerability: the difficulty of creating trust. To deflect this weakness, they make particular design choices. They offer passive funds and ETFs as well as automated portfolio solutions to avoid conflicts of interest (and save production costs). What else can Robo-advisors do to create trust? We believe that the low level of individualization in Robo-advice criticized by Faloon and Scherer (2017) is not a design flaw but a deliberate design choice to create trust by offering familiar solutions close to popular investment rules.

Our paper offers statistical insights into portfolio recommendations for the German Robo-advisory market by web-scraping 16 Robo-advisors with a cumulative market share of 78%. We find little evidence for individualization of portfolio advice, as investor heterogeneity arising from different investor balance sheets or differences in the amount and characteristics (market or factor return) of investor human capital is largely ignored. Robo-advisors fail to offer the advice Merton (1971) gave exactly 50 years ago: allocate between speculative demand (frontier portfolios, identical to all investors), cash, and various hedging demands reflecting household balance sheets and exposures to systematic economic risks (different across individual investors). We believe these choices are not made because of ignorance of the existing academic literature but for commercial reasons. Complicated models that can deliver counterintuitive solutions to the financially untrained client will not maximize revenues in a highly competitive market.

The existing literature on Robo-advice lacks cross-sectional evidence on empirical portfolio structures. Due to the lack of data, most papers review the economics of the industry as in Soehnke et al. (2020), Grealish and Kolm (2021) or Torno et al. (2021), while Puhle (2019) looks at the relative performance of different Robo-advisors. Scherer and Lehner (2021) are closest to us in methodology but only scrape a single “representative” US advisor. Even though they extract more than 150,000 portfolios, it is unclear how their results generalize in the cross section. Torno and Schildmann (2020) also analyze a large cross section of Robo-advisors (36), but rely on six different model customers. This leaves them with 216 (6 times 36) recommended portfolios instead of more than 240,000 in our setting. This contrasts with our approach, where each data point represents a unique combination of questionnaire inputs and portfolio recommendations. We have as many different customers as we have data points. In addition, our focus on a single jurisdiction (identical regulatory framework and client preferences) results in the first data-intensive, cross-sectional study on portfolio structures offered by Robo-advisory firms. Finally, Tertilt and Scholz (2018) also investigate the question of how different questionnaire answers relate to recommended equity allocations. The authors document that many questions asked in questionnaires have no impact on portfolio recommendations. They use a similar set of Robo-advisors but rely on bivariate correlations (between recommendation and questionnaire input) without controlling for other questionnaire items, use a limited sample of variations rather than all possible permutations and do not attempt to answer the question of which set of questions is the most influential (variable importance relative to all other variables).

Our paper is structured as follows. Section 2 describes the sample of Robo-advisors involved in our empirical work as well as a summary of the questionnaire information required from potential customers. In Sect. 3, we describe the set of portfolios offered by each Robo-advisor and discuss whether these portfolios are consistent with modeling client circumstances from first principles. We then link questionnaire themes (e.g. time horizon, wealth, experience, ...) with normative portfolio choice theory in order to assess the importance of each question on the cross section of portfolio recommendations in Sect. 4. Section 5 describes our empirical strategy and presents the main results. We conclude in Sect. 6.

Robo-advisors and questionnaires

The Robo-advisory market in Germany is highly fragmented with about 30 competing firms.Footnote 2 The initial list of firms included Bevestor, Cominvest, Easyfolio, Evergreen, Fidelity, Financery, Fintego, Gerd Kommer Invest, Ginmon, Growney, Investify, Invoya, Liqid, Loni, Minveo, My si, Navigator, Onvest, Oskar, Pax-Bank, Pixit, Peaks, Peningar, Quirion, Raisin, Robin, Scalable Capital, Solidvest, Pixit, Truevest, Visualvest, Vividam, Whitebox, Zeedin. We only include advisors that can be systematically scraped, i.e. we checked each Robo-advisor to see if it was possible to use a script programmed in Python to fill out the questionnaire that leads to a portfolio recommendation. For this purpose, we used one of two methods:

  1. API (application programming interface): For communication with the web server, we used the direct programming interface. That means we sent a POST request to the Robo-advisor server, which is normally sent by the web browser. POST means that the server accepts the data contained in the request message, in this case the predefined input parameters. The response from the server was a portfolio recommendation.

  2. Python library Selenium: It opens a browser window that can be controlled by a Python script. The questionnaire is accessible through certain fields in the source code of the website, which are located using XPath expressions. The result is the same as if we filled in the questionnaire by hand.
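The first method can be sketched in a few lines of Python. The snippet below is a minimal illustration under stated assumptions, not the script used for this study: the endpoint URL, the field names and the answer options are all hypothetical placeholders, and the requests are only constructed, not sent. It enumerates every combination of questionnaire answers and wraps each one in a POST request with a JSON body.

```python
import itertools
import json
import urllib.request

# Hypothetical questionnaire: field names and answer options are assumptions.
QUESTIONS = {
    "investment_amount": [5000, 25000, 100000],
    "time_horizon_years": [3, 10, 20],
    "risk_aversion": [1, 2, 3, 4, 5],
}

# Placeholder endpoint; a real advisor exposes its own URL and payload format.
API_URL = "https://robo-advisor.example/api/recommendation"


def all_answer_combinations(questions):
    """Yield one dict per possible combination of questionnaire answers."""
    keys = list(questions)
    for values in itertools.product(*(questions[k] for k in keys)):
        yield dict(zip(keys, values))


def build_post_request(answers, url=API_URL):
    """Build (but do not send) a POST request carrying the answers as JSON."""
    body = json.dumps(answers).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


requests_to_send = [
    build_post_request(answers) for answers in all_answer_combinations(QUESTIONS)
]
print(len(requests_to_send), "requests prepared")
# Sending is left out on purpose; urllib.request.urlopen(request) would submit one.
```

Each prepared request corresponds to one synthetic customer, so the number of data points grows with the product of the answer options per question.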

All Robo-advisors that allow scraping were included in the sample. This resulted in our focus list of 16 advisors summarized in Table 1.

How representative is our data for the German market in terms of assets under management (AuM)? AuM numbers are notoriously difficult to obtain, with many firms being very reluctant to share their numbers. This is not surprising, as low AuM numbers signal low levels of customer trust in a given advisor. All AuM data are estimates derived from public sources. Where we did not find sources, we follow Deloitte (2016) and assume 50 million Euro AuM as a default, as this represents the minimum size to break even from the Robo-advisor’s perspective. In summary, we cover 8.1 billion in AuM. This leads to a total market size of 11.2 billion by adding the 18 mandates that could not be scraped. We assume they have on average the same size as the list of firms in Table 1. However, for the purpose of building this average, we removed the three largest Robo-advisors from our sample, as it is highly unlikely that any of the remaining 18 advisors has a similar AuM. This leads to an average size of 0.169 billion for the remainder of the market. Under these assumptions, we cover 73% (\(\frac{8.1}{8.1+18\cdot 0.169}\)) of the German Robo-advisory market.
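The coverage figure can be reproduced with a short calculation. The sketch below only uses the numbers quoted in the paragraph above; the variable names are illustrative.

```python
# All figures in billions, taken from the text above.
covered_aum = 8.1          # AuM of the 16 scraped advisors
n_unscraped = 18           # advisors that could not be scraped
avg_unscraped_aum = 0.169  # assumed average AuM per unscraped advisor

unscraped_aum = n_unscraped * avg_unscraped_aum
total_market = covered_aum + unscraped_aum
coverage = covered_aum / total_market

print(f"Estimated AuM of unscraped advisors: {unscraped_aum:.3f}")
print(f"Estimated total market size: {total_market:.3f}")
print(f"Coverage of scraped sample: {coverage:.1%}")
```

The result is sensitive to the assumed average size of the unscraped advisors: a larger assumed average lowers the estimated coverage.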

Table 1 German market for Robo-advice. We display name, assets under management, start date and website.

All of the Robo-advisors examined use a similar web-based questionnaire to gather the relevant information for portfolio modelling. The questions result in variables that are comparable across all advisors. Robo-advisors ask very few questions outside these categories. Where such questions are asked, they are statistically insignificant and not among the most influential factors. Stylized questionnaire information is summarized in Table 2. We report the specific topic of a given question, the major theme it belongs to, its typical number of variations, data type and the number of advisors that ask a particular question. Not all advisors ask all or the same questions. The number of answer categories also differs. The only question that is common across all 16 examined Robo-advisors asks for the investment amount. This information is irrelevant for investors with constant relative risk aversion (these investors find that the optimal allocation to risky assets is the same independent of the investment amount or level).Footnote 3 Investor information required for each Robo-advisor is fairly generic and hardly personalized. This is consistent with Beketov et al. (2018), who find that Robo-advisors use naive mean-variance portfolio construction. No data to assess the client’s household balance sheet or human capital is collected. While we could derive a proxy for human capital from the monthly income figure, we would need many strong assumptions about the (average) investor’s age, profession or expected wage growth. This limits the ability to customize solutions, but potential clients might feel these questions are too intrusive and time-consuming to enter into a website. Time horizon and risk-aversion-related questions are also very common among advisors. However, risk-averse investors with a 10-year time horizon are not a homogeneous group that deserves to be lumped together to receive identical portfolios.

Table 2 Input data to questionnaire. Answers to the questionnaire are stored in the following variables. We report the specific topic of a given question, the major theme it belongs to, typical variations, data type and the number of advisors that ask a particular question.

Efficient sets

What is the investment opportunity set offered by Robo-advisors? Table 3 summarizes our data set. We analyze 243,000 generic portfolio recommendations and their associated client characteristics across 16 German Robo-advisors. The data were gathered from the 1st of June to the 23rd of June 2021. To facilitate comparisons across Robo-advisors, we document the percentage of input combinations that result in allocations across 10 equity exposure bins. Equity allocations do not only contain equities. They contain all non-bond assets, i.e. equities, alternatives, real estate and commodities when offered.
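The binning step can be illustrated with a small Python sketch. The sample weights are made up for illustration, and the treatment of bin boundaries (a weight of exactly 10% goes into the 10-20% bin, while 100% goes into the top bin) is an assumption, since the text does not specify it.

```python
from collections import Counter

BIN_LABELS = [f"{10 * b}-{10 * (b + 1)}%" for b in range(10)]


def equity_bin(equity_pct):
    """Map an equity weight in percent (0 to 100) to a bin index 0..9."""
    if not 0 <= equity_pct <= 100:
        raise ValueError("equity weight must lie between 0 and 100 percent")
    # Assumption: lower bin edges are inclusive; 100% falls into the top bin.
    return min(int(equity_pct // 10), 9)


def bin_frequencies(equity_pcts):
    """Relative frequency of recommendations per equity exposure bin."""
    counts = Counter(equity_bin(p) for p in equity_pcts)
    n = len(equity_pcts)
    return [counts.get(b, 0) / n for b in range(10)]


# Made-up recommended equity weights (in percent) for one hypothetical advisor.
sample_weights = [20, 20, 40, 40, 40, 60, 60, 80, 100, 100]
frequencies = bin_frequencies(sample_weights)

for label, freq in zip(BIN_LABELS, frequencies):
    print(f"{label:>8}: {freq:.0%}")
```

An advisor with only a handful of pre-built portfolios will populate only a few bins, which is what makes the resulting frequency table easy to compare across advisors.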

We find that most (12 out of 16) Robo-advisors offer a parsimonious choice set of 10 or fewer portfolios. The remaining four advisors offer 11, 79 or 19 portfolios. This not only limits the scope for customization, it also indicates at most very basic digitization. We suspect that all portfolios are pre-built rather than continuously created for each input combination. Existing Robo-advice comes in a tin. We interpret this as evidence for a scoring logic on top of an efficient frontier, rather than portfolio choice modeling with varying inputs from first principles.

Input combinations that lead to extreme allocations (100% equities or 100% bonds) are much less frequent than portfolios that carry intermediate risk. We view this as a safeguard against litigation risk. Corner portfolios are only offered if overwhelming user input justifies solutions that could be labeled as extreme (i.e. not diversified). In many cases, extreme portfolios are not even on offer. Only five Robo-advisors recommend an all-equity portfolio, while only one Robo-advisor recommends an all-bond portfolio. The latter is at least in line with normative portfolio choice, which demands minimum equity participation across all levels of risk aversion. Full (100%) bond allocations might also result in unattractive fees relative to return expectations in a low-interest-rate environment. For example, fixed costs of 100 Euros would require an asset manager to charge 2% fees for a 5000 Euro account to merely break even. At the same time, most 10-year bonds in 2021 display negative yields in Euro (under either covered or uncovered interest rate parity).

Finally, we note that the extreme variation in investment opportunity sets makes it unlikely that two Robo-advisors recommend similar portfolios when faced with the same inputs. In most cases, this is not even feasible.

Table 3 Efficient set. Recommended portfolio allocations for risky assets and their relative frequency. For each Robo-advisor, we compute the weight in risky assets (equities plus commodities) and count their frequency with respect to 10 exposure bins ranging from 0-10% to 90-100% equities.

Questionnaires and portfolio theory

Each question in a given questionnaire is viewed as a potential explanatory variable in a multivariate regression model. Compulsory inputs should be useful in determining portfolio allocations. In an empirical model they should explain at least some of the variation in portfolio recommendations across clients with different personal characteristics. Therefore, we use the available questionnaire information to build a quantitative model to measure each question’s impact on final portfolio recommendations. Every answer to a question is stored as either an ordered factor (example: risk aversion of 1 is smaller than risk aversion of 2) or an unordered factor (example: investment goals, as no goal is larger than another goal). We group the required inputs from Robo-advisory questionnaires into five categories related to portfolio choice: risk aversion, wealth, time horizon, experience and investment goals, as shown in Table 2. Before we present our results, we briefly summarize what to expect from the perspective of normative portfolio choice.

Time horizon and wealth.Footnote 4 Normative portfolio choice allows multiple theoretical relationships. The classical view (time does not diversify) has been forcefully argued by Samuelson (1969) and reiterated to the investment community in Samuelson (1994). Samuelson’s solution (time horizon and recommended equity weights are independent) is well known to rely on the assumptions of CRRA utility, independent returns and lack of estimation risk. Once we change these assumptions, we can argue either case. If we change from CRRA to DRRA (decreasing relative risk aversion), the optimal allocation to equities increases with wealth.Footnote 5 Equally, Campbell and Viceira (2002) argue for an increase in equity allocations as time horizons lengthen. Their work is driven by the predictability of equity returns using vector-autoregressive models. There is, however, considerable estimation risk in regressions of this kind and previous relationships can be overturned (optimal allocation to equities decreases with time horizon) once we add substantial estimation risk (Barberis 2000). Empirically, Spaenjers and Spira (2015) find that the share of risky assets increases with the investor’s subjective (personal, i.e. mortality table adjusted) time horizon. Bodie and Crane (1997) also find that empirically the allocation to equities increases with time horizon and wealth. In our judgment, the work by Campbell and Viceira (2002) now defines the academic mainstream. We view a positive relationship between time horizon and risk-taking and no relation between risk-taking and wealth as most consistent with normative portfolio choice.
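As a point of reference for the Samuelson benchmark discussed above, the textbook continuous-time solution for a CRRA investor facing independent returns with known parameters can be written compactly. The notation is an illustrative assumption and does not appear elsewhere in this paper: \(w^{*}\) denotes the optimal equity weight, \(\mu\) the expected equity return, \(r\) the risk-free rate, \(\sigma\) the equity volatility and \(\gamma\) the coefficient of relative risk aversion.

```latex
w^{*} = \frac{\mu - r}{\gamma \sigma^{2}}
```

Neither wealth nor the time horizon appears on the right-hand side. Under these assumptions, questions about the investment amount or the horizon should leave the recommended equity weight unchanged, while a higher \(\gamma\) lowers it. Relaxing any of the assumptions (DRRA utility, predictable returns, estimation risk) reintroduces a dependence on wealth or horizon.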

Experience. The influence of investor knowledge and personal experience on risk-taking has not been subject to normative models of portfolio choice. Instead, empirical studies document a positive statistical relation between investor education and chosen portfolio risk (after controlling for wealth and other characteristics).Footnote 6 The conjecture is that lower cognitive ability might act as a psychological barrier to financial market participation. Unfamiliarity with a complex subject such as investing also increases costs (measured in time and money) for low-skill households and hence leads to lower levels of investment. Ampudia and Ehrmann (2014) show that while experience has an impact on risk-taking, it is not experience per se, but the type of experience that matters. Investors with positive (negative) stock market experience are more likely to hold substantial (small) positions in risky assets. Grinblatt et al. (2011) show that cognitive skills decrease information costs and therefore increase the likelihood of participating in financial markets. Campbell (2006) finds evidence that stock market participation positively correlates with education. Hsu (2012) also argues that lower skills lead to lower wealth accumulation. If households also display decreasing relative risk aversion, optimal demand for risky assets will decrease as wealth levels fall, because local risk aversion increases. However, this does not equate to normative advice. Quite the contrary. Van Rooij et al. (2011) also find that a lack of financial literacy leads to lower stock market participation. From a normative perspective, we would not think that risk-taking depends on investor experience. From an empirical perspective, we would expect lower education to lead to lower risk-taking. Nudging inexperienced households to invest more aggressively than they initially desire would create economic gains for those households at the expense of regulatory and litigation risks.Footnote 7

Goals. Questions concerning investment goals are behaviorally motivated but do not necessarily violate normative portfolio choice. Das et al. (2010) have shown that even though goal-based investing (building mental accounts) is behaviorally motivated, the portfolio of mental accounts plots close to the efficient frontier. Proponents of goal-based investing will claim that investment goals differ in the required funding strategy to reach them. Bond allocations are optimal if the difference between current wealth and target wealth is low, the time horizon is short, and the required confidence is high. Equity allocations in turn are chosen for large differences between aspired and current wealth, little (high) required confidence and shorter (longer) horizons. Minimizing the probability of falling short of the funds needed to reach the respective goal is the implied measure of risk. Translated into our questionnaire, emergency funds are mainly invested in fixed income, while long-term or retirement objectives are best reached with equities. In our experience, this view has support among practitioners. Among academics, however, it is disputed. The measurement of investment risk as the probability of underperforming a wealth target is inconsistent with maximizing expected utility for well-accepted utility functions. In a mean-variance world, this has no consequences for efficient frontier portfolios. Mean-variance efficient portfolio sets are also mean-shortfall risk efficient (even though investors might choose different points along the mean-variance frontier). In reality, the world is non-normal, investors are not indifferent to the amount by which a goal is missed, and the combination of goal-based portfolios is not necessarily optimal in the presence of a long-only restriction.Footnote 8 In our view, any dominance of goal-based criteria would mark a deviation from normative portfolio choice.

Risk aversion. Among the many inputs required by Robo-advisors, questions related to risk aversion should have the most direct influence on risk-taking. Higher risk aversion will lead to lower equity allocation. This is not only enshrined in normative portfolio choice but also meets regulatory demands for suitability criteria. We expect a negative relation, i.e. higher risk aversion leads to lower risk-taking.

What drives Robo-advice?

We established that Robo-advisors use similar, but still heterogeneous questionnaires. They differ in the number of variables, exact wording, number of variations available for each question, etc. This makes it difficult to summarize the impact of a given variable across Robo-advisors. We therefore chose the following approach.

  1. We run a separate parametric OLS regression with ordered (if applicable) factors as independent variables for each Robo-advisor. The dependent variable is the recommended equity allocation. In line with the literature, we do not attempt to compute and add the implied equity allocation from other asset classes (for example high yield or corporate bond equity beta) to the recommended equity allocations. As we need to deal with mostly ordered factors, we cannot use one-hot encoding or Helmert contrasts in our regressions but rather use orthogonal polynomial contrasts. A more detailed description of our modelling approach can be found in the "Appendix".

  2. We formally interrogate each regression model to identify the most influential variable(s). For this purpose, we borrow from the literature on interpretable machine learning and employ the following model-agnostic algorithm suggested by Fisher et al. (2018). For each variable, we randomly permute the values of that particular feature and recompute the chosen performance metric, in our case \(R_{\rm perm}^{2}\). We then record the difference between the baseline metric and the permuted metric \(R_{\rm base}^{2}-R_{\rm perm}^{2}\) as our importance score.

  3. The three variables with the highest importance scores are then selected as the most influential variables. We then report the category a variable has been assigned to, together with the sign of its individual regression coefficient as well as the cumulative \({\bar{R}}^{2}\) from stepwise regressions. This gives us an indication of the importance of the modeled relationship. We confirm the direction of the relationship with partial dependence plots.
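The permutation step in the procedure above can be sketched as follows. This is a simplified illustration on synthetic data, not the code behind Table 4: the questionnaire variables, their coefficients and the number of repetitions are assumptions, ordered factors are coded as plain integer scores rather than polynomial contrasts, and NumPy is assumed to be available. The model is fitted once; each feature is then shuffled in turn and the drop \(R_{\rm base}^{2}-R_{\rm perm}^{2}\) is averaged over several shuffles.

```python
import itertools

import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y on X, with an intercept prepended."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def r_squared(beta, X, y):
    """R^2 of the fixed model `beta` evaluated on (X, y)."""
    pred = np.column_stack([np.ones(len(y)), X]) @ beta
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_importance(X, y, n_repeats=20, seed=0):
    """Average drop in R^2 when one column at a time is randomly permuted."""
    rng = np.random.default_rng(seed)
    beta = fit_ols(X, y)
    base = r_squared(beta, X, y)
    scores = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(base - r_squared(beta, X_perm, y))
        scores.append(np.mean(drops))
    return base, np.array(scores)


# Synthetic advisor: full grid of (risk aversion, time horizon, wealth) answers.
names = ["risk_aversion", "time_horizon", "wealth"]
grid = itertools.product(range(1, 6), range(1, 5), range(1, 4))
X = np.array(list(grid), dtype=float)
# Assumed scoring rule: wealth has no effect on the recommended equity weight.
y = 80.0 - 15.0 * X[:, 0] + 5.0 * X[:, 1] + 0.0 * X[:, 2]

base_r2, importance = permutation_importance(X, y)
for name, score in sorted(zip(names, importance), key=lambda t: -t[1]):
    print(f"{name:>14}: {score:.3f}")
```

In this synthetic setting the wealth question receives an importance score of essentially zero, which mirrors the case of a questionnaire item that is asked but does not affect the recommendation.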

All results are presented in Table 4. Risk aversion-related questions play a dominant role for recommended equity portfolio weights across all Robo-advisors. For 12 advisors we find that the top input is related to risk aversion. The sign is negative across all advisors, i.e. higher risk aversion leads to lower weights in risky assets. Bach et al. (2020) show that risk-taking (revealed risk aversion) is a major driver of cross-sectional differences in household wealth. The top 1% of wealthiest households take more systematic risks, invest in more volatile portfolios and earn much higher long-term average returns. Investors need to carefully assess their willingness to take risks. We also find that recommended portfolios show higher equity allocations for longer time horizon investors while wealth hardly plays a role in portfolio recommendations. Only three Robo-advisors display statistically significant coefficients for wealth and in each of these cases the marginal R-squared of the wealth variable turns out to be small. This makes it unlikely that Robo-advisors use utility functions with decreasing relative risk aversion. Instead, the evidence is more consistent with negatively sloping term structures of risk due to mean reversion in equity returns.

For 3 of our 16 Robo-advisors, we find that investor experience is used as the most important input variable. This is surprising given the weak theoretical underpinning of this variable. We attribute this observation to anticipated regulatory concerns, i.e. mitigation of business risks. MiFID II, article 25(2) requires investment firms to ask investors for their “knowledge and experience in the investment field relevant to the specific product or service”. This question is of interest as it has no counterpart in the theory of portfolio choice. ESMA’s request is instead based on an implied conjecture: less experience should result in less risk-taking. Their guideline on certain aspects of the MiFID II suitability requirements (50) explicitly states “Firms should be alert to any relevant contradictions between different pieces of information collected, and contact the client to resolve any material potential inconsistencies or inaccuracies. Examples of such contradictions are clients who have little knowledge or experience and an aggressive attitude to risk, or who have a prudent risk profile and ambitious investment objectives.”Footnote 9

Investment goals have a minor impact for all but one advisor, where investment goals explain 77% of the variation in recommended portfolio weights. We also find 3 advisors with an extremely simple model that is fully captured by changes in risk aversion only. All other variables are stored and used for non-investment purposes.

Table 4 Top 3 questionnaire categories. For each Robo-advisor we run an OLS regression with ordered and unordered factors (user input choices). Input variables are randomized one by one such that we can compute an importance score as the difference between the \(R^{2}\) of the original data and the randomized data. The larger the difference, the more important the variable. We show the top 3 variables (by category), their cumulative R-squared as well as the R-squared of a model using all variables.

Most regressions do not fully explain the dispersion in recommended equity weights. This is a clear indication of possible nonlinearities, i.e. either nonlinear interactions across explanatory variables or threshold effects in individual variables. The latter are partly captured by the polynomial contrasts used in our regression framework. Our average \({\bar{R}}^{2}\) is still around 82% and in virtually all regressions we do not find evidence that using more than three variables would significantly increase the model’s explanatory power. In other words, not all information required by questionnaires has an impact on the final recommendation.
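To illustrate how polynomial contrasts can pick up threshold-like effects in an ordered factor, the following sketch constructs orthonormal linear, quadratic and higher-order contrast columns for a factor with equally spaced levels. It is a generic construction via a QR decomposition, assuming NumPy is available, and is not taken from the Appendix; the function name and the sign convention (top level positive) are illustrative choices.

```python
import numpy as np


def poly_contrasts(n_levels):
    """Orthonormal polynomial contrasts for an ordered factor.

    Returns an (n_levels, n_levels - 1) matrix whose columns are the
    linear, quadratic, ... contrasts for equally spaced level scores.
    """
    scores = np.arange(1, n_levels + 1, dtype=float)
    scores -= scores.mean()
    # Columns: constant, linear, quadratic, ... powers of the centred scores.
    powers = np.vander(scores, n_levels, increasing=True)
    q, _ = np.linalg.qr(powers)
    contrasts = q[:, 1:]  # drop the constant column
    # QR leaves the sign of each column arbitrary; make the top level positive.
    return contrasts * np.sign(contrasts[-1, :])


contrasts_3 = poly_contrasts(3)
print(np.round(contrasts_3, 3))
```

A three-level risk aversion question is thus represented by one linear and one quadratic regressor. A significant quadratic term indicates that the step from level 1 to 2 differs from the step from level 2 to 3.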

Our data show that current Robo-advisory offerings use inputs designed to locate investors on a given efficient frontier, while the frontier itself looks identical to all investors. What makes investors different are their various hedging demands originating from their household balance sheets, but the relevant questions needed to model hedging demands are not asked. This is in stark contrast to portfolio choice in a modern multi-factor world as in Cochrane (1999). In a multi-factor world, many investors will hold portfolios plotting below an efficient frontier as they cannot take frontier-related factor risks. This is the whole point of a rational risk premium. Not every investor finds it optimal to take it. Investor heterogeneity arising from different investor balance sheets or differences in the amount and characteristics (market or factor return) of investor human capital is largely ignored. This is somewhat disappointing as Robo-advisors fail to offer the advice Merton (1971) gave exactly 50 years ago: allocate between speculative demand (frontier portfolios, identical to all investors), cash and various hedging demands reflecting household balance sheets and exposures to systematic economic risks (different across individual investors).

We believe these choices are not made because of ignorance of the existing academic literature, but rather for commercial reasons. First, it is well known that trusted advice by “money doctors” as described by Gennaioli et al. (2015) reduces behavioral biases and can overcome complexity.Footnote 10 Earlier work by Sapienza et al. (2013) also finds the importance of trust for economic decision making. This statement is echoed by Merton (2017) in the context of Robo-advisory adoption rates: “What you need to make technology work is to create trust”. Hildebrand and Bergner (2020) make the same point.

But what creates trust? Jacovi et al. (2020) conjecture that (intrinsic) trust can be gained when recommendations line up closely with the user’s prior beliefs.Footnote 11 Hence portfolio recommendations receive more trust when they resemble solutions that coincide with the investor’s prior understanding of portfolio choice. For Robo-advisory as a business, there is likely a tradeoff between Merton (1971) and Merton (2017). Should the Robo-advisor offer theoretically consistent but initially unintuitive advice? A young government employee (assume 90% of her wealth is human capital that behaves like government bonds) with high risk aversion might still get a 100% equity portfolio. This is consistent, as equities still only account for 10% of her total wealth. However, will the investor understand? Equally important, would that argument work in court after clients made large losses inconsistent with their stated risk aversion? Robo-advice as a business decides what works best in order to win and retain clients. Related work by Scherer and Lehner (2021) already provides evidence in this direction. Web-scraping one of the largest US Robo-advisors, they document portfolio recommendations that are more consistent with client preconceptions than with textbook financial modeling.

Conclusions

We estimate the impact of client characteristics gathered by Robo-advisor questionnaires on recommended portfolio structures for a large cross section of German Robo-advisors. Contrary to the academic progress on normative portfolio choice, we find that portfolio recommendations are driven mainly by questions with respect to risk aversion and investor time horizon. Household balance sheets, human capital or economic hedging demands play no role. Instead, variables with little normative underpinning like personal experience or investor goals find their way into questionnaires. Or, as Cochrane (2021) put it: “When theory is so persistently contrary to practice one of the two must be wrong”. Maybe the theory is just incomplete. The fact that Robo-advisors prefer a solution space that is more likely to confirm investors’ existing preconceptions makes business sense. It increases consumer trust and regulatory approval, both outside the scope of normative portfolio choice. Agency problems are everywhere.