Casinos, One-armed bandits, and a new metric: What can machine learning really teach us, and how expensive is our ignorance?

It may seem strange to many in marketing, but IT and technology have a lot to teach us about how to handle communication in a highly complex environment.

This is particularly true in the practical situation where we face constraints, such as a finite budget, and need to decide whom to target to get the best marketing response.

Our most prevalent approach appears to have been driven by the legitimate need to gain credibility by providing financial reports that link individual-level responses to their directly attributable costs. This is clearly an appropriate way to structure reporting to accountants, but examining the performance of the high-level segments in these reports, and defining return on investment at that level, does not necessarily lead to the best method of execution.

This conventional approach can clearly show, after the event, which high-level segments of a customer population have provided the most responses. What we have historically and implicitly assumed is that our population of customers is smooth and “well-behaved”. That assumption is what allows us to legitimately use previous performance at this high level to direct future low-level targeting of the same segments. Execution is then managed by individual KPIs that look for the best ROI, CPA and so on, starting from the best-performing segment and working down until the budget is spent. What is wrong with starting from the most efficient segments and working your way down? Surely this by default provides the most efficient future outcome.

However, two important assumptions quietly slipped past us in this formulation. First, we assumed that the audience of customers and prospects is stable and well-behaved, with individuals acting independently of each other, so that behaviour at the macroscopic level resembles behaviour at the local level. Second, we have taken the past as a reliable and persistent representation of the future.

Machine learning over the past couple of decades has had to address a similar problem, where it is known as the explore-exploit trade-off. At any stage in a learning process, the system must decide whether its next step should “exploit” existing knowledge or “explore” new possibilities. There is a vast field of study of this trade-off, and it underpins a wide range of successful technological developments, for example the algorithms behind information search such as Google's. It is reminiscent of the difficult choice between investing in retention communications with existing customers and seeking to acquire new customers: this trade-off between acquisition and retention similarly occurs within a fixed marketing budget.

So, to the casinos and one-armed bandits. Computer science has framed the study of the explore-exploit trade-off around a canonical problem. Imagine entering a casino containing a multitude of one-armed bandits, each with fixed but unknown odds of paying out. Whilst the comparison isn't very complimentary, each potential customer is like one of these bandits: can we send a communication that has the best chance of delivering a pay-out, in this case a sale? Researchers have been working on this problem since the Second World War, and found initial solutions in the early 1950s. These solutions, and those that followed, were mathematically profound but difficult to implement in practice; the best known were based upon a particular form of discounting of future benefits.
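To make the casino concrete, here is a minimal Python sketch, assuming each machine stands for a customer segment with a fixed, hidden response rate. The rates are invented for illustration and `run_policy` is a hypothetical helper. It compares pure exploitation with epsilon-greedy, one of the simplest explore-exploit strategies, which spends a small fraction of plays on randomly chosen machines.

```python
import random


def run_policy(payout_probs, rounds, epsilon, seed=0):
    """Play `rounds` pulls on machines whose payout probabilities are hidden.

    epsilon = 0.0 is pure exploitation: always pull the machine with the best
    observed payout rate (untried machines count as 0, ties go to the lowest index).
    epsilon > 0.0 explores a uniformly random machine with that probability.
    Returns (total payouts, pulls per machine).
    """
    rng = random.Random(seed)
    n = len(payout_probs)
    pulls = [0] * n
    wins = [0] * n
    total = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: try any machine
        else:
            # exploit: the best-looking machine so far
            arm = max(range(n), key=lambda i: wins[i] / pulls[i] if pulls[i] else 0.0)
        reward = 1 if rng.random() < payout_probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total, pulls


# Hypothetical response rates, hidden from the player.
probs = [0.02, 0.05, 0.10]
greedy_total, greedy_pulls = run_policy(probs, 10_000, epsilon=0.0)
explore_total, explore_pulls = run_policy(probs, 10_000, epsilon=0.1)
print(greedy_pulls)  # [10000, 0, 0]
print(greedy_total, explore_total)
```

The purely greedy player never discovers the better machines: with nothing to go on, it keeps pulling the first one it tried. The epsilon-greedy player pays a small price for exploring, but typically ends up concentrating on the machine with the best response rate.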

Whilst somewhat impractical at large scale, these solutions already showed that organisations perform sub-optimally if all they do is focus on known results: there is a premium on looking for new insights and exploring the unknown. This seems counter-intuitive, but within machine learning the effect is well attested. It arises because clean, well-behaved collections of items are very unusual, and in environments that are chaotic rather than random it is worthwhile constantly looking around for local changes. In fact, the approach suggests that our marketing practice of starting from the best and working down may be the wrong way around. It suggests that it is more effective to find the places we know don't work and move away from them.
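The idea of finding what doesn't work and moving away from it has a direct counterpart in the bandit literature: successive elimination. The sketch below is a minimal Python version, assuming Hoeffding-style confidence bounds and invented response rates; `successive_elimination` is a hypothetical helper, not a library function. Every surviving machine is played in turn, and any machine whose optimistic estimate falls below the leader's pessimistic estimate is dropped.

```python
import math
import random


def successive_elimination(payout_probs, rounds, delta=0.05, seed=0):
    """Play all surviving machines in turn and discard those that are
    confidently worse than the current leader.

    Confidence radius: sqrt(ln(2 * n * rounds / delta) / (2 * pulls)), a
    Hoeffding bound with a union bound over machines and rounds.
    Any budget left over that is smaller than one full sweep is not spent.
    Returns (surviving machines, pulls per machine).
    """
    rng = random.Random(seed)
    n = len(payout_probs)
    active = list(range(n))
    pulls = [0] * n
    wins = [0] * n
    log_term = math.log(2 * n * rounds / delta)
    used = 0
    while used + len(active) <= rounds:
        for arm in active:  # one pull for every surviving machine
            if rng.random() < payout_probs[arm]:
                wins[arm] += 1
            pulls[arm] += 1
        used += len(active)
        mean = {i: wins[i] / pulls[i] for i in active}
        radius = {i: math.sqrt(log_term / (2 * pulls[i])) for i in active}
        best_lower = max(mean[i] - radius[i] for i in active)
        # Move away from machines whose upper bound is below the leader's lower bound.
        active = [i for i in active if mean[i] + radius[i] >= best_lower]
    return active, pulls


probs = [0.02, 0.05, 0.10]  # hypothetical response rates
survivors, pulls = successive_elimination(probs, 60_000)
print(survivors, pulls)
```

The leader can never eliminate itself, so at least one machine always survives; once only one is left, the remaining budget goes to it.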

Indeed, this changed perspective has been confirmed much more recently: machine learning has been built upon algorithms that seek to minimise regret rather than maximise benefit. Regret here means the shortfall between what was achieved and what the best available choice would have delivered. These sound like two sides of the same coin, but that is only the case if the system is smooth. The difference lies in how information about the unknown is weighted. Approaches that maximise benefit tend to be pessimistic about the unknown and preferentially weight past performance as a prediction of the future. They increase the risk of over-fitting and of creating a structural error that inadvertently generates unmeasured negative outcomes. Minimising regret focuses on keeping the negatives to a minimum, and fully visible, and lets the positives look after themselves.
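Regret minimisation is easiest to see in a standard bandit algorithm. The Python sketch below implements UCB1, one well-known regret-minimising strategy, on invented response rates; `ucb1` is a hypothetical helper. Rather than being pessimistic about machines it knows little about, UCB1 adds a bonus that shrinks as a machine is tried more often, so poorly understood options keep getting a look-in. It also tracks regret explicitly.

```python
import math
import random


def ucb1(payout_probs, rounds, seed=0):
    """UCB1: after one pull of each machine, always pull the machine with the
    highest optimistic score  mean + sqrt(2 * ln(total pulls) / pulls).

    Returns (pulls per machine, cumulative expected regret), where regret is
    the payout given up, in expectation, by not always pulling the best machine.
    """
    rng = random.Random(seed)
    n = len(payout_probs)
    pulls = [0] * n
    wins = [0] * n
    best = max(payout_probs)
    regret = 0.0
    for played in range(rounds):  # `played` = pulls made so far
        if played < n:
            arm = played  # initialise: try every machine once
        else:
            arm = max(
                range(n),
                key=lambda i: wins[i] / pulls[i] + math.sqrt(2 * math.log(played) / pulls[i]),
            )
        if rng.random() < payout_probs[arm]:
            wins[arm] += 1
        pulls[arm] += 1
        regret += best - payout_probs[arm]  # expected shortfall of this choice
    return pulls, regret


probs = [0.02, 0.05, 0.10]  # hypothetical response rates, unknown to the algorithm
pulls, regret = ucb1(probs, 10_000)
print(pulls, round(regret, 1))
```

Because regret is computed from the hidden true rates, it is only available like this in a simulation; in a live campaign it would have to be estimated.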

The other challenge to our common management of marketing is the broad separation of acquisition and retention. Machine learning treats exploitation and exploration as different tasks, but it switches between the two continually. This suggests that whilst it is clearly correct to design different communications depending on whether you are talking to a new prospect or an existing customer, the choices and selections at any given time should be intimately bound together. Our practice of maintaining highly separate departments, campaigns and responsibilities for acquisition and retention is likely to be highly sub-optimal. It is likely to be much more effective if the two are managed together and allowed to reinforce each other.
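One way to bind the two together is to let a single bandit spend one budget across both kinds of campaign. The Python sketch below assumes invented campaigns and response rates, and uses Thompson sampling, a standard Bayesian bandit strategy chosen here because it is compact; `thompson_allocate` and the campaign names are hypothetical.

```python
import random

# Hypothetical campaigns: (pool, name, hidden response rate). Illustrative numbers only.
CAMPAIGNS = [
    ("acquisition", "prospect display", 0.02),
    ("acquisition", "prospect email", 0.05),
    ("retention", "loyalty offer", 0.12),
    ("retention", "win-back email", 0.04),
]


def thompson_allocate(campaigns, budget, seed=0):
    """Spend `budget` sends across acquisition and retention campaigns with a
    single Thompson-sampling bandit (Beta(1, 1) prior on each response rate).

    Returns (sends per pool, total responses).
    """
    rng = random.Random(seed)
    successes = [0] * len(campaigns)
    failures = [0] * len(campaigns)
    for _ in range(budget):
        # Draw a plausible response rate for every campaign from its posterior
        # and give the next send to the highest draw, whichever pool it is in.
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        arm = draws.index(max(draws))
        if rng.random() < campaigns[arm][2]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    sends_by_pool = {}
    for (pool, _name, _rate), s, f in zip(campaigns, successes, failures):
        sends_by_pool[pool] = sends_by_pool.get(pool, 0) + s + f
    return sends_by_pool, sum(successes)


sends, responses = thompson_allocate(CAMPAIGNS, 5_000)
print(sends, responses)
```

With these made-up numbers the loyalty offer performs best, so most sends drift to retention; with different numbers the same code would shift the budget towards acquisition. The split between the two pools is an output of the process rather than a fixed departmental allocation.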

Many studies of machine learning strategies indicate that, within their highly measured contexts, these approaches make substantial differences: in large databases, and when processing millions of instructions, they can mean that search results on Google are returned near-instantaneously rather than taking hours or days.

So what might data marketing look like if it adopted the tried and tested approaches of machine learning? Whilst it would still keep ROI as a number to report back to the finance department, it would use alternative KPIs to direct its own execution. Paradoxically, stepping back from ROI when directing execution should deliver a higher ROI at the end. The suggestion here, based upon the concept of minimal regret, is that the KPI to follow is perhaps the volume of missed business (VMB). Alongside this, it is perhaps easier to remove the dichotomy between retention and acquisition, and to make choices and selections in tandem.
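One way to make VMB concrete, assuming it is read as the marketing equivalent of regret: the sketch below, with a hypothetical function name and invented numbers, counts the responses expected to be missed because sends went to options other than the best-estimated one. In practice the response rates would themselves be estimates, so VMB would be an estimate too.

```python
def volume_of_missed_business(sends, response_rates):
    """Expected responses missed, relative to sending every message through the
    option with the highest (estimated) response rate.

    sends: mapping option -> number of messages sent
    response_rates: mapping option -> estimated response rate
    """
    best = max(response_rates.values())
    return sum(n * (best - response_rates[option]) for option, n in sends.items())


# Hypothetical campaign log and estimated response rates.
sends = {"loyalty offer": 3_000, "prospect email": 1_500, "prospect display": 500}
rates = {"loyalty offer": 0.12, "prospect email": 0.05, "prospect display": 0.02}
vmb = volume_of_missed_business(sends, rates)
expected_responses = sum(n * rates[option] for option, n in sends.items())
print(f"expected responses: {expected_responses:.0f}, missed: {vmb:.0f}")
# expected responses: 445, missed: 155
```

Reported side by side, the first number is the familiar achievement view, while the second keeps the cost of the choices not taken in plain sight.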

Most substantially, these approaches should dramatically change the way we value the data provided to us within DMPs and other platforms supporting programmatic implementations. It is almost certain that, behind the scenes, platforms like Facebook, Google and other delivery gateways use machine learning algorithms that already employ the strategies outlined above. These methods fall under the heading of “reinforcement learning”, and the platforms are able to use them to optimise their own A/B testing and other strategies. At the same time, we marketers continue to value sub-optimal targeted categories and persist in optimising ROI. This is an ideal opportunity for the platforms to make a substantial amount of money: their profit margins can remain enormous because we are happy to pay for a perceived value that they can support at minimal cost. ROI and data-based targeting is currently a very expensive and sub-optimal approach that is diverting vast amounts of revenue from brands into the online platforms, with no opportunity for it to go elsewhere.

If we can adjust our own approaches to communication within the complex world that our consumers and prospects inhabit, then we might well be able to value the approaches and selections provided online more effectively and in a way that is more closely aligned to the real costs that underpin them.