Model Forms and Why They Matter


Introduction

Over the last few years there has been a lot of discussion about new types of modelling technologies. Variously known as 'Data Science', 'Advanced Analytics' or (most commonly) 'AI', these approaches chiefly refer to new methods for building predictive models, particularly methods that rely less on manual creation of the model by an analyst.

This does not mean that these approaches are fully automated (although some can be) - for instance, an analyst may still be involved to set constraints on the overall level of complexity that the model is allowed to have, or other similar characteristics.

These constraints are typically known as 'hyperparameters' and can still have a very significant impact on the quality of the final model, even if the model itself is then built by an algorithm.
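
As an illustrative sketch only (the original text does not reference any particular library), the example below shows how an analyst might set hyperparameters such as tree depth and learning rate before an algorithm builds the model itself, here using scikit-learn's GradientBoostingRegressor.

    # Illustrative sketch: the analyst chooses the hyperparameters,
    # the algorithm then builds the model within those constraints.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=500, n_features=10, noise=0.3, random_state=0)

    model = GradientBoostingRegressor(
        n_estimators=200,    # number of boosting rounds
        max_depth=3,         # limits the complexity of each individual tree
        learning_rate=0.05,  # how strongly each round corrects the previous ones
        random_state=0,
    )
    model.fit(X, y)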

The proliferation of these technologies has had both advantages and disadvantages.

On the plus side, there are now more choices available to modelling teams to help them develop the best model, and arguably the understanding of model development is at its highest level ever.

On the negative side, this flexibility also brings additional decisions that need to be made. The best choice of algorithm and approach is not always obvious, which may lead to substantial overhead if multiple approaches need to be tested.


Assessing Models

As a general rule, the choice of model form can at least be narrowed down by considering the most important characteristics of the finished model and then evaluating how well each option can perform against these. As an example, we would typically want to consider factors such as:

Model Transparency - How important is it for a person to be able to review the model and fully understand the logic behind how it arrives at the predictions it generates?

Linear modelling techniques such as Generalised Linear Models (GLMs), Elastic Nets and EARTH models all tend to be comparatively easy to review and understand, whilst tree-based techniques such as Gradient Boosted Machines (GBMs) or Random Forests tend to be much harder, or even practically impossible, to understand fully.
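
A minimal sketch of this contrast, again assuming scikit-learn purely for illustration: a linear model exposes one reviewable coefficient per feature, whereas a boosted ensemble spreads its logic across hundreds of separate trees.

    # Illustrative comparison of model transparency.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=500, n_features=5, noise=0.5, random_state=1)

    glm = LinearRegression().fit(X, y)
    print("Linear coefficients:", glm.coef_)          # each value can be read and reviewed directly

    gbm = GradientBoostingRegressor(n_estimators=300, random_state=1).fit(X, y)
    print("Trees in the GBM:", len(gbm.estimators_))  # 300 separate trees to trace through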

Model Accuracy - How important is it to minimise the total error of the predictions of the model?

The tempting answer is that accuracy should always be the single paramount concern when developing a model, but this is often an oversimplification.

Improving model accuracy will typically require trade-offs in one or more of the other factors, and these trade-offs may cease to be beneficial beyond a certain point.

We must also consider the type of accuracy that is most important - do we care more about minimising the total amount of error the model makes, or is it more acceptable (for example) for the model to be more accurate for the majority of cases but make larger errors on less common ones?
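
As a hypothetical illustration (the figures below are invented), two measures of accuracy can tell different stories about the same predictions: an average error hides a large miss on a rare case, while a high-percentile error exposes it.

    # Illustrative only: the same predictions judged by two accuracy measures.
    import numpy as np

    actual    = np.array([10.0, 11.0,  9.0, 10.5, 50.0])  # the last case is an uncommon, extreme one
    predicted = np.array([10.2, 10.8,  9.3, 10.4, 30.0])  # close on common cases, poor on the rare one

    errors = np.abs(actual - predicted)
    print("Mean absolute error:", errors.mean())                # dominated by the many small errors
    print("95th-percentile error:", np.percentile(errors, 95))  # exposes the large miss on the rare case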

Model Development Time - How quickly can the model be developed and deployed?

Generally faster is better, but we may also want to consider what proportion of the time requires hands-on analyst input versus how much can be run automated. It may be acceptable to choose a fully automated but objectively slower approach if it can be left to run overnight or over the weekend, rather than one that is faster but demands more analyst input.
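
A sketch of that trade-off, assuming scikit-learn for illustration: an exhaustive hyperparameter search may be slow overall, but once launched it needs no further analyst input and can simply be left to run overnight.

    # Illustrative: slower in total, but fully automated once started.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import GridSearchCV

    X, y = make_regression(n_samples=1000, n_features=10, noise=0.3, random_state=0)

    search = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        param_grid={"max_depth": [2, 3, 4], "learning_rate": [0.01, 0.05, 0.1]},
        cv=5,
    )
    search.fit(X, y)              # runs unattended; no hands-on analyst time required
    print(search.best_params_)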

Model Evaluation Time - Once the model is completed, how quickly can the model generate a prediction?

To some extent this can be modified based on the hardware the model is run on, but on similar hardware some model types will generate predictions faster than others. How important this is depends on how many predictions you expect to generate and whether the model needs to produce predictions in a real-time or 'live' environment.
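
As a rough, illustrative timing sketch (again using scikit-learn, which the original text does not specify), a single linear model will typically score rows far faster than a large ensemble on the same hardware:

    # Illustrative comparison of prediction (scoring) time on identical hardware.
    import time
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import RandomForestRegressor

    X, y = make_regression(n_samples=5000, n_features=20, noise=0.5, random_state=0)
    linear = LinearRegression().fit(X, y)
    forest = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

    for name, model in [("linear model", linear), ("random forest", forest)]:
        start = time.perf_counter()
        model.predict(X)
        print(f"{name}: {time.perf_counter() - start:.4f} seconds to score 5,000 rows")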


Combining Models

As an additional complication, it may be necessary to consider not only the optimal choices for the model in isolation, but also how it behaves when used alongside other models - as would usually be the case if deployed in a 'Lifetime Value' (LTV) calculation or other similar analyses that look at the multi-year impact of a proposed change.

For example, a model evaluation time that might be acceptable in isolation may be too slow if the model is intended to be iterated many times, or if the model is to be run in conjunction with other slower running models.

Similarly, it may be possible to develop a model with an extremely high level of accuracy, only to find that once it is incorporated into a wider LTV calculation it has very little impact on business strategy. In this case it may be better to use a less accurate model and redeploy the time and resources saved elsewhere.
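
To make this concrete, here is a simplified, hypothetical multi-year LTV calculation that combines the outputs of several models; all names and figures are invented for illustration. A small change in one model's prediction may barely move the final LTV, suggesting that extra accuracy in that model would not repay the effort.

    # Hypothetical multi-year LTV calculation combining several model outputs.
    def lifetime_value(retention_rate, annual_margin, annual_cost, years=5, discount=0.05):
        """Sum the discounted expected margin over a multi-year horizon."""
        ltv = 0.0
        survival = 1.0
        for year in range(years):
            ltv += survival * (annual_margin - annual_cost) / ((1 + discount) ** year)
            survival *= retention_rate
        return ltv

    retention_prediction = 0.90   # from a churn / retention model
    margin_prediction    = 120.0  # from a pricing or margin model
    cost_prediction      = 30.0   # from a cost model

    print(round(lifetime_value(retention_prediction, margin_prediction, cost_prediction), 2))

    # A 1% shift in the cost model's prediction moves the final LTV only slightly.
    print(round(lifetime_value(retention_prediction, margin_prediction, cost_prediction * 0.99), 2))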

Within the Quill Systems Portfolio Management System, we have included all common modelling approaches to ensure that your analysts have the maximum flexibility when developing your predictive models.

This is combined with specific sections of our Knowledge Hub that clearly explain each model form, how it works, its pros and cons and when it is best used.


New Developments

We are also excited to include a number of revolutionary model forms. Unavailable elsewhere, these algorithms include adaptations and developments that make them a significant improvement over existing approaches. Among the new developments we are able to offer:

Model Compression - once a given model is developed, our algorithm is able to review the model structure and optimise the underlying code to reduce the amount of time the model will take to generate predictions.

Getting faster predictions from your models gives you more customer insights without affecting your customer experience.

Interpretable GBMs (iGBMs) - GBMs are probably the most popular machine learning technique currently in use across most industries.

Prized for their ease of use and high accuracy, GBM models have nonetheless struggled to gain adoption in some areas owing to the lack of transparency in the models.

Using our iGBM algorithm, you obtain all of the benefits of a standard GBM, whilst the output is much easier to understand (similar in complexity to a simple linear model).

Easier-to-understand models can be used more widely without incurring regulatory or business risk.

Model Anchoring - a common complaint among managers is that when their teams develop related sets of models, there will often be wide variation in the variables and model forms chosen.

Using our model anchoring technique, analysts can link models together, and use a 'Lead Model' to inform the development of subsequent related models.

Combined with our integrated version control, it becomes much easier to understand when introducing differences between related models provides an outsized benefit versus when consistency would be the stronger approach for your company.

Teamwork - the ability for several teams to work on a single project simultaneously, coupled with the version control trees and model anchoring mentioned above, leads to a different and more collaborative style of development.

Analysts can see in real time how approaches and changes considered by other teams are impacting model quality, and can work together in a much more collaborative and holistic fashion to develop your models.

By combining efforts instead of working in isolation, teams produce completed models that are of higher quality and much more readily understood across the teams.

These are only a few of the new approaches that we bring to the market. If any of these sound like they could be of benefit to your company, or if you would just like to have a chat and learn more, we would love to hear from you.

We want to hear from you

If you'd like to learn more, we'd be pleased to hear from you and schedule an introduction.

Contact Us