
Discover What Drives Your Metrics: Evaluating Key Driver Analysis Approaches for BI


When it comes to ad-hoc decisions based on Business Intelligence (BI), there are usually two main questions that help you understand the underlying story in your data: what has changed in your data, and why.

To understand what has changed, we can usually rely on a number of proven techniques, from simple thresholds to anomaly detection algorithms. These algorithms can be surprisingly tricky when you want to detect sudden change rather than a threshold crossing, but more on that in another article.

The process of understanding what drives these changes in your business metrics (revenue, churn, conversions, …) is usually called Key Driver Analysis (KDA). It helps uncover why things change and lets you make more informed decisions.

At GoodData, we explored different ways to implement KDA (specifically period-over-period change analysis) efficiently, accurately, and at scale. We compared three different approaches commonly used in BI software:

  • Attribute-level Aggregation
  • Linear Regression
  • Gradient Boosting with SHAP

We compared them based on their interpretability, accuracy, and scalability. We also tried them on a public data set (e-commerce sales data) representative of what a smaller e-commerce business might have.

To illustrate the criteria, let's consider this story:

You have a sudden change in sales this month and you'd like to understand what drove this change. Your business is international and you resell electronics. This business case potentially has a lot of key drivers, such as growth/decline in particular countries, product segments, campaigns, products… you get the point. And you want to be able to discover the most impactful drivers of that growth/decline so you can make adjustments to maximize your revenue.

Here's what we found.

Attribute-level Aggregation

Probably the most straightforward way to do KDA is to slice the metric by every possible dimension independently, calculate how much the metric changed for each value of each attribute, and sort the changes by magnitude.
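
As a minimal sketch of the idea (assuming a hypothetical pandas DataFrame df with one row per order, a numeric revenue column, a period column with values "prev"/"curr", and categorical columns such as country and category; this is illustrative, not our production code):

```python
import pandas as pd

def attribute_level_drivers(df: pd.DataFrame, metric: str, dims: list[str]) -> pd.DataFrame:
    """Slice `metric` by each dimension independently and rank period-over-period changes."""
    rows = []
    for dim in dims:
        # Total of the metric per attribute value, for the previous and current period.
        pivot = df.pivot_table(index=dim, columns="period", values=metric,
                               aggfunc="sum", fill_value=0)
        delta = pivot.get("curr", 0) - pivot.get("prev", 0)
        rows += [{"dimension": dim, "value": v, "change": c} for v, c in delta.items()]
    out = pd.DataFrame(rows)
    # Sort by absolute magnitude so large drops rank alongside large gains.
    return out.reindex(out["change"].abs().sort_values(ascending=False).index)

# Hypothetical usage:
# drivers = attribute_level_drivers(df, metric="revenue", dims=["country", "category"])
# print(drivers.head(10))
```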

That's the approach many business people would probably take in Excel if they were asked to find the key drivers of a metric increase/decrease. They'd plot bar charts with the metric aggregated by each dimension, try to find the values with the highest increase/decrease, and then sort them.

Let's look at the example from the introduction. You'd first look at the revenue by country, find out that there was a big increase in revenue in the US, and then look at the revenue by product category and find out that there was a big increase in revenue from phones. These would be your key drivers of the increase.

This approach is easy to explain to business people, transparent, easy to visualize (with bar charts), easy to implement, and, finally, fast to calculate.

But it is very simplified and has many disadvantages.

The first problem is that it double-counts the contributions of the drivers. Let's take an even closer look at the previous example. Say a new phone was announced this month, which led to an increase in sales in all countries, but the US is the company's biggest market, so there will be a big spike there (compared to other countries) as well. If you do the attribute-level analysis, which looks at each dimension independently, both the US and phones will appear to be key drivers. However, the US increase was driven by the phone sales, and other product categories didn't grow there. So the US shouldn't be identified as a key driver. Only the phone category should.

This problem is called double counting of drivers, or confounded driver attribution, and is caused by looking at each attribute independently without taking their dependence into account.

Another shortcoming of this method is that it ignores attribute interactions. Say there was a big ad campaign and discount on computers in Germany in a given month. Since Germany is a large market globally and computers have a large market share there, if you aggregate over countries, Germany will look like a big driver, and if you aggregate over categories, computers will look like a big driver as well (globally). But sales of other categories in Germany stayed the same (so the German economy was not the driver), and computer sales stayed the same in other countries, so computers were not the driver on their own either.

The driver was the interaction/combination of the campaign in Germany for computers. Single attribute-level analysis won't discover this and will make Germany and computers look like drivers even though they weren't (and it will also double count them).

There are other issues/effects that this method doesn't handle well, such as the mix/composition effect. If, for example, the computer market is much bigger relative to the phone market in Germany (say 90/10) than in other countries (where it's usually 50/50), then a global computer campaign/discount will drive sales for the whole category globally, but it will increase sales in Germany disproportionately because the computer market share is larger there. That will make Germany look like the driver even though the computer category was the driver.

Despite all these limitations, this method can still be very useful. It basically automates what an analyst would do if they wanted to narrow down potential key drivers in Excel or a similar BI tool, but it does it much faster. And paired with some additional measures to filter out obvious/uninteresting potential key drivers, it can save a lot of work and help analysts choose the right area to focus on and dig deeper.

It's important to be aware of the limitations and interpret the results correctly. The method essentially gives you a list of variables (attribute values such as specific countries or product categories) where the target metric changed the most, so you know where to look. But an analyst still has to go through them and correctly assign/credit the contributions based on domain knowledge or further analysis. With the univariate analysis, the contribution is not distributed proportionally among dependent variables based on their true contribution. Because of that, the sum of all the per-attribute contributions will be larger than the total change.

Linear Regression Models

A more advanced approach is regression models, such as linear or logistic regression. The main advantage over the univariate analysis is that they take the already mentioned relationships between the dimensions into account and distribute the total contribution among the drivers, so they are not double counted. In the first example, such a model would be able to determine that the key driver was the phone category and not the US. Regression models can also solve the issue with interactions by including so-called interaction terms in the model.
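
A minimal sketch of that idea (same hypothetical DataFrame as above, with the dimensions one-hot encoded and pairwise interaction terms added via scikit-learn; all names are illustrative):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

def regression_drivers(df: pd.DataFrame, metric: str, dims: list[str]) -> pd.Series:
    """Fit a linear model on one-hot dimensions plus pairwise interactions.

    The fitted coefficients serve as (rough) driver contributions."""
    X = pd.get_dummies(df[dims])  # one-hot encode each dimension
    poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
    Xi = poly.fit_transform(X)    # adds a column for every pairwise interaction
    model = LinearRegression().fit(Xi, df[metric])
    names = poly.get_feature_names_out(X.columns)
    return pd.Series(model.coef_, index=names).sort_values(key=abs, ascending=False)

# Hypothetical usage:
# contributions = regression_drivers(df, metric="revenue", dims=["country", "category"])
# print(contributions.head(10))
```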

A big advantage is also that the resulting drivers are easily interpretable and familiar to business analysts, and the method is built on top of a solid statistical foundation.

On the other hand, with a growing number of dimensions, cardinality, and/or number of interaction terms, the dimensionality blows up quite quickly (quadratically), so the calculation takes long and the results can become noisy. For example, 10 dimensions with 100 values each already yield 1,000 one-hot columns and roughly 500,000 pairwise interaction terms.

Linear regression models also make strong assumptions about the data which, if not met, can lead to incorrect and misleading results. And the quality of the results depends on how well the model is able to fit the data.

Another drawback, in the context of BI software, is that a separate model has to be computed for each time period (for period-over-period change analysis) and each filter combination. This makes it infeasible to precalculate them if there is a large number of possible filter combinations.

Gradient Boosting with SHAP values

Non-linear models, such as gradient boosting or random forests, combined with calculating SHAP values, address most of the problems of the previous two approaches.

First of all, this approach handles all the issues mentioned earlier (double counting, interactions, and mix/composition effects) thanks to multivariate, non-linear models that can capture dependencies between variables, and SHAP values that fairly distribute the total contribution among all the factors. It also doesn't make any assumptions about the underlying data, so it can be used on arbitrary data.

Also, compared to the linear regression models, it can (depending on the underlying model used) handle categorical attributes natively and won't explode in complexity for high-cardinality attributes.

Finally, SHAP values are additive, so they can be calculated once and then aggregated for different levels/attributes (by country, product, etc.) and different filter combinations. And the underlying model can be trained on the whole data set (not just the compared periods), so it can provide both local and global explanations (that is, both drivers in a given period and long-term trend drivers), assuming the model has enough capacity to capture these insights.
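
A minimal sketch of this pipeline (same hypothetical DataFrame, assuming the lightgbm and shap packages; in practice the features, target, and tuning would be domain-specific):

```python
import lightgbm as lgb
import pandas as pd
import shap

def shap_drivers(df: pd.DataFrame, metric: str, dims: list[str]) -> pd.DataFrame:
    """Train one gradient-boosted model on all data, then aggregate additive SHAP values."""
    X = df[dims].astype("category")  # LightGBM handles categoricals natively
    model = lgb.LGBMRegressor().fit(X, df[metric])
    # One SHAP value per row and feature; each row's values sum to
    # (prediction - expected value), which is what makes them additive.
    shap_values = shap.TreeExplainer(model).shap_values(X)
    sv = pd.DataFrame(shap_values, columns=dims, index=df.index)
    rows = []
    for dim in dims:
        # Additivity lets us aggregate contributions for any slice,
        # here per attribute value (per country, per category, ...).
        contrib = sv[dim].groupby(df[dim], observed=True).sum()
        rows += [{"dimension": dim, "value": v, "contribution": c} for v, c in contrib.items()]
    out = pd.DataFrame(rows)
    return out.reindex(out["contribution"].abs().sort_values(ascending=False).index)

# Hypothetical usage:
# drivers = shap_drivers(df, metric="revenue", dims=["country", "category"])
```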

On the other hand, approaches based on non-linear models and SHAP values are quite a black box and difficult to interpret, visualize, and explain. That makes them less transparent and trustworthy.

There are also a lot of knobs that need to be fine-tuned for each specific domain and data set, so it's difficult to make this approach work automatically on any domain or data set without prior knowledge. Usually, some manual feature engineering and parameter tuning is required, although it can be automated to some extent.

The quality of the results depends on how well the underlying model can fit the data, so if all the knobs are not set correctly, it will lead to incorrect and misleading results.

Finally, this method is computationally expensive. On the other hand, it can easily be precomputed and parallelized (unlike the linear regression models), so it can be sped up with more resources. Also, having one model for all time periods, together with the additive nature of SHAP values, makes the model easy to precompute and cache.

Conclusion

We reviewed three paths to Key Driver Analysis:

  • simple attribute-level aggregation,
  • linear regression,
  • non-linear models with SHAP.

For the first release, we chose attribute-level aggregation because it aligns with how analysts reason about data, it's easy to explain, it's fast to compute, and it works across domains without fragile model assumptions. When used thoughtfully, it highlights credible candidates for further investigation instead of pretending to deliver perfect attribution.

To boost the signal and cut the noise, we added two upgrades. First, we detect only statistically significant shifts within each dimension, which limits false positives (a sketch of this idea follows below). Second, we rank and select the most promising business dimensions before running the analysis, which keeps the results focused even in complex environments with many potential drivers.
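
As an illustration of the first upgrade (the exact test we ship isn't covered here; this sketch assumes a simple z-score filter on per-segment changes against the history of past period-over-period deltas):

```python
import numpy as np
import pandas as pd

def significant_shifts(deltas: pd.Series, history: pd.DataFrame,
                       z_threshold: float = 3.0) -> pd.Series:
    """Keep only segments whose current delta deviates from their own history.

    `deltas`  - current period-over-period change per segment (index = segment).
    `history` - past deltas, one row per past period, one column per segment.
    """
    mean = history.mean()
    std = history.std().replace(0, np.nan)  # avoid division by zero for flat segments
    z = (deltas - mean) / std
    return deltas[z.abs() > z_threshold]

# Hypothetical usage:
# shifts = significant_shifts(current_deltas, past_deltas_by_segment)
```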

This approach sets a dependable baseline that teams can trust, even with many filters and frequent updates. It avoids the risk of confident but misleading results that can occur when a generic model doesn't fit a specific dataset. And it creates a clear runway for the future: if a customer needs deeper precision or wants to shorten the path from anomaly to insight, our professional services can deliver a tailored ML solution based on linear or boosted models with SHAP, calibrated to the customer's data and context.

TL;DR: Start simple, build trust, and scale to advanced methods when the value is proven.

Want to learn more?

Stay tuned if you'd like to learn why we weren't the only ones who chose attribute-level aggregation as the default algorithm for KDA: we'll soon release a product-first POV on the matter.
