Statistical methodology for property value trend experiments

Trends experiments for property values use Bayesian statistics with a log-normal model and Normal-Inverse-Gamma prior to evaluate the win probabilities and credible intervals for an experiment. Read the statistics primer for an overview if you haven't already.

What the heck is a log-normal model with Normal-Inverse-Gamma prior?

The log-normal model is great for analyzing metrics like revenue or other property values that are always positive and often have a "long tail" of high values.

Imagine you're looking at daily revenue from your customers:

  • Most customers might spend $20-100.
  • Some customers spend $200-500.
  • A few customers spend $1000+.

This creates what we call a "right-skewed" distribution - lots of smaller values, with a long tail stretching to the right. This is where the log-normal model shines:

  • When we take the logarithm of these values, they follow a nice bell curve (normal distribution).
  • This makes it much easier to analyze the data mathematically.
  • We can transform back to regular dollars for our final results.
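
To make this concrete, here's a small sketch using synthetic data (not PostHog's implementation or your actual revenue numbers) showing how a right-skewed revenue sample becomes roughly symmetric once you take logarithms:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated "revenue per user": most users spend little, a few spend a lot.
# A log-normal generator is a convenient stand-in for this long-tailed shape.
revenue = rng.lognormal(mean=np.log(50), sigma=0.9, size=10_000)

# Right-skew shows up as the mean sitting well above the median.
print(f"raw revenue: mean={revenue.mean():.1f}, median={np.median(revenue):.1f}")

# After the log transform, mean and median are close and the histogram
# is roughly bell-shaped, which is what the model works with.
log_revenue = np.log(revenue)
print(f"log revenue: mean={log_revenue.mean():.2f}, median={np.median(log_revenue):.2f}")
```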

The "Normal-Inverse-Gamma prior" part helps us handle uncertainty:

  • When we have very little data, it keeps our estimates reasonable.
  • As we collect more data, it lets the actual data drive our conclusions.
  • It accounts for uncertainty in both the average value AND how spread out the values are.
  • We use a fixed log-space variance (LOG_VARIANCE = 0.75) based on typical patterns in revenue data.

For example:

  • Day 1: 5 customers spend an average of $50, but we're very uncertain about whether this represents the true average spending.
  • Day 30: 500 customers spend an average of $50, and we're much more confident about this average value.

Our model uses minimally informative priors (MU_0 = 0.0, KAPPA_0 = 1.0, ALPHA_0 = 1.0, BETA_0 = 1.0) to let the data speak for itself while maintaining mathematical stability.
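
To show how these priors combine with data, here's a sketch of the standard Normal-Inverse-Gamma conjugate update applied to log-transformed values. The prior constants come from this page, but the helper function and the simulated data are illustrative assumptions rather than PostHog's exact code:

```python
import numpy as np

# Prior constants listed above. The update below is the standard
# Normal-Inverse-Gamma conjugate update, shown as an illustrative sketch.
MU_0, KAPPA_0, ALPHA_0, BETA_0 = 0.0, 1.0, 1.0, 1.0

def nig_posterior(log_values):
    """Update the Normal-Inverse-Gamma prior with log-transformed values."""
    n = len(log_values)
    mean = log_values.mean()
    sum_sq = ((log_values - mean) ** 2).sum()

    kappa_n = KAPPA_0 + n
    mu_n = (KAPPA_0 * MU_0 + n * mean) / kappa_n
    alpha_n = ALPHA_0 + n / 2
    beta_n = BETA_0 + 0.5 * sum_sq + (KAPPA_0 * n * (mean - MU_0) ** 2) / (2 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n

# Day 1 vs Day 30 from the example above: same ~$50 average, very different certainty.
rng = np.random.default_rng(0)
day_1 = np.log(rng.lognormal(np.log(50), 0.9, size=5))
day_30 = np.log(rng.lognormal(np.log(50), 0.9, size=500))
print(nig_posterior(day_1))   # small kappa_n -> wide uncertainty about the mean
print(nig_posterior(day_30))  # large kappa_n -> the data dominates the prior
```

The key thing to notice is that kappa_n grows with the number of observations, so the Day 30 posterior is far more concentrated around the observed average than the Day 1 posterior.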

Win probabilities

The win probability tells you how likely it is that a given variant has the highest value compared to all other variants in the experiment. It helps you determine whether the difference you're seeing reflects a real effect or is simply due to random chance.

Let's say you're testing a new pricing page and have these results:

  • Control: $50 average revenue per user (500 users)
  • Test: $60 average revenue per user (500 users)

To calculate the win probabilities for the experiment, our methodology:

  1. Models each variant's value using a log-normal distribution (which works well for metrics like revenue that are always positive and often right-skewed):

    • We transform the data to log-space where it follows a normal distribution.
    • We use a Normal-Inverse-Gamma prior to handle uncertainty about both the mean and variance.
  2. Takes 10,000 random samples from each variant's posterior distribution.

  3. Checks which variant had the higher value for each sample.

  4. Calculates the final win probabilities:

    • Control wins in 50 out of 10,000 samples = 0.5% probability.
    • Test wins in 9,950 out of 10,000 samples = 99.5% probability.

These results tell us we can be 99.5% confident that the test variant performs better than the control.
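
The sampling step can be sketched in a few lines. The posterior parameters below are hypothetical placeholders rather than real experiment output, and using exp(mu + sigma^2 / 2) as each variant's mean value is an assumption of this sketch:

```python
import numpy as np

def posterior_mean_samples(mu_n, kappa_n, alpha_n, beta_n, n_samples=10_000, seed=1):
    """Draw samples of a variant's mean value from the Normal-Inverse-Gamma posterior."""
    rng = np.random.default_rng(seed)
    # sigma^2 ~ Inverse-Gamma(alpha_n, beta_n): invert a Gamma draw.
    sigma_sq = 1.0 / rng.gamma(shape=alpha_n, scale=1.0 / beta_n, size=n_samples)
    # mu | sigma^2 ~ Normal(mu_n, sigma^2 / kappa_n), all in log-space.
    mu = rng.normal(loc=mu_n, scale=np.sqrt(sigma_sq / kappa_n))
    # Map back to dollars via the log-normal mean (an assumption of this sketch).
    return np.exp(mu + sigma_sq / 2)

# Hypothetical posterior parameters for control and test (not real experiment output).
control = posterior_mean_samples(mu_n=3.9, kappa_n=501, alpha_n=251, beta_n=210, seed=1)
test = posterior_mean_samples(mu_n=4.1, kappa_n=501, alpha_n=251, beta_n=215, seed=2)

test_win_probability = (test > control).mean()
print(f"Test win probability: {test_win_probability:.1%}")
print(f"Control win probability: {1 - test_win_probability:.1%}")
```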

Credible intervals

A credible interval tells you the range where the true value lies with 95% probability. Unlike traditional confidence intervals, credible intervals give you a direct probability statement about the metric value.

For example, if you have these results:

  • Control: $50 average revenue per user (500 users)
  • Test: $60 average revenue per user (500 users)

To calculate the credible intervals for the experiment, our methodology will:

  1. Transform the data to log-space and model each variant using a t-distribution:

    • We use log transformation because metrics like revenue are often right-skewed.
    • The t-distribution parameters come from our Normal-Inverse-Gamma model.
    • This handles uncertainty about both the mean and variance.
  2. Find the 2.5th and 97.5th percentiles of each distribution:

    • Control: [45.98, 53.53] = "You can be 95% confident the true average revenue is between $45.98 and $53.53"
    • Test: [55.15, 64.22] = "You can be 95% confident the true average revenue is between $55.15 and $64.22"

Since these intervals don't overlap, you can be quite confident that the test variant performs better than the control. The intervals will become narrower as you collect more data, reflecting your increasing certainty about the true values.
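
For reference, here's a sketch of how such an interval can be computed from the Student-t marginal of the Normal-Inverse-Gamma posterior and mapped back to dollars. The posterior parameters are hypothetical placeholders, and exponentiating the interval endpoints is an assumption of this sketch:

```python
import numpy as np
from scipy import stats

def credible_interval(mu_n, kappa_n, alpha_n, beta_n, level=0.95):
    """Credible interval for a variant's value, back-transformed from log-space."""
    # The marginal posterior of the log-space mean is Student-t with
    # 2 * alpha_n degrees of freedom, centred at mu_n.
    df = 2 * alpha_n
    scale = np.sqrt(beta_n / (alpha_n * kappa_n))
    tail = (1 - level) / 2
    lower = stats.t.ppf(tail, df, loc=mu_n, scale=scale)
    upper = stats.t.ppf(1 - tail, df, loc=mu_n, scale=scale)
    # Exponentiate to go from log-space back to dollars.
    return np.exp(lower), np.exp(upper)

# Hypothetical posterior parameters for control and test.
print(credible_interval(mu_n=3.9, kappa_n=501, alpha_n=251, beta_n=210))
print(credible_interval(mu_n=4.1, kappa_n=501, alpha_n=251, beta_n=215))
```

As the sample size grows, kappa_n and alpha_n increase, the t-distribution's scale shrinks, and the interval narrows, matching the behaviour described above.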
