Contribution Analysis using Index Mutation in REPS

Introduction

The index_mutation option in calculate_hedonic_index() provides a deeper analysis of a price index movement. It shows how much individual observations, or groups of observations, contribute to the index mutation in one selected period.

An index mutation is the period-over-period change in the index. The contribution analysis helps identify which observations or units have the largest influence on that change.

The analysis is activated by setting:

index_mutation = TRUE

When this option is used, calculate_hedonic_index() returns a list with two elements:

  • Index: the regular price index;
  • Index_mutation: the contribution-to-index-mutation table.

How the Method Works

The contribution-to-index-mutation calculation uses a leave-one-out approach.

For a selected period, REPS:

  1. calculates the original index;
  2. removes one observation or one unit group from the selected period;
  3. recalculates the index without that observation or group;
  4. compares the recalculated index with the original index;
  5. repeats this for all observations or groups in the selected period.

The difference between the original index and the recalculated index indicates the contribution of the excluded observation or group.
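
The steps above can be sketched in a few lines of base R. This is a minimal illustration, not the REPS implementation: toy_index() is a hypothetical stand-in that uses a simple ratio of mean prices instead of a hedonic model, and leave_one_out() and the toy data are likewise invented for this example.

```r
# Toy stand-in for the index calculation: mean price per period relative to
# the first period (x 100). This is NOT the hedonic method used by REPS.
toy_index <- function(data) {
  means <- vapply(split(data$price, data$period), mean, numeric(1))
  100 * means / means[[1]]
}

# Hypothetical helper: leave-one-out contributions for one selected period.
leave_one_out <- function(data, period) {
  original <- toy_index(data)[[period]]              # step 1: original index
  rows <- which(data$period == period)
  excl <- vapply(rows, function(i) {
    # steps 2-3: drop one row of the selected period and recalculate
    toy_index(data[-i, , drop = FALSE])[[period]]
  }, numeric(1))
  data.frame(
    row = rows,
    Index_excl_observation = excl,
    Index_original = original,
    Index_difference = original - excl               # step 4: compare
  )
}

# Invented data: two observations in P1, three in P2.
toy <- data.frame(
  period = c("P1", "P1", "P2", "P2", "P2"),
  price  = c(100, 100, 90, 110, 130)
)

loo <- leave_one_out(toy, "P2")                      # step 5: all rows of P2
print(loo)
```

In this toy data the P2 index is 110. Removing the observation priced at 130 lowers it to 100, so that observation has an Index_difference of 10. Removing the observation priced at 90 raises the index to 120, which gives an Index_difference of -10.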


Example Data

The input data are the same as for a regular call to calculate_hedonic_index().

For this vignette, we use a small sample of hedonic_data to keep rendering fast. The index mutation calculation recalculates the index many times, so using the full dataset would make the vignette slower than necessary.

dataset <- hedonic_data
dataset$floor_area <- log(dataset$floor_area)

set.seed(123)

dataset <- dataset |>
  dplyr::group_by(period) |>
  dplyr::slice_sample(n = 25) |>
  dplyr::ungroup()

head(dataset)
#> # A tibble: 6 × 6
#>   period  price floor_area dist_trainstation neighbourhood_code dummy_large_city
#>   <chr>   <int>      <dbl>             <dbl> <chr>                         <int>
#> 1 2008Q1 6.82e5       4.55             0.216 E                                 0
#> 2 2008Q1 8.74e5       4.73             0.999 D                                 1
#> 3 2008Q1 1.69e6       5.04             2.32  D                                 0
#> 4 2008Q1 8.39e5       4.75             2.50  C                                 1
#> 5 2008Q1 5.43e5       4.37             1.39  D                                 0
#> 6 2008Q1 6.40e5       4.51             2.47  A                                 0

Basic Use

The example below calculates a Fisher index and adds contribution-to-index-mutation analysis.

The important arguments are:

  • index_mutation = TRUE, which activates the analysis;
  • index_mutation_period, which selects the period to analyse;
  • unit_variable, which defines whether observations are excluded individually or by group.

result <- calculate_hedonic_index(
  dataset = dataset,
  method = "fisher",
  period_variable = "period",
  dependent_variable = "price",
  numerical_variables = c("floor_area", "dist_trainstation"),
  categorical_variables = c("neighbourhood_code", "dummy_large_city"),
  reference_period = "2015",
  number_of_observations = FALSE,
  index_mutation = TRUE,
  index_mutation_period = "2019Q1",
  unit_variable = "neighbourhood_code"
)

head(result$Index)
#>   period     Index
#> 1 2008Q1 101.27033
#> 2 2008Q2  99.89943
#> 3 2008Q3 105.92769
#> 4 2008Q4 103.22324
#> 5 2009Q1  98.38427
#> 6 2009Q2  99.83063
head(result$Index_mutation)
#>   neighbourhood_code period Index_excl_observation Index_original
#> 1                  B 2019Q1              100.39420       99.34528
#> 2                  D 2019Q1              100.15980       99.34528
#> 3                  A 2019Q1               98.84730       99.34528
#> 4                  C 2019Q1               98.03411       99.34528
#> 5                  E 2019Q1               97.92741       99.34528
#>   Index_difference PoP_excl_observation PoP_original PoP_difference
#> 1       -1.0489226            -4.172429    -5.173639      1.0012103
#> 2       -0.8145198            -4.396169    -5.173639      0.7774698
#> 3        0.4979707            -5.648958    -5.173639     -0.4753195
#> 4        1.3111694            -6.425167    -5.173639     -1.2515283
#> 5        1.4178636            -6.527008    -5.173639     -1.3533693

Selecting the Period: index_mutation_period

The index_mutation_period parameter selects the period for which the contribution analysis is performed.

This matters because the analysis covers one period at a time. The function removes observations or groups only from the selected period; observations in all other periods remain in the dataset during each recalculation.

In the example above, this is done with:

index_mutation_period = "2019Q1"

If index_mutation_period = NULL, REPS automatically selects the latest available period.

Use index_mutation_period when a specific period needs closer inspection, for example when the index movement is unusually large or when the latest published period needs to be explained.
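
One way to find a period that deserves closer inspection is to look for the largest period-over-period movement in the regular index. The sketch below uses only the six index values printed in the Basic Use output, and it assumes that the period-over-period change is a percentage change relative to the previous period. The names index_table and candidate are chosen for this example only.

```r
# First six rows of result$Index, copied from the output shown earlier.
index_table <- data.frame(
  period = c("2008Q1", "2008Q2", "2008Q3", "2008Q4", "2009Q1", "2009Q2"),
  Index  = c(101.27033, 99.89943, 105.92769, 103.22324, 98.38427, 99.83063)
)

# Period-over-period change in percent (assumed definition).
# The first period has no predecessor, so its value is NA.
current  <- index_table$Index[-1]
previous <- head(index_table$Index, -1)
index_table$PoP <- c(NA, 100 * (current / previous - 1))

# Candidate for index_mutation_period: largest absolute movement.
# which.max() ignores the NA in the first row.
candidate <- index_table$period[which.max(abs(index_table$PoP))]
print(index_table)
print(candidate)
```

Among these six periods, 2008Q3 shows the largest movement (an increase of about 6 percent), so it would be a natural value to pass as index_mutation_period.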


Observation-Level and Grouped Contribution

The unit_variable argument controls whether the contribution analysis is performed at observation level or group level.

If unit_variable = NULL, each row in the selected period is removed once:

unit_variable = NULL

If unit_variable is supplied, REPS removes all observations belonging to one group at a time. In the example above, contributions are calculated by neighbourhood:

unit_variable = "neighbourhood_code"

Grouped output is often easier to interpret than observation-level output, because it shows which units, such as neighbourhoods, contributed most to the index mutation.
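
The difference between the two modes comes down to which rows are removed together in each recalculation. The sketch below illustrates this under stated assumptions: exclusion_sets() is a hypothetical helper, not a REPS function, and the data are invented.

```r
# Invented rows of one selected period; only the grouping column matters here.
selected <- data.frame(
  neighbourhood_code = c("A", "B", "A", "C", "B", "A"),
  price = c(200, 310, 220, 150, 290, 210)
)

# Hypothetical helper: returns one set of row numbers per recalculation.
exclusion_sets <- function(data, unit_variable = NULL) {
  rows <- seq_len(nrow(data))
  if (is.null(unit_variable)) {
    as.list(rows)                        # observation level: one row at a time
  } else {
    split(rows, data[[unit_variable]])   # group level: all rows of one group
  }
}

by_row   <- exclusion_sets(selected)
by_group <- exclusion_sets(selected, "neighbourhood_code")

length(by_row)    # number of recalculations at observation level
length(by_group)  # number of recalculations at group level
by_group$A        # rows removed together for neighbourhood A
```

With six rows in three neighbourhoods, observation-level analysis needs six recalculations and grouped analysis needs three. Grouping therefore also reduces the number of times the index has to be recalculated.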


Interpreting the Output

The Index_mutation table contains the following main columns:

  • Index_excl_observation: index value after excluding the observation or group;
  • Index_original: original index value;
  • Index_difference: difference between the original index and the recalculated index;
  • PoP_excl_observation: period-over-period growth after exclusion;
  • PoP_original: original period-over-period growth;
  • PoP_difference: difference between the recalculated and the original period-over-period growth (PoP_excl_observation - PoP_original).

The main column for identifying influence on the index level is:

Index_difference = Index_original - Index_excl_observation

A positive Index_difference means the excluded observation or group increased the original index. A negative Index_difference means it lowered the original index.

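For analysing the period-over-period mutation, inspect PoP_difference. Because it is calculated as recalculated minus original, its sign is the reverse of Index_difference: a negative PoP_difference means that growth is lower without the observation or group, so that observation or group pushed the original mutation upward.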
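
The column definitions can be checked against the output printed in the Basic Use section. The sketch below re-enters those printed values by hand (the name mutation_check is chosen for this example) and recomputes both difference columns. Small deviations are expected because the printed values are rounded.

```r
# Values copied from the Index_mutation output shown earlier.
mutation_check <- data.frame(
  neighbourhood_code     = c("B", "D", "A", "C", "E"),
  Index_excl_observation = c(100.39420, 100.15980, 98.84730, 98.03411, 97.92741),
  Index_original         = 99.34528,
  Index_difference       = c(-1.0489226, -0.8145198, 0.4979707, 1.3111694, 1.4178636),
  PoP_excl_observation   = c(-4.172429, -4.396169, -5.648958, -6.425167, -6.527008),
  PoP_original           = -5.173639,
  PoP_difference         = c(1.0012103, 0.7774698, -0.4753195, -1.2515283, -1.3533693)
)

# Recompute both differences from the other columns.
index_diff <- with(mutation_check, Index_original - Index_excl_observation)
pop_diff   <- with(mutation_check, PoP_excl_observation - PoP_original)

# Both recomputed columns match the printed ones up to rounding.
index_ok <- max(abs(index_diff - mutation_check$Index_difference)) < 1e-4
pop_ok   <- max(abs(pop_diff - mutation_check$PoP_difference)) < 1e-4

# The two difference columns carry opposite signs in every row.
opposite_signs <- all(
  sign(mutation_check$Index_difference) == -sign(mutation_check$PoP_difference)
)

c(index_ok = index_ok, pop_ok = pop_ok, opposite_signs = opposite_signs)
```

All three checks return TRUE for the printed values. Neighbourhood E, for example, has a positive Index_difference and a negative PoP_difference: with E included, both the index level and the period-over-period growth are higher than without it.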

mutation_table <- result$Index_mutation

# Units that raised the index level most (largest positive Index_difference first)
head(mutation_table[order(-mutation_table$Index_difference), ])
#>   neighbourhood_code period Index_excl_observation Index_original
#> 5                  E 2019Q1               97.92741       99.34528
#> 4                  C 2019Q1               98.03411       99.34528
#> 3                  A 2019Q1               98.84730       99.34528
#> 2                  D 2019Q1              100.15980       99.34528
#> 1                  B 2019Q1              100.39420       99.34528
#>   Index_difference PoP_excl_observation PoP_original PoP_difference
#> 5        1.4178636            -6.527008    -5.173639     -1.3533693
#> 4        1.3111694            -6.425167    -5.173639     -1.2515283
#> 3        0.4979707            -5.648958    -5.173639     -0.4753195
#> 2       -0.8145198            -4.396169    -5.173639      0.7774698
#> 1       -1.0489226            -4.172429    -5.173639      1.0012103
# Units that lowered the index level most (most negative Index_difference first)
head(mutation_table[order(mutation_table$Index_difference), ])
#>   neighbourhood_code period Index_excl_observation Index_original
#> 1                  B 2019Q1              100.39420       99.34528
#> 2                  D 2019Q1              100.15980       99.34528
#> 3                  A 2019Q1               98.84730       99.34528
#> 4                  C 2019Q1               98.03411       99.34528
#> 5                  E 2019Q1               97.92741       99.34528
#>   Index_difference PoP_excl_observation PoP_original PoP_difference
#> 1       -1.0489226            -4.172429    -5.173639      1.0012103
#> 2       -0.8145198            -4.396169    -5.173639      0.7774698
#> 3        0.4979707            -5.648958    -5.173639     -0.4753195
#> 4        1.3111694            -6.425167    -5.173639     -1.2515283
#> 5        1.4178636            -6.527008    -5.173639     -1.3533693

The results should be read as a sensitivity analysis: they show what happens to the selected-period index movement when a specific observation or group is removed.
