---
title: "Contribution Analysis using Index Mutation in REPS"
author: ""
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Contribution Analysis using Index Mutation in REPS}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: ./REFERENCES.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include=FALSE}
library(REPS)
data("hedonic_data")
```

## Introduction

The `index_mutation` option in `calculate_hedonic_index()` provides a deeper analysis of a price index movement. It shows how much observations or groups of observations contribute to the index mutation in one selected period.

An index mutation is the period-over-period change in the index. The contribution analysis helps identify which observations or units have the largest influence on that change.

The analysis is activated by setting:

```r
index_mutation = TRUE
```

When this option is used, `calculate_hedonic_index()` returns a list with two elements:

- `Index`: the regular price index;
- `Index_mutation`: the contribution-to-index-mutation table.

---

## How the Method Works

The contribution-to-index-mutation calculation uses a leave-one-out approach.

For a selected period, REPS:

1. calculates the original index;
2. removes one observation or one unit group from the selected period;
3. recalculates the index without that observation or group;
4. compares the recalculated index with the original index;
5. repeats this for all observations or groups in the selected period.

The difference between the original index and the recalculated index indicates the contribution of the excluded observation or group.

---

## Example Data

The input data are the same as for a regular call to `calculate_hedonic_index()`.

For this vignette, we use a small sample of `hedonic_data` to keep rendering fast.
The index mutation calculation recalculates the index many times, so using the full dataset would make the vignette slower than necessary.

```{r}
dataset <- hedonic_data
dataset$floor_area <- log(dataset$floor_area)

set.seed(123)

dataset <- dataset |>
  dplyr::group_by(period) |>
  dplyr::slice_sample(n = 25) |>
  dplyr::ungroup()

head(dataset)
```

---

## Basic Use

The example below calculates a Fisher index and adds contribution-to-index-mutation analysis.

The important arguments are:

- `index_mutation = TRUE`, which activates the analysis;
- `index_mutation_period`, which selects the period to analyse;
- `unit_variable`, which defines whether observations are excluded individually or by group.

```{r}
result <- calculate_hedonic_index(
  dataset = dataset,
  method = "fisher",
  period_variable = "period",
  dependent_variable = "price",
  numerical_variables = c("floor_area", "dist_trainstation"),
  categorical_variables = c("neighbourhood_code", "dummy_large_city"),
  reference_period = "2015",
  number_of_observations = FALSE,
  index_mutation = TRUE,
  index_mutation_period = "2019Q1",
  unit_variable = "neighbourhood_code"
)

head(result$Index)
head(result$Index_mutation)
```

---

## Selecting the Period: `index_mutation_period`

The `index_mutation_period` parameter selects the period for which the contribution analysis is performed.

This matters because index mutation analysis is calculated for **one period at a time**. The function removes observations or groups only from the selected period. Observations in other periods remain in the dataset during each recalculation.

In the example above, this is done with:

```r
index_mutation_period = "2019Q1"
```

If `index_mutation_period = NULL`, REPS automatically selects the latest available period.

Use `index_mutation_period` when a specific period needs closer inspection, for example when the index movement is unusually large or when the latest published period needs to be explained.
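
To make the "one period at a time" behaviour concrete, here is a minimal base R sketch of a leave-one-out loop restricted to a single period. It is an illustration under assumptions, not the REPS implementation: the data are made up, `toy_index()` is a hypothetical stand-in (mean price relative to a base period) rather than a hedonic index, and the fallback to the latest period only mimics the `NULL` default described above.

```r
# Toy data: three periods with three made-up prices each.
toy <- data.frame(
  period = rep(c("2019Q1", "2019Q2", "2019Q3"), each = 3),
  price  = c(100, 110, 120, 105, 115, 125, 110, 120, 190)
)

# Hypothetical stand-in for an index: mean price relative to a base period.
# This is NOT the hedonic calculation used by REPS.
toy_index <- function(data, period, base = "2019Q1") {
  100 * mean(data$price[data$period == period]) /
    mean(data$price[data$period == base])
}

# Mimic the described default: use the latest period when none is given.
selected_period <- NULL
if (is.null(selected_period)) {
  selected_period <- max(toy$period)
}

original <- toy_index(toy, selected_period)

# Remove one row of the selected period at a time.
# Rows belonging to other periods are never removed.
rows <- which(toy$period == selected_period)
excluded <- vapply(
  rows,
  function(i) toy_index(toy[-i, ], selected_period),
  numeric(1)
)

# Column names follow the Index_mutation table for readability only.
toy_mutation <- data.frame(
  row = rows,
  Index_excl_observation = excluded,
  Index_original = original,
  Index_difference = original - excluded
)
toy_mutation
```

In this toy setup the row with the price of 190 receives the largest positive `Index_difference`, because removing it lowers the recalculated index for the selected period the most.
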

---

## Observation-Level and Grouped Contribution

The `unit_variable` argument controls whether the contribution analysis is performed at observation level or group level.

If `unit_variable = NULL`, each row in the selected period is removed once:

```r
unit_variable = NULL
```

If `unit_variable` is supplied, REPS removes all observations belonging to one group at a time. In the example above, contributions are calculated by neighbourhood:

```r
unit_variable = "neighbourhood_code"
```

Grouped output is often easier to interpret than observation-level output, because it shows which units, such as neighbourhoods, contributed most to the index mutation.

---

## Interpreting the Output

The `Index_mutation` table contains the following main columns:

- `Index_excl_observation`: index value after excluding the observation or group;
- `Index_original`: original index value;
- `Index_difference`: difference between the original index and the recalculated index;
- `PoP_excl_observation`: period-over-period growth after exclusion;
- `PoP_original`: original period-over-period growth;
- `PoP_difference`: difference between the recalculated and original period-over-period growth.

The main column for identifying influence on the index level is:

```r
Index_difference = Index_original - Index_excl_observation
```

A positive `Index_difference` means the excluded observation or group increased the original index. A negative `Index_difference` means it lowered the original index.

For analysing the period-over-period mutation, inspect `PoP_difference`.

```{r}
mutation_table <- result$Index_mutation

head(mutation_table[order(-mutation_table$Index_difference), ])
head(mutation_table[order(mutation_table$Index_difference), ])
```

The results should be read as a sensitivity analysis: they show what happens to the selected-period index movement when a specific observation or group is removed.

## References