---
title: "Contribution Analysis using Index Mutation in REPS"
author: ""
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Contribution Analysis using Index Mutation in REPS}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: ./REFERENCES.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include=FALSE}
library(REPS)
data("hedonic_data")
```

## Introduction

The `index_mutation` option in `calculate_hedonic_index()` provides a more detailed analysis of a movement in a price index. It shows how much individual observations, or groups of observations, contribute to the index mutation in one selected period.

An index mutation is the period-over-period change in the index. The contribution analysis helps identify which observations or units have the largest influence on that change.
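As a minimal sketch, a mutation can be computed from an index series as the relative change between consecutive periods. The index values below are made up for illustration. Expressing the mutation as a percentage is an assumption about how it is reported, not a description of REPS output.

```r
# Hypothetical index values for three consecutive quarters (illustrative only)
index <- c("2018Q3" = 104.0, "2018Q4" = 106.0, "2019Q1" = 108.12)

# Period-over-period mutation in percent: (I_t / I_{t-1} - 1) * 100
pop <- (index[-1] / index[-length(index)] - 1) * 100
round(pop, 2)
```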

The analysis is activated by setting:

```r
index_mutation = TRUE
```

When this option is used, `calculate_hedonic_index()` returns a list with two elements:

- `Index`: the regular price index;
- `Index_mutation`: the contribution-to-index-mutation table.

---

## How the Method Works

The contribution-to-index-mutation calculation uses a leave-one-out approach.

For a selected period, REPS:

1. calculates the original index;
2. removes one observation or one unit group from the selected period;
3. recalculates the index without that observation or group;
4. compares the recalculated index with the original index;
5. repeats this for all observations or groups in the selected period.

The difference between the original index and the recalculated index indicates the contribution of the excluded observation or group.
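The steps above can be sketched in base R with a deliberately simplified index. The function `toy_index()` and the data are hypothetical: instead of the hedonic index that REPS estimates, `toy_index()` uses a ratio of geometric mean prices, which is enough to show the leave-one-out loop and how `Index_difference` is formed.

```r
# Toy data: prices in a base period and in the selected period (illustrative)
prices <- data.frame(
  period = c("base", "base", "base", "sel", "sel", "sel"),
  id     = c("a", "b", "c", "d", "e", "f"),
  price  = c(200, 250, 300, 210, 260, 400)
)

# Simplified stand-in for an index: ratio of geometric mean prices, base = 100.
# This is NOT the hedonic index REPS calculates; it only illustrates the loop.
toy_index <- function(df, selected = "sel") {
  base <- df$price[df$period == "base"]
  cur  <- df$price[df$period == selected]
  100 * exp(mean(log(cur)) - mean(log(base)))
}

original <- toy_index(prices)

# Leave-one-out: drop each observation of the selected period in turn
sel_rows <- which(prices$period == "sel")
contrib <- data.frame(
  id = prices$id[sel_rows],
  Index_excl_observation = sapply(sel_rows, function(i) toy_index(prices[-i, ]))
)
contrib$Index_original <- original
contrib$Index_difference <- contrib$Index_original - contrib$Index_excl_observation
contrib
```

In REPS each recalculation is the full hedonic index calculation, which is why the analysis becomes slow on large datasets.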

---

## Example Data

The input data have the same format as for a regular call to `calculate_hedonic_index()`.

For this vignette, we use a small sample of `hedonic_data` to keep rendering fast. The index mutation calculation recalculates the index once for every observation or group in the selected period, so using the full dataset would make the vignette slower to build than necessary.

```{r}
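dataset <- hedonic_data
# Use the logarithm of floor area as a numerical characteristic
dataset$floor_area <- log(dataset$floor_area)

# Make the random sample reproducible
set.seed(123)

# Keep at most 25 randomly sampled observations per period
dataset <- dataset |>
  dplyr::group_by(period) |>
  dplyr::slice_sample(n = 25) |>
  dplyr::ungroup()

head(dataset)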
```

---

## Basic Use

The example below calculates a Fisher index and adds contribution-to-index-mutation analysis.

The important arguments are:

- `index_mutation = TRUE`, which activates the analysis;
- `index_mutation_period`, which selects the period to analyse;
- `unit_variable`, which defines whether observations are excluded individually or by group.

```{r}
result <- calculate_hedonic_index(
  dataset = dataset,
  method = "fisher",
  period_variable = "period",
  dependent_variable = "price",
  numerical_variables = c("floor_area", "dist_trainstation"),
  categorical_variables = c("neighbourhood_code", "dummy_large_city"),
  reference_period = "2015",
  number_of_observations = FALSE,
  index_mutation = TRUE,               # activate the contribution analysis
  index_mutation_period = "2019Q1",    # period whose mutation is analysed
  unit_variable = "neighbourhood_code" # exclude one neighbourhood at a time
)

head(result$Index)
head(result$Index_mutation)
```

---

## Selecting the Period: `index_mutation_period`

The `index_mutation_period` parameter selects the period for which the contribution analysis is performed.

This matters because index mutation analysis is calculated for **one period at a time**. The function removes observations or groups only from the selected period. Observations in other periods remain in the dataset during each recalculation.
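The sketch below shows, in base R, what excluding one group from the selected period only means for the data used in a recalculation. The toy data frame and the names `selected_period` and `excluded_group` are illustrative; this is not the REPS implementation.

```r
# Toy data with two periods and a grouping variable (illustrative only)
df <- data.frame(
  period = c("2018Q4", "2018Q4", "2019Q1", "2019Q1", "2019Q1"),
  neighbourhood_code = c("N1", "N2", "N1", "N1", "N2"),
  price = c(100, 120, 105, 110, 125)
)

selected_period <- "2019Q1"
excluded_group  <- "N1"

# Drop the group only within the selected period; other periods keep it
keep <- !(df$period == selected_period & df$neighbourhood_code == excluded_group)
df_excl <- df[keep, ]
df_excl
```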

In the Basic Use example, the period is selected with:

```r
index_mutation_period = "2019Q1"
```

If `index_mutation_period = NULL`, REPS automatically selects the latest available period.
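A minimal sketch of what "latest available period" can mean, assuming period labels that sort chronologically as text, such as `"2018Q4"` and `"2019Q1"`. The exact rule REPS uses internally may differ.

```r
periods <- c("2019Q1", "2018Q3", "2018Q4", "2019Q1", "2018Q3")

# Quarter labels of the form "YYYYQn" sort chronologically as strings
latest_period <- tail(sort(unique(periods)), 1)
latest_period
```

This text-based shortcut would not work for labels such as `"2019M2"` and `"2019M10"`, which do not sort chronologically as strings.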

Use `index_mutation_period` when a specific period needs closer inspection, for example when the index movement is unusually large or when the latest published period needs to be explained.

---

## Observation-Level and Grouped Contribution

The `unit_variable` argument controls whether the contribution analysis is performed at observation level or group level.

If `unit_variable = NULL`, each row in the selected period is removed in turn, one observation per recalculation:

```r
unit_variable = NULL
```

If `unit_variable` is supplied, REPS removes all observations belonging to a group together, one group at a time. In the example above, contributions are calculated by neighbourhood:

```r
unit_variable = "neighbourhood_code"
```

Grouped output is often easier to interpret than observation-level output, because it shows which units, such as neighbourhoods, contributed most to the index mutation.
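The difference between the two modes comes down to which rows are removed together in each recalculation. The helper `exclusion_sets()` below is hypothetical and only illustrates the idea; it also shows why a grouped analysis usually needs fewer recalculations.

```r
# Toy data for the selected period (illustrative only)
sel <- data.frame(
  id = 1:5,
  neighbourhood_code = c("N1", "N2", "N1", "N3", "N2")
)

# Build the sets of row indices that are removed together in one recalculation
exclusion_sets <- function(data, unit_variable = NULL) {
  if (is.null(unit_variable)) {
    # Observation level: every row is its own exclusion set
    as.list(seq_len(nrow(data)))
  } else {
    # Group level: all rows sharing a value of unit_variable are removed together
    split(seq_len(nrow(data)), data[[unit_variable]])
  }
}

length(exclusion_sets(sel))                        # 5 recalculations
length(exclusion_sets(sel, "neighbourhood_code"))  # 3 recalculations
```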

---

## Interpreting the Output

The `Index_mutation` table contains the following main columns:

- `Index_excl_observation`: index value after excluding the observation or group;
- `Index_original`: original index value;
- `Index_difference`: difference between the original index and the recalculated index;
- `PoP_excl_observation`: period-over-period growth after exclusion;
- `PoP_original`: original period-over-period growth;
- `PoP_difference`: difference between the recalculated and original period-over-period growth.

The main column for identifying influence on the index level is:

```r
Index_difference = Index_original - Index_excl_observation
```

A positive `Index_difference` means the excluded observation or group raised the original index: without it, the index would have been lower. A negative `Index_difference` means it lowered the original index.
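A small worked example of this sign convention, using made-up values with the column names listed above, shows how `Index_difference` is derived and read.

```r
# Hypothetical values for three neighbourhoods (illustrative only)
mut <- data.frame(
  neighbourhood_code = c("N1", "N2", "N3"),
  Index_original = 112.0,
  Index_excl_observation = c(110.5, 112.8, 112.1)
)

mut$Index_difference <- mut$Index_original - mut$Index_excl_observation

# Positive: the group pushed the index up; negative: it pulled the index down
mut$effect <- ifelse(mut$Index_difference > 0, "raised index",
                     ifelse(mut$Index_difference < 0, "lowered index", "no effect"))
mut
```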

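To analyse the period-over-period mutation itself, inspect `PoP_difference`. As listed above, `PoP_difference` is the recalculated minus the original growth, the reverse order of `Index_difference`.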

```{r}
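mutation_table <- result$Index_mutation

# Largest positive contributions: units that pushed the index up most
head(mutation_table[order(-mutation_table$Index_difference), ])
# Largest negative contributions: units that pulled the index down most
head(mutation_table[order(mutation_table$Index_difference), ])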
```
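Sorting on `Index_difference` separates upward and downward contributions. To rank units by the size of their influence regardless of direction, one option is to sort on the absolute value. The sketch uses a small made-up table; the same `order(-abs(...))` expression can be applied to the `Index_difference` column of `result$Index_mutation`.

```r
# Made-up contribution table (illustrative only)
mutation_example <- data.frame(
  neighbourhood_code = c("N1", "N2", "N3", "N4"),
  Index_difference = c(0.40, -1.10, 0.05, 0.70)
)

# Rank by magnitude of influence, regardless of sign
ranked <- mutation_example[order(-abs(mutation_example$Index_difference)), ]
head(ranked, 3)
```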

The results should be read as a sensitivity analysis: they show what happens to the selected-period index movement when a specific observation or group is removed.
