Getting Started with Price Index Calculation using REPS

Introduction

The calculate_hedonic_index() function is the central entry point in REPS for computing price indices using various hedonic-based methods. It supports seven commonly used approaches:

  • Laspeyres - hedonic double imputation base-weighted index
  • Paasche - hedonic double imputation current-weighted index
  • Fisher - geometric average of Laspeyres and Paasche (both hedonic double imputation)
  • HMTS - Hedonic Multilateral Time Series re-estimation Splicing
  • Time Dummy - single-regression hedonic index using time dummies
  • Rolling Time Dummy - chained index based on overlapping time-dummy regressions
  • Repricing - quasi-repeat-sales method comparing observed vs predicted price changes
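To make the relationship between the first three methods concrete, the sketch below takes the geometric mean of two index series. The Laspeyres and Paasche values are invented for illustration and are not produced by REPS.

```r
# Hypothetical Laspeyres and Paasche index values for three periods
laspeyres <- c(100, 102.0, 104.5)
paasche   <- c(100, 101.2, 103.1)

# Fisher index: geometric mean of the two series, period by period
fisher <- sqrt(laspeyres * paasche)
print(round(fisher, 2))
```

By construction, the Fisher value in each period lies between the Laspeyres and Paasche values.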

This vignette demonstrates how to apply each method using a consistent interface, making it easy to compare results across approaches.

The HMTS method implemented in REPS is a multilateral, time-series-based index that balances stability, limited revision, and early detection of turning points in the context of property price indices (Ishaak et al. 2024).

For broader context and international guidelines on compiling property price indices, including traditional methods such as the hedonic double imputation Laspeyres, Paasche and Fisher indices and Repricing, see Eurostat’s Handbook on Residential Property Price Indices (RPPIs) (Eurostat 2013). For the Time Dummy and Rolling Time Dummy methods, see Hill et al. (2018, 2022).


Required Data

Before running any calculations, ensure that your dataset is available and contains the necessary variables:

# Example dataset (you should already have this loaded)
head(hedonic_data)
#>   period   price floor_area dist_trainstation neighbourhood_code
#> 1 2008Q1 1142226  127.41917       2.887992985                  E
#> 2 2008Q1  667664   88.70604       2.903955192                  D
#> 3 2008Q1  636207  107.26257       8.250659447                  B
#> 4 2008Q1  777841  112.65725       0.005760792                  E
#> 5 2008Q1  795527  108.08537       1.842145127                  E
#> 6 2008Q1  539206   97.87751       6.375981360                  D
#>   dummy_large_city
#> 1                0
#> 2                1
#> 3                1
#> 4                0
#> 5                0
#> 6                1

The columns of the dataset are passed to the function through the following arguments:

  • period_variable: the time period
  • dependent_variable: usually price
  • numerical_variables: e.g., floor_area
  • categorical_variables: e.g., neighbourhood_code
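Before calling the function, it can help to confirm that the expected columns are present. The following is a minimal base R sketch of such a check; toy_data is a small invented stand-in for hedonic_data, and the check itself is not part of REPS.

```r
# Small stand-in for hedonic_data with the same column names (values are invented)
toy_data <- data.frame(
  period = c("2008Q1", "2008Q1", "2008Q2"),
  price = c(500000, 650000, 700000),
  floor_area = c(90, 110, 120),
  dist_trainstation = c(2.5, 1.0, 4.2),
  neighbourhood_code = c("A", "B", "A"),
  dummy_large_city = c(0, 1, 1)
)

# Columns that will be referenced in the function arguments
required_columns <- c("period", "price", "floor_area", "dist_trainstation",
                      "neighbourhood_code", "dummy_large_city")

# Stop early with a readable message if any column is absent
missing_columns <- setdiff(required_columns, names(toy_data))
if (length(missing_columns) > 0) {
  stop("Missing columns: ", paste(missing_columns, collapse = ", "))
}

# Variables that will be log-transformed must be strictly positive
stopifnot(all(toy_data$price > 0), all(toy_data$floor_area > 0))
```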

Some numerical variables benefit from a log transformation. For example, floor_area is often log-transformed so that its relationship with price is closer to linear, the residual variance is more stable (homoscedastic), and extreme values have less influence on the fit.

Example of log-transforming floor_area:

dataset <- hedonic_data
dataset$floor_area <- log(dataset$floor_area)
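To illustrate why a log transformation can linearize a relationship, the sketch below simulates prices that follow a power law in floor area and fits a regression on the logged values. The parameters are invented, and logging both sides here is purely for illustration; it says nothing about how REPS treats the dependent variable internally.

```r
# Simulated data: price follows a power law in floor area (invented parameters)
floor_area <- seq(50, 200, by = 10)
price <- 5000 * floor_area^0.8

# After taking logs of both sides the relationship is exactly linear:
# log(price) = log(5000) + 0.8 * log(floor_area)
fit <- lm(log(price) ~ log(floor_area))
print(round(coef(fit), 4))
```

The slope of the fitted line recovers the exponent of the power law, which is why coefficients on logged variables are often read as elasticities.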

Using calculate_hedonic_index()

The calculate_hedonic_index() function provides a unified interface for estimating hedonic price indices. You only need to specify the method via the method argument; the function handles the rest.

Example: Single Index Method - Time Dummy

Tbl_TD <- REPS::calculate_hedonic_index(
  dataset = dataset,
  method = "timedummy",
  period_variable = "period",
  dependent_variable = "price",
  numerical_variables = c("floor_area", "dist_trainstation"),
  categorical_variables = c("dummy_large_city", "neighbourhood_code"),
  reference_period = "2015",
  number_of_observations = FALSE
)

head(Tbl_TD)
#>   period    Index
#> 1 2008Q1 99.61795
#> 2 2008Q2 98.23230
#> 3 2008Q3 99.21145
#> 4 2008Q4 98.57565
#> 5 2009Q1 98.70791
#> 6 2009Q2 98.33412
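For readers new to the time dummy approach, the following base R sketch shows the general idea on simulated data: one pooled regression of log price on characteristics plus period dummies, with the exponentiated dummy coefficients forming the index. All data and parameters are invented, and the REPS implementation may differ in detail (for example in how the reference period is handled).

```r
set.seed(1)
periods <- rep(c("2008Q1", "2008Q2", "2008Q3"), each = 50)
log_area <- log(runif(150, 60, 180))

# True log price levels per period (invented): 0, +2%, +5% relative to 2008Q1
period_effect <- c("2008Q1" = 0, "2008Q2" = 0.02, "2008Q3" = 0.05)
log_price <- 8 + 0.8 * log_area + unname(period_effect[periods]) +
  rnorm(150, sd = 0.02)

toy <- data.frame(period = factor(periods), log_area, log_price)

# One pooled regression with a dummy for every period except the first
fit <- lm(log_price ~ log_area + period, data = toy)

# Exponentiated time-dummy coefficients give the index relative to the first period
dummy_coefs <- coef(fit)[grep("^period", names(coef(fit)))]
index <- data.frame(
  period = levels(toy$period),
  Index = 100 * exp(c(0, unname(dummy_coefs)))
)
print(index)
```

In this sketch the first period is the base (Index = 100); the estimated values for later periods should lie close to the simulated price levels.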

Example: Multiple Index Methods

multi_result <- REPS::calculate_hedonic_index(
  method = c("fisher", "hmts", "laspeyres", "paasche", "repricing", "timedummy", "rolling_timedummy"),
  dataset = dataset,
  period_variable = "period",
  dependent_variable = "price",
  numerical_variables = c("floor_area", "dist_trainstation"),
  categorical_variables = c("neighbourhood_code", "dummy_large_city"),
  reference_period = "2015",
  number_of_observations = FALSE,
  periods_in_year = 4,
  number_preliminary_periods = 1,
  window_length = 4,
  production_since = NULL,
  resting_points = FALSE,
  imputation = FALSE
)

head(multi_result$fisher)
#>   period    Index
#> 1 2008Q1 99.61436
#> 2 2008Q2 98.33742
#> 3 2008Q3 98.98908
#> 4 2008Q4 98.02118
#> 5 2009Q1 98.49971
#> 6 2009Q2 98.26416
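To compare methods side by side, the per-method tables can be merged into one wide table. The sketch below assumes the result is a named list with one data frame per method, each holding period and Index columns, as the fisher element above suggests; toy_result is an invented stand-in with made-up values.

```r
# Stand-in for a multi-method result: a named list of data frames with
# 'period' and 'Index' columns (values are invented)
toy_result <- list(
  fisher = data.frame(period = c("2008Q1", "2008Q2"), Index = c(99.6, 98.3)),
  timedummy = data.frame(period = c("2008Q1", "2008Q2"), Index = c(99.7, 98.2))
)

# Rename each Index column after its method
renamed <- lapply(names(toy_result), function(m) {
  df <- toy_result[[m]]
  names(df)[names(df) == "Index"] <- m
  df
})

# Merge all tables on the shared period column
comparison <- Reduce(function(x, y) merge(x, y, by = "period"), renamed)
print(comparison)
```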

Chaining Price Indices (Annual Overlap Method)

While standard calls to calculate_hedonic_index() compute multilateral or direct indices over an entire dataset, long-term pooled models can suffer from structural market changes over time. To account for shifting consumer preferences, indices are often calculated over shorter periods and linked together.

You can apply the Annual Overlap Method directly in calculate_hedonic_index() by setting chained = TRUE. The function then splits the data by year, calculates a short-term index for each year using the final period of the previous year as the overlap base, and chains these short-term indices into one continuous series.
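The linking step itself is simple arithmetic. The following base R sketch chains two invented short-term series, each expressed relative to the final period of the previous year; it illustrates the general idea only and is not the REPS implementation.

```r
# Invented short-term indices. Each year's series is expressed relative to
# the final period of the previous year (= 100); the first year uses its own
# first period as base.
short_term <- list(
  "2008" = c("2008Q1" = 100, "2008Q2" = 101, "2008Q3" = 102, "2008Q4" = 104),
  "2009" = c("2009Q1" = 101, "2009Q2" = 103, "2009Q3" = 102, "2009Q4" = 105)
)

chained <- c()
link <- 100  # level of the chained series at the current overlap period
for (year in names(short_term)) {
  # Scale this year's short-term index to the level reached at the overlap period
  linked <- short_term[[year]] * link / 100
  chained <- c(chained, linked)
  # The last period of this year becomes the overlap base for the next year
  link <- linked[[length(linked)]]
}
print(round(chained, 2))
```

In this example 2009Q1 is 1% above 2008Q4 in the short-term series, so its chained value is 104 * 1.01 = 105.04.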

Example: Chained Fisher Index

You can use the chained = TRUE parameter with any underlying hedonic method supported by the package. Here is an example using the Fisher index.

# Calculate the chained index
chained_fisher <- REPS::calculate_hedonic_index(
  dataset = dataset,
  method = "fisher",
  chained = TRUE,
  period_variable = "period",
  dependent_variable = "price",
  numerical_variables = c("floor_area", "dist_trainstation"),
  categorical_variables = c("dummy_large_city", "neighbourhood_code"),
  reference_period = "2015"
)

head(chained_fisher)
#>   period    Index
#> 1 2008Q1 99.32046
#> 2 2008Q2 98.04729
#> 3 2008Q3 98.69702
#> 4 2008Q4 97.73199
#> 5 2009Q1 98.18607
#> 6 2009Q2 97.30401

Visualizing the Index

For quick and clear visualizations, the plot_price_index() utility function generates time-series plots of the calculated indices. Because chained and unchained indices are both returned in the standard REPS data frame structure, either can be passed to this built-in plotting utility.

While we encourage users to create custom visualizations suited to their analytical needs, this function provides a convenient starting point for simple and consistent line plots.

REPS::plot_price_index(multi_result)
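As a starting point for a custom visualization, the sketch below draws two index series with base R graphics. The values are invented and the layout choices (colours, legend position) are only suggestions.

```r
# Invented index values for two methods over four quarters
periods <- c("2008Q1", "2008Q2", "2008Q3", "2008Q4")
index_matrix <- cbind(
  fisher = c(99.6, 98.3, 99.0, 98.0),
  timedummy = c(99.6, 98.2, 99.2, 98.6)
)
line_colours <- c("steelblue", "firebrick")

# One line per method; the x axis is labelled with the period names
matplot(index_matrix, type = "l", lty = 1, col = line_colours,
        xaxt = "n", xlab = "Period", ylab = "Index",
        main = "Hedonic price indices")
axis(1, at = seq_along(periods), labels = periods)
legend("topright", legend = colnames(index_matrix),
       col = line_colours, lty = 1)
```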

Summary

The calculate_hedonic_index() function streamlines access to multiple hedonic index methods via a consistent interface, including robust support for chained indices. This allows analysts to easily compare outputs and select the most appropriate method for their context without switching between different wrapper functions.

References

Eurostat. 2013. Handbook on Residential Property Price Indices (RPPIs). Publications Office of the European Union. https://doi.org/10.2785/34007.
Hill, Robert J., Michael Scholz, Chihiro Shimizu, and Michael Steurer. 2018. “An Evaluation of the Methods Used by European Countries to Compute Their Official House Price Indices.” Economie et Statistique 2018 (500–502): 221–38. https://doi.org/10.24187/ECOSTAT.2018.500T.1953.
Hill, Robert J., Michael Scholz, Chihiro Shimizu, and Michael Steurer. 2022. “Rolling-Time-Dummy House Price Indexes: Window Length, Linking and Options for Dealing with Low Transaction Volume.” Journal of Official Statistics 38 (1): 127–51. https://doi.org/10.2478/JOS-2022-0007.
Ishaak, F. F., Pim Ouwehand, and H. T. Remøy. 2024. “Constructing Limited-Revisable and Stable CPPIs for Small Domains.” Journal of Official Statistics 40 (3): 380–408. https://doi.org/10.1177/0282423X241246617.