Nowcasting with 'JDemetra+ v3.x'
Corentin Lemasson
Source:vignettes/rjd3nowcasting.Rmd
Abstract
Nowcasting is often defined as the prediction of the present, the very near future and the very recent past. This R package relies on JDemetra+ v3.x algorithms to help operationalize the process of nowcasting. It can be used to specify and estimate Dynamic Factor Models and to visualize how the real-time dataflow updates expectations, as for instance in Banbura and Modugno (2010).
Introduction
This package can be used to specify and estimate Dynamic Factor Models efficiently and to produce consistent forecasts. Recent versions of the package also include news analysis. Analyzing news, defined as the discrepancy between newly released figures and their forecasts, helps to interpret forecast revisions. As mentioned by Banbura and Modugno (2010), it enables us to produce statements like “the forecast was revised up by … because of higher than expected release of …”.
This R package uses the efficient libraries of JDemetra+ v3. The way the package was conceived is inspired by the GUI add-in developed for JDemetra+ V2 and it provides about the same functionality (except for the real-time simulation), but in the flexible R environment.
Installation settings
This package relies on the specific Java libraries of JDemetra+ v3 and on the package rjd3toolkit of rjdverse. Prior to installation, you must ensure that Java version >= 17.0 is available on your computer. If you need a portable version of Java to meet this requirement, you can follow the instructions in the installation manual.
In addition to Java version >= 17.0, you must have a recent version of the R packages rJava (>= 1.0.6) and RProtobuf (>= 0.4.17), which you can download from CRAN.
The package rjd3nowcasting depends on the package rjd3toolkit that you must install from GitHub beforehand.
# To get the current stable version (from the latest release):
### install.packages("remotes")
remotes::install_github("rjdverse/rjd3toolkit@*release")
remotes::install_github("rjdverse/rjd3nowcasting@*release", build_vignettes = TRUE)
# or to get the current development version from GitHub:
remotes::install_github("rjdverse/rjd3nowcasting")
Note that depending on the R packages that are already installed on your computer, you might also be asked to install or re-install some other packages from CRAN.
Usage
Once the package is loaded, there are four steps to follow:
- Prepare and import data
- Create or update the model
- Estimate the model
- Get results
Detailed information concerning each step follows below the example.
# Quick start example
## 1. Data
set.seed(100)
data0 <- stats::ts(
data = matrix(rnorm(500), 100, 5),
frequency = 12,
start = c(2010, 1)
)
data0[100, 1] <- data0[99:100, 2] <- data0[(1:100)[-seq(3, 100, 3)], 5] <- NA
data1 <- stats::ts(
data = rbind(data0, c(NA, NA, 1, 1, NA)),
frequency = 12,
start = c(2010, 1)
)
data1[100,1] <- data1[99,2] <- 1
## 2. Create or update the model
### Create model from scratch
dfm0 <- create_model(nfactors=2,
nlags=2,
factors_type = c("M", "M", "YoY", "M", "Q"),
factors_loading = matrix(data = TRUE, 5, 2),
var_init = "Unconditional")
### Update model
# Reminder: because of possible local minima and identification issues,
# it is always better to start from a previously estimated model when
# one is available.
est0 <- estimate_em(dfm0, data0) # cf. next step
dfm1 <- est0$dfm # R object (list) that can be saved from one update to the next
# or, equivalently,
dfm1 <- create_model(nfactors=2,
nlags=2,
factors_type = c("M", "M", "YoY", "M", "Q"),
factors_loading = matrix(data = TRUE, 5, 2),
var_init = "Unconditional",
var_coefficients = est0$dfm$var_coefficients,
var_errors_variance = est0$dfm$var_errors_variance,
measurement_coefficients = est0$dfm$measurement_coefficients,
measurement_errors_variance = est0$dfm$measurement_errors_variance)
## 3. Estimate the model
est1 <- estimate_ml(dfm1, data1)
# or est1<-estimate_em(dfm1, data1)
# or est1<-estimate_pca(dfm1, data1)
## 4. Get results
rslt1 <- get_results(est1)
print(rslt1)
fcst1 <- get_forecasts(est1, n_fcst = 2)
print(fcst1)
plot(fcst1)
news1 <- get_news(est0, data1, target_series = "Series 1", n_fcst = 2)
print(news1)
plot(news1)
1. Prepare and import data
This step is external to the package. Recall that DFMs require all input data to be stationary. Once the data have been prepared accordingly and imported into R, you must create a time-series object from them with the well-known stats::ts() function, as in the example.
When the work is repeated over time, the columns of the dataset must remain the same, and in the same order, from one update to the next. Only rows can be added, to reflect incoming data.
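As an illustration of this preparation step, the sketch below builds a small monthly dataset in base R. The series names (ipi, sales, gdp) and all figures are simulated placeholders, not real data. Log-differencing is used here as one common way of obtaining stationary growth rates; it is not a requirement of the package. The quarterly series is stored on the last month of each quarter, as in the quick start example.

```r
set.seed(1)
n <- 48

# Simulated monthly levels for two hypothetical indicators (Jan 2020 onwards)
lev <- cbind(
  ipi   = 100 * exp(cumsum(rnorm(n, 0.002, 0.01))),
  sales = 100 * exp(cumsum(rnorm(n, 0.003, 0.02)))
)

# Monthly growth rates in percent (log-differences); first value is Feb 2020
growth <- 100 * diff(log(lev))

# Hypothetical quarterly series, observed only in March, June, September, December
month <- (seq_len(nrow(growth)) %% 12) + 1  # calendar month of each row
gdp <- rnorm(nrow(growth))
gdp[month %% 3 != 0] <- NA

# Previous vintage
data_old <- stats::ts(cbind(growth, gdp = gdp), frequency = 12, start = c(2020, 2))

# New vintage: same columns in the same order, one additional row
data_new <- stats::ts(
  rbind(data_old, c(0.4, NA, NA)),
  frequency = 12,
  start = stats::start(data_old)
)
```

Both objects are multivariate time series (class mts) with identical column names, which is the layout the quick start example relies on.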
2. Create/Update model
2.1. Create a new model
The function create_model() enables you to build a new model.
The state-space representation of a Dynamic Factor Model consists of two equations. The measurement equation links the observations to the underlying factors. Those factors, as described by the second equation, follow a VAR process of order p. The number of factors and the order p of the VAR process are defined in the first two arguments of create_model() (nfactors and nlags in the quick start example).
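For reference, a generic form of the two state-space equations is sketched below. Only the matrix Z is named in the text; the remaining symbols (y_t for the observations, f_t for the factors, α_t for the state vector stacking current and lagged factors, A_1, …, A_p for the VAR coefficients, R and Q for the error variances) are notation assumed here for illustration and may differ from the package documentation.

```latex
\begin{aligned}
y_t &= Z\,\alpha_t + \varepsilon_t, & \varepsilon_t &\sim N(0, R), \\
f_t &= A_1 f_{t-1} + \dots + A_p f_{t-p} + \eta_t, & \eta_t &\sim N(0, Q),
\end{aligned}
\qquad \alpha_t = (f_t', f_{t-1}', \dots)'
```

The first line is the measurement equation and the second is the VAR(p) process followed by the factors.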
The third argument, factors_type, defines the link between the series and the factors (Z matrix). This link can be more or less sophisticated depending on the variables. Three options are currently available:
- A variable expressed in terms of monthly growth rates can be linked to a factor representing the underlying monthly growth rate of the economy by defining the factor type as “M” for this variable (default).
- A monthly or quarterly variable that is correlated with the underlying quarterly growth rate of the economy can be linked to a weighted average of the factors representing the underlying monthly growth rate of the economy. Such a weighted average is meant to represent quarterly growth rates, and it can be implemented by defining the factor type as “Q” for this variable.
- A variable can also be linked to the cumulative sum of the last 12 monthly factors. If the model is designed in such a way that the monthly factors represent monthly growth rates, the resulting cumulative sum boils down to the year-on-year growth rate. Thus, variables expressed in terms of year-on-year growth rates, or surveys that are correlated with the year-on-year growth rates of the reference series, should be linked to the factors in this way. The factor type should be defined as “YoY” in this case.
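The arithmetic behind the “YoY” and “Q” links can be checked on a simulated series in base R. This is a sketch under stated assumptions: growth rates are taken as log-differences, and the weights (1, 2, 3, 2, 1)/3 used for the quarterly link are the weights that arise when a quarterly figure is the average of three monthly log-levels. The text does not state which weights the package applies internally, so they are an assumption.

```r
set.seed(42)
l <- cumsum(rnorm(60, 0.002, 0.01))  # simulated monthly log-level
g <- diff(l)                          # monthly growth rates; g[i] = l[i + 1] - l[i]

# "YoY": the sum of the last 12 monthly growth rates ...
yoy_from_g <- stats::filter(g, rep(1, 12), sides = 1)
# ... equals the 12-month log-difference of the level
yoy_direct <- diff(l, lag = 12)

# "Q": a weighted average of the last 5 monthly growth rates (assumed weights) ...
w <- c(1, 2, 3, 2, 1) / 3
q_from_g <- stats::filter(g, w, sides = 1)
# ... equals the change between two consecutive 3-month averages of the log-level
m3 <- stats::filter(l, rep(1 / 3, 3), sides = 1)
q_direct <- diff(m3, lag = 3)

# Compare both constructions on the observations where they are defined
yoy_ok <- isTRUE(all.equal(as.numeric(yoy_from_g)[12:59], yoy_direct))
q_ok <- isTRUE(all.equal(as.numeric(q_from_g)[5:59], as.numeric(q_direct)[3:57]))
```

Both comparisons hold exactly (up to floating-point error), which is why a monthly factor interpreted as a monthly growth rate can serve series measured as quarterly or year-on-year growth.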
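The fourth and last compulsory argument, factors_loading, refers to the factor loadings, which can incorporate zero restrictions. Users must specify which factors load on which variables. In the quick start example, a 5 x 2 matrix filled with TRUE lets both factors load on all five series.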
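A loading matrix with zero restrictions can be built as a logical matrix with one row per series and one column per factor, following the layout of the quick start example. In the sketch below, the choice of restrictions is purely illustrative, and the call to create_model() only runs if the package is installed.

```r
n_series <- 5
n_factors <- 2

# Start from an unrestricted matrix: every factor loads on every series
loading <- matrix(TRUE, nrow = n_series, ncol = n_factors)

# Illustrative restriction: the second factor does not load on series 1 and 2
loading[1:2, 2] <- FALSE

# Same call as in the quick start example, with the restricted matrix
if (requireNamespace("rjd3nowcasting", quietly = TRUE)) {
  dfm_restricted <- rjd3nowcasting::create_model(
    nfactors = n_factors,
    nlags = 2,
    factors_type = c("M", "M", "YoY", "M", "Q"),
    factors_loading = loading,
    var_init = "Unconditional"
  )
}
```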
The argument var_init tells whether the first unobserved factor values should be defined considering the unconditional distribution (recommended) or should be set equal to zero.
The last four arguments, var_coefficients, var_errors_variance, measurement_coefficients and measurement_errors_variance, can be used to create a model based on a previous estimate of the model (see section 2.2). The default value of those four arguments is NULL, meaning that the model will be created from scratch.
2.2. Update an existing model
When the work is repeated over time, a similar model has usually been estimated before on an older version of the data. In that case, it is recommended not to create a new model from scratch but to start from the previously estimated model. For that, the model must be recoverable from the previous run. One option is to save the required information from one update to the next using the base function saveRDS() (see section 3.4 for what exactly should be saved). There are two reasons for starting from a previously estimated model when one is available: convergence is faster during the estimation step, and the estimation is less likely to end up in another local minimum, which could result in parameter estimates that are very different from the previous ones (especially since the model is not fully identifiable).
To generate a new model from a previously estimated one, there are two possibilities:
- Set the new R object directly from the previous one, or
- Use the function create_model() while filling the arguments var_coefficients, var_errors_variance, measurement_coefficients and measurement_errors_variance with their previously estimated values.
2.3. Composition of the created object
The function create_model() returns an R object of class ‘JD3_DfmModel’. This is just a list of six elements that fully characterize the model. The list includes the estimated coefficients of the VAR equation and the variance-covariance matrix of its error terms, the estimated coefficients of the measurement equation and the idiosyncratic variance of its error terms, the type of initialisation and the link to consider between the series and the factors (i.e. the argument factors_type). This is an R list of matrices and vectors that can easily be saved from one update to the next using, for example, the function saveRDS().
3. Estimation
3.1. Different algorithms/functions
Parameters can be estimated using different algorithms. One of the three available functions should be picked for the purpose of estimation:
- The function estimate_pca() estimates the model parameters using only Principal Component Analysis (PCA). Although this is fast, this approach is not recommended, especially if some series are quarterly series or series associated with year-on-year growth rates (see section 2.1).
- The function estimate_em() estimates the model parameters using the EM algorithm (with initial values given by PCA by default). The function includes a few optional arguments which can be used to tune the estimation process.
- The function estimate_ml() estimates the model parameters by Maximum Likelihood (by default, with initial values given by the EM algorithm, whose own initial values are given by PCA). The function includes several optional arguments which can be used to tune the estimation process. The function estimate_ml() is recommended, although it can be argued that the function estimate_em(), which is somewhat faster, also constitutes a good solution.
The three functions have two compulsory arguments: the model, i.e. an object of class ‘JD3_DfmModel’ typically generated by the create_model() function, and the dataset, which must be an mts object. All three functions return an object of the same class, ‘JD3_DfmEstimates’, which can be used as input for the results functions (see section 4). Note that the returned object is just an R list containing various elements.
In addition to the selected algorithm, estimation speed depends on the size of the model. Models with one or two factors are estimated quickly (in a few seconds), even when the number of variables is large. However, the estimation of more complex models may take minutes to converge.
3.2. Prior standardization of the data
Dynamic factor models require a prior standardization of the data. This is an essential step which can lead to confusion in certain situations. The usual mechanism is quite simple and is divided into three stages:
- Standardization of each variable (i.e., subtract the mean and divide by the standard deviation)
- Model estimation based on the standardized data
- Conversion of the results (including forecasts) back to the scale of the raw data
This means that both the likelihood of the model and the parameter estimates refer to the transformed data. However, final results such as the forecasts and the forecast error variances of the transformed series are converted back to the scale of the raw data.
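The three stages can be sketched in base R as follows. The data are simulated, the "forecasts" in stage 2 are placeholder numbers standing in for model output, and the use of the sample standard deviation (stats::sd) is an assumption, since the text does not specify the estimator used by the package.

```r
set.seed(7)
# Simulated raw data with different scales and one missing value
raw <- cbind(a = rnorm(30, mean = 2, sd = 5), b = rnorm(30, mean = -1, sd = 0.5))
raw[30, "b"] <- NA

# Stage 1: standardize each variable, ignoring missing values
mu <- colMeans(raw, na.rm = TRUE)
sigma <- apply(raw, 2, stats::sd, na.rm = TRUE)
z <- sweep(sweep(raw, 2, mu, "-"), 2, sigma, "/")

# Stage 2 (placeholder): pretend a model produced these forecasts and
# forecast error standard deviations on the standardized scale
fcst_z <- c(a = 0.3, b = -0.2)
fcst_sd_z <- c(a = 0.9, b = 0.8)

# Stage 3: convert back to the scale of the raw data
fcst_raw <- fcst_z * sigma + mu
fcst_sd_raw <- fcst_sd_z * sigma
```

The mean only shifts the forecast, whereas the standard deviation rescales both the forecast and its error standard deviation.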
By default, the data are standardized. If, for some reason, your dataset already contains standardized data, the standardization step can be skipped by setting standardized = TRUE in the estimation function.
Particular attention must be paid to the standardization step when the work is repeated over time. For instance, if you do not wish to re-estimate the model (see section 3.3), you must also provide the mean and standard deviation of each variable as calculated at the time of the last estimation of the model. The argument input_standardization in each estimation function serves that purpose. Note that for news analysis (see section 4.3), the mean and standard deviation considered for the standardization step must be the same for the old and the new datasets. In practice, they are calculated based on the old dataset.
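The idea of reusing the old moments can be illustrated in base R. This sketch only shows the arithmetic on simulated data; the helper standardize() is hypothetical, and the exact format expected by the argument input_standardization is not reproduced here.

```r
set.seed(3)
# Old vintage (simulated) and new vintage with one additional row
old <- matrix(rnorm(40, mean = 1, sd = 2), 20, 2, dimnames = list(NULL, c("x", "y")))
new <- rbind(old, c(1.5, NA))

# Moments computed once, on the old dataset
mu_old <- colMeans(old, na.rm = TRUE)
sd_old <- apply(old, 2, stats::sd, na.rm = TRUE)

# Hypothetical helper: column-wise standardization with given moments
standardize <- function(x, mu, sigma) sweep(sweep(x, 2, mu, "-"), 2, sigma, "/")

# Both vintages are standardized with the same (old) moments
z_old <- standardize(old, mu_old, sd_old)
z_new <- standardize(new, mu_old, sd_old)
```

Because the same moments are used, the rows shared by both vintages are identical after standardization, so differences in forecasts can be attributed to the new figures rather than to a change of scale.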
3.3. Fixed parameters
The three estimation functions include a boolean argument, re_estimate, that indicates whether the model should be re-estimated (default) or not.
Note that for news analysis (see section 4.3), the model is kept unchanged between the previous and the current period to track the impact of news. Hence, to retrieve the same forecasts as those returned by the get_news() function, you should set re_estimate = FALSE and pass the previous standardization input to the argument input_standardization (see section 3.2).
3.4. Save R object from one time to another
When the work is repeated over time, some R objects should be passed from one update to the next (see section 2.2). To do that, the user is invited to use the functions saveRDS() and readRDS() from base R.
What to save depends on whether the user intends to perform news analysis.
If the intention is not to perform news analysis and just to re-estimate the model each time and update the forecasts, only the estimated model should be saved from one update to the next. This is an object of class ‘JD3_DfmModel’, generated as part of the output of the function estimate_pca(), estimate_em() or estimate_ml(), where the default/previous estimates of the parameters are replaced by the new ones. The updated model is the element referred to as ‘dfm’ in the list returned by the estimation functions.
If the intention is to perform news analysis, the entire object/list returned by the function estimate_pca(), estimate_em() or estimate_ml(), i.e. an R object of class ‘JD3_DfmEstimates’, should be saved. Optionally, a matrix with the standardization input used at the time of the initial estimate (i.e. the mean and standard deviation used to standardize the data) can be saved as well. At the time of the initial estimate, the formatted matrix containing this information can be found in the preprocessing section of the output of the function get_results() (see section 4.1). This could be used, for instance, to check that the forecasts from get_forecasts() and get_news() agree.
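The save-and-restore pattern can be sketched with base R as follows. The object est_prev below is a small stand-in list built only so that the example runs without the package; in real use it would be the output of an estimation function, such as est0 in the quick start example. The file names are arbitrary.

```r
# Stand-in for the list returned by an estimation function (placeholder content)
est_prev <- list(
  dfm = list(var_coefficients = matrix(0.5, 2, 4), var_init = "Unconditional"),
  note = "placeholder for the other elements"
)

path_model <- file.path(tempdir(), "dfm_model.rds")
path_full <- file.path(tempdir(), "dfm_estimates.rds")

# Without news analysis: keeping the updated model is enough
saveRDS(est_prev$dfm, path_model)

# With news analysis: keep the entire estimation output
saveRDS(est_prev, path_full)

# At the next update, restore what was saved
dfm_prev <- readRDS(path_model)
est_restored <- readRDS(path_full)
```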
4. Results
Results are split into three parts.
4.1. Estimates
The function get_results() can be used to obtain results related to
- pre-processing: including the standardization input (see sections 3.2, 3.3 and 3.4)
- parameter estimates
- factors
- residuals
- likelihood
The function get_results() has a single argument, which is an object of class ‘JD3_DfmEstimates’ typically generated by the function estimate_pca(), estimate_em() or estimate_ml(). It returns an object of class ‘JD3_DfmResults’, which is a list of the aforementioned output. A generic print() function can be applied to its output and returns (by default) nicely formatted results related to the parameter estimates.
4.2. Forecasts
The function get_forecasts() can be used to obtain forecasts of the variables, as well as the forecast error standard deviations. You have access to the forecasts of both the transformed series (see section 3.2) and the raw series. The output list also contains an element referred to as ‘forecasts_only’. This is just an extract of the forecasts of the raw series which contains only the forecasts, i.e. without the rest of the series. Note that for quarterly series (factor type “Q”), the forecast at the last month of the quarter is the one to consider. More generally, if the variable under consideration is made of quarterly growth rates, each monthly forecast figure corresponds to the growth rate of the last three months compared with the three previous months (e.g. in August, it is the estimate of the growth rate between June-July-August and March-April-May).
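Selecting the quarter-end figures from a monthly series of forecasts can be done in base R. The forecast values below are hypothetical placeholders, not output of get_forecasts().

```r
# Hypothetical monthly forecasts for a series linked through factor type "Q"
fcst_q <- stats::ts(c(0.21, 0.25, 0.30, 0.28, 0.26, 0.31),
                    frequency = 12, start = c(2024, 7))

# Calendar month and year of each observation
month <- as.vector(stats::cycle(fcst_q))
year <- floor(as.vector(stats::time(fcst_q)) + 1e-8)

# Keep only the last month of each quarter (March, June, September, December)
is_quarter_end <- month %% 3 == 0
quarterly_fcst <- stats::setNames(
  as.vector(fcst_q)[is_quarter_end],
  paste0(year[is_quarter_end], "-Q", month[is_quarter_end] / 3)
)
```

Here the September and December values are retained and labelled with their quarter.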
The function get_forecasts() has two arguments. One is an object of class ‘JD3_DfmEstimates’ typically generated by the function estimate_pca(), estimate_em() or estimate_ml(). The other is the number of forecasting periods to consider, starting from the most up-to-date variable.
Two generic functions can be applied to the object returned by the function get_forecasts(). A print() function will return (by default) the forecasts only. A plot() function can be used to visualize the series and the forecasts, as well as an 80% prediction interval around the forecasts.
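For intuition, an 80% prediction interval can be derived from a forecast and its forecast error standard deviation as sketched below. The figures are hypothetical and normally distributed forecast errors are assumed; the text does not state how the plot method computes its interval.

```r
# Hypothetical forecasts and forecast error standard deviations (raw scale)
fcst <- c(0.40, 0.35)
fcst_sd <- c(0.50, 0.65)

# An 80% interval leaves 10% in each tail (Gaussian assumption)
k <- stats::qnorm(0.90)
interval <- cbind(
  lower = fcst - k * fcst_sd,
  forecast = fcst,
  upper = fcst + k * fcst_sd
)
```

The multiplier k is roughly 1.28, so the interval spans about 1.28 standard deviations on each side of the forecast.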
4.3. News analysis
There are two kinds of differences between two consecutive updates of a dataset:
- (1) Newly released figures
- (2) Revision of past data
The purpose of news analysis is to monitor the impact of (1) on the forecasts. Those impacts can be scrutinized in detail by using the get_news() function. This function displays the impact of the difference between the newly released figures and their forecasts based on the revised figures (i.e. the old data + (2)).
The function get_news() has four arguments:
- The estimated model, which is an object of class ‘JD3_DfmEstimates’ generated by the function estimate_pca(), estimate_em() or estimate_ml(). As the purpose of news analysis is to monitor the impact of newly released figures on forecasts, the model is kept unchanged between the previous and the new release. Hence, the previously estimated model should be the one specified here. Note that the pre-standardization of the data (see section 3.2) is also calculated based on the previous release.
- The newly released data, which should be an mts object
- The variable of interest
- The number of forecasts to consider
The list of output returned by the function get_news() contains the weights of the news, their impacts and the forecasts for both the transformed (see section 3.2) and the raw series. The weight given to each piece of news represents its relevance for the variable of interest. The impacts are the weights of the news times their size. They give the impact of each piece of news on the forecast revisions of the variable of interest. Therefore they allow users to understand how the revisions can be decomposed in terms of the news components for the various series. The generic plot() function can be used directly on an object of class ‘JD3_DfmNews’ (i.e. an object generated by the function get_news()) to quickly visualize all impacts with a nicely formatted barchart. This is similar to what was included in the GUI add-in of JDemetra+ V2.x.
Finally, the forecasts returned by the function get_news() include:
- The old forecasts, which are the forecasts based on the previous data.
- The revised forecasts, which are the forecasts based on the previous data but where past data were revised based on the new data. Hence, the global impact of revisions in past data is given by the difference between the revised forecasts and the old forecasts.
- The new forecasts, which are the forecasts based on the new data. The difference between the new and the revised forecasts corresponds to the sum of the impacts of each piece of news.
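The accounting between news, weights, impacts and the three sets of forecasts can be written out in a few lines of base R. All numbers and series names below are hypothetical; they only illustrate the relationships described above and are not output of get_news().

```r
# Hypothetical figures for one target series and one forecast horizon
old_fcst <- 0.30      # based on the previous data
revised_fcst <- 0.32  # previous data with revised past figures

# News = released figure minus its forecast, for each newly released series
released <- c(series_a = 1.2, series_b = -0.4)
expected <- c(series_a = 0.9, series_b = -0.1)
news <- released - expected

# Hypothetical weights of each piece of news for the target series
news_weights <- c(series_a = 0.10, series_b = 0.05)

# Impact of each piece of news = weight times size of the news
impacts <- news_weights * news

# New forecast = revised forecast + sum of the impacts
new_fcst <- revised_fcst + sum(impacts)

# Effect of revisions in past data
revision_effect <- revised_fcst - old_fcst
```

In this example the positive surprise on series_a outweighs the negative surprise on series_b, so the forecast is revised up.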
In addition to the plot() function, there are two more generic functions that can be applied to an object of class ‘JD3_DfmNews’. The function summary() will give you a summary of the weights and impacts of each piece of news on the variable of interest for each forecasting period. The print() function returns the same table as the summary() function together with the information related to the forecasts.