Generalized Method of Moments
In general, a GMM estimator chooses the point estimate of $\theta$ by minimizing the following criterion function:
\[Q(\theta) = \left[\frac{1}{N}\sum_{i=1}^N \mathbf{g}_i(\theta)\right]'\mathbf{W}\left[\frac{1}{N}\sum_{i=1}^N \mathbf{g}_i(\theta)\right]\]
where $\mathbf{g}_i(\theta)$ is a vector of residuals from evaluating the moment conditions with parameter vector $\theta$ and observation $i$ of a sample consisting of $N$ observations. When the number of moment conditions (the length of $\mathbf{g}_i(\theta)$) is greater than the number of parameters (the length of $\theta$), the parameters are over-identified. In this case, $\mathbf{W}$ is a weight matrix determining the relative importance of each moment condition in the criterion function. When the number of moment conditions matches the number of parameters, the parameters are just-identified and the weight matrix is irrelevant.[1]
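As a small numerical illustration (with made-up moment residuals and an identity weight matrix, not data from this package), the criterion can be computed directly:

```julia
using LinearAlgebra

# Hypothetical moment residuals g_i(θ) for N = 3 observations and
# 2 moment conditions; row i holds g_i(θ)'
g = [ 0.5 -0.2;
      0.1  0.3;
     -0.4  0.1]

N = size(g, 1)
gbar = vec(sum(g, dims=1)) ./ N    # sample mean of the moments
W = Matrix{Float64}(I, 2, 2)       # identity weight matrix

Q = gbar' * W * gbar               # the GMM criterion Q(θ)
```

With more moment conditions than parameters, different choices of W change how deviations in each moment condition are traded off against one another.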
Nonlinear Iterated GMM Estimator
An iterated GMM estimator is implemented for arbitrary moment conditions provided by users. This allows estimating nonlinear GMM via the iterative method considered in Hansen et al. (1996). In particular, the two-step GMM estimator is a special case where the number of iterations is restricted to two.
Starting from an initial weight matrix $\mathbf{W}^{(1)}$, the iterated GMM estimator iterates through the following steps for $k\geq 1$:
- Given a weight matrix $\mathbf{W}^{(k)}$, find $\theta^{(k)}$ that minimizes the criterion function.
- Given the minimizer $\theta^{(k)}$, compute a new weight matrix $\mathbf{W}^{(k+1)}$ using the inverse of the variance-covariance matrix estimated from the moment conditions $\mathbf{g}_i(\theta^{(k)})$ for each $i$.
The above iterations continue until one of the following two conditions is met:
- $\theta^{(k+1)}$ is sufficiently close to $\theta^{(k)}$.
- A maximum number of iterations is reached.
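The two steps above can be sketched for a toy linear instrumental-variables model $y_i = \theta x_i + \varepsilon_i$ with two instruments, where the minimizer of the criterion has a closed form (a self-contained illustration with hypothetical data; the actual package handles general nonlinear moment conditions):

```julia
using LinearAlgebra

# Iterated GMM for a toy linear IV model with moment conditions
# g_i(θ) = z_i * (y_i - θ * x_i)
function iterated_gmm(x, Z, y; maxiter=10, tol=1e-10)
    N = length(y)
    L = size(Z, 2)
    W = Matrix{Float64}(I, L, L)        # W⁽¹⁾: initial weight matrix
    θ = 0.0
    for _ in 1:maxiter
        # Step 1: minimize the criterion (closed form for a linear model)
        b = Z' * x                      # Σᵢ zᵢ xᵢ
        c = Z' * y                      # Σᵢ zᵢ yᵢ
        θnew = dot(b, W * c) / dot(b, W * b)
        # Step 2: update W⁽ᵏ⁺¹⁾ from the moment residuals g_i
        G = Z .* (y .- θnew .* x)       # N × L matrix with g_i' in row i
        W = inv((G' * G) ./ N)          # inverse of the moment covariance
        abs(θnew - θ) < tol && break    # stop when θ stabilizes
        θ = θnew
    end
    return θ
end

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Z = hcat(x, x .^ 2)                     # two instruments
y = 2.0 .* x .+ [0.1, -0.1, 0.2, -0.2, 0.1, -0.1]
θhat = iterated_gmm(x, Z, y)
```

Each pass recomputes the weight matrix from the latest residuals; restricting the number of iterations to two yields the familiar two-step estimator.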
Example: Exponential Regression with Instruments
To illustrate the use of such an estimator, we replicate Example 8 from the Stata manual for gmm. The data for this example are included in MethodOfMoments.jl for convenience and can be loaded as below:
using MethodOfMoments, CSV, DataFrames, TypedTables
# exampledata loads data from CSV files bundled with MethodOfMoments.jl
exampledata(name::Union{Symbol,String}) =
DataFrame(CSV.read(MethodOfMoments.datafile(name), DataFrame), copycols=true)
data = Table(exampledata(:docvisits))
Table with 10 columns and 4412 rows:
docvis age income female black hispanic married physlim ⋯
┌───────────────────────────────────────────────────────────────────
1 │ 0 3.9 30.0 1 0 1 1 0 ⋯
2 │ 1 4.7 10.0 0 0 1 1 0 ⋯
3 │ 15 2.7 27.0 1 0 0 0 0 ⋯
4 │ 0 3.0 11.25 0 0 0 1 0 ⋯
5 │ 2 5.4 76.33 1 0 0 1 0 ⋯
6 │ 2 6.0 185.557 0 0 0 1 0 ⋯
7 │ 0 2.8 7.6 0 0 0 0 0 ⋯
8 │ 0 4.2 14.0 1 0 0 0 0 ⋯
9 │ 1 3.2 59.897 0 0 0 1 0 ⋯
10 │ 0 4.8 17.1 0 0 0 1 0 ⋯
11 │ 0 2.6 14.04 0 0 1 1 0 ⋯
12 │ 0 3.9 23.92 0 0 1 1 0 ⋯
13 │ 5 2.6 20.26 1 0 1 1 0 ⋯
14 │ 2 3.4 75.0 0 0 0 1 0 ⋯
15 │ 0 4.1 23.791 1 0 1 0 0 ⋯
16 │ 1 3.7 43.5 0 0 1 1 0 ⋯
17 │ 3 2.7 60.905 1 0 0 0 0 ⋯
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
Above, the data frame is converted to a Table from TypedTables.jl, which allows the moment conditions to be evaluated row by row with convenient syntax while remaining fast.
It is important to clean the sample before conducting estimation: the estimator does not handle invalid values such as missing and NaN when evaluating the moment conditions.
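For instance, with a hypothetical column containing invalid entries, the valid rows can be identified in plain Julia before constructing the table (with DataFrames.jl, dropmissing serves a similar purpose for missing values):

```julia
# A hypothetical data column containing invalid entries
income = [30.0, missing, 27.0, NaN, 76.33]

# Indices of rows that are safe to use in estimation
valid = findall(v -> v !== missing && !isnan(v), income)

clean = income[valid]
```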
For this example, the moment conditions can be written as
\[\mathrm{E}[\mathbf{g}_i(\theta)] = \mathrm{E}\left[ (Y_i - \exp(\theta'\mathbf{X}_i)) \mathbf{Z}_i \right] = \mathbf{0}\]
where the residual from the structural equation
\[Y_i = \exp(\theta'\mathbf{X}_i) + \varepsilon_i\]
is assumed to be orthogonal to a vector of instruments $\mathbf{Z}_i$.
Specifying Moment Conditions and Their Derivatives
The moment conditions and their derivatives with respect to the parameters need to be provided as Julia functions that are evaluated for individual observations. They are required to accept the parameter vector as the first argument and the row index of the data as the second argument. For the moment conditions, the function should return the residuals from evaluating the moment conditions for a specific observation as an iterable object such as a Tuple or a static vector SVector. For the derivatives, the function should return a Jacobian matrix with rows corresponding to the moment conditions and columns corresponding to the parameters.
For example, the moment conditions could be defined as follows:
using StaticArrays
struct g_stata_gmm_ex8{D}
data::D
end
function (g::g_stata_gmm_ex8)(θ, r)
@inbounds d = g.data[r]
x = SVector{5,Float64}((d.private, d.chronic, d.female, d.income, 1.0))
z = SVector{7,Float64}((d.private, d.chronic, d.female, d.age, d.black, d.hispanic, 1.0))
return (d.docvis - exp(θ'x)) .* z
end
Notice that since such functions typically need to retrieve values from a data frame that is not among their arguments, a struct wrapping the data frame has been defined and a method is attached to the struct following the required argument signature. An illustration of adding a method to a struct is available in the Julia documentation.
To simplify the above definition, a macro @gdg is provided. It allows defining a struct as in the above example by prepending @gdg to the function definition, without typing the definition of the struct:
using StaticArrays
@gdg function (g::g_stata_gmm_ex8)(θ, r)
@inbounds d = g.data[r]
x = SVector{5,Float64}((d.private, d.chronic, d.female, d.income, 1.0))
z = SVector{7,Float64}((d.private, d.chronic, d.female, d.age, d.black, d.hispanic, 1.0))
return (d.docvis - exp(θ'x)) .* z
end
g = g_stata_gmm_ex8(data)
g_stata_gmm_ex8 (functor defined with @gdg for moment conditions)
Using SVector from StaticArrays.jl instead of Vector avoids memory allocations while preserving the syntax for array operations.
For minimizing the criterion function and computing the variance-covariance matrix, we additionally require the derivatives of the moment conditions with respect to each parameter. They can be defined in a similar fashion as below:
@gdg function (dg::dg_stata_gmm_ex8)(θ, r)
@inbounds d = dg.data[r]
x = SVector{5,Float64}((d.private, d.chronic, d.female, d.income, 1.0))
z = SVector{7,Float64}((d.private, d.chronic, d.female, d.age, d.black, d.hispanic, 1.0))
return z .* (- exp(θ'x) .* x')
end
dg = dg_stata_gmm_ex8(data)
dg_stata_gmm_ex8 (functor defined with @gdg for moment conditions)
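For reference, the Jacobian implemented above follows from differentiating the moment conditions with respect to $\theta$:
\[\frac{\partial \mathbf{g}_i(\theta)}{\partial \theta'} = -\exp(\theta'\mathbf{X}_i)\,\mathbf{Z}_i\mathbf{X}_i'\]
which is a matrix with one row per moment condition (the length of $\mathbf{Z}_i$) and one column per parameter (the length of $\mathbf{X}_i$).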
Users are expected to order the parameters and moment conditions consistently: an index k for a parameter should always refer to the same parameter whenever a vector of parameters is involved, and a similar requirement holds for the moment conditions.
The performance of the functions evaluating the moment conditions and their derivatives is important for the run time of the estimator, especially when the sample size is large. Users are recommended to profile these functions to ensure acceptable performance; ideally, they should be non-allocating and type-stable.
Specifying Variance-Covariance Estimator
A variance-covariance estimator (VCE) provides information for inference. With the iterated GMM estimator, the VCE additionally instructs how the weight matrix $\mathbf{W}^{(k)}$ is updated in each iteration.
MethodOfMoments.jl implements two VCEs:
- The Eicker-Huber-White heteroskedasticity-robust VCE.
- The multiway cluster-robust VCE as in Cameron et al. (2011).
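For reference, with the efficient weight matrix, the heteroskedasticity-robust VCE takes the familiar form (a standard textbook expression, which may differ in details from the package's exact computation):
\[\widehat{\mathrm{Var}}(\hat\theta) = \frac{1}{N}\left(\bar{\mathbf{G}}'\hat{\mathbf{S}}^{-1}\bar{\mathbf{G}}\right)^{-1}, \qquad \hat{\mathbf{S}} = \frac{1}{N}\sum_{i=1}^N \mathbf{g}_i(\hat\theta)\mathbf{g}_i(\hat\theta)'\]
where $\bar{\mathbf{G}}$ is the sample mean of the Jacobians of the moment conditions.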
Here, we specify the heteroskedasticity-robust VCE by constructing a RobustVCE, passing the numbers of parameters, moment conditions, and observations as arguments:
vce = RobustVCE(5, 7, length(data))
Heteroskedasticity-robust covariance estimator
Obtaining the Estimation Results
We are now ready to conduct the estimation:
# Specify parameter names
params = (:private, :chronic, :female, :income, :cons)
# Conduct the estimation
r = fit(IteratedGMM, Hybrid, vce, g, dg, params, 7, length(data), maxiter=2, ntasks=2)
NonlinearGMM with 7 moments and 5 parameters over 4412 observations:
Iterated GMM estimator:
iter 2 => Q(θ) = 1.58235e-03 max|θ-θlast| = 1.50185e-01
Jstat = 6.98 Pr(>J) = 0.0305
Heteroskedasticity-robust covariance estimator
───────────────────────────────────────────────────────────────────────
Estimate Std. Error z Pr(>|z|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────────
private 0.481712 0.161302 2.99 0.0028 0.165565 0.797858
chronic 1.07941 0.0639269 16.89 <1e-63 0.954116 1.2047
female 0.694048 0.102188 6.79 <1e-10 0.493762 0.894334
income 0.0152984 0.00267279 5.72 <1e-07 0.0100598 0.020537
cons -0.619071 0.139927 -4.42 <1e-05 -0.893322 -0.34482
───────────────────────────────────────────────────────────────────────
We have specified the estimator by providing IteratedGMM as the first argument. For minimizing the criterion function, we use the Hybrid solver from NonlinearSystems.jl, a native Julia implementation of the hybrid method from MINPACK with minor revisions. With the keyword argument maxiter=2, only two iteration steps are conducted. The moment conditions are evaluated across observations in parallel with two threads, as specified by ntasks=2. If ntasks is not specified, a default value is determined based on the sample size.
At this moment, the Hybrid solver from NonlinearSystems.jl is the only solver bundled with MethodOfMoments.jl. It is possible to swap the solver, although doing so should be unnecessary in typical use cases.
The interface for retrieving the estimation results follows StatsAPI.jl. For example, to obtain the point estimates:
coef(r)
5-element Vector{Float64}:
0.4817116025899292
1.0794101667721612
0.6940481040440498
0.015298386177060077
-0.6190709065124277
To obtain the standard errors:
stderror(r)
5-element Vector{Float64}:
0.16130239107187994
0.0639268930566725
0.10218843457177068
0.00267278817868127
0.13992674433494878
To obtain 90% confidence intervals:
lb, ub = confint(r; level=0.9)
([0.21639277959940278, 0.9742597848681546, 0.5259630868061796, 0.010902040847283172, -0.8492299194392795], [0.7470304255804556, 1.184560548676168, 0.86213312128192, 0.019694731506836982, -0.3889118935855759])
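The reported bounds are consistent with the usual normal-approximation intervals
\[\hat\theta_k \pm z_{1-\alpha/2}\,\widehat{\mathrm{se}}(\hat\theta_k)\]
e.g., the 90% lower bound for private is approximately $0.4817 - 1.645 \times 0.1613 \approx 0.2164$.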
When the parameters are over-identified, Hansen's $J$ statistic can be retrieved:
Jstat(r)
6.981309043786015
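The statistic equals the sample size times the minimized criterion, and under the null hypothesis that all moment conditions are valid it is asymptotically $\chi^2$ distributed with degrees of freedom equal to the number of over-identifying restrictions:
\[J = N\,Q(\hat\theta) \sim \chi^2_{L-K}\]
Here $L = 7$ moment conditions and $K = 5$ parameters give two degrees of freedom, consistent with the output above: $4412 \times 1.58235\times 10^{-3} \approx 6.98$ with $\Pr(>J) = 0.0305$.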
Iterating Step by Step
Results of iterating the GMM estimator for a small number of steps can be obtained by setting the keyword argument maxiter. Furthermore, there is no need to restart the entire estimation procedure for different numbers of steps. If we are interested in the results of both the one-step and two-step GMM estimators, we can first obtain the results from the first step:
# One-step GMM
r = fit(IteratedGMM, Hybrid, vce, g, dg, params, 7, length(data), maxiter=1)
NonlinearGMM with 7 moments and 5 parameters over 4412 observations:
Iterated GMM estimator:
iter 1 => Q(θ) = 7.89338e-03 max|θ-θlast| = 1.06672e+00
Jstat = NaN Pr(>J) = NaN
Heteroskedasticity-robust covariance estimator
────────────────────────────────────────────────────────────────────────
Estimate Std. Error z Pr(>|z|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────
private 0.331526 0.194234 1.71 0.0879 -0.049165 0.712218
chronic 1.06672 0.0723333 14.75 <1e-48 0.924953 1.20849
female 0.714663 0.112992 6.32 <1e-09 0.493202 0.936124
income 0.0170238 0.00268611 6.34 <1e-09 0.0117591 0.0222885
cons -0.57512 0.1557 -3.69 0.0002 -0.880287 -0.269954
────────────────────────────────────────────────────────────────────────
After copying the estimates of interest (e.g., copy(coef(r))), we proceed to the next step using fit!, which updates the results in place:
# Two-step GMM without repeating the first step
fit!(r, maxiter=2)
NonlinearGMM with 7 moments and 5 parameters over 4412 observations:
Iterated GMM estimator:
iter 2 => Q(θ) = 1.58235e-03 max|θ-θlast| = 1.50185e-01
Jstat = 6.98 Pr(>J) = 0.0305
Heteroskedasticity-robust covariance estimator
───────────────────────────────────────────────────────────────────────
Estimate Std. Error z Pr(>|z|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────────
private 0.481712 0.161302 2.99 0.0028 0.165565 0.797858
chronic 1.07941 0.0639269 16.89 <1e-63 0.954116 1.2047
female 0.694048 0.102188 6.79 <1e-10 0.493762 0.894334
income 0.0152984 0.00267279 5.72 <1e-07 0.0100598 0.020537
cons -0.619071 0.139927 -4.42 <1e-05 -0.893322 -0.34482
───────────────────────────────────────────────────────────────────────
Notice that we have reused the same result object r and set a different value for maxiter.
Hansen's $J$ statistic is not reported for one-step GMM estimators (Jstat returns NaN). This is because the initial weight matrix for the GMM criterion can be arbitrary, resulting in a meaningless scale for $Q(\theta)$.
Implementation Details
Instead of literally solving a minimization problem defined with the scalar-valued criterion function $Q(\theta)$, the estimator solves a least-squares problem for the vector-valued moment conditions after adjusting the weights, using the Hybrid solver designed for systems of nonlinear equations. This relies on the assumption that the weight matrix is positive definite and hence can be decomposed via the Cholesky decomposition as the product of a triangular matrix and its conjugate transpose. This approach tends to be much faster than solving the scalar-valued problem, as it exploits more information from the Jacobian matrix of the moment conditions than is available in the gradient vector of the scalar-valued problem.
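The role of the Cholesky factor can be seen with plain linear algebra (a sketch of the idea with hypothetical numbers, not the package's code): with $\mathbf{W} = \mathbf{L}\mathbf{L}'$, the criterion equals the squared norm of the transformed residual vector $\mathbf{L}'\bar{\mathbf{g}}$.

```julia
using LinearAlgebra

# A hypothetical positive definite weight matrix and mean moment vector
W = [2.0 0.5;
     0.5 1.0]
gbar = [0.3, -0.1]

C = cholesky(W)               # W = C.L * C.L'
r = C.L' * gbar               # transformed residuals

Q_direct = gbar' * W * gbar   # scalar criterion
Q_ls = dot(r, r)              # same value as a sum of squares
```

Minimizing the sum-of-squares form over $\theta$ is a nonlinear least-squares problem, which lets a solver for systems of equations exploit the full Jacobian of the transformed residuals.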
[1] A suitable rank condition is assumed for the discussion of the number of moment conditions to be meaningful.