Application 3.4: Heteroscedasticity
A demand function with non-constant variance
Using cross-sectional data, this example estimates a demand model for a given product A:
\[\log(Q_{A,i}) = \beta_1 + \beta_2 \log(P_{A,i}) + \beta_3 \log(P_{B,i}) + \beta_4 \log(P_{C,i}) + \beta_5 \log(Y_{i}) + e_{i}\]
where \(Q_{A}\) is the quantity of good A demanded by the i-th individual, \(P_{A}\), \(P_{B}\) and \(P_{C}\) are the market prices of goods A, B and C, respectively, and \(Y\) is the income level.
Once the constant-variance (homoscedasticity) hypothesis has been tested, the demand function is estimated by weighted least squares (WLS; `MCP` in the code) under the assumption of a heteroscedastic model tied to the log of income:
\[\sigma_i^2 = e^{\alpha_0 + \alpha_1 \log(Y_i)}\]
This specification for the residual variance amounts to assuming that \(\sigma_i^2 = \sigma^2 \, Y_i^{\alpha_1}\), with \(\sigma^2 = e^{\alpha_0}\), which is why it is known as multiplicative heteroscedasticity.
As a complement, we will compare the WLS results with those obtained by estimating the demand function by OLS (`MCO`) but correcting the covariance matrix with White's method (`MCO-HC`).
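The equivalence between the exponential and power forms of the variance can be checked numerically. A minimal sketch, with purely illustrative values for \(\alpha_0\), \(\alpha_1\) and the income levels:

```python
import numpy as np

# Illustrative (hypothetical) parameter values and income levels
alpha0, alpha1 = -26.66, 3.85
Y = np.array([100.0, 150.0, 200.0, 250.0])

# Exponential form: sigma_i^2 = exp(alpha0 + alpha1 * log(Y_i))
sigma2_exp = np.exp(alpha0 + alpha1 * np.log(Y))

# Power form: sigma_i^2 = sigma^2 * Y_i^alpha1, with sigma^2 = exp(alpha0)
sigma2_pow = np.exp(alpha0) * Y**alpha1

print(np.allclose(sigma2_exp, sigma2_pow))  # True: both forms coincide
```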
Code
# Load libraries
library(tidyverse)
library(car)
library(skedastic)
library(lmtest)
library(sandwich)
# Load data
DEM_HET <- read_csv("data/DEM_HET.csv")
dim(DEM_HET)
[1] 30 5
Code
summary(DEM_HET)
PA PB PC QA
Min. : 89.15 Min. : 51.04 Min. : 85.85 Min. : 93.01
1st Qu.:102.52 1st Qu.: 74.72 1st Qu.:103.85 1st Qu.: 99.65
Median :111.77 Median : 81.49 Median :130.84 Median :105.93
Mean :109.02 Mean : 81.91 Mean :127.68 Mean :109.59
3rd Qu.:114.53 3rd Qu.: 90.13 3rd Qu.:145.07 3rd Qu.:117.28
Max. :124.85 Max. :111.66 Max. :182.04 Max. :140.97
Y
Min. :100.0
1st Qu.:139.5
Median :188.4
Mean :180.6
3rd Qu.:221.0
Max. :254.9
Code
# OLS estimation
modelo_MCO <- lm(log(QA) ~ log(PA) + log(PB) + log(PC) + log(Y),
                 data = DEM_HET)
summary(modelo_MCO)
Call:
lm(formula = log(QA) ~ log(PA) + log(PB) + log(PC) + log(Y),
data = DEM_HET)
Residuals:
Min 1Q Median 3Q Max
-0.11826 -0.04692 -0.01407 0.03830 0.20191
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.2747 1.2052 3.547 0.001570 **
log(PA) -0.8901 0.2157 -4.126 0.000358 ***
log(PB) 0.5542 0.1300 4.262 0.000252 ***
log(PC) -0.3491 0.1831 -1.906 0.068154 .
log(Y) 0.7440 0.1352 5.503 1.02e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.07619 on 25 degrees of freedom
Multiple R-squared: 0.6643, Adjusted R-squared: 0.6106
F-statistic: 12.37 on 4 and 25 DF, p-value: 1.105e-05
Code
# Testing the homoscedasticity hypothesis
# Standard (automatic) tests: Z's -> the model's explanatory variables
# White test
white(modelo_MCO)
# A tibble: 1 × 5
statistic p.value parameter method alternative
<dbl> <dbl> <dbl> <chr> <chr>
1 7.90 0.443 8 White's Test greater
Code
white(modelo_MCO, interactions=TRUE)
# A tibble: 1 × 5
statistic p.value parameter method alternative
<dbl> <dbl> <dbl> <chr> <chr>
1 12.9 0.535 14 White's Test greater
Code
# Glejser test
glejser(modelo_MCO)
# A tibble: 1 × 4
statistic p.value parameter alternative
<dbl> <dbl> <dbl> <chr>
1 6.65 0.156 4 greater
Code
# Breusch-Pagan test
breusch_pagan(modelo_MCO)
# A tibble: 1 × 5
statistic p.value parameter method alternative
<dbl> <dbl> <dbl> <chr> <chr>
1 4.38 0.357 4 Koenker (studentised) greater
Code
# Specific tests: selected Z's
# Breusch-Pagan test (manual)
DEM_HET$resid2 <- resid(modelo_MCO)^2
# Graphical analysis
ggplot(DEM_HET, aes(x = log(Y), y = resid2)) +
geom_point(color = "blue") +
geom_smooth(method='lm', formula = y~x, se = TRUE,
color = "blue", linetype = "dashed") +
xlab("log(Y)") +
ylab("resid2") +
theme_minimal()
Code
# Auxiliary regression
summary(modelo_resid2 <- lm(resid2 ~ log(Y), data=DEM_HET))
Call:
lm(formula = resid2 ~ log(Y), data = DEM_HET)
Residuals:
Min 1Q Median 3Q Max
-0.007864 -0.004221 -0.002092 0.001574 0.032720
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05395 0.02832 -1.905 0.0671 .
log(Y) 0.01139 0.00548 2.078 0.0469 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.008199 on 28 degrees of freedom
Multiple R-squared: 0.1337, Adjusted R-squared: 0.1027
F-statistic: 4.32 on 1 and 28 DF, p-value: 0.04695
Code
N <- nobs(modelo_resid2)
p <- 1 # Number of Z's in the auxiliary regression, excluding the constant
sm_resid2 <- summary(modelo_resid2)
R2_m_resid2 <- sm_resid2$r.squared
BP <- N * R2_m_resid2
# Chi-squared test
pval <- 1 - pchisq(BP, p)
BP; pval
[1] 4.009848
[1] 0.04523524
Code
# Breusch-Pagan test (automatic; the default is Koenker's studentised variant)
bptest(modelo_MCO, varformula = ~ log(Y), data=DEM_HET)
studentized Breusch-Pagan test
data: modelo_MCO
BP = 4.0098, df = 1, p-value = 0.04524
Code
# Original (non-studentised) Breusch-Pagan test
bptest(modelo_MCO, varformula = ~ log(Y), data=DEM_HET, studentize = FALSE)
Breusch-Pagan test
data: modelo_MCO
BP = 6.2053, df = 1, p-value = 0.01274
Code
# Basic heteroscedasticity correction: robust covariance matrix
# White's correction (sandwich package)
coeftest(modelo_MCO, vcov. = vcovHC(modelo_MCO,type="HC1"))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.27468 0.89931 4.7533 7.062e-05 ***
log(PA) -0.89012 0.13561 -6.5639 7.083e-07 ***
log(PB) 0.55419 0.08331 6.6521 5.704e-07 ***
log(PC) -0.34909 0.15058 -2.3182 0.02891 *
log(Y) 0.74395 0.11879 6.2627 1.494e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# Another version, with identical results (car package)
coeftest(modelo_MCO, vcov.=hccm(modelo_MCO, type = "hc1"))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.27468 0.89931 4.7533 7.062e-05 ***
log(PA) -0.89012 0.13561 -6.5639 7.083e-07 ***
log(PB) 0.55419 0.08331 6.6521 5.704e-07 ***
log(PC) -0.34909 0.15058 -2.3182 0.02891 *
log(Y) 0.74395 0.11879 6.2627 1.494e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# Advanced correction: weighted least squares (WLS <-> FGLS)
# Auxiliary regression for the log of the estimated variance
summary(modelo_l_resid2 <- lm(log(resid2) ~ log(Y), data=DEM_HET))
Call:
lm(formula = log(resid2) ~ log(Y), data = DEM_HET)
Residuals:
Min 1Q Median 3Q Max
-5.5616 -0.6447 0.1990 1.7185 3.2886
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -26.658 7.565 -3.524 0.00148 **
log(Y) 3.846 1.464 2.628 0.01379 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.19 on 28 degrees of freedom
Multiple R-squared: 0.1978, Adjusted R-squared: 0.1692
F-statistic: 6.905 on 1 and 28 DF, p-value: 0.01379
Code
# Estimate of the (non-constant) residual variance
sigma2 <- exp(fitted(modelo_l_resid2))
# OLS with weights
summary(modelo_MCP <- lm(log(QA) ~ log(PA) + log(PB) + log(PC) + log(Y),
                         weights = 1/sigma2, data = DEM_HET))
Call:
lm(formula = log(QA) ~ log(PA) + log(PB) + log(PC) + log(Y),
data = DEM_HET, weights = 1/sigma2)
Weighted Residuals:
Min 1Q Median 3Q Max
-5.2922 -0.9431 -0.1967 1.4063 3.8662
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.48077 0.87248 5.136 2.62e-05 ***
log(PA) -0.90716 0.14931 -6.076 2.38e-06 ***
log(PB) 0.54594 0.10265 5.318 1.64e-05 ***
log(PC) -0.24882 0.12394 -2.007 0.0556 .
log(Y) 0.63130 0.08844 7.138 1.76e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.026 on 25 degrees of freedom
Multiple R-squared: 0.7734, Adjusted R-squared: 0.7372
F-statistic: 21.33 on 4 and 25 DF, p-value: 9.299e-08
Code
# Load libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
import statsmodels.stats as smstats
import statsmodels.stats.diagnostic as smsdiag
import matplotlib.pyplot as plt
import seaborn as sns
# Load data
DEM_HET = pd.read_csv('data/DEM_HET.csv')
DEM_HET.describe()
PA PB PC QA Y
count 30.000000 30.000000 30.000000 30.000000 30.000000
mean 109.024334 81.911667 127.679001 109.594333 180.644667
std 8.747229 12.729947 24.977371 13.852708 46.540796
min 89.150002 51.040001 85.849998 93.010002 100.000000
25% 102.519999 74.720001 103.850002 99.652502 139.475002
50% 111.770000 81.489998 130.834999 105.930000 188.370003
75% 114.527502 90.132498 145.067497 117.282501 221.032501
max 124.849998 111.660004 182.039993 140.970001 254.949997
Code
# OLS estimation
modelo1 = smf.ols(formula =
    "np.log(QA) ~ np.log(PA) + np.log(PB) + np.log(PC) + np.log(Y)",
    data = DEM_HET)
modelo_MCO = modelo1.fit()
print(modelo_MCO.summary())
OLS Regression Results
==============================================================================
Dep. Variable: np.log(QA) R-squared: 0.664
Model: OLS Adj. R-squared: 0.611
Method: Least Squares F-statistic: 12.37
Date: Sun, 09 Feb 2025 Prob (F-statistic): 1.10e-05
Time: 14:12:15 Log-Likelihood: 37.403
No. Observations: 30 AIC: -64.81
Df Residuals: 25 BIC: -57.80
Df Model: 4
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 4.2747 1.205 3.547 0.002 1.792 6.757
np.log(PA) -0.8901 0.216 -4.126 0.000 -1.334 -0.446
np.log(PB) 0.5542 0.130 4.262 0.000 0.286 0.822
np.log(PC) -0.3491 0.183 -1.906 0.068 -0.726 0.028
np.log(Y) 0.7440 0.135 5.503 0.000 0.466 1.022
==============================================================================
Omnibus: 8.115 Durbin-Watson: 2.471
Prob(Omnibus): 0.017 Jarque-Bera (JB): 6.392
Skew: 0.989 Prob(JB): 0.0409
Kurtosis: 4.095 Cond. No. 840.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Code
# Heteroscedasticity tests
name = ['LM statistic', 'Chi^2 p-val', 'F statistic', 'F p-val']
# White test
W_test = smsdiag.het_white(modelo_MCO.resid, modelo_MCO.model.exog)
print(pd.DataFrame([np.round(W_test, 8)], columns=name))
LM statistic Chi^2 p-val F statistic F p-val
0 12.888845 0.535294 0.807045 0.653204
Code
# Breusch-Pagan test
BP_test = smsdiag.het_breuschpagan(modelo_MCO.resid, modelo_MCO.model.exog)
print(pd.DataFrame([np.round(BP_test, 8)], columns=name))
LM statistic Chi^2 p-val F statistic F p-val
0 4.382204 0.356744 1.069131 0.392569
Code
# OLS-HC estimation
modelo1 = smf.ols(formula =
    "np.log(QA) ~ np.log(PA) + np.log(PB) + np.log(PC) + np.log(Y)",
    data = DEM_HET)
# HC1 -> degrees-of-freedom correction of White's formula
modelo_MCO_HC1 = modelo1.fit(cov_type='HC1')
print(modelo_MCO_HC1.summary())
OLS Regression Results
==============================================================================
Dep. Variable: np.log(QA) R-squared: 0.664
Model: OLS Adj. R-squared: 0.611
Method: Least Squares F-statistic: 26.61
Date: Sun, 09 Feb 2025 Prob (F-statistic): 1.09e-08
Time: 14:12:15 Log-Likelihood: 37.403
No. Observations: 30 AIC: -64.81
Df Residuals: 25 BIC: -57.80
Df Model: 4
Covariance Type: HC1
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 4.2747 0.899 4.753 0.000 2.512 6.037
np.log(PA) -0.8901 0.136 -6.564 0.000 -1.156 -0.624
np.log(PB) 0.5542 0.083 6.652 0.000 0.391 0.717
np.log(PC) -0.3491 0.151 -2.318 0.020 -0.644 -0.054
np.log(Y) 0.7440 0.119 6.263 0.000 0.511 0.977
==============================================================================
Omnibus: 8.115 Durbin-Watson: 2.471
Prob(Omnibus): 0.017 Jarque-Bera (JB): 6.392
Skew: 0.989 Prob(JB): 0.0409
Kurtosis: 4.095 Cond. No. 840.
==============================================================================
Notes:
[1] Standard Errors are heteroscedasticity robust (HC1)
Code
# HC2 -> small-sample (leverage) correction of White's formula
modelo_MCO_HC2 = modelo1.fit(cov_type='HC2')
print(modelo_MCO_HC2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: np.log(QA) R-squared: 0.664
Model: OLS Adj. R-squared: 0.611
Method: Least Squares F-statistic: 25.98
Date: Sun, 09 Feb 2025 Prob (F-statistic): 1.38e-08
Time: 14:12:15 Log-Likelihood: 37.403
No. Observations: 30 AIC: -64.81
Df Residuals: 25 BIC: -57.80
Df Model: 4
Covariance Type: HC2
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 4.2747 0.904 4.727 0.000 2.502 6.047
np.log(PA) -0.8901 0.138 -6.470 0.000 -1.160 -0.620
np.log(PB) 0.5542 0.085 6.488 0.000 0.387 0.722
np.log(PC) -0.3491 0.151 -2.308 0.021 -0.646 -0.053
np.log(Y) 0.7440 0.119 6.275 0.000 0.512 0.976
==============================================================================
Omnibus: 8.115 Durbin-Watson: 2.471
Prob(Omnibus): 0.017 Jarque-Bera (JB): 6.392
Skew: 0.989 Prob(JB): 0.0409
Kurtosis: 4.095 Cond. No. 840.
==============================================================================
Notes:
[1] Standard Errors are heteroscedasticity robust (HC2)
Code
# WLS (FGLS) estimation
# Graphical analysis
DEM_HET['resid2'] = (modelo_MCO.resid)**2
sns.regplot(x=np.log(DEM_HET['Y']), y=DEM_HET['resid2'])
plt.show()
Code
# Auxiliary regression for the log of the estimated variance
reg_aux_resid2 = smf.ols(formula = "np.log(resid2) ~ np.log(Y)", data = DEM_HET)
modelo_l_resid2 = reg_aux_resid2.fit()
print(modelo_l_resid2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: np.log(resid2) R-squared: 0.198
Model: OLS Adj. R-squared: 0.169
Method: Least Squares F-statistic: 6.905
Date: Sun, 09 Feb 2025 Prob (F-statistic): 0.0138
Time: 14:12:15 Log-Likelihood: -65.049
No. Observations: 30 AIC: 134.1
Df Residuals: 28 BIC: 136.9
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -26.6579 7.565 -3.524 0.001 -42.154 -11.161
np.log(Y) 3.8463 1.464 2.628 0.014 0.848 6.845
==============================================================================
Omnibus: 3.869 Durbin-Watson: 2.164
Prob(Omnibus): 0.144 Jarque-Bera (JB): 2.972
Skew: -0.771 Prob(JB): 0.226
Kurtosis: 3.035 Cond. No. 101.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Code
# Estimate of the (non-constant) residual variance
sigma2 = np.exp(modelo_l_resid2.fittedvalues)
w = list(1/sigma2)
# OLS with weights
modelo2 = smf.wls(formula =
    "np.log(QA) ~ np.log(PA) + np.log(PB) + np.log(PC) + np.log(Y)",
    weights = w, data = DEM_HET)
modelo_MCP = modelo2.fit()
print(modelo_MCP.summary())
WLS Regression Results
==============================================================================
Dep. Variable: np.log(QA) R-squared: 0.773
Model: WLS Adj. R-squared: 0.737
Method: Least Squares F-statistic: 21.33
Date: Sun, 09 Feb 2025 Prob (F-statistic): 9.30e-08
Time: 14:12:15 Log-Likelihood: 41.087
No. Observations: 30 AIC: -72.17
Df Residuals: 25 BIC: -65.17
Df Model: 4
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 4.4808 0.872 5.136 0.000 2.684 6.278
np.log(PA) -0.9072 0.149 -6.076 0.000 -1.215 -0.600
np.log(PB) 0.5459 0.103 5.318 0.000 0.335 0.757
np.log(PC) -0.2488 0.124 -2.007 0.056 -0.504 0.006
np.log(Y) 0.6313 0.088 7.138 0.000 0.449 0.813
==============================================================================
Omnibus: 2.395 Durbin-Watson: 2.584
Prob(Omnibus): 0.302 Jarque-Bera (JB): 1.170
Skew: -0.196 Prob(JB): 0.557
Kurtosis: 3.884 Cond. No. 899.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.