Aplicación 3.1: Evaluación y validación del modelo de regresión lineal
Ventas en una cadena de supermercados (datos de corte transversal)
En esta aplicación se analizarán las ventas de una cadena de 75 supermercados ubicados en distintas ciudades españolas. Las ventas totales en cada tienda dependen básicamente de dos variables, el precio medio de los productos que se venden en la misma y el gasto en publicidad que se realice en cada ciudad. El objetivo de la regresión estimada consiste en determinar el efecto que tendrán sobre las ventas de cada supermercado distintas políticas de precios, así como distintas opciones de gasto en publicidad.
El modelo econométrico que se utilizará para el análisis de simulación viene dado por la siguiente ecuación:
\[ V_{i} = \beta_1 + \beta_2 P_{i} + \beta_3 A_{i} + e_{i} \]
donde \(V_{i}\) representan los ingresos mensuales por ventas en el supermercado de cada ciudad, \(P_{i}\) el índice ponderado de precios de todos los productos que se venden en el mismo, y \(A_{i}\) los gastos mensuales en publicidad en la ciudad para promocionar la tienda. \(V\) y \(A\) se miden en miles de euros, mientras que \(P\) se mide en euros.
Code
# Lectura de librerías
library(tidyverse)
library(alr4)
library(gvlma)
library(performance)
library(lmtest)
library(car)
# Lectura de datos
<- read_csv("data/VENTAS_SUPER.csv")
VENTAS # Estructura de la base de datos
str(VENTAS)
spc_tbl_ [75 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ A: num [1:75] 1.3 2.9 0.8 0.7 1.5 1.3 1.8 2.4 0.7 3 ...
$ P: num [1:75] 5.69 6.49 5.63 6.22 5.02 6.41 5.85 5.41 6.24 6.2 ...
$ V: num [1:75] 73.2 71.8 62.4 67.4 89.3 70.3 73.2 86.1 81 76.4 ...
- attr(*, "spec")=
.. cols(
.. A = col_double(),
.. P = col_double(),
.. V = col_double()
.. )
- attr(*, "problems")=<externalptr>
Code
dim(VENTAS)
[1] 75 3
Code
head(VENTAS)
# A tibble: 6 × 3
A P V
<dbl> <dbl> <dbl>
1 1.3 5.69 73.2
2 2.9 6.49 71.8
3 0.8 5.63 62.4
4 0.7 6.22 67.4
5 1.5 5.02 89.3
6 1.3 6.41 70.3
Code
tail(VENTAS)
# A tibble: 6 × 3
A P V
<dbl> <dbl> <dbl>
1 2.2 6.02 73.7
2 1.7 5.73 82.2
3 0.7 5.11 74.2
4 0.7 5.71 75.4
5 2 5.45 81.3
6 2.2 6.05 75
Code
summary(VENTAS)
A P V
Min. :0.500 Min. :4.830 Min. :62.40
1st Qu.:1.100 1st Qu.:5.220 1st Qu.:73.20
Median :1.800 Median :5.690 Median :76.50
Mean :1.844 Mean :5.687 Mean :77.37
3rd Qu.:2.700 3rd Qu.:6.210 3rd Qu.:82.20
Max. :3.100 Max. :6.490 Max. :91.20
Code
# Matriz de diagramas de puntos y gráficas parciales
scatterplotMatrix(~ V + P + A, data=VENTAS)
Code
scatterplot(V ~ P, data=VENTAS,
smooth=list(smoother=loessLine, var=FALSE, lwd.smooth=3),
col="blue", regLine=list(lwd=3))
Code
scatterplot(V ~ A, data=VENTAS,
smooth=list(smoother=loessLine, var=FALSE, lwd.smooth=3),
col="blue", regLine=list(lwd=3))
Code
# Modelo de ventas
<- lm(V ~ P + A, data=VENTAS)
modelo_ventas_1 summary(modelo_ventas_1)
Call:
lm(formula = V ~ P + A, data = VENTAS)
Residuals:
Min 1Q Median 3Q Max
-13.4825 -3.1434 -0.3456 2.8754 11.3049
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 118.9136 6.3516 18.722 < 2e-16 ***
P -7.9079 1.0960 -7.215 4.42e-10 ***
A 1.8626 0.6832 2.726 0.00804 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.886 on 72 degrees of freedom
Multiple R-squared: 0.4483, Adjusted R-squared: 0.4329
F-statistic: 29.25 on 2 and 72 DF, p-value: 5.041e-10
Code
confint(modelo_ventas_1, level=.95)
2.5 % 97.5 %
(Intercept) 106.251852 131.575368
P -10.092676 -5.723032
A 0.500659 3.224510
Code
# Gráficos de efectos (Effects plots)
plot(allEffects(modelo_ventas_1), grid=TRUE, rug=TRUE)
Code
# Diagnósticos de la regresión
# Validación global de las hipótesis básicas del MRL
<- gvlma(modelo_ventas_1)
gvmodel summary(gvmodel)
Call:
lm(formula = V ~ P + A, data = VENTAS)
Residuals:
Min 1Q Median 3Q Max
-13.4825 -3.1434 -0.3456 2.8754 11.3049
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 118.9136 6.3516 18.722 < 2e-16 ***
P -7.9079 1.0960 -7.215 4.42e-10 ***
A 1.8626 0.6832 2.726 0.00804 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.886 on 72 degrees of freedom
Multiple R-squared: 0.4483, Adjusted R-squared: 0.4329
F-statistic: 29.25 on 2 and 72 DF, p-value: 5.041e-10
ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance = 0.05
Call:
gvlma(x = modelo_ventas_1)
Value p-value Decision
Global Stat 1.63722 0.8021 Assumptions acceptable.
Skewness 0.06397 0.8003 Assumptions acceptable.
Kurtosis 0.09499 0.7579 Assumptions acceptable.
Link Function 0.79527 0.3725 Assumptions acceptable.
Heteroscedasticity 0.68299 0.4086 Assumptions acceptable.
Code
# Chequeo general del modelo estimado
model_performance(modelo_ventas_1)
# Indices of model performance
AIC | AICc | BIC | R2 | R2 (adj.) | RMSE | Sigma
---------------------------------------------------------------
455.739 | 456.310 | 465.009 | 0.448 | 0.433 | 4.787 | 4.886
Code
check_model(modelo_ventas_1)
Code
# Especificación del modelo
# Adecuación de la forma funcional: test RESET de Ramsey
resettest(modelo_ventas_1, power=2, type="fitted")
RESET test
data: modelo_ventas_1
RESET = 0.76092, df1 = 1, df2 = 71, p-value = 0.386
Code
resettest(modelo_ventas_1, power=2:3, type="fitted")
RESET test
data: modelo_ventas_1
RESET = 0.53369, df1 = 2, df2 = 70, p-value = 0.5888
Code
# Análisis gráfico de la hipótesis de linealidad
# Residuos estandarizados (Pearson)
residualPlots(modelo_ventas_1)
Test stat Pr(>|Test stat|)
P 1.0183 0.312016
A -2.9427 0.004393 **
Tukey test 0.8723 0.383040
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# Gráficos de variables añadidas (AV plots)
# (también denominados de regresiones parciales)
avPlots(modelo_ventas_1)
Code
# Gráficos componente+residuos
# (Component-plus-Residual o Partial-residual plots)
crPlots(modelo_ventas_1)
Code
# Modelo generalizado: especificación cuadrática en A
<- lm(V ~ P + A + I(A^2), data=VENTAS)
modelo_ventas_2 summary(modelo_ventas_2)
Call:
lm(formula = V ~ P + A + I(A^2), data = VENTAS)
Residuals:
Min 1Q Median 3Q Max
-12.2553 -3.1430 -0.0117 2.8513 11.8050
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 109.7190 6.7990 16.137 < 2e-16 ***
P -7.6400 1.0459 -7.304 3.24e-10 ***
A 12.1512 3.5562 3.417 0.00105 **
I(A^2) -2.7680 0.9406 -2.943 0.00439 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.645 on 71 degrees of freedom
Multiple R-squared: 0.5082, Adjusted R-squared: 0.4875
F-statistic: 24.46 on 3 and 71 DF, p-value: 5.6e-11
Code
# Gráficos de efectos
plot(Effect("P", modelo_ventas_2))
Code
plot(Effect("A", modelo_ventas_2))
Code
# Comparación con el modelo básico
compareCoefs(modelo_ventas_1, modelo_ventas_2)
Calls:
1: lm(formula = V ~ P + A, data = VENTAS)
2: lm(formula = V ~ P + A + I(A^2), data = VENTAS)
Model 1 Model 2
(Intercept) 118.91 109.72
SE 6.35 6.80
P -7.91 -7.64
SE 1.10 1.05
A 1.863 12.151
SE 0.683 3.556
I(A^2) -2.768
SE 0.941
Code
anova(modelo_ventas_1, modelo_ventas_2)
Analysis of Variance Table
Model 1: V ~ P + A
Model 2: V ~ P + A + I(A^2)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 72 1718.9
2 71 1532.1 1 186.86 8.6594 0.004393 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
compare_performance(modelo_ventas_1, modelo_ventas_2, rank = TRUE)
# Comparison of Model Performance Indices
Name | Model | R2 | R2 (adj.) | RMSE | Sigma | AIC weights
-------------------------------------------------------------------------
modelo_ventas_2 | lm | 0.508 | 0.487 | 4.520 | 4.645 | 0.965
modelo_ventas_1 | lm | 0.448 | 0.433 | 4.787 | 4.886 | 0.035
Name | AICc weights | BIC weights | Performance-Score
----------------------------------------------------------------
modelo_ventas_2 | 0.960 | 0.896 | 100.00%
modelo_ventas_1 | 0.040 | 0.104 | 0.00%
Code
plot(compare_performance(modelo_ventas_1, modelo_ventas_2, rank = TRUE))
Code
test_performance(modelo_ventas_1, modelo_ventas_2)
Name | Model | BF | Omega2 | p (Omega2) | LR | p (LR)
--------------------------------------------------------------------
modelo_ventas_1 | lm | | | | |
modelo_ventas_2 | lm | 8.64 | 0.09 | 0.005 | 8.63 | 0.002
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Code
test_wald(modelo_ventas_1, modelo_ventas_2)
Name | Model | df | df_diff | F | p
-----------------------------------------------------
modelo_ventas_1 | lm | 72 | | |
modelo_ventas_2 | lm | 71 | 1.00 | 8.66 | 0.004
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Code
test_bf(modelo_ventas_1, modelo_ventas_2)
Bayes Factors for Model Comparison
Model BF
[modelo_ventas_2] P + A + I(A^2) 8.64
* Against Denominator: [modelo_ventas_1] P + A
* Bayes Factor Type: BIC approximation
Code
test_vuong(modelo_ventas_1, modelo_ventas_2)
Name | Model | Omega2 | p (Omega2) | LR | p (LR)
-------------------------------------------------------------
modelo_ventas_1 | lm | | | |
modelo_ventas_2 | lm | 0.09 | 0.005 | 8.63 | 0.002
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Code
# Lectura de librerías
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
import statsmodels.stats as smstats
import statsmodels.stats.diagnostic as smsdiag
from statsmodels.stats.outliers_influence import reset_ramsey
from statsmodels.compat import lzip
import scipy.stats as scs
# Lectura de datos
= pd.read_csv("data/VENTAS_SUPER.csv")
VENTAS # Estructura de la base de datos
VENTAS.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75 entries, 0 to 74
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 A 75 non-null float64
1 P 75 non-null float64
2 V 75 non-null float64
dtypes: float64(3)
memory usage: 1.9 KB
Code
VENTAS.head()
A P V
0 1.3 5.69 73.2
1 2.9 6.49 71.8
2 0.8 5.63 62.4
3 0.7 6.22 67.4
4 1.5 5.02 89.3
Code
VENTAS.tail()
A P V
70 1.7 5.73 82.2
71 0.7 5.11 74.2
72 0.7 5.71 75.4
73 2.0 5.45 81.3
74 2.2 6.05 75.0
Code
VENTAS.describe()
A P V
count 75.000000 75.000000 75.000000
mean 1.844000 5.687200 77.374667
std 0.831677 0.518432 6.488537
min 0.500000 4.830000 62.400000
25% 1.100000 5.220000 73.200000
50% 1.800000 5.690000 76.500000
75% 2.700000 6.210000 82.200000
max 3.100000 6.490000 91.200000
Code
# Modelo de ventas
= 'V ~ P + A'
formula = smf.ols(formula, VENTAS).fit()
modelo_ventas_1 print(modelo_ventas_1.summary())
OLS Regression Results
==============================================================================
Dep. Variable: V R-squared: 0.448
Model: OLS Adj. R-squared: 0.433
Method: Least Squares F-statistic: 29.25
Date: Sun, 09 Feb 2025 Prob (F-statistic): 5.04e-10
Time: 13:46:40 Log-Likelihood: -223.87
No. Observations: 75 AIC: 453.7
Df Residuals: 72 BIC: 460.7
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 118.9136 6.352 18.722 0.000 106.252 131.575
P -7.9079 1.096 -7.215 0.000 -10.093 -5.723
A 1.8626 0.683 2.726 0.008 0.501 3.225
==============================================================================
Omnibus: 0.535 Durbin-Watson: 2.183
Prob(Omnibus): 0.765 Jarque-Bera (JB): 0.159
Skew: -0.072 Prob(JB): 0.924
Kurtosis: 3.174 Cond. No. 69.5
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Code
# Especificación del modelo
# Adecuación de la forma funcional: test RESET de Ramsey
=2) reset_ramsey(modelo_ventas_1,degree
<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=0.7609212886079043, p=0.3859807963132884, df_denom=71, df_num=1>
Code
=3) reset_ramsey(modelo_ventas_1,degree
<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=0.5336864235712575, p=0.5888060456090185, df_denom=70, df_num=2>
Code
# Análisis gráfico de la hipótesis de linealidad
# Gráficos de variables añadidas (AV plots)
= sm.graphics.plot_partregress_grid(modelo_ventas_1)
fig =1.0)
fig.tight_layout(pad plt.show()
Code
# Gráficos componente+residuos
= sm.graphics.plot_ccpr_grid(modelo_ventas_1)
fig =1.0)
fig.tight_layout(pad plt.show()
Code
# Diagnósticos de regresión de una sola variable
= sm.graphics.plot_regress_exog(modelo_ventas_1, "P")
fig =1.0)
fig.tight_layout(pad plt.show()
Code
= sm.graphics.plot_regress_exog(modelo_ventas_1, "A")
fig =1.0)
fig.tight_layout(pad plt.show()
Code
# Modelo generalizado: especificación cuadrática en A
= 'V ~ P + A + + I(A**2)'
formula = smf.ols(formula, VENTAS).fit()
modelo_ventas_2 print(modelo_ventas_2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: V R-squared: 0.508
Model: OLS Adj. R-squared: 0.487
Method: Least Squares F-statistic: 24.46
Date: Sun, 09 Feb 2025 Prob (F-statistic): 5.60e-11
Time: 13:46:42 Log-Likelihood: -219.55
No. Observations: 75 AIC: 447.1
Df Residuals: 71 BIC: 456.4
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 109.7190 6.799 16.137 0.000 96.162 123.276
P -7.6400 1.046 -7.304 0.000 -9.726 -5.554
A 12.1512 3.556 3.417 0.001 5.060 19.242
I(A ** 2) -2.7680 0.941 -2.943 0.004 -4.644 -0.892
==============================================================================
Omnibus: 1.004 Durbin-Watson: 2.043
Prob(Omnibus): 0.605 Jarque-Bera (JB): 0.455
Skew: -0.088 Prob(JB): 0.797
Kurtosis: 3.339 Cond. No. 101.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
El modelo APT de valoración de activos (datos de series temporales)
En esta aplicación se estimará un modelo de regresión basado en la teoría de los precios de arbitraje. Desde el punto de vista econométrico, se trata de una generalización de la especificación CAPM expuesta en la aplicación 2.1, a la que se le añaden factores macroeconómicos como variables explicativas adicionales.
Definamos la rentabilidad de un activo financiero (en porcentaje) como \(R=100×log[(p_{1}+d)/p_{0}\), donde \(p_{1}\) y \(p_{0}\) son, respectivamente, los precios de cotización del valor (título u obligación) al final y al principio de un período de tiempo y \(d\) es el dividendo cobrado (si lo hay) durante ese período. Denominemos por \(Rf\) al rendimiento de un activo libre de riesgo. Finalmente, denotemos por \(Rm\) a la rentabilidad que ofrece la cartera de mercado.
En este ejercicio se va a estimar el modelo APT con datos de series temporales de carácter mensual para el período que va desde marzo de 1986 hasta marzo de 2018. Se utilizará como rentabilidad de referencia la correspondiente a las acciones de la empresa Microsoft, que denotaremos por \(R\_MICROSOFT\). Como rentabilidad sin riesgo, se usará el tipo de interés correspondiente a las letras del tesoro estadounidenses a tres meses, \(R\_USTB3M\), Y, finalmente, como rentabilidad del mercado se tomará la asociada al índice Standard & Poor’s 500, \(R\_SP500\). Las variables básicas que deben utilizarse en el modelo de regresión son, entonces, los ‘excesos de rentabilidad’, \(erMSOFT\)=(\(R\_MICROSOFT\)-\(R\_USTB3M\)) y \(erSP\)=(\(R\_SP500\)-\(R\_USTB3M\)).
La ecuación de valoración básica del modelo APT puede definirse del siguiente modo:
\[ erMSOFT_{t} = \beta_1 + \beta_2 erSP_{t} + \gamma_1 F_{1,t} + \gamma_2 F_{2,t} + ...+ \gamma_m F_{m,t} + e_{t} \]
donde las variables \(F_{j}\) representan distintos factores macroeconómicos que pueden afectar a la rentabilidad observada del título cada período.
Code
# Lectura de librerías
library(tidyverse)
library(readxl)
# Lectura de datos
<- read_excel("data/APT_MICROSOFT.xls")
apt # Estructura de la base de datos
str(apt)
tibble [385 × 10] (S3: tbl_df/tbl/data.frame)
$ Date : POSIXct[1:385], format: "1986-03-01" "1986-04-01" ...
$ MICROSOFT: num [1:385] 0.0955 0.112 0.1215 0.1068 0.099 ...
$ SANDP : num [1:385] 239 236 247 251 236 ...
$ CPI : num [1:385] 109 109 109 110 110 ...
$ INDPRO : num [1:385] 56.5 56.6 56.7 56.5 56.8 ...
$ M1SUPPLY : num [1:385] 624 647 646 663 673 ...
$ CCREDIT : num [1:385] 607 614 622 628 634 ...
$ BMINUSA : num [1:385] 1.5 1.4 1.2 1.21 1.28 ...
$ USTB3M : num [1:385] 6.76 6.24 6.33 6.4 6 5.69 5.35 5.32 5.5 5.68 ...
$ USTB10Y : num [1:385] 7.78 7.3 7.71 7.8 7.3 7.17 7.45 7.43 7.25 7.11 ...
Code
dim(apt)
[1] 385 10
Code
head(apt)
# A tibble: 6 × 10
Date MICROSOFT SANDP CPI INDPRO M1SUPPLY CCREDIT BMINUSA
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1986-03-01 00:00:00 0.0955 239. 109. 56.5 624. 607. 1.5
2 1986-04-01 00:00:00 0.112 236. 109. 56.6 647 614. 1.4
3 1986-05-01 00:00:00 0.122 247. 109. 56.7 646. 622. 1.2
4 1986-06-01 00:00:00 0.107 251. 110. 56.5 663. 628. 1.21
5 1986-07-01 00:00:00 0.0990 236. 110. 56.8 673. 634. 1.28
6 1986-08-01 00:00:00 0.0990 253. 110. 56.7 678. 641. 1.46
# ℹ 2 more variables: USTB3M <dbl>, USTB10Y <dbl>
Code
tail(apt)
# A tibble: 6 × 10
Date MICROSOFT SANDP CPI INDPRO M1SUPPLY CCREDIT BMINUSA
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2017-10-01 00:00:00 83.2 2575. 247. 105. 3582. 3771. 0.72
2 2017-11-01 00:00:00 84.2 2585. 247. 105. 3581. 3805. 0.7
3 2017-12-01 00:00:00 85.5 2674. 247. 106. 3636. 3835. 0.71
4 2018-01-01 00:00:00 95.0 2824. 248. 105. 3635. 3843. 0.71
5 2018-02-01 00:00:00 93.8 2714. 249. 106. 3557. 3827. 0.69
6 2018-03-01 00:00:00 91.3 2641. 250. 107. 3685. 3824. 0.77
# ℹ 2 more variables: USTB3M <dbl>, USTB10Y <dbl>
Code
summary(apt[,2:10])
MICROSOFT SANDP CPI INDPRO
Min. : 0.09549 Min. : 230.3 Min. :108.6 Min. : 56.50
1st Qu.: 2.89453 1st Qu.: 459.3 1st Qu.:147.2 1st Qu.: 69.48
Median :25.72000 Median :1104.5 Median :178.8 Median : 93.00
Mean :23.30104 Mean :1066.0 Mean :181.1 Mean : 86.63
3rd Qu.:30.86000 3rd Qu.:1385.6 3rd Qu.:218.2 3rd Qu.:100.72
Max. :95.01000 Max. :2823.8 Max. :249.6 Max. :106.66
M1SUPPLY CCREDIT BMINUSA USTB3M
Min. : 624.3 Min. : 606.8 Min. :0.5500 Min. :0.010
1st Qu.:1069.3 1st Qu.: 886.2 1st Qu.:0.7200 1st Qu.:0.450
Median :1191.8 Median :1891.8 Median :0.9000 Median :3.440
Mean :1514.7 Mean :1897.8 Mean :0.9746 Mean :3.297
3rd Qu.:1716.0 3rd Qu.:2620.5 3rd Qu.:1.1300 3rd Qu.:5.290
Max. :3684.7 Max. :3843.4 Max. :3.3800 Max. :9.140
USTB10Y
Min. :1.500
1st Qu.:3.330
Median :4.910
Mean :5.075
3rd Qu.:6.740
Max. :9.520
Code
# Transformación de variables
$dspread = c(NA,diff(apt$BMINUSA))
apt$dcredit = c(NA,diff(apt$CCREDIT))
apt$dprod = c(NA,diff(apt$INDPRO))
apt$dmoney = c(NA,diff(apt$M1SUPPLY))
apt$inflation = c(NA,100*diff(log(apt$CPI)))
apt$rterm = c(NA,diff(apt$USTB10Y-apt$USTB3M))
apt$dinflation = c(NA,100*diff(apt$inflation))
apt$r_msoft = c(NA,100*diff(log(apt$MICROSOFT)))
apt$r_sp = c(NA,100*diff(log(apt$SANDP)))
apt$er_sp = c(NA,100*diff(log(apt$SANDP)))-apt$USTB3M/12
apt$er_msoft = c(NA,100*diff(log(apt$MICROSOFT)))-apt$USTB3M/12
apt# Formato a la fechas
$Date = as.Date(apt$Date)
apt# Estimación MCO del modelo APT
<- lm(er_msoft ~ er_sp + dprod + dcredit + dinflation +
APT_msft + dspread + rterm, data = apt)
dmoney # Formulación alternativa del modelo de series temporales
# apt_ts <- ts(apt, start=c(1986,3), end=c(2018,3), frequency = 12)
# APT_msft_ts <- lm(er_msoft ~ er_sp + dprod + dcredit + dinflation +
# dmoney + dspread + rterm, data = apt_ts)
# summary(APT_msft_ts)
#
# Validación global de las hipótesis básicas del MRL
library(gvlma)
<- gvlma(APT_msft)
gvmodel summary(gvmodel)
Call:
lm(formula = er_msoft ~ er_sp + dprod + dcredit + dinflation +
dmoney + dspread + rterm, data = apt)
Residuals:
Min 1Q Median 3Q Max
-36.075 -4.440 -0.403 4.616 24.480
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.326002 0.475481 2.789 0.00556 **
er_sp 1.280799 0.094354 13.574 < 2e-16 ***
dprod -0.303032 0.736881 -0.411 0.68113
dcredit -0.025364 0.027149 -0.934 0.35078
dinflation 0.021947 0.012643 1.736 0.08341 .
dmoney -0.006871 0.015568 -0.441 0.65919
dspread 2.260064 4.140284 0.546 0.58548
rterm 4.733069 1.715814 2.758 0.00609 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.845 on 375 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.3452, Adjusted R-squared: 0.333
F-statistic: 28.24 on 7 and 375 DF, p-value: < 2.2e-16
ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance = 0.05
Call:
gvlma(x = APT_msft)
Value p-value Decision
Global Stat 98.208657 0.000e+00 Assumptions NOT satisfied!
Skewness 0.002011 9.642e-01 Assumptions acceptable.
Kurtosis 63.503462 1.554e-15 Assumptions NOT satisfied!
Link Function 1.601550 2.057e-01 Assumptions acceptable.
Heteroscedasticity 33.101634 8.747e-09 Assumptions NOT satisfied!
Code
# Chequeo general del modelo estimado
library(performance)
model_performance(APT_msft)
# Indices of model performance
AIC | AICc | BIC | R2 | R2 (adj.) | RMSE | Sigma
------------------------------------------------------------------
2674.673 | 2675.155 | 2710.205 | 0.345 | 0.333 | 7.762 | 7.845
Code
check_model(APT_msft)
Code
# Especificación del modelo
# Contraste de significación de variables irrelevantes
library(car)
linearHypothesis(APT_msft,c("dprod=0","dcredit=0","dmoney=0","dspread=0"))
Linear hypothesis test:
dprod = 0
dcredit = 0
dmoney = 0
dspread = 0
Model 1: restricted model
Model 2: er_msoft ~ er_sp + dprod + dcredit + dinflation + dmoney + dspread +
rterm
Res.Df RSS Df Sum of Sq F Pr(>F)
1 379 23180
2 375 23078 4 101.88 0.4139 0.7986
Code
# Modelo restringido, sin las variables no significativas -> MCR
<- lm(er_msoft ~ er_sp + dinflation + rterm, data = apt)
APT_msft_r summary(APT_msft_r)
Call:
lm(formula = er_msoft ~ er_sp + dinflation + rterm, data = apt)
Residuals:
Min 1Q Median 3Q Max
-36.260 -4.330 -0.217 4.496 24.781
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.02089 0.40097 2.546 0.01129 *
er_sp 1.26628 0.09210 13.750 < 2e-16 ***
dinflation 0.02187 0.01208 1.810 0.07101 .
rterm 4.73880 1.70873 2.773 0.00582 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.821 on 379 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.3423, Adjusted R-squared: 0.3371
F-statistic: 65.75 on 3 and 379 DF, p-value: < 2.2e-16
Code
# Adecuación de la forma funcional: test RESET de Ramsey
library(lmtest)
resettest(APT_msft_r, power=2, type="fitted")
RESET test
data: APT_msft_r
RESET = 1.7054, df1 = 1, df2 = 378, p-value = 0.1924
Code
resettest(APT_msft_r, power=2:3, type="fitted")
RESET test
data: APT_msft_r
RESET = 1.373, df1 = 2, df2 = 377, p-value = 0.2546
Code
# Lectura de librerías
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
import statsmodels.stats as smstats
import statsmodels.stats.diagnostic as smsdiag
from statsmodels.stats.outliers_influence import reset_ramsey
from statsmodels.compat import lzip
import scipy.stats as scs
import matplotlib.pyplot as plt
import seaborn as sns
# Lectura de datos
= pd.read_excel('data/APT_MICROSOFT.xls', parse_dates=['Date'], index_col='Date')
apt # Estructura de la base de datos
apt.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 385 entries, 1986-03-01 to 2018-03-01
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MICROSOFT 385 non-null float64
1 SANDP 385 non-null float64
2 CPI 385 non-null float64
3 INDPRO 385 non-null float64
4 M1SUPPLY 385 non-null float64
5 CCREDIT 385 non-null float64
6 BMINUSA 385 non-null float64
7 USTB3M 385 non-null float64
8 USTB10Y 385 non-null float64
dtypes: float64(9)
memory usage: 30.1 KB
Code
apt.head()
MICROSOFT SANDP CPI ... BMINUSA USTB3M USTB10Y
Date ...
1986-03-01 0.095486 238.899994 108.8 ... 1.50 6.76 7.78
1986-04-01 0.111979 235.520004 108.6 ... 1.40 6.24 7.30
1986-05-01 0.121528 247.350006 108.9 ... 1.20 6.33 7.71
1986-06-01 0.106771 250.839996 109.5 ... 1.21 6.40 7.80
1986-07-01 0.098958 236.119995 109.5 ... 1.28 6.00 7.30
[5 rows x 9 columns]
Code
apt.tail()
MICROSOFT SANDP CPI ... BMINUSA USTB3M USTB10Y
Date ...
2017-11-01 84.169998 2584.840088 246.669 ... 0.70 1.25 2.35
2017-12-01 85.540001 2673.610107 246.524 ... 0.71 1.34 2.40
2018-01-01 95.010002 2823.810059 247.867 ... 0.71 1.43 2.58
2018-02-01 93.769997 2713.830078 248.991 ... 0.69 1.59 2.86
2018-03-01 91.269997 2640.870117 249.554 ... 0.77 1.73 2.84
[5 rows x 9 columns]
Code
apt.describe()
MICROSOFT SANDP CPI ... BMINUSA USTB3M USTB10Y
count 385.000000 385.000000 385.000000 ... 385.000000 385.000000 385.000000
mean 23.301038 1066.036103 181.062083 ... 0.974623 3.296909 5.075403
std 19.255768 602.397165 41.136433 ... 0.382047 2.589801 2.173512
min 0.095486 230.300003 108.600000 ... 0.550000 0.010000 1.500000
25% 2.894531 459.269989 147.200000 ... 0.720000 0.450000 3.330000
50% 25.719999 1104.489990 178.800000 ... 0.900000 3.440000 4.910000
75% 30.860001 1385.589966 218.178000 ... 1.130000 5.290000 6.740000
max 95.010002 2823.810059 249.554000 ... 3.380000 9.140000 9.520000
[8 rows x 9 columns]
Code
# Transformación de variables
def LogDiff(x):
= 100*np.log(x/x.shift(1))
x_diff return x_diff
'dspread'] = apt['BMINUSA'] - apt['BMINUSA'].shift(1)
apt['dcredit'] = apt['CCREDIT'] - apt['CCREDIT'].shift(1)
apt['dprod'] = apt['INDPRO'] - apt['INDPRO'].shift(1)
apt['r_msft'] = LogDiff(apt['MICROSOFT'])
apt['r_sp'] = LogDiff(apt['SANDP'])
apt['dmoney'] = apt['M1SUPPLY'] - apt['M1SUPPLY'].shift(1)
apt['inflation'] = LogDiff(apt['CPI'])
apt['term'] = apt['USTB10Y'] - apt['USTB3M']
apt['dinflation'] = LogDiff(apt['CPI']) - LogDiff(apt['CPI']).shift(1)
apt['mustb3m'] = apt['USTB3M']/12
apt['rterm'] = (apt['USTB10Y'] - apt['USTB3M']) - (apt['USTB10Y'] - apt['USTB3M']).shift(1)
apt['er_msoft'] = LogDiff(apt['MICROSOFT']) - apt['USTB3M']/12
apt['er_sp'] = LogDiff(apt['SANDP']) - apt['USTB3M']/12
apt[# Eliminación de observaciones no disponibles (NA)
= apt.dropna()
data # Estimación MCO del modelo APT
= 'er_msoft ~ er_sp + dprod + dcredit + dinflation + dmoney + dspread + rterm'
formula = smf.ols(formula, data).fit()
APT_msft print(APT_msft.summary())
OLS Regression Results
==============================================================================
Dep. Variable: er_msoft R-squared: 0.345
Model: OLS Adj. R-squared: 0.333
Method: Least Squares F-statistic: 28.24
Date: Sun, 09 Feb 2025 Prob (F-statistic): 3.52e-31
Time: 13:46:45 Log-Likelihood: -1328.3
No. Observations: 383 AIC: 2673.
Df Residuals: 375 BIC: 2704.
Df Model: 7
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.3260 0.475 2.789 0.006 0.391 2.261
er_sp 1.2808 0.094 13.574 0.000 1.095 1.466
dprod -0.3030 0.737 -0.411 0.681 -1.752 1.146
dcredit -0.0254 0.027 -0.934 0.351 -0.079 0.028
dinflation 2.1947 1.264 1.736 0.083 -0.291 4.681
dmoney -0.0069 0.016 -0.441 0.659 -0.037 0.024
dspread 2.2601 4.140 0.546 0.585 -5.881 10.401
rterm 4.7331 1.716 2.758 0.006 1.359 8.107
==============================================================================
Omnibus: 21.147 Durbin-Watson: 2.097
Prob(Omnibus): 0.000 Jarque-Bera (JB): 63.505
Skew: -0.006 Prob(JB): 1.62e-14
Kurtosis: 4.995 Cond. No. 293.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Code
# Especificación del modelo
# Contraste de significación de variables irrelevantes
= 'dprod = dcredit = dmoney = dspread = 0'
H_0 = APT_msft.f_test(H_0)
F_test print(F_test)
<F test: F=0.41387855952255137, p=0.7986453783395882, df_denom=375, df_num=4>
Code
# Modelo restringido, sin las variables no significativas -> MCR
= 'er_msoft ~ er_sp + dinflation + rterm'
formula = smf.ols(formula, data).fit()
APT_msft_r print(APT_msft_r.summary())
OLS Regression Results
==============================================================================
Dep. Variable: er_msoft R-squared: 0.342
Model: OLS Adj. R-squared: 0.337
Method: Least Squares F-statistic: 65.75
Date: Sun, 09 Feb 2025 Prob (F-statistic): 2.99e-34
Time: 13:46:45 Log-Likelihood: -1329.2
No. Observations: 383 AIC: 2666.
Df Residuals: 379 BIC: 2682.
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.0209 0.401 2.546 0.011 0.232 1.809
er_sp 1.2663 0.092 13.750 0.000 1.085 1.447
dinflation 2.1874 1.208 1.810 0.071 -0.188 4.563
rterm 4.7388 1.709 2.773 0.006 1.379 8.099
==============================================================================
Omnibus: 21.806 Durbin-Watson: 2.084
Prob(Omnibus): 0.000 Jarque-Bera (JB): 67.158
Skew: 0.007 Prob(JB): 2.61e-15
Kurtosis: 5.051 Cond. No. 18.7
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Code
# Adecuación de la forma funcional: test RESET de Ramsey
=2) reset_ramsey(APT_msft_r,degree
<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=1.70537226335334, p=0.19238105612636122, df_denom=378, df_num=1>
Code
=3) reset_ramsey(APT_msft_r,degree
<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=1.3730181364024425, p=0.2546050361892768, df_denom=377, df_num=2>