Aplicación 3.1: Evaluación y validación del modelo de regresión lineal

Ventas en una cadena de supermercados (datos de corte transversal)

En esta aplicación se analizarán las ventas de una cadena de 75 supermercados ubicados en distintas ciudades españolas. Las ventas totales en cada tienda dependen básicamente de dos variables, el precio medio de los productos que se venden en la misma y el gasto en publicidad que se realice en cada ciudad. El objetivo de la regresión estimada consiste en determinar el efecto que tendrán sobre las ventas de cada supermercado distintas políticas de precios, así como distintas opciones de gasto en publicidad.

El modelo econométrico que se utilizará para el análisis de simulación viene dado por la siguiente ecuación:

\[ V_{i} = \beta_1 + \beta_2 P_{i} + \beta_3 A_{i} + e_{i} \]

donde \(V_{i}\) representan los ingresos mensuales por ventas en el supermercado de cada ciudad, \(P_{i}\) el índice ponderado de precios de todos los productos que se venden en el mismo, y \(A_{i}\) los gastos mensuales en publicidad en la ciudad para promocionar la tienda. \(V\) y \(A\) se miden en miles de euros, mientras que \(P\) se mide en euros.

Code
# Lectura de librerías
library(tidyverse)
library(alr4)
library(gvlma)
library(performance)
library(lmtest)
library(car)
# Lectura de datos
VENTAS <- read_csv("data/VENTAS_SUPER.csv")
# Estructura de la base de datos
str(VENTAS)
spc_tbl_ [75 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ A: num [1:75] 1.3 2.9 0.8 0.7 1.5 1.3 1.8 2.4 0.7 3 ...
 $ P: num [1:75] 5.69 6.49 5.63 6.22 5.02 6.41 5.85 5.41 6.24 6.2 ...
 $ V: num [1:75] 73.2 71.8 62.4 67.4 89.3 70.3 73.2 86.1 81 76.4 ...
 - attr(*, "spec")=
  .. cols(
  ..   A = col_double(),
  ..   P = col_double(),
  ..   V = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
Code
dim(VENTAS)
[1] 75  3
Code
head(VENTAS)
# A tibble: 6 × 3
      A     P     V
  <dbl> <dbl> <dbl>
1   1.3  5.69  73.2
2   2.9  6.49  71.8
3   0.8  5.63  62.4
4   0.7  6.22  67.4
5   1.5  5.02  89.3
6   1.3  6.41  70.3
Code
tail(VENTAS)
# A tibble: 6 × 3
      A     P     V
  <dbl> <dbl> <dbl>
1   2.2  6.02  73.7
2   1.7  5.73  82.2
3   0.7  5.11  74.2
4   0.7  5.71  75.4
5   2    5.45  81.3
6   2.2  6.05  75  
Code
summary(VENTAS)
       A               P               V        
 Min.   :0.500   Min.   :4.830   Min.   :62.40  
 1st Qu.:1.100   1st Qu.:5.220   1st Qu.:73.20  
 Median :1.800   Median :5.690   Median :76.50  
 Mean   :1.844   Mean   :5.687   Mean   :77.37  
 3rd Qu.:2.700   3rd Qu.:6.210   3rd Qu.:82.20  
 Max.   :3.100   Max.   :6.490   Max.   :91.20  
Code
# Matriz de diagramas de puntos y gráficas parciales
scatterplotMatrix(~ V + P + A, data=VENTAS)

Code
scatterplot(V ~ P, data=VENTAS, 
            smooth=list(smoother=loessLine, var=FALSE, lwd.smooth=3), 
            col="blue", regLine=list(lwd=3))

Code
scatterplot(V ~ A, data=VENTAS, 
            smooth=list(smoother=loessLine, var=FALSE, lwd.smooth=3), 
            col="blue", regLine=list(lwd=3))

Code
# Modelo de ventas
modelo_ventas_1 <- lm(V ~ P + A, data=VENTAS)
summary(modelo_ventas_1)

Call:
lm(formula = V ~ P + A, data = VENTAS)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.4825  -3.1434  -0.3456   2.8754  11.3049 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 118.9136     6.3516  18.722  < 2e-16 ***
P            -7.9079     1.0960  -7.215 4.42e-10 ***
A             1.8626     0.6832   2.726  0.00804 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.886 on 72 degrees of freedom
Multiple R-squared:  0.4483,    Adjusted R-squared:  0.4329 
F-statistic: 29.25 on 2 and 72 DF,  p-value: 5.041e-10
Code
confint(modelo_ventas_1, level=.95)
                 2.5 %     97.5 %
(Intercept) 106.251852 131.575368
P           -10.092676  -5.723032
A             0.500659   3.224510
Code
# Gráficos de efectos (Effects plots)
plot(allEffects(modelo_ventas_1), grid=TRUE, rug=TRUE)

Code
# Diagnósticos de la regresión
# Validación global de las hipótesis básicas del MRL
gvmodel <- gvlma(modelo_ventas_1)
summary(gvmodel)

Call:
lm(formula = V ~ P + A, data = VENTAS)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.4825  -3.1434  -0.3456   2.8754  11.3049 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 118.9136     6.3516  18.722  < 2e-16 ***
P            -7.9079     1.0960  -7.215 4.42e-10 ***
A             1.8626     0.6832   2.726  0.00804 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.886 on 72 degrees of freedom
Multiple R-squared:  0.4483,    Adjusted R-squared:  0.4329 
F-statistic: 29.25 on 2 and 72 DF,  p-value: 5.041e-10


ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05 

Call:
 gvlma(x = modelo_ventas_1) 

                     Value p-value                Decision
Global Stat        1.63722  0.8021 Assumptions acceptable.
Skewness           0.06397  0.8003 Assumptions acceptable.
Kurtosis           0.09499  0.7579 Assumptions acceptable.
Link Function      0.79527  0.3725 Assumptions acceptable.
Heteroscedasticity 0.68299  0.4086 Assumptions acceptable.
Code
# Chequeo general del modelo estimado
model_performance(modelo_ventas_1)
# Indices of model performance

AIC     |    AICc |     BIC |    R2 | R2 (adj.) |  RMSE | Sigma
---------------------------------------------------------------
455.739 | 456.310 | 465.009 | 0.448 |     0.433 | 4.787 | 4.886
Code
check_model(modelo_ventas_1)

Code
# Especificación del modelo
# Adecuación de la forma funcional: test RESET de Ramsey
resettest(modelo_ventas_1, power=2, type="fitted")

    RESET test

data:  modelo_ventas_1
RESET = 0.76092, df1 = 1, df2 = 71, p-value = 0.386
Code
resettest(modelo_ventas_1, power=2:3, type="fitted")

    RESET test

data:  modelo_ventas_1
RESET = 0.53369, df1 = 2, df2 = 70, p-value = 0.5888
Code
# Análisis gráfico de la hipótesis de linealidad
# Residuos estandarizados (Pearson) 
residualPlots(modelo_ventas_1)

           Test stat Pr(>|Test stat|)   
P             1.0183         0.312016   
A            -2.9427         0.004393 **
Tukey test    0.8723         0.383040   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# Gráficos de variables añadidas (AV plots)
# (también denominados de regresiones parciales)
avPlots(modelo_ventas_1) 

Code
# Gráficos componente+residuos
# (Component-plus-Residual o Partial-residual plots)
crPlots(modelo_ventas_1) 

Code
# Modelo generalizado: especificación cuadrática en A
modelo_ventas_2 <- lm(V ~ P + A + I(A^2), data=VENTAS)
summary(modelo_ventas_2)

Call:
lm(formula = V ~ P + A + I(A^2), data = VENTAS)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.2553  -3.1430  -0.0117   2.8513  11.8050 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 109.7190     6.7990  16.137  < 2e-16 ***
P            -7.6400     1.0459  -7.304 3.24e-10 ***
A            12.1512     3.5562   3.417  0.00105 ** 
I(A^2)       -2.7680     0.9406  -2.943  0.00439 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.645 on 71 degrees of freedom
Multiple R-squared:  0.5082,    Adjusted R-squared:  0.4875 
F-statistic: 24.46 on 3 and 71 DF,  p-value: 5.6e-11
Code
# Gráficos de efectos
plot(Effect("P", modelo_ventas_2))

Code
plot(Effect("A", modelo_ventas_2))

Code
# Comparación con el modelo básico
compareCoefs(modelo_ventas_1, modelo_ventas_2)
Calls:
1: lm(formula = V ~ P + A, data = VENTAS)
2: lm(formula = V ~ P + A + I(A^2), data = VENTAS)

            Model 1 Model 2
(Intercept)  118.91  109.72
SE             6.35    6.80
                           
P             -7.91   -7.64
SE             1.10    1.05
                           
A             1.863  12.151
SE            0.683   3.556
                           
I(A^2)               -2.768
SE                    0.941
                           
Code
anova(modelo_ventas_1, modelo_ventas_2)
Analysis of Variance Table

Model 1: V ~ P + A
Model 2: V ~ P + A + I(A^2)
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1     72 1718.9                                
2     71 1532.1  1    186.86 8.6594 0.004393 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
compare_performance(modelo_ventas_1, modelo_ventas_2, rank = TRUE)
# Comparison of Model Performance Indices

Name            | Model |    R2 | R2 (adj.) |  RMSE | Sigma | AIC weights
-------------------------------------------------------------------------
modelo_ventas_2 |    lm | 0.508 |     0.487 | 4.520 | 4.645 |       0.965
modelo_ventas_1 |    lm | 0.448 |     0.433 | 4.787 | 4.886 |       0.035

Name            | AICc weights | BIC weights | Performance-Score
----------------------------------------------------------------
modelo_ventas_2 |        0.960 |       0.896 |           100.00%
modelo_ventas_1 |        0.040 |       0.104 |             0.00%
Code
plot(compare_performance(modelo_ventas_1, modelo_ventas_2, rank = TRUE))

Code
test_performance(modelo_ventas_1, modelo_ventas_2)
Name            | Model |   BF | Omega2 | p (Omega2) |   LR | p (LR)
--------------------------------------------------------------------
modelo_ventas_1 |    lm |      |        |            |      |       
modelo_ventas_2 |    lm | 8.64 |   0.09 |      0.005 | 8.63 |  0.002
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Code
test_wald(modelo_ventas_1, modelo_ventas_2)
Name            | Model | df | df_diff |    F |     p
-----------------------------------------------------
modelo_ventas_1 |    lm | 72 |         |      |      
modelo_ventas_2 |    lm | 71 |    1.00 | 8.66 | 0.004
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Code
test_bf(modelo_ventas_1, modelo_ventas_2)
Bayes Factors for Model Comparison

                  Model            BF
[modelo_ventas_2] P + A + I(A^2) 8.64

* Against Denominator: [modelo_ventas_1] P + A
*   Bayes Factor Type: BIC approximation
Code
test_vuong(modelo_ventas_1, modelo_ventas_2)
Name            | Model | Omega2 | p (Omega2) |   LR | p (LR)
-------------------------------------------------------------
modelo_ventas_1 |    lm |        |            |      |       
modelo_ventas_2 |    lm |   0.09 |      0.005 | 8.63 |  0.002
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Code
# Lectura de librerías
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
import statsmodels.stats as smstats
import statsmodels.stats.diagnostic as smsdiag
from statsmodels.stats.outliers_influence import reset_ramsey
from statsmodels.compat import lzip
import scipy.stats as scs

# Lectura de datos
VENTAS = pd.read_csv("data/VENTAS_SUPER.csv")
# Estructura de la base de datos
VENTAS.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75 entries, 0 to 74
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       75 non-null     float64
 1   P       75 non-null     float64
 2   V       75 non-null     float64
dtypes: float64(3)
memory usage: 1.9 KB
Code
VENTAS.head()
     A     P     V
0  1.3  5.69  73.2
1  2.9  6.49  71.8
2  0.8  5.63  62.4
3  0.7  6.22  67.4
4  1.5  5.02  89.3
Code
VENTAS.tail()
      A     P     V
70  1.7  5.73  82.2
71  0.7  5.11  74.2
72  0.7  5.71  75.4
73  2.0  5.45  81.3
74  2.2  6.05  75.0
Code
VENTAS.describe()
               A          P          V
count  75.000000  75.000000  75.000000
mean    1.844000   5.687200  77.374667
std     0.831677   0.518432   6.488537
min     0.500000   4.830000  62.400000
25%     1.100000   5.220000  73.200000
50%     1.800000   5.690000  76.500000
75%     2.700000   6.210000  82.200000
max     3.100000   6.490000  91.200000
Code
# Modelo de ventas
formula = 'V ~ P + A'
modelo_ventas_1 = smf.ols(formula, VENTAS).fit()
print(modelo_ventas_1.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      V   R-squared:                       0.448
Model:                            OLS   Adj. R-squared:                  0.433
Method:                 Least Squares   F-statistic:                     29.25
Date:                Sun, 09 Feb 2025   Prob (F-statistic):           5.04e-10
Time:                        13:46:40   Log-Likelihood:                -223.87
No. Observations:                  75   AIC:                             453.7
Df Residuals:                      72   BIC:                             460.7
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    118.9136      6.352     18.722      0.000     106.252     131.575
P             -7.9079      1.096     -7.215      0.000     -10.093      -5.723
A              1.8626      0.683      2.726      0.008       0.501       3.225
==============================================================================
Omnibus:                        0.535   Durbin-Watson:                   2.183
Prob(Omnibus):                  0.765   Jarque-Bera (JB):                0.159
Skew:                          -0.072   Prob(JB):                        0.924
Kurtosis:                       3.174   Cond. No.                         69.5
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Code
# Especificación del modelo
# Adecuación de la forma funcional: test RESET de Ramsey
reset_ramsey(modelo_ventas_1,degree=2)
<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=0.7609212886079043, p=0.3859807963132884, df_denom=71, df_num=1>
Code
reset_ramsey(modelo_ventas_1,degree=3)
<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=0.5336864235712575, p=0.5888060456090185, df_denom=70, df_num=2>
Code
# Análisis gráfico de la hipótesis de linealidad
# Gráficos de variables añadidas (AV plots)
fig = sm.graphics.plot_partregress_grid(modelo_ventas_1)
fig.tight_layout(pad=1.0)
plt.show()

Code
# Gráficos componente+residuos
fig = sm.graphics.plot_ccpr_grid(modelo_ventas_1)
fig.tight_layout(pad=1.0)
plt.show()

Code
# Diagnósticos de regresión de una sola variable
fig = sm.graphics.plot_regress_exog(modelo_ventas_1, "P")
fig.tight_layout(pad=1.0)
plt.show()

Code
fig = sm.graphics.plot_regress_exog(modelo_ventas_1, "A")
fig.tight_layout(pad=1.0)
plt.show()

Code
# Modelo generalizado: especificación cuadrática en A
formula = 'V ~ P + A + + I(A**2)'
modelo_ventas_2 = smf.ols(formula, VENTAS).fit()
print(modelo_ventas_2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      V   R-squared:                       0.508
Model:                            OLS   Adj. R-squared:                  0.487
Method:                 Least Squares   F-statistic:                     24.46
Date:                Sun, 09 Feb 2025   Prob (F-statistic):           5.60e-11
Time:                        13:46:42   Log-Likelihood:                -219.55
No. Observations:                  75   AIC:                             447.1
Df Residuals:                      71   BIC:                             456.4
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    109.7190      6.799     16.137      0.000      96.162     123.276
P             -7.6400      1.046     -7.304      0.000      -9.726      -5.554
A             12.1512      3.556      3.417      0.001       5.060      19.242
I(A ** 2)     -2.7680      0.941     -2.943      0.004      -4.644      -0.892
==============================================================================
Omnibus:                        1.004   Durbin-Watson:                   2.043
Prob(Omnibus):                  0.605   Jarque-Bera (JB):                0.455
Skew:                          -0.088   Prob(JB):                        0.797
Kurtosis:                       3.339   Cond. No.                         101.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

El modelo APT de valoración de activos (datos de series temporales)

En esta aplicación se estimará un modelo de regresión basado en la teoría de los precios de arbitraje. Desde el punto de vista econométrico, se trata de una generalización de la especificación CAPM expuesta en la aplicación 2.1, a la que se le añaden factores macroeconómicos como variables explicativas adicionales.

Definamos la rentabilidad de un activo financiero (en porcentaje) como \(R=100×log[⁡(p_{1}+d)/p_{0}\), donde \(p_{1}\) y \(p_{0}\) son, respectivamente, los precios de cotización del valor (título u obligación) al final y al principio de un período de tiempo y \(d\) es el dividendo cobrado (si lo hay) durante ese período. Denominemos por \(Rf\) al rendimiento de un activo libre de riesgo. Finalmente, denotemos por \(Rm\) a la rentabilidad que ofrece la cartera de mercado.

En este ejercicio se va a estimar el modelo APT con datos de series temporales de carácter mensual para el período que va desde marzo de 1986 hasta marzo de 2018. Se utilizará como rentabilidad de referencia la correspondiente a las acciones de la empresa Microsoft, que denotaremos por \(R\_MICROSOFT\). Como rentabilidad sin riesgo, se usará el tipo de interés correspondiente a las letras del tesoro estadounidenses a tres meses, \(R\_USTB3M\), Y, finalmente, como rentabilidad del mercado se tomará la asociada al índice Standard & Poor’s 500, \(R\_SP500\). Las variables básicas que deben utilizarse en el modelo de regresión son, entonces, los ‘excesos de rentabilidad’, \(erMSOFT\)=(\(R\_MICROSOFT\)-\(R\_USTB3M\)) y \(erSP\)=(\(R\_SP500\)-\(R\_USTB3M\)).

La ecuación de valoración básica del modelo APT puede definirse del siguiente modo:

\[ erMSOFT_{t} = \beta_1 + \beta_2 erSP_{t} + \gamma_1 F_{1,t} + \gamma_2 F_{2,t} + ...+ \gamma_m F_{m,t} + e_{t} \]

donde las variables \(F_{j}\) representan distintos factores macroeconómicos que pueden afectar a la rentabilidad observada del título cada período.

Code
# Lectura de librerías
library(tidyverse)
library(readxl)
# Lectura de datos
apt <- read_excel("data/APT_MICROSOFT.xls")
# Estructura de la base de datos
str(apt)
tibble [385 × 10] (S3: tbl_df/tbl/data.frame)
 $ Date     : POSIXct[1:385], format: "1986-03-01" "1986-04-01" ...
 $ MICROSOFT: num [1:385] 0.0955 0.112 0.1215 0.1068 0.099 ...
 $ SANDP    : num [1:385] 239 236 247 251 236 ...
 $ CPI      : num [1:385] 109 109 109 110 110 ...
 $ INDPRO   : num [1:385] 56.5 56.6 56.7 56.5 56.8 ...
 $ M1SUPPLY : num [1:385] 624 647 646 663 673 ...
 $ CCREDIT  : num [1:385] 607 614 622 628 634 ...
 $ BMINUSA  : num [1:385] 1.5 1.4 1.2 1.21 1.28 ...
 $ USTB3M   : num [1:385] 6.76 6.24 6.33 6.4 6 5.69 5.35 5.32 5.5 5.68 ...
 $ USTB10Y  : num [1:385] 7.78 7.3 7.71 7.8 7.3 7.17 7.45 7.43 7.25 7.11 ...
Code
dim(apt)
[1] 385  10
Code
head(apt)
# A tibble: 6 × 10
  Date                MICROSOFT SANDP   CPI INDPRO M1SUPPLY CCREDIT BMINUSA
  <dttm>                  <dbl> <dbl> <dbl>  <dbl>    <dbl>   <dbl>   <dbl>
1 1986-03-01 00:00:00    0.0955  239.  109.   56.5     624.    607.    1.5 
2 1986-04-01 00:00:00    0.112   236.  109.   56.6     647     614.    1.4 
3 1986-05-01 00:00:00    0.122   247.  109.   56.7     646.    622.    1.2 
4 1986-06-01 00:00:00    0.107   251.  110.   56.5     663.    628.    1.21
5 1986-07-01 00:00:00    0.0990  236.  110.   56.8     673.    634.    1.28
6 1986-08-01 00:00:00    0.0990  253.  110.   56.7     678.    641.    1.46
# ℹ 2 more variables: USTB3M <dbl>, USTB10Y <dbl>
Code
tail(apt)
# A tibble: 6 × 10
  Date                MICROSOFT SANDP   CPI INDPRO M1SUPPLY CCREDIT BMINUSA
  <dttm>                  <dbl> <dbl> <dbl>  <dbl>    <dbl>   <dbl>   <dbl>
1 2017-10-01 00:00:00      83.2 2575.  247.   105.    3582.   3771.    0.72
2 2017-11-01 00:00:00      84.2 2585.  247.   105.    3581.   3805.    0.7 
3 2017-12-01 00:00:00      85.5 2674.  247.   106.    3636.   3835.    0.71
4 2018-01-01 00:00:00      95.0 2824.  248.   105.    3635.   3843.    0.71
5 2018-02-01 00:00:00      93.8 2714.  249.   106.    3557.   3827.    0.69
6 2018-03-01 00:00:00      91.3 2641.  250.   107.    3685.   3824.    0.77
# ℹ 2 more variables: USTB3M <dbl>, USTB10Y <dbl>
Code
summary(apt[,2:10])
   MICROSOFT            SANDP             CPI            INDPRO      
 Min.   : 0.09549   Min.   : 230.3   Min.   :108.6   Min.   : 56.50  
 1st Qu.: 2.89453   1st Qu.: 459.3   1st Qu.:147.2   1st Qu.: 69.48  
 Median :25.72000   Median :1104.5   Median :178.8   Median : 93.00  
 Mean   :23.30104   Mean   :1066.0   Mean   :181.1   Mean   : 86.63  
 3rd Qu.:30.86000   3rd Qu.:1385.6   3rd Qu.:218.2   3rd Qu.:100.72  
 Max.   :95.01000   Max.   :2823.8   Max.   :249.6   Max.   :106.66  
    M1SUPPLY         CCREDIT          BMINUSA           USTB3M     
 Min.   : 624.3   Min.   : 606.8   Min.   :0.5500   Min.   :0.010  
 1st Qu.:1069.3   1st Qu.: 886.2   1st Qu.:0.7200   1st Qu.:0.450  
 Median :1191.8   Median :1891.8   Median :0.9000   Median :3.440  
 Mean   :1514.7   Mean   :1897.8   Mean   :0.9746   Mean   :3.297  
 3rd Qu.:1716.0   3rd Qu.:2620.5   3rd Qu.:1.1300   3rd Qu.:5.290  
 Max.   :3684.7   Max.   :3843.4   Max.   :3.3800   Max.   :9.140  
    USTB10Y     
 Min.   :1.500  
 1st Qu.:3.330  
 Median :4.910  
 Mean   :5.075  
 3rd Qu.:6.740  
 Max.   :9.520  
Code
# Transformación de variables
apt$dspread = c(NA,diff(apt$BMINUSA))
apt$dcredit = c(NA,diff(apt$CCREDIT))
apt$dprod = c(NA,diff(apt$INDPRO))
apt$dmoney = c(NA,diff(apt$M1SUPPLY))
apt$inflation = c(NA,100*diff(log(apt$CPI)))
apt$rterm = c(NA,diff(apt$USTB10Y-apt$USTB3M))
apt$dinflation = c(NA,100*diff(apt$inflation))
apt$r_msoft = c(NA,100*diff(log(apt$MICROSOFT)))
apt$r_sp = c(NA,100*diff(log(apt$SANDP)))
apt$er_sp = c(NA,100*diff(log(apt$SANDP)))-apt$USTB3M/12
apt$er_msoft = c(NA,100*diff(log(apt$MICROSOFT)))-apt$USTB3M/12
# Formato a la fechas
apt$Date = as.Date(apt$Date)
# Estimación MCO del modelo APT
APT_msft  <-  lm(er_msoft ~ er_sp + dprod + dcredit + dinflation +
                   dmoney + dspread + rterm, data = apt)
# Formulación alternativa del modelo de series temporales
# apt_ts <- ts(apt, start=c(1986,3), end=c(2018,3), frequency = 12)
# APT_msft_ts  <-  lm(er_msoft ~ er_sp + dprod + dcredit + dinflation +
# dmoney + dspread + rterm, data = apt_ts)
# summary(APT_msft_ts)
#
# Validación global de las hipótesis básicas del MRL
library(gvlma)
gvmodel <- gvlma(APT_msft)
summary(gvmodel)

Call:
lm(formula = er_msoft ~ er_sp + dprod + dcredit + dinflation + 
    dmoney + dspread + rterm, data = apt)

Residuals:
    Min      1Q  Median      3Q     Max 
-36.075  -4.440  -0.403   4.616  24.480 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.326002   0.475481   2.789  0.00556 ** 
er_sp        1.280799   0.094354  13.574  < 2e-16 ***
dprod       -0.303032   0.736881  -0.411  0.68113    
dcredit     -0.025364   0.027149  -0.934  0.35078    
dinflation   0.021947   0.012643   1.736  0.08341 .  
dmoney      -0.006871   0.015568  -0.441  0.65919    
dspread      2.260064   4.140284   0.546  0.58548    
rterm        4.733069   1.715814   2.758  0.00609 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.845 on 375 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.3452,    Adjusted R-squared:  0.333 
F-statistic: 28.24 on 7 and 375 DF,  p-value: < 2.2e-16


ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05 

Call:
 gvlma(x = APT_msft) 

                       Value   p-value                   Decision
Global Stat        98.208657 0.000e+00 Assumptions NOT satisfied!
Skewness            0.002011 9.642e-01    Assumptions acceptable.
Kurtosis           63.503462 1.554e-15 Assumptions NOT satisfied!
Link Function       1.601550 2.057e-01    Assumptions acceptable.
Heteroscedasticity 33.101634 8.747e-09 Assumptions NOT satisfied!
Code
# Chequeo general del modelo estimado
library(performance)
model_performance(APT_msft)
# Indices of model performance

AIC      |     AICc |      BIC |    R2 | R2 (adj.) |  RMSE | Sigma
------------------------------------------------------------------
2674.673 | 2675.155 | 2710.205 | 0.345 |     0.333 | 7.762 | 7.845
Code
check_model(APT_msft)

Code
# Especificación del modelo
# Contraste de significación de variables irrelevantes
library(car)
linearHypothesis(APT_msft,c("dprod=0","dcredit=0","dmoney=0","dspread=0"))

Linear hypothesis test:
dprod = 0
dcredit = 0
dmoney = 0
dspread = 0

Model 1: restricted model
Model 2: er_msoft ~ er_sp + dprod + dcredit + dinflation + dmoney + dspread + 
    rterm

  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1    379 23180                           
2    375 23078  4    101.88 0.4139 0.7986
Code
# Modelo restringido, sin las variables no significativas -> MCR
APT_msft_r  <-  lm(er_msoft ~ er_sp + dinflation + rterm, data = apt)
summary(APT_msft_r)

Call:
lm(formula = er_msoft ~ er_sp + dinflation + rterm, data = apt)

Residuals:
    Min      1Q  Median      3Q     Max 
-36.260  -4.330  -0.217   4.496  24.781 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.02089    0.40097   2.546  0.01129 *  
er_sp        1.26628    0.09210  13.750  < 2e-16 ***
dinflation   0.02187    0.01208   1.810  0.07101 .  
rterm        4.73880    1.70873   2.773  0.00582 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.821 on 379 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.3423,    Adjusted R-squared:  0.3371 
F-statistic: 65.75 on 3 and 379 DF,  p-value: < 2.2e-16
Code
# Adecuación de la forma funcional: test RESET de Ramsey
library(lmtest)
resettest(APT_msft_r, power=2, type="fitted")

    RESET test

data:  APT_msft_r
RESET = 1.7054, df1 = 1, df2 = 378, p-value = 0.1924
Code
resettest(APT_msft_r, power=2:3, type="fitted")

    RESET test

data:  APT_msft_r
RESET = 1.373, df1 = 2, df2 = 377, p-value = 0.2546
Code
# Lectura de librerías
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
import statsmodels.stats as smstats
import statsmodels.stats.diagnostic as smsdiag
from statsmodels.stats.outliers_influence import reset_ramsey
from statsmodels.compat import lzip
import scipy.stats as scs
import matplotlib.pyplot as plt
import seaborn as sns
# Lectura de datos
apt = pd.read_excel('data/APT_MICROSOFT.xls', parse_dates=['Date'], index_col='Date')
# Estructura de la base de datos
apt.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 385 entries, 1986-03-01 to 2018-03-01
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   MICROSOFT  385 non-null    float64
 1   SANDP      385 non-null    float64
 2   CPI        385 non-null    float64
 3   INDPRO     385 non-null    float64
 4   M1SUPPLY   385 non-null    float64
 5   CCREDIT    385 non-null    float64
 6   BMINUSA    385 non-null    float64
 7   USTB3M     385 non-null    float64
 8   USTB10Y    385 non-null    float64
dtypes: float64(9)
memory usage: 30.1 KB
Code
apt.head()
            MICROSOFT       SANDP    CPI  ...  BMINUSA  USTB3M  USTB10Y
Date                                      ...                          
1986-03-01   0.095486  238.899994  108.8  ...     1.50    6.76     7.78
1986-04-01   0.111979  235.520004  108.6  ...     1.40    6.24     7.30
1986-05-01   0.121528  247.350006  108.9  ...     1.20    6.33     7.71
1986-06-01   0.106771  250.839996  109.5  ...     1.21    6.40     7.80
1986-07-01   0.098958  236.119995  109.5  ...     1.28    6.00     7.30

[5 rows x 9 columns]
Code
apt.tail()
            MICROSOFT        SANDP      CPI  ...  BMINUSA  USTB3M  USTB10Y
Date                                         ...                          
2017-11-01  84.169998  2584.840088  246.669  ...     0.70    1.25     2.35
2017-12-01  85.540001  2673.610107  246.524  ...     0.71    1.34     2.40
2018-01-01  95.010002  2823.810059  247.867  ...     0.71    1.43     2.58
2018-02-01  93.769997  2713.830078  248.991  ...     0.69    1.59     2.86
2018-03-01  91.269997  2640.870117  249.554  ...     0.77    1.73     2.84

[5 rows x 9 columns]
Code
apt.describe()
        MICROSOFT        SANDP         CPI  ...     BMINUSA      USTB3M     USTB10Y
count  385.000000   385.000000  385.000000  ...  385.000000  385.000000  385.000000
mean    23.301038  1066.036103  181.062083  ...    0.974623    3.296909    5.075403
std     19.255768   602.397165   41.136433  ...    0.382047    2.589801    2.173512
min      0.095486   230.300003  108.600000  ...    0.550000    0.010000    1.500000
25%      2.894531   459.269989  147.200000  ...    0.720000    0.450000    3.330000
50%     25.719999  1104.489990  178.800000  ...    0.900000    3.440000    4.910000
75%     30.860001  1385.589966  218.178000  ...    1.130000    5.290000    6.740000
max     95.010002  2823.810059  249.554000  ...    3.380000    9.140000    9.520000

[8 rows x 9 columns]
Code
# Transformación de variables
def LogDiff(x):
    x_diff = 100*np.log(x/x.shift(1))
    return x_diff
apt['dspread'] = apt['BMINUSA'] - apt['BMINUSA'].shift(1)
apt['dcredit'] = apt['CCREDIT'] - apt['CCREDIT'].shift(1)
apt['dprod'] = apt['INDPRO'] - apt['INDPRO'].shift(1)
apt['r_msft'] = LogDiff(apt['MICROSOFT'])
apt['r_sp'] = LogDiff(apt['SANDP'])
apt['dmoney'] = apt['M1SUPPLY'] - apt['M1SUPPLY'].shift(1)
apt['inflation'] = LogDiff(apt['CPI'])
apt['term'] = apt['USTB10Y'] - apt['USTB3M']
apt['dinflation'] = LogDiff(apt['CPI']) - LogDiff(apt['CPI']).shift(1)
apt['mustb3m'] = apt['USTB3M']/12
apt['rterm'] = (apt['USTB10Y'] - apt['USTB3M']) - (apt['USTB10Y'] - apt['USTB3M']).shift(1)
apt['er_msoft'] = LogDiff(apt['MICROSOFT']) - apt['USTB3M']/12
apt['er_sp'] = LogDiff(apt['SANDP']) - apt['USTB3M']/12
# Eliminación de observaciones no disponibles (NA)
data = apt.dropna()
# Estimación MCO del modelo APT
formula = 'er_msoft ~ er_sp + dprod + dcredit + dinflation + dmoney + dspread + rterm'
APT_msft = smf.ols(formula, data).fit()
print(APT_msft.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:               er_msoft   R-squared:                       0.345
Model:                            OLS   Adj. R-squared:                  0.333
Method:                 Least Squares   F-statistic:                     28.24
Date:                Sun, 09 Feb 2025   Prob (F-statistic):           3.52e-31
Time:                        13:46:45   Log-Likelihood:                -1328.3
No. Observations:                 383   AIC:                             2673.
Df Residuals:                     375   BIC:                             2704.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.3260      0.475      2.789      0.006       0.391       2.261
er_sp          1.2808      0.094     13.574      0.000       1.095       1.466
dprod         -0.3030      0.737     -0.411      0.681      -1.752       1.146
dcredit       -0.0254      0.027     -0.934      0.351      -0.079       0.028
dinflation     2.1947      1.264      1.736      0.083      -0.291       4.681
dmoney        -0.0069      0.016     -0.441      0.659      -0.037       0.024
dspread        2.2601      4.140      0.546      0.585      -5.881      10.401
rterm          4.7331      1.716      2.758      0.006       1.359       8.107
==============================================================================
Omnibus:                       21.147   Durbin-Watson:                   2.097
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               63.505
Skew:                          -0.006   Prob(JB):                     1.62e-14
Kurtosis:                       4.995   Cond. No.                         293.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Code
# Especificación del modelo
# Contraste de significación de variables irrelevantes
H_0 = 'dprod = dcredit = dmoney = dspread = 0'
F_test = APT_msft.f_test(H_0)
print(F_test)
<F test: F=0.41387855952255137, p=0.7986453783395882, df_denom=375, df_num=4>
Code
# Modelo restringido, sin las variables no significativas -> MCR
formula = 'er_msoft ~ er_sp + dinflation + rterm'
APT_msft_r = smf.ols(formula, data).fit()
print(APT_msft_r.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:               er_msoft   R-squared:                       0.342
Model:                            OLS   Adj. R-squared:                  0.337
Method:                 Least Squares   F-statistic:                     65.75
Date:                Sun, 09 Feb 2025   Prob (F-statistic):           2.99e-34
Time:                        13:46:45   Log-Likelihood:                -1329.2
No. Observations:                 383   AIC:                             2666.
Df Residuals:                     379   BIC:                             2682.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.0209      0.401      2.546      0.011       0.232       1.809
er_sp          1.2663      0.092     13.750      0.000       1.085       1.447
dinflation     2.1874      1.208      1.810      0.071      -0.188       4.563
rterm          4.7388      1.709      2.773      0.006       1.379       8.099
==============================================================================
Omnibus:                       21.806   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               67.158
Skew:                           0.007   Prob(JB):                     2.61e-15
Kurtosis:                       5.051   Cond. No.                         18.7
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Code
# Adecuación de la forma funcional: test RESET de Ramsey
reset_ramsey(APT_msft_r,degree=2)
<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=1.70537226335334, p=0.19238105612636122, df_denom=378, df_num=1>
Code
reset_ramsey(APT_msft_r,degree=3)
<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=1.3730181364024425, p=0.2546050361892768, df_denom=377, df_num=2>