Application 2.4: Predictions
Engel curve for food expenditure
In this application we estimate an Engel curve relating household food expenditure (\(GALIM\)) to disposable income (\(RENTA\)), using a sample of 235 American families:
\[GALIM_{i} = \beta_1 + \beta_2 RENTA_{i} + e_{i}\]
Once the model has been estimated, the fitted equation is used to predict the expected food expenditure of different types of families as a function of their household income.
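As a sketch of what the prediction step computes, the standard simple-regression formulas are written out here. The notation \(b_1\), \(b_2\) (OLS estimates), \(\hat{\sigma}\) (residual standard error), \(N\) (sample size), \(\overline{RENTA}\) (sample mean of income) and \(RENTA_0\) (the income level of interest) is introduced for illustration and does not come from the original text. The point prediction is
\[\widehat{GALIM}_0 = b_1 + b_2 RENTA_0\]
its standard error as an estimate of expected expenditure is
\[\operatorname{se}(\widehat{GALIM}_0) = \hat{\sigma}\sqrt{\frac{1}{N} + \frac{(RENTA_0 - \overline{RENTA})^2}{\sum_{i=1}^{N}(RENTA_i - \overline{RENTA})^2}}\]
and a \(100(1-\alpha)\%\) confidence interval for expected expenditure is
\[\widehat{GALIM}_0 \pm t_{1-\alpha/2,\,N-2}\,\operatorname{se}(\widehat{GALIM}_0)\]
The second expression suggests why the interval widens as \(RENTA_0\) moves away from the sample mean of income.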
# Load libraries
library(tidyverse)
# Read the data
ENGEL_ALIM <- read_delim("data/ENGEL_ALIM_USA.csv", ";",
                         escape_double = FALSE, trim_ws = TRUE)
# Scatter plot of RENTA and GALIM
ggplot(ENGEL_ALIM, aes(x = RENTA, y = GALIM)) +
geom_point() +
scale_x_continuous(limits = c(350, 5000), expand = c(0, 0)) +
theme_bw() +
labs(x = "Renta", y = "Gasto en alimentos")
# Scatter plot with the fitted OLS regression line
ggplot(ENGEL_ALIM, aes(x = RENTA, y = GALIM)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
scale_x_continuous(limits = c(350, 5000), expand = c(0, 0)) +
theme_bw() +
labs(x = "Renta", y = "Gasto en alimentos")
# OLS estimation of a linear Engel curve
lin_model <- lm(GALIM ~ RENTA, data = ENGEL_ALIM)
summary(lin_model)
Call:
lm(formula = GALIM ~ RENTA, data = ENGEL_ALIM)
Residuals:
Min 1Q Median 3Q Max
-725.70 -60.24 -4.32 53.41 515.77
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 147.47539 15.95708 9.242 <2e-16 ***
RENTA 0.48518 0.01437 33.772 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 114.1 on 233 degrees of freedom
Multiple R-squared: 0.8304, Adjusted R-squared: 0.8296
F-statistic: 1141 on 1 and 233 DF, p-value: < 2.2e-16
# Prediction
# Data frame containing the new values of the explanatory variable
new_RENTA <- data.frame(RENTA = c(400, 2000, 4500))
new_RENTA
RENTA
1 400
2 2000
3 4500
# Point prediction
pred_GALIM <- predict(lin_model, new_RENTA)
names(pred_GALIM) <- c("Renta = 400", "2000", "4500")
pred_GALIM
Renta = 400 2000 4500
341.5468 1117.8322 2330.7783
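The point predictions above can be checked by hand, since each one is just the estimated intercept plus the estimated slope times the income level. The following is a minimal sketch in plain Python; the coefficients are copied from the regression output as printed (so they are rounded), and the helper name `predict_galim` is hypothetical, introduced only for this illustration.

```python
# Coefficients copied from the regression output above (rounded as printed).
b1 = 147.47539   # intercept
b2 = 0.48518     # slope on RENTA


def predict_galim(renta):
    """Point prediction of food expenditure for a given income level."""
    return b1 + b2 * renta


# Income levels used in the prediction step above.
new_renta = [400, 2000, 4500]
predictions = [predict_galim(r) for r in new_renta]

for r, p in zip(new_renta, predictions):
    print(f"RENTA = {r}: predicted GALIM = {p:.2f}")
```

Because the slope is rounded to five decimals, the results should agree with the `predict()` output only up to about a hundredth of a unit.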
# Prediction of the expected value with a confidence interval
pred_GALIM_IC <- predict(lin_model, new_RENTA, interval = "confidence", level = 0.95)
pred_GALIM_IC
fit lwr upr
1 341.5468 319.4814 363.6122
2 1117.8322 1085.5127 1150.1518
3 2330.7783 2230.1418 2431.4148
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
# Read the data
ENGEL_ALIM = pd.read_csv('data/ENGEL_ALIM_USA.csv', delimiter=';')
# Estimate the model
model = smf.ols('GALIM ~ RENTA', data=ENGEL_ALIM)
lin_model = model.fit()
print(lin_model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: GALIM R-squared: 0.830
Model: OLS Adj. R-squared: 0.830
Method: Least Squares F-statistic: 1141.
Date: Sun, 09 Feb 2025 Prob (F-statistic): 9.92e-92
Time: 13:14:04 Log-Likelihood: -1445.7
No. Observations: 235 AIC: 2895.
Df Residuals: 233 BIC: 2902.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 147.4754 15.957 9.242 0.000 116.037 178.914
RENTA 0.4852 0.014 33.772 0.000 0.457 0.513
==============================================================================
Omnibus: 68.110 Durbin-Watson: 1.411
Prob(Omnibus): 0.000 Jarque-Bera (JB): 927.676
Skew: -0.670 Prob(JB): 3.61e-202
Kurtosis: 12.641 Cond. No. 2.38e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.38e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
# Prediction
# Build a DataFrame containing the new values of the explanatory variable
new_RENTA = pd.DataFrame({'RENTA': [400, 2000, 4500]},
                         index=['newRENTA1', 'newRENTA2', 'newRENTA3'])
print(f'new_RENTA: \n{new_RENTA}\n')
new_RENTA:
RENTA
newRENTA1 400
newRENTA2 2000
newRENTA3 4500
# Point prediction
pred_GALIM = lin_model.predict(new_RENTA)
print(f'pred_GALIM: \n{pred_GALIM}\n')
pred_GALIM:
newRENTA1 341.546758
newRENTA2 1117.832236
newRENTA3 2330.778295
dtype: float64
# Prediction with confidence interval
pred_GALIM_IC = lin_model.get_prediction(new_RENTA).summary_frame(alpha=0.05)
print(f'pred_GALIM_IC: \n{pred_GALIM_IC}\n')
pred_GALIM_IC:
mean mean_se ... obs_ci_lower obs_ci_upper
0 341.546758 11.199590 ... 115.651327 567.442189
1 1117.832236 16.404210 ... 890.705804 1344.958668
2 2330.778295 51.079406 ... 2084.466352 2577.090238
[3 rows x 6 columns]
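The truncated printout shows only the `obs_ci_lower` and `obs_ci_upper` columns, which are prediction intervals for an individual family. The columns hidden behind `...` (`mean_ci_lower` and `mean_ci_upper`) hold the confidence interval for expected expenditure and should correspond to the `lwr` and `upr` values in the R output. The sketch below, in plain Python with all numbers copied from the outputs above, checks how the two kinds of interval relate. It assumes the usual simple-regression result that the standard error for an individual observation satisfies se_obs^2 = mean_se^2 + sigma^2; the variable names are introduced here for illustration.

```python
import math

# Values copied from the outputs above.
fit = [341.546758, 1117.832236, 2330.778295]        # point predictions ('mean')
mean_se = [11.199590, 16.404210, 51.079406]         # std. error of the estimated mean
ci_upper = [363.6122, 1150.1518, 2431.4148]         # 'upr' from the R confidence interval
pi_upper = [567.442189, 1344.958668, 2577.090238]   # 'obs_ci_upper' from statsmodels

t_values = []
sigma_hats = []
for f, se, ci_u, pi_u in zip(fit, mean_se, ci_upper, pi_upper):
    # Critical value implied by the confidence interval: upr = fit + t * mean_se
    t_crit = (ci_u - f) / se
    # Standard error implied by the prediction interval: obs_upper = fit + t * se_obs
    se_obs = (pi_u - f) / t_crit
    # Assumed decomposition: se_obs^2 = mean_se^2 + sigma^2
    sigma_hat = math.sqrt(se_obs**2 - se**2)
    t_values.append(t_crit)
    sigma_hats.append(sigma_hat)
    print(f"implied t = {t_crit:.4f}, implied sigma = {sigma_hat:.1f}")
```

If the assumption holds, every row should imply roughly the same critical value (the 97.5% quantile of a t distribution with 233 degrees of freedom) and a sigma close to the residual standard error of 114.1 reported in the R summary. To see all six columns directly, printing `pred_GALIM_IC.to_string()` instead of the frame itself avoids the truncation.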