Aplicación 3.7: Autocorrelación espacial
Geometría de polígonos: Estadísticas ‘morales’ en Francia en 1830
En esta aplicación se analizarán con técnicas modernas de análisis estadístico espacial algunas de las cuestiones abordadas en el estudio fundacional de las ciencias sociales de André-Michel Guerry sobre la delincuencia, el suicidio, la alfabetización y otras “estadísticas morales” en la Francia de 1830.
Referencia original: Guerry, A.M. (1833). “Essai sur la statistique morale de la France”, París: Crochard.
Fuente de los datos: https://geodacenter.github.io/data-and-lab/Guerry/
Información complementaria: https://www.datavis.ca/gallery/guerry/
Para una introducción asequible al análisis de datos espaciales puede consultarse el libro de Moraga, P. (2023): “Spatial Statistics for Data Science: Theory and Practice with R”, Chapman & Hall/CRC: https://www.paulamoraga.com/book-spatial/.
Para una revisión completa sobre los principales métodos y técnicas utilizados en ciencia de datos espaciales en R y Python las referencias básicas son los libros de Pebesma, E. y Bivand, R. (2023): “Spatial Data Science: With Applications in R”, Chapman & Hall/CRC: https://r-spatial.org/book/, y Rey, S., Arribas-Bel, D. y Wolf, L. (2023): “Geographic Data Science with Python”, Chapman & Hall/CRC: https://geographicdata.science/book/.
Code
# Lectura de librerías
library(tidyverse)
library(sf) # https://r-spatial.github.io/sf/
# Lectura de datos
<- st_read("data/GUERRY_FRANCE.shp") Guerry
Reading layer `GUERRY_FRANCE' from data source
`/Users/JulianRamajo/Documentos/R-Python/EcMtRPy-Apps/data/GUERRY_FRANCE.shp'
using driver `ESRI Shapefile'
Simple feature collection with 85 features and 29 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 47680 ymin: 1703258 xmax: 1031401 ymax: 2677441
Projected CRS: NTF (Paris) / Lambert zone II
Code
class(Guerry)
[1] "sf" "data.frame"
Code
str(Guerry)
Classes 'sf' and 'data.frame': 85 obs. of 30 variables:
$ CODE_DE : chr "01" "02" "03" "04" ...
$ COUNT : num 1 1 1 1 1 1 1 1 1 1 ...
$ AVE_ID_ : num 49 812 1418 1603 1802 ...
$ dept : num 1 2 3 4 5 7 8 9 10 11 ...
$ Region : chr "E" "N" "C" "E" ...
$ Dprtmnt : chr "Ain" "Aisne" "Allier" "Basses-Alpes" ...
$ Crm_prs : num 28870 26226 26747 12935 17488 ...
$ Crm_prp : num 15890 5521 7925 7289 8174 ...
$ Litercy : num 37 51 13 46 69 27 67 18 59 34 ...
$ Donatns : num 5098 8901 10973 2733 6962 ...
$ Infants : num 33120 14572 17044 23018 23076 ...
$ Suicids : num 35039 12831 114121 14238 16171 ...
$ MainCty : num 2 2 2 1 1 1 2 1 2 2 ...
$ Wealth : num 73 22 61 76 83 84 33 72 14 17 ...
$ Commerc : num 58 10 66 49 65 1 4 60 3 35 ...
$ Clergy : num 11 82 68 5 10 28 50 39 42 15 ...
$ Crm_prn : num 71 4 46 70 22 76 53 74 77 80 ...
$ Infntcd : num 60 82 42 12 23 47 85 28 54 35 ...
$ Dntn_cl : num 69 36 76 37 64 67 49 63 9 27 ...
$ Lottery : num 41 38 66 80 79 70 31 75 28 50 ...
$ Desertn : num 55 82 16 32 35 19 62 22 86 63 ...
$ Instrct : num 46 24 85 29 7 62 9 77 15 48 ...
$ Prsttts : num 13 327 34 2 1 1 83 3 207 1 ...
$ Distanc : num 218.4 65.9 161.9 351.4 320.3 ...
$ Area : num 5762 7369 7340 6925 5549 ...
$ Pop1831 : num 346 513 298 156 129 ...
$ TopCrm : num 1 1 1 0 0 0 1 0 0 0 ...
$ TopLit : num 0 1 0 1 1 0 1 0 1 0 ...
$ TopWealth: num 1 0 1 1 1 1 0 1 0 0 ...
$ geometry :sfc_MULTIPOLYGON of length 85; first list element: List of 1
..$ :List of 1
.. ..$ : num [1:498, 1:2] 801150 800669 800688 800780 800589 ...
..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr [1:29] "CODE_DE" "COUNT" "AVE_ID_" "dept" ...
Code
# Análisis exploratorio básico (EDA)
summary(Guerry)
CODE_DE COUNT AVE_ID_ dept
Length:85 Min. :1.000 Min. : 49 Min. : 1.00
Class :character 1st Qu.:1.000 1st Qu.: 8612 1st Qu.:24.00
Mode :character Median :1.000 Median :16732 Median :45.00
Mean :1.059 Mean :17701 Mean :45.08
3rd Qu.:1.000 3rd Qu.:26842 3rd Qu.:66.00
Max. :4.000 Max. :36521 Max. :89.00
Region Dprtmnt Crm_prs Crm_prp
Length:85 Length:85 Min. : 5883 Min. : 1368
Class :character Class :character 1st Qu.:14790 1st Qu.: 5990
Mode :character Mode :character Median :18785 Median : 7624
Mean :19961 Mean : 7881
3rd Qu.:26221 3rd Qu.: 9190
Max. :37014 Max. :20235
Litercy Donatns Infants Suicids MainCty
Min. :12.00 Min. : 1246 Min. : 2660 Min. : 3460 Min. :1
1st Qu.:25.00 1st Qu.: 3446 1st Qu.:14281 1st Qu.: 15400 1st Qu.:2
Median :38.00 Median : 4964 Median :17044 Median : 26198 Median :2
Mean :39.14 Mean : 6723 Mean :18983 Mean : 36517 Mean :2
3rd Qu.:52.00 3rd Qu.: 9242 3rd Qu.:21981 3rd Qu.: 45180 3rd Qu.:2
Max. :74.00 Max. :27830 Max. :62486 Max. :163241 Max. :3
Wealth Commerc Clergy Crm_prn Infntcd
Min. : 1.00 Min. : 1.00 Min. : 2.00 Min. : 1.00 Min. : 1
1st Qu.:22.00 1st Qu.:21.00 1st Qu.:23.00 1st Qu.:22.00 1st Qu.:23
Median :44.00 Median :42.00 Median :44.00 Median :43.00 Median :44
Mean :43.58 Mean :42.33 Mean :43.93 Mean :43.06 Mean :44
3rd Qu.:65.00 3rd Qu.:63.00 3rd Qu.:65.00 3rd Qu.:64.00 3rd Qu.:65
Max. :86.00 Max. :86.00 Max. :86.00 Max. :86.00 Max. :86
Dntn_cl Lottery Desertn Instrct
Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
1st Qu.:22.00 1st Qu.:22.00 1st Qu.:23.00 1st Qu.:23.00
Median :43.00 Median :43.00 Median :44.00 Median :42.00
Mean :43.02 Mean :43.04 Mean :43.91 Mean :43.34
3rd Qu.:64.00 3rd Qu.:64.00 3rd Qu.:65.00 3rd Qu.:65.00
Max. :86.00 Max. :86.00 Max. :86.00 Max. :86.00
Prsttts Distanc Area Pop1831
Min. : 0.0 Min. : 0.0 Min. : 762 Min. :129.1
1st Qu.: 6.0 1st Qu.:119.7 1st Qu.: 5361 1st Qu.:283.8
Median : 34.0 Median :199.2 Median : 6040 Median :346.3
Mean : 143.5 Mean :204.1 Mean : 6117 Mean :380.8
3rd Qu.: 117.0 3rd Qu.:283.8 3rd Qu.: 6815 3rd Qu.:445.2
Max. :4744.0 Max. :403.4 Max. :10000 Max. :989.9
TopCrm TopLit TopWealth geometry
Min. :0.0000 Min. :0.0000 Min. :0.0000 MULTIPOLYGON :85
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 epsg:27572 : 0
Median :0.0000 Median :0.0000 Median :0.0000 +proj=lcc ...: 0
Mean :0.3294 Mean :0.3294 Mean :0.3294
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.0000 Max. :1.0000
Code
# Gráficas: representación de datos espaciales
# Mapa de cloropletas
plot(Guerry['Crm_prs'])
Code
# Mapa interactivo
library(mapview)
mapview(Guerry, zcol = "Crm_prs")
Code
# ESTADÍSTICA ESPACIAL: ANÁLISIS EXPLORATORIO DE DATOS ESPACIALES (ESDA)
# Librería spdep (https://r-spatial.github.io/spdep/)
# [Ver también la librería sfdep: https://sfdep.josiahparry.com/]
library(spdep)
# Matriz de pesos espaciales
# (www.paulamoraga.com/book-spatial/spatial-neighborhood-matrices.html)
# Cálculo de los vecinos de cada región
<- spdep::poly2nb(Guerry, queen = TRUE)
nb nb
Neighbour list object:
Number of regions: 85
Number of nonzero links: 420
Percentage nonzero weights: 5.813149
Average number of links: 4.941176
Code
plot(st_geometry(Guerry), border = "lightgrey")
plot.nb(nb, st_geometry(Guerry), add = TRUE)
Code
# ¿Cómo funciona al nivel individual la lista nb?
head(nb)
[[1]]
[1] 36 37 67 69
[[2]]
[1] 7 49 57 58 73 76
[[3]]
[1] 17 21 40 56 61 69
[[4]]
[1] 5 24 79 80
[[5]]
[1] 4 24 36
[[6]]
[1] 24 28 36 40 41 46 80
Code
<- 1 # id de la región
id $neighbors <- "Otros"
Guerry$neighbors[id] <- "Región"
Guerry$neighbors[nb[[id]]] <- "Vecinos"
Guerryggplot(Guerry) + geom_sf(aes(fill = neighbors)) + theme_bw() +
scale_fill_manual(values = c("white","gray30", "gray"))
Code
# Construcción de la matriz W
<- spdep::nb2listw(nb, style = "W")
Wnb Wnb
Characteristics of weights list object:
Neighbour list object:
Number of regions: 85
Number of nonzero links: 420
Percentage nonzero weights: 5.813149
Average number of links: 4.941176
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 85 7225 85 37.2761 347.6683
Code
head(Wnb$weights)
[[1]]
[1] 0.25 0.25 0.25 0.25
[[2]]
[1] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
[[3]]
[1] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
[[4]]
[1] 0.25 0.25 0.25 0.25
[[5]]
[1] 0.3333333 0.3333333 0.3333333
[[6]]
[1] 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571
Code
<- listw2mat(Wnb)
m ::levelplot(t(m)) lattice
Code
# Cálculo de los estadísticos de autocorrelación espacial
# I de Moran global
moran.plot(Guerry$Crm_prs, Wnb)
Code
<- moran.test(Guerry$Crm_prs, Wnb,
gmoran alternative = "greater")
gmoran
Moran I test under randomisation
data: Guerry$Crm_prs
weights: Wnb
Moran I statistic standard deviate = 6.0484, p-value = 7.316e-10
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.411459718 -0.011904762 0.004899501
Code
<- moran.mc(Guerry$Crm_prs, Wnb, nsim = 999)
gmoranMC gmoranMC
Monte-Carlo simulation of Moran I
data: Guerry$Crm_prs
weights: Wnb
number of simulations + 1: 1000
statistic = 0.41146, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
Code
hist(gmoranMC$res)
abline(v = gmoranMC$statistic, col = "red")
Code
# I de Moran local (LISAs)
<- localmoran(Guerry$Crm_prs, Wnb, alternative = "greater")
lmoran head(lmoran)
Ii E.Ii Var.Ii Z.Ii Pr(z > E(Ii))
1 0.5222645 -0.017946035 0.36097297 0.8991367 0.184289928
2 0.8280165 -0.008874730 0.11710297 2.4455980 0.007230607
3 0.8035400 -0.010412142 0.13717616 2.1976551 0.013986846
4 0.7418897 -0.011161277 0.22605294 1.5838690 0.056611763
5 0.2311872 -0.001382714 0.03818002 1.1902425 0.116975548
6 0.8389098 -0.024865805 0.27314951 1.6527255 0.049193373
Code
tail(lmoran)
Ii E.Ii Var.Ii Z.Ii Pr(z > E(Ii))
80 0.895182573 -0.0092176169 0.121585312 2.59370206 0.004747436
81 0.025800751 -0.0001695898 0.003472936 0.44068598 0.329720173
82 -0.330384096 -0.0055421769 0.073375477 -1.19921375 0.884777578
83 -0.310472511 -0.0031036125 0.041190939 -1.51446442 0.935045954
84 0.001292507 -0.0002866396 0.003815012 0.02556668 0.489801481
85 -0.126709946 -0.0008641164 0.013969946 -1.06473485 0.856502032
Code
# Mapas interactivos
library(tmap)
tmap_mode("view")
$lmI <- lmoran[, "Ii"]
Guerry$lmZ <- lmoran[, "Z.Ii"]
Guerry$lmpval <- lmoran[, "Pr(z > E(Ii))"]
Guerry<- tm_shape(Guerry) +
p1 tm_polygons(col = "Crm_prs", title = "Crm_prs", style = "quantile") +
tm_layout(legend.outside = TRUE)
<- tm_shape(Guerry) +
p2 tm_polygons(col = "lmI", title = "I de Moran local",
style = "quantile") +
tm_layout(legend.outside = TRUE)
<- tm_shape(Guerry) +
p3 tm_polygons(col = "lmZ", title = "Z-score",
breaks = c(-Inf, 1.65, Inf)) +
tm_layout(legend.outside = TRUE)
<- tm_shape(Guerry) +
p4 tm_polygons(col = "lmpval", title = "p-valor",
breaks = c(-Inf, 0.05, Inf)) +
tm_layout(legend.outside = TRUE)
tmap_arrange(p1, p2, p3, p4)
Code
tm_shape(Guerry) + tm_polygons(col = "lmZ",
title = "I de Moran local", style = "fixed",
breaks = c(-Inf, -1.96, 1.96, Inf),
labels = c("SAC negativa", "SAC no significativa", "SAC positiva"),
palette = c("blue", "white", "red")) +
tm_layout(legend.outside = TRUE)
Code
<- localmoran(Guerry$Crm_prs, Wnb, alternative = "two.sided")
lmoran2 head(lmoran2)
Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
1 0.5222645 -0.017946035 0.36097297 0.8991367 0.36857986
2 0.8280165 -0.008874730 0.11710297 2.4455980 0.01446121
3 0.8035400 -0.010412142 0.13717616 2.1976551 0.02797369
4 0.7418897 -0.011161277 0.22605294 1.5838690 0.11322353
5 0.2311872 -0.001382714 0.03818002 1.1902425 0.23395110
6 0.8389098 -0.024865805 0.27314951 1.6527255 0.09838675
Code
tail(lmoran2)
Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
80 0.895182573 -0.0092176169 0.121585312 2.59370206 0.009494873
81 0.025800751 -0.0001695898 0.003472936 0.44068598 0.659440345
82 -0.330384096 -0.0055421769 0.073375477 -1.19921375 0.230444843
83 -0.310472511 -0.0031036125 0.041190939 -1.51446442 0.129908092
84 0.001292507 -0.0002866396 0.003815012 0.02556668 0.979602961
85 -0.126709946 -0.0008641164 0.013969946 -1.06473485 0.286995936
Code
$lmpval2 <- lmoran2[, 5] # columna 5 contiene p-valores
Guerry<- moran.plot(as.vector(scale(Guerry$Crm_prs)), Wnb) mp
Code
# Mapa LISA
$quadrant <- NA
Guerry# Cuadrante HH (high-high)
$x >= 0 & mp$wx >= 0) & (Guerry$lmpval2 <= 0.05), "quadrant"]<- 1
Guerry[(mp# Cuadrante LL (low-low)
$x <= 0 & mp$wx <= 0) & (Guerry$lmpval2 <= 0.05), "quadrant"]<- 2
Guerry[(mp# Cuadrante HL (high-low)
$x >= 0 & mp$wx <= 0) & (Guerry$lmpval2 <= 0.05), "quadrant"]<- 3
Guerry[(mp# Cuadrante LH (low-high)
$x <= 0 & mp$wx >= 0) & (Guerry$lmpval2 <= 0.05), "quadrant"]<- 4
Guerry[(mp# Cuadrante NS (no significavo)
$lmpval2 > 0.05), "quadrant"] <- 5
Guerry[(Guerry#
tm_shape(Guerry) + tm_fill(col = "quadrant", title = "",
breaks = c(1, 2, 3, 4, 5, 6),
palette = c("red", "blue", "lightpink", "skyblue2", "white"),
labels = c("HH", "LL", "HL",
"LH", "NS")) +
tm_legend(text.size = 1) + tm_borders(alpha = 0.5) +
tm_layout(frame = FALSE, title = "Clusters") +
tm_layout(legend.outside = TRUE)
Code
# ECONOMETRÍA ESPACIAL: ANÁLISIS CONFIRMATORIO DE DATOS ESPACIALES
#
# Modelos econométricos para explicar la variable Crm_prs
#
# Regresión estándar (modelo lineal)
<- lm(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
Crm_LM data = Guerry)
summary(Crm_LM)
Call:
lm(formula = Crm_prs ~ Region + Litercy + Donatns + Infants +
Suicids, data = Guerry)
Residuals:
Min 1Q Median 3Q Max
-11260.7 -4301.5 -571.8 3783.8 14837.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.641e+04 3.425e+03 7.709 3.97e-11 ***
RegionE -3.068e+02 2.876e+03 -0.107 0.915
RegionN 2.469e+03 2.804e+03 0.881 0.381
RegionS -1.090e+04 2.260e+03 -4.822 7.14e-06 ***
RegionW 3.813e+02 2.290e+03 0.167 0.868
Litercy -8.997e+01 6.347e+01 -1.417 0.160
Donatns -1.896e-01 1.591e-01 -1.191 0.237
Infants 4.693e-03 9.099e-02 0.052 0.959
Suicids -1.814e-03 2.480e-02 -0.073 0.942
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6196 on 76 degrees of freedom
Multiple R-squared: 0.348, Adjusted R-squared: 0.2794
F-statistic: 5.071 on 8 and 76 DF, p-value: 4.542e-05
Code
# Regresiones espaciales
# Libraria spatialreg (https://r-spatial.github.io/spatialreg/)
library(spatialreg)
# Retardo espacial de la variable dependiente (SLM) [estimación S2SLS]
<- stsls(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
Crm_SLM data = Guerry,
listw = Wnb)
summary(Crm_SLM)
Call:stsls(formula = Crm_prs ~ Region + Litercy + Donatns + Infants +
Suicids, data = Guerry, listw = Wnb)
Residuals:
Min 1Q Median 3Q Max
-11360.90 -3999.21 -609.53 3214.41 15225.47
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Rho 7.5420e-01 2.5326e-01 2.9779 0.002902
(Intercept) 8.0208e+03 6.9631e+03 1.1519 0.249359
RegionE -7.8425e+02 2.7086e+03 -0.2895 0.772166
RegionN 7.2377e+01 2.7564e+03 0.0263 0.979052
RegionS -5.1597e+03 2.8685e+03 -1.7988 0.072057
RegionW 3.8961e+02 2.1527e+03 0.1810 0.856382
Litercy -3.0311e+01 6.2948e+01 -0.4815 0.630148
Donatns -2.2687e-01 1.5015e-01 -1.5109 0.130809
Infants 6.3493e-03 8.5547e-02 0.0742 0.940836
Suicids 7.0054e-03 2.3501e-02 0.2981 0.765637
Residual variance (sigma squared): 33939000, (sigma: 5825.7)
Anselin-Kelejian (1997) test for residual autocorrelation
test value: 2.3847, p-value: 0.12253
Code
# Retardo espacial de los errores (SEM) [estimación GMM]
<- GMerrorsar(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
Crm_SEM data =Guerry,
listw = Wnb)
summary(Crm_SEM)
Call:GMerrorsar(formula = Crm_prs ~ Region + Litercy + Donatns + Infants +
Suicids, data = Guerry, listw = Wnb)
Residuals:
Min 1Q Median 3Q Max
-11791.83 -4256.49 -222.09 4263.67 15003.03
Type: GM SAR estimator
Coefficients: (GM standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.5396e+04 3.4824e+03 7.2927 3.038e-13
RegionE -1.0359e+03 2.9313e+03 -0.3534 0.7238
RegionN 1.0069e+03 2.8768e+03 0.3500 0.7263
RegionS -1.0528e+04 2.4549e+03 -4.2885 1.799e-05
RegionW -1.8482e+02 2.4744e+03 -0.0747 0.9405
Litercy -6.0066e+01 6.2947e+01 -0.9542 0.3400
Donatns -1.8220e-01 1.5174e-01 -1.2007 0.2299
Infants 1.2229e-02 8.3868e-02 0.1458 0.8841
Suicids -2.2536e-03 2.4192e-02 -0.0932 0.9258
Lambda: 0.26267 (standard error): 0.4347 (z-value): 0.60427
Residual variance (sigma squared): 32698000, (sigma: 5718.2)
GM argmin sigma squared: 32424000
Number of observations: 85
Number of parameters estimated: 11
Code
# Modelo combinado (SAC -> SLM + SEM) [estimación GS2SLS]
<- gstsls(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
Crm_SAC data =Guerry,
listw = Wnb)
summary(Crm_SAC)
Call:gstsls(formula = Crm_prs ~ Region + Litercy + Donatns + Infants +
Suicids, data = Guerry, listw = Wnb)
Residuals:
Min 1Q Median 3Q Max
-10239.24 -4103.09 -218.59 3323.59 14042.87
Type: GM SARAR estimator
Coefficients: (GM standard errors)
Estimate Std. Error z value Pr(>|z|)
Rho_Wy 8.9782e-01 1.7828e-01 5.0359 4.756e-07
(Intercept) 5.6147e+03 5.0051e+03 1.1218 0.26195
RegionE 4.2234e+02 1.8599e+03 0.2271 0.82036
RegionN 1.2128e+03 1.8975e+03 0.6392 0.52272
RegionS -3.0187e+03 2.1041e+03 -1.4347 0.15137
RegionW 8.3674e+02 1.3890e+03 0.6024 0.54692
Litercy -5.6247e+01 4.5281e+01 -1.2422 0.21417
Donatns -2.1237e-01 1.1573e-01 -1.8351 0.06649
Infants -1.5936e-02 6.9529e-02 -0.2292 0.81871
Suicids 6.8299e-03 1.6968e-02 0.4025 0.68730
Lambda: -0.60748
Residual variance (sigma squared): 24576000, (sigma: 4957.4)
GM argmin sigma squared: 25561000
Number of observations: 85
Number of parameters estimated: 12
Code
# Versiones estimadas por máxima verosimilitud
<- lagsarlm(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
Crm_SLM_2 data = Guerry,
listw = Wnb)
summary(Crm_SLM_2)
Call:lagsarlm(formula = Crm_prs ~ Region + Litercy + Donatns + Infants +
Suicids, data = Guerry, listw = Wnb)
Residuals:
Min 1Q Median 3Q Max
-9815.314 -4306.668 -55.825 4260.298 15031.038
Type: lag
Coefficients: (numerical Hessian approximate standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.7222e+04 4.6919e+03 3.6706 0.000242
RegionE -5.4531e+02 2.7258e+03 -0.2001 0.841439
RegionN 1.2718e+03 2.7965e+03 0.4548 0.649259
RegionS -8.0315e+03 2.2666e+03 -3.5434 0.000395
RegionW 3.8544e+02 2.0742e+03 0.1858 0.852581
Litercy -6.0168e+01 5.6138e+01 -1.0718 0.283815
Donatns -2.0823e-01 1.4136e-01 -1.4730 0.140757
Infants 5.5205e-03 5.5539e-02 0.0994 0.920823
Suicids 2.5915e-03 2.3303e-02 0.1112 0.911451
Rho: 0.37673, LR test value: 6.9655, p-value: 0.0083095
Approximate (numerical Hessian) standard error: 0.13328
z-value: 2.8265, p-value: 0.0047054
Wald statistic: 7.9893, p-value: 0.0047054
Log likelihood: -854.5647 for lag model
ML residual variance (sigma squared): 30596000, (sigma: 5531.3)
Number of observations: 85
Number of parameters estimated: 11
AIC: 1731.1, (AIC for lm: 1736.1)
Code
<- errorsarlm(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
Crm_SEM_2 data = Guerry,
listw = Wnb,
Durbin = FALSE)
summary(Crm_SEM_2)
Call:errorsarlm(formula = Crm_prs ~ Region + Litercy + Donatns + Infants +
Suicids, data = Guerry, listw = Wnb, Durbin = FALSE)
Residuals:
Min 1Q Median 3Q Max
-10770.31 -4238.89 -658.43 3347.89 15369.79
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.5012e+04 3.5727e+03 7.0008 2.544e-12
RegionE -1.2123e+03 2.9929e+03 -0.4051 0.6854
RegionN 4.8363e+02 2.9448e+03 0.1642 0.8695
RegionS -1.0329e+04 2.5682e+03 -4.0219 5.773e-05
RegionW -3.7700e+02 2.5848e+03 -0.1459 0.8840
Litercy -4.9483e+01 6.3640e+01 -0.7775 0.4368
Donatns -1.8187e-01 1.5155e-01 -1.2001 0.2301
Infants 1.3675e-02 8.2765e-02 0.1652 0.8688
Suicids -1.8978e-03 2.4299e-02 -0.0781 0.9377
Lambda: 0.34269, LR test value: 3.6061, p-value: 0.057569
Approximate (numerical Hessian) standard error: 0.16651
z-value: 2.058, p-value: 0.039586
Wald statistic: 4.2356, p-value: 0.039586
Log likelihood: -856.2444 for error model
ML residual variance (sigma squared): 32024000, (sigma: 5659)
Number of observations: 85
Number of parameters estimated: 11
AIC: 1734.5, (AIC for lm: 1736.1)
Code
<- sacsarlm(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
Crm_SAC_2 data = Guerry,
listw = Wnb)
summary(Crm_SAC_2)
Call:sacsarlm(formula = Crm_prs ~ Region + Litercy + Donatns + Infants +
Suicids, data = Guerry, listw = Wnb)
Residuals:
Min 1Q Median 3Q Max
-9518.93 -4094.65 -415.92 3955.99 14639.10
Type: sac
Coefficients: (numerical Hessian approximate standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.4668e+04 4.8636e+03 3.0159 0.002562
RegionE -5.6833e-01 NaN NaN NaN
RegionN 1.7593e+03 1.4607e+03 1.2044 0.228419
RegionS -6.8264e+03 2.1260e+03 -3.2108 0.001323
RegionW 6.9736e+02 1.4631e+03 0.4766 0.633613
Litercy -6.9346e+01 2.3895e+01 -2.9021 0.003707
Donatns -2.1431e-01 1.3310e-01 -1.6102 0.107357
Infants -3.1417e-03 NaN NaN NaN
Suicids 4.3722e-03 1.9408e-02 0.2253 0.821758
Rho: 0.5054
Approximate (numerical Hessian) standard error: 0.17131
z-value: 2.9502, p-value: 0.0031756
Lambda: -0.2419
Approximate (numerical Hessian) standard error: 0.27179
z-value: -0.89001, p-value: 0.37346
LR test value: 7.5349, p-value: 0.023111
Log likelihood: -854.28 for sac model
ML residual variance (sigma squared): 29141000, (sigma: 5398.3)
Number of observations: 85
Number of parameters estimated: 12
AIC: 1732.6, (AIC for lm: 1736.1)
Code
# Lectura de librerías
import warnings
"ignore")
warnings.filterwarnings(import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import geopandas as gpd
#Lectura de datos
= gpd.read_file('data/GUERRY_FRANCE.shp')
Guerry Guerry.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 85 entries, 0 to 84
Data columns (total 30 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CODE_DE 85 non-null object
1 COUNT 85 non-null float64
2 AVE_ID_ 85 non-null float64
3 dept 85 non-null int64
4 Region 85 non-null object
5 Dprtmnt 85 non-null object
6 Crm_prs 85 non-null int64
7 Crm_prp 85 non-null int64
8 Litercy 85 non-null int64
9 Donatns 85 non-null int64
10 Infants 85 non-null int64
11 Suicids 85 non-null int64
12 MainCty 85 non-null int64
13 Wealth 85 non-null int64
14 Commerc 85 non-null int64
15 Clergy 85 non-null int64
16 Crm_prn 85 non-null int64
17 Infntcd 85 non-null int64
18 Dntn_cl 85 non-null int64
19 Lottery 85 non-null int64
20 Desertn 85 non-null int64
21 Instrct 85 non-null int64
22 Prsttts 85 non-null int64
23 Distanc 85 non-null float64
24 Area 85 non-null int64
25 Pop1831 85 non-null float64
26 TopCrm 85 non-null float64
27 TopLit 85 non-null float64
28 TopWealth 85 non-null float64
29 geometry 85 non-null geometry
dtypes: float64(7), geometry(1), int64(19), object(3)
memory usage: 20.0+ KB
Code
Guerry
CODE_DE COUNT ... TopWealth geometry
0 01 1.0 ... 1.0 POLYGON ((801150.000 2092615.000, 800669.000 2...
1 02 1.0 ... 0.0 POLYGON ((729326.000 2521619.000, 729320.000 2...
2 03 1.0 ... 1.0 POLYGON ((710830.000 2137350.000, 711746.000 2...
3 04 1.0 ... 1.0 POLYGON ((882701.000 1920024.000, 882408.000 1...
4 05 1.0 ... 1.0 POLYGON ((886504.000 1922890.000, 885733.000 1...
.. ... ... ... ... ...
80 85 1.0 ... 0.0 MULTIPOLYGON (((243055.000 2197885.000, 243019...
81 86 1.0 ... 1.0 POLYGON ((450395.000 2119945.000, 450065.000 2...
82 87 1.0 ... 1.0 POLYGON ((515230.000 2049895.000, 514990.000 2...
83 88 1.0 ... 1.0 POLYGON ((951144.000 2383215.000, 953082.000 2...
84 89 1.0 ... 0.0 POLYGON ((650260.000 2358330.000, 651712.000 2...
[85 rows x 30 columns]
Code
'Crm_prs', legend = True)
Guerry.plot( plt.show()
Code
# ESTADÍSTICA ESPACIAL: ANÁLISIS EXPLORATORIO DE DATOS ESPACIALES (ESDA)
# Librería PySAL (http://pysal.org/)
import libpysal
import esda
from esda.moran import Moran, Moran_Local
import splot
from splot.esda import moran_scatterplot, plot_moran, lisa_cluster, plot_local_autocorrelation
from splot.libpysal import plot_spatial_weights
from libpysal.weights import Queen, Rook, KNN
# Construcción de la matriz W de pesos espaciales
= Queen.from_dataframe(Guerry)
Wnb = 'r'
Wnb.transform
plot_spatial_weights(Wnb, Guerry) plt.show()
Code
0] Wnb.neighbors[
[66, 35, 68, 36]
Code
1] Wnb.neighbors[
[48, 6, 72, 57, 56, 75]
Code
2] Wnb.neighbors[
[16, 68, 20, 55, 39, 60]
Code
3] Wnb.neighbors[
[4, 79, 78, 23]
Code
4] Wnb.neighbors[
[35, 3, 23]
Code
5] Wnb.neighbors[
[35, 23, 39, 40, 27, 45, 79]
Code
# Cálculo de los estadísticos de autocorrelación espacial
# I de Moran global
= Moran(Guerry['Crm_prs'], Wnb)
gmoran print(gmoran.I)
0.41145971831297995
Code
print(gmoran.p_sim)
0.001
Code
=True, zstandard=True)
moran_scatterplot(gmoran, aspect_equal plt.show()
Code
# I de Moran local (LISAs)
= Moran_Local(Guerry['Crm_prs'], Wnb, permutations = 999, seed=12345)
lmoran lmoran.p_sim
array([0.185, 0.008, 0.013, 0.045, 0.111, 0.048, 0.294, 0.045, 0.193,
0.002, 0.001, 0.046, 0.13 , 0.001, 0.3 , 0.266, 0.011, 0.427,
0.071, 0.113, 0.248, 0.3 , 0.221, 0.032, 0.347, 0.292, 0.138,
0.001, 0.006, 0.166, 0.43 , 0.012, 0.009, 0.243, 0.02 , 0.417,
0.066, 0.412, 0.063, 0.432, 0.11 , 0.143, 0.438, 0.113, 0.17 ,
0.009, 0.16 , 0.134, 0.034, 0.218, 0.004, 0.229, 0.086, 0.094,
0.465, 0.079, 0.039, 0.307, 0.063, 0.035, 0.135, 0.192, 0.322,
0.037, 0.165, 0.206, 0.07 , 0.201, 0.008, 0.054, 0.34 , 0.116,
0.357, 0.479, 0.281, 0.081, 0.033, 0.029, 0.06 , 0.003, 0.334,
0.123, 0.073, 0.498, 0.168])
Code
=0.05, zstandard =False) # p-valor=0.05
moran_scatterplot(lmoran, p plt.show()
Code
=0.05)
lisa_cluster(lmoran, Guerry, p plt.show()
Code
# ECONOMETRÍA ESPACIAL: ANÁLISIS CONFIRMATORIO DE DATOS ESPACIALES
#
# Modelos econométricos para explicar la variable Crm_prs
#
# Regresión estándar (modelo lineal)
# Librería statsmodels (https://www.statsmodels.org/stable/)
import statsmodels.formula.api as smf
= smf.ols(formula = "Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids", data = Guerry)
model = model.fit()
Crm_LM print(Crm_LM.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Crm_prs R-squared: 0.348
Model: OLS Adj. R-squared: 0.279
Method: Least Squares F-statistic: 5.071
Date: Sun, 09 Feb 2025 Prob (F-statistic): 4.54e-05
Time: 15:19:25 Log-Likelihood: -858.05
No. Observations: 85 AIC: 1734.
Df Residuals: 76 BIC: 1756.
Df Model: 8
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 2.641e+04 3425.318 7.709 0.000 1.96e+04 3.32e+04
Region[T.E] -306.8258 2875.815 -0.107 0.915 -6034.507 5420.855
Region[T.N] 2468.9607 2803.985 0.881 0.381 -3115.660 8053.582
Region[T.S] -1.09e+04 2260.121 -4.822 0.000 -1.54e+04 -6396.349
Region[T.W] 381.2855 2289.660 0.167 0.868 -4178.968 4941.539
Litercy -89.9681 63.471 -1.417 0.160 -216.381 36.445
Donatns -0.1896 0.159 -1.191 0.237 -0.507 0.127
Infants 0.0047 0.091 0.052 0.959 -0.177 0.186
Suicids -0.0018 0.025 -0.073 0.942 -0.051 0.048
==============================================================================
Omnibus: 2.467 Durbin-Watson: 1.777
Prob(Omnibus): 0.291 Jarque-Bera (JB): 2.198
Skew: 0.291 Prob(JB): 0.333
Kurtosis: 2.469 Cond. No. 3.15e+05
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.15e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
Code
# Librería spreg (https://pysal.org/spreg/)
import spreg
"Region_E"] = (Guerry.Region == "E").astype(int)
Guerry["Region_N"] = (Guerry.Region == "N").astype(int)
Guerry["Region_S"] = (Guerry.Region == "S").astype(int)
Guerry["Region_W"] = (Guerry.Region == "W").astype(int)
Guerry[# Método OLS
= ["Region_E","Region_N","Region_S","Region_W","Litercy", "Donatns", "Infants", "Suicids",]
variable_names = spreg.OLS(
spreg_LM # Variable dependiente
"Crm_prs"]].values,
Guerry[[# Variables explicativas
Guerry[variable_names].values,# Nombre variable dependiente
="Crm_prs",
name_y# Nombre variables explicativas
=variable_names,
name_x
)print(spreg_LM.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set : unknown
Weights matrix : None
Dependent Variable : Crm_prs Number of Observations: 85
Mean dependent var : 19960.9412 Number of Variables : 9
S.D. dependent var : 7299.2418 Degrees of Freedom : 76
R-squared : 0.3480
Adjusted R-squared : 0.2794
Sum squared residual: 2.91789e+09 F-statistic : 5.0710
Sigma-square :38393241.687 Prob(F-statistic) : 4.542e-05
S.E. of regression : 6196.228 Log likelihood : -858.047
Sigma-square ML :34328074.920 Akaike info criterion : 1734.095
S.E of regression ML: 5859.0165 Schwarz criterion : 1756.079
------------------------------------------------------------------------------------
Variable Coefficient Std.Error t-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 26405.29724 3425.31799 7.70886 0.00000
Region_E -306.82579 2875.81461 -0.10669 0.91531
Region_N 2468.96070 2803.98550 0.88052 0.38135
Region_S -10897.76959 2260.12076 -4.82176 0.00001
Region_W 381.28546 2289.66031 0.16652 0.86819
Litercy -89.96807 63.47099 -1.41747 0.16043
Donatns -0.18962 0.15915 -1.19150 0.23717
Infants 0.00469 0.09099 0.05158 0.95900
Suicids -0.00181 0.02480 -0.07315 0.94188
------------------------------------------------------------------------------------
REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER 13.689
TEST ON NORMALITY OF ERRORS
TEST DF VALUE PROB
Jarque-Bera 2 2.198 0.3331
DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST DF VALUE PROB
Breusch-Pagan test 8 7.506 0.4831
Koenker-Bassett test 8 10.218 0.2501
================================ END OF REPORT =====================================
Code
# Modelo con retardo espacial (SLM) [estimación STLS: MC2E-espacial]
= spreg.GM_Lag(
spreg_SLM # Variable dependiente
"Crm_prs"]].values,
Guerry[[# Variables explicativas
Guerry[variable_names].values,# Nombre variable dependiente
="Crm_prs",
name_y# Nombre variables explicativas
=variable_names,
name_x# Matriz W
=Wnb,
w
)print(spreg_SLM.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : Crm_prs Number of Observations: 85
Mean dependent var : 19960.9412 Number of Variables : 10
S.D. dependent var : 7299.2418 Degrees of Freedom : 75
Pseudo R-squared : 0.4333
Spatial Pseudo R-squared: 0.3954
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 9377.19773 7551.32367 1.24180 0.21431
Region_E -749.02863 2541.99336 -0.29466 0.76825
Region_N 249.19405 2631.78105 0.09469 0.92456
Region_S -5583.06103 2938.95182 -1.89968 0.05748
Region_W 388.99266 2018.81627 0.19268 0.84721
Litercy -34.71215 60.30130 -0.57565 0.56486
Donatns -0.22412 0.14102 -1.58927 0.11200
Infants 0.00623 0.08023 0.07762 0.93813
Suicids 0.00635 0.02211 0.28736 0.77384
W_Crm_prs 0.69856 0.28393 2.46033 0.01388
------------------------------------------------------------------------------------
Instrumented: W_Crm_prs
Instruments: W_Donatns, W_Infants, W_Litercy, W_Region_E, W_Region_N,
W_Region_S, W_Region_W, W_Suicids
================================ END OF REPORT =====================================
Code
# Modelo con errores espaciales (SEM) [estimación GMM]
= spreg.GM_Error( # spreg.GM_Error_Het permite heteroscedasticidad
spreg_SEM # Variable dependiente
"Crm_prs"]].values,
Guerry[[# Variables explicativas
Guerry[variable_names].values,# Nombre variable dependiente
="Crm_prs",
name_y# Nombre variables explicativas
=variable_names,
name_x# Matriz W
=Wnb,
w
)print(spreg_SEM.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: GM SPATIALLY WEIGHTED LEAST SQUARES
------------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : Crm_prs Number of Observations: 85
Mean dependent var : 19960.9412 Number of Variables : 9
S.D. dependent var : 7299.2418 Degrees of Freedom : 76
Pseudo R-squared : 0.3283
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 24096.24704 3841.99296 6.27181 0.00000
Region_E -1424.15919 3147.92640 -0.45241 0.65097
Region_N -637.02311 3118.18093 -0.20429 0.83812
Region_S -9751.85083 2848.45495 -3.42356 0.00062
Region_W -790.56623 2861.63332 -0.27626 0.78235
Litercy -26.71237 65.54424 -0.40755 0.68361
Donatns -0.18597 0.15231 -1.22100 0.22209
Infants 0.01560 0.08104 0.19253 0.84733
Suicids -0.00033 0.02463 -0.01349 0.98924
lambda 0.49971
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
Code
# Modelo combinado (SAC -> SLM + SEM) [estimación GMM]
= spreg.GM_Combo(
spreg_SAC # Variable dependiente
"Crm_prs"]].values,
Guerry[[# Variables explicativas
Guerry[variable_names].values,# Nombre variable dependiente
="Crm_prs",
name_y# Nombre variables explicativas
=variable_names,
name_x# Matriz W
=Wnb,
w
)print(spreg_SAC.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED 2SLS - GM-COMBO MODEL
-----------------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : Crm_prs Number of Observations: 85
Mean dependent var : 19960.9412 Number of Variables : 10
S.D. dependent var : 7299.2418 Degrees of Freedom : 75
Pseudo R-squared : 0.4232
Spatial Pseudo R-squared: 0.4115
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 7996.01793 6490.65518 1.23193 0.21798
Region_E 801.55387 1729.01926 0.46359 0.64294
Region_N 1802.35273 1881.90046 0.95773 0.33820
Region_S -3592.10110 2588.88384 -1.38751 0.16529
Region_W 1007.58587 1283.33900 0.78513 0.43238
Litercy -70.99273 45.08605 -1.57461 0.11535
Donatns -0.20684 0.11025 -1.87619 0.06063
Infants -0.02612 0.06681 -0.39089 0.69588
Suicids 0.00633 0.01585 0.39922 0.68973
W_Crm_prs 0.81359 0.24491 3.32197 0.00089
lambda -0.80134
------------------------------------------------------------------------------------
Instrumented: W_Crm_prs
Instruments: W_Donatns, W_Infants, W_Litercy, W_Region_E, W_Region_N,
W_Region_S, W_Region_W, W_Suicids
================================ END OF REPORT =====================================
Code
# Versiones estimadas por máxima verosimilitud
# Modelo con retardo espacial (SLM)
= spreg.ML_Lag(
spreg_SLM_2 # Variable dependiente
"Crm_prs"]].values,
Guerry[[# Variables explicativas
Guerry[variable_names].values,# Nombre variable dependiente
="Crm_prs",
name_y# Nombre variables explicativas
=variable_names,
name_x# Matriz W
=Wnb,
w
)print(spreg_SLM_2.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG (METHOD = FULL)
-----------------------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : Crm_prs Number of Observations: 85
Mean dependent var : 19960.9412 Number of Variables : 10
S.D. dependent var : 7299.2418 Degrees of Freedom : 75
Pseudo R-squared : 0.4201
Spatial Pseudo R-squared: 0.3818
Log likelihood : -854.5647
Sigma-square ML :30595830.6607 Akaike info criterion : 1729.129
S.E of regression : 5531.3498 Schwarz criterion : 1753.556
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 17221.98905 4183.63430 4.11651 0.00004
Region_E -545.30719 2574.17509 -0.21184 0.83223
Region_N 1271.83349 2503.93670 0.50793 0.61150
Region_S -8031.53037 2246.94194 -3.57443 0.00035
Region_W 385.44198 2048.12866 0.18819 0.85073
Litercy -60.16838 56.82038 -1.05892 0.28964
Donatns -0.20823 0.14213 -1.46505 0.14291
Infants 0.00552 0.08135 0.06786 0.94590
Suicids 0.00259 0.02227 0.11639 0.90735
W_Crm_prs 0.37673 0.12468 3.02167 0.00251
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
Code
# Modelo con errores espaciales (SEM)
= spreg.ML_Error(
spreg_SEM_2 # Variable dependiente
"Crm_prs"]].values,
Guerry[[# Variables explicativas
Guerry[variable_names].values,# Nombre variable dependiente
="Crm_prs",
name_y# Nombre variables explicativas
=variable_names,
name_x# Matriz W
=Wnb,
w
)print(spreg_SEM_2.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: ML SPATIAL ERROR (METHOD = full)
---------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : Crm_prs Number of Observations: 85
Mean dependent var : 19960.9412 Number of Variables : 9
S.D. dependent var : 7299.2418 Degrees of Freedom : 76
Pseudo R-squared : 0.3422
Log likelihood : -856.2444
Sigma-square ML :32024233.3912 Akaike info criterion : 1730.489
S.E of regression : 5658.9958 Schwarz criterion : 1752.473
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 25011.57053 3572.65783 7.00083 0.00000
Region_E -1212.33235 2992.93720 -0.40506 0.68543
Region_N 483.62591 2944.76658 0.16423 0.86955
Region_S -10328.83823 2568.15285 -4.02189 0.00006
Region_W -377.00187 2584.80417 -0.14585 0.88404
Litercy -49.48325 63.64034 -0.77755 0.43684
Donatns -0.18187 0.15155 -1.20006 0.23012
Infants 0.01368 0.08277 0.16523 0.86876
Suicids -0.00190 0.02430 -0.07810 0.93775
lambda 0.34269 0.13671 2.50667 0.01219
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
Geometría de puntos: Precio de la vivienda en el condado de Lucas
En esta aplicación se realiza un análisis de tipo espacial de la base de datos utilizada, que contiene información sobre los precios de venta y características de las viviendas del condado de Lucas, en el estado de Ohio, Estados Unidos. Concretamente, se proponen modelos de regresión en los que las variables de dicho modelo y/o los errores del mismo presentan autocorrelación espacial.
La especificación econométrica que servirá de referencia es la siguiente:
\[log(PRICE_{i}) = \beta_1 + \beta_2 AGE_{i} + \beta_3 log(LOTSIZE_{i}) + \beta_4 ROOMS_{i} + e_{i}\]
donde \(PRICE\) es el precio de venta de la vivienda, \(AGE\) es la antigüedad de la misma, \(LOTSIZE\) es la superficie construida y \(ROOMS\) el número habitaciones que posee.
Code
# Lectura de librerías
library(tidyverse)
library(viridis)
library(sf)
library(geojsonsf)
library(spdep)
library(spatialreg)
# Lectura de datos (house {spData} -> Lucas county OH housing)
<- geojson_sf("data/LUCAS_USA.geojson")
house class(house)
[1] "sf" "data.frame"
Code
str(house)
Classes 'sf' and 'data.frame': 25357 obs. of 25 variables:
$ TLA : num 3273 920 1956 1430 2208 ...
$ s1993 : num 0 0 1 0 0 1 0 1 0 0 ...
$ price : num 303000 92000 90000 330000 185000 ...
$ frontage : num 177 100 192 200 241 154 320 362 500 220 ...
$ s1998 : num 0 0 0 0 0 0 0 0 0 0 ...
$ garage : chr "attached" "detached" "attached" "no garage" ...
$ stories : chr "one+half" "one" "two" "one+half" ...
$ syear : chr "1996" "1997" "1993" "1997" ...
$ baths : num 3 1 1 1 2 1 1 2 1 1 ...
$ halfbaths : num 1 0 0 0 1 0 0 1 0 0 ...
$ geometry :sfc_POINT of length 25357; first list element: 'XY' num -83.9 41.4
$ wall : chr "partbrk" "metlvnyl" "stucdrvt" "wood" ...
$ garagesqft: num 625 308 480 0 528 1200 360 672 624 780 ...
$ s1996 : num 1 0 0 0 0 0 0 0 1 1 ...
$ depth : num 0 0 0 217 0 1420 0 0 990 0 ...
$ rooms : num 12 4 7 7 7 5 4 6 7 6 ...
$ s1995 : num 0 0 0 0 1 0 1 0 0 0 ...
$ beds : num 4 2 3 4 3 1 2 3 3 2 ...
$ lotsize : num 53496 37900 52900 43560 225800 ...
$ sdate : num 960423 970421 931101 971223 950807 ...
$ yrbuilt : num 1978 1957 1937 1887 1978 ...
$ avalue : num 306514 84628 126514 199228 192514 ...
$ s1994 : num 0 0 0 0 0 0 0 0 0 0 ...
$ s1997 : num 0 1 0 1 0 0 0 0 0 0 ...
$ age : num 0.21 0.42 0.62 1.12 0.21 0.1 0.72 0.09 1.22 0.63 ...
- attr(*, "sf_column")= chr "geometry"
Code
# Análisis exploratorio básico (EDA)
summary(house)
TLA s1993 price frontage
Min. : 120 Min. :0.0000 Min. : 2000 Min. : 0.00
1st Qu.:1070 1st Qu.:0.0000 1st Qu.: 41900 1st Qu.: 40.00
Median :1318 Median :0.0000 Median : 65500 Median : 50.00
Mean :1462 Mean :0.1286 Mean : 79018 Mean : 64.27
3rd Qu.:1682 3rd Qu.:0.0000 3rd Qu.: 97000 3rd Qu.: 75.00
Max. :7616 Max. :1.0000 Max. :875000 Max. :810.00
s1998 garage stories syear
Min. :0.0000 Length:25357 Length:25357 Length:25357
1st Qu.:0.0000 Class :character Class :character Class :character
Median :0.0000 Mode :character Mode :character Mode :character
Mean :0.1727
3rd Qu.:0.0000
Max. :1.0000
baths halfbaths geometry wall
Min. :0.000 Min. :0.0000 POINT :25357 Length:25357
1st Qu.:1.000 1st Qu.:0.0000 epsg:4326 : 0 Class :character
Median :1.000 Median :0.0000 +proj=long...: 0 Mode :character
Mean :1.242 Mean :0.3412
3rd Qu.:1.000 3rd Qu.:1.0000
Max. :7.000 Max. :3.0000
garagesqft s1996 depth rooms
Min. : 0.0 Min. :0.0000 Min. : 0.0 Min. : 1.000
1st Qu.: 280.0 1st Qu.:0.0000 1st Qu.: 105.0 1st Qu.: 5.000
Median : 396.0 Median :0.0000 Median : 120.0 Median : 6.000
Mean : 369.5 Mean :0.1908 Mean : 119.1 Mean : 6.115
3rd Qu.: 484.0 3rd Qu.:0.0000 3rd Qu.: 140.0 3rd Qu.: 7.000
Max. :5755.0 Max. :1.0000 Max. :1587.0 Max. :20.000
s1995 beds lotsize sdate
Min. :0.0000 Min. :0.000 Min. : 702 Min. :930104
1st Qu.:0.0000 1st Qu.:3.000 1st Qu.: 4700 1st Qu.:941025
Median :0.0000 Median :3.000 Median : 6800 Median :960520
Mean :0.1629 Mean :2.987 Mean : 13332 Mean :957704
3rd Qu.:0.0000 3rd Qu.:3.000 3rd Qu.: 11100 3rd Qu.:970818
Max. :1.0000 Max. :9.000 Max. :429100 Max. :981005
yrbuilt avalue s1994 s1997
Min. :1835 Min. : 1714 Min. :0.0000 Min. :0.0000
1st Qu.:1924 1st Qu.: 38514 1st Qu.:0.0000 1st Qu.:0.0000
Median :1950 Median : 59800 Median :0.0000 Median :0.0000
Mean :1945 Mean : 73641 Mean :0.1467 Mean :0.1984
3rd Qu.:1964 3rd Qu.: 91114 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :1998 Max. :788114 Max. :1.0000 Max. :1.0000
age
Min. :0.0100
1st Qu.:0.3500
Median :0.4900
Mean :0.5366
3rd Qu.:0.7500
Max. :1.6400
Code
# Análisis exploratorio para datos espaciales (ESDA)
# Gráficas: representación de datos espaciales
ggplot(house) + geom_sf(aes(col = price)) + scale_color_viridis() + theme_bw()
Code
# Mapa interactivo
library(leaflet)
<- colorNumeric(palette = "viridis", domain = house$price)
pal leaflet(house) %>% addTiles() %>%
addCircles(lng = st_coordinates(house)[, 1],
lat = st_coordinates(house)[, 2],
color = ~pal(price)) %>%
addLegend(pal = pal, values = ~price, position = "bottomright")
Code
# Información sobre las coordenadas (longitud y latitud) de los datos
<- st_coordinates(house)
coords # Construcción de matrices de pesos espaciales:
# Vecinos más próximos (k=6, mediana de vecinos)
<- knearneigh(coords, k=6)
knn6 <- knn2nb(knn6)
nb6nn # Gráficos de vecinos
plot(st_geometry(house))
plot(nb6nn, coords, add=TRUE, col="red")
Code
# Matriz espacial W estandarizada por filas
<- nb2listw(neighbours=nb6nn, style="W")
Wnb6nn Wnb6nn
Characteristics of weights list object:
Neighbour list object:
Number of regions: 25357
Number of nonzero links: 152142
Percentage nonzero weights: 0.02366211
Average number of links: 6
Non-symmetric neighbours list
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 25357 642977449 25357 7570.944 104462.6
Code
# Estadísticos de autocorrelación espacial
# Globales
moran.test(house$price, listw=Wnb6nn)
Moran I test under randomisation
data: house$price
weights: Wnb6nn
Moran I statistic standard deviate = 226.75, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
7.777479e-01 -3.943840e-05 1.176627e-05
Code
moran.plot(house$price, listw=Wnb6nn)
Code
# Locales (LISAs)
<- as.data.frame(localmoran(house$price, listw=Wnb6nn))
LocalI str(LocalI)
'data.frame': 25357 obs. of 5 variables:
$ Ii : num 3.875 0.353 0.273 1.586 0.921 ...
$ E.Ii : num -5.56e-04 -1.87e-06 -1.34e-06 -6.98e-04 -1.24e-04 ...
$ Var.Ii : num 2.34795 0.00789 0.00565 2.94771 0.52591 ...
$ Z.Ii : num 2.529 3.972 3.627 0.924 1.27 ...
$ Pr(z != E(Ii)): num 1.14e-02 7.12e-05 2.87e-04 3.56e-01 2.04e-01 ...
Code
<- bind_cols(house,LocalI)
house_LocalI plot(house_LocalI["Z.Ii"])
Code
plot(house_LocalI["Pr(z != E(Ii))"])
Code
# Modelos econométricos espaciales
<- formula(log(price) ~ age + log(lotsize) + rooms)
form # Modelo lineal (LM) [estimación MCO]
<- lm(formula=form, data=house)
m1_LM summary(m1_LM)
Call:
lm(formula = form, data = house)
Residuals:
Min 1Q Median 3Q Max
-3.2148 -0.2090 0.0662 0.2914 2.8415
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.793591 0.042969 204.65 <2e-16 ***
age -1.475724 0.012249 -120.47 <2e-16 ***
log(lotsize) 0.244074 0.004395 55.53 <2e-16 ***
rooms 0.135360 0.002383 56.79 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4791 on 25353 degrees of freedom
Multiple R-squared: 0.6057, Adjusted R-squared: 0.6057
F-statistic: 1.298e+04 on 3 and 25353 DF, p-value: < 2.2e-16
Code
# Modelo con retardo espacial (SLM)
<- stsls(formula=form, data=house, listw=Wnb6nn)
m2_SLM summary(m2_SLM)
Call:stsls(formula = form, data = house, listw = Wnb6nn)
Residuals:
Min 1Q Median 3Q Max
-2.474539 -0.140669 0.041546 0.203977 2.414065
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Rho 0.5238190 0.0085133 61.529 < 2.2e-16
(Intercept) 4.0433688 0.0830646 48.677 < 2.2e-16
age -0.7479265 0.0147058 -50.859 < 2.2e-16
log(lotsize) 0.1158118 0.0037650 30.760 < 2.2e-16
rooms 0.0925783 0.0018369 50.400 < 2.2e-16
Residual variance (sigma squared): 0.11678, (sigma: 0.34174)
Anselin-Kelejian (1997) test for residual autocorrelation
test value: 283.94, p-value: < 2.22e-16
Code
# Modelo con errores espaciales (SEM)
<- GMerrorsar(formula=form, data=house, listw=Wnb6nn)
m3_SEM summary(m3_SEM)
Call:GMerrorsar(formula = form, data = house, listw = Wnb6nn)
Residuals:
Min 1Q Median 3Q Max
-3.15072 -0.20516 0.10807 0.33411 2.54157
Type: GM SAR estimator
Coefficients: (GM standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.8153942 0.0483589 182.291 < 2.2e-16
age -0.8687533 0.0137399 -63.228 < 2.2e-16
log(lotsize) 0.2266407 0.0051131 44.326 < 2.2e-16
rooms 0.1027348 0.0020207 50.842 < 2.2e-16
Lambda: 0.68971 (standard error): 0.011167 (z-value): 61.765
Residual variance (sigma squared): 0.11915, (sigma: 0.34518)
GM argmin sigma squared: 0.11951
Number of observations: 25357
Number of parameters estimated: 6
Code
# Modelo combinado (SAC -> SLM + SEM)
<- gstsls(formula=form, data=house, listw=Wnb6nn)
m4_SAC summary(m4_SAC)
Call:gstsls(formula = form, data = house, listw = Wnb6nn)
Residuals:
Min 1Q Median 3Q Max
-2.385331 -0.136218 0.034588 0.189469 2.532176
Type: GM SARAR estimator
Coefficients: (GM standard errors)
Estimate Std. Error z value Pr(>|z|)
Rho_Wy 0.5181163 0.0089175 58.101 < 2.2e-16
(Intercept) 3.8960079 0.0893801 43.589 < 2.2e-16
age -0.7048022 0.0140965 -49.998 < 2.2e-16
log(lotsize) 0.1338316 0.0043791 30.562 < 2.2e-16
rooms 0.0965855 0.0018598 51.934 < 2.2e-16
Lambda: 0.35507
Residual variance (sigma squared): 0.10714, (sigma: 0.32733)
GM argmin sigma squared: 0.10757
Number of observations: 25357
Number of parameters estimated: 7
Code
# Lectura de librerías
import warnings
"ignore")
warnings.filterwarnings(import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import libpysal
from libpysal import weights
import esda
from esda.moran import Moran, Moran_Local
import splot
from splot.esda import moran_scatterplot, plot_moran, lisa_cluster, plot_local_autocorrelation
from splot.libpysal import plot_spatial_weights
import statsmodels.formula.api as smf
import spreg
# Lectura de datos
= gpd.read_file("data/LUCAS_USA.geojson")
house house.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 25357 entries, 0 to 25356
Data columns (total 25 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 25357 non-null int64
1 yrbuilt 25357 non-null int64
2 stories 25357 non-null object
3 TLA 25357 non-null int64
4 wall 25357 non-null object
5 beds 25357 non-null float64
6 baths 25357 non-null float64
7 halfbaths 25357 non-null float64
8 frontage 25357 non-null int64
9 depth 25357 non-null int64
10 garage 25357 non-null object
11 garagesqft 25357 non-null int64
12 rooms 25357 non-null int64
13 lotsize 25357 non-null int64
14 sdate 25357 non-null int64
15 avalue 25357 non-null int64
16 s1993 25357 non-null int64
17 s1994 25357 non-null int64
18 s1995 25357 non-null int64
19 s1996 25357 non-null int64
20 s1997 25357 non-null int64
21 s1998 25357 non-null int64
22 syear 25357 non-null object
23 age 25357 non-null float64
24 geometry 25357 non-null geometry
dtypes: float64(4), geometry(1), int64(16), object(4)
memory usage: 4.8+ MB
Code
# Análisis exploratorio básico (EDA)
round(2) house.describe().
price yrbuilt TLA ... s1997 s1998 age
count 25357.00 25357.00 25357.00 ... 25357.0 25357.00 25357.00
mean 79017.94 1945.34 1462.25 ... 0.2 0.17 0.54
std 59655.02 27.88 613.12 ... 0.4 0.38 0.28
min 2000.00 1835.00 120.00 ... 0.0 0.00 0.01
25% 41900.00 1924.00 1070.00 ... 0.0 0.00 0.35
50% 65500.00 1950.00 1318.00 ... 0.0 0.00 0.49
75% 97000.00 1964.00 1682.00 ... 0.0 0.00 0.75
max 875000.00 1998.00 7616.00 ... 1.0 1.00 1.64
[8 rows x 20 columns]
Code
# Análisis exploratorio para datos espaciales (ESDA)
# Mapa básico
'price', legend = True)
house.plot( plt.show()
Code
# Vecinos geográficos y matriz de pesos espaciales
= weights.KNN.from_dataframe(house, k=6)
Wnb6nn # Estandarización por filas (Row-standardized W)
= 'r'
Wnb6nn.transform
plot_spatial_weights(Wnb6nn, house) plt.show()
Code
# Estadísticos de autocorrelación espacial
# Dependencia espacial global
= Moran(house['price'], Wnb6nn)
globalMoran print(globalMoran.I)
0.7777478528962092
Code
print(globalMoran.p_sim)
0.001
Code
=True, zstandard=True)
moran_scatterplot(globalMoran, aspect_equal plt.show()
Code
# Dependencia espacial local
= Moran_Local(house['price'], Wnb6nn, permutations = 999, seed=12345)
localMoran localMoran.p_sim
array([0.018, 0.003, 0.005, ..., 0.312, 0.119, 0.046])
Code
=0.05, zstandard =False) # Nota: p-valor=0.05
moran_scatterplot(localMoran, p plt.show()
Code
=0.05)
lisa_cluster(localMoran, house, p plt.show()
Code
# Modelos econométricos espaciales
# Operaciones con variables
'l_price']=house['price'].map(lambda x:np.log(x))
house['l_lotsize']=house['lotsize'].map(lambda x:np.log(x))
house[= ['age', 'l_lotsize', 'rooms']
variable_names # Modelo lineal (LM)
= spreg.OLS(
m1_LM # Var. dep.
'l_price']].values,
house[[# Var. expl.
house[variable_names].values,#
='l_price',
name_y#
=variable_names
name_x
)print(m1_LM.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set : unknown
Weights matrix : None
Dependent Variable : l_price Number of Observations: 25357
Mean dependent var : 11.0203 Number of Variables : 4
S.D. dependent var : 0.7629 Degrees of Freedom : 25353
R-squared : 0.6057
Adjusted R-squared : 0.6057
Sum squared residual: 5818.9 F-statistic : 12983.3074
Sigma-square : 0.230 Prob(F-statistic) : 0
S.E. of regression : 0.479 Log likelihood : -17317.995
Sigma-square ML : 0.229 Akaike info criterion : 34643.990
S.E of regression ML: 0.4790 Schwarz criterion : 34676.554
------------------------------------------------------------------------------------
Variable Coefficient Std.Error t-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 8.79359 0.04297 204.64963 0.00000
age -1.47572 0.01225 -120.47428 0.00000
l_lotsize 0.24407 0.00440 55.53175 0.00000
rooms 0.13536 0.00238 56.79125 0.00000
------------------------------------------------------------------------------------
REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER 37.126
TEST ON NORMALITY OF ERRORS
TEST DF VALUE PROB
Jarque-Bera 2 14610.253 0.0000
DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST DF VALUE PROB
Breusch-Pagan test 3 3911.303 0.0000
Koenker-Bassett test 3 1566.036 0.0000
================================ END OF REPORT =====================================
Code
# Modelo con retardo espacial (SLM)
= spreg.GM_Lag(
m2_SLM # Var. dep.
'l_price']].values,
house[[# Var. expl.
house[variable_names].values,# Matriz W
=Wnb6nn,
w#
='l_price',
name_y#
=variable_names
name_x
)print(m2_SLM.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : l_price Number of Observations: 25357
Mean dependent var : 11.0203 Number of Variables : 5
S.D. dependent var : 0.7629 Degrees of Freedom : 25352
Pseudo R-squared : 0.8023
Spatial Pseudo R-squared: 0.6334
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 4.07422 0.08382 48.60492 0.00000
age -0.75265 0.01481 -50.81376 0.00000
l_lotsize 0.11664 0.00378 30.85158 0.00000
rooms 0.09286 0.00184 50.41191 0.00000
W_l_price 0.52042 0.00860 60.50456 0.00000
------------------------------------------------------------------------------------
Instrumented: W_l_price
Instruments: W_age, W_l_lotsize, W_rooms
================================ END OF REPORT =====================================
Code
# Modelo con errores espaciales (SEM)
= spreg.GM_Error(
m3_SEM # Var. dep.
'l_price']].values,
house[[# Var. expl.
house[variable_names].values,# Matriz W
=Wnb6nn,
w#
='l_price',
name_y#
=variable_names
name_x
)print(m3_SEM.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: GM SPATIALLY WEIGHTED LEAST SQUARES
------------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : l_price Number of Observations: 25357
Mean dependent var : 11.0203 Number of Variables : 4
S.D. dependent var : 0.7629 Degrees of Freedom : 25353
Pseudo R-squared : 0.5976
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 8.81540 0.04612 191.15257 0.00000
age -0.86875 0.01310 -66.30164 0.00000
l_lotsize 0.22664 0.00488 46.48027 0.00000
rooms 0.10273 0.00193 53.31384 0.00000
lambda 0.68971
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
Code
# Modelo combinado (SAC -> SLM + SEM)
= spreg.GM_Combo(
m4_SAC # Var. dep.
'l_price']].values,
house[[# Var. expl.
house[variable_names].values,# Matriz W
=Wnb6nn,
w#
='l_price',
name_y#
=variable_names
name_x
)print(m4_SAC.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: SPATIALLY WEIGHTED 2SLS - GM-COMBO MODEL
-----------------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : l_price Number of Observations: 25357
Mean dependent var : 11.0203 Number of Variables : 5
S.D. dependent var : 0.7629 Degrees of Freedom : 25352
Pseudo R-squared : 0.8013
Spatial Pseudo R-squared: 0.6295
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 3.90758 0.08966 43.58021 0.00000
age -0.70573 0.01410 -50.04807 0.00000
l_lotsize 0.13446 0.00439 30.62860 0.00000
rooms 0.09670 0.00186 51.96693 0.00000
W_l_price 0.51653 0.00894 57.75729 0.00000
lambda 0.35939
------------------------------------------------------------------------------------
Instrumented: W_l_price
Instruments: W_age, W_l_lotsize, W_rooms
================================ END OF REPORT =====================================