Aplicación 3.7: Autocorrelación espacial

Geometría de polígonos: Estadísticas ‘morales’ en Francia en 1830

En esta aplicación se analizarán con técnicas modernas de análisis estadístico espacial algunas de las cuestiones abordadas en el estudio fundacional de las ciencias sociales de André-Michel Guerry sobre la delincuencia, el suicidio, la alfabetización y otras “estadísticas morales” en la Francia de 1830.

Referencia original: Guerry, A.M. (1833). “Essai sur la statistique morale de la France”, París: Crochard.

Fuente de los datos: https://geodacenter.github.io/data-and-lab/Guerry/

Información complementaria: https://www.datavis.ca/gallery/guerry/

Para una introducción asequible al análisis de datos espaciales puede consultarse el libro de Moraga, P. (2023): “Spatial Statistics for Data Science: Theory and Practice with R”, Chapman & Hall/CRC: https://www.paulamoraga.com/book-spatial/.

Para una revisión completa sobre los principales métodos y técnicas utilizados en ciencia de datos espaciales en R y Python las referencias básicas son los libros de Pebesma, E. y Bivand, R. (2023): “Spatial Data Science: With Applications in R”, Chapman & Hall/CRC: https://r-spatial.org/book/, y Rey, S., Arribas-Bel, D. y Wolf, L. (2023): “Geographic Data Science with Python”, Chapman & Hall/CRC: https://geographicdata.science/book/.

Code

# Lectura de librerías
library(tidyverse)
library(sf) # https://r-spatial.github.io/sf/
# Lectura de datos
Guerry <- st_read("data/GUERRY_FRANCE.shp")

Reading layer `GUERRY_FRANCE' from data source 
  `/Users/JulianRamajo/Documentos/R-Python/EcMtRPy-Apps/data/GUERRY_FRANCE.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 85 features and 29 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 47680 ymin: 1703258 xmax: 1031401 ymax: 2677441
Projected CRS: NTF (Paris) / Lambert zone II

Code

class(Guerry)

[1] "sf"         "data.frame"

Code

str(Guerry)

Classes 'sf' and 'data.frame':  85 obs. of  30 variables:
 $ CODE_DE  : chr  "01" "02" "03" "04" ...
 $ COUNT    : num  1 1 1 1 1 1 1 1 1 1 ...
 $ AVE_ID_  : num  49 812 1418 1603 1802 ...
 $ dept     : num  1 2 3 4 5 7 8 9 10 11 ...
 $ Region   : chr  "E" "N" "C" "E" ...
 $ Dprtmnt  : chr  "Ain" "Aisne" "Allier" "Basses-Alpes" ...
 $ Crm_prs  : num  28870 26226 26747 12935 17488 ...
 $ Crm_prp  : num  15890 5521 7925 7289 8174 ...
 $ Litercy  : num  37 51 13 46 69 27 67 18 59 34 ...
 $ Donatns  : num  5098 8901 10973 2733 6962 ...
 $ Infants  : num  33120 14572 17044 23018 23076 ...
 $ Suicids  : num  35039 12831 114121 14238 16171 ...
 $ MainCty  : num  2 2 2 1 1 1 2 1 2 2 ...
 $ Wealth   : num  73 22 61 76 83 84 33 72 14 17 ...
 $ Commerc  : num  58 10 66 49 65 1 4 60 3 35 ...
 $ Clergy   : num  11 82 68 5 10 28 50 39 42 15 ...
 $ Crm_prn  : num  71 4 46 70 22 76 53 74 77 80 ...
 $ Infntcd  : num  60 82 42 12 23 47 85 28 54 35 ...
 $ Dntn_cl  : num  69 36 76 37 64 67 49 63 9 27 ...
 $ Lottery  : num  41 38 66 80 79 70 31 75 28 50 ...
 $ Desertn  : num  55 82 16 32 35 19 62 22 86 63 ...
 $ Instrct  : num  46 24 85 29 7 62 9 77 15 48 ...
 $ Prsttts  : num  13 327 34 2 1 1 83 3 207 1 ...
 $ Distanc  : num  218.4 65.9 161.9 351.4 320.3 ...
 $ Area     : num  5762 7369 7340 6925 5549 ...
 $ Pop1831  : num  346 513 298 156 129 ...
 $ TopCrm   : num  1 1 1 0 0 0 1 0 0 0 ...
 $ TopLit   : num  0 1 0 1 1 0 1 0 1 0 ...
 $ TopWealth: num  1 0 1 1 1 1 0 1 0 0 ...
 $ geometry :sfc_MULTIPOLYGON of length 85; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:498, 1:2] 801150 800669 800688 800780 800589 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:29] "CODE_DE" "COUNT" "AVE_ID_" "dept" ...

Code

# Análisis exploratorio básico (EDA)
summary(Guerry)

   CODE_DE              COUNT          AVE_ID_           dept      
 Length:85          Min.   :1.000   Min.   :   49   Min.   : 1.00  
 Class :character   1st Qu.:1.000   1st Qu.: 8612   1st Qu.:24.00  
 Mode  :character   Median :1.000   Median :16732   Median :45.00  
                    Mean   :1.059   Mean   :17701   Mean   :45.08  
                    3rd Qu.:1.000   3rd Qu.:26842   3rd Qu.:66.00  
                    Max.   :4.000   Max.   :36521   Max.   :89.00  
    Region            Dprtmnt             Crm_prs         Crm_prp     
 Length:85          Length:85          Min.   : 5883   Min.   : 1368  
 Class :character   Class :character   1st Qu.:14790   1st Qu.: 5990  
 Mode  :character   Mode  :character   Median :18785   Median : 7624  
                                       Mean   :19961   Mean   : 7881  
                                       3rd Qu.:26221   3rd Qu.: 9190  
                                       Max.   :37014   Max.   :20235  
    Litercy         Donatns         Infants         Suicids          MainCty 
 Min.   :12.00   Min.   : 1246   Min.   : 2660   Min.   :  3460   Min.   :1  
 1st Qu.:25.00   1st Qu.: 3446   1st Qu.:14281   1st Qu.: 15400   1st Qu.:2  
 Median :38.00   Median : 4964   Median :17044   Median : 26198   Median :2  
 Mean   :39.14   Mean   : 6723   Mean   :18983   Mean   : 36517   Mean   :2  
 3rd Qu.:52.00   3rd Qu.: 9242   3rd Qu.:21981   3rd Qu.: 45180   3rd Qu.:2  
 Max.   :74.00   Max.   :27830   Max.   :62486   Max.   :163241   Max.   :3  
     Wealth         Commerc          Clergy         Crm_prn         Infntcd  
 Min.   : 1.00   Min.   : 1.00   Min.   : 2.00   Min.   : 1.00   Min.   : 1  
 1st Qu.:22.00   1st Qu.:21.00   1st Qu.:23.00   1st Qu.:22.00   1st Qu.:23  
 Median :44.00   Median :42.00   Median :44.00   Median :43.00   Median :44  
 Mean   :43.58   Mean   :42.33   Mean   :43.93   Mean   :43.06   Mean   :44  
 3rd Qu.:65.00   3rd Qu.:63.00   3rd Qu.:65.00   3rd Qu.:64.00   3rd Qu.:65  
 Max.   :86.00   Max.   :86.00   Max.   :86.00   Max.   :86.00   Max.   :86  
    Dntn_cl         Lottery         Desertn         Instrct     
 Min.   : 1.00   Min.   : 1.00   Min.   : 1.00   Min.   : 1.00  
 1st Qu.:22.00   1st Qu.:22.00   1st Qu.:23.00   1st Qu.:23.00  
 Median :43.00   Median :43.00   Median :44.00   Median :42.00  
 Mean   :43.02   Mean   :43.04   Mean   :43.91   Mean   :43.34  
 3rd Qu.:64.00   3rd Qu.:64.00   3rd Qu.:65.00   3rd Qu.:65.00  
 Max.   :86.00   Max.   :86.00   Max.   :86.00   Max.   :86.00  
    Prsttts          Distanc           Area          Pop1831     
 Min.   :   0.0   Min.   :  0.0   Min.   :  762   Min.   :129.1  
 1st Qu.:   6.0   1st Qu.:119.7   1st Qu.: 5361   1st Qu.:283.8  
 Median :  34.0   Median :199.2   Median : 6040   Median :346.3  
 Mean   : 143.5   Mean   :204.1   Mean   : 6117   Mean   :380.8  
 3rd Qu.: 117.0   3rd Qu.:283.8   3rd Qu.: 6815   3rd Qu.:445.2  
 Max.   :4744.0   Max.   :403.4   Max.   :10000   Max.   :989.9  
     TopCrm           TopLit         TopWealth               geometry 
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   MULTIPOLYGON :85  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   epsg:27572   : 0  
 Median :0.0000   Median :0.0000   Median :0.0000   +proj=lcc ...: 0  
 Mean   :0.3294   Mean   :0.3294   Mean   :0.3294                     
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000                     
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000

Code

# Gráficas: representación de datos espaciales
# Mapa de cloropletas
plot(Guerry['Crm_prs'])

Code

# Mapa interactivo
library(mapview)
mapview(Guerry, zcol = "Crm_prs")

Code

# ESTADÍSTICA ESPACIAL: ANÁLISIS EXPLORATORIO DE DATOS ESPACIALES (ESDA)
# Librería spdep (https://r-spatial.github.io/spdep/)
# [Ver también la librería sfdep: https://sfdep.josiahparry.com/]
library(spdep)
# Matriz de pesos espaciales 
# (www.paulamoraga.com/book-spatial/spatial-neighborhood-matrices.html)
# Cálculo de los vecinos de cada región
nb <- spdep::poly2nb(Guerry, queen = TRUE)
nb

Neighbour list object:
Number of regions: 85 
Number of nonzero links: 420 
Percentage nonzero weights: 5.813149 
Average number of links: 4.941176

Code

plot(st_geometry(Guerry), border = "lightgrey")
plot.nb(nb, st_geometry(Guerry), add = TRUE)

Code

# ¿Cómo funciona al nivel individual la lista nb?
head(nb)

[[1]]
[1] 36 37 67 69

[[2]]
[1]  7 49 57 58 73 76

[[3]]
[1] 17 21 40 56 61 69

[[4]]
[1]  5 24 79 80

[[5]]
[1]  4 24 36

[[6]]
[1] 24 28 36 40 41 46 80

Code

id <- 1 # id de la región
Guerry$neighbors <- "Otros"
Guerry$neighbors[id] <- "Región"
Guerry$neighbors[nb[[id]]] <- "Vecinos"
ggplot(Guerry) + geom_sf(aes(fill = neighbors)) + theme_bw() +
    scale_fill_manual(values = c("white","gray30", "gray"))

Code

# Construcción de la matriz W
Wnb <- spdep::nb2listw(nb, style = "W")
Wnb

Characteristics of weights list object:
Neighbour list object:
Number of regions: 85 
Number of nonzero links: 420 
Percentage nonzero weights: 5.813149 
Average number of links: 4.941176 

Weights style: W 
Weights constants summary:
   n   nn S0      S1       S2
W 85 7225 85 37.2761 347.6683

Code

head(Wnb$weights)

[[1]]
[1] 0.25 0.25 0.25 0.25

[[2]]
[1] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667

[[3]]
[1] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667

[[4]]
[1] 0.25 0.25 0.25 0.25

[[5]]
[1] 0.3333333 0.3333333 0.3333333

[[6]]
[1] 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571

Code

m <- listw2mat(Wnb)
lattice::levelplot(t(m))

Code

# Cálculo de los estadísticos de autocorrelación espacial
# I de Moran global
moran.plot(Guerry$Crm_prs, Wnb)

Code

gmoran <- moran.test(Guerry$Crm_prs, Wnb,
                     alternative = "greater")
gmoran


    Moran I test under randomisation

data:  Guerry$Crm_prs  
weights: Wnb    

Moran I statistic standard deviate = 6.0484, p-value = 7.316e-10
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.411459718      -0.011904762       0.004899501

Code

gmoranMC <- moran.mc(Guerry$Crm_prs, Wnb, nsim = 999)
gmoranMC


    Monte-Carlo simulation of Moran I

data:  Guerry$Crm_prs 
weights: Wnb  
number of simulations + 1: 1000 

statistic = 0.41146, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

Code

hist(gmoranMC$res)
abline(v = gmoranMC$statistic, col = "red")

Code

# I de Moran local (LISAs)
lmoran <- localmoran(Guerry$Crm_prs, Wnb, alternative = "greater")
head(lmoran)

         Ii         E.Ii     Var.Ii      Z.Ii Pr(z > E(Ii))
1 0.5222645 -0.017946035 0.36097297 0.8991367   0.184289928
2 0.8280165 -0.008874730 0.11710297 2.4455980   0.007230607
3 0.8035400 -0.010412142 0.13717616 2.1976551   0.013986846
4 0.7418897 -0.011161277 0.22605294 1.5838690   0.056611763
5 0.2311872 -0.001382714 0.03818002 1.1902425   0.116975548
6 0.8389098 -0.024865805 0.27314951 1.6527255   0.049193373

Code

tail(lmoran)

             Ii          E.Ii      Var.Ii        Z.Ii Pr(z > E(Ii))
80  0.895182573 -0.0092176169 0.121585312  2.59370206   0.004747436
81  0.025800751 -0.0001695898 0.003472936  0.44068598   0.329720173
82 -0.330384096 -0.0055421769 0.073375477 -1.19921375   0.884777578
83 -0.310472511 -0.0031036125 0.041190939 -1.51446442   0.935045954
84  0.001292507 -0.0002866396 0.003815012  0.02556668   0.489801481
85 -0.126709946 -0.0008641164 0.013969946 -1.06473485   0.856502032

Code

# Mapas interactivos
library(tmap)
tmap_mode("view")
Guerry$lmI <- lmoran[, "Ii"]
Guerry$lmZ <- lmoran[, "Z.Ii"]
Guerry$lmpval <- lmoran[, "Pr(z > E(Ii))"]
p1 <- tm_shape(Guerry) +
  tm_polygons(col = "Crm_prs", title = "Crm_prs", style = "quantile") +
  tm_layout(legend.outside = TRUE)
p2 <- tm_shape(Guerry) +
  tm_polygons(col = "lmI", title = "I de Moran local",
              style = "quantile") +
  tm_layout(legend.outside = TRUE)
p3 <- tm_shape(Guerry) +
  tm_polygons(col = "lmZ", title = "Z-score",
              breaks = c(-Inf, 1.65, Inf)) +
  tm_layout(legend.outside = TRUE)
p4 <- tm_shape(Guerry) +
  tm_polygons(col = "lmpval", title = "p-valor",
              breaks = c(-Inf, 0.05, Inf)) +
  tm_layout(legend.outside = TRUE)
tmap_arrange(p1, p2, p3, p4)

Code

tm_shape(Guerry) + tm_polygons(col = "lmZ",
title = "I de Moran local", style = "fixed",
breaks = c(-Inf, -1.96, 1.96, Inf),
labels = c("SAC negativa", "SAC no significativa", "SAC positiva"),
palette =  c("blue", "white", "red")) +
tm_layout(legend.outside = TRUE)

Code

lmoran2 <- localmoran(Guerry$Crm_prs, Wnb, alternative = "two.sided")
head(lmoran2)

         Ii         E.Ii     Var.Ii      Z.Ii Pr(z != E(Ii))
1 0.5222645 -0.017946035 0.36097297 0.8991367     0.36857986
2 0.8280165 -0.008874730 0.11710297 2.4455980     0.01446121
3 0.8035400 -0.010412142 0.13717616 2.1976551     0.02797369
4 0.7418897 -0.011161277 0.22605294 1.5838690     0.11322353
5 0.2311872 -0.001382714 0.03818002 1.1902425     0.23395110
6 0.8389098 -0.024865805 0.27314951 1.6527255     0.09838675

Code

tail(lmoran2)

             Ii          E.Ii      Var.Ii        Z.Ii Pr(z != E(Ii))
80  0.895182573 -0.0092176169 0.121585312  2.59370206    0.009494873
81  0.025800751 -0.0001695898 0.003472936  0.44068598    0.659440345
82 -0.330384096 -0.0055421769 0.073375477 -1.19921375    0.230444843
83 -0.310472511 -0.0031036125 0.041190939 -1.51446442    0.129908092
84  0.001292507 -0.0002866396 0.003815012  0.02556668    0.979602961
85 -0.126709946 -0.0008641164 0.013969946 -1.06473485    0.286995936

Code

Guerry$lmpval2 <- lmoran2[, 5] # columna 5 contiene p-valores
mp <- moran.plot(as.vector(scale(Guerry$Crm_prs)), Wnb)

Code

# Mapa LISA
Guerry$quadrant <- NA
# Cuadrante HH (high-high)
Guerry[(mp$x >= 0 & mp$wx >= 0) & (Guerry$lmpval2 <= 0.05), "quadrant"]<- 1
# Cuadrante LL (low-low)
Guerry[(mp$x <= 0 & mp$wx <= 0) & (Guerry$lmpval2 <= 0.05), "quadrant"]<- 2
# Cuadrante HL (high-low)
Guerry[(mp$x >= 0 & mp$wx <= 0) & (Guerry$lmpval2 <= 0.05), "quadrant"]<- 3
# Cuadrante LH (low-high)
Guerry[(mp$x <= 0 & mp$wx >= 0) & (Guerry$lmpval2 <= 0.05), "quadrant"]<- 4
# Cuadrante NS (no significavo)
Guerry[(Guerry$lmpval2 > 0.05), "quadrant"] <- 5
#
tm_shape(Guerry) + tm_fill(col = "quadrant", title = "",
breaks = c(1, 2, 3, 4, 5, 6),
palette =  c("red", "blue", "lightpink", "skyblue2", "white"),
labels = c("HH", "LL", "HL",
           "LH", "NS")) +
tm_legend(text.size = 1)  + tm_borders(alpha = 0.5) +
tm_layout(frame = FALSE,  title = "Clusters")  +
tm_layout(legend.outside = TRUE)

Code

# ECONOMETRÍA ESPACIAL: ANÁLISIS CONFIRMATORIO DE DATOS ESPACIALES
#
# Modelos econométricos para explicar la variable Crm_prs
#
# Regresión estándar (modelo lineal)
Crm_LM <- lm(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
              data = Guerry)
summary(Crm_LM)


Call:
lm(formula = Crm_prs ~ Region + Litercy + Donatns + Infants + 
    Suicids, data = Guerry)

Residuals:
     Min       1Q   Median       3Q      Max 
-11260.7  -4301.5   -571.8   3783.8  14837.0 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.641e+04  3.425e+03   7.709 3.97e-11 ***
RegionE     -3.068e+02  2.876e+03  -0.107    0.915    
RegionN      2.469e+03  2.804e+03   0.881    0.381    
RegionS     -1.090e+04  2.260e+03  -4.822 7.14e-06 ***
RegionW      3.813e+02  2.290e+03   0.167    0.868    
Litercy     -8.997e+01  6.347e+01  -1.417    0.160    
Donatns     -1.896e-01  1.591e-01  -1.191    0.237    
Infants      4.693e-03  9.099e-02   0.052    0.959    
Suicids     -1.814e-03  2.480e-02  -0.073    0.942    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6196 on 76 degrees of freedom
Multiple R-squared:  0.348, Adjusted R-squared:  0.2794 
F-statistic: 5.071 on 8 and 76 DF,  p-value: 4.542e-05

Code

# Regresiones espaciales
# Libraria spatialreg (https://r-spatial.github.io/spatialreg/)
library(spatialreg)
# Retardo espacial de la variable dependiente (SLM) [estimación S2SLS]
Crm_SLM <- stsls(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
                 data = Guerry,
                 listw = Wnb)
summary(Crm_SLM)


Call:stsls(formula = Crm_prs ~ Region + Litercy + Donatns + Infants + 
    Suicids, data = Guerry, listw = Wnb)

Residuals:
      Min        1Q    Median        3Q       Max 
-11360.90  -3999.21   -609.53   3214.41  15225.47 

Coefficients: 
               Estimate  Std. Error t value Pr(>|t|)
Rho          7.5420e-01  2.5326e-01  2.9779 0.002902
(Intercept)  8.0208e+03  6.9631e+03  1.1519 0.249359
RegionE     -7.8425e+02  2.7086e+03 -0.2895 0.772166
RegionN      7.2377e+01  2.7564e+03  0.0263 0.979052
RegionS     -5.1597e+03  2.8685e+03 -1.7988 0.072057
RegionW      3.8961e+02  2.1527e+03  0.1810 0.856382
Litercy     -3.0311e+01  6.2948e+01 -0.4815 0.630148
Donatns     -2.2687e-01  1.5015e-01 -1.5109 0.130809
Infants      6.3493e-03  8.5547e-02  0.0742 0.940836
Suicids      7.0054e-03  2.3501e-02  0.2981 0.765637

Residual variance (sigma squared): 33939000, (sigma: 5825.7)
Anselin-Kelejian (1997) test for residual autocorrelation
test value: 2.3847, p-value: 0.12253

Code

# Retardo espacial de los errores (SEM) [estimación GMM]
Crm_SEM <- GMerrorsar(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
                      data =Guerry,
                      listw = Wnb)
summary(Crm_SEM)


Call:GMerrorsar(formula = Crm_prs ~ Region + Litercy + Donatns + Infants + 
    Suicids, data = Guerry, listw = Wnb)

Residuals:
      Min        1Q    Median        3Q       Max 
-11791.83  -4256.49   -222.09   4263.67  15003.03 

Type: GM SAR estimator
Coefficients: (GM standard errors) 
               Estimate  Std. Error z value  Pr(>|z|)
(Intercept)  2.5396e+04  3.4824e+03  7.2927 3.038e-13
RegionE     -1.0359e+03  2.9313e+03 -0.3534    0.7238
RegionN      1.0069e+03  2.8768e+03  0.3500    0.7263
RegionS     -1.0528e+04  2.4549e+03 -4.2885 1.799e-05
RegionW     -1.8482e+02  2.4744e+03 -0.0747    0.9405
Litercy     -6.0066e+01  6.2947e+01 -0.9542    0.3400
Donatns     -1.8220e-01  1.5174e-01 -1.2007    0.2299
Infants      1.2229e-02  8.3868e-02  0.1458    0.8841
Suicids     -2.2536e-03  2.4192e-02 -0.0932    0.9258

Lambda: 0.26267 (standard error): 0.4347 (z-value): 0.60427
Residual variance (sigma squared): 32698000, (sigma: 5718.2)
GM argmin sigma squared: 32424000
Number of observations: 85 
Number of parameters estimated: 11

Code

# Modelo combinado (SAC -> SLM + SEM) [estimación GS2SLS]
Crm_SAC <- gstsls(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
                      data =Guerry,
                      listw = Wnb)
summary(Crm_SAC)


Call:gstsls(formula = Crm_prs ~ Region + Litercy + Donatns + Infants + 
    Suicids, data = Guerry, listw = Wnb)

Residuals:
      Min        1Q    Median        3Q       Max 
-10239.24  -4103.09   -218.59   3323.59  14042.87 

Type: GM SARAR estimator
Coefficients: (GM standard errors) 
               Estimate  Std. Error z value  Pr(>|z|)
Rho_Wy       8.9782e-01  1.7828e-01  5.0359 4.756e-07
(Intercept)  5.6147e+03  5.0051e+03  1.1218   0.26195
RegionE      4.2234e+02  1.8599e+03  0.2271   0.82036
RegionN      1.2128e+03  1.8975e+03  0.6392   0.52272
RegionS     -3.0187e+03  2.1041e+03 -1.4347   0.15137
RegionW      8.3674e+02  1.3890e+03  0.6024   0.54692
Litercy     -5.6247e+01  4.5281e+01 -1.2422   0.21417
Donatns     -2.1237e-01  1.1573e-01 -1.8351   0.06649
Infants     -1.5936e-02  6.9529e-02 -0.2292   0.81871
Suicids      6.8299e-03  1.6968e-02  0.4025   0.68730

Lambda: -0.60748
Residual variance (sigma squared): 24576000, (sigma: 4957.4)
GM argmin sigma squared: 25561000
Number of observations: 85 
Number of parameters estimated: 12

Code

# Versiones estimadas por máxima verosimilitud
Crm_SLM_2 <- lagsarlm(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
                 data = Guerry,
                 listw = Wnb)
summary(Crm_SLM_2)


Call:lagsarlm(formula = Crm_prs ~ Region + Litercy + Donatns + Infants + 
    Suicids, data = Guerry, listw = Wnb)

Residuals:
      Min        1Q    Median        3Q       Max 
-9815.314 -4306.668   -55.825  4260.298 15031.038 

Type: lag 
Coefficients: (numerical Hessian approximate standard errors) 
               Estimate  Std. Error z value Pr(>|z|)
(Intercept)  1.7222e+04  4.6919e+03  3.6706 0.000242
RegionE     -5.4531e+02  2.7258e+03 -0.2001 0.841439
RegionN      1.2718e+03  2.7965e+03  0.4548 0.649259
RegionS     -8.0315e+03  2.2666e+03 -3.5434 0.000395
RegionW      3.8544e+02  2.0742e+03  0.1858 0.852581
Litercy     -6.0168e+01  5.6138e+01 -1.0718 0.283815
Donatns     -2.0823e-01  1.4136e-01 -1.4730 0.140757
Infants      5.5205e-03  5.5539e-02  0.0994 0.920823
Suicids      2.5915e-03  2.3303e-02  0.1112 0.911451

Rho: 0.37673, LR test value: 6.9655, p-value: 0.0083095
Approximate (numerical Hessian) standard error: 0.13328
    z-value: 2.8265, p-value: 0.0047054
Wald statistic: 7.9893, p-value: 0.0047054

Log likelihood: -854.5647 for lag model
ML residual variance (sigma squared): 30596000, (sigma: 5531.3)
Number of observations: 85 
Number of parameters estimated: 11 
AIC: 1731.1, (AIC for lm: 1736.1)

Code

Crm_SEM_2 <- errorsarlm(Crm_prs ~  Region + Litercy + Donatns + Infants + Suicids,
                       data = Guerry,
                       listw = Wnb,
                       Durbin = FALSE)
summary(Crm_SEM_2)


Call:errorsarlm(formula = Crm_prs ~ Region + Litercy + Donatns + Infants + 
    Suicids, data = Guerry, listw = Wnb, Durbin = FALSE)

Residuals:
      Min        1Q    Median        3Q       Max 
-10770.31  -4238.89   -658.43   3347.89  15369.79 

Type: error 
Coefficients: (asymptotic standard errors) 
               Estimate  Std. Error z value  Pr(>|z|)
(Intercept)  2.5012e+04  3.5727e+03  7.0008 2.544e-12
RegionE     -1.2123e+03  2.9929e+03 -0.4051    0.6854
RegionN      4.8363e+02  2.9448e+03  0.1642    0.8695
RegionS     -1.0329e+04  2.5682e+03 -4.0219 5.773e-05
RegionW     -3.7700e+02  2.5848e+03 -0.1459    0.8840
Litercy     -4.9483e+01  6.3640e+01 -0.7775    0.4368
Donatns     -1.8187e-01  1.5155e-01 -1.2001    0.2301
Infants      1.3675e-02  8.2765e-02  0.1652    0.8688
Suicids     -1.8978e-03  2.4299e-02 -0.0781    0.9377

Lambda: 0.34269, LR test value: 3.6061, p-value: 0.057569
Approximate (numerical Hessian) standard error: 0.16651
    z-value: 2.058, p-value: 0.039586
Wald statistic: 4.2356, p-value: 0.039586

Log likelihood: -856.2444 for error model
ML residual variance (sigma squared): 32024000, (sigma: 5659)
Number of observations: 85 
Number of parameters estimated: 11 
AIC: 1734.5, (AIC for lm: 1736.1)

Code

Crm_SAC_2 <- sacsarlm(Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids,
                      data = Guerry,
                      listw = Wnb)
summary(Crm_SAC_2)


Call:sacsarlm(formula = Crm_prs ~ Region + Litercy + Donatns + Infants + 
    Suicids, data = Guerry, listw = Wnb)

Residuals:
     Min       1Q   Median       3Q      Max 
-9518.93 -4094.65  -415.92  3955.99 14639.10 

Type: sac 
Coefficients: (numerical Hessian approximate standard errors) 
               Estimate  Std. Error z value Pr(>|z|)
(Intercept)  1.4668e+04  4.8636e+03  3.0159 0.002562
RegionE     -5.6833e-01         NaN     NaN      NaN
RegionN      1.7593e+03  1.4607e+03  1.2044 0.228419
RegionS     -6.8264e+03  2.1260e+03 -3.2108 0.001323
RegionW      6.9736e+02  1.4631e+03  0.4766 0.633613
Litercy     -6.9346e+01  2.3895e+01 -2.9021 0.003707
Donatns     -2.1431e-01  1.3310e-01 -1.6102 0.107357
Infants     -3.1417e-03         NaN     NaN      NaN
Suicids      4.3722e-03  1.9408e-02  0.2253 0.821758

Rho: 0.5054
Approximate (numerical Hessian) standard error: 0.17131
    z-value: 2.9502, p-value: 0.0031756
Lambda: -0.2419
Approximate (numerical Hessian) standard error: 0.27179
    z-value: -0.89001, p-value: 0.37346

LR test value: 7.5349, p-value: 0.023111

Log likelihood: -854.28 for sac model
ML residual variance (sigma squared): 29141000, (sigma: 5398.3)
Number of observations: 85 
Number of parameters estimated: 12 
AIC: 1732.6, (AIC for lm: 1736.1)

Code

# Lectura de librerías
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import geopandas as gpd
#Lectura de datos
Guerry  = gpd.read_file('data/GUERRY_FRANCE.shp')
Guerry.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 85 entries, 0 to 84
Data columns (total 30 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   CODE_DE    85 non-null     object  
 1   COUNT      85 non-null     float64 
 2   AVE_ID_    85 non-null     float64 
 3   dept       85 non-null     int64   
 4   Region     85 non-null     object  
 5   Dprtmnt    85 non-null     object  
 6   Crm_prs    85 non-null     int64   
 7   Crm_prp    85 non-null     int64   
 8   Litercy    85 non-null     int64   
 9   Donatns    85 non-null     int64   
 10  Infants    85 non-null     int64   
 11  Suicids    85 non-null     int64   
 12  MainCty    85 non-null     int64   
 13  Wealth     85 non-null     int64   
 14  Commerc    85 non-null     int64   
 15  Clergy     85 non-null     int64   
 16  Crm_prn    85 non-null     int64   
 17  Infntcd    85 non-null     int64   
 18  Dntn_cl    85 non-null     int64   
 19  Lottery    85 non-null     int64   
 20  Desertn    85 non-null     int64   
 21  Instrct    85 non-null     int64   
 22  Prsttts    85 non-null     int64   
 23  Distanc    85 non-null     float64 
 24  Area       85 non-null     int64   
 25  Pop1831    85 non-null     float64 
 26  TopCrm     85 non-null     float64 
 27  TopLit     85 non-null     float64 
 28  TopWealth  85 non-null     float64 
 29  geometry   85 non-null     geometry
dtypes: float64(7), geometry(1), int64(19), object(3)
memory usage: 20.0+ KB

Code

Guerry

   CODE_DE  COUNT  ...  TopWealth                                           geometry
0       01    1.0  ...        1.0  POLYGON ((801150.000 2092615.000, 800669.000 2...
1       02    1.0  ...        0.0  POLYGON ((729326.000 2521619.000, 729320.000 2...
2       03    1.0  ...        1.0  POLYGON ((710830.000 2137350.000, 711746.000 2...
3       04    1.0  ...        1.0  POLYGON ((882701.000 1920024.000, 882408.000 1...
4       05    1.0  ...        1.0  POLYGON ((886504.000 1922890.000, 885733.000 1...
..     ...    ...  ...        ...                                                ...
80      85    1.0  ...        0.0  MULTIPOLYGON (((243055.000 2197885.000, 243019...
81      86    1.0  ...        1.0  POLYGON ((450395.000 2119945.000, 450065.000 2...
82      87    1.0  ...        1.0  POLYGON ((515230.000 2049895.000, 514990.000 2...
83      88    1.0  ...        1.0  POLYGON ((951144.000 2383215.000, 953082.000 2...
84      89    1.0  ...        0.0  POLYGON ((650260.000 2358330.000, 651712.000 2...

[85 rows x 30 columns]

Code

Guerry.plot('Crm_prs', legend = True)
plt.show()

Code

# ESTADÍSTICA ESPACIAL: ANÁLISIS EXPLORATORIO DE DATOS ESPACIALES (ESDA)
# Librería PySAL (http://pysal.org/)
import libpysal
import esda
from esda.moran import Moran, Moran_Local
import splot
from splot.esda import moran_scatterplot, plot_moran, lisa_cluster, plot_local_autocorrelation
from splot.libpysal import plot_spatial_weights
from libpysal.weights import Queen, Rook, KNN
# Construcción de la matriz W de pesos espaciales
Wnb = Queen.from_dataframe(Guerry)
Wnb.transform = 'r'
plot_spatial_weights(Wnb, Guerry)
plt.show()

Code

Wnb.neighbors[0]

[66, 35, 68, 36]

Code

Wnb.neighbors[1]

[48, 6, 72, 57, 56, 75]

Code

Wnb.neighbors[2]

[16, 68, 20, 55, 39, 60]

Code

Wnb.neighbors[3]

[4, 79, 78, 23]

Code

Wnb.neighbors[4]

[35, 3, 23]

Code

Wnb.neighbors[5]

[35, 23, 39, 40, 27, 45, 79]

Code

# Cálculo de los estadísticos de autocorrelación espacial
# I de Moran global
gmoran = Moran(Guerry['Crm_prs'], Wnb)
print(gmoran.I)

0.41145971831297995

Code

print(gmoran.p_sim)

0.001

Code

moran_scatterplot(gmoran, aspect_equal=True, zstandard=True)
plt.show()

Code

# I de Moran local (LISAs)
lmoran = Moran_Local(Guerry['Crm_prs'], Wnb, permutations = 999, seed=12345)
lmoran.p_sim

array([0.185, 0.008, 0.013, 0.045, 0.111, 0.048, 0.294, 0.045, 0.193,
       0.002, 0.001, 0.046, 0.13 , 0.001, 0.3  , 0.266, 0.011, 0.427,
       0.071, 0.113, 0.248, 0.3  , 0.221, 0.032, 0.347, 0.292, 0.138,
       0.001, 0.006, 0.166, 0.43 , 0.012, 0.009, 0.243, 0.02 , 0.417,
       0.066, 0.412, 0.063, 0.432, 0.11 , 0.143, 0.438, 0.113, 0.17 ,
       0.009, 0.16 , 0.134, 0.034, 0.218, 0.004, 0.229, 0.086, 0.094,
       0.465, 0.079, 0.039, 0.307, 0.063, 0.035, 0.135, 0.192, 0.322,
       0.037, 0.165, 0.206, 0.07 , 0.201, 0.008, 0.054, 0.34 , 0.116,
       0.357, 0.479, 0.281, 0.081, 0.033, 0.029, 0.06 , 0.003, 0.334,
       0.123, 0.073, 0.498, 0.168])

Code

moran_scatterplot(lmoran, p=0.05, zstandard =False) # p-valor=0.05
plt.show()

Code

lisa_cluster(lmoran, Guerry, p=0.05)
plt.show()

Code

# ECONOMETRÍA ESPACIAL: ANÁLISIS CONFIRMATORIO DE DATOS ESPACIALES
#
# Modelos econométricos para explicar la variable Crm_prs
#
# Regresión estándar (modelo lineal)
# Librería statsmodels (https://www.statsmodels.org/stable/)
import statsmodels.formula.api as smf
model = smf.ols(formula = "Crm_prs ~ Region + Litercy + Donatns + Infants + Suicids", data = Guerry)
Crm_LM = model.fit()
print(Crm_LM.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                Crm_prs   R-squared:                       0.348
Model:                            OLS   Adj. R-squared:                  0.279
Method:                 Least Squares   F-statistic:                     5.071
Date:                Sun, 09 Feb 2025   Prob (F-statistic):           4.54e-05
Time:                        15:19:25   Log-Likelihood:                -858.05
No. Observations:                  85   AIC:                             1734.
Df Residuals:                      76   BIC:                             1756.
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept    2.641e+04   3425.318      7.709      0.000    1.96e+04    3.32e+04
Region[T.E]  -306.8258   2875.815     -0.107      0.915   -6034.507    5420.855
Region[T.N]  2468.9607   2803.985      0.881      0.381   -3115.660    8053.582
Region[T.S]  -1.09e+04   2260.121     -4.822      0.000   -1.54e+04   -6396.349
Region[T.W]   381.2855   2289.660      0.167      0.868   -4178.968    4941.539
Litercy       -89.9681     63.471     -1.417      0.160    -216.381      36.445
Donatns        -0.1896      0.159     -1.191      0.237      -0.507       0.127
Infants         0.0047      0.091      0.052      0.959      -0.177       0.186
Suicids        -0.0018      0.025     -0.073      0.942      -0.051       0.048
==============================================================================
Omnibus:                        2.467   Durbin-Watson:                   1.777
Prob(Omnibus):                  0.291   Jarque-Bera (JB):                2.198
Skew:                           0.291   Prob(JB):                        0.333
Kurtosis:                       2.469   Cond. No.                     3.15e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.15e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

Code

# Librería spreg (https://pysal.org/spreg/)
import spreg
Guerry["Region_E"] = (Guerry.Region == "E").astype(int)
Guerry["Region_N"] = (Guerry.Region == "N").astype(int)
Guerry["Region_S"] = (Guerry.Region == "S").astype(int)
Guerry["Region_W"] = (Guerry.Region == "W").astype(int)
# Método OLS
variable_names = ["Region_E","Region_N","Region_S","Region_W","Litercy", "Donatns", "Infants", "Suicids",]
spreg_LM = spreg.OLS(
    # Variable dependiente
    Guerry[["Crm_prs"]].values,
    # Variables explicativas 
    Guerry[variable_names].values,
    # Nombre variable dependiente
    name_y="Crm_prs",
    # Nombre variables explicativas 
    name_x=variable_names,
)
print(spreg_LM.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :     unknown
Weights matrix      :        None
Dependent Variable  :     Crm_prs                Number of Observations:          85
Mean dependent var  :  19960.9412                Number of Variables   :           9
S.D. dependent var  :   7299.2418                Degrees of Freedom    :          76
R-squared           :      0.3480
Adjusted R-squared  :      0.2794
Sum squared residual: 2.91789e+09                F-statistic           :      5.0710
Sigma-square        :38393241.687                Prob(F-statistic)     :   4.542e-05
S.E. of regression  :    6196.228                Log likelihood        :    -858.047
Sigma-square ML     :34328074.920                Akaike info criterion :    1734.095
S.E of regression ML:   5859.0165                Schwarz criterion     :    1756.079

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT     26405.29724      3425.31799         7.70886         0.00000
            Region_E      -306.82579      2875.81461        -0.10669         0.91531
            Region_N      2468.96070      2803.98550         0.88052         0.38135
            Region_S    -10897.76959      2260.12076        -4.82176         0.00001
            Region_W       381.28546      2289.66031         0.16652         0.86819
             Litercy       -89.96807        63.47099        -1.41747         0.16043
             Donatns        -0.18962         0.15915        -1.19150         0.23717
             Infants         0.00469         0.09099         0.05158         0.95900
             Suicids        -0.00181         0.02480        -0.07315         0.94188
------------------------------------------------------------------------------------

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER           13.689

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2           2.198           0.3331

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                8           7.506           0.4831
Koenker-Bassett test              8          10.218           0.2501
================================ END OF REPORT =====================================

Code

# Modelo con retardo espacial (SLM) [estimación STLS: MC2E-espacial]
spreg_SLM = spreg.GM_Lag(
    # Variable dependiente
    Guerry[["Crm_prs"]].values,
    # Variables explicativas 
    Guerry[variable_names].values,
    # Nombre variable dependiente
    name_y="Crm_prs",
    # Nombre variables explicativas 
    name_x=variable_names,
    # Matriz W
    w=Wnb, 
)
print(spreg_SLM.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     Crm_prs                Number of Observations:          85
Mean dependent var  :  19960.9412                Number of Variables   :          10
S.D. dependent var  :   7299.2418                Degrees of Freedom    :          75
Pseudo R-squared    :      0.4333
Spatial Pseudo R-squared:  0.3954

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      9377.19773      7551.32367         1.24180         0.21431
            Region_E      -749.02863      2541.99336        -0.29466         0.76825
            Region_N       249.19405      2631.78105         0.09469         0.92456
            Region_S     -5583.06103      2938.95182        -1.89968         0.05748
            Region_W       388.99266      2018.81627         0.19268         0.84721
             Litercy       -34.71215        60.30130        -0.57565         0.56486
             Donatns        -0.22412         0.14102        -1.58927         0.11200
             Infants         0.00623         0.08023         0.07762         0.93813
             Suicids         0.00635         0.02211         0.28736         0.77384
           W_Crm_prs         0.69856         0.28393         2.46033         0.01388
------------------------------------------------------------------------------------
Instrumented: W_Crm_prs
Instruments: W_Donatns, W_Infants, W_Litercy, W_Region_E, W_Region_N,
             W_Region_S, W_Region_W, W_Suicids
================================ END OF REPORT =====================================

Code

# Modelo con errores espaciales (SEM) [estimación GMM]
spreg_SEM = spreg.GM_Error(    # spreg.GM_Error_Het permite heteroscedasticidad
    # Variable dependiente
    Guerry[["Crm_prs"]].values,
    # Variables explicativas 
    Guerry[variable_names].values,
    # Nombre variable dependiente
    name_y="Crm_prs",
    # Nombre variables explicativas 
    name_x=variable_names,
    # Matriz W
    w=Wnb, 
)
print(spreg_SEM.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: GM SPATIALLY WEIGHTED LEAST SQUARES
------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     Crm_prs                Number of Observations:          85
Mean dependent var  :  19960.9412                Number of Variables   :           9
S.D. dependent var  :   7299.2418                Degrees of Freedom    :          76
Pseudo R-squared    :      0.3283

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT     24096.24704      3841.99296         6.27181         0.00000
            Region_E     -1424.15919      3147.92640        -0.45241         0.65097
            Region_N      -637.02311      3118.18093        -0.20429         0.83812
            Region_S     -9751.85083      2848.45495        -3.42356         0.00062
            Region_W      -790.56623      2861.63332        -0.27626         0.78235
             Litercy       -26.71237        65.54424        -0.40755         0.68361
             Donatns        -0.18597         0.15231        -1.22100         0.22209
             Infants         0.01560         0.08104         0.19253         0.84733
             Suicids        -0.00033         0.02463        -0.01349         0.98924
              lambda         0.49971    
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================

Code

# Modelo combinado (SAC -> SLM + SEM) [estimación GMM]
spreg_SAC = spreg.GM_Combo(
    # Variable dependiente
    Guerry[["Crm_prs"]].values,
    # Variables explicativas 
    Guerry[variable_names].values,
    # Nombre variable dependiente
    name_y="Crm_prs",
    # Nombre variables explicativas 
    name_x=variable_names,
    # Matriz W
    w=Wnb, 
)
print(spreg_SAC.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: SPATIALLY WEIGHTED 2SLS - GM-COMBO MODEL
-----------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     Crm_prs                Number of Observations:          85
Mean dependent var  :  19960.9412                Number of Variables   :          10
S.D. dependent var  :   7299.2418                Degrees of Freedom    :          75
Pseudo R-squared    :      0.4232
Spatial Pseudo R-squared:  0.4115

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      7996.01793      6490.65518         1.23193         0.21798
            Region_E       801.55387      1729.01926         0.46359         0.64294
            Region_N      1802.35273      1881.90046         0.95773         0.33820
            Region_S     -3592.10110      2588.88384        -1.38751         0.16529
            Region_W      1007.58587      1283.33900         0.78513         0.43238
             Litercy       -70.99273        45.08605        -1.57461         0.11535
             Donatns        -0.20684         0.11025        -1.87619         0.06063
             Infants        -0.02612         0.06681        -0.39089         0.69588
             Suicids         0.00633         0.01585         0.39922         0.68973
           W_Crm_prs         0.81359         0.24491         3.32197         0.00089
              lambda        -0.80134    
------------------------------------------------------------------------------------
Instrumented: W_Crm_prs
Instruments: W_Donatns, W_Infants, W_Litercy, W_Region_E, W_Region_N,
             W_Region_S, W_Region_W, W_Suicids
================================ END OF REPORT =====================================

Code

# Versiones estimadas por máxima verosimilitud
# Modelo con retardo espacial (SLM)
spreg_SLM_2 = spreg.ML_Lag(
    # Variable dependiente
    Guerry[["Crm_prs"]].values,
    # Variables explicativas 
    Guerry[variable_names].values,
    # Nombre variable dependiente
    name_y="Crm_prs",
    # Nombre variables explicativas 
    name_x=variable_names,
    # Matriz W
    w=Wnb, 
)
print(spreg_SLM_2.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG (METHOD = FULL)
-----------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     Crm_prs                Number of Observations:          85
Mean dependent var  :  19960.9412                Number of Variables   :          10
S.D. dependent var  :   7299.2418                Degrees of Freedom    :          75
Pseudo R-squared    :      0.4201
Spatial Pseudo R-squared:  0.3818
Log likelihood      :   -854.5647
Sigma-square ML     :30595830.6607                Akaike info criterion :    1729.129
S.E of regression   :   5531.3498                Schwarz criterion     :    1753.556

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT     17221.98905      4183.63430         4.11651         0.00004
            Region_E      -545.30719      2574.17509        -0.21184         0.83223
            Region_N      1271.83349      2503.93670         0.50793         0.61150
            Region_S     -8031.53037      2246.94194        -3.57443         0.00035
            Region_W       385.44198      2048.12866         0.18819         0.85073
             Litercy       -60.16838        56.82038        -1.05892         0.28964
             Donatns        -0.20823         0.14213        -1.46505         0.14291
             Infants         0.00552         0.08135         0.06786         0.94590
             Suicids         0.00259         0.02227         0.11639         0.90735
           W_Crm_prs         0.37673         0.12468         3.02167         0.00251
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================

Code

# Modelo con errores espaciales (SEM)
spreg_SEM_2 = spreg.ML_Error( 
    # Variable dependiente
    Guerry[["Crm_prs"]].values,
    # Variables explicativas 
    Guerry[variable_names].values,
    # Nombre variable dependiente
    name_y="Crm_prs",
    # Nombre variables explicativas 
    name_x=variable_names,
    # Matriz W
    w=Wnb, 
)
print(spreg_SEM_2.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: ML SPATIAL ERROR (METHOD = full)
---------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     Crm_prs                Number of Observations:          85
Mean dependent var  :  19960.9412                Number of Variables   :           9
S.D. dependent var  :   7299.2418                Degrees of Freedom    :          76
Pseudo R-squared    :      0.3422
Log likelihood      :   -856.2444
Sigma-square ML     :32024233.3912                Akaike info criterion :    1730.489
S.E of regression   :   5658.9958                Schwarz criterion     :    1752.473

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT     25011.57053      3572.65783         7.00083         0.00000
            Region_E     -1212.33235      2992.93720        -0.40506         0.68543
            Region_N       483.62591      2944.76658         0.16423         0.86955
            Region_S    -10328.83823      2568.15285        -4.02189         0.00006
            Region_W      -377.00187      2584.80417        -0.14585         0.88404
             Litercy       -49.48325        63.64034        -0.77755         0.43684
             Donatns        -0.18187         0.15155        -1.20006         0.23012
             Infants         0.01368         0.08277         0.16523         0.86876
             Suicids        -0.00190         0.02430        -0.07810         0.93775
              lambda         0.34269         0.13671         2.50667         0.01219
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================

Geometría de puntos: Precio de la vivienda en el condado de Lucas

En esta aplicación se realiza un análisis de tipo espacial de la base de datos utilizada, que contiene información sobre los precios de venta y características de las viviendas del condado de Lucas, en el estado de Ohio, Estados Unidos. Concretamente, se proponen modelos de regresión en los que las variables de dicho modelo y/o los errores del mismo presentan autocorrelación espacial.

La especificación econométrica que servirá de referencia es la siguiente:

\[log(PRICE_{i}) = \beta_1 + \beta_2 AGE_{i} + \beta_3 log(LOTSIZE_{i}) + \beta_4 ROOMS_{i} + e_{i}\]

donde \(PRICE\) es el precio de venta de la vivienda, \(AGE\) es la antigüedad de la misma, \(LOTSIZE\) es la superficie construida y \(ROOMS\) el número habitaciones que posee.

Code

# Lectura de librerías
library(tidyverse)
library(viridis)
library(sf)
library(geojsonsf)
library(spdep)
library(spatialreg)
# Lectura de datos (house {spData} -> Lucas county OH housing)
house <- geojson_sf("data/LUCAS_USA.geojson")
class(house)

[1] "sf"         "data.frame"

Code

str(house)

Classes 'sf' and 'data.frame':  25357 obs. of  25 variables:
 $ TLA       : num  3273 920 1956 1430 2208 ...
 $ s1993     : num  0 0 1 0 0 1 0 1 0 0 ...
 $ price     : num  303000 92000 90000 330000 185000 ...
 $ frontage  : num  177 100 192 200 241 154 320 362 500 220 ...
 $ s1998     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ garage    : chr  "attached" "detached" "attached" "no garage" ...
 $ stories   : chr  "one+half" "one" "two" "one+half" ...
 $ syear     : chr  "1996" "1997" "1993" "1997" ...
 $ baths     : num  3 1 1 1 2 1 1 2 1 1 ...
 $ halfbaths : num  1 0 0 0 1 0 0 1 0 0 ...
 $ geometry  :sfc_POINT of length 25357; first list element:  'XY' num  -83.9 41.4
 $ wall      : chr  "partbrk" "metlvnyl" "stucdrvt" "wood" ...
 $ garagesqft: num  625 308 480 0 528 1200 360 672 624 780 ...
 $ s1996     : num  1 0 0 0 0 0 0 0 1 1 ...
 $ depth     : num  0 0 0 217 0 1420 0 0 990 0 ...
 $ rooms     : num  12 4 7 7 7 5 4 6 7 6 ...
 $ s1995     : num  0 0 0 0 1 0 1 0 0 0 ...
 $ beds      : num  4 2 3 4 3 1 2 3 3 2 ...
 $ lotsize   : num  53496 37900 52900 43560 225800 ...
 $ sdate     : num  960423 970421 931101 971223 950807 ...
 $ yrbuilt   : num  1978 1957 1937 1887 1978 ...
 $ avalue    : num  306514 84628 126514 199228 192514 ...
 $ s1994     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ s1997     : num  0 1 0 1 0 0 0 0 0 0 ...
 $ age       : num  0.21 0.42 0.62 1.12 0.21 0.1 0.72 0.09 1.22 0.63 ...
 - attr(*, "sf_column")= chr "geometry"

Code

# Análisis exploratorio básico (EDA)
summary(house)

      TLA           s1993            price           frontage     
 Min.   : 120   Min.   :0.0000   Min.   :  2000   Min.   :  0.00  
 1st Qu.:1070   1st Qu.:0.0000   1st Qu.: 41900   1st Qu.: 40.00  
 Median :1318   Median :0.0000   Median : 65500   Median : 50.00  
 Mean   :1462   Mean   :0.1286   Mean   : 79018   Mean   : 64.27  
 3rd Qu.:1682   3rd Qu.:0.0000   3rd Qu.: 97000   3rd Qu.: 75.00  
 Max.   :7616   Max.   :1.0000   Max.   :875000   Max.   :810.00  
     s1998           garage            stories             syear          
 Min.   :0.0000   Length:25357       Length:25357       Length:25357      
 1st Qu.:0.0000   Class :character   Class :character   Class :character  
 Median :0.0000   Mode  :character   Mode  :character   Mode  :character  
 Mean   :0.1727                                                           
 3rd Qu.:0.0000                                                           
 Max.   :1.0000                                                           
     baths         halfbaths               geometry         wall          
 Min.   :0.000   Min.   :0.0000   POINT        :25357   Length:25357      
 1st Qu.:1.000   1st Qu.:0.0000   epsg:4326    :    0   Class :character  
 Median :1.000   Median :0.0000   +proj=long...:    0   Mode  :character  
 Mean   :1.242   Mean   :0.3412                                           
 3rd Qu.:1.000   3rd Qu.:1.0000                                           
 Max.   :7.000   Max.   :3.0000                                           
   garagesqft         s1996            depth            rooms       
 Min.   :   0.0   Min.   :0.0000   Min.   :   0.0   Min.   : 1.000  
 1st Qu.: 280.0   1st Qu.:0.0000   1st Qu.: 105.0   1st Qu.: 5.000  
 Median : 396.0   Median :0.0000   Median : 120.0   Median : 6.000  
 Mean   : 369.5   Mean   :0.1908   Mean   : 119.1   Mean   : 6.115  
 3rd Qu.: 484.0   3rd Qu.:0.0000   3rd Qu.: 140.0   3rd Qu.: 7.000  
 Max.   :5755.0   Max.   :1.0000   Max.   :1587.0   Max.   :20.000  
     s1995             beds          lotsize           sdate       
 Min.   :0.0000   Min.   :0.000   Min.   :   702   Min.   :930104  
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:  4700   1st Qu.:941025  
 Median :0.0000   Median :3.000   Median :  6800   Median :960520  
 Mean   :0.1629   Mean   :2.987   Mean   : 13332   Mean   :957704  
 3rd Qu.:0.0000   3rd Qu.:3.000   3rd Qu.: 11100   3rd Qu.:970818  
 Max.   :1.0000   Max.   :9.000   Max.   :429100   Max.   :981005  
    yrbuilt         avalue           s1994            s1997       
 Min.   :1835   Min.   :  1714   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:1924   1st Qu.: 38514   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :1950   Median : 59800   Median :0.0000   Median :0.0000  
 Mean   :1945   Mean   : 73641   Mean   :0.1467   Mean   :0.1984  
 3rd Qu.:1964   3rd Qu.: 91114   3rd Qu.:0.0000   3rd Qu.:0.0000  
 Max.   :1998   Max.   :788114   Max.   :1.0000   Max.   :1.0000  
      age        
 Min.   :0.0100  
 1st Qu.:0.3500  
 Median :0.4900  
 Mean   :0.5366  
 3rd Qu.:0.7500  
 Max.   :1.6400

Code

# Análisis exploratorio para datos espaciales (ESDA)
# Gráficas: representación de datos espaciales
ggplot(house) + geom_sf(aes(col = price)) + scale_color_viridis() + theme_bw()

Code

# Mapa interactivo
library(leaflet)
pal <- colorNumeric(palette = "viridis", domain = house$price)
leaflet(house) %>% addTiles() %>%
    addCircles(lng = st_coordinates(house)[, 1],
               lat = st_coordinates(house)[, 2],
               color = ~pal(price)) %>%
    addLegend(pal = pal, values = ~price, position = "bottomright")

Code

# Información sobre las coordenadas (longitud y latitud) de los datos
coords <- st_coordinates(house)
# Construcción de matrices de pesos espaciales: 
# Vecinos más próximos (k=6, mediana de vecinos)
knn6 <- knearneigh(coords, k=6) 
nb6nn <- knn2nb(knn6)
# Gráficos de vecinos
plot(st_geometry(house)) 
plot(nb6nn, coords, add=TRUE, col="red")

Code

# Matriz espacial W estandarizada por filas
Wnb6nn <- nb2listw(neighbours=nb6nn, style="W")
Wnb6nn

Characteristics of weights list object:
Neighbour list object:
Number of regions: 25357 
Number of nonzero links: 152142 
Percentage nonzero weights: 0.02366211 
Average number of links: 6 
Non-symmetric neighbours list

Weights style: W 
Weights constants summary:
      n        nn    S0       S1       S2
W 25357 642977449 25357 7570.944 104462.6

Code

# Estadísticos de autocorrelación espacial
# Globales
moran.test(house$price, listw=Wnb6nn)


    Moran I test under randomisation

data:  house$price  
weights: Wnb6nn    

Moran I statistic standard deviate = 226.75, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     7.777479e-01     -3.943840e-05      1.176627e-05

Code

moran.plot(house$price, listw=Wnb6nn)

Code

# Locales (LISAs)
LocalI <- as.data.frame(localmoran(house$price, listw=Wnb6nn))
str(LocalI)

'data.frame':   25357 obs. of  5 variables:
 $ Ii            : num  3.875 0.353 0.273 1.586 0.921 ...
 $ E.Ii          : num  -5.56e-04 -1.87e-06 -1.34e-06 -6.98e-04 -1.24e-04 ...
 $ Var.Ii        : num  2.34795 0.00789 0.00565 2.94771 0.52591 ...
 $ Z.Ii          : num  2.529 3.972 3.627 0.924 1.27 ...
 $ Pr(z != E(Ii)): num  1.14e-02 7.12e-05 2.87e-04 3.56e-01 2.04e-01 ...

Code

house_LocalI <- bind_cols(house,LocalI)
plot(house_LocalI["Z.Ii"])

Code

plot(house_LocalI["Pr(z != E(Ii))"])

Code

# Modelos econométricos espaciales
form <- formula(log(price) ~ age + log(lotsize) + rooms)
# Modelo lineal (LM) [estimación MCO]
m1_LM <- lm(formula=form, data=house)
summary(m1_LM)


Call:
lm(formula = form, data = house)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2148 -0.2090  0.0662  0.2914  2.8415 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   8.793591   0.042969  204.65   <2e-16 ***
age          -1.475724   0.012249 -120.47   <2e-16 ***
log(lotsize)  0.244074   0.004395   55.53   <2e-16 ***
rooms         0.135360   0.002383   56.79   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4791 on 25353 degrees of freedom
Multiple R-squared:  0.6057,    Adjusted R-squared:  0.6057 
F-statistic: 1.298e+04 on 3 and 25353 DF,  p-value: < 2.2e-16

Code

# Modelo con retardo espacial (SLM)
m2_SLM <- stsls(formula=form, data=house, listw=Wnb6nn)
summary(m2_SLM)


Call:stsls(formula = form, data = house, listw = Wnb6nn)

Residuals:
      Min        1Q    Median        3Q       Max 
-2.474539 -0.140669  0.041546  0.203977  2.414065 

Coefficients: 
               Estimate Std. Error t value  Pr(>|t|)
Rho           0.5238190  0.0085133  61.529 < 2.2e-16
(Intercept)   4.0433688  0.0830646  48.677 < 2.2e-16
age          -0.7479265  0.0147058 -50.859 < 2.2e-16
log(lotsize)  0.1158118  0.0037650  30.760 < 2.2e-16
rooms         0.0925783  0.0018369  50.400 < 2.2e-16

Residual variance (sigma squared): 0.11678, (sigma: 0.34174)
Anselin-Kelejian (1997) test for residual autocorrelation
test value: 283.94, p-value: < 2.22e-16

Code

# Modelo con errores espaciales (SEM)
m3_SEM <- GMerrorsar(formula=form, data=house, listw=Wnb6nn)
summary(m3_SEM)


Call:GMerrorsar(formula = form, data = house, listw = Wnb6nn)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.15072 -0.20516  0.10807  0.33411  2.54157 

Type: GM SAR estimator
Coefficients: (GM standard errors) 
               Estimate Std. Error z value  Pr(>|z|)
(Intercept)   8.8153942  0.0483589 182.291 < 2.2e-16
age          -0.8687533  0.0137399 -63.228 < 2.2e-16
log(lotsize)  0.2266407  0.0051131  44.326 < 2.2e-16
rooms         0.1027348  0.0020207  50.842 < 2.2e-16

Lambda: 0.68971 (standard error): 0.011167 (z-value): 61.765
Residual variance (sigma squared): 0.11915, (sigma: 0.34518)
GM argmin sigma squared: 0.11951
Number of observations: 25357 
Number of parameters estimated: 6

Code

# Modelo combinado (SAC -> SLM + SEM)
m4_SAC <- gstsls(formula=form, data=house, listw=Wnb6nn)
summary(m4_SAC)


Call:gstsls(formula = form, data = house, listw = Wnb6nn)

Residuals:
      Min        1Q    Median        3Q       Max 
-2.385331 -0.136218  0.034588  0.189469  2.532176 

Type: GM SARAR estimator
Coefficients: (GM standard errors) 
               Estimate Std. Error z value  Pr(>|z|)
Rho_Wy        0.5181163  0.0089175  58.101 < 2.2e-16
(Intercept)   3.8960079  0.0893801  43.589 < 2.2e-16
age          -0.7048022  0.0140965 -49.998 < 2.2e-16
log(lotsize)  0.1338316  0.0043791  30.562 < 2.2e-16
rooms         0.0965855  0.0018598  51.934 < 2.2e-16

Lambda: 0.35507
Residual variance (sigma squared): 0.10714, (sigma: 0.32733)
GM argmin sigma squared: 0.10757
Number of observations: 25357 
Number of parameters estimated: 7

Code

# Lectura de librerías
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import libpysal
from libpysal  import weights
import esda
from esda.moran import Moran, Moran_Local
import splot
from splot.esda import moran_scatterplot, plot_moran, lisa_cluster, plot_local_autocorrelation
from splot.libpysal import plot_spatial_weights
import statsmodels.formula.api as smf
import spreg
# Lectura de datos
house = gpd.read_file("data/LUCAS_USA.geojson")
house.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 25357 entries, 0 to 25356
Data columns (total 25 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   price       25357 non-null  int64   
 1   yrbuilt     25357 non-null  int64   
 2   stories     25357 non-null  object  
 3   TLA         25357 non-null  int64   
 4   wall        25357 non-null  object  
 5   beds        25357 non-null  float64 
 6   baths       25357 non-null  float64 
 7   halfbaths   25357 non-null  float64 
 8   frontage    25357 non-null  int64   
 9   depth       25357 non-null  int64   
 10  garage      25357 non-null  object  
 11  garagesqft  25357 non-null  int64   
 12  rooms       25357 non-null  int64   
 13  lotsize     25357 non-null  int64   
 14  sdate       25357 non-null  int64   
 15  avalue      25357 non-null  int64   
 16  s1993       25357 non-null  int64   
 17  s1994       25357 non-null  int64   
 18  s1995       25357 non-null  int64   
 19  s1996       25357 non-null  int64   
 20  s1997       25357 non-null  int64   
 21  s1998       25357 non-null  int64   
 22  syear       25357 non-null  object  
 23  age         25357 non-null  float64 
 24  geometry    25357 non-null  geometry
dtypes: float64(4), geometry(1), int64(16), object(4)
memory usage: 4.8+ MB

Code

# Análisis exploratorio básico (EDA)
house.describe().round(2)

           price   yrbuilt       TLA  ...    s1997     s1998       age
count   25357.00  25357.00  25357.00  ...  25357.0  25357.00  25357.00
mean    79017.94   1945.34   1462.25  ...      0.2      0.17      0.54
std     59655.02     27.88    613.12  ...      0.4      0.38      0.28
min      2000.00   1835.00    120.00  ...      0.0      0.00      0.01
25%     41900.00   1924.00   1070.00  ...      0.0      0.00      0.35
50%     65500.00   1950.00   1318.00  ...      0.0      0.00      0.49
75%     97000.00   1964.00   1682.00  ...      0.0      0.00      0.75
max    875000.00   1998.00   7616.00  ...      1.0      1.00      1.64

[8 rows x 20 columns]

Code

# Análisis exploratorio para datos espaciales (ESDA)
# Mapa básico
house.plot('price', legend = True)
plt.show()

Code

# Vecinos geográficos y matriz de pesos espaciales 
Wnb6nn = weights.KNN.from_dataframe(house, k=6)
# Estandarización por filas (Row-standardized W)
Wnb6nn.transform = 'r'
plot_spatial_weights(Wnb6nn, house)
plt.show()

Code

# Estadísticos de autocorrelación espacial
# Dependencia espacial global
globalMoran = Moran(house['price'], Wnb6nn)
print(globalMoran.I)

0.7777478528962092

Code

print(globalMoran.p_sim)

0.001

Code

moran_scatterplot(globalMoran, aspect_equal=True, zstandard=True)
plt.show()

Code

# Dependencia espacial local
localMoran = Moran_Local(house['price'], Wnb6nn, permutations = 999, seed=12345)
localMoran.p_sim

array([0.018, 0.003, 0.005, ..., 0.312, 0.119, 0.046])

Code

moran_scatterplot(localMoran, p=0.05, zstandard =False) # Nota: p-valor=0.05
plt.show()

Code

lisa_cluster(localMoran, house, p=0.05)
plt.show()

Code

# Modelos econométricos espaciales
# Operaciones con variables
house['l_price']=house['price'].map(lambda x:np.log(x))
house['l_lotsize']=house['lotsize'].map(lambda x:np.log(x))
variable_names = ['age', 'l_lotsize', 'rooms']
# Modelo lineal (LM)
m1_LM = spreg.OLS(
    # Var. dep.
    house[['l_price']].values, 
    # Var. expl.
    house[variable_names].values,
    # 
    name_y='l_price', 
    # 
    name_x=variable_names
)
print(m1_LM.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :     unknown
Weights matrix      :        None
Dependent Variable  :     l_price                Number of Observations:       25357
Mean dependent var  :     11.0203                Number of Variables   :           4
S.D. dependent var  :      0.7629                Degrees of Freedom    :       25353
R-squared           :      0.6057
Adjusted R-squared  :      0.6057
Sum squared residual:      5818.9                F-statistic           :  12983.3074
Sigma-square        :       0.230                Prob(F-statistic)     :           0
S.E. of regression  :       0.479                Log likelihood        :  -17317.995
Sigma-square ML     :       0.229                Akaike info criterion :   34643.990
S.E of regression ML:      0.4790                Schwarz criterion     :   34676.554

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT         8.79359         0.04297       204.64963         0.00000
                 age        -1.47572         0.01225      -120.47428         0.00000
           l_lotsize         0.24407         0.00440        55.53175         0.00000
               rooms         0.13536         0.00238        56.79125         0.00000
------------------------------------------------------------------------------------

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER           37.126

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2       14610.253           0.0000

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                3        3911.303           0.0000
Koenker-Bassett test              3        1566.036           0.0000
================================ END OF REPORT =====================================

Code

# Modelo con retardo espacial (SLM)
m2_SLM = spreg.GM_Lag(
    # Var. dep.
    house[['l_price']].values, 
    # Var. expl.
    house[variable_names].values,
    # Matriz W
    w=Wnb6nn, 
    # 
    name_y='l_price', 
    # 
    name_x=variable_names
)
print(m2_SLM.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     l_price                Number of Observations:       25357
Mean dependent var  :     11.0203                Number of Variables   :           5
S.D. dependent var  :      0.7629                Degrees of Freedom    :       25352
Pseudo R-squared    :      0.8023
Spatial Pseudo R-squared:  0.6334

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT         4.07422         0.08382        48.60492         0.00000
                 age        -0.75265         0.01481       -50.81376         0.00000
           l_lotsize         0.11664         0.00378        30.85158         0.00000
               rooms         0.09286         0.00184        50.41191         0.00000
           W_l_price         0.52042         0.00860        60.50456         0.00000
------------------------------------------------------------------------------------
Instrumented: W_l_price
Instruments: W_age, W_l_lotsize, W_rooms
================================ END OF REPORT =====================================

Code

# Modelo con errores espaciales (SEM)
m3_SEM = spreg.GM_Error(
    # Var. dep.
    house[['l_price']].values, 
    # Var. expl.
    house[variable_names].values,
    # Matriz W
    w=Wnb6nn, 
    #  
    name_y='l_price', 
    # 
    name_x=variable_names
)
print(m3_SEM.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: GM SPATIALLY WEIGHTED LEAST SQUARES
------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     l_price                Number of Observations:       25357
Mean dependent var  :     11.0203                Number of Variables   :           4
S.D. dependent var  :      0.7629                Degrees of Freedom    :       25353
Pseudo R-squared    :      0.5976

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT         8.81540         0.04612       191.15257         0.00000
                 age        -0.86875         0.01310       -66.30164         0.00000
           l_lotsize         0.22664         0.00488        46.48027         0.00000
               rooms         0.10273         0.00193        53.31384         0.00000
              lambda         0.68971    
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================

Code

# Modelo combinado (SAC -> SLM + SEM)
m4_SAC = spreg.GM_Combo(
    # Var. dep.
    house[['l_price']].values, 
    # Var. expl.
    house[variable_names].values,
    # Matriz W
    w=Wnb6nn, 
    # 
    name_y='l_price', 
    # 
    name_x=variable_names
)
print(m4_SAC.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: SPATIALLY WEIGHTED 2SLS - GM-COMBO MODEL
-----------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     l_price                Number of Observations:       25357
Mean dependent var  :     11.0203                Number of Variables   :           5
S.D. dependent var  :      0.7629                Degrees of Freedom    :       25352
Pseudo R-squared    :      0.8013
Spatial Pseudo R-squared:  0.6295

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT         3.90758         0.08966        43.58021         0.00000
                 age        -0.70573         0.01410       -50.04807         0.00000
           l_lotsize         0.13446         0.00439        30.62860         0.00000
               rooms         0.09670         0.00186        51.96693         0.00000
           W_l_price         0.51653         0.00894        57.75729         0.00000
              lambda         0.35939    
------------------------------------------------------------------------------------
Instrumented: W_l_price
Instruments: W_age, W_l_lotsize, W_rooms
================================ END OF REPORT =====================================