class: center, middle, title-slide

# Spatial Join

## Claudia Gutiérrez-Arellano

### 30 April 2026 | The University of Manchester

---

# Which areas in Greater Manchester are best served by the tram network?

<div style="text-align: center;">
<img src="images/tram_network.png" width="80%">
<p class="image-caption"> Source: TfGM Metrolink Network Map </p>
</div>

---

Tram stops (points) and boroughs (polygons)

<img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-1-1.png" width="80%" style="display: block; margin: auto;" />

<p class="image-caption"> GM Metrolink Network, Source: data.gov.uk; Local Authority Districts (Dec 2022), Source: ONS Geography </p>

---

# Spatial join

- Attaches attributes from one spatial dataset to the features of another
- **Key feature:** Based on **location**, not on a shared attribute (key)

---

# Spatial join

- Attaches attributes from one spatial dataset to the features of another
- **Key feature:** Based on **location**, not on a shared attribute (key)

.pull-left[
- For example:
  - Which boroughs are not served by the tram network?
  - Which borough has the most tram stops?
  - Which tram stops are located within each borough?
]

.pull-right[
<div style="text-align: center;">
<img src="images/gmtram.png" width="90%">
<p class="image-caption"> Source: OpenStreetMap, ONS Combined Authorities, TfGM </p>
</div>
]

---

# Spatial join process

## Inputs

- A target and a source in the same Coordinate Reference System (CRS)
- Target: tram stops as **points**
- Source: boroughs as **polygons**

## Spatial rule

- For each tram stop, identify the borough it falls **within**

## Output

- 'Stops in borough' as **points**

Each tram stop inherits the attributes of the borough (polygon) it falls within

---

# Spatial join in R

.pull-left[
Libraries

``` r
library(sf)        # simple features
library(tidyverse) # data management
```

Function: `st_join`

Our Inputs

- target: `stops`
- source: `boroughs`

Our Rule: `st_within` (but there are many more!)

Our Output: `stops_in_borough`
]

.pull-right[
Let's run!
``` r
stops = st_read("geodata/Metrolink_Stops_Functional.json", quiet = TRUE)
boroughs = st_read("geodata/GM_lad.gpkg", quiet = TRUE)

# check they have the same coordinate reference system (CRS)
st_crs(boroughs) == st_crs(stops)
#[1] TRUE
# if FALSE, reproject first, e.g. stops = st_transform(stops, st_crs(boroughs))

# each stop inherits the attributes of the borough it falls within
stops_in_borough = st_join(
  stops,
  boroughs,
  join = st_within
)
```
]

---

## Checking the output

### Before:

.pull-left[
`stops`

<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> name </th> <th style="text-align:left;"> stationCode </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Heaton Park </td> <td style="text-align:left;"> HPK </td> </tr>
<tr> <td style="text-align:left;"> Bowker Vale </td> <td style="text-align:left;"> BKV </td> </tr>
<tr> <td style="text-align:left;"> Crumpsall </td> <td style="text-align:left;"> CRP </td> </tr>
<tr> <td style="text-align:left;"> Bury </td> <td style="text-align:left;"> BRY </td> </tr>
</tbody>
</table>
]

.pull-right[
`boroughs`

<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> LAD22CD </th> <th style="text-align:left;"> LAD22NM </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> E06000007 </td> <td style="text-align:left;"> Warrington </td> </tr>
<tr> <td style="text-align:left;"> E06000008 </td> <td style="text-align:left;"> Blackburn with Darwen </td> </tr>
<tr> <td style="text-align:left;"> E06000049 </td> <td style="text-align:left;"> Cheshire East </td> </tr>
<tr> <td style="text-align:left;"> E07000037 </td> <td style="text-align:left;"> High Peak </td> </tr>
</tbody>
</table>
]

<br> <span style="font-size: 16px;"> *Use `glimpse(input_name)` for a quick check </span>

---

## Checking the output

.pull-left[
### After:

`stops_in_borough`

<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:left;"> name </th>
<th style="text-align:left;">
stationCode </th>
<th style="text-align:left;"> LAD22CD </th>
<th style="text-align:left;"> LAD22NM </th>
</tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Heaton Park </td> <td style="text-align:left;"> HPK </td> <td style="text-align:left;"> E08000002 </td> <td style="text-align:left;"> Bury </td> </tr>
<tr> <td style="text-align:left;"> Bowker Vale </td> <td style="text-align:left;"> BKV </td> <td style="text-align:left;"> E08000003 </td> <td style="text-align:left;"> Manchester </td> </tr>
<tr> <td style="text-align:left;"> Crumpsall </td> <td style="text-align:left;"> CRP </td> <td style="text-align:left;"> E08000003 </td> <td style="text-align:left;"> Manchester </td> </tr>
<tr> <td style="text-align:left;"> Bury </td> <td style="text-align:left;"> BRY </td> <td style="text-align:left;"> E08000002 </td> <td style="text-align:left;"> Bury </td> </tr>
</tbody>
</table>

<br> <span style="font-size: 16px;"> *Use `glimpse(output_name)` for a quick check </span>
]

.pull-right[
<img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-6-1.png" width="130%" style="display: block; margin: auto;" />
]

---

## Using the joined data to answer questions

.pull-left[
How many stops are there in each borough (local authority district, LAD)?
``` r
borough_counts = stops_in_borough %>%
  st_drop_geometry() %>%               # keep the attribute table only
  count(LAD22NM, name = "n_stops") %>% # one row per borough
  arrange(desc(n_stops))               # most stops first
```

.pull-left[
<table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> LAD22NM </th> <th style="text-align:right;"> n_stops </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Manchester </td> <td style="text-align:right;"> 42 </td> </tr>
<tr> <td style="text-align:left;"> Trafford </td> <td style="text-align:right;"> 18 </td> </tr>
<tr> <td style="text-align:left;"> Oldham </td> <td style="text-align:right;"> 10 </td> </tr>
<tr> <td style="text-align:left;"> Salford </td> <td style="text-align:right;"> 10 </td> </tr>
</tbody>
</table>
]

.pull-right[
<table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> LAD22NM </th> <th style="text-align:right;"> n_stops </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Tameside </td> <td style="text-align:right;"> 7 </td> </tr>
<tr> <td style="text-align:left;"> Bury </td> <td style="text-align:right;"> 6 </td> </tr>
<tr> <td style="text-align:left;"> Rochdale </td> <td style="text-align:right;"> 6 </td> </tr>
</tbody>
</table>
]
]

.pull-right[
Which boroughs are not served?
``` r
not_served = boroughs %>%
  st_drop_geometry() %>%
  anti_join(st_drop_geometry(stops_in_borough), by = "LAD22NM") %>%
  select(LAD22NM)
```

<div class="three-col">
<div>
<table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> LAD22NM </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Warrington </td> </tr>
<tr> <td style="text-align:left;"> Blackburn with Darwen </td> </tr>
<tr> <td style="text-align:left;"> Cheshire East </td> </tr>
<tr> <td style="text-align:left;"> High Peak </td> </tr>
<tr> <td style="text-align:left;"> Chorley </td> </tr>
</tbody>
</table>
</div>
<div>
<table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> LAD22NM </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Rossendale </td> </tr>
<tr> <td style="text-align:left;"> West Lancashire </td> </tr>
<tr> <td style="text-align:left;"> Bolton </td> </tr>
<tr> <td style="text-align:left;"> Stockport </td> </tr>
</tbody>
</table>
</div>
<div>
<table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> LAD22NM </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Wigan </td> </tr>
<tr>
<td style="text-align:left;"> St.
Helens </td>
</tr>
<tr> <td style="text-align:left;"> Calderdale </td> </tr>
<tr> <td style="text-align:left;"> Kirklees </td> </tr>
</tbody>
</table>
</div>
</div>

<span style="font-size: 16px;"> `anti_join` returns all rows of `boroughs` with no match in `stops_in_borough`, i.e. boroughs without a tram stop </span>
]

---
class: code-small

.pull-left[
# Spatial join in Python

- Same inputs
- Same spatial rule: within
- Same type of output

<br>
<img src="images/languages1.png" width="90%">
]

.pull-right[
``` python
import geopandas as gpd

stops = gpd.read_file("geodata/Metrolink_Stops_Functional.json")
boroughs = gpd.read_file("geodata/GM_lad.gpkg")
assert stops.crs == boroughs.crs  # same CRS, as in R

# inner: keep only stops that fall within a borough
stops_in_borough = gpd.sjoin(
    stops, boroughs, how="inner", predicate="within"
)

borough_counts = (
    stops_in_borough
    .groupby("LAD22NM")
    .size()
    .reset_index(name="n_stops")
    .sort_values("n_stops", ascending=False)
)

# boroughs with no matching stop (like anti_join in R)
not_served = (
    boroughs[["LAD22NM"]]
    .merge(stops_in_borough[["LAD22NM"]], on="LAD22NM",
           how="left", indicator=True)
    .query('_merge == "left_only"')
    [["LAD22NM"]]
)
```
]

---
class: code-small

.pull-left[
# Spatial join in Python

- Same inputs
- Same spatial rule: within
- Same type of output

<br>
<img src="images/languages.png" width="90%">
]

.pull-right[
``` python
import geopandas as gpd

stops = gpd.read_file("geodata/Metrolink_Stops_Functional.json")
boroughs = gpd.read_file("geodata/GM_lad.gpkg")
assert stops.crs == boroughs.crs  # same CRS, as in R

# inner: keep only stops that fall within a borough
stops_in_borough = gpd.sjoin(
    stops, boroughs, how="inner", predicate="within"
)

borough_counts = (
    stops_in_borough
    .groupby("LAD22NM")
    .size()
    .reset_index(name="n_stops")
    .sort_values("n_stops", ascending=False)
)

# boroughs with no matching stop (like anti_join in R)
not_served = (
    boroughs[["LAD22NM"]]
    .merge(stops_in_borough[["LAD22NM"]], on="LAD22NM",
           how="left", indicator=True)
    .query('_merge == "left_only"')
    [["LAD22NM"]]
)
```
]

---
class: middle

## Other spatial operations

| Spatial relationship | R (sf) | Python (GeoPandas) | Example |
|---------------------|--------|--------------------|---------|
| point within polygon | `join = st_within` | `predicate="within"` | Which schools are located within flood risk zones? |
| features intersect | `join = st_intersects` | `predicate="intersects"` | Which green spaces overlap proposed development zones? |
| boundaries touch | `join = st_touches` | `predicate="touches"` | Which protected areas border urban areas? |
| nearest feature* | `join = st_nearest_feature` | `sjoin_nearest()` | Which hospital is closest to each sports facility? |

<span style="font-size: 20px;"> *Not a true/false test: each feature is matched with its nearest feature, based on distance </span>

<span style="font-size: 20px;"> Explore: <a href="https://r-spatial.github.io/sf/reference/st_join.html" target="_blank">sf st_join</a> | <a href="https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.sjoin.html" target="_blank">geopandas sjoin</a> </span>

---
class: middle

# Summary

- A spatial join links datasets using **location**
- It follows a **process**: inputs > spatial rule > output
- We used a spatial join to identify the boroughs that are, and are not, served by the tram network
- The same process applies in **R** and **Python**

---
class: middle

## A question for you

If a borough contains a tram stop, does that mean people across the whole borough have good access to the network?

---
class: middle

## Not necessarily

- Presence of stops `\(\neq\)` accessibility
- Measuring access needs other analyses, such as distance to the nearest stop or areas within walking distance of stops (wait to hear about `st_nearest_feature`, `st_distance` and `st_buffer`!)

---

<br><br><br><br>
<h1 style="text-align: center;"> Questions? </h1>
<br><br><br><br>

.pull-left[
<span style="font-size: 20px;"> Access the R and Python code, data and slides <a href="https://github.com/Claudia-Gutierrez/spatial-join-presentation.git" target="_blank">here</a> </span>
]
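---
class: code-small

# Appendix: the spatial rule, step by step

A toy sketch in plain Python of the same process (inputs > spatial rule > output), for anyone who wants to see what `within` decides. The borough and stop names and coordinates are hypothetical, not the Metrolink or ONS data, and the ray-casting test is a textbook stand-in: sf and GeoPandas delegate `within` to the GEOS library, which also handles holes, multipolygons, boundary cases and spatial indexing.

```python
from collections import Counter

# Hypothetical toy data in one shared coordinate system (not the Metrolink/ONS layers)
boroughs = {  # source: polygons as lists of (x, y) vertices
    "Borough X": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "Borough Y": [(4, 0), (8, 0), (8, 4), (4, 4)],
    "Borough Z": [(0, 4), (4, 4), (4, 8), (0, 8)],
}
stops = {"Stop A": (1, 1), "Stop B": (2, 3), "Stop C": (6, 2)}  # target: points


def within(point, polygon):
    """Ray casting: the point is inside if a ray to its right crosses an odd
    number of polygon edges. Points exactly on an edge are not handled carefully."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Spatial rule: each stop inherits the name of the borough it falls within (None if none)
stops_in_borough = {
    stop: next((name for name, poly in boroughs.items() if within(pt, poly)), None)
    for stop, pt in stops.items()
}

# Like count() and anti_join() in R: stops per borough, then boroughs with no stops
n_stops = Counter(b for b in stops_in_borough.values() if b is not None)
not_served = [name for name in boroughs if name not in n_stops]

print(stops_in_borough)  # {'Stop A': 'Borough X', 'Stop B': 'Borough X', 'Stop C': 'Borough Y'}
print(dict(n_stops))     # {'Borough X': 2, 'Borough Y': 1}
print(not_served)        # ['Borough Z']
```

Changing the rule (for example, a distance test instead of `within`) changes the question, not the process.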