I have a matrix of real detailed routes which I want to efficiently turn into a simple spatial network. Simple means that I don't care about the intricacies of local transport and possible intersections of routes near the start and end points. I do want to add major intersections outside the start and end points as nodes to the network. Below I give a simple example. My real data has 12,500 routes between 500 start and end points and is approx. 2Gb in size.
library(fastverse)#> -- Attaching packages --------------------------------------- fastverse 0.3.2 --#> v data.table 1.15.0 v kit 0.0.13#> v magrittr 2.0.3 v collapse 2.0.12fastverse_extend(osrm, sf, sfnetworks, install = TRUE)#> -- Attaching extension packages ----------------------------- fastverse 0.3.2 --#> v osrm 4.1.1 v sfnetworks 0.6.3 #> v sf 1.0.16largest_20_german_cities <- data.frame( city = c("Berlin", "Stuttgart", "Munich", "Hamburg", "Cologne", "Frankfurt","Duesseldorf", "Leipzig", "Dortmund", "Essen", "Bremen", "Dresden","Hannover", "Nuremberg", "Duisburg", "Bochum", "Wuppertal", "Bielefeld", "Bonn", "Muenster"), lon = c(13.405, 9.18, 11.575, 10, 6.9528, 8.6822, 6.7833, 12.375, 7.4653, 7.0131, 8.8072, 13.74, 9.7167, 11.0775, 6.7625, 7.2158, 7.1833, 8.5347, 7.1, 7.6256), lat = c(52.52, 48.7775, 48.1375, 53.55, 50.9364, 50.1106, 51.2333, 51.34, 51.5139, 51.4508, 53.0758, 51.05, 52.3667, 49.4539, 51.4347, 51.4819, 51.2667, 52.0211, 50.7333, 51.9625))# Unique routesm <- matrix(1, 20, 20)diag(m) <- NAm[upper.tri(m)] <- NAroutes_ind <- which(!is.na(m), arr.ind = TRUE)rm(m)# Routes DFroutes <- data.table(from_city = largest_20_german_cities$city[routes_ind[, 1]], to_city = largest_20_german_cities$city[routes_ind[, 2]], duration = NA_real_, distance = NA_real_, geometry = list())# Fetch Routesi = 1Lfor (r in mrtl(routes_ind)) { route <- osrmRoute(ss(largest_20_german_cities, r[1], c("lon", "lat")), ss(largest_20_german_cities, r[2], c("lon", "lat")), overview = "full") set(routes, i, 3:5, fselect(route, duration, distance, geometry)) i <- i + 1L}routes %<>% st_as_sf(crs = st_crs(route))routes_net = as_sfnetwork(routes, directed = FALSE)print(routes_net)#> # A sfnetwork with 20 nodes and 190 edges#> ##> # CRS: EPSG:4326 #> ##> # An undirected simple graph with 1 component with spatially explicit edges#> ##> # A tibble: 20 × 1#> geometry#> <POINT [°]>#> 1 (9.179999 48.7775)#> 2 (13.405 52.52)#> 3 (11.57486 48.13675)#> 4 (10.00001 53.54996)#> 5 (6.95285 50.9364)#> 6 (8.68202 50.1109)#> # ℹ 14 more rows#> ##> # A tibble: 190 × 7#> from to from_city to_city duration distance geometry#> <int> <int> <chr> <chr> <dbl> <dbl> <LINESTRING [°]>#> 1 1 2 Stuttgart Berlin 390. 633. (9.179999 48.7775, 9.18005 48…#> 2 2 3 Munich Berlin 356. 586. (11.57486 48.13675, 11.57486 …#> 3 2 4 Hamburg Berlin 176. 288. (10.00001 53.54996, 10.0002 5…#> # ℹ 187 more rowsplot(routes_net)
Created on 2024-03-28 with reprex v2.0.2
Regarding possible solutions I am open to any software (R, Python, QGIS etc.). I know in R there is tidygraph which allows me to do something like
library(tidygraph)routes_net_subdiv = convert(routes_net, to_spatial_subdivision)But this seems to run forever even with this mock example. I have also seen ideas to use GRASS's v.clean tool to break up the geometry, but haven't tried that yet and a bit reluctant to install GRASS.
I think perhaps the best solution for performance is converting to S2 and comparing all linestrings individually using s2_intersection() and then turning this information into a graph somehow. But hoping for more elegant and performant solutions.