with ease-of-use
Map your urban dataset onto the street network of your choice, wrap your analysis in an urban pipeline, and share it across the urban research community!
Important
- We highly recommend exploring the /examples folder for Jupyter Notebook-based tutorials 🎉
- This library is under active development and is not yet stable. Expect bugs and frequent changes!
OSMNxMapping, f(.), brings road networks from OpenStreetMap, X, and your urban datasets, Y, together through the function f(X, Y) = X ⋈ Y. This lets you map these components in either direction: attaching street info to your dataset using latitude and longitude coordinates, or computing insights from your datasets to enrich the street network (e.g., mapping to nodes or edges).
OSMNxMapping is built with a Scikit-Learn-like philosophy: (I) from loading to visualisation, by way of mapping, we want to cover as many user needs as possible in a welcoming way, without requiring 20 to 50+ lines of code for a single non-reproducible, non-shareable, non-updatable script; and (II) the library's flexibility allows for easy contributions to sub-modules without having to start from scratch every time.
To answer (I), one way among many, we propose a scikit-like pipeline that can, for instance, stack the following steps:
- Query a user-defined road network using the great OSMNx (Network module);
- Load your geospatial data (CSV, Parquet, or shapefiles) using the Loader module;
- Wrangle the loaded data with optional imputation and filtering to handle missing coordinates or irrelevant regions (Preprocessing module);
- Map data to street nodes or edges and enrich the network (e.g., averaging building floors per street or counting taxi pickups per segment); a flexible factory makes this easy to configure (Enricher module);
- Visualise results statically or interactively (Visual module);
- Optional but handy: save your pipeline for later use or for sharing with others using a simple save() call.
To answer (II), a library built on state-of-the-art open-source tools must be strongly typed, tested, and documented. The library is already fully type-checked thanks to BearType; we are now working toward decent test coverage and documentation to make it more robust and user-friendly.
We would like you to focus on what matters to you: if you are a machine learning enthusiast, you can apply machine learning to the enriched networks; if you are a researcher, you can easily map your data to street networks and get insights from them. And if you want to contribute to the library, you can do so by adding new modules or extending the existing ones; we will be happy to welcome you! 🥐
We embrace a DRY (Don't Repeat Yourself) philosophy: focus on what matters and let us handle the mapping intricacies. Each of the steps mentioned works independently, but the pipeline ties them together seamlessly 🙃!
See further notebook-based examples in the examples/ directory. 📓
We highly recommend using uv for installation from source to avoid the hassle of Conda or other package managers. It is also, to our knowledge, the fastest package manager in the open-source ecosystem, and it manages dependencies without manual environment activation (biggest flex!). If you do not want to use uv, that is fine, but the alternative will be covered in the upcoming documentation rather than here.
First, ensure uv is installed on your machine by following these instructions.
- Install uv as described above.
- Clone Auctus (required for alpha development) into the same parent directory as OSMNxMapping. Use:
      git clone [email protected]:VIDA-NYU/auctus_search.git
  This step ensures pyproject.toml builds auctus_search from source during installation, though we plan for auctus_search to become a PyPI package (uv add auctus_search or pip install auctus_search) in future releases.
Note
Future versions will simplify this process: auctus_search will move to PyPI, removing the need for manual cloning, and Jupyter extensions will auto-install via pyproject.toml configuration.
- Clone the OSMNxMapping repository:
      git clone https://github.com/yourusername/OSMNxMapping.git
      cd OSMNxMapping
- Lock and sync dependencies with uv:
      uv lock
      uv sync
- (Recommended) Install Jupyter extensions for interactive visualisations requiring Jupyter widgets:
      uv run jupyter labextension install @jupyter-widgets/jupyterlab-manager
- Launch Jupyter Lab to explore OSMNxMapping (way faster than running Jupyter without uv):
      uv run --with jupyter jupyter lab
Voila 🥐! You're all set to explore OSMNxMapping in Jupyter Lab.
Below are two approaches to help you begin using the OSMNxMapping library in a Jupyter notebook. Note that the first, fine-grained step-by-step approach is essentially reproduced in the examples/ directory as the Chapter 1 notebook.
This detailed approach walks you through each step of mapping urban data to a street network, using PLUTO (Primary Land Use Tax Lot Output) buildings in New York City as an example.
import osmnx_mapping as oxm
pluto_buildings = oxm.OSMNxMapping() # Represents an urban analysis study of PLUTO in NYC. Nothing is loaded or queried yet –– See further below.
You can manually load datasets (see the /examples folder), but here we use the Auctus integration to search for PLUTO-related datasets.
collection = pluto_buildings.auctus.explore_datasets_from_auctus(search_query="PLUTO", display_initial_results=True)
# Searches the Auctus API for "PLUTO" datasets, returning an AuctusDatasetCollection.
# display_initial_results=True provides an interactive preview of results in the notebook.
# Additional parameters like page and size are available—see the OSMNxMapping API docs.
dataset = pluto_buildings.auctus.load_dataset_from_auctus()
# Loads the selected dataset into memory as a pandas.DataFrame or geopandas.GeoDataFrame, with an interactive table preview by default.
The load_from_dataframe method formats the data for OSMNxMapping without reloading it entirely.
loaded_data = pluto_buildings.loader.load_from_dataframe(
input_dataframe=dataset,
latitude_column="latitude", # Replace with your dataset’s latitude column name
longitude_column="longitude" # Replace with your dataset’s longitude column name
)
pluto_buildings.table_vis.interactive_display(loaded_data) # Displays an interactive table of the loaded data
graph, nodes, edges = pluto_buildings.network.network_from_place("Manhattan, New York City, USA", render=True)
# Fetches the road network for Manhattan and renders it (render=True shows a basic plot).
This step adds a column (e.g., nearest_node) to loaded_data, linking each record to its closest network node.
loaded_data = pluto_buildings.network.map_nearest_street(
data=loaded_data,
longitude_column="longitude",
latitude_column="latitude"
)
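To make the nearest-node idea concrete, here is a minimal, library-independent sketch. It is not how OSMNxMapping or OSMnx performs the lookup; the node ids, coordinates, and the `nearest_node` helper are hypothetical, and the distance is a simple equirectangular approximation that should be adequate at city scale.

```python
import math


def nearest_node(lat, lon, nodes):
    """Return the id of the node closest to (lat, lon).

    `nodes` maps a node id to a (lat, lon) tuple (hypothetical structure).
    """
    def distance(node_lat, node_lon):
        # Scale longitude differences by cos(latitude) so both axes are comparable.
        x = math.radians(node_lon - lon) * math.cos(math.radians((node_lat + lat) / 2))
        y = math.radians(node_lat - lat)
        return math.hypot(x, y)

    return min(nodes, key=lambda node_id: distance(*nodes[node_id]))


# Hypothetical node ids and coordinates, for illustration only.
nodes = {
    "a": (40.7128, -74.0060),
    "b": (40.7580, -73.9855),
}
records = [
    {"numfloors": 12, "latitude": 40.7130, "longitude": -74.0050},
    {"numfloors": 40, "latitude": 40.7570, "longitude": -73.9860},
]

# Attach the closest node id to every record, mirroring the nearest_node column.
for record in records:
    record["nearest_node"] = nearest_node(record["latitude"], record["longitude"], nodes)
```

After this loop, each record carries a `nearest_node` key that later steps can group by.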
Use SimpleGeoImputer to drop rows with missing latitude/longitude values. Check the PreprocessingMixin API for advanced options.
loaded_data = (
pluto_buildings.preprocessing
.with_default_imputer(latitude_column_name="latitude", longitude_column_name="longitude")
.transform(input_data=loaded_data)
)
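As a rough picture of what dropping rows with missing coordinates amounts to, the sketch below filters plain dictionaries. It only illustrates the idea and is not SimpleGeoImputer's implementation; `has_coordinates` is a hypothetical helper.

```python
import math


def has_coordinates(record, latitude_column="latitude", longitude_column="longitude"):
    """Return True when both coordinates are present and are not NaN."""
    for column in (latitude_column, longitude_column):
        value = record.get(column)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return False
    return True


records = [
    {"latitude": 40.7128, "longitude": -74.0060},
    {"latitude": None, "longitude": -73.9855},          # missing latitude
    {"latitude": 40.7580, "longitude": float("nan")},   # NaN longitude
]

# Keep only the records that can actually be placed on a map.
kept = [record for record in records if has_coordinates(record)]
```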
Use BoundingBoxFilter to retain only data within the network's bounding box.
loaded_data = (
pluto_buildings.preprocessing
.with_default_filter(nodes=nodes)
.transform(input_data=loaded_data)
)
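The bounding-box idea can be sketched without the library: compute the extent of the network's node coordinates, then keep only the records that fall inside it. This is an illustrative approximation, not BoundingBoxFilter's code; the coordinates and helper names are hypothetical.

```python
def bounding_box(node_coordinates):
    """Return (min_lat, min_lon, max_lat, max_lon) over (lat, lon) pairs."""
    lats = [lat for lat, _ in node_coordinates]
    lons = [lon for _, lon in node_coordinates]
    return min(lats), min(lons), max(lats), max(lons)


def within(record, box):
    """Return True when the record's coordinates lie inside the box (inclusive)."""
    min_lat, min_lon, max_lat, max_lon = box
    return (min_lat <= record["latitude"] <= max_lat
            and min_lon <= record["longitude"] <= max_lon)


# Hypothetical network node coordinates and data points.
node_coordinates = [(40.70, -74.02), (40.80, -73.93)]
records = [
    {"latitude": 40.75, "longitude": -73.98},  # inside the box
    {"latitude": 40.65, "longitude": -73.95},  # south of the box
]

box = bounding_box(node_coordinates)
inside = [record for record in records if within(record, box)]
```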
Here, we calculate the average number of floors (numfloors) per street segment. You can use with_default for simplicity:
pluto_buildings.enricher.with_default(
group_by_column="nearest_node",
values_from_column="numfloors",
method="mean",
output_column="avg_numfloors",
target="edges" # Enrich edges; use "nodes" for node-based enrichment
)
enriched_data, graph, nodes, edges = pluto_buildings.enricher.enrich_network(
input_data=loaded_data,
input_graph=graph,
input_nodes=nodes,
input_edges=edges
)
Or use CreateEnricher for more control:
enricher = (
oxm.CreateEnricher()
.with_data(group_by="nearest_node", values_from="numfloors")
.aggregate_with(method="mean", output_column="avg_numfloors", target="edges")
.build()
)
pluto_buildings.enricher.enricher = enricher
enriched_data, graph, nodes, edges = pluto_buildings.enricher.enrich_network(
input_data=loaded_data,
input_graph=graph,
input_nodes=nodes,
input_edges=edges
)
viz = pluto_buildings.visual.visualise(graph, nodes, edges, "avg_numfloors", target="edges")
viz # Displays a static plot
from osmnx_mapping.modules.visualiser.visualisers.interactive_visualiser import InteractiveVisualiser
viz = pluto_buildings.visual(InteractiveVisualiser()).visualise(graph, nodes, edges, "avg_numfloors", target="edges")
viz # Displays an interactive map
The UrbanPipeline class offers a concise, reproducible workflow. This example uses a local pluto.csv file (Auctus isn't directly supported in pipelines, and might never be, because it requires user interaction).
import osmnx_mapping as oxm
from osmnx_mapping.modules.network import CreateNetwork
from osmnx_mapping.modules.loader import CSVLoader
from osmnx_mapping.modules.preprocessing import CreatePreprocessor
from osmnx_mapping.modules.enricher import CreateEnricher
from osmnx_mapping.modules.visualiser import InteractiveVisualiser
from osmnx_mapping.pipeline import UrbanPipeline
# Define the pipeline
pipeline = UrbanPipeline([
("network", CreateNetwork()
.with_place("Manhattan, New York City, USA")
.with_mapping("node", "longitude", "latitude", "nearest_node")
.build()),
("load", CSVLoader(file_path="./pluto.csv")),
("impute", CreatePreprocessor().with_default_imputer().build()),
("filter", CreatePreprocessor().with_default_filter().build()),
("enrich", CreateEnricher()
.with_data(group_by="nearest_node", values_from="numfloors")
.aggregate_with(method="mean", output_column="avg_numfloors", target="edges")
.build()),
("viz", InteractiveVisualiser())
])
# Run the pipeline and visualise
data, graph, nodes, edges = pipeline.compose_transform("latitude", "longitude")
viz = pipeline.visualise("avg_numfloors", target="edges", colormap="Greens", tile_provider="CartoDB positron")
viz
# Optional: Save for later use –– This could further be explored in the `examples/` directory.
# pipeline.save("pluto_pipeline.joblib")
- Network: Builds Manhattan's road network with node mappings.
- Load: Reads pluto.csv from your local directory.
- Impute/Filter: Cleans missing values and bounds data to the network.
- Enrich: Computes average floors per street segment.
- Visualise: Creates an interactive Folium map.
- Save: Allows reuse with UrbanPipeline.load("pluto_pipeline.joblib"). Once more, see further in the examples/ directory.
Note: Update the file path and column names (latitude, longitude, numfloors) to match your dataset.
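For readers new to the named-step pattern, the toy sketch below shows the general idea of chaining (name, step) pairs so that each step's output feeds the next. `ToyPipeline` and its steps are hypothetical and far simpler than UrbanPipeline, whose internals are not described here.

```python
class ToyPipeline:
    """Runs named steps in order, feeding each step's output to the next."""

    def __init__(self, steps):
        # Each step is a (name, callable) pair, as in scikit-learn-style pipelines.
        self.steps = list(steps)

    def run(self, data):
        for _name, step in self.steps:
            data = step(data)
        return data


def drop_missing(rows):
    """Discard rows whose numfloors value is missing."""
    return [row for row in rows if row["numfloors"] is not None]


def average_floors(rows):
    """Reduce the remaining rows to their mean number of floors."""
    return sum(row["numfloors"] for row in rows) / len(rows)


pipeline = ToyPipeline([("impute", drop_missing), ("enrich", average_floors)])
result = pipeline.run([{"numfloors": 4}, {"numfloors": None}, {"numfloors": 8}])
```

Because the steps are named and ordered, the whole workflow can be inspected, re-run, or swapped step by step, which is the property the pipeline approach relies on.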
Voila! 🥐 Whether you prefer the fine-grained control of the step-by-step approach or the concise reproducible urban pipeline, you’ve successfully mapped urban data to a street network, enriched it, and visualised the results. 🎉
Note
More advanced usage is possible: explore the API and the examples/ directory for details!
Note
For more about future work, explore the issues tab above!
- From labs to more general communities, we want to advance OSMNxMapping by attaining large unit-test coverage, integrating routines via GitHub Actions, and producing thorough documentation for users all around.
- We are also looking at building a function f(X, set(Ys)) that could introduce a MultiAggregatorEnricher to handle multiple datasets (yes, at the same time), which would require rethinking the visualisation approaches; brainstorming is underway.
- Finally, we are pondering whether X, currently OSMNx street networks, could evolve to other urban networks: do alternatives exist, or might we redefine networks beyond roads? These discussions are still in progress.
We would welcome pull requests adding more loader, geo imputer, and geo filter primitives, as well as extensions to the enricher and visualiser primitives. We also look forward to seeing more examples in the examples/ directory, and we are happy to welcome your contributions to the library 🎄
Important
This project is fully type-annotated and uses the great @beartype! This should reduce side effects and improve the library's usability for end users.
Users familiar with data pipelines will find the modular, scikit-learn-inspired design of the OSMNxMapping library clear-cut. For others, believe us, it is the way to go!
We offer a set of mixins that simplify difficult chores including data loading, road network building, preprocessing, enrichment, and visualisation of enriched graph data. These mixins are your main interface; they neatly wrap the underlying modules.
LoaderMixin – Load Your Urban Data
The LoaderMixin handles loading geospatial data from files or DataFrames, converting it into a GeoDataFrame for further analysis.
[!NOTE]
Only .csv, .parquet, and shapefiles are supported for now. If you need additional formats, please let us know! Or, pssst, you can contribute to the library by adding a new loader primitive to the loader module.
- load_from_file(file_path, latitude_column="", longitude_column="")
  - Purpose: Loads data from a file (CSV, Parquet, or Shapefile) into a GeoDataFrame.
  - Parameters:
    - file_path (str): Path to the file.
    - latitude_column (str, optional): Name of the latitude column.
    - longitude_column (str, optional): Name of the longitude column.
  - Returns: A geopandas.GeoDataFrame.
  - Example:
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      # The loader module handles csv, parquet, and shapefiles as a factory,
      # which means there is no need for you to worry about the file format.
      data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
- load_from_dataframe(input_data, latitude_column, longitude_column)
  - Purpose: Converts a DataFrame to a GeoDataFrame using specified lat/lon columns.
  - Parameters:
    - input_data (pandas.DataFrame or geopandas.GeoDataFrame): The input data.
    - latitude_column (str): Latitude column name.
    - longitude_column (str): Longitude column name.
  - Returns: A geopandas.GeoDataFrame.
  - Example:
      import pandas as pd
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      df = pd.DataFrame({"lat": [40.7128], "lon": [-74.0060]})
      geo_data = mapping.loader.load_from_dataframe(df, "lat", "lon")
    Another example, if you are using a dataset selected and loaded from Auctus:
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      # Assuming you have loaded a dataset from Auctus into `new_data`
      geo_data = mapping.loader.load_from_dataframe(new_data, "lat", "lon")
NetworkMixin – Build and Map Road Networks
The NetworkMixin lets you query road networks from OpenStreetMap and map data points to the nearest street nodes or edges.
- network_from_place(place_name, network_type="drive", render=False)
  - Purpose: Queries a road network for a specified place.
  - Parameters:
    - place_name (str): Location (e.g., "Manhattan, New York City, USA").
    - network_type (str, default="drive"): Type of network ("drive", "walk", "bike").
    - render (bool, default=False): If True, displays a plot of the network.
  - Returns: A tuple (networkx.MultiDiGraph, geopandas.GeoDataFrame, geopandas.GeoDataFrame) of the graph, nodes, and edges.
  - Example:
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
- map_nearest_street(data, longitude_column, latitude_column, output_column="nearest_node", reset_output_column=False, **kwargs)
  - Purpose: Maps data points to the nearest street nodes in the network.
  - Parameters:
    - data (geopandas.GeoDataFrame): Input data with lat/lon.
    - longitude_column (str): Longitude column name.
    - latitude_column (str): Latitude column name.
    - output_column (str, default="nearest_node"): Column to store node IDs.
    - reset_output_column (bool, default=False): Overwrite existing output column.
    - **kwargs: Additional parameters for OSMnx's nearest_nodes.
  - Returns: A geopandas.GeoDataFrame with mapped nodes.
  - Example:
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      # Assuming data is a GeoDataFrame from previous steps (e.g., LoaderMixin)
      mapped_data = mapping.network.map_nearest_street(data, "lon", "lat")
[!NOTE]
When using the pipeline, it is recommended to configure the network using the CreateNetwork factory, which allows specifying mappings for nearest nodes or edges. See the UrbanPipelineMixin section for details and examples.
PreprocessingMixin – Clean and Filter Data
The PreprocessingMixin offers tools to handle missing values and filter data geographically.
[!IMPORTANT]
You cannot stack a filter with an imputer (or vice versa) in a single PreprocessingMixin instance. Each instance can only perform one action: either imputing or filtering. If you want to stack operations (e.g., impute then filter, or filter then impute), simply use the pipeline and create two steps. See the UrbanPipelineMixin section for more details on chaining steps.
[!TIP]
Available imputers:
- SimpleGeoImputer: "Naively" drops rows with missing latitude or longitude values.
- AddressGeoImputer: Fills missing lat/lon by geocoding an address column if available (requires address_column_name).
Available filter:
- BoundingBoxFilter: Keeps only data points within the bounding box of the road network's nodes (requires nodes).
- with_imputer(imputer_type, latitude_column_name=None, longitude_column_name=None, **extra_params)
  - Purpose: Configures an imputer to handle missing lat/lon values.
  - Parameters:
    - imputer_type (str): Imputer type (e.g., "SimpleGeoImputer", "AddressGeoImputer").
    - latitude_column_name (str, optional): Latitude column name. If omitted and used within a pipeline, it will be set by the pipeline's compose method.
    - longitude_column_name (str, optional): Longitude column name. If omitted and used within a pipeline, it will be set by the pipeline's compose method.
    - **extra_params: Additional parameters (e.g., address_column_name for "AddressGeoImputer").
  - Returns: The mixin instance for chaining.
  - Example:
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      mapping.preprocessing.with_imputer("SimpleGeoImputer", "lat", "lon")
- with_default_imputer(latitude_column_name=None, longitude_column_name=None)
  - Purpose: Uses a default imputer that drops rows with missing lat/lon.
  - Parameters: Same as above, without imputer_type.
  - Returns: The mixin instance.
  - Example:
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      mapping.preprocessing.with_default_imputer("lat", "lon")
- with_filter(filter_type, **extra_params)
  - Purpose: Configures a filter (e.g., "BoundingBoxFilter").
  - Parameters:
    - filter_type (str): Filter type.
    - **extra_params: Filter-specific parameters (e.g., nodes for bounding box).
  - Returns: The mixin instance.
  - Example:
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      # nodes comes from network_from_place
      graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
      mapping.preprocessing.with_filter("BoundingBoxFilter", nodes=nodes)
- with_default_filter(nodes)
  - Purpose: Uses a default filter to keep data within the road network's bounding box.
  - Parameters:
    - nodes (geopandas.GeoDataFrame): Nodes from the road network defining the bounding box.
  - Returns: The mixin instance.
  - Example:
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      # nodes comes from network_from_place
      graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
      mapping.preprocessing.with_default_filter(nodes)
- transform(input_data)
  - Purpose: Applies the configured imputer or filter to the data.
  - Parameters:
    - input_data (geopandas.GeoDataFrame): Data to preprocess.
  - Returns: A preprocessed geopandas.GeoDataFrame.
  - Example:
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
      mapping.preprocessing.with_default_imputer("lat", "lon")
      cleaned_data = mapping.preprocessing.transform(data)
EnricherMixin – Enrich Your Network with Data
The EnricherMixin is the core component of the library, empowering you to aggregate urban data (e.g., traffic counts, building heights) and map it onto a road network's nodes, edges, or both. It's designed for flexibility with advanced customisation through the CreateEnricher factory, while also offering a simpler default setup for standard use cases.
[!NOTE]
How the Enricher Works:
The enricher processes data in two key steps:
- Aggregation: It groups your data by a specified column that connects with the graph (e.g., nearest_node following map_nearest_street(.)) and applies an aggregation method like mean, sum, or count to compute values for each group. For example, it could sum traffic volumes per node.
- Mapping: These aggregated values are then assigned to the network's nodes, edges, or both, based on the target parameter. For edges, a method like average, sum, max, or min is used to compute values from the connected nodes' values.
This process transforms raw data into meaningful insights mapped onto the road network, making it ideal for urban analysis tasks like traffic studies or accident mapping.
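The two steps above can be pictured with a small, library-independent sketch: group values by node, then derive an edge value from its endpoint nodes. This is only an illustration under stated assumptions, not the enricher's implementation. In particular, how the library treats edges whose endpoints have no data is not described here; the sketch simply skips them, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_node(records, group_by, values_from):
    """Step 1 (aggregation): mean of `values_from` for each `group_by` value."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_by]].append(record[values_from])
    return {node: mean(values) for node, values in groups.items()}


def map_to_edges(node_values, edges):
    """Step 2 (mapping): average the endpoint values of each (u, v) edge.

    Assumption: edges with no data on either endpoint are left out.
    """
    edge_values = {}
    for u, v in edges:
        endpoint_values = [node_values[n] for n in (u, v) if n in node_values]
        if endpoint_values:
            edge_values[(u, v)] = mean(endpoint_values)
    return edge_values


# Hypothetical records already mapped to their nearest node.
records = [
    {"nearest_node": 1, "numfloors": 10},
    {"nearest_node": 1, "numfloors": 20},
    {"nearest_node": 2, "numfloors": 30},
]
edges = [(1, 2), (2, 3), (3, 4)]

node_values = aggregate_by_node(records, "nearest_node", "numfloors")
edge_values = map_to_edges(node_values, edges)
```

Here node 1 averages to 15 and node 2 to 30, so edge (1, 2) receives 22.5, edge (2, 3) receives 30 from its only known endpoint, and edge (3, 4) receives nothing.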
The CreateEnricher factory (an alias for EnricherFactory) is the primary and recommended way to configure enrichers. It offers a flexible, step-by-step approach to define how data is aggregated and mapped to the network, giving you full control over the enrichment process.
- Key Methods:
  - with_data(group_by, values_from=None):
    - Purpose: Specifies the column to group data by (e.g., "nearest_node") and, optionally, the column containing values to aggregate (e.g., "traffic").
    - Example:
        enricher_factory = CreateEnricher().with_data(group_by="nearest_node", values_from="traffic")
  - aggregate_with(method, edge_method='average', output_column=None, target='edges'):
    - Purpose: Configures the aggregation method (e.g., "sum", "mean") and how aggregated values are mapped to the network.
    - Parameters:
      - method (str): Aggregation method (e.g., "mean", "sum", "median", "min", "max").
      - edge_method (str, optional, default="average"): Method to compute edge values (e.g., "average", "sum", "max", "min"). Only applicable if target includes edges.
      - output_column (str, optional): Name of the output column in the target GeoDataFrame(s).
      - target (str, optional, default="edges"): Specifies where to apply the enrichment: "nodes", "edges", or "both". If "nodes", values are mapped directly to nodes; if "edges", values are computed for edges using edge_method; if "both", enrichment applies to both.
    - Example:
        enricher_factory = enricher_factory.aggregate_with(method="sum", edge_method="average", output_column="total_traffic", target="edges")
  - count_by(edge_method='sum', output_column=None, target='edges'):
    - Purpose: Configures a counting aggregation (e.g., counting accidents per node), without needing a values_from column.
    - Parameters:
      - edge_method (str, optional, default="sum"): Method to map counts to edges (if target includes edges).
      - output_column (str, optional): Name of the output column.
      - target (str, optional, default="edges"): Specifies where to apply the enrichment: "nodes", "edges", or "both".
    - Example:
        enricher_factory = CreateEnricher().with_data(group_by="nearest_node").count_by(edge_method="sum", output_column="accident_count", target="both")
  - using_enricher(enricher_type):
    - Purpose: Selects a specific enricher type (currently, only "SingleAggregatorEnricher" is available).
    - Example:
        enricher_factory = enricher_factory.using_enricher("SingleAggregatorEnricher")
  - preview(format="ascii"):
    - Purpose: Displays a summary of the current configuration, helping you verify settings before building the enricher.
    - Example:
        print(enricher_factory.preview())
  - build():
    - Purpose: Constructs and returns the configured EnricherBase instance.
    - Example:
        enricher = enricher_factory.build()
- Example (Full Configuration):
    from osmnx_mapping.modules.enricher import CreateEnricher
    enricher = (
        CreateEnricher()
        .with_data(group_by="nearest_node", values_from="traffic")
        .aggregate_with(method="sum", edge_method="average", output_column="total_traffic", target="edges")
        .build()
    )
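The counting configuration described for count_by can be pictured the same way: count records per node, then combine endpoint counts per edge with a sum. This is a hedged, library-independent sketch with hypothetical data, not the code that count_by runs.

```python
from collections import Counter

# Hypothetical accident records, already mapped to their nearest node.
records = [{"nearest_node": 1}, {"nearest_node": 1}, {"nearest_node": 2}]
edges = [(1, 2), (2, 3)]

# Count occurrences per node; no values column is needed.
node_counts = Counter(record["nearest_node"] for record in records)

# Sum the endpoint counts for each edge (Counter yields 0 for unseen nodes).
edge_counts = {(u, v): node_counts[u] + node_counts[v] for u, v in edges}
```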
[!TIP]
- Use CreateEnricher when you need full control over the enrichment process, such as experimenting with different aggregation methods or counting occurrences without a value column.
- Call preview() before build() to verify your configuration and catch potential errors early.
- The target parameter determines whether the enrichment is applied to nodes, edges, or both. For target="both", the aggregated values are assigned to both nodes and edges, with edges using the specified edge_method.
If you do not need advanced customisation and prefer a quick setup with sensible defaults, the with_default method in EnricherMixin provides a convenient shortcut. It internally uses CreateEnricher with predefined settings, making it ideal for standard use cases.
- with_default(group_by_column, values_from_column, output_column="aggregated_value", method="mean", edge_method="average", target="edges")
  - Purpose: Quickly configures a default enricher using CreateEnricher with predefined settings.
  - Parameters:
    - group_by_column (str): Column to group by (e.g., "nearest_node").
    - values_from_column (str): Column to aggregate (e.g., "traffic").
    - output_column (str, optional): Name of the output column (default: "aggregated_value").
    - method (str, optional): Aggregation method (default: "mean").
    - edge_method (str, optional): Edge mapping method (default: "average").
    - target (str, optional): Specifies where to apply the enrichment (default: "edges").
  - Returns: The EnricherMixin instance for method chaining.
  - Example:
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      mapping.enricher.with_default("nearest_node", "traffic", method="sum", edge_method="average", target="both")
[!TIP]
- Use with_default for standard use cases where you want a quick setup with minimal configuration.
- If you need more control, switch to CreateEnricher for advanced customisation.
Once configured (using either CreateEnricher or with_default), the enricher can be applied to the network using the enrich_network method.
- enrich_network(input_data, input_graph, input_nodes, input_edges, **kwargs)
  - Purpose: Applies the configured enricher to the road network, enriching the specified target (nodes, edges, or both) with aggregated data.
  - Parameters:
    - input_data (geopandas.GeoDataFrame): Dataset to enrich with.
    - input_graph (networkx.MultiDiGraph): Road network graph.
    - input_nodes (geopandas.GeoDataFrame): Network nodes.
    - input_edges (geopandas.GeoDataFrame): Network edges.
    - **kwargs: Additional options for custom enrichers.
  - Returns: A tuple (GeoDataFrame, MultiDiGraph, GeoDataFrame, GeoDataFrame) of enriched data, updated graph, updated nodes, and updated edges. Depending on the target, the enrichment is applied to nodes, edges, or both.
  - Example:
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
      graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
      # Create the "nearest_node" column that the enricher groups by.
      data = mapping.network.map_nearest_street(data, "lon", "lat")
      mapping.enricher.with_default("nearest_node", "traffic", method="sum", edge_method="average", target="edges")
      enriched_data, graph, nodes, edges = mapping.enricher.enrich_network(data, graph, nodes, edges)
[!TIP]
- Counting Occurrences: Use count_by in CreateEnricher to count events (e.g., accidents) per group without needing a values_from column.
- Choosing Between Approaches: Start with with_default for simplicity, but switch to CreateEnricher if you need advanced customisation or encounter limitations.
- Multiple Enrichers: When using multiple enrichers in a pipeline, ensure each writes to a unique output_column to avoid overwriting data.
VisualMixin – Visualise Your Results
The VisualMixin provides tools to visualise your enriched network. By default, it uses StaticVisualiser for static Matplotlib plots, but you can pass any VisualiserBase subclass (e.g., InteractiveVisualiser for interactive Folium maps) to the constructor for custom visualisations.
[!TIP]
Available visualisers:
- StaticVisualiser: Generates a static Matplotlib plot of the network (default).
- InteractiveVisualiser: Creates an interactive Folium map for exploration in a browser.
- visualise(graph, nodes, edges, result_columns, target="edges", **kwargs)
  - Purpose: Creates a visualisation of the enriched network using the configured visualiser.
  - Parameters:
    - graph (networkx.MultiDiGraph): The network graph.
    - nodes (geopandas.GeoDataFrame): Network nodes.
    - edges (geopandas.GeoDataFrame): Network edges.
    - result_columns (str or list of str): Column(s) to visualise. For static visualisers (e.g., StaticVisualiser), provide a single string (e.g., "aggregated_value"). For interactive visualisers (e.g., InteractiveVisualiser), provide a list of strings (e.g., ["column1", "column2"]) to enable multi-layer visualisation with a dropdown selection.
    - target (str, default="edges"): Specifies what to visualise: "nodes", "edges", or "both". Determines whether the visualisation focuses on nodes, edges, or both, based on the enriched data in result_columns.
    - **kwargs: Visualisation parameters (e.g., colormap="Blues" for StaticVisualiser, or tile_provider="CartoDB positron" for InteractiveVisualiser).
  - Returns: A Matplotlib figure (for StaticVisualiser) or Folium map (for InteractiveVisualiser), depending on the visualiser.
  - Example (Static Visualiser):
      import osmnx_mapping as oxm
      mapping = oxm.OSMNxMapping()
      data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
      graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
      # Create the "nearest_node" column that the enricher groups by.
      data = mapping.network.map_nearest_street(data, "lon", "lat")
      mapping.enricher.with_default("nearest_node", "traffic", method="sum", target="edges")
      enriched_data, graph, nodes, edges = mapping.enricher.enrich_network(data, graph, nodes, edges)
      fig = mapping.visual.visualise(graph, nodes, edges, "aggregated_value", target="edges", colormap="Blues")
  - Example (Interactive Visualiser):
      import osmnx_mapping as oxm
      from osmnx_mapping.modules.visualiser.visualisers.interactive_visualiser import InteractiveVisualiser
      mapping = oxm.OSMNxMapping()
      data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
      graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
      # Create the "nearest_node" column that the enricher groups by.
      data = mapping.network.map_nearest_street(data, "lon", "lat")
      mapping.enricher.with_default("nearest_node", "traffic", method="sum", target="edges")
      enriched_data, graph, nodes, edges = mapping.enricher.enrich_network(data, graph, nodes, edges)
      # Use InteractiveVisualiser for multi-layer visualisation. Here we assume that
      # "aggregated_value" and "traffic_density" are columns in the enriched edges GeoDataFrame.
      fmap = mapping.visual(InteractiveVisualiser()).visualise(
          graph, nodes, edges, ["aggregated_value", "traffic_density"],
          target="edges", colormap="Greens", tile_provider="CartoDB positron"
      )
TableVisMixin – Interactive Data Exploration
The TableVisMixin
offers interactive table visualisations for your data within Jupyter notebooks using the great Skrub
library.
interactive_display(dataframe, n_rows=10, order_by=None, title="Table Report", column_filters=None, verbose=1)
- Purpose: Displays an interactive table for exploring your data.
- Parameters:
  - `dataframe` (pandas.DataFrame or geopandas.GeoDataFrame): The data to display.
  - `n_rows` (int, default=10): Number of rows to show.
  - `order_by` (str or list, optional): Column(s) to sort by.
  - `title` (str, optional): Title of the table.
  - `column_filters` (dict, optional): Filters for specific columns.
  - `verbose` (int, default=1): Verbosity level.
- Returns: None (the table is displayed in the notebook).
- Example:
    import osmnx_mapping as oxm

    mapping = oxm.OSMNxMapping()
    data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
    mapping.table_vis.interactive_display(data, n_rows=5)
AuctusSearchMixin – Discover (Urban) Datasets
The `AuctusSearchMixin` integrates with Auctus Search, allowing you to discover, profile, and load (urban) datasets directly into your OSMNxMapping workflow.
For detailed usage and examples, please refer to the Auctus Search README. In the meantime, here are the key methods for using AuctusSearchMixin with OSMNxMapping:
- `explore_datasets_from_auctus(search_query, page=1, size=10, display_initial_results=False)`
  - Purpose: Searches Auctus for datasets matching the query and optionally displays initial results.
  - Parameters:
    - `search_query` (str or list): Search term(s).
    - `page` (int, default=1): Page number (pagination).
    - `size` (int, default=10): Number of results per page.
    - `display_initial_results` (bool, default=False): If True, displays initial search results. Note that if you add `.with_<action>` filtering from AuctusSearch, results display before filtering; use `.display()` afterward to see filtered datasets.
  - Returns: An `AuctusDatasetCollection` object. See more in the Auctus Search README.
- `profile_dataset_from_auctus()`
  - Purpose: Displays an interactive data profile summary of the selected dataset using the Data Profile Viz library.
  - Parameters: None
  - Returns: None (displays the profile interactively in the notebook).
  - Example:

        import osmnx_mapping as oxm

        mapping = oxm.OSMNxMapping()
        mapping.explore_datasets_from_auctus("Taxis")
        # Select a dataset from the interactive results, then profile it.
        mapping.profile_dataset_from_auctus()  # Displays the profile using Data Profile Viz.
- `load_dataset_from_auctus(display_table=True)`
  - Purpose: Loads the selected dataset from Auctus after choosing one via "Select This Dataset" from the interactive search results. Afterward, you can use the OSMNxMapping Loader module’s `load_from_dataframe` method.
  - Parameters:
    - `display_table` (bool, default=True): If True, displays a preview table using `Skrub`.
  - Returns: A `pandas.DataFrame` or `geopandas.GeoDataFrame`.
UrbanPipelineMixin – Chain Your Workflow
The `UrbanPipelineMixin` enables you to chain multiple steps into a single, reproducible pipeline, modeled after scikit-learn’s `Pipeline`.
[!IMPORTANT]
Pipeline Restrictions (per configuration):
- Exactly 1 `NetworkBase` step (e.g., `OSMNxNetwork` or built using `CreateNetwork`).
- Exactly 1 `LoaderBase` step (e.g., `CSVLoader`).
- 1 or more `EnricherBase` steps (e.g., `SingleAggregatorEnricher`).
- 0 or 1 `VisualiserBase` step.
- 0 or more `GeoImputerBase` or `GeoFilterBase` steps.

Steps must adhere to these constraints, or the pipeline will raise a validation error upon creation or execution.
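
To make the restrictions concrete, here is a minimal sketch of how such per-family limits could be checked. It is not the library's own validation code: the empty classes are hypothetical stand-ins that only borrow the base-class names, and `check_steps` and `LIMITS` are illustrative names.

```python
from collections import Counter


# Hypothetical stand-ins: they only borrow the names of the library's base classes.
class NetworkBase: ...
class LoaderBase: ...
class EnricherBase: ...
class VisualiserBase: ...


# (minimum, maximum) occurrences per step family; None means no upper bound.
# Imputers and filters are "0 or more", so they need no entry.
LIMITS = {
    NetworkBase: (1, 1),
    LoaderBase: (1, 1),
    EnricherBase: (1, None),
    VisualiserBase: (0, 1),
}


def check_steps(steps):
    """Raise ValueError if a list of (name, step) tuples breaks the limits above."""
    counts = Counter()
    for _name, step in steps:
        for base in LIMITS:
            if isinstance(step, base):
                counts[base] += 1
    for base, (low, high) in LIMITS.items():
        found = counts[base]
        too_many = high is not None and found > high
        if found < low or too_many:
            upper = "unbounded" if high is None else high
            raise ValueError(
                f"{base.__name__}: found {found} step(s), allowed range is {low}..{upper}"
            )
```

With this sketch, a list holding one network, one loader, and one enricher passes, while a list without a network step raises a `ValueError` that names `NetworkBase`.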
[!NOTE]
When using multiple `EnricherBase` steps, ensure each writes to a unique `output_column`. If multiple enrichers target the same `output_column`, the last one executed will silently overwrite the others.
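
The overwrite behaviour can be illustrated with plain dictionaries. This is a toy model only: `enrich` is a hypothetical helper, and the real library stores edge attributes in a GeoDataFrame rather than a dict.

```python
def enrich(edges, values, output_column):
    """Write one value per edge into output_column, replacing anything already there."""
    for edge_id, value in values.items():
        edges[edge_id][output_column] = value


traffic = {"e1": 120, "e2": 45}
incidents = {"e1": 3, "e2": 0}

# Same output column twice: the traffic totals are lost without any warning.
clashing = {"e1": {}, "e2": {}}
enrich(clashing, traffic, "result")
enrich(clashing, incidents, "result")

# Unique output columns: both results survive side by side.
distinct = {"e1": {}, "e2": {}}
enrich(distinct, traffic, "total_traffic")
enrich(distinct, incidents, "incident_count")
```

After running this, `clashing["e1"]` holds only the incident value under `"result"`, whereas `distinct["e1"]` keeps both `"total_traffic"` and `"incident_count"`.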
`urban_pipeline(steps)`
- Purpose: Constructs a pipeline from a list of (name, step) tuples, where each step is an instance of a supported base class.
- Parameters:
  - `steps` (list of tuples): Steps to include, e.g., `[("loader", CSVLoader(...)), ("network", CreateNetwork().with_place(...).build()), ("enricher", CreateEnricher().with_data(...).build())]`.
- Returns: An `UrbanPipeline` object.
- Example:

    import osmnx_mapping as oxm
    from osmnx_mapping.modules.loader.loaders.csv_loader import CSVLoader
    from osmnx_mapping.modules.network import CreateNetwork
    from osmnx_mapping.modules.enricher import CreateEnricher

    mapping = oxm.OSMNxMapping()
    pipeline = mapping.urban_pipeline([
        ("loader", CSVLoader(file_path="city_data.csv")),
        ("network", CreateNetwork()
            .with_place("Manhattan, New York City, USA")
            .with_mapping("node", "lon", "lat", "nearest_node")
            .build()),
        ("enricher1", CreateEnricher()
            .with_data(group_by="nearest_node", values_from="traffic")
            .aggregate_with(method="sum", output_column="total_traffic", target="edges")
            .build()),
        ("enricher2", CreateEnricher()
            .with_data(group_by="nearest_node", values_from="incidents")
            .count_by(output_column="incident_count", target="nodes")
            .build()),
    ])

`compose(latitude_column_name, longitude_column_name)`
- Purpose: Configures the pipeline by setting latitude and longitude column names, which are propagated to all relevant steps (e.g., imputers, filters, network mappings) requiring geographic data.
- Parameters:
  - `latitude_column_name` (str): Name of the latitude column in the input data.
  - `longitude_column_name` (str): Name of the longitude column in the input data.
- Example:

    pipeline.compose("lat", "lon")

`transform()`
- Purpose: Executes the pipeline after `compose()` has been called, processing the data and returning the results.
- Parameters: None (requires prior `compose()` call).
- Returns: A tuple (`GeoDataFrame`, `MultiDiGraph`, `GeoDataFrame`, `GeoDataFrame`) containing the processed data, network graph, nodes, and edges, respectively.
- Example:

    data, graph, nodes, edges = pipeline.transform()

`compose_transform(latitude_column_name, longitude_column_name)`
- Purpose: Combines configuration and execution into a single step, configuring the pipeline and immediately processing the data.
- Parameters:
  - `latitude_column_name` (str): Name of the latitude column.
  - `longitude_column_name` (str): Name of the longitude column.
- Returns: A tuple (`GeoDataFrame`, `MultiDiGraph`, `GeoDataFrame`, `GeoDataFrame`) of processed data, graph, nodes, and edges.
- Example:

    data, graph, nodes, edges = pipeline.compose_transform("lat", "lon")

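
The relationship between the three calls can be sketched with a toy class. `ToyPipeline` is a hypothetical stand-in, not the library's `UrbanPipeline`: having `compose()` return the pipeline itself and raising `RuntimeError` when `transform()` is called too early are assumptions made for this illustration only.

```python
class ToyPipeline:
    """Hypothetical stand-in that mimics the compose/transform contract."""

    def __init__(self):
        self._columns = None  # filled in by compose()

    def compose(self, latitude_column_name, longitude_column_name):
        # Remember the coordinate column names for later steps.
        self._columns = (latitude_column_name, longitude_column_name)
        return self  # assumption: returned only so the calls can be chained here

    def transform(self):
        # Refuse to run until the column names are known.
        if self._columns is None:
            raise RuntimeError("call compose() before transform()")
        latitude, longitude = self._columns
        return {"latitude": latitude, "longitude": longitude}

    def compose_transform(self, latitude_column_name, longitude_column_name):
        # Shorthand for compose() followed by transform().
        return self.compose(latitude_column_name, longitude_column_name).transform()


two_calls = ToyPipeline().compose("lat", "lon").transform()
one_call = ToyPipeline().compose_transform("lat", "lon")
```

In this toy model `two_calls` and `one_call` are equal, which is the equivalence the descriptions above state for the real pipeline.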
`visualise(result_columns, **kwargs)`
- Purpose: Visualises the pipeline’s output using the configured `VisualiserBase` step (if present).
- Parameters:
  - `result_columns` (str or list of str): Column(s) to visualise. Use a single string (e.g., `"total_traffic"`) for static visualisers (e.g., `StaticVisualiser`). Use a list of strings (e.g., `["total_traffic", "incident_count"]`) for interactive visualisers (e.g., `InteractiveVisualiser`), which support multiple layers in a single visualisation.
  - `**kwargs`: Additional visualisation options (e.g., `colormap="Blues"`, `tile_provider="CartoDB positron"`).
- Returns: A plot (e.g., Matplotlib figure) for static visualisers or an interactive map for interactive visualisers.
- Note: Passing a list to `result_columns` with a static visualiser will raise an error. Ensure the type matches the visualiser used.
- Example:

    # For a static visualiser
    fig = pipeline.visualise("total_traffic", colormap="Blues")

    # For an interactive visualiser
    fmap = pipeline.visualise(["total_traffic", "incident_count"], colormap="Greens", tile_provider="CartoDB positron")

`save(filepath)` / `load(filepath)`
- Purpose: Saves the pipeline to a file or loads a previously saved pipeline for reuse.
- Parameters:
  - `filepath` (str): Path to the file (e.g., `"my_pipeline.joblib"`).
- Example:

    pipeline.save("my_pipeline.joblib")
    loaded_pipeline = UrbanPipeline.load("my_pipeline.joblib")

- `named_steps`: Access pipeline steps by name, e.g., `pipeline.named_steps["loader"]`.
- `get_step_names()`: Returns a list of all step names in the pipeline.
- `get_step(name)`: Retrieves a specific step by its name.
- `get_params(deep=True)`: Intended to return all pipeline parameters (not yet implemented).
- `set_params(**kwargs)`: Intended to update pipeline parameters (not yet implemented).
[!NOTE]
The `get_params` and `set_params` methods are planned features and are not functional in the current release.
Important
Full documentation is forthcoming, so expect some breaking changes in the API. Bear with us, the docs are cooking!
Check out the `examples/` directory in the OSMNxMapping repo for more detailed Jupyter notebook examples.
`OSMNxMapping` is released under the MIT Licence.