Urban morphometrics

Morphometric assessment measures a wide range of characters of urban form to derive a complex description of built-up patterns composed of enclosed tessellation, buildings and the street network.

All algorithms used within this notebook are part of the momepy Python toolkit and can be used from there. We have extracted them from momepy, adapted them for dask and pygeos, and used them in a raw form tailored directly to our use case. The algorithms which were enhanced have been pushed back to momepy and will be part of momepy 0.4.0.

All steps within this notebook are parallelised using dask. The first part, which measures aspects of individual elements (and does not require knowledge of the context), uses a pre-release of dask-geopandas. The rest uses dask to manage parallel iteration over geo-chunks with single-core algorithms.
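In outline, the two patterns look as follows (a minimal sketch; gdf and process_chunk are placeholders for the objects and functions defined later in this notebook):

import dask_geopandas
from dask.distributed import Client, LocalCluster

client = Client(LocalCluster(n_workers=14))

# element-wise characters: partition the GeoDataFrame and let dask map over partitions
ddf = dask_geopandas.from_geopandas(gdf, npartitions=14)  # gdf is a placeholder GeoDataFrame
areas = ddf.geometry.area.compute()  # lazy until .compute()

# contextual characters: keep the single-core algorithm and iterate over geo-chunks in parallel
futures = [client.submit(process_chunk, chunk_id) for chunk_id in range(8)]  # process_chunk is a placeholder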

Some functions are imported from a momepy_utils.py file stored within this directory. Those are either helper functions taken directly from momepy or their enhanced versions, all of which will be included in the next release of momepy (see the sketch after this list for how the first two map to the public momepy API):

  • get_edge_ratios is implemented in momepy 0.4.0 as get_network_ratio

  • get_nodes is included in get_node_id

  • remaining functions have been used to refactor existing momepy classes.
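For reference, a minimal sketch of how the first two helpers map to the public momepy API (assuming momepy >= 0.4.0, tessellation cells tess with an edgeID column, and street edges/nodes GeoDataFrames already prepared; call signatures may differ slightly between momepy versions):

import momepy

# get_edge_ratios is exposed as get_network_ratio; it returns the
# 'edgeID_keys' / 'edgeID_values' columns used later in this notebook
ratios = momepy.get_network_ratio(tess, edges)
tess[['edgeID_keys', 'edgeID_values']] = ratios[['edgeID_keys', 'edgeID_values']]

# get_nodes is covered by get_node_id, linking each cell to the nearest node
# of its street segment (assumes the 'edgeID' column is already assigned)
tess['nodeID'] = momepy.get_node_id(tess, nodes, edges, node_id='nodeID', edge_id='edgeID')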

Individual elements

Note: Requires dask-geopandas and the current master of geopandas to support the dask version, or gds_py:6.0.

# !pip install git+git://github.com/jsignell/dask-geopandas.git
# !pip install git+git://github.com/geopandas/geopandas.git
import os
import warnings
from time import sleep, time

import dask.dataframe as dd
import dask_geopandas
import geopandas
import libpysal
import momepy
import networkx as nx
import numpy as np
import pandas as pd
import pygeos
import scipy.sparse
from dask.distributed import Client, LocalCluster, as_completed
from libpysal.weights import Queen
from sqlalchemy import create_engine
from tqdm.notebook import tqdm
from momepy_utils import (
    _circle_radius,
    centroid_corner,
    elongation,
    get_corners,
    get_edge_ratios,
    get_nodes,
    solar_orientation_poly,
    squareness,
    street_profile,
)

We are using a single machine within this notebook with 14 cores, so we start a local dask cluster with 14 workers.

client = Client(LocalCluster(n_workers=14))
client

dask-geopandas is still under development and raises a few warnings at the moment, all of which can be ignored.

warnings.filterwarnings('ignore', message='.*initial implementation of Parquet.*')
warnings.filterwarnings('ignore', message='.*Assigning CRS to a GeoDataFrame without a geometry.*')

Measuring buildings and enclosed cells

In the first step, we iterate over geo-chunks, merge enclosed tessellation and buildings into a single geopandas.GeoDataFrame and convert it to a dask_geopandas.GeoDataFrame. The rest of the code is mostly an extraction from the momepy source code adapted for dask.

for chunk_id in tqdm(range(103), total=103):
    
    # Load data and merge them together
    blg = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/buildings/blg_{chunk_id}.pq")
    tess = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/tessellation/tess_{chunk_id}.pq")
    
    blg = blg.rename_geometry('buildings')
    tess = tess.rename_geometry('tessellation')

    df = tess.merge(blg, on='uID', how='left')
    
    # Convert to dask.GeoDataFrame
    ddf = dask_geopandas.from_geopandas(df, npartitions=14)
    
    ## Measure morphometric characters
    # Building area
    ddf['sdbAre'] = ddf.buildings.area
    
    # Building perimeter
    ddf['sdbPer'] = ddf.buildings.length
    
    # Courtyard area
    exterior_area = ddf.buildings.map_partitions(lambda series: pygeos.area(pygeos.polygons(series.exterior.values.data)), meta='float')
    ddf['sdbCoA'] = exterior_area - ddf['sdbAre']

    # Circular compactness
    hull = ddf.buildings.convex_hull.exterior

    radius = hull.apply(lambda g: _circle_radius(list(g.coords)) if g is not None else None, meta='float')
    ddf['ssbCCo'] = ddf['sdbAre'] / (np.pi * radius ** 2)

    # Corners
    ddf['ssbCor'] = ddf.buildings.apply(lambda g: get_corners(g), meta='float')

    # Squareness
    ddf['ssbSqu'] = ddf.buildings.apply(lambda g: squareness(g), meta='float')
    
    # Equivalent rectangular index
    bbox = ddf.buildings.apply(lambda g: g.minimum_rotated_rectangle if g is not None else None, meta=geopandas.GeoSeries())
    ddf['ssbERI'] = (ddf['sdbAre'] / bbox.area).pow(1./2) * (bbox.length / ddf['sdbPer'])

    # Elongation
    ddf['ssbElo'] = bbox.map_partitions(lambda s: elongation(s), meta='float')
    
    # Centroid corner mean distance and deviation
    def _centroid_corner(series):
        ccd = series.apply(lambda g: centroid_corner(g))
        return pd.DataFrame(ccd.to_list(), index=series.index)

    
    ddf[['ssbCCM', 'ssbCCD']] = ddf.buildings.map_partitions(_centroid_corner, meta=pd.DataFrame({0: [0.1], 1: [1.1]}))
    
    # Solar orientation
    ddf['stbOri'] = bbox.apply(lambda g: solar_orientation_poly(g), meta='float')
    
    # Tessellation longest axis length
    hull = ddf.tessellation.convex_hull.exterior

    ddf['sdcLAL'] = hull.apply(lambda g: _circle_radius(list(g.coords)), meta='float') * 2
    
    # Tessellation area
    ddf['sdcAre'] = ddf.tessellation.area
    
    # Circular compactness
    radius = hull.apply(lambda g: _circle_radius(list(g.coords)), meta='float')
    ddf['sscCCo'] = ddf['sdcAre'] / (np.pi * radius ** 2)
    
    # Equivalent rectangular index
    bbox = ddf.tessellation.apply(lambda g: g.minimum_rotated_rectangle, meta=geopandas.GeoSeries())
    ddf['sscERI'] = (ddf['sdcAre'] / bbox.area).pow(1./2) * (bbox.length / ddf.tessellation.length)
    
    # Solar orientation
    ddf['stcOri'] = bbox.apply(lambda g: solar_orientation_poly(g), meta='float')
    
    # Covered area ratio
    ddf['sicCAR'] = ddf['sdbAre'] / ddf['sdcAre']
    
    # Building-cell alignment
    ddf['stbCeA'] = (ddf['stbOri'] - ddf['stcOri']).abs()
    
    # Compute all characters using dask
    df = ddf.compute()
    
    # Save to parquet file
    df.to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    client.restart()
    sleep(5)

Measuring enclosures

All enclosures are loaded as a single dask_geopandas.GeoDataFrame and measured at once.

%%time
# Load data
encl = dask_geopandas.read_parquet("../../urbangrammar_samba/spatial_signatures/enclosures/encl_*.pq")

# Area
encl['ldeAre'] = encl.geometry.area

# Perimeter
encl['ldePer'] = encl.geometry.length

# Circular compactness
hull = encl.geometry.convex_hull.exterior

radius = hull.apply(lambda g: _circle_radius(list(g.coords)) if g is not None else None, meta='float')
encl['lseCCo'] = encl['ldeAre'] / (np.pi * radius ** 2)

# Equivalent rectangular index
bbox = encl.geometry.apply(lambda g: g.minimum_rotated_rectangle if g is not None else None, meta=geopandas.GeoSeries())
encl['lseERI'] = (encl['ldeAre'] / bbox.area).pow(1./2) * (bbox.length / encl['ldePer'])

# Compactness-weighted axis
longest_axis = hull.apply(lambda g: _circle_radius(list(g.coords)), meta='float') * 2
encl['lseCWA'] = longest_axis * ((4 / np.pi) - (16 * encl['ldeAre']) / ((encl['ldePer']) ** 2))

# Solar orientation
encl['lteOri'] = bbox.apply(lambda g: solar_orientation_poly(g), meta='float')

# Compute data and return geopandas.GeoDataFrame
encl_df = encl.compute()

# Weighted number of neighbors
inp, res = encl_df.sindex.query_bulk(encl_df.geometry, predicate='intersects')
indices, counts = np.unique(inp, return_counts=True)
encl_df['neighbors'] = counts - 1
encl_df['lteWNB'] = encl_df['neighbors'] / encl_df['ldePer']

# Load complete enclosed tessellation as a dask DataFrame
tess = dd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_*.pq")

# Measure weighted cells within enclosure
encl_counts = tess.groupby('enclosureID').count().compute()
merged = encl_df[['enclosureID', 'ldeAre']].merge(encl_counts[['geometry']], how='left', on='enclosureID')
encl_df['lieWCe'] = merged['geometry'] / merged['ldeAre']

# Save data to parquet
encl_df.drop(columns='geometry').to_parquet("../../urbangrammar_samba/spatial_signatures/morphometrics/enclosures.pq")

We can now close the dask client.

client.close()

Generate spatial weights (W)

Subsequent steps will require an understanding of the context of each tessellation cell in the form of spatial weights matrices (Queen contiguity and Queen contiguity of inclusive 3rd order). We generate them beforehand and store them as npz files representing the sparse matrices.

Each geo-chunk is loaded together with the relevant cross-chunk tessellation cells (to avoid edge effects). We use dask to parallelise the iteration. The number of workers is smaller now to ensure enough memory for each chunk.
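For reference, the save/load round trip between libpysal weights and the stored sparse matrices boils down to the following sketch (df stands for a chunk's tessellation cells and the file name is shortened here):

import libpysal
import scipy.sparse

# build contiguity on the chunk and persist its sparse representation ...
w = libpysal.weights.Queen.from_dataframe(df, geom_col='tessellation')
scipy.sparse.save_npz("w_0.npz", w.sparse)

# ... and restore it later without re-computing contiguity
w = libpysal.weights.WSP(scipy.sparse.load_npz("w_0.npz")).to_W()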

workers = 8
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client

First we load the cross-chunk indices and specify a function doing the processing itself, whose only argument is the chunk_id.

# cross-chunk indices linking each chunk to neighbouring cells in other chunks
cross_chunk = pd.read_parquet('../../urbangrammar_samba/spatial_signatures/cross-chunk_indices.pq')

def generate_w(chunk_id):
    # load cells of a chunk
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    
    # add neighbouring cells from other chunks
    cross_chunk_cells = []
    
    for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
        cross_chunk_cells.append(add_cells)
    
    df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)

    w = libpysal.weights.Queen.from_dataframe(df, geom_col='tessellation')
    w3 = momepy.sw_high(k=3, weights=w)
    
    scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w_{chunk_id}.npz", w.sparse)
    scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w3_{chunk_id}.npz", w3.sparse)
    
    return f"Chunk {chunk_id} processed sucessfully."

Then we use dask to iterate over all 103 chunks. The following script sends the first 8 chunks to dask together and then submits a new chunk as soon as any of the previous ones finishes (courtesy of Matthew Rocklin). That way we process only 8 chunks at once, ensuring that the cluster will not run out of memory.

%%time
inputs = iter(range(103))
futures = [client.submit(generate_w, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
    # submit new future 
    try:
        new_future = client.submit(generate_w, next(inputs))
        ac.add(new_future)
    except StopIteration:
        pass
    print(finished_future.result())
client.close()

Spatial distribution and network analysis

To measure the spatial distribution of elements, we use single-core algorithms and parallelise the iteration.

workers = 8
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client

We will need to load street network data from a PostGIS database, so we establish a connection which will be used within the loop.

cross_chunk = pd.read_parquet('../../urbangrammar_samba/spatial_signatures/cross-chunk_indices.pq')
chunks = geopandas.read_parquet('../../urbangrammar_samba/spatial_signatures/local_auth_chunks.pq')

user = os.environ.get('DB_USER')
pwd = os.environ.get('DB_PWD')
host = os.environ.get('DB_HOST')
port = os.environ.get('DB_PORT')

db_connection_url = f"postgres+psycopg2://{user}:{pwd}@{host}:{port}/built_env"

Within the same function below, we measure the spatial distribution of elements and network-based characters.

def measure(chunk_id):
    # load cells of a chunk
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    cells['keep'] = True
    
    # add neighbouring cells from other chunks
    cross_chunk_cells = []
    
    for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
        add_cells['keep'] = False
        cross_chunk_cells.append(add_cells)
    
    df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)

    # read W
    w = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w_{chunk_id}.npz")).to_W()
    
    # alignment
    def alignment(x, orientation='stbOri'):
        orientations = df[orientation].iloc[w.neighbors[x]]
        return abs(orientations - df[orientation].iloc[x]).mean()
    
    df['mtbAli'] = [alignment(x) for x in range(len(df))]

    # mean neighbour distance
    def neighbor_distance(x):
        geom = df.buildings.iloc[x]
        if geom is None:
            return np.nan
        return df.buildings.iloc[w.neighbors[x]].distance(df.buildings.iloc[x]).mean()

    df['mtbNDi'] = [neighbor_distance(x) for x in range(len(df))]
    
    # weighted neighbours
    df['mtcWNe'] = pd.Series([w.cardinalities[x] for x in range(len(df))], index=df.index) / df.tessellation.length
    
    # area covered by neighbours
    def area_covered(x, area='sdcAre'):
        neighbours = [x]
        neighbours += w.neighbors[x]

        return df[area].iloc[neighbours].sum()

    df['mdcAre'] = [area_covered(x) for x in range(len(df))]
    
    # read W3
    w3 = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w3_{chunk_id}.npz")).to_W()
      
    # weighted reached enclosures
    def weighted_reached_enclosures(x, area='sdcAre', enclosure_id='enclosureID'):
        neighbours = [x]
        neighbours += w3.neighbors[x]

        vicinity = df[[area, enclosure_id]].iloc[neighbours]

        return vicinity[enclosure_id].unique().shape[0] / vicinity[area].sum()
    
    df['ltcWRE'] = [weighted_reached_enclosures(x) for x in range(len(df))]
    
    # mean interbuilding distance
    # define adjacency list from lipysal
    adj_list = w.to_adjlist(remove_symmetric=False)
    adj_list["weight"] = (
        df.buildings.iloc[adj_list.focal]
        .reset_index(drop=True)
        .distance(df.buildings.iloc[adj_list.neighbor].reset_index(drop=True)).values
    )

    G = nx.from_pandas_edgelist(
            adj_list, source="focal", target="neighbor", edge_attr="weight"
        )
    ibd = []
    for i in range(len(df)):
        try:
            sub = nx.ego_graph(G, i, radius=3)
            ibd.append(np.nanmean([x[-1] for x in list(sub.edges.data('weight'))]))
        except:
            ibd.append(np.nan)

    df['ltbIBD'] = ibd
    
    # Reached neighbors and area on 3 topological steps on tessellation
    df['ltcRea'] = [w3.cardinalities[i] for i in range(len(df))]
    df['ltcAre'] = [df.sdcAre.iloc[w3.neighbors[i]].sum() for i in range(len(df))]

    # Save cells to parquet keeping only within-chunk data not the additional neighboring
    df[df['keep']].drop(columns=['keep']).to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")

    # Load street network for an extended chunk area
    chunk_area = chunks.geometry.iloc[chunk_id].buffer(5000)  # we extend the area by 5km to minimise edge effect
    engine = create_engine(db_connection_url)
    sql = f"SELECT * FROM openroads_200803_topological WHERE ST_Intersects(geometry, ST_GeomFromText('{chunk_area.wkt}',27700))"
    streets = geopandas.read_postgis(sql, engine, geom_col='geometry')
    
    # Street profile (measures width, width deviation and openness)
    # street_profile is a helper from momepy_utils; pass the building footprints
    # present in the chunk (including neighbouring cross-chunk cells)
    blg = df[df.buildings.notna()].set_geometry('buildings')
    sp = street_profile(streets, blg)
    streets['sdsSPW'] = sp[0]
    streets['sdsSWD'] = sp[1]
    streets['sdsSPO'] = sp[2]
    
    # Street segment length
    streets['sdsLen'] = streets.length
    
    # Street segment linearity
    streets['sssLin'] = momepy.Linearity(streets).series
    
    # Convert geopandas.GeoDataFrame to networkx.Graph for network analysis
    G = momepy.gdf_to_nx(streets)
    
    # Node degree
    G = momepy.node_degree(G)
    
    # Subgraph analysis (meshedness, proportion of 0, 3 and 4 way intersections, local closeness)
    G = momepy.subgraph(
        G,
        radius=5,
        meshedness=True,
        cds_length=False,
        mode="sum",
        degree="degree",
        length="mm_len",
        mean_node_degree=False,
        proportion={0: True, 3: True, 4: True},
        cyclomatic=False,
        edge_node_ratio=False,
        gamma=False,
        local_closeness=True,
        closeness_weight="mm_len",
        verbose=False
    )
    
    # Cul-de-sac length
    G = momepy.cds_length(G, radius=3, name="ldsCDL", verbose=False)
    
    # Square clustering
    G = momepy.clustering(G, name="xcnSCl")
    
    # Mean node distance
    G = momepy.mean_node_dist(G, name="mtdMDi", verbose=False)
    
    # Convert networkx.Graph back to GeoDataFrames and W (denoting relationships between nodes)
    nodes, edges, sw = momepy.nx_to_gdf(G, spatial_weights=True)
    
    # Generate inclusive higher order weights
    edges_w3 = momepy.sw_high(k=3, gdf=edges)
    
    # Mean segment length
    edges["ldsMSL"] = momepy.SegmentsLength(edges, spatial_weights=edges_w3, mean=True, verbose=False).series
    
    # Generate inclusive higher order weights
    nodes_w5 = momepy.sw_high(k=5, weights=sw)
    
    # Node density
    nodes["lddNDe"] = momepy.NodeDensity(nodes, edges, nodes_w5, verbose=False).series
    
    # Weighted node density
    nodes["linWID"] = momepy.NodeDensity(nodes, edges, nodes_w5, weighted=True, node_degree="degree", verbose=False).series
    
    # Save to parquets
    edges.to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/edges/edges_{chunk_id}.pq")
    nodes.to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/nodes/nodes_{chunk_id}.pq")


    return f"Chunk {chunk_id} processed sucessfully."

Again we use dask to iterate over all 103 chunks. The following script sends the first 8 chunks to dask together and then submits a new chunk as soon as any of the previous ones finishes. That way we process only 8 chunks at once, ensuring that the cluster will not run out of memory.

inputs = iter(range(103))
futures = [client.submit(measure, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
    # submit new future 
    try:
        new_future = client.submit(measure, next(inputs))
        ac.add(new_future)
    except StopIteration:
        pass
    print(finished_future.result())
client.close()

Inter-element characters

The remaining morphometric characters are based on relations between multiple elements. The implementation mirrors the approach above.

workers = 8
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client

def measure(chunk_id):
    s = time()
    # Load data
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    edges = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/edges/edges_{chunk_id}.pq")
    nodes = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/nodes/nodes_{chunk_id}.pq")
    
    # Street Alignment
    edges['orient'] = momepy.Orientation(edges, verbose=False).series
    edges['edgeID'] = range(len(edges))
    keys = cells.edgeID_values.apply(lambda a: np.argmax(a))
    cells['edgeID_primary'] = [inds[i] for inds, i in zip(cells.edgeID_keys, keys)]
    cells['stbSAl'] = momepy.StreetAlignment(cells, 
                                             edges, 
                                             'stbOri', 
                                             left_network_id='edgeID_primary', 
                                             right_network_id='edgeID').series
   
    # Area Covered by each edge
    vals = {x:[] for x in range(len(edges))}
    for i, keys in enumerate(cells.edgeID_keys):
        for k in keys:
            vals[k].append(i)
    area_sums = []
    for inds in vals.values():
        area_sums.append(cells.sdcAre.iloc[inds].sum())
    edges['sdsAre'] = area_sums
    
    # Building per meter
    bpm = []
    for inds, l in zip(vals.values(), edges.sdsLen):
        bpm.append(cells.buildings.iloc[inds].notna().sum() / l if len(inds) > 0 else 0)
    edges['sisBpM'] = bpm
    
    # Cell area
    nodes['sddAre'] = nodes.nodeID.apply(lambda nid: cells[cells.nodeID == nid].sdcAre.sum())
    
    # Area covered by neighboring edges + count of reached cells
    edges_W = Queen.from_dataframe(edges)
    
    areas = []
    reached_cells = []
    for i in range(len(edges)):
        neighbors = [i] + edges_W.neighbors[i]
        # areas
        areas.append(edges.sdsAre.iloc[neighbors].sum())
        # reached cells
        ids = []
        for n in neighbors:
            ids += vals[n]
        reached_cells.append(len(set(ids)))

    edges['misCel'] = reached_cells
    edges['mdsAre'] = areas
    
    # Area covered by neighboring (3 steps) edges + count of reached cells
    edges_W3 = momepy.sw_high(k=3, weights=edges_W)
    
    areas = []
    reached_cells = []
    for i in range(len(edges)):
        neighbors = [i] + edges_W3.neighbors[i]
        # areas
        areas.append(edges.sdsAre.iloc[neighbors].sum())
        # reached cells
        ids = []
        for n in neighbors:
            ids += vals[n]
        reached_cells.append(len(set(ids)))

    edges['lisCel'] = reached_cells
    edges['ldsAre'] = areas

    # Link together 
    e_to_link = ['sdsAre', 'sisBpM', 'misCel', 'mdsAre', 'lisCel', 'ldsAre']
    n_to_link = 'sddAre'

    cells = cells.merge(nodes[['nodeID', 'sddAre']], on='nodeID', how='left')

    l = []
    for keys, values in zip(cells.edgeID_keys, cells.edgeID_values):
        l.append((edges.iloc[keys][e_to_link].multiply(values, axis='rows')).sum(axis=0))  # weighted by the proportion
    cells[e_to_link] = pd.DataFrame(l, index=cells.index)
    
    # Reached neighbors and area on 3 topological steps on tessellation
    cells['keep'] = True
    
    # add neighbouring cells from other chunks
    cross_chunk_cells = []
    
    for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
        add_cells['keep'] = False
        cross_chunk_cells.append(add_cells)
    
    df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)
    w3 = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w3_{chunk_id}.npz")).to_W()
    
    # Reached cells in 3 topological steps
    df['ltcRea'] = [w3.cardinalities[i] for i in range(len(df))]
    
    # Reached area in 3 topological steps
    df['ltcAre'] = [df.sdcAre.iloc[w3.neighbors[i]].sum() for i in range(len(df))]
    
    # Save
    df[df['keep']].drop(columns=['keep']).to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    
    return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
%%time
inputs = iter(range(103))
futures = [client.submit(measure, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
    # submit new future 
    try:
        new_future = client.submit(measure, next(inputs))
        ac.add(new_future)
    except StopIteration:
        pass
    print(finished_future.result())
client.close()

At this point, all primary morphometric characters are measured and stored in a chunked parquet.
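For downstream steps, the chunked outputs can be combined lazily with a glob, e.g. (a minimal sketch reusing the paths above):

import dask.dataframe as dd

# lazily combine all 103 per-chunk files of primary characters
cells = dd.read_parquet("../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_*.pq")
cells.head()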

Convolution

Morphometric variables are an input to the cluster analysis, which should result in the delineation of spatial signatures. However, primary morphometric characters can't be used directly; we have to understand them in context. For that reason, we introduce a convolution step. Each of the characters above will be expressed as a distance-decay-weighted first, second (median) and third quartile within 10 topological steps on enclosed tessellation. Distance is measured between the centres of the maximum inscribed circles of each geometry. The resulting convolved data will then be used as an input to the cluster analysis.
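Each weighted quartile is interpolated from the weighted empirical distribution of values in the vicinity, with weights equal to the inverse distance. A self-contained sketch of the computation used in convolute below, on made-up values:

import numpy as np

values = np.array([4.0, 1.0, 3.0, 2.0])        # character values in the vicinity (hypothetical)
distance = np.array([10.0, 20.0, 40.0, 80.0])  # distances between inscribed-circle centres (hypothetical)
decay = 1 / distance                           # inverse-distance decay weights

sorter = np.argsort(values)
values_sorted = values[sorter]
weights_sorted = decay[sorter]

# weighted empirical quantiles, then interpolate Q1, median and Q3
quantiles = np.cumsum(weights_sorted) - 0.5 * weights_sorted
quantiles /= weights_sorted.sum()
q1, q2, q3 = np.interp([.25, .5, .75], quantiles, values_sorted)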

Generate weights of 10th order

cross_chunk = pd.read_parquet('../../urbangrammar_samba/spatial_signatures/cross-chunk_indices_10.pq')

def generate_w(chunk_id):
    s = time()
    # load cells of a chunk
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    
    # add neighbouring cells from other chunks
    cross_chunk_cells = []
    
    for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
        cross_chunk_cells.append(add_cells)
    
    df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)

    w = libpysal.weights.Queen.from_dataframe(df, geom_col='tessellation', silence_warnings=True)
    w10 = momepy.sw_high(k=10, weights=w)
    
    scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_queen_{chunk_id}.npz", w.sparse)
    scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_10_{chunk_id}.npz", w10.sparse)
    
    return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
# I am afraid that we would run out of memory if we did this in parallel
for i in tqdm(range(103), total=103):
    print(generate_w(i))

Generate distance weights within 10th order

def generate_distance_w(chunk_id):
    s = time()
    # load cells of a chunk
    points = geopandas.read_parquet(f'../../urbangrammar_samba/spatial_signatures/inscribed_circle/circle_{chunk_id}.pq')
    
    # add neighbouring cells from other chunks
    cross_chunk_cells = []
    
    for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/inscribed_circle/circle_{chunk}.pq").iloc[inds]
        cross_chunk_cells.append(add_cells)
    
    df = points.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)

    w = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_10_{chunk_id}.npz")).to_W()

    for i, (radius, geom) in enumerate(points[['radius', 'geometry']].itertuples(index=False)):
        neighbours = w.neighbors[i]
        vicinity = df.iloc[neighbours]
        distance = vicinity.distance(geom).to_list()
       
        distance.append(radius)
        w.neighbors[i].append(i)
        w.weights[i] = distance
    
    scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_10_distance_circles_{chunk_id}.npz", w.sparse)
    
    return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
# I am afraid that we would run out of memory if we did this in parallel
for i in tqdm(range(103), total=103):
    print(generate_distance_w(i))

Measure distance-decay weighted convolutions

characters = ['sdbAre', 'sdbPer', 'sdbCoA', 'ssbCCo', 'ssbCor', 'ssbSqu', 'ssbERI', 'ssbElo', 
             'ssbCCM', 'ssbCCD', 'stbOri', 'sdcLAL', 'sdcAre', 'sscCCo', 'sscERI', 'stcOri', 
             'sicCAR', 'stbCeA', 'mtbAli', 'mtbNDi', 'mtcWNe', 'mdcAre', 'ltcWRE', 'ltbIBD', 
             'sdsSPW', 'sdsSWD', 'sdsSPO', 'sdsLen', 'sssLin', 'ldsMSL', 'mtdDeg', 'lcdMes', 
             'linP3W', 'linP4W', 'linPDE', 'lcnClo', 'ldsCDL', 'xcnSCl', 'mtdMDi', 'lddNDe', 
             'linWID', 'stbSAl', 'sddAre', 'sdsAre', 'sisBpM', 'misCel', 'mdsAre', 'lisCel', 
             'ldsAre', 'ltcRea', 'ltcAre', 'ldeAre', 'ldePer', 'lseCCo', 'lseERI', 'lseCWA', 
             'lteOri', 'lteWNB', 'lieWCe'
            ]

def convolute(chunk_id):
   
    s = time()
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    cells['keep'] = True
    # add neighbouring cells from other chunks
    cross_chunk_cells = []

    for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
        add_cells['keep'] = False
        cross_chunk_cells.append(add_cells)

    df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)

    # read W
    W = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_10_distance_circles_{chunk_id}.npz")).to_W()
 
    # prepare dictionary to store results
    convolutions = {}
    for c in characters:
        convolutions[c] = []
        
    # measure convolutions
    for i in range(len(df)):
        neighbours = W.neighbors[i]
        vicinity = df.iloc[neighbours]
        distance = W.weights[i]
        distance_decay = 1 / np.array(distance)
        
        for c in characters:
            values = vicinity[c].values
            sorter = np.argsort(values)
            values = values[sorter]
            nan_mask = np.isnan(values)
            if nan_mask.all():
                convolutions[c].append(np.array([np.nan] * 3))
            else:
                sample_weight = distance_decay[sorter][~nan_mask]
                weighted_quantiles = np.cumsum(sample_weight) - 0.5 * sample_weight
                weighted_quantiles /= np.sum(sample_weight)
                interpolate = np.interp([.25, .5, .75], weighted_quantiles, values[~nan_mask])
                convolutions[c].append(interpolate)
    
    # save convolutions to parquet file
    conv = pd.DataFrame(convolutions, index=df.index)
    exploded = pd.concat([pd.DataFrame(conv[c].to_list(), columns=[c + '_q1', c + '_q2',c + '_q3']) for c in characters], axis=1)
    exploded[df.keep].to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/convolutions/conv_{chunk_id}.pq")
        
    return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
# I am afraid that we would run out of memory if we did this in parallel
for i in tqdm(range(103), total=103):
    print(convolute(i))