Urban Grammar AI research project

We know little about how the way we organise cities over space influences social, economic and environmental outcomes, in part because it is hard to measure.

Satellite imagery, combined with cutting-edge AI, can provide a source of data to track the evolution of the built environment at unprecedented detail.

This project develops a conceptual framework to characterise urban structure through the notions of spatial signatures and urban grammar, and will deploy it to generate open data products and insight about the evolution of cities.

GDSL University of Liverpool ATI


  • 04 June - Google Summer of Code

    Members of the Urban Grammar project are getting involved in developing the next generation of tools for distributed processing of geospatial vector data. In its first part, the Urban Grammar project depends heavily on the processing of vector geospatial data using the GeoPandas Python library. However, to scale GeoPandas algorithms to the extent of Great Britain, we need to do more than the library can do by default. GeoPandas operations are currently all single-threaded, which severely limits scalability and leaves most CPU cores idle. Dask is a library that brings parallel and distributed computing to the Python ecosystem. For example, it provides a Dask DataFrame that consists of partitioned pandas DataFrames. Each partition can be processed by a different process, enabling the computation to be done in parallel or even out-of-core.

    We are using Dask within our workflows in bespoke scripts. However, Dask could provide ways to scale geospatial operations in GeoPandas in much the same way as it does with pandas. There has been some effort to build a bridge between Dask and GeoPandas, currently taking the shape of the dask-geopandas library. While that already supports basic parallelisation, which we used in our code, some critical components are not ready yet. That should change this summer within the Google Summer of Code project Martin is (co-)mentoring. We hope that this effort will allow us to significantly simplify, and even speed up, the custom machinery we built to create spatial signatures in WP2.
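    To make the partitioning idea above concrete, here is a minimal sketch using only the Python standard library. It is not Dask or dask-geopandas code: the helper names (`partition`, `total_area`) and the toy rectangle data are hypothetical, and a thread pool stands in for the separate worker processes a real Dask cluster could use.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_partitions):
    """Split a list of rows into contiguous chunks of roughly equal size."""
    size, extra = divmod(len(rows), n_partitions)
    chunks, start = [], 0
    for i in range(n_partitions):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks


def total_area(chunk):
    """Per-partition work: sum the area of (width, height) rectangles."""
    return sum(width * height for width, height in chunk)


# Hypothetical data: 1,000 rectangles standing in for building footprints.
footprints = [(1 + i % 5, 2 + i % 3) for i in range(1000)]

# Each chunk is handled independently, then the partial results are combined.
chunks = partition(footprints, n_partitions=4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(total_area, chunks))

parallel_total = sum(partial_sums)
serial_total = total_area(footprints)
```

    The point of the sketch is only the shape of the computation: because each partition is processed on its own, the per-partition step can be handed to any available worker, and only the small partial results need to be combined at the end.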

  • 29 April - Second Advisory Board

    On April 15th, 2021, we held the second meeting of the Advisory Board for the project. We are delighted that all board members joined us on Zoom for a few hours of exciting discussions on the progress and the future of the project.

    Dani started with an overview of our progress since the last meeting, which you can check in his UBDC talk. We followed with a focused discussion of the concepts of Spatial Signature and Enclosed Tessellation and of our initial paper illustrating both on a sample of cities worldwide. We discussed the clarity of our ideas, the need for new spatial units and classification methods, and their potential drawbacks and enhancements. In the last part, we zoomed out to see the bigger picture and position the research among existing projects in academia and the public sector.

    After three hours of a very fruitful discussion, we finished with a lot of food for thought and ideas to be explored in the future. Let’s just hope that the Advisory Board meetings will soon happen physically in Liverpool to have an even more productive and friendly environment!

  • 27 April - Clustergram: visualisation of cluster analysis

    In this post, we introduce a new Python package to generate clustergrams from clustering solutions. The library has been developed as part of the Urban Grammar project, and it is compatible with scikit-learn and GPU-enabled libraries such as cuML or cuDF within RAPIDS.AI.

    When we want to do some cluster analysis to identify groups in our data, we often use algorithms like K-Means, which require the specification of a number of clusters. But the issue is that we usually don’t know how many clusters there are.

    There are many methods for determining the correct number, such as silhouette scores or the elbow plot, to name a few. But they usually don’t give much insight into what is happening between different options, so the numbers are a bit abstract.

    Matthias Schonlau proposed another approach - a clustergram. A clustergram is a two-dimensional plot capturing the flows of observations between classes as you add more clusters. It tells you how your data reshuffles and how good your splits are. Tal Galili later implemented the clustergram for K-Means in R. I have taken Tal’s implementation, ported it to Python and created clustergram - a Python package to make clustergrams.

    clustergram currently supports K-Means using scikit-learn (including the Mini-Batch implementation) and RAPIDS.AI cuML (if you have a CUDA-enabled GPU), Gaussian Mixture Models (scikit-learn only) and hierarchical clustering based on scipy.cluster.hierarchy. Alternatively, we can create a clustergram based on labels and data derived from custom clustering algorithms. It provides a sklearn-like API and plots the clustergram using matplotlib, which gives it a wide range of styling options to match your publication style.
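    The "flows of observations between classes" that a clustergram draws can be illustrated without any plotting. The sketch below does not use the clustergram package: it runs a tiny, deterministic one-dimensional K-Means (the helpers `kmeans_1d` and `flows` and the nine sample values are all hypothetical) for k = 1, 2, 3 and counts how many observations move from each cluster at one k to each cluster at the next.

```python
from collections import Counter


def kmeans_1d(values, k, n_iter=50):
    """Lloyd's algorithm on 1-D data with a deterministic quantile start."""
    ordered = sorted(values)
    # Start the centres at evenly spaced quantiles so results are repeatable.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels, centres


def flows(labels_a, labels_b):
    """Count observations moving from each cluster in A to each cluster in B."""
    return Counter(zip(labels_a, labels_b))


# Hypothetical sample: three well-separated groups of three observations.
values = [0.8, 1.0, 1.2, 3.9, 4.0, 4.1, 9.8, 10.0, 10.2]

solutions = {k: kmeans_1d(values, k) for k in range(1, 4)}
flows_1_to_2 = flows(solutions[1][0], solutions[2][0])
flows_2_to_3 = flows(solutions[2][0], solutions[3][0])
```

    In this toy case, going from two to three clusters splits the first cluster into two equal halves while the second stays intact. A clustergram shows exactly this kind of information as lines of varying width between cluster means.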

  • 26 April - Spatial Signatures at the Spatial Analytics + Data Seminar Series

    On March 30th, Martin presented the current progress in the development of Spatial Signatures at the Spatial Analytics + Data Seminar Series organised by the University of Newcastle, the University of Bristol and the Alan Turing Institute. Martin presented the basics of urban morphometrics, showing examples of relevant research based on the momepy Python package to provide a background for the second part of the talk focusing on Spatial Signatures. The research was well received and initiated a great discussion, which we hope will continue on some other platforms soon.

    The recording is available on YouTube:

    The slides used are here or, if you prefer it, here is the PDF file (27 MB).

  • 08 April - Spatial Signatures at UBDC

    On March 16th, Dani presented ongoing work on the development of Spatial Signatures at the University of Glasgow’s Urban Big Data Centre webinar series. This was the first time we took the project on tour and it was very well received. With an audience that peaked at about 75 folks and many great questions that extended the session well over one hour, we are super happy with how the foundational ideas of the project were received.

    The link to the webinar, for posterity, is here and you can have a look at an edited version of the video at:

    The slides used are here or, if you prefer it, here is the PDF file (15 MB).

  • 17 February - Visual style and a graphics package

    Any work of the size of our Urban Grammar AI project has many outputs. All of them should ideally share the same design language, so once we combine them, they tell a coherent story. Therefore, we have defined a visual style to be applied to every graphical output we produce.

    We have started with a basic colourmap. A significant part of our work will result in categorical maps, so we need diverse colours. We have looked back at the excellent cartography our predecessors produced and found a gem. A study of wage and nationality in Chicago by Jane Addams and Florence Kelley from 1895 resulted in a series of beautiful maps like this one:

    Hull House Maps and Papers (1895) by Jane Addams and Florence Kelley.

    We based our primary colours on the six colours you can see on this map. This colour map offers variety while retaining readability; the colours play nicely with each other and, importantly, are colour-blind safe.

    Primary colours derived from Addams and Kelley.

  • 16 February - Contribution to tobler: Speeding up areal interpolation

    Have you ever needed to link two sources of data, each attached to a different geometry? In our work in WP2, we do. We have to transfer data from various sources, linked to output areas, urban blocks or other spatial units, to our own bespoke set of geographies. Therefore, we often need to do areal interpolation to correctly map data from one layer to another. Luckily, the open-source Python ecosystem can help.

    Tobler, part of the PySAL family, is a library for areal interpolation and dasymetric mapping which already offered what we needed. However, our data tends to be large, up to 15 million rows onto which we need to interpolate several hundred thousand rows of input data. That can take a while, so each performance improvement can help a lot.

    We have looked into the existing code of tobler and contributed to the refactoring of its area_interpolate function. The original implementation was using custom code for spatial indexing, which was replaced by a performant vectorised implementation based on the more recent pygeos project.

    With sample data, we’ve been able to speed up the interpolation from 2.4 seconds to less than 400 milliseconds, obtaining the same result 6x faster.

    Such an improvement is great, but it still uses only a single core (as does most geospatial code, to be honest), leaving the rest of the cores in a modern computer (four or more is not uncommon these days…) lying idle. We have tried to change this and contributed a (still experimental) parallel implementation of the same algorithm (based on joblib).
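    For readers new to areal interpolation, the core arithmetic for an extensive variable (a count such as population) is to split each source value among the target zones in proportion to the area of overlap. The sketch below is a brute-force, standard-library illustration of that idea only. It is not tobler's implementation: it assumes axis-aligned boxes given as (xmin, ymin, xmax, ymax) instead of real polygons, uses no spatial index, and all function names and numbers are hypothetical.

```python
def box_area(box):
    """Area of an axis-aligned box (xmin, ymin, xmax, ymax)."""
    return (box[2] - box[0]) * (box[3] - box[1])


def overlap_area(a, b):
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def interpolate_extensive(sources, targets):
    """Area-weighted transfer of an extensive variable.

    sources: list of (box, value) pairs; targets: list of boxes.
    Returns one estimate per target, in the same order as `targets`.
    """
    estimates = [0.0] * len(targets)
    for box, value in sources:
        source_area = box_area(box)
        for j, target in enumerate(targets):
            # Share of the source value proportional to the overlapping area.
            estimates[j] += value * overlap_area(box, target) / source_area
    return estimates


# Hypothetical example: two source zones with populations 100 and 40,
# re-aggregated onto three target zones that cut across them.
sources = [((0, 0, 2, 2), 100), ((2, 0, 4, 2), 40)]
targets = [(0, 0, 1, 2), (1, 0, 3, 2), (3, 0, 4, 2)]

estimates = interpolate_extensive(sources, targets)
```

    Here the three targets receive 50, 70 and 20 respectively, so the total of 140 is preserved. The all-pairs loop is exactly the part that a spatial index and vectorised geometry operations make fast on real data.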

  • 23 November - 30DayMapChallenge 2020

    On November 1st, Topi Tjukanov started the #30DayMapChallenge 2020 - one day, one map, one theme. Because it is a lot of fun, the Geographic Data Science Lab wanted to be a part of it and on the 23rd day, it was our turn.

    Since the topic was boundaries, we decided to share with you the process of creating the boundaries of morphological tessellation - the (smallest) spatial unit used in urban morphometrics.

    Five cities, five different urban patterns. Morphological tessellation is in principle a Voronoi tessellation based on building footprint polygons. In practice, we first shrink our polygons by a small margin (you need a gap between adjacent buildings), then generate a dense array of points along each polygon boundary, which is passed to the Voronoi algorithm. Finally, the resulting polygons are dissolved based on the building they belong to, and the morphological tessellation is done. See for yourself how each step looks and compares across different patterns in the matrix below. If you click on the image, you can see the full resolution (16.2 MB).

    Do you want to play with the algorithm and create your own sequences? We have a notebook just for you! And if you’re going to generate tessellation on your data, momepy has you covered. For further details head to Martin’s blog post about a paper on tessellation he published earlier this year.

    Stay tuned for new advances in this space!
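    If you would like to see the tessellation steps from this post in code before opening the notebook, here is a rough standard-library sketch. It is not momepy's implementation and it makes two loud simplifications: buildings are axis-aligned boxes (xmin, ymin, xmax, ymax), and the Voronoi and dissolve steps are replaced by a raster approximation in which every grid cell is assigned to the building owning the nearest boundary point. All names and parameter values are hypothetical.

```python
import math


def shrink(box, margin):
    """Step 1: shrink a box inwards so adjacent buildings get a gap."""
    xmin, ymin, xmax, ymax = box
    return (xmin + margin, ymin + margin, xmax - margin, ymax - margin)


def boundary_points(box, spacing):
    """Step 2: place points roughly every `spacing` units along the outline."""
    xmin, ymin, xmax, ymax = box
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    points = []
    for (x0, y0), (x1, y1) in zip(corners, corners[1:] + corners[:1]):
        steps = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / spacing))
        for s in range(steps):
            t = s / steps
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


def tessellate(buildings, extent, cell=0.5, margin=0.1, spacing=0.25):
    """Steps 3 and 4, approximated on a grid: nearest boundary point wins.

    Returns {(col, row): building_id} for a grid laid over `extent`.
    """
    seeds = [
        (x, y, building_id)
        for building_id, box in buildings.items()
        for x, y in boundary_points(shrink(box, margin), spacing)
    ]
    xmin, ymin, xmax, ymax = extent
    cols = round((xmax - xmin) / cell)
    rows = round((ymax - ymin) / cell)
    cells = {}
    for row in range(rows):
        for col in range(cols):
            # Use the centre of each grid cell as its representative point.
            cx = xmin + (col + 0.5) * cell
            cy = ymin + (row + 0.5) * cell
            nearest = min(
                seeds, key=lambda seed: (seed[0] - cx) ** 2 + (seed[1] - cy) ** 2
            )
            cells[(col, row)] = nearest[2]
    return cells


# Hypothetical example: two buildings side by side in a 10 x 4 study area.
buildings = {"A": (1, 1, 3, 3), "B": (5, 1, 9, 3)}
cells = tessellate(buildings, extent=(0, 0, 10, 4))
```

    With these two buildings, the boundary between the tessellation cells falls halfway across the gap between them, at x = 4. Grouping grid cells by building ID plays the role of the dissolve step in the vector workflow.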

  • 02 November - Implementing morphological functions to the ecosystem

    This blog post covers some of the efforts the team has made to contribute code developed for the project to the broader Python ecosystem for (geographic) data science. Processing of data within WP1 and morphometric assessment within WP2 entail the development of new bespoke algorithms and the implementation of some which are currently available in the Python ecosystem. However, even those already existing were often not performant enough for the scale of this project.

    As part of the data processing stage of the project, we have refactored some of them to gain the performance enhancements we needed. Since we strongly believe in replicability of research, all software developed within the Urban Grammar AI project should be available to other researchers, ideally packaged in the friendly shape of a Python library. At the same time, we want to support the open-source software which we use for the research.

    We think the natural approach is to include enhancements made within the area of urban morphometrics in momepy, an existing toolkit for urban morphology. WP2 builds heavily on momepy’s code and every relevant piece of code we made is now merged back into momepy. That covers both performance-focused changes to implementation (#219, #209, #207, #205), mostly based on pygeos and vectorization, and new additions.

    Two key concepts behind Spatial Signatures, enclosures and enclosed tessellation, are now available in the momepy.elements module, and you can create both using only a few lines of code.

    See the detailed guidance in momepy’s documentation.

  • 01 October - First Advisory Board

    On October 1st, 2020, Dani and Martin held the first meeting of the Advisory Board for the project. We are thrilled to have a board that includes Alistair Edwardes, Rachel Franklin, Isabel Sargent and Antonio Miguel Vieira Monteiro.

    The meeting took place, as has become customary in 2020, on Zoom. Dani provided an overview of the main components of the project, and Martin gave an update on progress so far; throughout the three hours of the meeting, there was plenty of discussion and great questions about what the project is trying to do and how it’ll tackle its main challenges. This is by no means a full replacement for the physical meeting we would have had in Liverpool for a full day, but it was an excellent way to connect and kickstart the role of the Advisory Board.

    One of the conclusions from the discussion was that we might adapt to the current situation by holding these meetings a bit more frequently (initially only four were scheduled for the entire project) and making them shorter than a full day (maybe up to three hours). This will allow us to focus on specific aspects of the project in every meeting. The next one will hopefully take place early next year and, by then, we might even have something in the form of deliverables to show!

  • 01 July - Welcome to Martin Fleischmann

    The project is thrilled to welcome Martin Fleischmann as the postdoctoral researcher who will work with me (Dani) for the next two years of the Fellowship. Here is a quick bio of Martin:

    Martin Fleischmann is a research associate in the Geographic Data Science Lab at the University of Liverpool and a member of the Urban Design Studies Unit at the University of Strathclyde. His research focuses on urban morphology and geographic data science, with an emphasis on quantitative analysis and classification of urban form, remote sensing and AI.

    He is the author of momepy, the open source urban morphology measuring toolkit for Python, and a member of the development teams of GeoPandas, the open source Python package for geographic data, and PySAL, the Python library for spatial analysis.

    Martin brings with him a theoretical background in urban morphology combined with a lot of experience in Python open source development around geospatial. There could not be a better combination for the project. As the postdoctoral researcher, Martin will be heavily involved in the implementation of much of the code required to develop the idea of Spatial Signatures, teach a computer to recognise them from satellite imagery, and use them to develop an Urban Grammar. At the same time, he is also joining the Geographic Data Science Lab and getting involved in its day to day life, participating in internal seminars, coordinating the Brown Bags series and, more generally, chipping in where possible to make the lab a great place to be part of.

    Welcome Martin, this will be a fun ride!

  • 15 April - Firing up the engines

    This is the first post going out from the Urban Grammar project. We will use this blog to keep track of progress on the project and to announce milestones we are reaching along the way. If you are interested in cities, satellites and AI, keep an eye on the blog and feel free to get in touch with either Dani or Martin!