Analysis scripts and tools for the register of buildings (RBD) completion

swiss-territorial-data-lab/regbl-poc-analysis

Overview

This repository is related to the RegBL (RBD/GWR/RegBL) completion research project. The STDL was contacted by the Swiss Federal Statistical Office (OFS) to determine to what extent the construction dates of Swiss buildings could be completed by analysing a temporal sequence of the Swiss federal maps produced by swisstopo. With an initial target of 80% of correct guesses, the goal of this research project was to demonstrate that this target can be reached, using a reliable validation metric.

This repository holds the developed scripts used to analyse the results of the detection made through the research project primary processing pipeline.

Research Project Links

The following links give access to the codes related to the project :

The following links give access to official documentations on the considered data :

regbl-poc-analysis

This repository stores the scripts that work on the results of the primary pipeline to compute plots and to automate the computation of representations. The plots are mainly created through octave/MATLAB code, while automation is mainly achieved through bash scripts.

In the following documentation, the main storage path always refers to the main storage directory of the primary pipeline, used to gather the processing configuration and to export results. In addition, all the scripts can only be used on a main storage directory that has already been fully processed.

Plots Computation

Three main plots are available for computation. All the computed plots are automatically stored in the main storage path, in the analysis directory. The first one displays the distribution of correct guesses along the time dimension. The usage is (octave prompt) :

> regbl_poc_analysis_metric( '.../regbl_process', '.../path/to/metric/file',
                             'Location Name', histogram_beam_size );

The first parameter simply gives the path of the primary pipeline storage directory. The second parameter has to provide the path to a text file containing the EGID of all the buildings used as a reference to compute the histogram. This file has to list valid building EGID for which a construction date is available and sufficiently reliable to be considered as ground truth. An example of the content of such a file could be :

11102923
11114701
11114710
11114957
...

for the Biasca area. It is the responsibility of the researcher to compose such a validation list.
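Before passing a validation list to the scripts, it can be worth checking that it is well formed. The following is only a minimal sketch, assuming that the file holds one EGID per line and that an EGID is a plain sequence of digits, as in the example above. The function name `check_egid_list` is hypothetical and is not part of this repository.

```shell
# Hypothetical helper, not part of regbl-poc-analysis.
# Assumption : one EGID per line, each made of digits only.
check_egid_list() {
    local list_file="$1"

    # Refuse unreadable files explicitly (return code 2).
    [ -r "$list_file" ] || return 2

    # grep -v -x -E keeps the lines that do NOT entirely match digits only,
    # and -n prefixes each of them with its line number.
    if grep -n -v -x -E '[0-9]+' "$list_file"; then
        return 1    # at least one malformed line was printed
    fi
    return 0        # every line looks like an EGID
}
```

Calling `check_egid_list .../path/to/metric/file` prints the malformed lines, if any, and returns a non-zero code in that case.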

The location name provided as third parameter only gives the name of the studied area and is only used for the title of the plot. The last parameter has to be a strictly positive number giving the size of the bins, in years, used for the histogram. A default value of 10 years is used if the parameter is missing. The left image below gives an example of such a plot.

The blue portion corresponds to the proportion of correct guesses, while the red one completes it to one, thus giving the proportion of incorrect guesses. The line at 0.8 is the desired target, while the line with two numbers gives the overall success and failure rates over the whole set of buildings.
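To make the binning concrete, the sketch below computes a proportion of correct guesses per bin. It is not the code of `regbl_poc_analysis_metric`. It assumes a hypothetical two-column text file giving, for each building, the reference construction year and a flag set to 1 for a correct guess and 0 otherwise. Both this input format and the function name `success_by_bin` are assumptions made for illustration.

```shell
# Hypothetical sketch, not the method used by regbl_poc_analysis_metric.
# Assumed input : one building per line, "<reference year> <1 if correct, 0 if not>".
success_by_bin() {
    local results_file="$1"
    local bin_size="${2:-10}"    # 10 years by default, as for the plot

    LC_ALL=C awk -v bin="$bin_size" '
        {
            start = int($1 / bin) * bin    # lower edge of the bin holding this year
            total[start]++
            correct[start] += $2
        }
        END {
            # One line per bin : lower edge and proportion of correct guesses.
            for (start in total)
                printf "%d %.2f\n", start, correct[start] / total[start]
        }
    ' "$results_file" | sort -n
}
```

Each output line gives the lower edge of a bin and the proportion that the blue portion would represent for it. The red portion is one minus this value.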

The second plot available through the scripts is used to compute a distance-based representation of the results. It shows the distribution of errors in the attribution of a construction date to the buildings (octave prompt) :

> regbl_poc_analysis_distance( '.../regbl_process', '.../path/to/metric/file',
                               'Location Name' );

The parameters are the same as the first three parameters of the previous script. The right image below gives an example of the obtained plot.


Example of the histogram (left) and distance (right) plots - Biasca

On this plot, two distributions are shown : the blue distribution only shows buildings with a detection date within the range covered by the maps. The red distribution shows the distribution of the full set of buildings, assigning a zero-valued distance beyond the first and last maps. The grey zones indicate the mean temporal separation of the maps around zero for the first (dark) and twice this separation for the second (light). The numbers shown in the zones indicate which proportion of the buildings fall within these ranges in terms of error on the construction date.

The last available script computes a graphical representation of the zones where correct and incorrect guesses are concentrated (octave prompt) :

> regbl_poc_analysis_area( '.../regbl_process', '.../path/to/metric/file',
                           image_width, kernel_factor = 16 );

The first two parameters are the same as before. The last two parameters give the size of the images to compute and the reduction factor used to determine the size of the spreading kernel according to the image size. By default, a factor of 16 is used, meaning that the kernel will be 16 times smaller than the image. A smaller factor leads to a smoother representation while a greater one leads to more resolution, but requires more entries in the considered metric.

The output of this script is two transparent images, one for the rate of correct guesses and one for the rate of incorrect guesses, that can be used as overlays on each of the considered maps of the temporal sequence. The following images give an illustration of the composition of the transparent images with maps :
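The relation between the image size and the kernel size can be illustrated with a few lines of arithmetic. This is only a sketch : the function name `kernel_size` is hypothetical, and the use of an integer division is an assumption, as the exact rounding applied by the script is not documented here.

```shell
# Hypothetical illustration of "kernel = image size / factor".
# Assumption : integer division; the actual script may round differently.
kernel_size() {
    local image_width="$1"
    local kernel_factor="${2:-16}"    # 16 by default, as for the script

    echo $(( image_width / kernel_factor ))
}
```

Under these assumptions, `kernel_size 1024` gives a 64-pixel kernel, while `kernel_size 1024 8` gives a 128-pixel one : a smaller factor means a larger kernel and therefore a smoother representation.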


Example of the two transparent rate representations on top of a corresponding map - Bern, 2010

As the script only exports the transparent overlays, additional work has to be done to superimpose them on a chosen map.
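One possible way to superimpose an overlay on a map is to use ImageMagick, which is already part of the dependencies. The following is a minimal sketch and not a script of this repository : the function name `overlay_on_map` and the file names are hypothetical, and it assumes that the overlay covers exactly the same extent as the map, so that resizing it to the map size is enough to align them.

```shell
# Hypothetical helper, not part of regbl-poc-analysis.
# Assumption : overlay and map cover the same extent, only their pixel sizes may differ.
overlay_on_map() {
    local map_image="$1"
    local overlay_image="$2"
    local output_image="$3"
    local map_size

    # Read the map size as WIDTHxHEIGHT.
    map_size=$(identify -format '%wx%h' "$map_image") || return 1

    # Force the overlay to the map size ("!" ignores the aspect ratio), then
    # draw it over the map, honouring the transparency of the overlay.
    convert "$map_image" \( "$overlay_image" -resize "${map_size}"'!' \) \
            -composite "$output_image"
}
```

A call would then look like `overlay_on_map map.png overlay.png composed.png`, with file names to adapt to the considered processing directory.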

Automation Scripts

Three automation scripts are available to ease the computation of result representations. The first one is used to compute images combining detections and maps. As the primary pipeline creates transparent overlays showing the detection or absence of each building for each of the considered maps, they can be superimposed on the maps themselves and on their segmentation (bash prompt) :

$ ./regbl-poc-analysis-overlay .../regbl_process

The only parameter is a path pointing to the desired processing directory. The results of the composition of the overlays with the maps and segmented maps are exported in the overlay sub-directory of the analysis directory. The following image gives a crop of a resulting composition :


Detection overlay superimposed with its map (left) and its segmented counterpart (right) - Basel, 2000

The last two automation scripts are used to simplify the computation of timelines of specific buildings. Both use the primary pipeline software to compute the timeline of the selected buildings. The usage is the following (bash prompt) :

$ ./regbl-poc-analysis-timeline-list .../main/storage/path .../path/to/list/file .../path/of/tracker

and :

$ ./regbl-poc-analysis-timeline-random .../main/storage/path count .../path/of/tracker

In both cases, the first parameter is the main storage directory on which to apply the computation. For the list script, the second parameter has to be a path pointing to a file containing a list of valid building EGID for which a timeline has to be computed. For the random script, the list file is replaced by a whole number giving the number of buildings to select randomly and for which a timeline is extracted. The last parameter is identical for both scripts and has to give the path of the tracker software (executable file) of the primary pipeline used to compute the timelines.

The following image gives an example of such a timeline (see primary pipeline documentation) :


Example of timeline for a specific building of Basel city

Both scripts automatically export the timelines to the analysis directory of the main storage directory. The list script uses the name of the list file to create the directory in which timelines are exported. The random script always exports its timelines to the timeline-random directory.
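When a reproducible selection is preferred over the random script, a list file can be drawn once from a larger EGID list and then reused with the list script. The sketch below relies on `shuf` from GNU coreutils. The function name `sample_egid_list` and the file names are hypothetical, and the source file is assumed to hold one EGID per line.

```shell
# Hypothetical helper, not part of regbl-poc-analysis.
# Assumption : the source file holds one EGID per line.
sample_egid_list() {
    local source_list="$1"
    local count="$2"
    local output_list="$3"

    # shuf -n picks "count" distinct lines at random from the source file.
    shuf -n "$count" "$source_list" > "$output_list"
}
```

The resulting file can then be given as second parameter to `regbl-poc-analysis-timeline-list`, so that the same buildings are used from one run to the next.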

Results Exportation

A script is available to translate the detector output files into a single DSV file. The usage follows (octave prompt) :

> regbl_poc_analysis_todsv( '.../regbl_process', '.../path/to/output.dsv' );

The first parameter gives the path of the primary pipeline storage directory while the second gives the path of the DSV file to create using the detector data. The tab character is used as a separator. The header of the created DSV file is :

EGID    GBAUJ_STDL_LOW    GBAUJ_STDL_HIGH

which gives the building EGID and the lower and upper boundaries of the detected construction date range, expressed as dates. If the lower boundary is missing, the building was detected as older than the oldest map. Conversely, if the upper boundary is missing, the building was detected as more recent than the most recent map.
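The exported file can be read with standard tools. The following sketch counts the buildings in each of the three situations described above. It assumes that a missing boundary appears as an empty field, which is not confirmed by this documentation, and the function name `summarise_dsv` is hypothetical.

```shell
# Hypothetical helper, not part of regbl-poc-analysis.
# Assumption : a missing boundary is stored as an empty tab-separated field.
summarise_dsv() {
    local dsv_file="$1"

    LC_ALL=C awk -F '\t' '
        NR == 1 { next }                            # skip the header line
        $2 == "" && $3 != "" { older++ ; next }     # no lower boundary
        $2 != "" && $3 == "" { newer++ ; next }     # no upper boundary
        $2 != "" && $3 != "" { bounded++ }          # both boundaries available
        END {
            printf "bounded=%d older_than_oldest_map=%d newer_than_newest_map=%d\n",
                   bounded, older, newer
        }
    ' "$dsv_file"
}
```

If the detector encodes missing boundaries differently, the two empty-string tests have to be adapted accordingly.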

Copyright and License

regbl-poc-analysis - Nils Hamel, Huriel Richel
Copyright (c) 2020-2021 Republic and Canton of Geneva

This program is licensed under the terms of the GNU GPLv3. Documentation and illustrations are licensed under the terms of the CC BY 4.0.

Dependencies

regbl-poc-analysis depends on the following packages (Ubuntu 20.04 LTS) (Instructions) :

  • bash
  • imagemagick
  • octave
  • octave-image

The primary pipeline is also needed for some automation scripts, but without requiring it to be installed on the system.