From 840f211b6f9633b326509a2c383866517ba1ae1e Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 20 Jan 2025 23:50:37 +0000 Subject: [PATCH] style(pre-commit.ci): auto fixes from pre-commit hooks --- README.md | 4 ++-- detectree/utils.py | 3 +-- paper/paper.md | 12 ++++++------ 3 files changed, 9 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index 8be6177..63fb1b2 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ ## Overview -DetecTree is a Pythonic library to perform semantic segmentation of aerial imagery into tree/non-tree pixels, following the methods of Yang et al. \[1\]. A pre-trained model is available at [Hugging Face hub](https://huggingface.co/martibosch/detectree), which can be used as follows: +DetecTree is a Pythonic library to perform semantic segmentation of aerial imagery into tree/non-tree pixels, following the methods of Yang et al. [1]. A pre-trained model is available at [Hugging Face hub](https://huggingface.co/martibosch/detectree), which can be used as follows: ```python from urllib import request @@ -83,7 +83,7 @@ The target audience is researchers and practitioners in GIS that are interested Bosch M. 2020. “DetecTree: Tree detection from aerial imagery in Python”. *Journal of Open Source Software, 5(50), 2172.* [doi.org/10.21105/joss.02172](https://doi.org/10.21105/joss.02172) -Note that DetecTree is based on the methods of Yang et al. \[1\], therefore it seems fair to reference their work too. An example citation in an academic paper might read as follows: +Note that DetecTree is based on the methods of Yang et al. [1], therefore it seems fair to reference their work too. An example citation in an academic paper might read as follows: > The classification of tree pixels has been performed with the Python library DetecTree (Bosch, 2020), which is based on the approach of Yang et al. (2009). diff --git a/detectree/utils.py b/detectree/utils.py index bb8a109..de4812c 100644 --- a/detectree/utils.py +++ b/detectree/utils.py @@ -201,8 +201,7 @@ def get_img_filename_ser(split_df, img_cluster, train): ] except KeyError: raise ValueError( - "If `method` is 'cluster-II', `split_df` must have a " - "'img_cluster' column" + "If `method` is 'cluster-II', `split_df` must have a 'img_cluster' column" ) diff --git a/paper/paper.md b/paper/paper.md index d270b81..b1bd581 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -31,22 +31,22 @@ The aim of DetecTree is therefore to provide an open source library that perform DetecTree is based on the supervised learning approach of @yang2009tree, which requires an RGB aerial imagery dataset as the only input, and consists of the following steps: - **Step 0**: split of the dataset into image tiles. Since aerial imagery datasets often already come as a mosaic of image tiles, this step might not be necessary. In any case, DetecTree provides a `split_into_tiles` function that can be used to divide a large image into a mosaic of tiles of a specified dimension. -- **Step 1**: selection of the tiles to be used for training a classifier. As a supervised learning task, the ground-truth maps must be provided for some subset of the dataset. Since this part is likely to involve manual work, it is crucial that the training set has as few tiles as possible. At the same time, to enhance the classifier's ability to detect trees in the diverse scenes of the dataset, the training set should contain as many of the diverse geographic features as possible. 
Thus, in order to optimize the representativity of the training set, the training tiles are selected according to their GIST descriptor \[@oliva2001modeling\], *i.e.*, a vector describing the key semantics of the tile's scene. More precisely, *k*-means clustering is applied to the GIST descriptors of all the tiles, with the number of clusters *k* set to the number of tiles of the training set (by default, one percent of the tiles is used). Then, for each cluster, the tile whose GIST descriptor is closest to the cluster's centroid is added to the training set. In DetecTree, this is done by the `train_test_split` method of the `TrainingSelector` class. +- **Step 1**: selection of the tiles to be used for training a classifier. As a supervised learning task, the ground-truth maps must be provided for some subset of the dataset. Since this part is likely to involve manual work, it is crucial that the training set has as few tiles as possible. At the same time, to enhance the classifier's ability to detect trees in the diverse scenes of the dataset, the training set should contain as many of the diverse geographic features as possible. Thus, in order to optimize the representativity of the training set, the training tiles are selected according to their GIST descriptor [@oliva2001modeling], *i.e.*, a vector describing the key semantics of the tile's scene. More precisely, *k*-means clustering is applied to the GIST descriptors of all the tiles, with the number of clusters *k* set to the number of tiles of the training set (by default, one percent of the tiles is used). Then, for each cluster, the tile whose GIST descriptor is closest to the cluster's centroid is added to the training set. In DetecTree, this is done by the `train_test_split` method of the `TrainingSelector` class. - **Step 2**: provision of the ground truth tree/non-tree masks for the training tiles. For each tile of the training set, the ground-truth tree/non-tree masks must be provided to get the pixel-level responses that will be used to train the classifier. To that end, an image editing software such as GIMP or Adobe Photoshop might be used. Alternatively, if LIDAR data for the training tiles is available, it might also be exploited to create the ground truth masks. -- **Step 3**: train a binary pixel-level classifier. For each pixel of the training tiles, a vector of 27 features is computed, where 6, 18 and 3 features capture characteristics of color, texture and entropy respectively. A binary AdaBoost classifier \[@freund1995desicion\] is then trained by mapping the feature vector of each pixel to its class in the ground truth masks (*i.e.*, tree or non-tree). -- **Step 4**: tree detection in the testing tiles. Given a trained classifier, the `classify_img` and `classify_imgs` methods of the `Classifier` class can respectively be used to classify the tree pixels of a single image tile or of multiple image tiles at scale. For each image tile, the pixel-level classification is refined by means of a graph cuts algorithm \[@boykov2004experimental\] to avoid sparse pixels classified as trees by enforcing consistency between adjacent tree pixels. An example of an image tile, its pre-refinement pixel-level classification and the final refined result is displayed below: +- **Step 3**: train a binary pixel-level classifier. For each pixel of the training tiles, a vector of 27 features is computed, where 6, 18 and 3 features capture characteristics of color, texture and entropy respectively. 
A binary AdaBoost classifier [@freund1995desicion] is then trained by mapping the feature vector of each pixel to its class in the ground truth masks (*i.e.*, tree or non-tree). +- **Step 4**: tree detection in the testing tiles. Given a trained classifier, the `classify_img` and `classify_imgs` methods of the `Classifier` class can respectively be used to classify the tree pixels of a single image tile or of multiple image tiles at scale. For each image tile, the pixel-level classification is refined by means of a graph cuts algorithm [@boykov2004experimental] to avoid sparse pixels classified as trees by enforcing consistency between adjacent tree pixels. An example of an image tile, its pre-refinement pixel-level classification and the final refined result is displayed below: ![Example of an image tile (left), its pre-refinement pixel-level classification (center) and the final refined result (right).](figure.png) -Similar methods of tree classification from aerial imagery include the work of @jain2019efficient, who follow the train/test split method based on GIST descriptors as proposed by @yang2009tree but rely on the Mask R-CNN framework \[@he2017mask\] instead of the AdaBoost classifier. Another approach by @tianyang2018single employs a cascade neural network over texture and color features which detects single trees in a variety of forest images. Nonetheless, since the former approaches ultimately aim at single tree detection, the accuracy evaluation metrics that they provide are hard to compare with the pixel-level classification accuracy of DetecTree. The experiments performed by @yang2009tree in New York achieve a pixel classification accuracy of 91.7%, whereas the example applications of DetecTree in Zurich and Lausanne achieve accuracies of 85.98% and 91.75% respectively. +Similar methods of tree classification from aerial imagery include the work of @jain2019efficient, who follow the train/test split method based on GIST descriptors as proposed by @yang2009tree but rely on the Mask R-CNN framework [@he2017mask] instead of the AdaBoost classifier. Another approach by @tianyang2018single employs a cascade neural network over texture and color features which detects single trees in a variety of forest images. Nonetheless, since the former approaches ultimately aim at single tree detection, the accuracy evaluation metrics that they provide are hard to compare with the pixel-level classification accuracy of DetecTree. The experiments performed by @yang2009tree in New York achieve a pixel classification accuracy of 91.7%, whereas the example applications of DetecTree in Zurich and Lausanne achieve accuracies of 85.98% and 91.75% respectively. -The code of DetecTree is organized following an object-oriented approach, and relies on NumPy \[@van2011numpy\] to represent most data structures and perform operations upon them in a vectorized manner. The Scikit-learn library \[@pedregosa2011scikit\] is used to implement the AdaBoost pixel-level classifier as well as to perform the *k*-means clustering to select the training tiles. The computation of pixel-level features and GIST descriptors makes use of various features provided by the Scikit-image \[@van2014scikit\] and SciPy \[@virtanen2020scipy\] libraries. On the other hand, the classification refinement employs the graph cuts algorithm implementation provided by the library [PyMaxFlow](https://github.com/pmneila/PyMaxflow). 
Finally, when possible, DetecTree uses the Dask library \[@rocklin2015dask\] to perform various computations in parallel. +The code of DetecTree is organized following an object-oriented approach, and relies on NumPy [@van2011numpy] to represent most data structures and perform operations upon them in a vectorized manner. The Scikit-learn library [@pedregosa2011scikit] is used to implement the AdaBoost pixel-level classifier as well as to perform the *k*-means clustering to select the training tiles. The computation of pixel-level features and GIST descriptors makes use of various features provided by the Scikit-image [@van2014scikit] and SciPy [@virtanen2020scipy] libraries. On the other hand, the classification refinement employs the graph cuts algorithm implementation provided by the library [PyMaxFlow](https://github.com/pmneila/PyMaxflow). Finally, when possible, DetecTree uses the Dask library [@rocklin2015dask] to perform various computations in parallel. The features of DetecTree are implemented in a manner that enhances the flexibility of the library so that the user can integrate it into complex computational workflows, and also provide custom arguments for the technical aspects. Furthermore, the functionalities of DetecTree can be used through its Python API as well as through its command-line interface (CLI), which is implemented by means of the Click Python package. # Availability -The source code of DetecTree is fully available at [a GitHub repository](https://github.com/martibosch/detectree). A dedicated Python package has been created and is hosted at the [Python Package Index (PyPI)](https://pypi.org/project/detectree/). The documentation site is hosted at [Read the Docs](https://detectree.readthedocs.io/), and an example repository with Jupyter notebooks of an example application to an openly-available orthophoto of Zurich is provided at a [dedicated GitHub repository](https://github.com/martibosch/detectree-example), which can be executed interactively online by means of the Binder web service \[@jupyter2018binder\]. An additional example use of DetecTree can be found at a [dedicated GitHub repository](https://github.com/martibosch/lausanne-tree-canopy) with the materials to obtain a tree canopy map for the urban agglomeration of Lausanne from the SWISSIMAGE 2016 orthophoto \[@swisstopo2019swissimage\]. +The source code of DetecTree is fully available at [a GitHub repository](https://github.com/martibosch/detectree). A dedicated Python package has been created and is hosted at the [Python Package Index (PyPI)](https://pypi.org/project/detectree/). The documentation site is hosted at [Read the Docs](https://detectree.readthedocs.io/), and an example repository with Jupyter notebooks of an example application to an openly-available orthophoto of Zurich is provided at a [dedicated GitHub repository](https://github.com/martibosch/detectree-example), which can be executed interactively online by means of the Binder web service [@jupyter2018binder]. An additional example use of DetecTree can be found at a [dedicated GitHub repository](https://github.com/martibosch/lausanne-tree-canopy) with the materials to obtain a tree canopy map for the urban agglomeration of Lausanne from the SWISSIMAGE 2016 orthophoto [@swisstopo2019swissimage]. Unit tests are run within the [Travis CI](https://travis-ci.org/martibosch/detectree) platform every time that new commits are pushed to the GitHub repository. 
Additionally, test coverage [is reported on Coveralls](https://coveralls.io/github/martibosch/detectree?branch=master).
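
The paper.md paragraphs reformatted above describe a concrete API workflow, sketched below in Python. This is a minimal sketch rather than the library's documented usage: only `TrainingSelector`, `train_test_split`, `Classifier` and `classify_img` are named in the text above, while the constructor arguments, the `clf` keyword and the file paths are illustrative assumptions.

```python
# Minimal sketch of the tile-selection and classification workflow described
# in paper.md above. Keyword arguments and file paths are assumptions made
# for illustration, not documented API.
import detectree as dtr

# Step 1: select representative training tiles by k-means clustering of the
# tiles' GIST descriptors (by default, one percent of the tiles).
ts = dtr.TrainingSelector(img_dir="data/tiles")  # `img_dir` is assumed
split_df = ts.train_test_split()

# Steps 2-3 (not shown): provide ground-truth tree/non-tree masks for the
# selected tiles and train the binary AdaBoost pixel classifier on them.
clf = None  # placeholder for the scikit-learn estimator trained in step 3

# Step 4: classify the tree pixels of a single tile; per the paper, the raw
# pixel-level result is refined with graph cuts so adjacent tree pixels agree.
y_pred = dtr.Classifier(clf=clf).classify_img("data/tiles/tile_00.tif")
```

For many tiles at once, the paper names a `classify_imgs` counterpart that applies the same procedure at scale.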