Building-height Estimation using Street-view Images, Deep Learning, and Building Footprints
Introduction
Building height is an important piece of information with various applications, such as economic analysis, urban planning, and the development of 3D city maps. For many geographical areas, this information is inaccessible or nonexistent, and collecting it manually on a large scale would be very labor-intensive; automating the process is therefore extremely useful. Deploying street-view imagery for information extraction has recently received increasing attention from both industry and academia. This article presents an open-source system developed for the automatic estimation of building height from street-view images using Deep Learning (DL), advanced image-processing techniques, and geospatial data. Both street-view images and the needed geospatial data are becoming pervasive and available through multiple platforms. The goal of the developed system is ultimately to enrich the Open Database of Buildings (ODB), published by Statistics Canada as part of the Linkable Open Data Environment (LODE).
Building height estimation
In the past, different technologies have been used for building height estimation, such as Light Detection and Ranging (LiDAR) data (Sampath & Shan, 2010) and Synthetic Aperture Radar (SAR) (Brunner et al., 2010). LiDAR is a laser-based technology for measuring distances (Sampath & Shan, 2010). With this technology, laser light illuminates the target and the reflection is measured with a sensor. 3D representations of the area/objects of interest are built by measuring the differences in laser return times and wavelengths. SAR is a radar-based technology that is used for 3D reconstruction of objects or landscapes. With SAR, fine spatial resolution can be achieved using the motion of the radar antenna over the object/region of interest. The issue with the approaches above is that such data are costly to obtain, which makes their use on a large scale infeasible.
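The distance measurement underlying LiDAR follows directly from the round-trip travel time of the laser pulse. A minimal sketch (the example return time is illustrative, not from any particular sensor):

```python
# LiDAR ranging: a pulse travels to the target and back, so the
# one-way distance is d = c * t / 2 for a measured return time t.

C = 299_792_458.0  # speed of light in m/s

def lidar_distance(return_time_s: float) -> float:
    """Distance in metres for a measured round-trip time in seconds."""
    return C * return_time_s / 2.0

# A return time of ~667 ns corresponds to a target roughly 100 m away.
print(round(lidar_distance(667e-9), 1))
```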
Openly licensed street-view imagery is spreading rapidly. Over the past few years, several platforms for street-view imagery have emerged, which offer different terms and conditions of service and different degrees of “openness”. Google Street-View (Google, 2020), Mapillary (Mapillary, 2014), and OpenStreetCam (Grab Holdings, 2009) are examples of such platforms. Such imagery provides remarkable opportunities to enrich existing data on buildings with complementary information relevant for further economic and spatial analyses. Furthermore, such imagery can be utilized on a large scale as it is widely available and easily accessible. The facts above have motivated this effort to develop an open-source system for building height estimation from street-view imagery.
Building height estimation from street-view imagery
Figure 1 below illustrates the flow of the developed system for building height estimation. First, a street-view image of the building of interest is downloaded. The Google Street View Static API is used to obtain test imagery. The Street View Static API allows downloading static (non-interactive) street-view panoramas or thumbnails with HTTP requests (Google, 2020). A standard HTTP request can be used to request the street-view image, and a static image is returned. Several parameters can be provided with the request, such as the camera pitch and the angle of view. After obtaining the image, DL-based semantic segmentation is applied to identify the building in the image. Afterwards, a series of image-processing steps is applied to extract the height of the building in the image. Thereafter, building-footprint data is obtained. In the current implementation, we get such data from OpenStreetMap (OSM) using HTTP requests. In future work, we will be updating the system to obtain this data from Statistics Canada’s ODB (Statistics Canada, 2019). After obtaining all the needed building-footprint data, computations are performed on the data and the camera location to estimate the distance between the camera and the building. Finally, the values obtained from the steps above are used in the camera-projection model to estimate the building height.
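The image-download step can be sketched as building the HTTP GET request for the Street View Static API. The snippet below is a minimal illustration in Python; the coordinates are placeholders and a valid API key is required, while the parameter names (`size`, `location`, `heading`, `pitch`, `fov`) follow the API documentation (Google, 2020):

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/streetview"

def street_view_url(lat, lon, heading, pitch=10, fov=90,
                    size="640x640", key="YOUR_API_KEY"):
    """Build the HTTP GET request for a static street-view image."""
    params = {
        "size": size,                 # image dimensions in pixels
        "location": f"{lat},{lon}",   # camera location (or address)
        "heading": heading,           # compass heading of the camera
        "pitch": pitch,               # up/down angle of the camera
        "fov": fov,                   # horizontal field of view (zoom)
        "key": key,                   # API key (required)
    }
    return f"{BASE_URL}?{urlencode(params)}"

# The resulting URL can be fetched with any HTTP client; the response
# body is the static (non-interactive) street-view image.
url = street_view_url(45.4215, -75.6972, heading=90)
```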
Figure 1: A high level illustration of the developed system’s workflow
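The camera-to-building distance computation (step 5 in Figure 1) can be approximated from the camera's coordinates and a point on the building footprint, for example with the haversine formula. This is a simplified sketch under the assumption of a spherical Earth; the system's actual geodesic computation may differ:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    R = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

# Illustrative values: a camera position and a nearby footprint vertex.
distance = haversine_m(45.4215, -75.6972, 45.4217, -75.6975)
```

In practice, the footprint vertex (or wall segment) nearest to the camera, rather than an arbitrary point, would be used to estimate the distance to the visible facade.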
Semantic segmentation
Measurements of the building in the image are needed to estimate the building actual height. These measurements are extracted through a combination of advanced image processing techniques. However, before these techniques are applied, identification of the building in the image is required. Identification of the building in the image is done through semantic segmentation. Image segmentation partitions an image into multiple segments (Gonzalez & Woods, 2017). This technique is widely used in image processing as it simplifies further analysis and makes it possible to extract certain information. Grouping pixels together is done on the basis of specific characteristics such as intensity, color, or connectivity between pixels.
Semantic segmentation is a special type of image segmentation (Mottaghi et al., 2014). With semantic segmentation, a class is assigned to every pixel in the input image, making it a classification problem. It differs from image classification, where a single label is assigned to the whole image. Semantic segmentation attempts to partition the image into meaningful parts and associate every pixel in the input image with a class label. For example, as in our case, it can be used to divide the pixels in the image into different classes (e.g., building, sky, or pedestrian). A very common application of semantic segmentation is self-driving vehicles. Nowadays, deep neural networks are usually used to solve semantic segmentation problems. In particular, Convolutional Neural Networks (CNNs) achieve great results in terms of accuracy and efficiency. CNNs are a class of deep neural networks most commonly applied to the analysis of visual imagery.
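The per-pixel classification performed by a segmentation network can be illustrated with a toy example. The class scores below are random stand-ins for a CNN's output, and the class ids are hypothetical:

```python
import numpy as np

# Toy per-pixel class scores, as a CNN's final layer might produce:
# shape (num_classes, H, W).  Classes: 0 = sky, 1 = building, 2 = road.
rng = np.random.default_rng(0)
scores = rng.random((3, 4, 4))

# Semantic segmentation assigns each pixel the class with the
# highest score, producing an (H, W) label map.
label_map = scores.argmax(axis=0)

# A binary building mask is then a simple comparison against the
# building class id; the height-extraction steps operate on this mask.
building_mask = (label_map == 1)
```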
Samples of obtained results
Figure 2 below shows some of the images obtained from the system over the process of building height estimation. Figure 2-a shows an image of a residential building downloaded by the system. Figure 2-b shows the output of the CNN, which is a semantically segmented image. As can be seen in Figure 2-b, the building shows up as green subregions in the output image. Advanced image-processing techniques are then applied to this image to extract the building’s border and find the roofline, as shown in Figure 2-b, where the top points and the roofline are superimposed on the segmented image. Steps 4-6 in Figure 1 are then implemented to estimate the building height. The actual height of the building in Figure 2-a is 45 m, and Figure 2-c shows that the result obtained for this building is 48.052 m.
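The camera-projection step (step 6 in Figure 1) can be sketched with a simplified pinhole camera model. The function below is an illustrative assumption (zero camera pitch, building base visible, known vertical field of view), not the system's exact implementation:

```python
import math

def estimate_height(distance_m, roof_row, base_row, image_h_px, vfov_deg):
    """Pinhole-model building-height estimate.

    distance_m : camera-to-building distance (from footprint data)
    roof_row   : pixel row of the roofline (0 = top of image)
    base_row   : pixel row of the building base
    image_h_px : image height in pixels
    vfov_deg   : vertical field of view of the camera
    Assumes zero camera pitch, i.e. a horizontal optical axis.
    """
    # Focal length in pixels, derived from the vertical field of view.
    f_px = (image_h_px / 2) / math.tan(math.radians(vfov_deg) / 2)
    cy = image_h_px / 2  # principal point (image centre row)

    # Angles from the optical axis to the roofline and the base.
    theta_roof = math.atan((cy - roof_row) / f_px)   # above the axis
    theta_base = math.atan((base_row - cy) / f_px)   # below the axis

    # Similar triangles: building extent above and below eye level.
    return distance_m * (math.tan(theta_roof) + math.tan(theta_base))
```

For example, with a 20 m distance, a 640-pixel-high image, a 90° vertical field of view, and the roofline and base at rows 100 and 500, the model yields a height of 25 m.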
Figure 2: (a) A residential building (b) The semantically segmented image with the top points superimposed (c) The estimated height printed on the image.
The developed system was tested on a group of residential and commercial buildings in Ottawa, Canada. The obtained results show that the system can be used to provide accurate building-height estimation. Furthermore, scalability analysis shows that the system can be utilized for building-height estimation on a larger scale (e.g., at the level of a big city or a country). Interested readers can refer to (Al-Habashna, 2020) for more detail on results and analysis. The system is currently being improved to handle some challenging cases and improve its speed and scalability.
Acknowledgments
This project is funded by the R&D Board of Statistics Canada.
References
Al-Habashna, A. (2020). An open-source system for building-height estimation using street-view images, deep learning, and building footprints. Statistics Canada Articles and Reports: 18-001-X2020002.
Brunner, D., Lemoine, G., Bruzzone, L., & Greidanus, H. (2010). Building height retrieval from VHR SAR imagery based on an iterative simulation and matching technique. IEEE Transactions on Geoscience and Remote Sensing, 48(3), 1487–1504. https://doi.org/10.1109/TGRS.2009.2031910
Gonzalez, R., & Woods, R. (2017). Digital image processing (4th ed.). Pearson.
Google. (2020). Street View Static API. https://developers.google.com/maps/documentation/streetview/intro
Grab Holdings. (2009). OpenStreetCam. https://openstreetcam.org/map/@45.37744755572422,-75.65142697038783,18z
Mapillary. (2014). Mapillary. https://www.mapillary.com/
Mottaghi, R., Chen, X., Liu, X., Cho, N. G., Lee, S. W., Fidler, S., Urtasun, R., & Yuille, A. (2014). The role of context for object detection and semantic segmentation in the wild. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 891–898. https://doi.org/10.1109/CVPR.2014.119
Sampath, A., & Shan, J. (2010). Segmentation and reconstruction of polyhedral building roofs from aerial lidar point clouds. IEEE Transactions on Geoscience and Remote Sensing, 48(3), 1554–1567. https://doi.org/10.1109/TGRS.2009.2030180
Statistics Canada. (2019). Open Database of Buildings. https://www.statcan.gc.ca/eng/lode/databases/odb