Calculating forest loss using open source deep learning tools and publicly available training data

Deep learning algorithms have largely been developed in academia, so most of the code is open source. Successfully training deep networks, however, requires thousands of labeled samples, and at present this training data is typically not open. In their presentation at this year’s FOSS4GNA (Free and Open Source Software for Geospatial North America) conference in St. Louis, Jason Brown and Courtney Whalen, data scientists at Astraea, showed how deep learning trained on publicly available labeled data was applied to track deforestation and reforestation in Mato Grosso, a state in the central Amazon region of Brazil.

Chris Holmes of Planet Labs, in his insightful talk at FOSS4GNA in St. Louis on applying deep learning to geospatial data, identified a challenge in making this technology open. The deep learning algorithms have largely been developed in academia, so most of the code is open source. For example, U-Net, a deep neural network model originally developed for medical image segmentation, is open source and has been applied to identifying building footprints. Successful training of deep networks, however, requires thousands of labeled training samples. Producing labeled data involves people manually ground-truthing land use types and other features so that the deep learning algorithms can learn what to recognize. At present this training data is typically not openly available. In their presentation, Jason Brown and Courtney Whalen, both data scientists at Astraea, used deep learning trained on publicly available labeled data to track deforestation and reforestation in Mato Grosso, a state in the central Amazon region of Brazil.

This is computationally intensive, so a distributed computation engine built from open source components was used. Spark is a top-level Apache project that enables distributed processing for global-scale computation. RasterFrames is a free and open source toolkit that allows scientists, data scientists, and software developers to process and analyze geospatial-temporal raster data with the same flexibility and ease as any other data type in Spark DataFrames. It is a LocationTech raster project built on GeoTrellis. Using this software, each year of imagery required 6 to 7 hours of computation on 48 cores.

Forest cover in Mato Grosso in 2002

The imagery was captured by the MODIS instrument for the years 2001 through 2017. MODIS measures the light reflected from ground cover in several bands, including red, green, blue, shortwave infrared, and near infrared. The imagery has a spatial resolution of 500 by 500 meters and a revisit rate of one to two days. From the red and near-infrared bands, the normalized difference vegetation index (NDVI) can be calculated. Monthly means and yearly aggregates can then be derived from the data.
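To make the NDVI step concrete, here is a minimal sketch in plain Python. It uses the standard definition NDVI = (NIR - Red) / (NIR + Red) and then rolls per-observation values up into monthly means and yearly summaries. The function names, the tuple layout, and the sample reflectance values are assumptions for illustration only; the presenters ran their processing with RasterFrames on Spark, and the exact aggregates they computed are not stated.

```python
from collections import defaultdict
from statistics import mean


def ndvi(red, nir):
    """Standard NDVI: (NIR - Red) / (NIR + Red). Returns None if undefined."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def monthly_means(observations):
    """Average NDVI per (year, month) from (year, month, red, nir) tuples."""
    buckets = defaultdict(list)
    for year, month, red, nir in observations:
        value = ndvi(red, nir)
        if value is not None:
            buckets[(year, month)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def yearly_aggregate(monthly):
    """Summarize monthly means into per-year min, mean, and max (an assumed choice)."""
    by_year = defaultdict(list)
    for (year, _month), value in monthly.items():
        by_year[year].append(value)
    return {
        year: {"min": min(vals), "mean": mean(vals), "max": max(vals)}
        for year, vals in by_year.items()
    }


# Hypothetical reflectance samples for a single pixel: (year, month, red, nir)
samples = [
    (2012, 1, 0.05, 0.45),
    (2012, 1, 0.07, 0.43),
    (2012, 7, 0.10, 0.30),
]
monthly = monthly_means(samples)
yearly = yearly_aggregate(monthly)
```

Dense, healthy vegetation reflects strongly in the near infrared and absorbs red light, so forest pixels tend toward high NDVI values, which is what makes the index useful as an input feature for land cover classification.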


Forest cover in Mato Grosso in 2011

The training data came from the System for Terrestrial Ecosystem Parameterization (STEP), which has 2000 manually labeled sites scattered across all continents, covering 17 different land cover types including five forest types. The model was trained on MODIS data from 2012. Of the labeled data, 80% was used for training; after training was completed, the remaining 20% was used to test the model.
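The 80/20 hold-out described above can be sketched as follows. This is a simple shuffled split in plain Python, not the presenters' actual pipeline: the site records, the seed, and the helper names are hypothetical, and the source does not say whether the split was random or stratified by land cover class.

```python
import random


def train_test_split(sites, train_fraction=0.8, seed=42):
    """Shuffle the labeled sites reproducibly, then cut into train and test sets."""
    shuffled = list(sites)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of held-out sites whose predicted class matches the label."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical stand-ins for 2000 labeled sites: (site_id, land_cover_class 1..17)
sites = [(site_id, site_id % 17 + 1) for site_id in range(2000)]
train, test = train_test_split(sites)
```

With 2000 sites this yields 1600 for training and 400 for testing. Keeping the test sites out of training is what makes the reported test performance a fair estimate of how the model behaves on unseen locations.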


Comparison of rate of forest loss with Global Forest Watch for 2001 to 2016

After training and testing, the model was first applied to Mato Grosso in central Brazil, a large state that has seen a lot of deforestation. The resulting rate of deforestation tracked the rate estimated independently by Global Forest Watch for the years 2001 to 2017. The major feature, a slowing of the rate of deforestation in 2011, probably as a result of increased enforcement by the state government, is clearly discernible.
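One plausible way to turn yearly classification maps into a loss rate is to count the pixels classified as forest each year, convert the counts to area, and take year-over-year differences. The sketch below assumes the nominal 500 by 500 meter pixel size quoted earlier; the pixel counts are invented for illustration, and the source does not describe how the presenters actually computed their rate.

```python
PIXEL_AREA_KM2 = 0.5 * 0.5  # nominal 500 m x 500 m pixel, in square kilometers


def forest_area_km2(forest_pixel_count):
    """Convert a count of forest-classified pixels to area in square kilometers."""
    return forest_pixel_count * PIXEL_AREA_KM2


def annual_net_loss(forest_pixels_by_year):
    """Net forest loss per year in km^2 (positive = loss, negative = net regrowth)."""
    years = sorted(forest_pixels_by_year)
    return {
        curr: forest_area_km2(forest_pixels_by_year[prev])
        - forest_area_km2(forest_pixels_by_year[curr])
        for prev, curr in zip(years, years[1:])
    }


# Hypothetical forest pixel counts per year (not real results)
counts = {2001: 2_000_000, 2002: 1_960_000, 2003: 1_930_000}
loss = annual_net_loss(counts)
```

Because the difference is a net figure, reforestation in one area offsets deforestation in another; tracking the two separately would require comparing the maps pixel by pixel rather than comparing totals.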

The successful application of deep learning in Mato Grosso has encouraged Astraea to aim at applying this approach globally. They also intend to use higher-resolution satellite data and to handle seasonal differences better.

About STEP

The System for Terrestrial Ecosystem Parameterization (STEP) is a model for deriving vegetation and land surface parameters from remote sensing data for use in remote sensing-based classification of land cover, ecosystems, and vegetation types. The model defines parameters that relate to important ecological and biogeophysical parameters and that can be reliably measured or inferred from remote sensing, collateral, and field plot data. STEP is maintained as a database of training polygons drawn on high spatial resolution imagery that can be extracted with GIS to produce a global land cover classification. STEP is periodically reviewed to filter out inconsistent sites and augmented to fill gaps in biogeographical coverage. The database was originally created to follow the International Geosphere-Biosphere Programme (IGBP) land cover legend but it has since evolved to support any number of additional classifications.

Geoff Zeiss


Geoff Zeiss has more than 20 years experience in the geospatial software industry and 15 years experience developing enterprise geospatial solutions for the utilities, communications, and public works industries. His particular interests include the convergence of BIM, CAD, geospatial, and 3D. In recognition of his efforts to evangelize geospatial in vertical industries such as utilities and construction, Geoff received the Geospatial Ambassador Award at Geospatial World Forum 2014. Currently Geoff is Principal at Between the Poles, a thought leadership consulting firm. From 2001 to 2012 Geoff was Director of Utility Industry Program at Autodesk Inc, where he was responsible for thought leadership for the utility industry program. From 1999 to 2001 he was Director of Enterprise Software Development at Autodesk. He received one of ten annual global technology awards in 2004 from Oracle Corporation for technical innovation and leadership in the use of Oracle. Prior to Autodesk Geoff was Director of Product Development at VISION* Solutions. VISION* Solutions is credited with pioneering relational spatial data management, CAD/GIS integration, and long transactions (data versioning) in the utility, communications, and public works industries. Geoff is a frequent speaker at geospatial and utility events around the world including Geospatial World Forum, Where 2.0, MundoGeo Connect (Brazil), Middle East Spatial Geospatial Forum, India Geospatial Forum, Location Intelligence, Asia Geospatial Forum, and GITA events in US, Japan and Australia. Geoff received Speaker Excellence Awards at GITA 2007-2009.
