GeoBuiz 2016: High volume geospatial imagery processing requires new architecture

DigitalGlobe owns and operates a constellation of commercial Earth observation satellites.  The latest, WorldView-3, is capable of about 30 cm resolution.  DigitalGlobe's satellites collect 3,000,000 square kilometers of Earth imagery every day, about 70 terabytes of data per day.  DigitalGlobe also provides online access to 15 years of Earth imagery, about 90 petabytes, and has collected approximately 80% of all high-resolution commercial imagery acquired since 2010.  Clearly no user can download all of this data, so how can users search through this huge data set and generate a product such as a vegetation map for Tanzania?

At the GeoBuiz 2016 Summit in Bethesda this week, Walter Scott, DigitalGlobe's CTO and founder, outlined how the company is making this vast amount of high-resolution data accessible to and usable by customers.  To make it possible for customers to search and process this data efficiently, DigitalGlobe has put all of the data on the public cloud (Amazon) and has provided a platform that supports efficient search, processing, and workflows.  GBDX is an online environment for efficiently running advanced information-extraction algorithms against imagery datasets at scale.  It is exposed as a set of RESTful APIs; algorithms already implemented by DigitalGlobe or its partners include car counting, orthorectification, land use and land cover classification, and atmospheric compensation.
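As a rough illustration of what "a set of RESTful APIs" means in practice, the sketch below authenticates against the platform and lists the task types that have been registered.  The API root, endpoint paths, and token flow shown here are assumptions for illustration, not a definitive description of the GBDX interface.

```python
import requests

GBDX_BASE = "https://geobigdata.io"  # assumed API root

# Exchange credentials for a bearer token (an OAuth2-style password grant is assumed).
token_resp = requests.post(
    GBDX_BASE + "/auth/v1/oauth/token",
    data={"grant_type": "password",
          "username": "user@example.com",
          "password": "secret"},
)
token = token_resp.json()["access_token"]
headers = {"Authorization": "Bearer " + token}

# List the task types (car counting, orthorectification, etc.) registered on the platform.
tasks = requests.get(GBDX_BASE + "/workflows/v1/tasks", headers=headers)
print(tasks.json())
```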

One of the most important efficiencies of the GBDX architecture is that it avoids moving huge volumes of data over the internet: it brings the application to the data, rather than the data to the application.  For example, to search through the data, GBDX provides a catalog API that filters on 39 different image attributes, such as geographic location, cloud cover percentage, sensor type, and sun angle.  The search runs locally on Amazon, so no imagery is pushed over the internet.
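To make the catalog search concrete, here is a hedged Python sketch of a search filtered by area of interest, cloud cover, and sensor type.  The /catalog/v2/search path, the filter syntax, and the field names are assumptions for illustration; the exact request format should be taken from the GBDX documentation.

```python
import requests

headers = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

search_body = {
    # Area of interest as a WKT polygon (roughly a bounding box over Tanzania).
    "searchAreaWkt": "POLYGON ((29.3 -11.8, 40.5 -11.8, 40.5 -0.9, 29.3 -0.9, 29.3 -11.8))",
    # Attribute filters -- two of the 39 searchable image attributes mentioned above.
    "filters": [
        "cloudCover < 10",
        "sensorPlatformName = 'WORLDVIEW03'",
    ],
    "types": ["DigitalGlobeAcquisition"],
}

resp = requests.post("https://geobigdata.io/catalog/v2/search",
                     headers=headers, json=search_body)
for record in resp.json().get("results", []):
    print(record.get("identifier"), record.get("properties", {}).get("timestamp"))
```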

The GBDX API includes support for workflows.  A workflow is a sequence of tasks executed against sets of data.  A typical workflow begins with a search for the data you want to process.  The selected data is assigned to a working set and then processed through a sequence of steps to generate the desired product, for example an orthorectified vegetation map.  Again, the workflow runs locally on Amazon, so no imagery is pushed over the internet during processing.  Only the final product of the workflow, such as the vegetation map, is transferred over the internet to the customer.
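A hedged sketch of such a workflow follows: an orthorectification step feeds a vegetation-index step, and the result is staged to an S3 location for the customer.  The task names, port names, and workflow endpoint are illustrative assumptions rather than a verified GBDX recipe.

```python
import requests

headers = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

workflow = {
    "name": "tanzania_vegetation_map",
    "tasks": [
        {
            "name": "ortho",
            "taskType": "AOP_Strip_Processor",           # assumed orthorectification task
            "inputs": [{"name": "data", "value": "s3://<image-bucket>/<catalog-id>"}],
            "outputs": [{"name": "data"}],
        },
        {
            "name": "ndvi",
            "taskType": "NDVI",                          # hypothetical vegetation-index task
            "inputs": [{"name": "image", "source": "ortho:data"}],
            "outputs": [{"name": "vegetation_map"}],
        },
        {
            "name": "save",
            "taskType": "StageDataToS3",                 # assumed output-staging task
            "inputs": [
                {"name": "data", "source": "ndvi:vegetation_map"},
                {"name": "destination", "value": "s3://<customer-bucket>/tanzania/ndvi"},
            ],
        },
    ],
}

resp = requests.post("https://geobigdata.io/workflows/v1/workflows",
                     headers=headers, json=workflow)
print("Workflow id:", resp.json().get("id"))
```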

GBDX uses the cloud infrastructure of Amazon Web Services (AWS).  Data is stored in S3 'buckets' that are accessed through GBDX RESTful web service APIs.  GBDX includes built-in tools supporting remote sensing algorithms from DigitalGlobe and ENVI.  GBDX is Python-friendly and allows you to create your own tools.
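Because only the finished product leaves the platform, the customer's last step is typically a small S3 download.  The sketch below uses boto3 with placeholder bucket and key names; temporary S3 credentials would normally be obtained from the platform rather than hard-coded or taken from a local AWS profile.

```python
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are already configured locally

# Pull the final product (e.g. the vegetation map) from the customer-facing S3 prefix.
s3.download_file(
    Bucket="customer-bucket",                 # placeholder bucket name
    Key="tanzania/ndvi/vegetation_map.tif",   # placeholder object key
    Filename="vegetation_map.tif",            # local copy of the final product
)
print("Downloaded vegetation map")
```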

One of the application areas that the platform makes possible is deep learning.  The basic concept is that software can simulate the brain’s large array of neurons as an artificial “neural network”.  Large volumes of input data and some way of quantitatively assessing the results are required to train the system.  With the greater depth enabled by new computers and algorithms, remarkable advances in speech and image recognition are being achieved.  For example, a Google deep-learning system that had been shown 10 million images from YouTube videos proved almost twice as good as any previous image recognition effort at identifying objects such as cats.  Microsoft has demonstrated speech software that transcribes spoken words into English text with an error rate of only 7 percent.
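As a toy illustration of that training loop, the sketch below fits a small neural network on synthetic labelled data and scores it on a held-out set, which is the kind of quantitative assessment the paragraph refers to.  Nothing here is specific to GBDX or DigitalGlobe imagery; the synthetic features simply stand in for labelled image data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic feature vectors standing in for labelled image-derived features.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small neural network trained on the labelled examples.
model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
model.fit(X_train, y_train)

# Quantitative assessment: accuracy on data the network has never seen.
print("Held-out accuracy:", model.score(X_test, y_test))
```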

But deep learning requires a lot of compute power and a lot of training.  GBDX provides the computational power to apply deep learning to Earth observation, and DigitalGlobe's data library can provide the training data sets.  What is still required is a scalable way of assessing and quantifying the results; Walter Scott suggested crowdsourcing as a way of achieving the scale required for Earth observation training.

Geoff Zeiss

Geoff Zeiss has more than 20 years experience in the geospatial software industry and 15 years experience developing enterprise geospatial solutions for the utilities, communications, and public works industries. His particular interests include the convergence of BIM, CAD, geospatial, and 3D. In recognition of his efforts to evangelize geospatial in vertical industries such as utilities and construction, Geoff received the Geospatial Ambassador Award at Geospatial World Forum 2014. Currently Geoff is Principal at Between the Poles, a thought leadership consulting firm. From 2001 to 2012 Geoff was Director of Utility Industry Program at Autodesk Inc, where he was responsible for thought leadership for the utility industry program. From 1999 to 2001 he was Director of Enterprise Software Development at Autodesk. He received one of ten annual global technology awards in 2004 from Oracle Corporation for technical innovation and leadership in the use of Oracle. Prior to Autodesk Geoff was Director of Product Development at VISION* Solutions. VISION* Solutions is credited with pioneering relational spatial data management, CAD/GIS integration, and long transactions (data versioning) in the utility, communications, and public works industries. Geoff is a frequent speaker at geospatial and utility events around the world including Geospatial World Forum, Where 2.0, MundoGeo Connect (Brazil), Middle East Spatial Geospatial Forum, India Geospatial Forum, Location Intelligence, Asia Geospatial Forum, and GITA events in US, Japan and Australia. Geoff received Speaker Excellence Awards at GITA 2007-2009.
