Cloud Optimized GeoTIFFs — The Gold Standard of Raster Storage
You may have heard about or worked with GeoTIFFs, but did you know there is a cloud-friendly version called the Cloud Optimized GeoTIFF (COG)? COGs are the gold standard format for storing raster data in the cloud (developers.planet.com). They differ from ordinary GeoTIFFs in that their internal layout is organized for efficient access over a network. To learn more about COGs and why they are important, we will first go over a few definitions, then explore how COGs work and why a COG might be a better choice than a plain GeoTIFF.
Some Definitions
TIFF
TIFF stands for Tagged Image File Format, a format for storing raster graphic images (developers.planet.com). It was created to give the industry a single, commonly agreed file format for scanned images (developers.planet.com).
GeoTIFF
GeoTIFF is a public domain metadata standard that enables georeferencing information to be embedded within an image file so that it can be used in GIS applications (heavy.ai). A GeoTIFF file has additional metadata information about where the image exists on earth, including map projections, coordinate systems and datums (developers.planet.com).
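To make the idea of embedded georeferencing concrete, here is a minimal sketch of how a reader could turn a pixel position into map coordinates. It assumes a north-up image described by the GeoTIFF ModelTiepointTag and ModelPixelScaleTag; the function name and the sample values are hypothetical and chosen only for illustration.

```python
def pixel_to_map(col, row, tiepoint, pixel_scale):
    """Convert a pixel (col, row) to map coordinates for a north-up image.

    tiepoint:    (i, j, x, y), meaning raster point (i, j) sits at map point (x, y)
    pixel_scale: (sx, sy), the pixel size in map units (both positive)
    """
    i, j, x0, y0 = tiepoint
    sx, sy = pixel_scale
    x = x0 + (col - i) * sx
    y = y0 - (row - j) * sy  # rows grow downward, so the northing decreases
    return x, y


# Hypothetical values: upper-left corner at (500000, 4100000) in a projected
# coordinate system, with 10 m pixels.
tiepoint = (0, 0, 500000.0, 4100000.0)
pixel_scale = (10.0, 10.0)

print(pixel_to_map(100, 250, tiepoint, pixel_scale))  # (501000.0, 4097500.0)
```

Rotated or sheared images need a full transformation matrix instead, which this sketch does not cover.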
Cloud Optimized GeoTIFF (COG)
A COG is a GeoTIFF file that is internally organized in a way that enables more efficient workflows in the cloud environment (usgs.gov). This is done by leveraging the ability of clients issuing HTTP GET range requests to ask for just the parts of a file they need (usgs.gov).
A COG begins with the metadata of the full-resolution imagery, followed by the metadata of any overviews (these are optional), and finally the actual imagery (github.com).
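Because the metadata sits at the front of the file, a client can learn where everything is from the first few bytes. The sketch below parses only the fixed TIFF header (byte order, magic number, offset of the first image file directory); the function name is hypothetical and the sample header is built by hand rather than read from a real file.

```python
import struct


def parse_tiff_header(header: bytes):
    """Return (byte_order, is_bigtiff, first_ifd_offset) from the start of a TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if order is None:
        raise ValueError("not a TIFF: bad byte-order mark")
    (magic,) = struct.unpack(order + "H", header[2:4])
    if magic == 42:  # classic TIFF: 4-byte offset to the first IFD
        (ifd_offset,) = struct.unpack(order + "I", header[4:8])
        return order, False, ifd_offset
    if magic == 43:  # BigTIFF: 8-byte offset stored at bytes 8..15
        (ifd_offset,) = struct.unpack(order + "Q", header[8:16])
        return order, True, ifd_offset
    raise ValueError("not a TIFF: unexpected magic number")


# Hand-built little-endian classic TIFF header whose first IFD starts at byte 8.
sample = b"II" + struct.pack("<H", 42) + struct.pack("<I", 8)
print(parse_tiff_header(sample))  # ('<', False, 8)
```

A real reader would go on to read the directory at that offset to find the tile layout; that step is omitted here.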
How COGs work
COGs rely on two complementary pieces of technology: organizing pixels in a particular way and HTTP GET range requests (cogeo.org).
Organizing pixels in a particular way
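A GeoTIFF stores the raw pixels of the image and organizes those pixels in particular ways (cogeo.org). The two main organization techniques that COGs use are tiling and overviews (cogeo.org). Tiling stores the image as a grid of small square blocks rather than as full-width rows, so the pixels for one area sit together in the file. Overviews are downsampled copies of the image stored in the same file, so a zoomed-out view does not require reading the full-resolution data. The data is also compressed for more efficient use online (cogeo.org).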
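As a rough illustration of how tiling and overviews narrow down what has to be read, the sketch below finds the tiles that cover a pixel window and picks an overview level for a requested downsampling factor. The 512-pixel tile size, the power-of-two overview factors, and both function names are assumptions made for this example, not values taken from any particular file.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_col, tile_row) indices of the tiles covering a pixel window."""
    first_col = col_off // tile_size
    first_row = row_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (c, r)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


def pick_overview(factors, requested_factor):
    """Pick the coarsest overview that is no coarser than requested (1 = full resolution)."""
    usable = [f for f in factors if f <= requested_factor]
    return max(usable) if usable else 1


# A 600 x 400 pixel window starting at column 1000, row 300.
print(tiles_for_window(1000, 300, 600, 400))
# [(1, 0), (2, 0), (3, 0), (1, 1), (2, 1), (3, 1)]

# The viewer wants roughly 1/10 resolution; overviews exist at 2x, 4x, 8x, 16x.
print(pick_overview([2, 4, 8, 16], 10))  # 8
```

Only six tiles are touched for this window, however large the full image is.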
HTTP GET range requests
HTTP GET range requests let users ask for just the portions of a file that they need (cogeo.org). The Range header is an optional part of HTTP, so web servers are not required to implement it. However, almost all the object storage options on the cloud (Amazon, Google, Microsoft, OpenStack, etc.) support it for data stored on their servers (cogeo.org). Therefore, most of the data that is stored on the cloud is automatically able to serve up parts of itself, as long as clients know what to ask for (cogeo.org).
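For readers who have not seen a range request, here is a small sketch using Python's standard library. It only builds the request so that it runs without network access; the URL is a placeholder and the helper name is hypothetical.

```python
import urllib.request


def range_request(url, offset, length):
    """Build (but do not send) a GET request for `length` bytes starting at `offset`."""
    last_byte = offset + length - 1  # byte positions in a Range header are inclusive
    req = urllib.request.Request(url, method="GET")
    req.add_header("Range", f"bytes={offset}-{last_byte}")
    return req


# Ask for the first 16 KiB of a (placeholder) file.
req = range_request("https://example.com/scene.tif", 0, 16384)
print(req.get_header("Range"))  # bytes=0-16383
```

Passing the request to `urllib.request.urlopen` would send it. A server that honors the header answers with status 206 (Partial Content) and only the requested bytes; one that ignores it answers with 200 and the whole file, so a client should check the status code.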
Bringing the two technologies together
Adding tiling and overviews to the GeoTIFF puts the right structure on the files in the cloud so that range requests can fetch just the relevant part of the file (cogeo.org). Together these enable fully online processing of data: the right parts of the GeoTIFF are streamed as needed, instead of the whole file having to be downloaded (cogeo.org).
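The way the two pieces fit together can be sketched as follows: a TIFF records where each tile starts and how long it is (the TileOffsets and TileByteCounts tags), so once a client knows which tiles it wants, it can turn them into byte ranges. The offset table below is invented for illustration, and merging adjacent tiles into one range is a design choice of this sketch, not something the format requires.

```python
def byte_ranges(tile_indices, tile_offsets, tile_byte_counts):
    """Turn tile indices into merged (start, end_inclusive) byte ranges."""
    spans = sorted(
        (tile_offsets[i], tile_offsets[i] + tile_byte_counts[i] - 1)
        for i in tile_indices
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:  # touches the previous range
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical table for an image that is 4 tiles wide and 2 tiles high,
# with tiles numbered row by row (index = tile_row * 4 + tile_col).
tile_offsets = [1000, 3000, 5500, 7000, 9000, 12000, 15000, 16000]
tile_byte_counts = [2000, 2500, 1500, 2000, 3000, 3000, 1000, 2000]

# Tiles (1, 0), (2, 0), (1, 1), (2, 1) are indices 1, 2, 5, 6.
ranges = byte_ranges([1, 2, 5, 6], tile_offsets, tile_byte_counts)
print(ranges)  # [(3000, 6999), (12000, 15999)]
print([f"bytes={start}-{end}" for start, end in ranges])
# ['bytes=3000-6999', 'bytes=12000-15999']
```

In this made-up case two requests totalling 8,000 bytes replace a download of the full 18,000-byte file.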
Why using a COG over a GeoTIFF might be a better choice
- Efficient Imagery Data Access (cogeo.org)
  - COGs can handle exponential growth of data (developers.planet.com).
  - COG-aware software can stream just the portion of data that it needs, improving processing times and creating real-time workflows that were previously not possible (cogeo.org).
  - COGs depend on two technologies that work in conjunction with each other (cogeo.org). The way pixels are organized (internal tiling, overviews, compression) makes it easier for users to access the parts of the data corresponding to their particular area of interest, without needing to download the entire file first (cogeo.org).
- Reduced Duplication of Data (cogeo.org)
  - Accessing COGs with cloud workflows lets a variety of software access a single file online instead of each needing to copy and cache the data (cogeo.org).
- Legacy Compatibility (cogeo.org)
  - Traditional GIS software can treat Cloud Optimized GeoTIFFs just like normal GeoTIFFs, so data providers need to produce only one format (cogeo.org).
- Democratizing Data Science (developers.planet.com)
  - COGs allow geospatial data to be more accessible and available (developers.planet.com).
More and more geospatial data is migrating to the cloud, where it is most often kept in cloud-based object storage such as Google Cloud Storage (cogeo.org). Traditional GIS file formats can easily exist on the cloud, but on-the-fly processing is difficult to do efficiently with those formats, as they often have to be downloaded to another location and then converted to an optimized format or held in memory (cogeo.org). COGs allow data to be streamed efficiently, which makes fully cloud-based geospatial workflows possible (cogeo.org).
Furthermore, providing data in the COG format can help decrease how much data is copied (cogeo.org). Since online software can stream the data, it does not need to keep its own copy for efficient access (cogeo.org). Also, data providers do not need to provide multiple file formats, because legacy systems can read the same GeoTIFF that the online streaming software is reading (cogeo.org). Data providers can put up just one version of their data, and multiple online programs can all access it at the same time, with no additional copies necessary for download purposes (cogeo.org).
Future of COGs
COG is rapidly maturing, with many new software libraries and tools coming online (cogeo.org). Many providers already publish COG data, including Planet, Mundi and OpenAerialMap, and more are coming online all the time (cogeo.org).
Sources
https://github.com/cogeotiff/cog-spec/blob/master/spec.md
https://www.heavy.ai/technical-glossary/geotiff
https://www.usgs.gov/faqs/what-are-cloud-optimized-geotiffs-cogs