Let the Scientists do Science: Efficient Data Management Design for Biological Field Surveys

Defining the problem

Environmental consulting is a fast-paced industry for good reason. Whether it’s the constraints of the season, the expectations of the client, or the immediate call of an emergency, there is almost always urgency. For project scientists, the pressure on both field and office time is compounded by the demands of handling data, which is (not surprisingly) a sobering reality of being a scientist. Turning observations into actionable information at an efficient pace is what selling science is all about, and the key is to maintain the quality of the science throughout the process.

Scientists are the ones best equipped to perform their observations, and they’re also the ones best equipped to make sense of the data. But because of the urgency and complexity of their work, managing their data collection efforts can become too arduous a task without a sophisticated data management system.

Case in point

Biological sciences such as aquatics and wildlife are a great demonstration of how complex data management can be. Consider a fish capture project: there are dozens of variables that can go along with a single fish capture event on any given day of a given project. The data captured includes general context information (such as site/stream name, start and end time, start and end coordinates, etc.), water quality information (such as water depth, temperature, dissolved oxygen, pH, etc.), type of fish capture effort (electro-fishing, angling, minnow trapping, etc.), the details of the effort (number of passes, time passed, etc.), and of course the fish themselves (species, length, weight, sex, health status, etc.). Beyond the sheer number of variables, the data must also be captured in a hierarchy: many captured fish correspond to a single fish capture effort, many fish capture efforts correspond to a single site on a stream, and many streams are part of a single project. This tree hierarchy is conceptualized below:

Figure 1 – A hierarchy as a ‘flat’ table – the product of many ‘left joins’.
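To make the one-to-many structure concrete, here is a minimal sketch of the same hierarchy as nested Python records. The class and field names (Project, Site, CaptureEffort, Fish, length_mm and so on) are illustrative assumptions, not the names used in the actual system, and the stream is folded into the site record to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fish:
    species: str
    length_mm: Optional[float] = None
    weight_g: Optional[float] = None


@dataclass
class CaptureEffort:
    method: str  # e.g. "electro-fishing", "angling", "minnow trapping"
    passes: int = 1
    fish: List[Fish] = field(default_factory=list)  # many fish per effort


@dataclass
class Site:
    stream_name: str
    site_name: str
    efforts: List[CaptureEffort] = field(default_factory=list)  # many efforts per site


@dataclass
class Project:
    name: str
    sites: List[Site] = field(default_factory=list)  # many sites per project


def count_fish(project: Project) -> int:
    """Walk the tree from project down to fish and count the leaves."""
    return sum(len(e.fish) for s in project.sites for e in s.efforts)


# Hypothetical sample data: one site, two efforts, two fish in the first effort.
project = Project(
    name="Example project",
    sites=[
        Site("Example Creek", "Site 1", efforts=[
            CaptureEffort("electro-fishing", passes=2, fish=[
                Fish("rainbow trout", length_mm=210.0, weight_g=95.0),
                Fish("rainbow trout", length_mm=180.0),
            ]),
            CaptureEffort("minnow trapping"),  # an effort that caught nothing
        ]),
    ],
)
```

Each level holds a list of the level below it, so the site and method are recorded once and every fish logged underneath inherits that context.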

The Solution

To accommodate a hierarchical data collection system, the right tools for the job are Flowfinity and Microsoft SQL Server, along with well-known office tools like Microsoft Excel. Flowfinity is a mobile form application that sits on top of Microsoft SQL Server and runs in any web browser or on mobile devices such as iPads. By its very nature, a database server such as Microsoft SQL Server is a living entity; it’s dynamic. Flowfinity capitalizes on this dynamic nature with an easy-to-use form interface that is endlessly customizable. Its most capable function is ‘nesting’, where each ‘nest’ corresponds to one level of the hierarchy.

As we saw in Figure 1, the hierarchy is really just a series of left joins: in order to view everything about a specific fish caught (the bottom level), the information from each upper level has to be repeated on every fish record. This is why it’s so difficult to use a data collection system that is, by its design, non-hierarchical: a non-hierarchical system doesn’t capture the upper-level data for you, whereas a hierarchical system does. In addition, in the field most of the upper-level elements are already established and the focus is on capturing fish: a scientist knows where she is and the method she is fishing with; she just needs to log the captured fish information. But during her reporting, she needs to know everything about the fish, including the upper-level information. Figure 2 demonstrates how this is possible.

Figure 2 – Simplified SQL statement for ‘flattening’ a hierarchy
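As a rough, runnable illustration of the kind of statement Figure 2 describes, the sketch below builds a tiny hierarchy and flattens it with a chain of left joins. It makes several assumptions: the table and column names (project, site, effort, fish) are hypothetical, the sample rows are made up, and Python's built-in SQLite stands in for Microsoft SQL Server so the example can run anywhere.

```python
import sqlite3

# In-memory SQLite database standing in for the central SQL Server database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (project_id INTEGER PRIMARY KEY, project_name TEXT);
CREATE TABLE site    (site_id INTEGER PRIMARY KEY,
                      project_id INTEGER REFERENCES project(project_id),
                      stream_name TEXT, site_name TEXT);
CREATE TABLE effort  (effort_id INTEGER PRIMARY KEY,
                      site_id INTEGER REFERENCES site(site_id),
                      method TEXT, passes INTEGER);
CREATE TABLE fish    (fish_id INTEGER PRIMARY KEY,
                      effort_id INTEGER REFERENCES effort(effort_id),
                      species TEXT, length_mm REAL);

-- Hypothetical sample data: one site, two efforts, two fish in the first effort.
INSERT INTO project VALUES (1, 'Example project');
INSERT INTO site    VALUES (1, 1, 'Example Creek', 'Site 1');
INSERT INTO effort  VALUES (1, 1, 'electro-fishing', 2);
INSERT INTO effort  VALUES (2, 1, 'minnow trapping', 1);
INSERT INTO fish    VALUES (1, 1, 'rainbow trout', 210.0);
INSERT INTO fish    VALUES (2, 1, 'rainbow trout', 180.0);
""")

# Each LEFT JOIN steps one level down the hierarchy, so the upper-level
# columns are repeated on every fish row. Efforts that caught nothing
# still appear, with NULLs in the fish columns.
FLATTEN_SQL = """
SELECT p.project_name, s.stream_name, s.site_name,
       e.method, e.passes, f.species, f.length_mm
FROM project AS p
LEFT JOIN site   AS s ON s.project_id = p.project_id
LEFT JOIN effort AS e ON e.site_id    = s.site_id
LEFT JOIN fish   AS f ON f.effort_id  = e.effort_id
ORDER BY s.site_id, e.effort_id, f.fish_id
"""

rows = conn.execute(FLATTEN_SQL).fetchall()
for row in rows:
    print(row)
```

Starting the joins at the project level rather than at the fish level is a design choice in this sketch: it keeps zero-catch efforts in the flat table, which is often information a biologist wants to report.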

Most field data collection solutions suffer from three primary challenges:

Collecting/modifying data on the fly
Storing data centrally
Allowing near-synchronous integration of the first two

What matters most is the two-way transfer between collection and storage. This is evident in the interconnectivity of Figure 3: the moment data is collected it can be seen by anyone else via Flowfinity, SQL Server Management Studio, and Excel. Of course, all of this hinges on the mobile device being connected to the web: it’s impossible to centralize something without being connected to it. This is perhaps the only limiting factor, where if a mobile device is not connected to the web the user will only be able to see existing records and the ones they’ve collected that day. They won’t be able to see records logged by others, and office staff won’t be able to see their records until they’ve connected to the web and records have been pushed onto the database.

What isn’t evident in Figure 3 is the alternative: capturing the data in a non-standardized, non-centralized medium (paper, personal computer), then transcribing or exporting it into a disconnected format (Excel, Word) that is either non-hierarchical or hierarchical but too cumbersome to navigate, all with a lag between collection and review.
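The offline behaviour described above can be pictured as a simple queue on the device. The sketch below is purely conceptual: it is not how Flowfinity is implemented, and OfflineQueue, capture, sync, and central_store are hypothetical names invented for the illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class OfflineQueue:
    """Holds records captured on the device until a connection is available."""

    def __init__(self, push: Callable[[Record], None]) -> None:
        self._push = push                 # hypothetical upload function
        self.pending: List[Record] = []   # records not yet on the server

    def capture(self, record: Record, online: bool) -> None:
        """Log a record locally; push everything pending if connected."""
        self.pending.append(record)
        if online:
            self.sync()

    def sync(self) -> int:
        """Push pending records in capture order; return how many were sent."""
        sent = 0
        while self.pending:
            self._push(self.pending[0])
            self.pending.pop(0)  # drop a record only after it has been pushed
            sent += 1
        return sent


central_store: List[Record] = []  # stands in for the central database
queue = OfflineQueue(push=central_store.append)

# Two fish logged with no connection: they stay on the device.
queue.capture({"species": "rainbow trout", "length_mm": 210.0}, online=False)
queue.capture({"species": "rainbow trout", "length_mm": 180.0}, online=False)
visible_while_offline = len(central_store)

# Connection restored: the next capture pushes the whole backlog.
queue.capture({"species": "bull trout", "length_mm": 305.0}, online=True)
```

Until the device reconnects, the central store holds none of the day's records, which is exactly the window in which office staff cannot see field data.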

Figure 3 – Conceptual design of a biological data management system

Conclusion

The complete system is as close to a ‘closed box’ as a scientist can hope for: they can create and edit hierarchical records, see their records as they come in, and view their data in the perspectives they need. That’s not to say that improvements can’t be made. Mobile collection systems are notoriously bound to network connections, large objects such as photos make storage an issue, and depending on the type of work, the location services of the on-board device may be inadequate.

Where’s the spatial connection in all of this, you ask? Flowfinity uses the on-board location services of mobile devices (and custom coordinate entry), which are easily accessible via a database connection through your preferred GIS software. By creating dynamically linked mapping layers, data can be quickly accessed spatially in a ready-to-use format tied to Microsoft SQL Server. More on this in my next article.