- Researchers at Missouri Botanical Garden in the U.S. have launched an initiative to create a digital repository of the 6 million plant specimens stored in the herbarium there.
- The six-year Revolutionizing Species Identification initiative aims to combine data obtained from visual and hyperspectral scanning with artificial intelligence to build up a plant repository unlike any before.
- The team behind the project says they hope the reference database will speed up plant identification; it could also potentially be used to gauge the health of forests in the face of climate change.
How can dried plant specimens protect a rainforest? In myriad ways.
Kept in collections known as herbariums, they store critical data on the physical characteristics, or morphology, of plants, as well as their origins and global distribution. Scale that up to 6 million specimens, and you have a treasure trove of information that could potentially inform conservation strategies and aid restoration efforts around the world.
That’s exactly what one of the world’s largest herbariums is attempting to do.
Missouri Botanical Garden has announced a six-year project, the Revolutionizing Species Identification (RSI) initiative, to extract all kinds of data from the 6 million specimens it houses. Throw in artificial intelligence to automate and speed up the identification of plant species around the world, and the outcome should be a digital plant repository unlike any before it.
“You can’t conserve what you don’t know,” Gunter Fischer, senior vice president of science and conservation research at Missouri Botanical Garden, told Mongabay in a video interview. He added that the goal of the project is to “speed up the identification of species to make sure that they don’t go extinct.”
In addition to visual scans of the specimens in the herbarium, the team will also use a technology known as hyperspectral imaging to develop a database that gives each species a unique fingerprint. Hyperspectral imaging, based on how plants reflect different wavelengths of light, will provide data on the chemical composition of the plants as well as information on plant health and other environmental factors. Once the data have been gathered, the team will use AI tools to create a reference database of plant features and traits. This, the team says, should enable scientists to identify species by uploading images or hyperspectral data of unidentified plants.
“We frequently have many more plants to identify than we have botanists to identify them,” Jordan Teisher, curator and director of the herbarium, told Mongabay in a video interview. “Often, the specimens will come in and sit for quite a while until someone identifies them. If this tool works like we hope it will, that can speed that process up massively.”
Gunter Fischer and Jordan Teisher spoke with Mongabay’s Abhishyant Kidangoor about why this work is important at this point in time, how they envision the digital repository being used, and the challenges they anticipate as they build it out. The following interview has been lightly edited for length and clarity.
Mongabay: What is the Revolutionizing Species Identification initiative? How would you describe it to someone who isn’t aware of it?
Gunter Fischer: The RSI project is a large initiative to scan one of the largest herbaria in the world. We have got 6 million herbarium specimens, and the idea is to create a really large data set containing visual scans as well as scans outside the visual spectrum, the so-called hyper spectrum. The different spectra will give you a series of traits, as we call them, and those traits can be used to identify species. We would like to automate plant species identification far beyond what common apps can do.
Jordan Teisher: If you’re familiar with those kinds of apps, and if you have a relatively small number of species you are dealing with, they perform very well. As you start to add species, especially species that look very similar to one another, the apps perform less well because they’re using basically images and a finite number of possible patterns of those images to identify things. If we’re going to try to scale up so that we can identify species from the tropics, we need all of the data that we can to be incorporated into that. That’s where the hyperspectral scanning comes in in combination with visual scanning. Coming from an herbarium like this, we have an incredibly rich collection. We have tremendously well-curated specimens. We can get that process going with a high degree of confidence. We can start training models, and then say, “All right, can you predict, based on these data, what species this is?”
Mongabay: I’m curious to know why this hasn’t been done before. What have been the challenges in doing this?
Gunter Fischer: Technology is advancing so fast. What was not possible, say five to 10 years ago, is possible now. We are trying to use all the technology there is to build this new project. Our herbarium has 6 million specimens and decades of curation. The combination of having cutting-edge AI technology together with curated data sets will give a much higher likelihood of identification. Many herbaria in the world do not have the history of decades and decades of botanical work. Now, we can benefit from this really greatly by using these data sets for training artificial intelligence.
Mongabay: Why would you say this work is particularly important at this point in time?
Gunter Fischer: We all know that we are in a biodiversity crisis. What’s really important is to be able to identify species as quickly as possible simply because we need to know the species to be able to conserve them. You can’t conserve what you don’t know. The issue really is how we can speed up the identification of species to make sure that they don’t go extinct. It’s very straightforward to do this in more temperate regions around the world, but in tropical regions with incredible biodiversity, it’s really hard to identify a species on the fly. Very often, it takes decades until a botanist finds time to actually have a specimen in her or his hands to be able to identify it. This is a long process. But if we can involve algorithms which can do this in a very quick way, it advances conservation significantly and there are all the downstream applications as well.
For example, if someone wants to restore a forest, they would need to know what they plant. Putting the right tree in the right place is based on the idea that you have to know your tree and then understand the ecology in order to be able to plant the tree in the right location. Especially in the tropics where you have so many different tree species, this is incredibly important. It’s all about knowing your species as quickly as possible to be able to support conservation and restoration.
Jordan Teisher: If you haven’t spent the time to try and identify a plant specimen or an insect or some member of some highly diverse group, you don’t realize the degree to which identification is an obstacle to conservation. We have the benefit of hundreds of years of expert human labor that has identified specimens in the herbarium, but we are generating data, specimens and field observations at a rate that’s very hard to keep up with. We still need that expertise to refine all those identifications, but we frequently have many more plants to identify than we have botanists to identify them. Often, the specimens will come in and sit for quite a while until someone identifies them closely enough to give them to a specialist. It might come in with only a plant family-level identification, or it might just come in with no identification at all. Just that it looked like something interesting, and someone collected it. And then, until it makes its way through that process, it can sometimes be decades. We obviously expedite things that are of really critical concern. But if this tool works like we hope it will, that can speed that process up massively.
Gunter Fischer: The idea is not to replace botanists. The idea is really that the AI technology will supplement what botanists are doing. It’s just complementing what botanists do.
Mongabay: What sparked this idea? Could you walk me through the journey of building it?
Gunter Fischer: My background is in restoration ecology, and we were thinking about how we can make use of this vast collection of 6 million specimens in order to advance conservation and restoration. Obviously there’s a huge amount of information stored in an herbarium specimen. You have the morphological shape of the dried plant specimen. You can see the margins of leaves. You can see the size of the leaves. You can see the arrangements of leaves. There are so many different things you can see [visually], but so far, we can’t really quantify this automatically. Also, there’s a huge amount of information stored in the plant label. It says where the specimen was collected, when it was collected, what’s the ecology, what kind of conditions. There’s very valuable information stored in an herbarium specimen. That information somehow needs to find its way into an application. I think this sparked the idea. How can we speed up to make that information available as quickly as possible?
Hyperspectral scanning, which is beyond the visual spectrum, picks up on things you can’t see [visually]. So basically, things like chemicals in leaves. For example, if you have aluminum in a leaf, the light gets reflected in a different way. This is another character which basically helps to identify a species. We were thinking about what kind of characters we can get from an herbarium specimen in an ideal way to basically inform conservation and restoration. I think this is sort of what sparked this.
Jordan Teisher: The first question is if we could use herbarium specimens in hyperspectral scanning to develop traits that are relevant to living species. This was addressed in part by past research that found that the hyperspectral identification improves with dried material compared to wet material, presumably because of the effect that the water content has on that signal. So that was very encouraging. There’s also a hyperspectral herbarium working group that’s working on different details of methodology and understanding on what’s the best way to scan to make it the most consistent data set. We can maybe make a fingerprint for the species, visual traits as well as hyperspectral traits, and then translate that into the field, which is the ultimate goal. We were discussing it with a private donor. We were then asked to put together a proposal to outline some more details, and the key things there were how to efficiently and quickly digitize the herbarium while maintaining the highest degree of accuracy.
Mongabay: What qualifies as good data? Could you also walk me through what the data gathering and curation process looks like?
Jordan Teisher: You want accurate transcriptions of the labels of the herbarium specimens. There’s an enormous number of transcriptions that need to be done of these labels. We start by taking a very high-resolution picture of the specimens. Those get put onto a shared portal, and then transcribers go through them and make sure they’re capturing all the information. That then goes through multiple rounds of quality control checks. We’re going to be hiring a team of 10 herbarium digitization assistants, and one of their roles will be reviewing all the transcriptions and making sure that all of that information is accurate.
The really difficult part, and the part that’s really hard to assess, is when you, for example, download data from an online repository and you want to know if the specimen is identified correctly. So one-quarter of the time of the herbarium digitization assistants and those working on the hyperspectral scanning is going to be spent in a curatorial mentorship program where they’ll be taking workshops, attending lectures, classes, pairing with a curator or curatorial assistant to develop expertise in an area of the collection where we have historically not had that much expertise. We have certain families of plants that are extraordinarily well-curated because we’ve had curators working on them for a long time. There are other families or parts of the world where we’re a little bit less confident. We want to make sure that someone is taking a look at them, updating the names, identifying anything that might be not identified, or identified incorrectly. That’s the really critical part, and it’s the part that, at least at this stage, there’s no faster way to do. You have to have a human who’s looked at the plants before who’s going to go through them and identify the errors. Having those accurate species identifications, again, is essential to building the models to identify the species.
Mongabay: How do you envision the final model? How do you see it being used by someone working in the field?
Gunter Fischer: The idea is that we create a web portal where someone can upload either an herbarium scan or hyperspectral data, and then the underlying algorithm will give you a likelihood for identification. In other words, you upload your specimen, and it will say there’s a 90% confidence that it will be species A, or 100% confidence that it is species B. That’s the ultimate outcome of the project and, in fact, it would be then available to everyone. Someone in the field collecting plant specimens can take a photo and then upload it to the website and get a likelihood back for an identification. The sensors we are using can also be mounted on a drone. So one could imagine, in the future, the drone flies over a canopy of a tropical forest scanning the canopy, and then the images taken of the individual trees and their characters can be compared with our reference library from the herbarium. This should answer a lot of questions for you. What are the trees in the canopy? How rare is a tree species? How many individuals do you have in the population? Is sustainable forestry working? Are there any species stressed by climate change? This would be a game changer in conservation biology.
Mongabay: Could you walk me through the training data and how it is being used to build the model?
Jordan Teisher: We have these portable units that scan a leaf. That is where we’re going to start, and probably scan multiple leaves per specimen so that you’re getting the full range of absorption or reflectance across the light spectrum. That goes as one portion of the data. On the visual side, there are already programs that exist to extract leaf shape information from herbarium specimens. So as long as your leaf is laid out nicely, you can extract leaf information really easily. We’re going to be hiring a postdoc to look at developing tools to extract fruit and flower morphology information from the herbarium images. It’s a combination of the leaf traits, fruit and flower traits, and the hyperspectral.
Gunter Fischer: In terms of wavelengths measured, it’s about more than 2,000 nanometers of wavelengths that are measured. For each wavelength, you get a reflectance pattern. The amount of data we get out of the hyperspectral scanning is enormous. It’s almost like a spectral fingerprint you get from a species.
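To make the "spectral fingerprint" idea concrete, here is a minimal sketch of how an unknown scan could be compared against a reference library. It is an illustration only, not the RSI team's method: the interview does not say which models will be used, so cosine similarity stands in as a simple matching rule, and the species names, the five wavelengths, and every reflectance value are made up.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two reflectance curves sampled at the same wavelengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_species(unknown, reference_library):
    """Return (species, score) pairs, best match first."""
    scores = {
        name: cosine_similarity(unknown, spectrum)
        for name, spectrum in reference_library.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reference fingerprints: reflectance at five invented wavelengths.
reference_library = {
    "species_a": [0.05, 0.08, 0.45, 0.50, 0.30],
    "species_b": [0.10, 0.12, 0.30, 0.28, 0.35],
}

# Hypothetical scan of an unidentified leaf.
unknown_scan = [0.06, 0.09, 0.44, 0.48, 0.31]

ranking = rank_species(unknown_scan, reference_library)
print(ranking[0][0])  # name of the closest reference fingerprint
```

A real fingerprint would cover the full measured range rather than five points, and a trained model would likely replace the similarity score, but the shape of the task is the same: compare one curve against many labeled ones and rank the candidates.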
Mongabay: What have been the challenges in doing this? And what hurdles do you anticipate as you build this over the next few years?
Jordan Teisher: One of the big challenges, from my perspective, is trying to digitize the herbarium without having to close down large chunks of it. We’re very lucky to have this private donor supporting us who’s been flexible with the timeline on which this is done. This six-year project enables us to digitize the herbarium as quickly as we can while maintaining functional access for our researchers.
Technologically, it will be very cutting-edge. For example, we don’t know which of those more than 2,000 wavelengths are going to be the key ones. We’ve not yet worked out all the details. The background of the paper, for example, that you scan on has an impact on your hyperscan. We need to make sure that you’re either statistically correcting for that, which we don’t yet have a means to do, or that you are doing some kind of step in there like putting up a piece of black paper behind it, for example. In terms of developing the models, there are ways in which AI performs unexpectedly well. Then there are other ways, at least in my experience, in which it performs unexpectedly poorly. We have been playing with it already for doing label transcriptions. There are times that it just kind of blows your mind about how well it does. And then other times where you kind of want to shake it and say, “I don’t know why you’re struggling so hard.” I think there’ll be lots of technical challenges, but it’s inherent to trying to do something really new.
Gunter Fischer: The active part of the research really is to figure out some of those questions. I think one of the biggest challenges ahead is figuring out how we can use the methods developed for herbarium specimens in the field. Basically, how to calibrate the same method for live material? It makes a huge difference whether you scan a dried leaf or a leaf which is alive. And so the challenge then would be how do we calibrate the reference library created from dried specimens to make sure it works with live material? If we achieve this, that’s an absolute game changer.
Mongabay: How do you see this project making an impact on the ground in the next 10 years?
Gunter Fischer: The best-case scenario, as I mentioned, would be that we have a drone which flies over the canopy and it tells you what species are there. In addition to the species composition of a forest, because you can combine the hyperspectral sensors with lidar sensors, you basically get a three-dimensional image of a forest. You can make on-the-spot conservation decisions.
One thing you can learn from hyperspectral scanning, especially the near-infrared spectrum, is the health of plants. It’s commonly used in agriculture already. In cornfields, drones are used to take near-infrared images and the infrared spectrum would tell you whether the cornfield is healthy or not or if it is water-stressed. The same technology also applies to forests. We are talking about the Amazon or rainforests in Borneo being stressed by climate change because of unusual droughts. Well, scanning in the infrared and the hyperspectral wavelength will tell you right away where the forest is stressed. It would go one step further to say, “Well, it’s not actually the entire forest which is stressed. It’s a few particular species which are actually more stressed than others.” This allows for much more targeted conservation action. I think this is a key issue because people are talking about climate change resilience. Knowing which species can cope with climate change and which ones can’t is an extremely important part of addressing that question.
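One widely used way to turn near-infrared readings into a plant health signal is the normalized difference vegetation index (NDVI), which compares near-infrared and red reflectance. The sketch below is an illustration added here, not something Fischer described; the reflectance values are invented, and the interview does not say which index, if any, the project would use.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid dividing by zero when both bands read zero
    return (nir - red) / total


# Invented readings: healthy leaves reflect strongly in the near-infrared
# and absorb red light, so their index is higher than a stressed canopy's.
healthy_canopy = ndvi(nir=0.50, red=0.08)
stressed_canopy = ndvi(nir=0.30, red=0.20)

print(healthy_canopy > stressed_canopy)
```

Computed per tree crown rather than per field, an index like this is one way to see how stress could be attributed to particular species instead of a whole forest.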
Jordan Teisher: I would also hope that it would highlight the importance of the kind of expertise that we have much too little of in terms of the ability to accurately identify species. We take it for granted because we are sitting on the shoulders of generations of botanists and biologists who’ve come before us and generated all this data. My hope is that this work encourages taxonomy as part of regular training. What I’m excited about is the element of reintroducing society to the idea of taxonomy as something worth doing and investing in.
Gunter Fischer: There are so many other applications. Climate change resilience is one aspect of it. But sustainable forestry is another important topic. What is the resilience of a forest? Is sustainable forestry really sustainable? How successful are we in restoring our ecosystems? The new technology and what we envisage could possibly help to address some of these questions. There are so many applications where a calibrated reference library of identified species could be used to accelerate downstream applications, especially in the fields of conservation and restoration.
Banner image: A golden shrimp plant (Pachystachys lutea) in Missouri Botanical Garden. Image by Sharon Mollerus via Flickr (CC BY 2.0).
Abhishyant Kidangoor is a staff writer at Mongabay. Find him on 𝕏 @AbhishyantPK.