Today, food and herbs don’t come from people’s own or their grandparents’ backyards, so it is not always easy to make sure that the package contains exactly what it says it should.
A new way to do just that is offered by a methodology developed in a doctoral thesis defended at the University of Tartu, which allows species to be identified on the basis of short DNA sequences, even in processed foods.
“People want to know what their food consists of and for some people, such as people with food allergies, this is not just interesting information, but it’s also important for their health and safety. Even small amounts of allergenic plant residues can mean anaphylaxis or, in the worst case, death,” said Kairi Raime, author of the doctoral thesis and bioinformatics specialist of the Competence Centre on Health Technologies.
There is a similar story with natural remedies. There are cases where one type of herbal active substance has been secretly exchanged for another in order to keep the price favourable. In the worst case, the consumption or use of such products could result in serious poisoning for the consumer.
The methods that have served scientists in previous decades have their drawbacks. In the case of finely ground powder, there is not much to do with analyses of the shape, colour and other similar properties of the food. Nut or lupine flour is not easily detected in this way. Conventional chemical analyses can also be complex.
Here’s where the revolution in DNA sequencing comes to the rescue. “When we take a food, we find traces of the DNA of many organisms that have been used to make it or that have been in contact with it in some way,” Raime pointed out. Even a small amount of DNA is enough to detect these organisms.
In most cases, the PCR method is used. The same approach has been used to confirm the presence of the coronavirus in patient samples. To do this, characteristic sections of the genome of a virus or other organism are selected in advance, and matching primers attach to them if they are present. The targeted DNA is then amplified until it becomes detectable by the analytical equipment. This solution is good enough if the goal is to identify only a few specific species.
“At this point, it (PCR) is a little out of date. If we want to sequence all the DNA in food and identify dozens of species, the development of this methodology will be significantly more expensive and may not be cost-effective at all,” the new doctor explained.
The use of the PCR method is also complicated by the presence of many repeats in plant genomes. PCR primers designed for such repetitive regions bind to many different sites in the genome, which reduces the amplification yield of the selected genomic region and may also affect species detection by PCR.
Kairi Raime and her colleagues used a more innovative approach. DNA sequencing technologies developed to date are cheap enough to sequence all the DNA in a sample. As a result, scientists are already building a huge database of DNA sequences. All that remains is to work out which questions the data can reasonably answer. That’s exactly what the bioinformatics specialist did.
Raime identified characteristic k-mers, that is, short DNA sequences of a fixed length k, in several species that are important in human food. Although plant genomes may be billions of base pairs long, it can be relatively easy to find a specific combination of letters, or nucleotides, that is long enough to occur only in the species of interest.
In this case, the bioinformatics specialist used 32-mers, or DNA sequences of 32 nucleotides in length. The advantage of the method is that hundreds or thousands of sequences unique to that species, located throughout the genome, are used to identify the species.
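The idea of species-specific k-mers can be illustrated with a minimal Python sketch. It is not the tool used in the thesis: the function names, the toy sequences and the short k used in the usage note are assumptions made for illustration, and the sketch ignores the reverse-complement strand for simplicity.

```python
from typing import Dict, Set

K = 32  # k-mer length reported in the article


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return all k-mers of seq that consist only of A, C, G and T."""
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers with ambiguous bases such as N
            found.add(kmer)
    return found


def species_specific_kmers(genomes: Dict[str, str], k: int = K) -> Dict[str, Set[str]]:
    """For each species, keep only the k-mers absent from every other genome."""
    per_species = {name: kmers(seq, k) for name, seq in genomes.items()}
    unique = {}
    for name, own in per_species.items():
        others: Set[str] = set()
        for other_name, other_kmers in per_species.items():
            if other_name != name:
                others |= other_kmers
        unique[name] = own - others
    return unique
```

With two made-up sequences and k=4, `species_specific_kmers({"lupin": "ACGTACGGA", "wheat": "ACGTTTGGA"}, k=4)` drops the shared k-mer `ACGT` and keeps five k-mers for each species. Real genomes are far too large for in-memory Python sets like these, so dedicated k-mer counting software would be needed in practice.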
The search for suitable k-mers in nuclear genomes was hampered by the fact that the nuclear genomes of many plant species have not yet been sequenced. “There are hundreds of thousands of plant species. Nuclear genome sequences are still not documented enough to find k-mers unique to a particular plant species from the nuclear genomes,” said Raime.
However, there are enough databases of plastid genome sequences that could still be used to find unique k-mers for plants. Plastids are similar in nature to the mitochondria of animal cells. However, instead of having the genes essential for energy production, they have genes necessary for the synthesis of organic matter or the formation of reserves.
There is another advantage to using plastids. Unlike the nuclear genome, plastid genomes are present in many copies in each plant cell. This helps ensure that each plant species leaves its mark on the sample.
In the pilot studies, Kairi Raime used the developed methodology to find unique k-mers for, among others, tomatoes, rice, and corn. Experiments with lupine seeds, lupine flour and wheat flour confirmed that the method is sufficiently sensitive. For example, lupine DNA could be detected both in lupine seeds and in biscuits made from more than 99 percent wheat flour. It’s true that the lower the allergen content, the more sequences per sample had to be considered for a reliable result and the more expensive the analysis became.
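Detection of this kind can be pictured as counting how often a species' marker k-mers turn up in the sequencing reads from a sample. The following Python sketch is an illustration under assumptions, not the pipeline from the thesis: the function names and toy reads are hypothetical, and both strands are checked because a read can come from either strand of the DNA.

```python
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (other letters are kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_marker_hits(reads: Iterable[str], markers: Set[str], k: int) -> int:
    """Count occurrences of marker k-mers in the reads, on either strand.

    A marker that equals its own reverse complement would be counted twice;
    the sketch accepts this for simplicity.
    """
    hits = 0
    for read in reads:
        read = read.upper()
        for strand in (read, reverse_complement(read)):
            for i in range(len(strand) - k + 1):
                if strand[i:i + k] in markers:
                    hits += 1
    return hits
```

For the toy markers `{"ACGG", "CGGA"}` with k=4, the reads `["TTACGGATT", "TCCGTAA", "GGGGGGGG"]` give four hits: two in the first read and two on the reverse strand of the second. How many hits are needed before a species is reported as present is a calibration question that this sketch does not address.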
“It all depends on what level of sensitivity you’re after. Deeper sequencing is more expensive, but in terms of price, 20 million to 30 million sequences are still optimal, and we found lupine flour in a cookie even if there was only 0.02 percent present,” Raime noted. According to the bioinformatics specialist, the accuracy of the k-mer approach is comparable to that of the PCR method, which the community regards as the most sensitive.
Since the identification of plant species is based on short DNA sequences, it does not matter much if the DNA strands begin to break down into shorter pieces due to heat.
In the future, verification of food safety by k-mer analysis could be reminiscent of taking a single blood sample. Once all the DNA in the sample has been sequenced, it is easy to check, as required, which other species are present in addition to the one originally of interest. “In addition to identifying specific allergens, we can use the DNA sequence data from the same sample to determine if, for example, bacteria, mould, or anything else is present, without additional laboratory work,” added the bioinformatics specialist.
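A rough back-of-the-envelope calculation shows why sequencing depth matters at such low concentrations. The Python sketch below rests on a strong simplifying assumption that is not taken from the thesis: that an ingredient's share of the sequenced reads equals its stated share of the product. In reality, DNA content per gram differs between ingredients and processing affects recovery. The function name is hypothetical.

```python
def expected_ingredient_reads(total_reads: int, dna_fraction: float) -> float:
    """Expected number of reads from one ingredient.

    Assumes reads are sampled in proportion to the ingredient's share of the
    DNA in the sample, which is a simplification.
    """
    return total_reads * dna_fraction


# 0.02 percent expressed as a fraction is 0.0002.
for depth in (20_000_000, 30_000_000):
    print(depth, round(expected_ingredient_reads(depth, 0.0002)))
```

Under this assumption, 20 million and 30 million reads would include roughly 4,000 and 6,000 reads from the trace ingredient, and only some of those would contain species-specific plastid k-mers. Lower concentrations shrink that number proportionally, which is consistent with the need for deeper and more expensive sequencing.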
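The reasoning can be made concrete with a small Python sketch: a DNA fragment of length L contains L - k + 1 overlapping k-mers, so any fragment that is at least k nucleotides long still contributes. The function name and the example lengths are illustrative assumptions.

```python
def kmers_in_fragment(fragment_length: int, k: int = 32) -> int:
    """Number of overlapping k-mers in a fragment of the given length."""
    return max(0, fragment_length - k + 1)


# A 100-nucleotide fragment holds 69 overlapping 32-mers, a 32-nucleotide
# fragment holds exactly one, and anything shorter holds none.
for length in (100, 32, 31):
    print(length, kmers_in_fragment(length))
```

Fragments only stop being informative once they fall below the k-mer length itself.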
In principle, after the development of the methodology, similar analyses could now be performed by any company that can handle second-generation DNA sequencing and sequence analysis.
“However, I think that currently, the Estonian food industry does not have such opportunities and capabilities for DNA sequence analysis. In the case of interesting research questions, they can always contact the UT Chair of Bioinformatics or the Competence Centre on Health Technologies, where we are currently developing a methodology for identifying different species from food,” confirmed Raime.
The doctoral thesis is available in full in the digital collection of the University of Tartu. The work was supervised by Maido Remm, Head of the Chair of Bioinformatics at the University of Tartu.
The translation of this article from the Estonian Public Broadcasting science news portal Novaator was funded by the European Regional Development Fund through the Estonian Research Council. The translated article was first published on the website of Research in Estonia.