Geneticists Debate: How Many Genes Does a Human Being Have?
In 2000, researchers faced one question: “How many genes does a human being have?”
Today, researchers have released a new analysis of the human genome sequence, yet the answer to that question remains under debate. According to the researchers, this is the result of a “knowledge gap”, which is also the biggest challenge faced by researchers working to discover new ways to block disease-related mutations.
The latest attempt to plug the gap was posted on the bioRxiv preprint server on 29 May. It includes almost 5,000 genes that had not been spotted in earlier studies, nearly 1,200 of which were identified as carrying protein-making instructions. Overall, the present tally puts the number of protein-coding genes at 20,000 to 21,000 or more.
However, many geneticists are still not convinced by the estimates, despite the massive amount of data collected since the previous research. Steven Salzberg, a computational biologist at Johns Hopkins University, says that the situation simply shows how difficult it is to identify new genes. “Researchers have been working hard on this for 20 years but until now, we still do not have the answers,” says Salzberg.
In 2000, the genomics community buzzed over the question of how many genes would be found in a human being, leading Ewan Birney to launch the GeneSweep contest. Birney invited everyone at an annual genetics meeting to bet and placed the first bet himself. The contest attracted more than 1,000 entries and built up a US$3,000 jackpot, with bets ranging from 312,000 genes down to 26,000. However, no one really won, as the number has continued to shrink and now stands at less than 21,000.
Salzberg’s team used data from the Genotype-Tissue Expression (GTEx) project, which sequenced RNA from more than 30 different tissues. The team assembled GTEx’s 900 billion tiny RNA snippets and aligned them with the human genome. This alignment allows the team to identify the genes responsible for encoding proteins.
Salzberg explains that the number can still vary depending on many factors, such as the amount of data analyzed and the criteria used to weed out false positives. “Just because a stretch of DNA is expressed as RNA, doesn’t mean it’s a gene,” says Salzberg. Researchers have to separate genuine genes from spurious ones. To do so, the team applied several criteria to filter out the false positives, then compared the results with several other species for validation.
The process yielded 21,306 protein-coding genes and 21,856 non-coding genes. However, the results have not been enough to convince other researchers, given the discrepancy with the gene set maintained by GENCODE, which includes 19,901 protein-coding genes and 15,779 non-coding genes.
Kim Pruitt, a genome researcher at the NCBI in Bethesda, says that the difference is probably due to the different volumes of data the two groups analyzed. Salzberg’s team apparently used a larger body of data and relied solely on computers for the analysis, while GENCODE uses manual curation.
Further validation needed
Despite the latest gene tally, scientists recommend conducting further research and gathering more evidence on the precise number of human genes. According to Adam Frankish, a computational biologist at the EBI, his group has re-examined Salzberg’s new gene tally but found only 1 out of 100 of the genes checked to be a true protein-coding gene.
Pruitt’s team also scanned at least 12 of Salzberg’s protein-coding genes but says that most of them actually belong to retroviruses, while others are repetitive sequences, which cannot be considered protein-coding genes.
In his defense, Salzberg says that some repetitive sequences, such as ERV3-1, can still be considered genes, even though they may be linked to cancer. Nevertheless, he understands that before his new gene tally becomes fully accepted, it has to be validated by other teams as well.
Emmanouil Dermitzakis, a geneticist at the University of Geneva, says that some of the genes identified by Salzberg’s group are valid but that the team needs to conduct more validation to weed out the invalid ones. “Perhaps the inclusion of false positives is the reason why Salzberg’s protein-coding count is 5 percent higher than the previous tallies,” says Dermitzakis.
Many geneticists were impressed with Salzberg’s effort to release a new gene tally after nearly 15 years. However, they emphasized the importance of getting an accurate tally of genes. Eventually, people will question the inconsistencies between the previous and latest gene tallies, and Salzberg will need to explain them.