Text mining has proven to be a powerful tool for rapidly analyzing large volumes of biomedical literature. It delivers valuable information that helps improve the accuracy and speed of the R&D process and creates the opportunity to make more informed business decisions. As we’ve seen in our previous articles, text mining is changing the way we work with Big Data!
An important issue in text mining is that, in many cases, the search runs only on freely available abstracts extracted from databases such as PubMed. Many scientists who use text mining are therefore analyzing the abstract of an article rather than its full text.
And that can be a problem… While abstracts do contain valuable information, they have limitations. For example, an abstract leaves out the ‘materials & methods’ section, which can significantly affect the final results of text mining.
With RightsDirect and its XML for Mining tool, you have access to millions of full-text, high-quality, peer-reviewed papers from 44 participating publishers as input for your text mining project. This database includes your company’s subscribed articles, and unsubscribed articles can also be added to give you the broadest text mining corpus available.
Let us explain some key advantages of mining full-text articles over abstract-only methods:
More Facts and Relationships
As you might guess, full articles contain more named entities, and more relationships between them, than abstracts do. A study published in the Journal of Biomedical Informatics shows that fewer than 8% of the scientific claims made within articles were found in their abstracts!
In a further investigation, one publisher conducted a study that used abstracts in PubMed to derive relevant information about drugs and proteins involved in the progression of fibromyalgia.
Mining the abstracts revealed only 31 relationships, while mining the full text of the same articles revealed 53 additional relationships.
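To put those numbers in perspective, here is a minimal Python sketch of the arithmetic. It assumes the 53 additional relationships do not overlap with the 31 found in the abstracts, as the word "additional" suggests. The variable names are ours, for illustration only, and do not come from the study.

```python
# Figures quoted above for the fibromyalgia study.
# Assumption: the 53 "additional" relationships are distinct from the 31.
abstract_relationships = 31       # found when mining abstracts only
additional_from_full_text = 53    # extra relationships found in the full text

# Everything a full-text analysis would have surfaced.
total_relationships = abstract_relationships + additional_from_full_text

# Fraction of that total that an abstract-only analysis would have seen.
abstract_share = abstract_relationships / total_relationships

print(f"Total relationships: {total_relationships}")
print(f"Share visible in abstracts alone: {abstract_share:.0%}")
```

Under that assumption, an abstract-only analysis would have captured 31 of 84 relationships, or a little over a third of what the full text contained.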
These results suggest that full-text articles are superior to abstract-only methods in delivering valuable information about potential causal relationships in pathology to researchers. But that is not all…
As highlighted before, text mining can also help uncover adverse effects before they are identified in the preclinical stage of your drug discovery process.
That brings us to the second point:
Access to Secondary Study Findings and Adverse Event Data
Because an abstract is limited in length, authors often leave out discoveries and observations that they consider less relevant or out of scope with the main idea of the publication. Only the most important findings are presented.
As a result, there is a time lag before secondary study findings show up in abstracts. In fact, it can take one to two years for a study finding to make its way into the abstract of a subsequent article.
Lastly, negative study results are often missing from abstracts. According to further research, “abstracts published in high impact factor medical journals under-report harm, even when the articles provide information in the main body of the article”.
This directly reduces the value of abstracts, especially in pharmacovigilance, which depends on adverse event data.
Text mining is changing the way we work with Big Data, but mining abstracts alone instead of full-text articles is a bit like traveling on a highway on horseback…