Guillaume Cabanac tracks fake science

03.06.2022

According to the journal Nature, Guillaume Cabanac is one of the ten people who helped shape science in 2021.

“Nucleic corrosive” for “nucleic acid”. “Counterfeit consciousness” in place of “artificial intelligence”. These are some of the “tortured” phrases that the CNRS researcher Guillaume Cabanac tracks in scientific publications in order to identify those that are unreliable. The software program that he co-developed for this purpose earned him a spot in the 2021 Top 10 selected by the journal Nature.

The scientific journal Nature, one of the world’s oldest and most reputable, has nicknamed him the “deception sleuth”. His talent for detecting unreliable publications has earned the 39-year-old Cabanac, an academic at the IRIT,1 the honour of being among the ten people who helped shape science in 2021 according to the journal.

In collaboration with Cyril Labbé, a computer scientist from Grenoble (southeastern France), and the Russian mathematician Alexander Magazinov, he conceived and developed the Problematic Paper Screener software. Each night, this tool’s Torture detector scrutinises the 120 million publications indexed by the bibliographical database Dimensions, in an effort to ferret out tortured expressions.

This term refers to incongruous assemblies of words, which are symptomatic of misconduct and fraud, and unfortunately get past the filters of certain ineffective or even fraudulent journals, thereby corrupting a small fraction of the scientific literature.

A team of invisible detectives

To understand why, we must put ourselves in the shoes of fraudsters. For them, simply cutting and pasting paragraphs from legitimate articles – the building block for creating a fake paper – is not possible due to plagiarism detectors.

However, they can paraphrase the text with synonyms thanks to software designed for this very purpose, although this may generate the famous tortured phrases. Paste the result without checking it, and you get “kidney disappointment” instead of kidney failure. “Bosom peril” rather than breast cancer. “Nucleic corrosive” in place of nucleic acid. And “counterfeit consciousness” stands in for artificial intelligence!
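
To make the mechanism concrete, here is a minimal sketch of word-by-word synonym substitution. It is not the software the fraudsters use: the synonym table and the function name `naive_paraphrase` are hypothetical, chosen only to reproduce the examples quoted above. The point is that swapping each word independently ignores the fact that “kidney failure” is a fixed technical term.

```python
import re

# Hypothetical synonym table: a toy stand-in for a paraphrasing tool's thesaurus.
# The pairs are chosen to reproduce the examples quoted in the article.
SYNONYMS = {
    "failure": "disappointment",
    "breast": "bosom",
    "cancer": "peril",
    "acid": "corrosive",
    "artificial": "counterfeit",
    "intelligence": "consciousness",
}


def naive_paraphrase(text: str) -> str:
    """Replace each word on its own, ignoring the fixed term it belongs to."""
    def swap(match):
        word = match.group(0)
        # Words without an entry in the table are left unchanged.
        return SYNONYMS.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, text)


for term in ("kidney failure", "breast cancer", "nucleic acid", "artificial intelligence"):
    print(term, "->", naive_paraphrase(term))
```

Because the substitution never checks whether two adjacent words form an established expression, the output passes a plagiarism detector but reads as nonsense to a specialist.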

The trio chose a participatory approach for post-publication reassessment. In short, an invisible college of fifty “detectives”, made up of researchers and enthusiasts, identifies new tortured phrases that Cabanac adds to the software. Articles that contain such terms are then flagged as problematic and reported on PubPeer.org, a platform managed by two CNRS researchers where scientists from all disciplines and countries can contribute their expertise by adding comments to reassess flagged papers.

Currently the software is tracking 550 tortured phrases among the 16,000 new papers Dimensions indexes daily. Of the nearly 6 million articles published in 2021, more than 1,500 were identified as containing at least two tortured phrases, a criterion under which at least 98% prove to be genuine cases of plagiarism after reassessment. The phenomenon is on the increase: by the same criterion, there were 416 such cases in 2018, 638 in 2019, and 1,135 in 2020.
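
The flagging rule described here can be sketched in a few lines. This is an illustration under stated assumptions, not the Problematic Paper Screener's actual code: the phrase list is a four-item sample taken from this article (the real list is said to hold about 550 entries), the names `find_tortured_phrases` and `is_flagged` are hypothetical, and “at least two tortured phrases” is read here as two distinct phrases.

```python
from typing import List

# Hypothetical sample list, limited to the examples quoted in the article.
TORTURED_PHRASES = [
    "kidney disappointment",
    "bosom peril",
    "nucleic corrosive",
    "counterfeit consciousness",
]

# Assumption: the two-phrase criterion counts distinct phrases.
MIN_DISTINCT_PHRASES = 2


def find_tortured_phrases(text: str) -> List[str]:
    """Return the known tortured phrases found in the text, ignoring case."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase.
    normalised = " ".join(text.lower().split())
    return [phrase for phrase in TORTURED_PHRASES if phrase in normalised]


def is_flagged(text: str) -> bool:
    """Flag a paper only when it contains enough distinct tortured phrases."""
    return len(find_tortured_phrases(text)) >= MIN_DISTINCT_PHRASES


abstract = (
    "We apply counterfeit consciousness to predict kidney\n"
    "disappointment in patients."
)
print(find_tortured_phrases(abstract))
print(is_flagged(abstract))
```

Requiring two matches rather than one trades some recall for precision: a single odd expression may be an honest translation slip, whereas several in the same paper are much harder to explain innocently.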

In addition, another problematic paper screener, developed in collaboration with Cyril Labbé and Australian researchers, enabled them to flag some 700 articles on oncology published between 2014 and 2018 that included errors in genetic sequences, and have nonetheless been cited more than 20,000 times in other scientific articles.

Cabanac’s colleague Labbé enjoys “uncovering certain ‘surreal’ papers. But for Guillaume, indignation often wins out”. Fewer than three “pollutions” for every 10,000 articles may not seem much, but “would we accept three of the world’s 10,000 daily flights ending in a crash?” Cabanac wonders. “Like a house, science is made of building blocks. The use of unreliable and even corrupt elements – and their reuse by the scientific community – could eventually threaten the structure itself!”
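
The “fewer than three for every 10,000” figure is consistent with the 2021 numbers given earlier, as this short check suggests. It uses the article's rounded values (“more than 1,500” flagged articles out of “nearly 6 million”), so the result is an approximation; the function name `rate_per_10k` is a hypothetical helper.

```python
def rate_per_10k(flagged: int, published: int) -> float:
    """Number of flagged articles per 10,000 published articles."""
    return flagged * 10_000 / published


# Rounded figures quoted in the article for 2021.
FLAGGED_2021 = 1_500
PUBLISHED_2021 = 6_000_000

print(rate_per_10k(FLAGGED_2021, PUBLISHED_2021))  # 2.5
```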

Cleaners versus paper mills

This enthusiasm for “cleaning” may stem from Cabanac’s family and moral values. The researcher’s grandfather was a trade union representative driven by the light of learning throughout his life, and Cabanac, who is from a modest background, has always been attracted to culture. He believes in the ethos of science, especially in impartiality and organised scepticism.

However, due to the mass dissemination of publications (which have almost doubled in ten years), “these two safeguards theorised by the sociologist Robert K. Merton2 have today been trampled underfoot by ‘paper mills’,”3 which churn out patchwork articles by the kilometre, stealing from and paraphrasing reliable publications. All of this takes place in a highly competitive world where researchers are often subject to the famous “publish or perish” imperative.

It was during a post-graduate degree course in business informatics that research seriously drew Cabanac’s attention. His internship as a programmer working with a doctoral student at Université Paul-Sabatier in Toulouse proved to be an arduous, stimulating, and foundational experience. He pursued a master’s degree at the university, and secured government funding for his dissertation on collective annotation activity for digital texts, doing so in the same laboratory that would welcome him as an associate professor in 2009, nine months after he defended his thesis.

“When does science fail to correct itself?”

Motivated by an interdisciplinary approach, social bonds, and human relations, he approached researchers in the sociology of science, and in 2016 completed his HDR (Accreditation to Supervise Research) dissertation entitled “Questioning scientific texts”. He believes that his work, which requires very few resources, reflects his social background: “All I need is a pencil, a notebook, a computer, and an Internet connection!” He thinks of himself as a full stack person: as a researcher he has ideas; as an engineer he solves problems; as a technician he programs and gets his hands dirty in digital grease; and as a teacher he believes that his courses help fuel his research.

Today Cabanac is a member of the Scientific Board of the CNRS’s Institute for Humanities and Social Sciences. He works on the preventive component of cleaning operations for the French-Dutch ERC project Nanobubbles, which tackles a crucial issue: “How, when and why does science fail to correct itself?” The trio of detectives has been approached by a dozen publishers such as ACM, Elsevier, IEEE, the Institute of Physics, SAGE, and Springer Nature to integrate the Problematic Paper Screener ahead of the publishing process.

Given today’s infodemic, the Toulouse researcher’s initiative appears increasingly relevant. The French Office for Research Integrity (OFIS) still considers “post-publication comments” to be part of “a scientist’s ordinary activity, in the same manner as the traditional peer evaluation”4, Cabanac points out. With contagious enthusiasm, he invites everyone to join in the effort to help clean scientific literature.

Footnotes

1. Institut de recherche en informatique de Toulouse (CNRS / Toulouse INP / Université Toulouse III - Paul Sabatier).
2. In “A note on science and democracy,” Robert K. Merton, 1942, Journal of Legal and Political Sociology.
3. “Paper mills” is a common expression used in major scientific publications such as Nature (see for example https://doi.org/10.1038/d41586-021-00733-5).
4. Note from 27 September 2021, https://www.hceres.fr/en/node/30562040