Retraction Watch

This notebook finds Rogue Scholar blog posts about the Retraction Watch project using the Rogue Scholar API.

Author: Martin Fenner
Affiliation: Front Matter
Published: November 10, 2023

Introduction

This notebook finds Rogue Scholar blog posts about the Retraction Watch project using the Rogue Scholar API. Retraction Watch reports on retractions of scientific papers. The project was started in 2010 by Ivan Oransky and Adam Marcus.

Note
  • We use the query retraction watch.
  • We limit results to posts published since 2010 (the year Retraction Watch launched) and written in English (language code en).
  • We retrieve the title, authors, publication date, abstract (the summary field), blog name, blog_slug, DOI, URL, and image of each post.
  • We sort the results in reverse chronological order (newest first).
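
The parameters listed above can be assembled into a request URL without hand-building the query string. The following is a minimal sketch using only the standard library. The parameter names mirror those in this notebook's request; the helper name `build_posts_url` is hypothetical and not part of the Rogue Scholar API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.rogue-scholar.org/"

def build_posts_url(query: str, published_since: str, language: str = "en") -> str:
    """Build the posts search URL (hypothetical helper for illustration)."""
    params = {
        "query": query,
        "published_since": published_since,
        "language": language,
        "sort": "published_at",
        "order": "desc",
        "per_page": 50,
        "include_fields": "title,authors,published_at,summary,blog_name,blog_slug,doi,url,image",
    }
    # urlencode escapes spaces as "+"; safe="," keeps the field list readable
    return BASE_URL + "posts?" + urlencode(params, safe=",")

url = build_posts_url("retraction watch", "2010")
print(url)
```

Keeping the filters in a dictionary makes it easy to change the start year or language in one place instead of editing the URL string.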

Results

We found 22 blog posts mentioning retraction watch out of 10560 total posts, and ended up with 16 posts after manual curation:

flowchart LR
  A[10560] -- Query: retraction watch --> B(22)
  B -- Manual curation --> C(16)
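
The manual curation step amounts to dropping hits by their position in the result list. The following is a minimal sketch with made-up hits: the titles and excluded positions are placeholders, not the actual search results. Wrapping each post in a `document` key follows the response shape used in this notebook's code.

```python
# Placeholder hits shaped like the API response used in this notebook:
# each hit wraps the post in a "document" key.
hits = [{"document": {"title": f"Post {i}"}} for i in range(6)]

# Zero-based positions of hits judged off-topic during manual review (made up).
excluded = {1, 4}

# Keep every hit whose position is not on the exclusion list.
posts = [hit["document"] for i, hit in enumerate(hits) if i not in excluded]
titles = [post["title"] for post in posts]
print(titles)
```

Because the exclusion list refers to positions, it depends on the sort order and the current result set, so it needs to be reviewed whenever new posts match the query.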
Code
import requests
import locale
import json
import pydash as py_
import re
import html
from typing import Optional
import datetime
from IPython.display import Markdown
locale.setlocale(locale.LC_ALL, "en_US")
baseUrl = "https://api.rogue-scholar.org/"
query = "retraction watch"
published_since = "2010"
feature_image = 0
curated = [1,2,3,9,12,16]

include_fields = "title,authors,published_at,summary,blog_name,blog_slug,doi,url,image"
url = baseUrl + f"posts?query={query.replace(' ', '+')}&published_since={published_since}&language=en&sort=published_at&order=desc&per_page=50&include_fields={include_fields}"
response = requests.get(url)
response.raise_for_status()  # stop early on HTTP errors instead of parsing an error body
result = response.json()

def get_post(post):
    return post["document"]

def format_post(post):
    doi = post.get("doi", None)
    url = f"[{doi}]({doi})\n<br />" if doi else ""
    title = f"[{post['title']}]({doi})" if doi else f"[{post['title']}]({post['url']})"
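    # utcfromtimestamp() is deprecated since Python 3.12; "%-d" (day without leading zero) is not available on Windows
    published_at = datetime.datetime.fromtimestamp(post["published_at"], tz=datetime.timezone.utc).strftime("%B %-d, %Y")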
    blog = f"[{post['blog_name']}](https://rogue-scholar.org/blogs/{post['blog_slug']})"
    author = ", ".join([ f"{x['name']}" for x in post.get("authors", None) or [] ])
    summary = post["summary"]
    return f"### {title}\n{url}Published {published_at} in {blog}<br />{author}<br /><br />{summary}\n"

posts = [ get_post(x) for i, x in enumerate(result["hits"]) if i not in curated]
posts_as_string = "\n".join([ format_post(x) for x in posts])

def doi_from_url(url: str) -> Optional[str]:
    """Return a DOI from a URL"""
    match = re.search(
        r"\A(?:(http|https)://(dx\.)?(doi\.org|handle\.stage\.datacite\.org|handle\.test\.datacite\.org)/)?(doi:)?(10\.\d{4,5}/.+)\Z",
        url,
    )
    if match is None:
        return None
    return match.group(5).lower()

# Get csl-formatted metadata for all posts that have a DOI
def get_csl(post):
    doi = doi_from_url(post["doi"])
    res = requests.get(baseUrl + "posts/" + doi + "?format=csl")
    csl = res.json()
    csl["title"] = html.unescape(csl["title"])
    return json.dumps(csl, indent=2)

csl_list = "[\n" + ",\n".join([ get_csl(x) for x in posts if x.get("doi", None) is not None ]) + "\n]"
with open('references.json', 'w') as f:
    f.write(csl_list)

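images = [ x["image"] for x in posts if x.get("image", None) is not None ]
# Use the selected post image as feature image, and skip it if no post has one
image = images[feature_image] if len(images) > feature_image else None
markdown = f"![]({image})\n\n" if image else ""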
markdown += posts_as_string
Markdown(markdown)
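
The `doi_from_url` helper in the code above can be checked offline before any request is made. The following sketch repeats the same regular expression and runs it on a few inputs. The `example.org` URL is a made-up non-DOI input, and the uppercase DOI is only there to show the lowercasing.

```python
import re
from typing import Optional

# Same pattern as in the notebook: optional resolver prefix, optional "doi:", then the DOI itself.
DOI_PATTERN = re.compile(
    r"\A(?:(http|https)://(dx\.)?(doi\.org|handle\.stage\.datacite\.org|handle\.test\.datacite\.org)/)?(doi:)?(10\.\d{4,5}/.+)\Z"
)

def doi_from_url(url: str) -> Optional[str]:
    """Return the lowercased DOI contained in a URL or DOI string, or None."""
    match = DOI_PATTERN.search(url)
    return match.group(5).lower() if match else None

print(doi_from_url("https://doi.org/10.59350/dakrb-j7a75"))
print(doi_from_url("doi:10.53731/GZRSE-P5D35"))
print(doi_from_url("https://example.org/some-post"))
```

Since the helper returns `None` for strings that are not DOIs, callers that concatenate the result into a request path may want to check for `None` first.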

Generating Overlay blog posts

https://doi.org/10.53731/gzrse-p5d35
Published October 11, 2023 in Front Matter
Martin Fenner

On Monday the Rogue Scholar science blog archive launched a dedicated API.

(The?) 3 kinds of papermills

https://doi.org/10.59350/dakrb-j7a75
Published October 31, 2022 in Stories by Adam Day on Medium
Adam Day

TL;DR: Join me at ConTech Live to hear about a recent project with Open Credo to see if we could detect unusual co-authorships in a dataset created by Anna Abalkina. Sign up here! Papermilling has a few definitions which you see here and there.

This blog turned 15 (years old) this month

https://doi.org/10.53731/bs60jms-sqaehsk
Published August 11, 2022 in Front Matter
Martin Fenner

The first post on this blog was published on August 3, 2007 (Open access may become mandatory for NIH-funded research). This is post number 465, and in the past 15 years the blog has seen changes in technology and hosting location – but I wrote all posts (with the exception of a few guest posts). The overall theme remained unchanged: technology used in scholarly communication.

When your journal reads you

Published April 14, 2021 in Elephant in the Lab
Elias Koch

Introduction Renke Siems In December 2018, a University of Minnesota web librarian, Cody Hanson, participated in a workshop hosted by the Coalition for Networked Information. The topic of this, and a number of other events to date, is the drive by major scholarly publishers to more fully integrate authentication systems for accessing electronic media into their platforms.

What’s going on with Oculudentavis?

https://doi.org/10.59350/hk3jx-6sv77
Published July 22, 2020 in Sauropod Vertebra Picture of the Week
Mike Taylor

Back in March, Nature published “Hummingbird-sized dinosaur from the Cretaceous period of Myanmar” by Xing et al. (2020), which described and named a tiny putative bird that was preserved in amber from Myanmar (formerly Burma). It’s a pretty spectacular find. Today, though, that paper is retracted. That’s a very rare occurrence for a palaeontology paper.

Suppression as a form of liberation?

https://doi.org/10.59350/v5rp0-nde12
Published July 3, 2020 in A blog by Ross Mounce
Ross Mounce

On Monday 29th June 2020, I learned from Retraction Watch that Clarivate, the for-profit proprietor of Journal Impact Factor ™ has newly “suppressed” 33 journals from their indexing service.

The R2R debate, part 1: opening statement in support

https://doi.org/10.59350/c8fz7-kyc20
Published February 27, 2020 in Sauropod Vertebra Picture of the Week
Mike Taylor

This Monday and Tuesday, I was at the R2R (Researcher to Reader) conference at BMA House in London.

Guest Blog: Data in the time of Coronavirus

https://doi.org/10.59350/qh3na-ehy20
Published February 4, 2020 in GigaBlog
Scott Edmunds

With much of the GigaScience team spanning the Hong Kong-Shenzhen border and now confined to remote working, the current 2019-novel coronavirus outbreak has been particularly disruptive and close to home.

The lowest common denominator: marketing science with jIF

https://doi.org/10.59350/8wafm-6yc04
Published July 8, 2016 in GigaBlog
Scott Edmunds

Shallow Impact. Tis the season. In case people didn’t know— the world of scientific publishing has seasons: There is the Inundation season, which starts in November as authors rush to submit their papers before the end of year. Then there is the Recovery season beginning in January as editors come back from holidays to tackle the glut.

My Blank Pages V: Raw Data

https://doi.org/10.59350/j2nba-89g15
Published March 31, 2016 in quantixed
Stephen Royle

Raw Data: A novel on Life in Science by Pernille Rørth (Springer, 2016) I was keen to read this “lab lit” novel written by renowned cell biologist Pernille Rørth. I’d seen lots of enthusiastic comments about the book, and it didn’t disappoint.

What Difference Does It Make?

https://doi.org/10.59350/cmyz9-ms451
Published January 1, 2016 in quantixed
Stephen Royle

A few days ago, Retraction Watch published the top ten most-cited retracted papers. I saw this post with a bar chart to visualise these citations. It didn’t quite capture what the effect (if any) a retraction has on citations. I thought I’d quickly plot this out for the number one article on the list. The plot is pretty depressing. The retraction has no effect on citations.

The Medical Journal of Australia vs Elsevier

https://doi.org/10.59350/6m0m9-bng28
Published May 6, 2015 in Sauropod Vertebra Picture of the Week
Matt Wedel

While Mike’s been off having fun at the Royal Society, this has been happening: Lots of feathers flying right now over the situation at the Medical Journal of Australia (MJA). The short, short version is that AMPCo, the company that publishes MJA, made plans to outsource production of the journal, and apparently some sub-editing and […]

Getting Techy With It: GigaScience Technology Update 2014

https://doi.org/10.59350/tzayy-hvn20
Published November 27, 2014 in GigaBlog
Nicole Nogoy

When it comes to technology, GigaScience has always been open and willing to embrace new ways of integrating technology in its publishing processes, with the ultimate goal of working towards more reproducible, interactive and executable papers.

Continuing the push beyond static documents. ISMB, and more on our “What Bioinformaticians need to know about digital publishing beyond the PDF2” workshop

https://doi.org/10.59350/gtw2x-fc921
Published July 31, 2014 in GigaBlog
Scott Edmunds

Boston 2014: More than a (Bioinformatics) Feeling Following from our previous posting on BOSC, our birthday and the BMC Open Data award party in Boston, on top of having to dash between the many great talks and sessions at ISMB, we were kept even busier than usual helping to organize and present in a special Beyond-the-PDF inspired “What Bioinformaticians need to know about digital publishing beyond the PDF” workshop at the end

Rewarding Reproducibility: First Papers in our Galaxy Series utilizing our GigaGalaxy platform

https://doi.org/10.59350/w8kyr-0e629
Published February 6, 2014 in GigaBlog
Scott Edmunds

Push the button! GigaScience moves toward more interactive articles Research articles are being published with increasingly large and complicated supporting datasets, together with the software code used in analyses of the data.

The difficulties sharing neuroscience data: can data publishing help?

https://doi.org/10.59350/kmne3-5xg86
Published May 9, 2013 in GigaBlog
Scott Edmunds

Last week we published our first neuroscience data note containing 10GB of fMRI data hosted and integrated into the paper by a DOI to our GigaDB database. While we have published a number of genomics datasets and data notes (see the Puerto Rican Parrot genome data note and its associated data DOI), this is a nice example of us providing a home for “orphan data”, the long tail of data types without community agreed curated repositories.

References

Day, A. (2022). (The?) 3 kinds of papermills. Stories by Adam Day on Medium. https://doi.org/10.59350/dakrb-j7a75
Edmunds, S. (2013). The difficulties sharing neuroscience data: can data publishing help? GigaBlog. https://doi.org/10.59350/kmne3-5xg86
Edmunds, S. (2014a). Continuing the push beyond static documents. ISMB, and more on our “What Bioinformaticians need to know about digital publishing beyond the PDF2” workshop. GigaBlog. https://doi.org/10.59350/gtw2x-fc921
Edmunds, S. (2014b). Rewarding Reproducibility: First Papers in our Galaxy Series utilizing our GigaGalaxy platform. GigaBlog. https://doi.org/10.59350/w8kyr-0e629
Edmunds, S. (2016). The lowest common denominator: marketing science with jIF. GigaBlog. https://doi.org/10.59350/8wafm-6yc04
Edmunds, S. (2020). Guest Blog: Data in the time of Coronavirus. GigaBlog. https://doi.org/10.59350/qh3na-ehy20
Fenner, M. (2022). This blog turned 15 (years old) this month. Front Matter. https://doi.org/10.53731/bs60jms-sqaehsk
Fenner, M. (2023). Generating Overlay blog posts. Front Matter. https://doi.org/10.53731/gzrse-p5d35
Mounce, R. (2020). Suppression as a form of liberation? A Blog by Ross Mounce. https://doi.org/10.59350/v5rp0-nde12
Nogoy, N. (2014). Getting Techy With It: GigaScience Technology Update 2014. GigaBlog. https://doi.org/10.59350/tzayy-hvn20
Royle, S. (2016a). My Blank Pages V: Raw Data. Quantixed. https://doi.org/10.59350/j2nba-89g15
Royle, S. (2016b). What Difference Does It Make? Quantixed. https://doi.org/10.59350/cmyz9-ms451
Taylor, M. (2020a). The R2R debate, part 1: opening statement in support. Sauropod Vertebra Picture of the Week. https://doi.org/10.59350/c8fz7-kyc20
Taylor, M. (2020b). What’s going on with Oculudentavis? Sauropod Vertebra Picture of the Week. https://doi.org/10.59350/hk3jx-6sv77
Wedel, M. (2015). The Medical Journal of Australia vs Elsevier. Sauropod Vertebra Picture of the Week. https://doi.org/10.59350/6m0m9-bng28