flowchart LR
  A[10560] -- Query: retraction watch --> B(22)
  B -- Manual curation --> C(12)
Retraction Watch
This notebook finds Rogue Scholar blog posts about the Retraction Watch project using the Rogue Scholar API.
Introduction
Retraction Watch reports on retractions of scientific papers. The project was started in 2010 by Ivan Oransky and Adam Marcus.
- We use the query retraction watch.
- We limit results to posts published since 2010 (the year Retraction Watch launched) and en as language.
- We retrieve the title, authors, publication date, abstract, blog name, blog_slug, and doi.
- We sort the results in reverse chronological order (newest first).
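As a rough sketch of how these settings map onto a single request URL, the snippet below assembles the query string with the standard library. The parameter names and values are taken from the URL used in the Code section; the helper name build_search_url is hypothetical, and the snippet only builds the URL without calling the API.

```python
from urllib.parse import urlencode

# Base URL as used in the Code section of this notebook
BASE_URL = "https://api.rogue-scholar.org/"

# Fields requested for each post (same list as in the Code section)
INCLUDE_FIELDS = "title,authors,published_at,summary,blog_name,blog_slug,doi,url,image"


def build_search_url(query: str, published_since: str = "2010", language: str = "en") -> str:
    """Hypothetical helper: assemble the posts search URL from its parts."""
    params = {
        "query": query,
        "published_since": published_since,
        "language": language,
        "sort": "published_at",  # sort by publication date ...
        "order": "desc",         # ... newest first
        "per_page": 50,
        "include_fields": INCLUDE_FIELDS,
    }
    # urlencode turns spaces into "+"; safe="," keeps the field list readable
    return BASE_URL + "posts?" + urlencode(params, safe=",")


search_url = build_search_url("retraction watch")
print(search_url)
```

Building the query from a dictionary keeps each setting from the list above visible in one place, and lets the standard library handle escaping instead of manual string replacement.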
Results
We found 22 blog posts mentioning retraction watch out of 10560 total posts, and ended up with 12 posts after manual curation:
Code
import requests
import locale
import json
import pydash as py_
import re
import html
from typing import Optional
import datetime
from IPython.display import Markdown

# Format dates (month names) in US English
locale.setlocale(locale.LC_ALL, "en_US")

baseUrl = "https://api.rogue-scholar.org/"
query = "retraction watch"
published_since = "2010"
feature_image = 0  # index of the post image shown above the results
curated = [1,2,3,9,12,16]  # positions of hits removed during manual curation

include_fields = "title,authors,published_at,summary,blog_name,blog_slug,doi,url,image"
url = baseUrl + f"posts?query={query.replace(' ', '+')}&published_since={published_since}&language=en&sort=published_at&order=desc&per_page=50&include_fields={include_fields}"
response = requests.get(url)
result = response.json()

def get_post(post):
    return post["document"]

# Format a single post as Markdown: linked title, DOI, date, blog, authors, summary
def format_post(post):
    doi = post.get("doi", None)
    url = f"[{doi}]({doi})\n<br />" if doi else ""
    title = f"[{post['title']}]({doi})" if doi else f"[{post['title']}]({post['url']})"
    published_at = datetime.datetime.utcfromtimestamp(post["published_at"]).strftime("%B %-d, %Y")
    blog = f"[{post['blog_name']}](https://rogue-scholar.org/blogs/{post['blog_slug']})"
    author = ", ".join([ f"{x['name']}" for x in post.get("authors", None) or [] ])
    summary = post["summary"]
    return f"### {title}\n{url}Published {published_at} in {blog}<br />{author}<br /><br />{summary}\n"

# Keep only the hits that were not excluded during manual curation
posts = [ get_post(x) for i, x in enumerate(result["hits"]) if i not in curated]
posts_as_string = "\n".join([ format_post(x) for x in posts])

def doi_from_url(url: str) -> Optional[str]:
    """Return a DOI from a URL"""
    match = re.search(
        r"\A(?:(http|https)://(dx\.)?(doi\.org|handle\.stage\.datacite\.org|handle\.test\.datacite\.org)/)?(doi:)?(10\.\d{4,5}/.+)\Z",
        url,
    )
    if match is None:
        return None
    return match.group(5).lower()

# Get csl-formatted metadata for all posts that have a DOI
def get_csl(post):
    doi = doi_from_url(post["doi"])
    res = requests.get(baseUrl + "posts/" + doi + "?format=csl")
    csl = res.json()
    csl["title"] = html.unescape(csl["title"])
    return json.dumps(csl, indent=2)

csl_list = "[\n" + ",\n".join([ get_csl(x) for x in posts if x.get("doi", None) is not None ]) + "\n]"
with open('references.json', 'w') as f:
    f.write(csl_list)
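
# Collect the post images and pick the feature image
images = [ x["image"] for x in posts if x.get("image", None) is not None ]
image = images[feature_image]
# Show the feature image above the list of posts (the Markdown image syntax here is an assumption)
markdown = f"![]({image})\n\n"
markdown += posts_as_string
Markdown(markdown)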
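To illustrate what doi_from_url does with the DOI links shown in the results, here is a small standalone sketch. It repeats the function from the code above so that it runs on its own. The first two inputs are DOIs from this notebook's results; the uppercase variant and the non-DOI string are made-up inputs that show the lowercasing and the None fallback.

```python
import re
from typing import Optional

# Same pattern as in the notebook code: optional resolver prefix,
# optional "doi:" prefix, then the DOI itself (captured as group 5)
DOI_PATTERN = re.compile(
    r"\A(?:(http|https)://(dx\.)?(doi\.org|handle\.stage\.datacite\.org|handle\.test\.datacite\.org)/)?(doi:)?(10\.\d{4,5}/.+)\Z"
)


def doi_from_url(url: str) -> Optional[str]:
    """Return the lowercased DOI from a DOI URL or bare DOI, else None."""
    match = DOI_PATTERN.search(url)
    if match is None:
        return None
    return match.group(5).lower()


samples = [
    "https://doi.org/10.59350/dakrb-j7a75",  # DOI URL from the results
    "10.53731/gzrse-p5d35",                  # bare DOI from the results
    "https://doi.org/10.59350/DAKRB-J7A75",  # made-up uppercase variant
    "https://rogue-scholar.org/",            # not a DOI
]

extracted = {sample: doi_from_url(sample) for sample in samples}
for sample, doi in extracted.items():
    print(f"{sample!r} -> {doi!r}")
```

Because the function returns None for anything that is not a DOI, callers such as get_csl may want to check the result before building a request URL from it.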
Generating Overlay blog posts
https://doi.org/10.53731/gzrse-p5d35
Published October 11, 2023 in Front Matter
Martin Fenner
On Monday the Rogue Scholar science blog archive launched a dedicated API.
(The?) 3 kinds of papermills
https://doi.org/10.59350/dakrb-j7a75
Published October 31, 2022 in Stories by Adam Day on Medium
Adam Day
TL;DR: Join me at ConTech Live to hear about a recent project with Open Credo to see if we could detect unusual co-authorships in a dataset created by Anna Abalkina. Sign up here! Papermilling has a few definitions which you see here and there.
This blog turned 15 (years old) this month
https://doi.org/10.53731/bs60jms-sqaehsk
Published August 11, 2022 in Front Matter
Martin Fenner
The first post on this blog was published on August 3, 2007 (Open access may become mandatory for NIH-funded research). This is post number 465, and in the past 15 years the blog has seen changes in technology and hosting location – but I wrote all posts (with the exception of a few guest posts). The overall theme remained unchanged: technology used in scholarly communication.
When your journal reads you
Published April 14, 2021 in Elephant in the Lab
Elias Koch
Introduction Renke Siems In December 2018, a University of Minnesota web librarian, Cody Hanson, participated in a workshop hosted by the Coalition for Networked Information. The topic of this, and a number of other events to date, is the drive by major scholarly publishers to more fully integrate authentication systems for accessing electronic media into their platforms.
What’s going on with Oculudentavis?
https://doi.org/10.59350/hk3jx-6sv77
Published July 22, 2020 in Sauropod Vertebra Picture of the Week
Mike Taylor
Back in March, Nature published “Hummingbird-sized dinosaur from the Cretaceous period of Myanmar” by Xing et al. (2020), which described and named a tiny putative bird that was preserved in amber from Myanmar (formerly Burma). It’s a pretty spectacular find. Today, though, that paper is retracted. That’s a very rare occurrence for a palaeontology paper.
Suppression as a form of liberation?
https://doi.org/10.59350/v5rp0-nde12
Published July 3, 2020 in A blog by Ross Mounce
Ross Mounce
On Monday 29th June 2020, I learned from Retraction Watch that Clarivate, the for-profit proprietor of Journal Impact Factor ™ has newly “suppressed” 33 journals from their indexing service.
The R2R debate, part 1: opening statement in support
https://doi.org/10.59350/c8fz7-kyc20
Published February 27, 2020 in Sauropod Vertebra Picture of the Week
Mike Taylor
This Monday and Tuesday, I was at the R2R (Researcher to Reader) conference at BMA House in London.
The lowest common denominator: marketing science with jIF
https://doi.org/10.59350/8wafm-6yc04
Published July 8, 2016 in GigaBlog
Scott Edmunds
Shallow Impact. Tis the season. In case people didn’t know— the world of scientific publishing has seasons: There is the Inundation season, which starts in November as authors rush to submit their papers before the end of year. Then there is the Recovery season beginning in January as editors come back from holidays to tackle the glut.
My Blank Pages V: Raw Data
https://doi.org/10.59350/j2nba-89g15
Published March 31, 2016 in quantixed
Stephen Royle
Raw Data: A novel on Life in Science by Pernille Rørth (Springer, 2016) I was keen to read this “lab lit” novel written by renowned cell biologist Pernille Rørth. I’d seen lots of enthusiastic comments about the book, and it didn’t disappoint.
What Difference Does It Make?
https://doi.org/10.59350/cmyz9-ms451
Published January 1, 2016 in quantixed
Stephen Royle
A few days ago, Retraction Watch published the top ten most-cited retracted papers. I saw this post with a bar chart to visualise these citations. It didn’t quite capture what the effect (if any) a retraction has on citations. I thought I’d quickly plot this out for the number one article on the list. The plot is pretty depressing. The retraction has no effect on citations.
The Medical Journal of Australia vs Elsevier
https://doi.org/10.59350/6m0m9-bng28
Published May 6, 2015 in Sauropod Vertebra Picture of the Week
Matt Wedel
While Mike’s been off having fun at the Royal Society, this has been happening: Lots of feathers flying right now over the situation at the Medical Journal of Australia (MJA). The short, short version is that AMPCo, the company that publishes MJA, made plans to outsource production of the journal, and apparently some sub-editing and […]
Getting Techy With It: GigaScience Technology Update 2014
https://doi.org/10.59350/tzayy-hvn20
Published November 27, 2014 in GigaBlog
Nicole Nogoy
When it comes to technology, GigaScience has always been open and willing to embrace new ways of integrating technology in its publishing processes, with the ultimate goal of working towards more reproducible, interactive and executable papers.
Continuing the push beyond static documents. ISMB, and more on our “What Bioinformaticians need to know about digital publishing beyond the PDF2” workshop
https://doi.org/10.59350/gtw2x-fc921
Published July 31, 2014 in GigaBlog
Scott Edmunds
Boston 2014: More than a (Bioinformatics) Feeling Following from our previous posting on BOSC, our birthday and the BMC Open Data award party in Boston, on top of having to dash between the many great talks and sessions at ISMB, we were kept even busier than usual helping to organize and present in a special Beyond-the-PDF inspired “What Bioinformaticians need to know about digital publishing beyond the PDF” workshop at the end
Rewarding Reproducibility: First Papers in our Galaxy Series utilizing our GigaGalaxy platform
https://doi.org/10.59350/w8kyr-0e629
Published February 6, 2014 in GigaBlog
Scott Edmunds
Push the button! GigaScience moves toward more interactive articles Research articles are being published with increasingly large and complicated supporting datasets, together with the software code used in analyses of the data.
The difficulties sharing neuroscience data: can data publishing help?
https://doi.org/10.59350/kmne3-5xg86
Published May 9, 2013 in GigaBlog
Scott Edmunds
Last week we published our first neuroscience data note containing 10GB of fMRI data hosted and integrated into the paper by a DOI to our GigaDB database. While we have published a number of genomics datasets and data notes (see the Puerto Rican Parrot genome data note and its associated data DOI), this is a nice example of us providing a home for “orphan data”, the long tail of data types without community agreed curated repositories.