Rogue Scholar Digest February 14, 2024

digest

This is a summary of the Rogue Scholar blog posts published February 7 - February 13, 2024.

Author: Martin Fenner
Affiliation: Front Matter

Published: February 14, 2024

import datetime
import locale
import re
from typing import Optional

import requests
from IPython.display import Markdown

# English month names for "%B" below (the locale name may differ by platform).
locale.setlocale(locale.LC_ALL, "en_US")
baseUrl = "https://api.rogue-scholar.org/"
published_since = "2024-02-07"
published_until = "2024-02-13"
feature_image = 0
include_fields = "title,authors,published_at,summary,blog_name,blog_slug,doi,url,image"
# Let requests build and encode the query string instead of concatenating it by hand.
params = {
    "published_since": published_since,
    "published_until": published_until,
    "language": "en",
    "sort": "published_at",
    "order": "asc",
    "per_page": 50,
    "include_fields": include_fields,
}
response = requests.get(baseUrl + "posts", params=params, timeout=30)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
result = response.json()


def get_post(post):
    """Each search hit wraps the post fields in a "document" object."""
    return post["document"]


def format_post(post):
    """Render one post as Markdown: title, DOI, date, blog, authors, summary."""
    doi = post.get("doi")
    doi_line = f"[{doi}]({doi})\n<br />" if doi else ""
    title = f"[{post['title']}]({doi or post['url']})"
    # utcfromtimestamp() is deprecated since Python 3.12; use a timezone-aware datetime.
    published = datetime.datetime.fromtimestamp(post["published_at"], tz=datetime.timezone.utc)
    # "%-d" is a platform-specific strftime extension (unsupported on Windows),
    # so add the unpadded day separately.
    published_at = f"{published:%B} {published.day}, {published.year}"
    blog = f"[{post['blog_name']}](https://rogue-scholar.org/blogs/{post['blog_slug']})"
    author = ", ".join(x["name"] for x in post.get("authors") or [])
    summary = post["summary"]
    return f"### {title}\n{doi_line}Published {published_at} in {blog}<br />{author}<br />{summary}\n"


posts = [get_post(x) for x in result["hits"]]
posts_as_string = "\n\n".join(format_post(x) for x in posts)


def doi_from_url(url: str) -> Optional[str]:
    """Return a DOI from a URL"""
    match = re.search(
        r"\A(?:(http|https)://(dx\.)?(doi\.org|handle\.stage\.datacite\.org|handle\.test\.datacite\.org)/)?(doi:)?(10\.\d{4,5}/.+)\Z",
        url,
    )
    if match is None:
        return None
    return match.group(5).lower()


# Use the selected post image as the feature image, if there are enough images.
images = [x["image"] for x in posts if x.get("image")]
markdown = f"![]({images[feature_image]})\n\n" if len(images) > feature_image else ""
markdown += posts_as_string
Markdown(markdown)

JOSSCast #3: Studying Superbugs – Juliette Hayer on Baargin

https://doi.org/10.59349/x672c-6ep91
Published February 8, 2024 in Journal of Open Source Software Blog
Arfon M. Smith
Juliette Hayer joins Arfon and Abby to discuss Baargin, an open source tool she created to analyze bacterial genomes, especially those resistant to antibiotics.

Introducing the Rogue Scholar Advisory Board

https://doi.org/10.53731/9yf86-p8541
Published February 8, 2024 in Front Matter
Martin Fenner
In January 2024 the new Rogue Scholar Advisory Board had its first meeting. It consists of six people with diverse expertise in scholarly blogging. Advisory Board members come from different scholarly disciplines and geographic regions, write in several languages besides English, and have different levels of technical expertise.

Announcing Rogue Scholar Preview

https://doi.org/10.53731/xwxf3-92j40
Published February 12, 2024 in Front Matter
Martin Fenner
Today the Rogue Scholar science blog archive launched a new feature: Rogue Scholar Preview. This new functionality enables the import of new science blogs into the preview version of the production service, located at https://preview.rogue-scholar.org. This allows users to see how their blog posts will look in the Rogue Scholar service and to resolve issues if necessary.

How to use GROBID

https://doi.org/10.59350/hz1er-vrh59
Published February 12, 2024 in Stories by Research Graph on Medium
Research Graph
How to use GROBID to extract text from PDF. Author: Aland Astudillo (https://orcid.org/0009-0008-8672-3168). GROBID is a powerful and useful tool based on machine learning that can extract text information from PDF and other files into a structured format. One of the key challenges in knowledge mining from academic articles is reading the content of PDF files.
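GROBID runs as a web service, so one common way to use it from Python is to POST a PDF to its REST API and parse the TEI XML it returns. The sketch below assumes a GROBID server running locally on its default port 8070 and its `/api/processFulltextDocument` endpoint. The helper names `pdf_to_tei` and `extract_title_and_paragraphs` are hypothetical, and the TEI paths they read are a simplification of GROBID's full output.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Union

# Assumption: a GROBID server is running locally on its default port.
GROBID_URL = "http://localhost:8070/api/processFulltextDocument"
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def pdf_to_tei(pdf_path: str, url: str = GROBID_URL, timeout: float = 120.0) -> bytes:
    """Send a PDF to GROBID and return the raw TEI XML it produces."""
    # requests is imported here so the parser below also works on saved
    # TEI files in environments without requests installed.
    import requests

    with Path(pdf_path).open("rb") as pdf_file:
        # The PDF goes into the multipart form field named "input".
        response = requests.post(url, files={"input": pdf_file}, timeout=timeout)
    response.raise_for_status()
    return response.content


def extract_title_and_paragraphs(tei_xml: Union[str, bytes]) -> tuple[str, list[str]]:
    """Pull the document title and the body paragraphs out of TEI XML."""
    root = ET.fromstring(tei_xml)
    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""
    paragraphs = [
        # Join inline elements (e.g. citation refs) and collapse runs of whitespace.
        " ".join("".join(p.itertext()).split())
        for p in root.findall(".//tei:body//tei:p", TEI_NS)
    ]
    return title, paragraphs
```

With a local server and a file named `paper.pdf` (hypothetical), usage would be `title, paragraphs = extract_title_and_paragraphs(pdf_to_tei("paper.pdf"))`. Keeping the network call separate from the parsing means the XML handling can be checked against a saved TEI file without a running GROBID instance.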

INFORMATE: When Are the Data?

https://doi.org/10.54900/08pke-hyy45
Published February 13, 2024 in Upstream
Ted Habermann, Jamaica Jones, Howard Ratner, Tara Packer
In a recent Upstream blog post we explored where data connected to papers funded by several U.S. Federal Agencies are published. Different data sharing practices across these agencies led to very different distributions of datasets across various repositories. We used CHORUS reports that combine linked article and dataset metadata as input for that work.
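The code at the top of this digest defines `doi_from_url` but never calls it. Here is a minimal sketch of how it behaves on the DOI links listed above. It repeats the same regular expression so the snippet runs on its own; the `digest_dois` list is simply copied from the entries in this digest.

```python
import re
from typing import Optional


def doi_from_url(url: str) -> Optional[str]:
    """Return the bare, lowercased DOI from a DOI URL, or None if it is not one."""
    match = re.search(
        r"\A(?:(http|https)://(dx\.)?(doi\.org|handle\.stage\.datacite\.org|handle\.test\.datacite\.org)/)?(doi:)?(10\.\d{4,5}/.+)\Z",
        url,
    )
    if match is None:
        return None
    # Group 5 is the "10.prefix/suffix" part; DOIs are case-insensitive.
    return match.group(5).lower()


# DOI URLs of the posts in this digest.
digest_dois = [
    "https://doi.org/10.59349/x672c-6ep91",
    "https://doi.org/10.53731/9yf86-p8541",
    "https://doi.org/10.53731/xwxf3-92j40",
    "https://doi.org/10.59350/hz1er-vrh59",
    "https://doi.org/10.54900/08pke-hyy45",
]
for url in digest_dois:
    print(doi_from_url(url))
```

Because the URL prefix and the `doi:` scheme are both optional in the pattern, a bare DOI or a `doi:` string also works. Anything that does not end in a `10.`-prefixed DOI, such as an ordinary blog post URL, returns `None`.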
