Rogue Scholar Digest May 22, 2024


This is a summary of the Rogue Scholar blog posts published May 16 - May 21, 2024.


Martin Fenner

Front Matter


May 22, 2024

import requests
import locale
import re
from typing import Optional
import datetime
from IPython.display import Markdown

locale.setlocale(locale.LC_ALL, "en_US")
baseUrl = ""
published_since = "2024-05-16"
published_until = "2024-05-21"
feature_image = 1
include_fields = "title,authors,published_at,summary,blog_name,blog_slug,doi,url,image"
url = (
    baseUrl
    + f"posts?&published_since={published_since}&published_until={published_until}&language=en&sort=published_at&order=asc&per_page=50&include_fields={include_fields}"
)
response = requests.get(url)
result = response.json()

def get_post(post):
    return post["document"]

def format_post(post):
    doi = post.get("doi", None)
    url = f"[{doi}]({doi})\n<br />" if doi else ""
    title = f"[{post['title']}]({doi})" if doi else f"[{post['title']}]({post['url']})"
    published_at = datetime.datetime.utcfromtimestamp(post["published_at"]).strftime(
        "%B %-d, %Y"
    )
    blog = f"[{post['blog_name']}]({post['blog_slug']})"
    author = ", ".join([f"{x['name']}" for x in post.get("authors", None) or []])
    summary = post["summary"]
    return f"### {title}\n{url}Published {published_at} in {blog}<br />{author}<br />{summary}\n"

posts = [get_post(x) for x in result["hits"]]
posts_as_string = "\n\n".join([format_post(x) for x in posts])

def doi_from_url(url: str) -> Optional[str]:
    """Return a DOI from a URL"""
    match =
    if match is None:
        return None

images = [x["image"] for x in posts if x.get("image", None) is not None]
image = images[feature_image]
markdown = f"![]({image})\n\n"
markdown += posts_as_string
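
A helper such as `doi_from_url` normalizes a DOI that arrives either as a resolver URL or as a `doi:`-prefixed string. The following is a minimal sketch of that idea, not the pattern used by the digest script: the regular expression, the constant `DOI_PATTERN`, and the function name `doi_from_url_sketch` are assumptions made for illustration, and the pattern only recognizes the `doi.org` and `dx.doi.org` resolvers.

```python
import re
from typing import Optional

# Assumed, simplified pattern (not taken from the digest script):
# optional doi.org resolver prefix, optional "doi:" label, then a DOI
# of the form 10.<registrant>/<suffix>.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/)?(?:doi:)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def doi_from_url_sketch(url: str) -> Optional[str]:
    """Return the lowercased DOI found in a URL, or None if there is none."""
    match = DOI_PATTERN.match(url.strip())
    if match is None:
        return None
    # DOIs are case-insensitive, so lowercase them for stable comparison.
    return match.group(1).lower()


print(doi_from_url_sketch("https://doi.org/10.3897/rio.10.e124884"))
print(doi_from_url_sketch("https://example.org/some-post"))
```

Lowercasing the result is a design choice that makes two spellings of the same DOI compare equal; a stricter pattern might be preferable if other resolvers or handle servers need to be supported.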

JOSSCast #10: Defect Structure Searching – Irea Mosquera-Lois and Seán Kavanagh on ShakeNBreak
Published May 16, 2024 in Journal of Open Source Software Blog
Arfon M. Smith
Irea Mosquera-Lois and Seán Kavanagh join Arfon and Abby to discuss releasing software based on important research observations, earning a PhD, and building ShakeNBreak, a defect structure searching method that better identifies low-energy structures. Irea is a PhD Student at Imperial College London. Seán is an Environmental Fellow at Harvard University.

The integration of large language models (LLMs) with Neo4j-based knowledge graphs
Published May 16, 2024 in Stories by Research Graph on Medium
Wenyi Pi
Enhancing Data Interactivity with LLMs and Neo4j Knowledge Graphs. Author: Wenyi Pi (ORCID: 0009-0002-2884-2771). Introduction: Since OpenAI launched ChatGPT, a large language model (LLM) based chatbot, in 2023, it has set off a technological wave.

What’s the Content of Fact-checks and Misinformation in Germany?
Published May 16, 2024 in Elephant in the Lab
Sascha Schönig
2024 is a year of elections: USA and EU but there are also local elections in Germany for which current polls predict strong increases for the far-right. Misinformation is a constant concern regarding these elections.

Exploring Methanetriol – “the Formation of an Impossible Molecule”
Published May 16, 2024 in Henry Rzepa’s Blog
Henry Rzepa
What constitutes an “impossible molecule”? Well, here are two, the first being the topic of a recent article[1]. The second is a favourite of organic chemistry tutors, to see if their students recognise it as an unusual (= impossible) form of a much better known molecule. Perhaps we could define impossible molecules into two slightly different classes.

Communication Tips for your Open-Source Project
Published May 17, 2024 in rOpenSci - open tools for open science
Maëlle Salmon
Do you maintain an open-source project like an R package or a collection thereof, and wonder how to best use various communication channels to inform and engage with your community of users? We’ve consolidated this list of tips. Some of them are required in our opinion, others are simply nice to have. Required: Having good release notes. Since you’re developing a product, the first act of communication is to write informative release notes. Release

CompactifAI: Large Language Models Don’t Actually Have To Be Large
Published May 17, 2024 in Stories by Research Graph on Medium
Amanda Kau
A novel compression technique ensuring comparable performance with 70% less parameters. Author: Amanda Kau (ORCID: 0009-0004-4949-9284). Introduction: The sizes of large language models (LLMs) have been steadily increasing over the last few years.

New Editorial Board Members
Published May 17, 2024 in re3data COREF Project Blog
re3data Team
The re3data Editorial Board is pleased to welcome seven new members: Dalal Hakim Rahme, Coordinator of Content Curation, United Nations; Rene Faustino Gabriel Junior, Professor, Universidade Federal do Rio Grande do Sul; Sandra Gisela Martín, Library Director, Universidad Católica de Córdoba; Tekleweyni Geday, Lecturer, Mekelle University; Theodora Bloom, Executive Editor, BMJ; Vaidas Morkevičius, Professor, Lithuanian Data Archive for Social

cdk2024 #2: publishing grant proposals
Published May 18, 2024 in chem-bla-ics
Egon Willighagen
Publishing grant proposals is still not very common. The proposal published in Research Ideas and Outcomes (doi:10.3897/rio.10.e124884) for the NWO Open Science grant for the CDK is, however, not the first and hopefully not the last. Interestingly, it is already cited in (the German) Wikipedia. It is used there to support a statement about which tools use the Chemistry Development Kit.

A Tour of the Jevons Paradox: How Energy Efficiency Backfires
Published May 18, 2024 in Economics from the Top Down
Blair Fix
[R]esource productivity can — and should — grow fourfold. … Thus we can live twice as well — yet use half as much. — Factor Four, 1997. When it comes to our sustainability problems, striving for greater resource efficiency seems like an obvious solution.

New paper: From papers to RDF-based integration of physicochemical data and adverse outcome pathways for nanomaterials
Published May 20, 2024 in chem-bla-ics
Egon Willighagen
Making something FAIR is hard, particularly when you do more than making something findable. We’ve seen before that making something usefully findable requires deep indexing, and even that continues to be difficult, because we are not seeing it enough. So, when I thought to convert a paper led by Hoet’s lab in Leuven into machine-actionable RDF to make it FAIR, I gravely underestimated the amount of work.

Possible Formation of an Impossible Molecule?
Published May 20, 2024 in Henry Rzepa’s Blog
Henry Rzepa
In the previous post, I explored the so-called “impossible” molecule methanetriol. It is regarded as such because the equilibrium resulting in loss of water is very facile, being exoenergic by ~14 kcal/mol in free energy. Here I explore whether changing the substituent R could result in suppressing the loss of water and stabilising the triol.

Publishing Citizen Science Data: Q&A with the Hong Kong Jellyfish Project
Published May 20, 2024 in GigaBlog
Scott Edmunds
Today we publish a new Data Release presenting a dataset of jellyfish sightings collected by citizen scientists from 2021 through 2023 within Hong Kong waters. This is the first example where our curation team have worked with a Citizen Science project to share their observations in the GBIF biodiversity database.

Variation, a cool glass, and my Tate talk
Published May 21, 2024 in Sauropod Vertebra Picture of the Week
Matt Wedel
1. VARIATION You know what’s variable? Apatosaur cervicals. Top: NSMT-PV 20375, cervical 7 in anterior and left lateral views (Upchurch et al. 2005). Middle: YPM 1861, cervical ?13, in posterior and left lateral views (Ostrom & McIntosh 1966). Bottom: YPM 1980, cervical 8 in anterior and left lateral views (Ostrom & McIntosh 1966). An anatomical variant that shows up in 1 in 500 or 1 in 1000 humans is by medical standards pretty common;

Retrieval Augmented Generation and academic search engines - some suggestions for system builders
Published May 21, 2024 in Aaron Tay’s Musings about librarianship
Aaron Tay
As academic search engines and databases incorporate the use of generative AI into their systems, an important concept that all librarians should grasp is that of retrieval augmented generation (RAG). You see it in use in all sorts of “AI products” today, from chatbots like Bing Copilot to Adobe’s Acrobat AI Assistant, which allows you to chat with your PDF.
