Rogue Scholar Digest February 28, 2024

digest

This is a summary of the Rogue Scholar blog posts published February 21 - February 27, 2024.

Author: Martin Fenner

Affiliation: Front Matter

Published: February 28, 2024

Code

import requests
import locale
import re
from typing import Optional
import datetime
from IPython.display import Markdown

locale.setlocale(locale.LC_ALL, "en_US")
baseUrl = "https://api.rogue-scholar.org/"
published_since = "2024-02-21"
published_until = "2024-02-27"
feature_image = 0
include_fields = "title,authors,published_at,summary,blog_name,blog_slug,doi,url,image"
url = (
    baseUrl
    + f"posts?published_since={published_since}&published_until={published_until}&language=en&sort=published_at&order=asc&per_page=50&include_fields={include_fields}"
)
response = requests.get(url, timeout=30)
# Fail early on HTTP errors instead of trying to parse an error body as JSON.
response.raise_for_status()
result = response.json()


def get_post(post):
    return post["document"]


def format_post(post):
    doi = post.get("doi", None)
    url = f"[{doi}]({doi})\n<br />" if doi else ""
    title = f"[{post['title']}]({doi})" if doi else f"[{post['title']}]({post['url']})"
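    # utcfromtimestamp() is deprecated since Python 3.12; use a timezone-aware datetime.
    published = datetime.datetime.fromtimestamp(
        post["published_at"], tz=datetime.timezone.utc
    )
    # Build the date by hand: the "%-d" strftime directive is not supported on every platform.
    published_at = f"{published:%B} {published.day}, {published.year}"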
    blog = f"[{post['blog_name']}](https://rogue-scholar.org/blogs/{post['blog_slug']})"
    author = ", ".join([f"{x['name']}" for x in post.get("authors", None) or []])
    summary = post["summary"]
    return f"### {title}\n{url}Published {published_at} in {blog}<br />{author}<br />{summary}\n"


posts = [get_post(x) for x in result["hits"]]
posts_as_string = "\n\n".join([format_post(x) for x in posts])

def doi_from_url(url: str) -> Optional[str]:
    """Return a DOI from a URL"""
    match = re.search(
        r"\A(?:(http|https)://(dx\.)?(doi\.org|handle\.stage\.datacite\.org|handle\.test\.datacite\.org)/)?(doi:)?(10\.\d{4,5}/.+)\Z",
        url,
    )
    if match is None:
        return None
    return match.group(5).lower()

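images = [x["image"] for x in posts if x.get("image", None) is not None]
# Only add a feature image when one exists at the chosen index (avoids an IndexError).
markdown = f"![]({images[feature_image]})\n\n" if len(images) > feature_image else ""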
markdown += posts_as_string
Markdown(markdown)
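
The `doi_from_url` helper above is defined but never called in the digest code. As a minimal sketch of how it behaves, the function is repeated here so the block runs on its own; the example inputs are taken from this digest, and the last one is a non-DOI URL chosen for illustration.

```python
import re
from typing import Optional


def doi_from_url(url: str) -> Optional[str]:
    """Return the lowercased DOI from a DOI URL or bare DOI string, else None."""
    match = re.search(
        r"\A(?:(http|https)://(dx\.)?(doi\.org|handle\.stage\.datacite\.org|handle\.test\.datacite\.org)/)?(doi:)?(10\.\d{4,5}/.+)\Z",
        url,
    )
    if match is None:
        return None
    return match.group(5).lower()


# Illustrative inputs: a DOI URL, a "doi:"-prefixed DOI, and a URL that is not a DOI.
examples = [
    "https://doi.org/10.59350/geexp-5zj30",
    "doi:10.53731/bdrgt-cjh09",
    "https://rogue-scholar.org/blogs/front_matter",
]
for value in examples:
    print(value, "->", doi_from_url(value))
```

In the digest code this could be used to show the bare DOI as link text, although the original does not do so.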

Betting against the future

https://doi.org/10.59348/y2pn2-rs963
Published February 21, 2024 in Martin Paul Eve
Martin Paul Eve
I am tired of medical decisions with a trade-off. On a regular basis I am presented with decisions that have deferred negative consequences in order to fix something in the present. The two examples that spring to mind are the BK virus nephropathy and hip replacement surgery.

How reliable is the scholarly literature?

https://doi.org/10.59350/geexp-5zj30
Published February 21, 2024 in bjoern.brembs.blog
Björn Brembs
A few years ago, I came across a cartoon that seemed to capture a particular aspect of scholarly journal publishing quite well: The academic journal publishing system sure feels all too often a bit like a sinking boat.

Beautiful Code, Because We’re Worth It!

https://doi.org/10.59350/tbdps-5xc82
Published February 22, 2024 in rOpenSci - open tools for open science
Maëlle Salmon, Yanina Bellini Saibene
rOpenSci’s second cohort of champions was onboarded! Their training started with a session on code style, which we will summarize here in this post. Knowing more about code quality is relevant to all Champion projects, be it creating a new package, submitting a package to software review, or reviewing a package. This training session consisted of a talk and discussion, whereas the next package development training sessions will be more hands-on. Why

JOSSCast #4: Applying ML to Quantum Monte Carlo simulations – Nicolas Renaud on QMCTorch

https://doi.org/10.59349/qheph-wex21
Published February 22, 2024 in Journal of Open Source Software Blog
Arfon M. Smith
Nicolas Renaud joins Arfon and Abby to discuss QMCTorch, a PyTorch implementation of real-space Quantum Monte-Carlo simulations of molecular systems, and work to promote research software as a research output. Nico is the head of the Natural Sciences and Engineering section of the Netherlands eScience Center and Senior Researcher at the Quantum Application Lab.

rOpenSci News Digest, February 2024

https://doi.org/10.59350/3qdf9-xrq67
Published February 23, 2024 in rOpenSci - open tools for open science
The rOpenSci Team
Dear rOpenSci friends, it’s time for our monthly news roundup!

A bite-marked Apatosaurus pubis, in bone and in bronze

https://doi.org/10.59350/gaewv-x6f11
Published February 23, 2024 in Sauropod Vertebra Picture of the Week
Matt Wedel
MWC 861 in bone (left) and in bronze (right). Here’s a cool comparo.

The Comics Grid: New and Recent Articles (2023-2024)

https://doi.org/10.59350/477jz-wjx10
Published February 23, 2024 in Everything is Connected
Ernesto Priego
Recently we announced there’s new content in the journal, corresponding to our 13th and 14th volumes. Both volumes include a variety of work by 13 international scholars with affiliations in academic institutions based in nine different countries. I’d like to personally thank every author, editorial board member and peer reviewer who contributed to making the publication of these articles possible.

AkademieNL’s new admin

https://doi.org/10.59350/ky2cy-1fk34
Published February 23, 2024 in Chris Hartgerink
Chris Hartgerink
The social media landscape is so different today from when I started participating around 2011. Things change, and that is okay — when the old dies, it becomes the compost for something new. For me, something new sprouted from Twitter’s compost beyond just joining Mastodon: I am taking over as the admin for the Mastodon server akademienl.social.

How learning evidence synthesis (systematic reviews etc) changed the way I search and some thoughts about semantic search as a complement search technique

https://doi.org/10.59350/n01x0-tk156
Published February 24, 2024 in Aaron Tay’s Musings about librarianship
Aaron Tay
I’ve spent a large part of my career as an academic librarian studying the question of discovery from many angles.

Motivation for an open-source flow battery

https://doi.org/10.59350/fknwy-k7a54
Published February 25, 2024 in Dual Power Supply
Kirk Pollard Smith
Towards the end of 2022 I drafted this; consider it a work in progress. It was before I had joined forces with Daniel to form the Flow Battery Research Collective. Motivation for an open-source flow battery: This project aims to develop an open-source flow battery design suitable for mid-scale manufacturing by a well-equipped hackerspace or conventional machine shop.

Data Citation – a snapshot of the chemical landscape.

https://doi.org/10.59350/vy0e7-fkc63
Published February 26, 2024 in Henry Rzepa’s Blog
Henry Rzepa
The recent release of the DataCite Data Citation corpus, which has the stated aim of providing “a trusted central aggregate of all data citations to further our understanding of data usage and advance meaningful data metrics” made me want to investigate what the current state of citing data in the area of chemistry might be. Chemistry is known to be a “data rich” science (as most of the physical sciences are) and here on this very blog I

Moving to open source

https://doi.org/10.59350/j026n-acb89
Published February 26, 2024 in Europe PMC News Blog
Maria Levchenko
Europe PMC POSI update – 2 years on. Two years have sailed by since Europe PMC adopted the Principles of Open Scholarly Infrastructure (POSI) in February 2021. POSI is a set of guidelines for open scholarly infrastructure providers and outlines how these organisations should be run and sustained. It offers a framework to uphold transparency and accountability.

Rogue Scholar launches a free personal plan

https://doi.org/10.53731/bdrgt-cjh09
Published February 26, 2024 in Front Matter
Martin Fenner
The Rogue Scholar science blog archive today has launched a new free Personal Plan: similar to the Starter Plan launched last year, but only for personal (single-author) blogs, and with no limitations on the number of blog posts that can be archived and registered with a DOI per year.

Help make assertr better! Come close issues

https://doi.org/10.59350/gjjjr-1ka42
Published February 27, 2024 in rOpenSci - open tools for open science
Tony Fischetti, Maëlle Salmon
The package assertr, maintained by Tony Fischetti, provides functionality to assert conditions that have to be met so that errors in data used in analysis pipelines can fail quickly. The provided functionality is similar to stopifnot() but more powerful, friendly, and easier for use in pipelines. Contributed to assertr!

Efficient Information Retrieval and Response Generation with Retrieval-Augmented Generation (RAG)

https://doi.org/10.59350/q2pq3-0fv85
Published February 27, 2024 in Stories by Research Graph on Medium
Research Graph
How to efficiently retrieve information for different applications. Author: Wenyi Pi (https://orcid.org/0009-0002-2884-2771). This article aims to explore various ways in which Retrieval-Augmented Generation (RAG) can be utilised to retrieve information and generate responses effectively within the dialogue system.

Understanding Retrieval Pitfalls: Challenges Faced by Retrieval Augmented Generation (RAG) models

https://doi.org/10.59350/wek8b-sbj84
Published February 27, 2024 in Stories by Research Graph on Medium
Research Graph
Improving the performance and application of Large Language Models. Author: Amanda Kau (https://orcid.org/0009-0004-4949-9284). Large language models (LLMs) like GPT-4, the engine of products like ChatGPT, have taken centre stage in recent years due to their astonishing capabilities. Yet, they are far from perfect.

RAG: The next big thing after LLMs?

https://doi.org/10.59350/9xds0-bcq66
Published February 27, 2024 in Stories by Research Graph on Medium
Research Graph
Improving the performance of Large Language Models. Author: Dhruv Gupta. ChatGPT, which first came out in late 2022, took the world by storm. Since then, various LLMs and LLM-based products such as Meta’s Llama and Google’s Gemini have emerged, demonstrating the power of LLMs.

Hallucination in Large Language Models and Two Effective Alleviation Pathways

https://doi.org/10.59350/g73v1-rq310
Published February 27, 2024 in Stories by Research Graph on Medium
Research Graph
An Introduction to Retrieval Augmented Generation (RAG) and Knowledge Graph. Author: Qingqin Fang (0009-0003-5348-4264). Introduction: Large Language Models (LLMs) have transformed the landscape of natural language processing, demonstrating exceptional proficiency in generating text that closely resembles human language.

Refining AI Vision: How Retrieval-Augmented Generation Transforms Image Captioning in Large…

https://doi.org/10.59350/pn1ra-vpc97
Published February 27, 2024 in Stories by Research Graph on Medium
Research Graph
Refining AI Vision: How Retrieval-Augmented Generation Transforms Image Captioning in Large Language Models. Leveraging External Knowledge to Enhance the Descriptive Capabilities of AI Systems. Author: Vaibhav Khobragade (0009-0009-8807-5982). Introduction: Large Language Models (LLMs) are artificial intelligence models that are trained on massive amounts of text data in order to generate human-like

Recent Advances in using Retrieving Multimodal Information for Augmented Generation

https://doi.org/10.59350/s5yn9-my222
Published February 27, 2024 in Stories by Research Graph on Medium
Research Graph
An Introduction to RA-CM3, MuRAG and RACE. Author: Xuzeng He (0009-0005-7317-7426). Generative Artificial Intelligence (GAI) has demonstrated impressive performance in tasks such as text generation and text-to-image generation.

What are the challenges in integrating LLMs into organisations’ data workflows?

https://doi.org/10.59350/8y2q9-a4d89
Published February 27, 2024 in Stories by Amir Aryani on Medium
Amir Aryani
Integrating Large Language Models (LLMs) such as GPT into organizations’ data workflows is a complex process with various challenges. These obstacles include but are not limited to technical, operational, ethical, and legal dimensions, each presenting hurdles that organisations must navigate to harness the full potential of LLMs effectively.
