Rogue Scholar Digest May 8, 2024

This is a summary of the English-language Rogue Scholar blog posts published from April 24 to May 7, 2024.

Author: Martin Fenner (Front Matter)

Published: May 8, 2024

import datetime
import locale
import re
from typing import Optional

import requests
from IPython.display import Markdown

# Month names in the "Published" line follow the locale; "en_US.UTF-8" is
# more widely installed than plain "en_US"
locale.setlocale(locale.LC_ALL, "en_US.UTF-8")

baseUrl = "https://api.rogue-scholar.org/"
published_since = "2024-04-24"
published_until = "2024-05-07"
feature_image = 1  # index into the list of post images (1 = second image)
include_fields = "title,authors,published_at,summary,blog_name,blog_slug,doi,url,image"

# English-language posts in the date range, oldest first; per_page caps this
# single request at 50 posts. requests builds and encodes the query string.
params = {
    "published_since": published_since,
    "published_until": published_until,
    "language": "en",
    "sort": "published_at",
    "order": "asc",
    "per_page": 50,
    "include_fields": include_fields,
}
response = requests.get(baseUrl + "posts", params=params, timeout=30)
response.raise_for_status()
result = response.json()


def get_post(post):
    """Each search hit wraps the post fields in a "document" object."""
    return post["document"]


def format_post(post):
    """Render one post as Markdown: title, DOI, date, blog, authors, summary."""
    doi = post.get("doi")
    url = f"[{doi}]({doi})\n<br />" if doi else ""
    # Link the title to the DOI if there is one, otherwise to the post URL
    title = f"[{post['title']}]({doi or post['url']})"
    # published_at is a Unix timestamp; utcfromtimestamp() is deprecated since
    # Python 3.12, and .day avoids the platform-specific "%-d" directive
    published = datetime.datetime.fromtimestamp(post["published_at"], tz=datetime.timezone.utc)
    published_at = f"{published:%B} {published.day}, {published.year}"
    blog = f"[{post['blog_name']}](https://rogue-scholar.org/blogs/{post['blog_slug']})"
    author = ", ".join(x["name"] for x in post.get("authors") or [])
    summary = post["summary"]
    return f"### {title}\n{url}Published {published_at} in {blog}<br />{author}<br />{summary}\n"


posts = [get_post(x) for x in result["hits"]]
posts_as_string = "\n\n".join(format_post(x) for x in posts)


def doi_from_url(url: str) -> Optional[str]:
    """Return a DOI from a URL"""
    match = re.search(
        r"\A(?:(http|https)://(dx\.)?(doi\.org|handle\.stage\.datacite\.org|handle\.test\.datacite\.org)/)?(doi:)?(10\.\d{4,5}/.+)\Z",
        url,
    )
    if match is None:
        return None
    return match.group(5).lower()


# Show one post image at the top of the digest, if there are enough images
images = [x["image"] for x in posts if x.get("image")]
markdown = f"![]({images[feature_image]})\n\n" if len(images) > feature_image else ""
markdown += posts_as_string
Markdown(markdown)
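To make the link between the code above and the entries listed below concrete, here is a minimal, self-contained sketch that feeds one search hit through get_post and format_post, and also exercises doi_from_url, which the code defines but never calls. It assumes the hit layout that get_post implies (post fields under a "document" key). The title, DOI, blog name, authors, and summary come from the first entry in this digest; the blog_slug value and the time of day in published_at are hypothetical, as is the helper format_date, which replaces the platform-specific "%-d" directive with dt.day. No network request is made.

```python
import datetime
import re
from typing import Optional


def get_post(hit: dict) -> dict:
    """Each search hit wraps the post fields in a "document" object."""
    return hit["document"]


def format_date(timestamp: float) -> str:
    """Hypothetical helper: format a Unix timestamp as e.g. "April 24, 2024" (UTC).

    Uses dt.day instead of strftime's "%-d", which is not available on Windows.
    """
    dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    return f"{dt:%B} {dt.day}, {dt.year}"


def format_post(post: dict) -> str:
    """Render one post as a Markdown digest entry, mirroring format_post above."""
    doi = post.get("doi")
    doi_line = f"[{doi}]({doi})\n<br />" if doi else ""
    # The title links to the DOI when there is one, otherwise to the post URL
    title = f"[{post['title']}]({doi or post['url']})"
    blog = f"[{post['blog_name']}](https://rogue-scholar.org/blogs/{post['blog_slug']})"
    authors = ", ".join(a["name"] for a in post.get("authors") or [])
    return (
        f"### {title}\n{doi_line}Published {format_date(post['published_at'])} "
        f"in {blog}<br />{authors}<br />{post['summary']}\n"
    )


def doi_from_url(url: str) -> Optional[str]:
    """Return the bare, lowercased DOI from a DOI URL or "doi:" string, else None."""
    match = re.search(
        r"\A(?:(http|https)://(dx\.)?(doi\.org|handle\.stage\.datacite\.org|handle\.test\.datacite\.org)/)?(doi:)?(10\.\d{4,5}/.+)\Z",
        url,
    )
    return match.group(5).lower() if match else None


# Sample hit built from the first entry in this digest.
# Hypothetical: the blog_slug value and the 12:00 UTC time of day.
sample_hit = {
    "document": {
        "title": "An update on the Scholars on Twitter dataset",
        "doi": "https://doi.org/10.59350/abapf-y4f53",
        "published_at": datetime.datetime(
            2024, 4, 24, 12, tzinfo=datetime.timezone.utc
        ).timestamp(),
        "blog_name": "Leiden Madtrics",
        "blog_slug": "leiden_madtrics",  # hypothetical slug
        "authors": [
            {"name": "Philippe Mongeon"},
            {"name": "Timothy D. Bowman"},
            {"name": "Rodrigo Costas"},
            {"name": "Wenceslao Arroyo-Machado"},
        ],
        "summary": (
            "On August 21, 2022, we made available the first version of our dataset of "
            "scholars on Twitter created with two open data sources: Crossref Event Data "
            "and OpenAlex."
        ),
    }
}

print(format_post(get_post(sample_hit)))
print(doi_from_url(sample_hit["document"]["doi"]))  # prints 10.59350/abapf-y4f53
```

Run as a script, it prints one entry in the same shape as those listed below, followed by the bare DOI. Keeping the date logic in its own helper makes it testable without calling the API.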

An update on the Scholars on Twitter dataset

https://doi.org/10.59350/abapf-y4f53
Published April 24, 2024 in Leiden Madtrics
Philippe Mongeon, Timothy D. Bowman, Rodrigo Costas, Wenceslao Arroyo-Machado
On August 21, 2022, we made available the first version of our dataset of scholars on Twitter created with two open data sources: Crossref Event Data and OpenAlex.

Automating data exports from Apple Health

https://doi.org/10.59350/cap2n-agh49
Published April 24, 2024 in Bastian Greshake Tzovaras
Bastian Greshake Tzovaras
The dynamic footer of my website has been powered by a little aggregation of some of my personal data for about 5 1/2 years by now. Until recently, all the data related to my activity and physiology (steps, heart rate, sleep, body) came from an Oura Ring.

Markov Chain Monte What?

https://doi.org/10.59350/mxfyk-6av39
Published April 25, 2024 in Bayesically Speaking
Matías Castillo-Aguilar
Alright, folks, let’s dive into the wild world of statistics and data science! Picture this: you’re knee-deep in data, trying to make sense of the chaos.

The courage to discuss

https://doi.org/10.59350/bjh0z-gw219
Published April 25, 2024 in Chris Hartgerink
Chris Hartgerink
I want to recommit to writing, to recommit to actively and publicly think about what is happening. I want to recommit to the idea that thoughts are dynamic and never settled — that thinking in public helps move away from sharing only finalized arguments. Thoughts are produced and reproduced through the conversations we have, be it directly on the phone or indirectly through writing and reading.

DNA Day Launch for Hong Kong’s Moonshot for Biology

https://doi.org/10.59350/eemcp-h8f94
Published April 25, 2024 in GigaBlog
Scott Edmunds
The first emblematic species sequenced by the Hong Kong Biodiversity Genomics Consortium are published to coincide with International DNA Day.

Molecular Symmetry Analysis Made Easy

https://doi.org/10.59350/s4ctj-7k667
Published April 25, 2024 in Corin Wagen
Corin Wagen
Pure mathematics has all sorts of unexpected connections to other fields, and chemistry is no exception.

The MHONGOOSE survey of atomic gas in and around galaxies

https://doi.org/10.59350/686jg-5dh08
Published April 26, 2024 in Triton Station
Stacy McGaugh
I have been spending a lot of time lately writing up a formal paper on high redshift galaxies, so haven’t had much time to write here. The paper is a lot more involved than I told you so, but yeah, I did. Repeatedly. I do have a start on a post on self-interacting dark matter that I hope eventually to get back to. Today, I want to give a quick note about the MHONGOOSE survey. But first, a non-commercial interruption.

Five Years of Economics from the Top Down

https://doi.org/10.59350/8a58k-yg969
Published April 27, 2024 in Economics from the Top Down
Blair Fix
My, how time flies. As of April 11th, 2024, I’ve been blogging for five years. To celebrate, I thought I’d engage in some obligatory navel-gazing. Why blog? I started this blog on a whim. In the spring of 2019, I was one year post-PhD and busy publishing pieces of my dissertation. It was about as much fun as licking sandpaper. The problem, I now realize, is that I hate academic writing.

Atlantal ribs of the Carnegie Diplodocus, Moscow and Vienna casts

https://doi.org/10.59350/0ezp4-a1h55
Published April 27, 2024 in Sauropod Vertebra Picture of the Week
Mike Taylor
Eighteen months ago, I noted that the Carnegie Museum’s Diplodocus mount has no atlantal ribs (i.e. ribs of the first cervical vertebra, the atlas). But that the Paris cast has long atlantal ribs — so long they extend past the posterior end of the axis. There were two especially provocative comments to that post. First, Konstantin linked to a photo of the Russian cast (first mounted in St. Petersburg but currently residing in Moscow).

Large Language Models for Code Writing: Security Assessment

https://doi.org/10.59350/c9qeh-m6z87
Published April 28, 2024 in Stories by Research Graph on Medium
Xuzeng He
Latest effort in assessing the security of the code generated by large language models. Author: Xuzeng He (ORCID: 0009-0005-7317-7426). With the surge of Large Language Models (LLMs) nowadays, there is a rising trend among developers to use Large Language Models to assist their daily code writing. Famous products include GitHub Copilot or simply ChatGPT.

“I put the ways of childhood behind me” — my remembrance of Dan Dennett

https://doi.org/10.59350/17mxp-ng136
Published April 29, 2024 in Quintessence of Dust
Stephen Matheson
For five years through 2018, our humanist community, the Humanist Hub*, met every Sunday afternoon at our suite in Harvard Square for fellowship, music, and a speaker. Our advisory board included luminaries of humanism such as Rebecca Goldstein, Steven Pinker, and Dan Dennett. These friends of the organization regularly spoke at Humanist Hub events.

Commonmeta grows up

https://doi.org/10.53731/zkrxq-mj859
Published April 29, 2024 in Front Matter
Martin Fenner
The Commonmeta standard for scholarly metadata continues towards version 1.0 with some important changes in version v0.14, released this week. And metadata for all DOIs from Crossref and DataCite can now be retrieved in commonmeta format via a new web service.

Multimodal Large Language Models for Misinformation Detection and Reasoning

https://doi.org/10.59350/mtep9-gwy69
Published April 29, 2024 in Stories by Research Graph on Medium
Wenyi Pi
Exploring Innovative Strategies in Combating Misinformation with Enhanced Multimodal Understanding. Author: Wenyi Pi (ORCID: 0009-0002-2884-2771). Misinformation refers to false or inaccurate information that is often given to someone in a deliberate attempt to make them believe something that is not true. This has a significantly negative impact on public health, political stability and social trust and harmony.

RAG 2.0 is Coming?

https://doi.org/10.59350/6frhg-zxp80
Published April 30, 2024 in Stories by Research Graph on Medium
Qingqin Fang
A Unified and Collaborative Framework for LLM. Author: Qingqin Fang (ORCID: 0009-0003-5348-4264). In today’s rapidly evolving field of artificial intelligence, large language models (LLMs) are demonstrating unprecedented potential. Particularly, the Retrieval-Augmented Generation (RAG) architecture has become a hot topic in AI technology due to its unique technical capabilities.

RNNs vs GRUs vs LSTMs

https://doi.org/10.59350/t6mga-7zd77
Published April 30, 2024 in Stories by Research Graph on Medium
Dhruv Gupta
The Three Oldest Pillars of NLP. Author: Dhruv Gupta (ORCID: 0009-0004-7109-5403). Natural Language Processing (NLP) has almost become synonymous with Large Language Models (LLMs), Generative AI, and fancy chatbots. With the ever-increasing amount of textual data and exponential growth in computational knowledge, these models are improving every day.

Detecting anomeric effects in tetrahedral boron bearing four oxygen substituents.

https://doi.org/10.59350/dybzk-cs537
Published April 30, 2024 in Henry Rzepa’s Blog
Henry Rzepa
In an earlier post, I discussed[1] a phenomenon known as the “anomeric effect” exhibited by tetrahedral carbon compounds with four C-O bonds. Each oxygen itself bears two bonds and has two lone pairs, and either of these can align with one of three other C-O bonds to generate an anomeric effect. Here I change the central carbon to a boron to explore what happens, as indeed I promised earlier.

The Brachiosaurus altithorax holotype FMNH PR 25107 in the ground

https://doi.org/10.59350/tfx7z-p1v71
Published May 1, 2024 in Sauropod Vertebra Picture of the Week
Mike Taylor
I was cleaning out my Downloads directory — which, even after my initial forays, still accounts for 11 GB that I really need to reclaim from my perpetually almost-full SSD. And I found this beautiful image under the filename csgeo4028.jpeg. Brachiosaurus altithorax holotype FMNH PR 25107 during excavation. The thing is, I have no idea where this image came from.

JOSSCast #9: Reproducibility in Neuroscience – Mats van Es on FieldTrip reproducescript

https://doi.org/10.59349/4pk73-a2h25
Published May 2, 2024 in Journal of Open Source Software Blog
Arfon M. Smith
Mats van Es joins Arfon and Abby to discuss reproducible science and the functionality he added to FieldTrip, a MATLAB software toolbox for analyzing brain imaging data. Mats is a cognitive neuroscientist at the University of Oxford. You can follow Mats on Twitter/X @mats_van_es.

Calculating birthday probabilities with R instead of math

https://doi.org/10.59350/r419r-zqj73
Published May 3, 2024 in Andrew Heiss’s blog
Andrew Heiss
Even though I’ve been teaching R and statistical programming since 2017, and despite the fact that I do all sorts of heavily quantitative research, I’m really, really bad at probability math. Like super bad. The last time I truly had to do set theory and probability math was in my first PhD-level stats class in 2012.

The Flowdown #1

Published May 3, 2024 in Dual Power Supply
Kirk Pollard Smith
I sip slowly from the information firehose that is my cornucopia of industry newsletters, RSS feeds, and Google Scholar updates on new publications in the flow battery landscape. There’s not enough time to give everything a proper read, and often if I see something useful I just file it away in Zotero for it to collect digital dust.

The Potter Creek Brachiosaurus humerus, in various states of repair

https://doi.org/10.59350/rhh6a-m8195
Published May 5, 2024 in Sauropod Vertebra Picture of the Week
Mike Taylor
As iconic as Brachiosaurus altithorax is, it’s known from surprisingly little material.

Archiving scholarly blogs with Rogue Scholar

https://doi.org/10.59350/pp903-gve38
Published May 6, 2024 in Bastian Greshake Tzovaras
Bastian Greshake Tzovaras
The pews of the Internet Archive back in 2018. tl;dr: Posts on this blog are now automatically archived, indexed and full-text searchable through The Rogue Scholar. The jury might still be out on whether the small or indie web will make a comeback, but I’ve personally enjoyed posting more on my blog here in recent months.

Are Large Language Models Our Allies or Enemies in the Fight Against Fake News?

https://doi.org/10.59350/st0jr-ad818
Published May 7, 2024 in Stories by Research Graph on Medium
Amanda Kau
Large Language Models for Fake News Generation and Detection. Author: Amanda Kau (ORCID: 0009-0004-4949-9284). In recent years, fake news has become an increasing concern for many, and for good reason. Newspapers, which we once trusted to deliver credible news through accountable journalists, are vanishing en masse along with their writers.

Transformers Models in NLP

https://doi.org/10.59350/c7nrg-xay43
Published May 7, 2024 in Stories by Research Graph on Medium
Dhruv Gupta
Attention mechanism not getting enough attention. Author: Dhruv Gupta (ORCID: 0009-0004-7109-5403). As discussed in this article, RNNs were incapable of learning long-term dependencies. To solve this issue, both LSTMs and GRUs were introduced. However, even though LSTMs and GRUs did a fairly decent job for textual data, they did not perform well.

Fine-tuning Large Language Models: A Brief Introduction

https://doi.org/10.59350/1aezq-kk827
Published May 7, 2024 in Stories by Research Graph on Medium
Xuzeng He
Supervised Fine-tuning, Reinforcement Learning from Human Feedback and the latest SteerLM. Author: Xuzeng He (ORCID: 0009-0005-7317-7426). Large Language Models (LLMs), usually trained with extensive text data, can demonstrate remarkable capabilities in handling various tasks with state-of-the-art performance. However, people nowadays typically want something more personalised instead of a general solution.

Three Paradigms of RAG

https://doi.org/10.59350/5j7tt-5y328
Published May 7, 2024 in Stories by Research Graph on Medium
Vaibhav Khobragade
From Naive to Modular: Tracing the Evolution of Retrieval-Augmented Generation. Author: Vaibhav Khobragade (ORCID: 0009-0009-8807-5982). Large Language Models (LLMs) have achieved remarkable success.

Brief Introduction to the History of Large Language Models (LLMs)

https://doi.org/10.59350/m4c7t-epg97
Published May 7, 2024 in Stories by Research Graph on Medium
Wenyi Pi
Understanding the Evolutionary Journey of LLMs. Author: Wenyi Pi (ORCID: 0009-0002-2884-2771). When we talk about large language models (LLMs), we are actually referring to a type of advanced software that can communicate in a human-like manner. These models have the amazing ability to understand complex contexts and generate content that is coherent and has a human feel.

The longer the context, the better? Unlimited Context Length in Megalodon

https://doi.org/10.59350/dx6a6-yy475
Published May 7, 2024 in Stories by Research Graph on Medium
Qingqin Fang
An improved architecture superior to the Transformer, proposed by Meta. Author: Qingqin Fang (ORCID: 0009-0003-5348-4264). Recently, researchers from Meta and the University of Southern California have introduced a model called Megalodon. They claim that this model can expand the context window of language models to handle millions of tokens without overwhelming your memory.
