These are papers that have been recommended to me, with links. Who knows whether they are as useful for the Humanities as some people think?
- Edward Hu et al.'s LoRA: Low-Rank Adaptation of Large Language Models (Oct 2021); there is a minimal sketch of the idea after this list.
- The so-called Chinchilla paper (i.e. Jordan Hoffmann et al., Training Compute-Optimal Large Language Models, Mar 2022); also available at Papers with Code. A back-of-the-envelope version of its rule of thumb appears after this list.
- Harm de Vries's blog post Go smol or go home, discussing the Chinchilla paper.
- Lewis Tunstall et al.'s Zephyr: Direct Distillation of LM Alignment (October 2023), describing the Zephyr model, a smaller LM "attuned to user intent".
- Suriya Gunasekar et al.'s Textbooks Are All You Need (June 2023), which describes the phi-1 system: it trains smaller models that still perform well by using higher-quality tokens (plus synthetic data generated with GPT-3.5).
- Dohmatob et al.'s Model Collapse Demystified (February 2024), which explains the problem that arises when training sets reuse a model's own outputs, giving rise to some of the strategies that de Vries discusses in the README of the bigcode-dataset project; a toy illustration appears after this list.
- Nilesh Barla's blog post (April 2024) on how to train a custom embedding model, using the Zephyr model to help generate training data.
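For anyone who wants a feel for what the LoRA paper is proposing before reading it, here is a minimal sketch of the idea (my own illustration in PyTorch, with made-up dimensions, not the authors' code): the pretrained weight matrix stays frozen and only a pair of small low-rank matrices is trained, so the number of trainable parameters drops dramatically.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Stand-in for a pretrained weight; frozen during fine-tuning.
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        # The LoRA factors: only r * (in_features + out_features) trainable values.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        base = x @ self.weight.T                       # frozen pretrained path
        update = (x @ self.lora_A.T) @ self.lora_B.T   # low-rank adaptation
        return base + self.scaling * update

layer = LoRALinear(768, 768)
print(layer(torch.randn(4, 768)).shape)  # torch.Size([4, 768])
```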
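The practical takeaway of the Chinchilla paper is usually summarised as "train on roughly 20 tokens per parameter", with training compute approximated as 6 × N × D FLOPs for N parameters and D tokens. A back-of-the-envelope sketch using those commonly quoted approximations (mine, not the paper's code):

```python
def chinchilla_estimate(n_params: float) -> tuple[float, float]:
    """Rough compute-optimal token count and training FLOPs for a model size."""
    tokens = 20 * n_params          # ~20 tokens per parameter (rule of thumb)
    flops = 6 * n_params * tokens   # training compute ~ 6 * N * D
    return tokens, flops

for n in (1e9, 7e9, 70e9):
    tokens, flops = chinchilla_estimate(n)
    print(f"{n:.0e} params -> ~{tokens:.0e} tokens, ~{flops:.1e} FLOPs")
```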
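And to see why model collapse matters, here is a deliberately tiny toy simulation (my own simplification, not the regression setting that Dohmatob et al. analyse): fit a Gaussian to data, then keep refitting to samples drawn from the previous generation's fit. Because each generation trains only on the last generation's output, the estimated spread tends to drift downward and the tails of the original distribution are gradually lost.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0  # the "real" data distribution
for generation in range(20):
    # Each generation is trained only on the previous generation's output.
    samples = rng.normal(mu, sigma, size=100)
    mu, sigma = samples.mean(), samples.std()  # refit the "model"
    print(f"generation {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
```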
And if all of these papers make you want to try stuff, consider RunPod ....