Articles and videos
-
Inside a radical new project to democratize AI — A group of over 1,000 AI researchers has created a multilingual large language model bigger than GPT-3—and they’re giving it out for free. by Melissa Heikkilä (July 12th, 2022) ► The description of the BLOOM project: building a free LLM.
-
Ch(e)at GPT? - Computerphile by Mike Pound (February 16th, 2023) ► Some researchers propose a hidden statistical signature for text generated by a large language model.
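One well-known way to build such a signature is a "green list" watermark: at each step, a hash of the previous token pseudo-randomly marks part of the vocabulary as "green" and the sampler slightly favours those tokens, so watermarked text contains suspiciously many green tokens. A minimal detection sketch assuming that simple scheme (the hash function, the 50/50 split, and the z-score threshold are illustrative choices, not taken from the video):

```python
import hashlib
import math

def green_fraction(tokens, vocab_size=50257):
    """Fraction of tokens falling in the 'green' half of the vocabulary,
    where the green set is re-derived from a hash of the previous token."""
    green = 0
    for prev, cur in zip(tokens, tokens[1:]):
        seed = int(hashlib.sha256(str(prev).encode()).hexdigest(), 16)
        # A token is 'green' if it lands in the pseudo-random half selected by the seed.
        if (cur + seed) % vocab_size < vocab_size // 2:
            green += 1
    return green / max(len(tokens) - 1, 1)

def looks_watermarked(tokens, gamma=0.5, z_threshold=4.0):
    """Unwatermarked text should contain about gamma green tokens; a large
    z-score suggests the sampler was biased towards the green list."""
    n = max(len(tokens) - 1, 1)
    z = (green_fraction(tokens) - gamma) * math.sqrt(n) / math.sqrt(gamma * (1 - gamma))
    return z > z_threshold
```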
-
LLaMA: Open and Efficient Foundation Language Models (Paper Explained) by Yannic Kilcher (March 2nd, 2023) ► Some comments on a LLaMA paper.
-
Glitch Tokens - Computerphile↑ by Robert Miles (March 3rd, 2023) ► The problem of tokens that LLMs never properly learned, which can result in crazy answers.
-
Emergent Abilities of Large Language Models — Emergence can be defined as the sudden appearance of novel behavior. Large Language Models apparently display emergence by suddenly gaining new abilities as they grow. Why does this happen, and what does this mean? by Ryan O’Connor (March 7th, 2023) ► The emergent capabilities of LLMs as they get larger, and two possible explanations.
-
Baidu shares fall after Ernie AI chatbot demo disappoints — After demo, no one knows if Ernie can compete with ChatGPT. by Ryan McMorrow and Qianer Liu (March 16th, 2023) ► The title says it all.
-
What's Up With Bard? 9 Examples + 6 Reasons Google Fell Behind [ft. Muse, Med-PaLM 2 and more] by Philip (March 22nd, 2023) ► A comparison of Bard and GPT-4 and some hypotheses about why Bard is bad.
-
Brace Yourself for a Tidal Wave of ChatGPT Email Scams — Thanks to large language models, a single scammer can run hundreds or thousands of cons in parallel, night and day, in every language under the sun. by Bruce Schneier and Barath Raghavan (April 3rd, 2023) ► The authors claim that AI will help scammers because it will be possible to easily deal with many potential victims in parallel, but will these scams be really effective?
-
ChatGPT vs Google Bard: Which is better? We put them to the test. — We compare two top AI language models in seven categories to pick a winner. by Benj Edwards (April 5th, 2023) ► How to easily write an article.
-
Why ChatGPT and Bing Chat are so good at making things up — A look inside the hallucinating artificial minds of the famous text prediction bots.↑ by Benj Edwards (April 6th, 2023) ► A good basic explanation of how Chat LLMs work.
-
China slaps security reviews on AI products as Alibaba unveils ChatGPT challenger — Regulator warns AI-created content should embody "socialist values." by Ryan McMorrow and Nian Liu (April 11th, 2023) ► The title says it all.
-
The mounting human and environmental costs of generative AI — Op-ed: Planetary impacts, escalating financial costs, and labor exploitation all factor. by Sasha Luccioni (April 12th, 2023) ► Some problems with LLMs. There is nothing new here, but this is still a good overview.
-
“A really big deal”—Dolly is a free, open source, ChatGPT-style AI model — Dolly 2.0 could spark a new wave of fully open source LLMs similar to ChatGPT. by Benj Edwards (April 13th, 2023) ► Databricks released Dolly 2.0, an open source LLM that can be used even in commercial products.
-
Stability AI launches StableLM, an open source ChatGPT alternative — StableLM's 3B and 7B models are available now on GitHub under CC 4.0 license. by Benj Edwards (April 24th, 2023) ► Yet another open source LLM.
-
Understanding Parameter-Efficient LLM Finetuning: Prompt Tuning And Prefix Tuning by Sebastian Raschka (April 30th, 2023) ► The title says it all.
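As a reminder of what prompt tuning boils down to, here is a minimal PyTorch sketch (my own, not taken from the article): the base model stays frozen and only a handful of "virtual token" embeddings, prepended to every input, are trained. It assumes a Hugging Face-style model that accepts an `inputs_embeds` argument; prefix tuning additionally injects trainable prefixes into every layer, which this sketch does not show.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Prompt tuning in a nutshell: freeze the base model and train only a few
    virtual-token embeddings prepended to every input sequence."""
    def __init__(self, base_model, embed_dim, n_virtual_tokens=20):
        super().__init__()
        self.base_model = base_model
        for p in self.base_model.parameters():
            p.requires_grad = False                       # base weights stay frozen
        self.soft_prompt = nn.Parameter(torch.randn(n_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds):                      # (batch, seq, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.base_model(inputs_embeds=torch.cat([prompt, input_embeds], dim=1))
```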
-
Exploring ChatGPT vs open-source models on slightly harder tasks by Marco Túlio Ribeiro and Scott Lundberg (May 12th, 2023) ► A comparison of ChatGPT 3.5, Vicuna, and MPT.
-
Big Tech Isn’t Prepared for A.I.’s Next Chapter — Open source is changing everything by Bruce Schneier and Jim Waldo (May 30th, 2023) ► An analysis of the impact of open source LLMs.
-
Direct Preference Optimization: Forget RLHF (PPO)⇊ by "code_your_own_AI" (June 6th, 2023) ► A description of the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model", but the guy does not seem to understand what he is talking about.
-
De l'art superflu d'écrire des dissertations à l'heure de ChatGPT by Thibaut Giraud (June 10th, 2023) ► Should we teach students to use LLMs rather than keep asking them to write essays?
-
Sarah Silverman sues OpenAI, Meta for being “industrial-strength plagiarists” — AI models allegedly trained on books copied from popular pirate e-book sites. by Ashley Belanger (July 10th, 2023) ► Will the AI companies have to pay for the data they illegally scraped from the Internet?
-
Redditors prank AI-powered news mill with “Glorbo” in World of Warcraft — "Glorbo" isn't real, but a news-writing AI model didn't know it—and then it wrote about itself. by Benj Edwards (July 21st, 2023) ► People start to trick news sites which are using AI to automatically generate articles.
-
A New Attack Impacts Major AI Chatbots—and No One Knows How to Stop It — Researchers found a simple way to make ChatGPT, Bard, and other chatbots misbehave, proving that AI is hard to tame. by Will Knight (August 1st, 2023) ► Yet another LLM jailbreak.
-
Can Two AIs Play the TDD Pairing Game? by Roberto Ostinelli (August 16th, 2023) ► Two AIs practising Ping-pong Programming.
-
Tiny Language Models Come of Age — To better understand how neural networks learn to simulate writing, researchers trained simpler versions on synthetic children’s stories. by Ben Brubaker (October 5th, 2023) ► Some Microsoft researchers trained "small" models on children’s stories generated by GPT-4; these models are able to generate stories.
-
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution (Paper Explained) by Yannic Kilcher (October 7th, 2023) ► Yannic Kilcher is not convinced by "Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution", an experiment using an evolutionary algorithm to find better prompts.
-
Avoiding LLM hallucinations through analytical AI, is it possible? by Martin Deramecourt (October 25th, 2023) ► The experience of a company evaluating the use of an LLM to answer customer queries.
-
↪LLM Performance Optimization with Nvidia GPUs from Scaleway: A Technical Study by Kevin Baude (October 27th, 2023) ► Some information on running the Llama-2 70B model using llama.cpp.
-
This is EXACTLY HOW some LLMs RANK TOP!!! by Abdul Majed Raja (November 9th, 2023) ► A paper "Don't Make Your LLM an Evaluation Benchmark Cheater" states the obvious: leaking benchmark data in the training data will result in better benchmark scores.
-
[1hr Talk] Intro to Large Language Models↑ by Andrej Karpathy (November 23rd, 2023) ► A good introduction and overview of LLMs.
-
"trust me", Google Bard REALLY launched a killer feature!!! by Abdul Majed Raja (November 24th, 2023) ► Bard is now able to get information from YouTube captions.
-
Extracting Training Data from ChatGPT by Milad Nasr, Nicholas Carlini, Jon Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, and Katherine Lee (November 28th, 2023) ► A summary of a research paper ("Scalable Extraction of Training Data from (Production) Language Models") studying training data extraction attacks and a basic explanation of patching an exploit vs. fixing a vulnerability.
-
↪Scalable Extraction of Training Data from (Production) Language Models (Paper Explained) by Yannic Kilcher (December 3rd, 2023) ► Some comments about the paper.
-
Round 2: We test the new Gemini-powered Bard against ChatGPT — We run the models through seven categories to determine an updated champion. by Kyle Orland (December 8th, 2023) ► An informal comparison of the new Bard (powered by Gemini), the old Bard (PaLM), ChatGPT 4, and ChatGPT 3.5.
-
Phi-2, Imagen-2, Optimus-Gen-2: Small New Models to Change the World? by Philip (December 13th, 2023) ► Some information about Phi-2 and the problems with MMLU.
-
is this brilliance or accuracy leak?↓ by Abdul Majed Raja (December 15th, 2023) ► As Abdul Majed Raja says himself, he is not competent enough to criticize this paper (TinyGSM: achieving > 80% on GSM8k with small language models).
-
Large Language Models: How Large is Large Enough? by Kip Yego (December 15th, 2023) ► A basic comparison of larger and smaller LLMs.
-
I tried Eric Hartford's "Save the Kittens" prompt!!!↓ by Abdul Majed Raja (December 19th, 2023) ► Some naive prompting…
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained by Letitia Parcalabescu (December 22nd, 2023) ► This description of the differences between DPO and RLHF is not detailed enough to understand how DPO really works.
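For what it is worth, the core of DPO is a single loss. A minimal sketch, assuming you already have the summed token log-probabilities of the chosen and rejected completions under the trained policy and under a frozen reference model; β controls how far the policy may drift from the reference:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO in one line: push the policy to give the chosen answer a larger
    log-prob margin over the rejected one than the frozen reference model does,
    with no explicit reward model and no RL loop."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```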
-
Open Source LLMs with Simon Willison by Simon Willison, Bryan Cantrill, and Adam Leventhal (January 17th, 2024) ► The current status of LLMs, open-weight models, jailbreaking, prompt injection…
-
You can get PAID $$$ for Building AI LLMs!!↓ by Abdul Majed Raja (January 31st, 2024) ► A very unclear description of a reward mechanism for the best fine-tuned models.
-
AI Assistants with OPEN MODELS!!! by Abdul Majed Raja (February 2nd, 2024) ► It is now possible to create assistants in HuggingChat.
-
This 21B LMM Beats Gemini Pro & GPT-3.5!!! (in Vision) by Abdul Majed Raja (February 13th, 2024) ► A presentation and quick ’n dirty demonstration of Reka Flash.
-
The problem with this $50M Funded AI Startup! by Abdul Majed Raja (February 29th, 2024) ► Ola’s Krutrim, an Indian LLM, seems not so good…
-
AI Prompt Engineering Is Dead — Long live AI prompt engineering by Dina Genkina (March 6th, 2024) ► At last, more people are starting to explain that "prompt engineering" is bullshit; auto-tuned prompts or, better, LLMs that do not require tuned prompts are the future.
-
22,000 H100s later, Inflection 2.5!!! by Abdul Majed Raja (March 7th, 2024) ► Yet another model claiming to be near GPT-4 level.
-
The GPT-4 barrier has finally been broken by Simon Willison (March 8th, 2024) ► Some recent models claiming to be on par with GPT-4 are arriving: Google Gemini 1.5, Mistral Large, Claude 3 Opus, and Inflection-2.5.
-
CANCELED GPT-4 After Talking to Claude 3 by Abdul Majed Raja (March 10th, 2024) ► Is it time to replace GPT-4 with Claude 3?
-
This NEW LLM "Learnt" to "THINK" BEFORE "TALK"ING!!! by Abdul Majed Raja (March 15th, 2024) ► A presentation of "Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking" where the model is trained to generate rationales at each token to explain future text.
-
Releasing Common Corpus: the largest public domain dataset for training LLMs by Pierre-Carl Langlais (March 20th, 2024) ► The release of Common Corpus, a very large corpus of multilingual and copyright-free texts.
-
Claude and ChatGPT for ad-hoc sidequests by Simon Willison (March 22nd, 2024) ► A small example of using Claude 3 Opus and ChatGPT 4.
-
Inside the Creation of the World’s Most Powerful Open Source AI Model — Startup Databricks just released DBRX, the most powerful open source large language model yet—eclipsing Meta’s Llama 2. by Will Knight (March 27th, 2024) ► Some basic information about the training of a foundation model.
-
A little guide to building Large Language Models in 2024↑ by Thomas Wolf (March 28th, 2024) ► A good overview of the current technologies used to build an LLM.
-
I found this STUNNING Local Perplexity CLONE!!! by Abdul Majed Raja (April 8th, 2024) ► A presentation of LLocalSearch, a search aggregator using LLMs.
-
You can't build a moat with AI — It's all about the data by Vikram Sreekanti and Joseph E. Gonzalez (April 11th, 2024) ► The value of a system built on top of an LLM is neither the model nor the prompt, but the data you provide to the model.
-
ChatGPT rêve-t-il de cavaliers électriques ?↑ by Thibaut Giraud and Mathieu Acher (April 14th, 2024) ► It appears that gpt-3.5-turbo-instruct is able to correctly play chess. Some researchers have been able to get smaller LLMs to play Othello and chess, and discovered that the models have built an internal representation of the board.
-
How to convert PDF DOCX to Structured TXT Formats for RAG! (UNSTRUCTURED Tutorial)↓ by Abdul Majed Raja (April 16th, 2024) ► A bad presentation of the unstructured library: a library to extract text from PDF, HTML, Word… documents.
-
Using and Finetuning Pretrained Transformers by Sebastian Raschka (April 20th, 2024) ► A list of quickly described options to use and fine-tune a foundation LLM.
-
Llama 3 from Scratch?? 15T Tokens Data for you!!! by Abdul Majed Raja (April 22nd, 2024) ► A huge open dataset is available: datasets/HuggingFaceFW/fineweb.
-
The NEW AI Models ARE A PROBLEM by Abdul Majed Raja (April 23rd, 2024) ► Abdul Majed Raja is getting tired of the benchmark war. But his argument is unclear: current LLMs are not intelligent, they only perform some kind of very powerful pattern matching, so we should not expect them to perform real reasoning; we can only expect them to "remember" and "match" more knowledge.
-
New Microsoft AI model may challenge GPT-4 and Google Gemini — In project headed by former Inflection chief, MAI-1 may have 500B parameters. by Benj Edwards (May 6th, 2024) ► Mustafa Suleyman is leading the creation of Microsoft’s own large model.
-
How Good Are the Latest Open LLMs? And Is DPO Better Than PPO? — Discussing the Latest Model Releases and AI Research in April 2024 by Sebastian Raschka (May 12th, 2024) ► A simplistic comparison of Mixtral 8x22B, Llama 3, Phi-3, and OpenELM, and a comparison of DPO and PPO.
-
WARNING: Bad News for CHATGPT!↓ by Abdul Majed Raja (May 28th, 2024) ► A presentation, as bad as usual, of HuggingChat, a chat interface that supports tools.
-
Anthropic's Latest Winner - Workbench by Sam Witteveen (July 10th, 2024) ► A presentation and a demo of Anthropic Workbench, a tool to generate and evaluate prompts.
-
Instruction Pretraining LLMs — The Latest Research in Instruction Finetuning by Sebastian Raschka (July 20th, 2024) ► Generating an instruction dataset by providing empty prompts to Llama 3 8B, pretraining models with synthesised data containing raw texts and instruction-response pairs, and some information about Gemma 2.
-
[Own work] On Measuring Faithfulness or Self-consistency of Natural Language Explanations by Letitia Parcalabescu (July 26th, 2024) ► Letitia Parcalabescu proposes a self-consistency measurement.
-
Anthropic's Prompt Engineering Interactive Tutorial by Simon Willison (August 30th, 2024) ► Simon Willison presents some interesting information nuggets he found in Anthropic’s documentation.
-
Mission: Impossible language models – Paper Explained [ACL 2024 recording] by Letitia Parcalabescu (September 2nd, 2024) ► A presentation of a paper ("Mission: Impossible Language Models") claiming to disprove Noam Chomsky’s claim that LLMs can learn possible and impossible human languages equally well.
-
I am a Strange Dataset: Metalinguistic Tests for Language Models – Paper Explained [🔴 at ACL 2024] by Letitia Parcalabescu (September 10th, 2024) ► A short presentation of a dataset containing self-referencing sentences ("I am a Strange Dataset: Metalinguistic Tests for Language Models"). It appears that LLMs are bad at handling them.
-
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity↑ (⧉) by Dario Amodei and Lex Fridman (November 11th, 2024) ► Dario Amodei describes his vision of LLMs, Amanda Askell explains how she helps define Claude’s temperament, and Chris Olah explains Mechanistic Interpretability.
-
🔥 This CHANGES the REASONING Game!!!💥 Nous Forge Reasoning💥↓ by Abdul Majed Raja (November 12th, 2024) ► A usual Abdul Majed Raja reading of an announcement: Nous’ Forge Reasoning API, yet another attempt at getting better results by using Monte Carlo Tree Search, Chain of Code, and Mixture of Agents.
-
Small Language Models, Synthetic Data and Robotics at the opening of Web Summit 2024 by Thomas Wolf (November 15th, 2024) ► Some thoughts about the interest of small language models.
-
New Pleias 1.0 LLMs trained exclusively on openly licensed data by Simon Willison (December 5th, 2024) ► The title says it all.
-
A Deep Dive Into The RedPajama Datasets by Maurice Weber and Zain Hasan (December 6th, 2024) ► Some information on how the RedPajama datasets were built.
-
Things we learned about LLMs in 2024 by Simon Willison (December 31st, 2024) ► A summary of the year.
-
How to OPTIMIZE your prompts for better Reasoning! by Sam Witteveen (January 9th, 2025) ► A presentation of PromptWizard, a Microsoft open-source framework to optimise prompts.
-
LLM Lecture: A Deep Dive into Transformers, Prompts, and Human Feedback by Letitia Parcalabescu (January 19th, 2025) ► A wide and good overview of how LLMs are implemented. But it packs a lot of information into little time; if you do not already know the subject, you will probably have trouble understanding everything.
-
What is a Context Window? Unlocking LLM Secrets by Martin Keen (January 21st, 2025) ► A very basic explanation of context.
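The practical point is simply that the prompt and the answer share the same token budget. A tiny illustrative check using the tiktoken tokeniser (the 8,192-token limit and the reserved margin are arbitrary example values):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")  # requires the tiktoken package

def fits_in_context(prompt, max_context=8192, reserved_for_answer=512):
    """The context window bounds prompt + answer together, so part of it
    must be kept free for the tokens the model will generate."""
    return len(enc.encode(prompt)) <= max_context - reserved_for_answer

print(fits_in_context("Summarise the following report: ..."))
```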
-
Deep Dive into LLMs like ChatGPT↑ by Andrej Karpathy (February 5th, 2025) ► A long, clear, and non-technical description of how chat AIs are built.
-
How I use LLMs by Andrej Karpathy (February 27th, 2025) ► Andrej Karpathy describes his use of GenAI; he mostly uses OpenAI.
-
Will AI Ever Understand Language Like Humans? — AI may sound like a human, but that doesn’t mean that AI learns like a human. In this episode, Ellie Pavlick explains why understanding how LLMs can process language could unlock deeper insights into both AI and the human mind. by Ellie Pavlick, Steven Strogatz, and Janna Levin (May 1st, 2025) ► There is nothing new in this interview, just generalities about LLMs.
-
Coding LLMs from the Ground Up: A Complete Course by Sebastian Raschka (May 10th, 2025) ► Sebastian Raschka lists the videos of his "Build a Large Language Model (From Scratch)" series.
-
Trying out llama.cpp’s new vision support by Simon Willison (May 10th, 2025) ► Simon Willison is experimenting with llama.cpp and unsloth/gemma-3-4b-it-GGUF.
-
How often do LLMs snitch? Recreating Theo’s SnitchBench with LLM by Simon Willison (May 31st, 2025) ► Models can act as whistleblowers when asked to apply their values and given access to communication tools.
-
Faster LLMs: Accelerate Inference with Speculative Decoding by Isaac Ke (June 4th, 2025) ► A description of speculative decoding.
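A minimal sketch of the greedy flavour of the technique (my own simplification, assuming Hugging Face-style models that return `.logits` and a batch size of 1): a small draft model cheaply proposes a few tokens, the large target model verifies them all in a single forward pass, and the longest agreeing prefix is kept. Real implementations verify against the target's probability distribution rather than its argmax and reuse KV caches.

```python
import torch

@torch.no_grad()
def speculative_decode_greedy(target, draft, tokens, k=4, steps=32):
    """Greedy speculative decoding: draft proposes k tokens, target checks them
    all at once, and we keep the longest prefix on which both models agree."""
    for _ in range(steps):
        proposal = tokens
        for _ in range(k):                                   # cheap autoregressive drafting
            logits = draft(proposal).logits
            proposal = torch.cat([proposal, logits[:, -1:, :].argmax(-1)], dim=1)
        # target_pred[:, i] is the target model's choice for the token at position i + 1
        target_pred = target(proposal).logits.argmax(-1)     # one expensive verification pass
        start = tokens.size(1)
        accepted = 0
        while accepted < k and proposal[0, start + accepted] == target_pred[0, start + accepted - 1]:
            accepted += 1
        # keep the accepted draft tokens, then append the target model's own next token
        tokens = torch.cat([proposal[:, :start + accepted],
                            target_pred[:, start + accepted - 1:start + accepted]], dim=1)
    return tokens
```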
-
Chatbot Arena
-
A much better LLM Leaderboard!!! by Abdul Majed Raja (November 28th, 2023) ► A presentation of Chatbot Arena.
-
Chatbot Arena: New models & Elo system update by Wei-Lin Chiang, Tim Li, Joseph E. Gonzalez, and Ion Stoica (December 7th, 2023) ► The title says it all.
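For context, the classic online Elo update behind this kind of pairwise leaderboard fits in a few lines (the K-factor of 32 is just an illustrative value; the post itself discusses refinements to the naive scheme):

```python
def elo_update(rating_a, rating_b, a_wins, k=32):
    """One pairwise vote in an Elo-style leaderboard: the winner takes points
    from the loser, and beating a much stronger opponent pays more."""
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if a_wins else 0.0
    rating_a += k * (score_a - expected_a)
    rating_b += k * ((1 - score_a) - (1 - expected_a))
    return rating_a, rating_b

# e.g. two models starting at 1000; model A wins one head-to-head vote
print(elo_update(1000, 1000, a_wins=True))   # -> (1016.0, 984.0)
```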
-
LMSYS Chatbot Arena: Live and Community-Driven LLM Evaluation (March 1st, 2024) ► A presentation of Chatbot Arena by its authors.
-
From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline by Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Banghua Zhu, Joseph E. Gonzalez, and Ion Stoica (April 19th, 2024) ► A detailed description of Arena-Hard, a rather complex comparison mechanism trying to correctly evaluate and force differentiation in scoring chatbots.
-
Introducing Hard Prompts Category in Chatbot Arena by Tianle Li and Wei-Lin Chiang (May 17th, 2024) ► Some first results of Arena-Hard.
-
The Multimodal Arena is Here! by Christopher Chou, Lisa Dunlap, Wei-Lin Chiang, Ying Sheng, Lianmin Zheng, Anastasios Angelopoulos, Trevor Darrell, Ion Stoica, and Joseph E. Gonzalez (June 27th, 2024) ► Chatbot Arena now supports images.
-
RedTeam Arena: An Open-Source, Community-driven Jailbreaking Platform by Anastasios Angelopoulos, Lucas Vivona, Wei-Lin Chiang, Aryan Vichare, Lisa Dunlap, Salvivona, "Pliny", and Ion Stoica (September 13th, 2024) ► RedTeam Arena tries to evaluate how difficult it is to jailbreak models. It generates two leaderboards: one for the models, the other for the players. But the first game is so basic that it has little value.
-
WebDev: This FREE AI Coder BEATS V0, Bolt & Has 3.5 SONNET, GPT-4O & More FOR FREE! by "AICodeKing" (December 14th, 2024) ► A presentation of WebDev Arena, an arena to benchmark models for web development tasks.
-
WebDev Arena by Simon Willison (December 16th, 2024) ► Simon Willison has extracted the system prompt of WebDev Arena.
-
Understanding the recent criticism of the Chatbot Arena by Simon Willison (April 30th, 2025) ► Large companies are gaming Chatbot Arena.
-
Model parameter extraction
-
SGLang
-
Achieving Faster Open-Source Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) by Liangsheng Yin, Yineng Zhang, and Ying Sheng (July 25th, 2024) ► LMSYS has created a new runtime for serving chat and vision models, and they are proud of its performance.
-
SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision (September 4th, 2024) ► The title says it all.
-
SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs (December 4th, 2024) ► The title says it all.
-
Deploying DeepSeek with PD Disaggregation and Large-Scale Expert Parallelism on 96 H100 GPUs (May 5th, 2025) ► A detailed and very technical description of the SGLang support of DeepSeek and achieved performance.
-
OpenAssistant