Articles and videos
-
Inside a radical new project to democratize AI — A group of over 1,000 AI researchers has created a multilingual large language model bigger than GPT-3—and they’re giving it out for free. by Melissa Heikkilä (July 12th, 2022) ► The description of the BLOOM project: building a free LLM.
-
Ch(e)at GPT? - Computerphile by Mike Pound (February 16th, 2023) ► Some researchers propose a hidden statistical signature for text generated by a large language model.
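One well-known way to build such a signature is a "green list" watermark: at each step, a hash of the previous token pseudo-randomly marks part of the vocabulary as "green" and the sampler slightly favours those tokens, so watermarked text contains suspiciously many green tokens. A minimal detection sketch assuming that simple scheme (the hash function, the 50/50 split, and the z-score threshold are illustrative choices, not taken from the video):

```python
import hashlib
import math

def green_fraction(tokens, vocab_size=50257):
    """Fraction of tokens falling in the 'green' half of the vocabulary,
    where the green set is re-derived from a hash of the previous token."""
    green = 0
    for prev, cur in zip(tokens, tokens[1:]):
        seed = int(hashlib.sha256(str(prev).encode()).hexdigest(), 16)
        # A token is 'green' if it lands in the pseudo-random half selected by the seed.
        if (cur + seed) % vocab_size < vocab_size // 2:
            green += 1
    return green / max(len(tokens) - 1, 1)

def looks_watermarked(tokens, gamma=0.5, z_threshold=4.0):
    """Unwatermarked text should contain about gamma green tokens; a large
    z-score suggests the sampler was biased towards the green list."""
    n = max(len(tokens) - 1, 1)
    z = (green_fraction(tokens) - gamma) * math.sqrt(n) / math.sqrt(gamma * (1 - gamma))
    return z > z_threshold
```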
-
LLaMA: Open and Efficient Foundation Language Models (Paper Explained) by Yannic Kilcher (March 2nd, 2023) ► Some comments on a LLaMA paper.
-
Glitch Tokens - Computerphile↑ by Robert Miles (March 3rd, 2023) ► The problem of tokens that LLMs never properly learned, which can result in crazy answers.
-
Emergent Abilities of Large Language Models — Emergence can be defined as the sudden appearance of novel behavior. Large Language Models apparently display emergence by suddenly gaining new abilities as they grow. Why does this happen, and what does this mean? by Ryan O’Connor (March 7th, 2023) ► The emergent capabilities of LLMs as they get larger, and two possible explanations.
-
Baidu shares fall after Ernie AI chatbot demo disappoints — After demo, no one knows if Ernie can compete with ChatGPT. by Ryan McMorrow and Qianer Liu (March 16th, 2023) ► The title says it all.
-
What's Up With Bard? 9 Examples + 6 Reasons Google Fell Behind [ft. Muse, Med-PaLM 2 and more] by Philip (March 22nd, 2023) ► A comparison of Bard and GPT-4 and some hypotheses about why Bard is bad.
-
Brace Yourself for a Tidal Wave of ChatGPT Email Scams — Thanks to large language models, a single scammer can run hundreds or thousands of cons in parallel, night and day, in every language under the sun. by Bruce Schneier and Barath Raghavan (April 3rd, 2023) ► The authors claim that AI will help scammers because it will be possible to easily deal with many potential victims in parallel, but will these scams be really effective?
-
ChatGPT vs Google Bard: Which is better? We put them to the test. — We compare two top AI language models in seven categories to pick a winner. by Benj Edwards (April 5th, 2023) ► How to easily write an article.
-
Why ChatGPT and Bing Chat are so good at making things up — A look inside the hallucinating artificial minds of the famous text prediction bots.↑ by Benj Edwards (April 6th, 2023) ► A good basic explanation of how Chat LLMs work.
-
China slaps security reviews on AI products as Alibaba unveils ChatGPT challenger — Regulator warns AI-created content should embody "socialist values." by Ryan McMorrow and Nian Liu (April 11th, 2023) ► The title says it all.
-
The mounting human and environmental costs of generative AI — Op-ed: Planetary impacts, escalating financial costs, and labor exploitation all factor. by Sasha Luccioni (April 12th, 2023) ► Some problems with LLMs. There is nothing new here, but this is still a good overview.
-
“A really big deal”—Dolly is a free, open source, ChatGPT-style AI model — Dolly 2.0 could spark a new wave of fully open source LLMs similar to ChatGPT. by Benj Edwards (April 13th, 2023) ► Databricks released Dolly 2.0, an open source LLM that can be used even in commercial products.
-
Stability AI launches StableLM, an open source ChatGPT alternative — StableLM's 3B and 7B models are available now on GitHub under CC 4.0 license. by Benj Edwards (April 24th, 2023) ► Yet another open source LLM.
-
Understanding Parameter-Efficient LLM Finetuning: Prompt Tuning And Prefix Tuning by Sebastian Raschka (April 30th, 2023) ► The title says it all.
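As a reminder of what prompt tuning boils down to, here is a minimal PyTorch sketch (my own, not taken from the article): the base model stays frozen and only a handful of "virtual token" embeddings, prepended to every input, are trained. It assumes a Hugging Face-style model that accepts an `inputs_embeds` argument; prefix tuning additionally injects trainable prefixes into every layer, which this sketch does not show.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Prompt tuning in a nutshell: freeze the base model and train only a few
    virtual-token embeddings prepended to every input sequence."""
    def __init__(self, base_model, embed_dim, n_virtual_tokens=20):
        super().__init__()
        self.base_model = base_model
        for p in self.base_model.parameters():
            p.requires_grad = False                       # base weights stay frozen
        self.soft_prompt = nn.Parameter(torch.randn(n_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds):                      # (batch, seq, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.base_model(inputs_embeds=torch.cat([prompt, input_embeds], dim=1))
```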
-
Exploring ChatGPT vs open-source models on slightly harder tasks by Marco Túlio Ribeiro and Scott Lundberg (May 12th, 2023) ► A comparison of ChatGPT 3.5, Vicuna, and MPT.
-
Big Tech Isn’t Prepared for A.I.’s Next Chapter — Open source is changing everything by Bruce Schneier and Jim Waldo (May 30th, 2023) ► An analysis of the impact of open source LLMs.
-
Direct Preference Optimization: Forget RLHF (PPO)⇊ by "code_your_own_AI" (June 6th, 2023) ► A description of the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model", but the guy does not seem to understand what he is talking about.
-
De l'art superflu d'écrire des dissertations à l'heure de ChatGPT by Thibaut Giraud (June 10th, 2023) ► Should we teach students to use LLMs rather than keep asking them to write essays?
-
Sarah Silverman sues OpenAI, Meta for being “industrial-strength plagiarists” — AI models allegedly trained on books copied from popular pirate e-book sites. by Ashley Belanger (July 10th, 2023) ► Will the AI companies have to pay for the data they illegally scraped from the Internet?
-
Redditors prank AI-powered news mill with “Glorbo” in World of Warcraft — "Glorbo" isn't real, but a news-writing AI model didn't know it—and then it wrote about itself. by Benj Edwards (July 21st, 2023) ► People start to trick news sites which are using AI to automatically generate articles.
-
A New Attack Impacts Major AI Chatbots—and No One Knows How to Stop It — Researchers found a simple way to make ChatGPT, Bard, and other chatbots misbehave, proving that AI is hard to tame. by Will Knight (August 1st, 2023) ► Yet another LLM jailbreak.
-
Can Two AIs Play the TDD Pairing Game? by Roberto Ostinelli (August 16th, 2023) ► Two AIs practising Ping-pong Programming.
-
Tiny Language Models Come of Age — To better understand how neural networks learn to simulate writing, researchers trained simpler versions on synthetic children’s stories. by Ben Brubaker (October 5th, 2023) ► Some Microsoft researchers trained "small" models on children’s stories generated by GPT-4; these models are able to generate stories.
-
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution (Paper Explained) by Yannic Kilcher (October 7th, 2023) ► Yannic Kilcher is not convinced by "Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution", an experiment using an evolutionary algorithm to find better prompts.
-
Avoiding LLM hallucinations through analytical AI, is it possible? by Martin Deramecourt (October 25th, 2023) ► The experience of a company evaluating the use of an LLM to answer customer queries.
-
↪LLM Performance Optimization with Nvidia GPUs from Scaleway: A Technical Study by Kevin Baude (October 27th, 2023) ► Some information on running the Llama-2 70B model using llama.cpp.
-
This is EXACTLY HOW some LLMs RANK TOP!!! by Abdul Majed Raja (November 9th, 2023) ► A paper "Don't Make Your LLM an Evaluation Benchmark Cheater" states the obvious: leaking benchmark data in the training data will result in better benchmark scores.
-
[1hr Talk] Intro to Large Language Models↑ by Andrej Karpathy (November 23rd, 2023) ► A good introduction and overview of LLMs.
-
"trust me", Google Bard REALLY launched a killer feature!!! by Abdul Majed Raja (November 24th, 2023) ► Bard is now able to get information from YouTube captions.
-
Extracting Training Data from ChatGPT by Milad Nasr, Nicholas Carlini, Jon Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, and Katherine Lee (November 28th, 2023) ► A summary of a research paper ("Scalable Extraction of Training Data from (Production) Language Models") studying training data extraction attacks and a basic explanation of patching an exploit vs. fixing a vulnerability.
-
↪Scalable Extraction of Training Data from (Production) Language Models (Paper Explained) by Yannic Kilcher (December 3rd, 2023) ► Some comments about the paper.
-
Round 2: We test the new Gemini-powered Bard against ChatGPT — We run the models through seven categories to determine an updated champion. by Kyle Orland (December 8th, 2023) ► An informal comparison of the new Bard (powered by Gemini), the old Bard (PaLM), ChatGPT 4, and ChatGPT 3.5.
-
Phi-2, Imagen-2, Optimus-Gen-2: Small New Models to Change the World? by Philip (December 13th, 2023) ► Some information about Phi-2 and the problems with MMLU.
-
is this brilliance or accuracy leak?↓ by Abdul Majed Raja (December 15th, 2023) ► As Abdul Majed Raja says himself, he is not competent enough to criticize this paper (TinyGSM: achieving > 80% on GSM8k with small language models).
-
Large Language Models: How Large is Large Enough? by Kip Yego (December 15th, 2023) ► A basic comparison of larger and smaller LLMs.
-
I tried Eric Hartford's "Save the Kittens" prompt!!!↓ by Abdul Majed Raja (December 19th, 2023) ► Some naive prompting…
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained by Letitia Parcalabescu (December 22nd, 2023) ► This description of the differences between DPO and RLHF is not detailed enough to understand how DPO really works.
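For what it is worth, the core of DPO is a single loss. A minimal sketch, assuming you already have the summed token log-probabilities of the chosen and rejected completions under the trained policy and under a frozen reference model; β controls how far the policy may drift from the reference:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO in one line: push the policy to give the chosen answer a larger
    log-prob margin over the rejected one than the frozen reference model does,
    with no explicit reward model and no RL loop."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```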
-
Open Source LLMs with Simon Willison by Simon Willison, Bryan Cantrill, and Adam Leventhal (January 17th, 2024) ► The current status of LLMs, open-weight models, jailbreaking, prompt injection…
-
You can get PAID $$$ for Building AI LLMs!!↓ by Abdul Majed Raja (January 31st, 2024) ► A very unclear description of a reward mechanism for the best fine-tuned models.
-
AI Assistants with OPEN MODELS!!! by Abdul Majed Raja (February 2nd, 2024) ► It is now possible to create assistants in HuggingChat.
-
This 21B LMM Beats Gemini Pro & GPT-3.5!!! (in Vision) by Abdul Majed Raja (February 13th, 2024) ► A presentation and quick ’n dirty demonstration of Reka Flash.
-
The problem with this $50M Funded AI Startup! by Abdul Majed Raja (February 29th, 2024) ► Ola’s Krutrim, an Indian LLM, seems not so good…
-
AI Prompt Engineering Is Dead — Long live AI prompt engineering by Dina Genkina (March 6th, 2024) ► At last, more people are starting to explain that "prompt engineering" is bullshit; auto-tuned prompts or, better, LLMs that do not require tuned prompts are the future.
-
22,000 H100s later, Inflection 2.5!!! by Abdul Majed Raja (March 7th, 2024) ► Yet another model claiming to be near GPT-4 level.
-
The GPT-4 barrier has finally been broken by Simon Willison (March 8th, 2024) ► Some recent models claiming to be on par with GPT-4 are arriving: Google Gemini 1.5, Mistral Large, Claude 3 Opus, and Inflection-2.5.
-
CANCELED GPT-4 After Talking to Claude 3 by Abdul Majed Raja (March 10th, 2024) ► Is it time to replace GPT-4 with Claude 3?
-
This NEW LLM "Learnt" to "THINK" BEFORE "TALK"ING!!! by Abdul Majed Raja (March 15th, 2024) ► A presentation of "Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking" where the model is trained to generate rationales at each token to explain future text.
-
Releasing Common Corpus: the largest public domain dataset for training LLMs by Pierre-Carl Langlais (March 20th, 2024) ► The release of Common Corpus, a very large corpus of multilingual and copyright-free texts.
-
Claude and ChatGPT for ad-hoc sidequests by Simon Willison (March 22nd, 2024) ► A small example of using Claude 3 Opus and ChatGPT 4.
-
Inside the Creation of the World’s Most Powerful Open Source AI Model — Startup Databricks just released DBRX, the most powerful open source large language model yet—eclipsing Meta’s Llama 2. by Will Knight (March 27th, 2024) ► Some basic information about the training of a foundation model.
-
A little guide to building Large Language Models in 2024↑ by Thomas Wolf (March 28th, 2024) ► A good overview of the current technologies used to build an LLM.
-
I found this STUNNING Local Perplexity CLONE!!! by Abdul Majed Raja (April 8th, 2024) ► A presentation of LLocalSearch, a search aggregator using LLMs.
-
You can't build a moat with AI — It's all about the data by Vikram Sreekanti and Joseph E. Gonzalez (April 11th, 2024) ► The value of a system built on top of an LLM is neither the model nor the prompt, but the data you provide to the model.
-
ChatGPT rêve-t-il de cavaliers électriques ?↑ by Thibaut Giraud and Mathieu Acher (April 14th, 2024) ► It appears that gpt-3.5-turbo-instruct is able to correctly play chess. Some researchers have been able to get smaller LLMs to play Othello and chess, and discovered that the models have built an internal representation of the board.
-
How to convert PDF DOCX to Structured TXT Formats for RAG! (UNSTRUCTURED Tutorial)↓ by Abdul Majed Raja (April 16th, 2024) ► A bad presentation of the unstructured library: a library to extract text from PDF, HTML, Word… documents.
-
Using and Finetuning Pretrained Transformers by Sebastian Raschka (April 20th, 2024) ► A list of quickly described options to use and fine-tune a foundation LLM.
-
Llama 3 from Scratch?? 15T Tokens Data for you!!! by Abdul Majed Raja (April 22nd, 2024) ► A huge open dataset is available: datasets/HuggingFaceFW/fineweb.
-
The NEW AI Models ARE A PROBLEM by Abdul Majed Raja (April 23rd, 2024) ► Abdul Majed Raja is getting tired of the benchmark war. But his argument is unclear: current LLMs are not intelligent, they only perform some kind of very powerful pattern matching, so we should not expect them to perform real reasoning; we can only expect them to "remember" and "match" more knowledge.
-
New Microsoft AI model may challenge GPT-4 and Google Gemini — In project headed by former Inflection chief, MAI-1 may have 500B parameters. by Benj Edwards (May 6th, 2024) ► Mustafa Suleyman is leading the creation of Microsoft’s own large model.
-
How Good Are the Latest Open LLMs? And Is DPO Better Than PPO? — Discussing the Latest Model Releases and AI Research in April 2024 by Sebastian Raschka (May 12th, 2024) ► A simplistic comparison of Mixtral 8x22B, Llama 3, Phi-3, and OpenELM, and a comparison of DPO and PPO.
-
WARNING: Bad News for CHATGPT!↓ by Abdul Majed Raja (May 28th, 2024) ► A presentation, as bad as usual, of HuggingChat, a chat interface that supports tools.
-
Anthropic's Latest Winner - Workbench by Sam Witteveen (July 10th, 2024) ► A presentation and a demo of Anthropic Workbench, a tool to generate and evaluate prompts.
-
Instruction Pretraining LLMs — The Latest Research in Instruction Finetuning by Sebastian Raschka (July 20th, 2024) ► Generating an instruction dataset by providing empty prompts to Llama 3 8B, pretraining models with synthesised data containing raw texts and instruction-response pairs, and some information about Gemma 2.
-
[Own work] On Measuring Faithfulness or Self-consistency of Natural Language Explanations by Letitia Parcalabescu (July 26th, 2024) ► Letitia Parcalabescu proposes a self-consistency measurement.
-
Anthropic's Prompt Engineering Interactive Tutorial by Simon Willison (August 30th, 2024) ► Simon Willison presents some interesting information nuggets he found in Anthropic’s documentation.
-
Mission: Impossible language models – Paper Explained [ACL 2024 recording] by Letitia Parcalabescu (September 2nd, 2024) ► A presentation of a paper ("Mission: Impossible Language Models") claiming to disprove Noam Chomsky’s claim that LLMs can learn possible and impossible human languages equally well.
-
I am a Strange Dataset: Metalinguistic Tests for Language Models – Paper Explained [🔴 at ACL 2024] by Letitia Parcalabescu (September 10th, 2024) ► A short presentation of a dataset containing self-referencing sentences ("I am a Strange Dataset: Metalinguistic Tests for Language Models"). It appears that LLMs are bad at handling them.
-
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity↑ (⧉) by Dario Amodei and Lex Fridman (November 11th, 2024) ► Dario Amodei describes his vision of LLMs, Amanda Askell explains how she helps define Claude’s temperament, and Chris Olah explains Mechanistic Interpretability.
-
🔥 This CHANGES the REASONING Game!!!💥 Nous Forge Reasoning💥↓ by Abdul Majed Raja (November 12th, 2024) ► A usual Abdul Majed Raja reading of an announcement: Nous’ Forge Reasoning API, yet another attempt at getting better results by using Monte Carlo Tree Search, Chain of Code, and Mixture of Agents.
-
Small Language Models, Synthetic Data and Robotics at the opening of Web Summit 2024 by Thomas Wolf (November 15th, 2024) ► Some thoughts about the interest of small language models.
-
New Pleias 1.0 LLMs trained exclusively on openly licensed data by Simon Willison (December 5th, 2024) ► The title says it all.
-
A Deep Dive Into The RedPajama Datasets by Maurice Weber and Zain Hasan (December 6th, 2024) ► Some information on how the RedPajama datasets were built.
-
Things we learned about LLMs in 2024 by Simon Willison (December 31st, 2024) ► A summary of the year.
-
How to OPTIMIZE your prompts for better Reasoning! by Sam Witteveen (January 9th, 2025) ► A presentation of PromptWizard, a Microsoft open-source framework to optimise prompts.
-
LLM Lecture: A Deep Dive into Transformers, Prompts, and Human Feedback by Letitia Parcalabescu (January 19th, 2025) ► A wide and good overview of how LLMs are implemented. But it packs a lot of information into little time; if you do not already know the subject, you will probably have trouble understanding everything.
-
What is a Context Window? Unlocking LLM Secrets by Martin Keen (January 21st, 2025) ► A very basic explanation of context.
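The practical point is simply that the prompt and the answer share the same token budget. A tiny illustrative check using the tiktoken tokeniser (the 8,192-token limit and the reserved margin are arbitrary example values):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")  # requires the tiktoken package

def fits_in_context(prompt, max_context=8192, reserved_for_answer=512):
    """The context window bounds prompt + answer together, so part of it
    must be kept free for the tokens the model will generate."""
    return len(enc.encode(prompt)) <= max_context - reserved_for_answer

print(fits_in_context("Summarise the following report: ..."))
```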
-
Deep Dive into LLMs like ChatGPT↑ by Andrej Karpathy (February 5th, 2025) ► A long, clear, and non-technical description of how chat AIs are built.
-
How I use LLMs by Andrej Karpathy (February 27th, 2025) ► Andrej Karpathy describes his use of GenAI; he mostly uses OpenAI.
-
Will AI Ever Understand Language Like Humans? — AI may sound like a human, but that doesn’t mean that AI learns like a human. In this episode, Ellie Pavlick explains why understanding how LLMs can process language could unlock deeper insights into both AI and the human mind. by Ellie Pavlick, Steven Strogatz, and Janna Levin (May 1st, 2025) ► There is nothing new in this interview, just generalities about LLMs.
-
Coding LLMs from the Ground Up: A Complete Course by Sebastian Raschka (May 10th, 2025) ► Sebastian Raschka lists the videos of his "Build a Large Language Model (From Scratch)" series.
-
Trying out llama.cpp’s new vision support by Simon Willison (May 10th, 2025) ► Simon Willison is experimenting with llama.cpp and unsloth/gemma-3-4b-it-GGUF.
-
How often do LLMs snitch? Recreating Theo’s SnitchBench with LLM by Simon Willison (May 31st, 2025) ► Models can act as whistleblowers when asked to apply their values and given access to communication tools.
-
Faster LLMs: Accelerate Inference with Speculative Decoding by Isaac Ke (June 4th, 2025) ► A description of speculative decoding.
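A minimal sketch of the greedy flavour of the technique (my own simplification, assuming Hugging Face-style models that return `.logits` and a batch size of 1): a small draft model cheaply proposes a few tokens, the large target model verifies them all in a single forward pass, and the longest agreeing prefix is kept. Real implementations verify against the target's probability distribution rather than its argmax and reuse KV caches.

```python
import torch

@torch.no_grad()
def speculative_decode_greedy(target, draft, tokens, k=4, steps=32):
    """Greedy speculative decoding: draft proposes k tokens, target checks them
    all at once, and we keep the longest prefix on which both models agree."""
    for _ in range(steps):
        proposal = tokens
        for _ in range(k):                                   # cheap autoregressive drafting
            logits = draft(proposal).logits
            proposal = torch.cat([proposal, logits[:, -1:, :].argmax(-1)], dim=1)
        # target_pred[:, i] is the target model's choice for the token at position i + 1
        target_pred = target(proposal).logits.argmax(-1)     # one expensive verification pass
        start = tokens.size(1)
        accepted = 0
        while accepted < k and proposal[0, start + accepted] == target_pred[0, start + accepted - 1]:
            accepted += 1
        # keep the accepted draft tokens, then append the target model's own next token
        tokens = torch.cat([proposal[:, :start + accepted],
                            target_pred[:, start + accepted - 1:start + accepted]], dim=1)
    return tokens
```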
-
Chatbot Arena
-
A much better LLM Leaderboard!!! by Abdul Majed Raja (November 28th, 2023) ► A presentation of Chatbot Arena.
-
Chatbot Arena: New models & Elo system update by Wei-Lin Chiang, Tim Li, Joseph E. Gonzalez, and Ion Stoica (December 7th, 2023) ► The title says it all.
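For context, the classic online Elo update behind this kind of pairwise leaderboard fits in a few lines (the K-factor of 32 is just an illustrative value; the post itself discusses refinements to the naive scheme):

```python
def elo_update(rating_a, rating_b, a_wins, k=32):
    """One pairwise vote in an Elo-style leaderboard: the winner takes points
    from the loser, and beating a much stronger opponent pays more."""
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if a_wins else 0.0
    rating_a += k * (score_a - expected_a)
    rating_b += k * ((1 - score_a) - (1 - expected_a))
    return rating_a, rating_b

# e.g. two models starting at 1000; model A wins one head-to-head vote
print(elo_update(1000, 1000, a_wins=True))   # -> (1016.0, 984.0)
```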
-
LMSYS Chatbot Arena: Live and Community-Driven LLM Evaluation (March 1st, 2024) ► A presentation of Chatbot Arena by its authors.
-
From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline by Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Banghua Zhu, Joseph E. Gonzalez, and Ion Stoica (April 19th, 2024) ► A detailed description of Arena-Hard, a rather complex comparison mechanism trying to correctly evaluate and force differentiation in scoring chatbots.
-
Introducing Hard Prompts Category in Chatbot Arena by Tianle Li and Wei-Lin Chiang (May 17th, 2024) ► Some first results of Arena-Hard.
-
The Multimodal Arena is Here! by Christopher Chou, Lisa Dunlap, Wei-Lin Chiang, Ying Sheng, Lianmin Zheng, Anastasios Angelopoulos, Trevor Darrell, Ion Stoica, and Joseph E. Gonzalez (June 27th, 2024) ► Chatbot Arena now supports images.
-
RedTeam Arena: An Open-Source, Community-driven Jailbreaking Platform by Anastasios Angelopoulos, Lucas Vivona, Wei-Lin Chiang, Aryan Vichare, Lisa Dunlap, Salvivona, "Pliny", and Ion Stoica (September 13th, 2024) ► RedTeam Arena tries to evaluate how difficult it is to jailbreak models. It generates two leaderboards: one for the models, the other for the players. But the first game is so basic that it has little value.
-
WebDev: This FREE AI Coder BEATS V0, Bolt & Has 3.5 SONNET, GPT-4O & More FOR FREE! by "AICodeKing" (December 14th, 2024) ► A presentation of WebDev Arena, an arena to benchmark models for web development tasks.
-
WebDev Arena by Simon Willison (December 16th, 2024) ► Simon Willison has extracted the system prompt of WebDev Arena.
-
Understanding the recent criticism of the Chatbot Arena by Simon Willison (April 30th, 2025) ► Large companies are gaming Chatbot Arena.
-
Model parameter extraction
-
SGLang
-
Achieving Faster Open-Source Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) by Liangsheng Yin, Yineng Zhang, and Ying Sheng (July 25th, 2024) ► LMSYS has created a new runtime for serving chat and vision models, and they are proud of its performance.
-
SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision (September 4th, 2024) ► The title says it all.
-
SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs (December 4th, 2024) ► The title says it all.
-
Deploying DeepSeek with PD Disaggregation and Large-Scale Expert Parallelism on 96 H100 GPUs (May 5th, 2025) ► A detailed and very technical description of the SGLang support of DeepSeek and achieved performance.
-
OpenAssistant