ACM SIGKDD 2026 · Tutorial · papers.lunadong.com
Cutting-Edge Tutorial

Retrieval-Augmented Generation (RAG) — From Modular to Agentic Systems

An in-depth treatment of modern RAG and Deep Research, grounded in an AI-assisted systematic analysis of 2,000+ papers (2020–2026).

Paper Radar

An interactive survey of ~2,000 papers, organized by topic and continuously updated before and after the tutorial. A single running example evolves across all sessions, and live polls let attendees predict benchmark results before the reveal.

More related surveys & courses. Browse our continuously growing collection of surveys, reading lists, and short courses on factuality, RAG, and agents — all curated and kept current at papers.lunadong.com.

Half day: 3.5-hour lecture
2,000+ papers: systematically analyzed
3 paradigms: Modular · Graph · Agentic

Schedule · 3.5 Hours

1. Intro · 40 min
  • 1.1 Introduction
  • 1.2 Factuality Overview
  • 1.3 RAG Benchmarks & State-of-the-art

2. RAG Deep Dive

2.1 Modularized RAG · 50 min
  • 2.1.0 Overview · 5 min
  • 2.1.1 RAG Triggering · 8 min
  • 2.1.2 Query Rewriting · 8 min
  • 2.1.3 Retrieval · 8 min
  • 2.1.4 Post-processing · 10 min
  • 2.1.5 Answer Generation · 11 min

2.2 Graph-Enhanced RAG · 30 min

☕ Coffee Break · 30 min

2.3 Agentic RAG · 30 min

3. Advanced Topics

3.1 Deep Research · 20 min

3.2 Multi-modal RAG · 20 min

4. Practical Tips, Synthesis & Future Directions · 15 min

Speakers

Early inventors of RAG, active researchers, and industry practitioners — also the organizing team behind KDD Cup 2024 (CRAG) and 2025 (CRAG-MM).

Xin Luna Dong
Principal Scientist, Meta Wearables AI · ACM & IEEE Fellow
Sanat Sharma
Research Engineer, Meta Reality Labs
Scott Wen-tau Yih
Research Scientist, Meta FAIR · Affiliate Professor, UW · ACL Fellow
Yinglong Xia
Applied Research Scientist, Meta Recommendation Systems
Xiao Yang
Research Scientist, Meta Reality Labs
Kai Sun
Research Scientist, Meta Reality Labs
Jiaqi Wang
ML Engineer, Meta Reality Labs
Franklin Zhang
UW CSE · Online AI Companion

More about the tutorial

Abstract

Despite well-known hallucination issues, LLMs have become an increasingly indispensable source of information, and the underlying technology has advanced rapidly to make their answers far more reliable.

Early on, Retrieval-Augmented Generation (RAG) emerged as the dominant remedy, grounding LLM responses in external knowledge and evolving from simple retrieve-then-read pipelines into modular, graph-enhanced, and agentic systems. More recently, Agentic Deep Research has pushed the frontier further, equipping LLMs with autonomous planning, multi-hop investigation, and iterative synthesis to tackle open-ended questions that no single retrieval pass can answer.

This tutorial offers an in-depth treatment of modern RAG and Deep Research, grounded in an AI-assisted systematic analysis of 2,000+ recent papers (2020–2026). Attendees will leave with a structured roadmap, evidence-backed practical recommendations, and a clear map of open research opportunities.

Running Examples

One electric-vehicle example runs through the whole tutorial, growing from a simple lookup into an open-ended research task, with each technique visibly improving on the last. It covers the full range of question types:

Simple
What is the battery capacity of a Tesla Model 3?
Expected answer: a single number (e.g., "~60 kWh for the Standard Range, ~82 kWh for the Long Range").
Complex
Which has a longer driving range — the Tesla Model 3 or the Lucid Air?
Deep Research
I'm considering buying an electric vehicle in 2026. Which model is the best choice, and why?
Expected answer: a short structured report identifying 2–3 top candidates, comparing them across key dimensions, and offering a recommendation tailored to the user, along with the main trade-offs.
Multi-modal
[Photo of an EV in a parking lot] How far can it go on a full charge?
Expected answer: "That looks like a Tesla Model 3 Long Range. It has an EPA-rated range of about 358 miles on a full charge."
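
To make the starting point concrete, the Simple question above can be handled by a bare retrieve-then-read pipeline. The sketch below is only an illustration, not the tutorial's implementation. It assumes a two-passage toy corpus built from the expected answers on this page, a token-overlap scorer standing in for a real retriever, and a `generate` function that is a placeholder for an LLM call. All function names and the stopword list are hypothetical.

```python
import re

# Toy corpus. Both passages restate facts from the running example above.
CORPUS = [
    "The Tesla Model 3 battery capacity is about 60 kWh for the Standard Range "
    "and about 82 kWh for the Long Range.",
    "The Tesla Model 3 Long Range has an EPA-rated range of about 358 miles "
    "on a full charge.",
]

# Small hand-picked stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "is", "what", "how", "on", "for", "and", "can", "has"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword alphanumeric tokens."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def retrieve(question, corpus, k=1):
    """Retrieval: rank passages by token overlap with the question, keep the top k."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p)), p) for p in corpus]
    # Post-processing: drop passages that share no token with the question.
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:k]]


def generate(question, passages):
    """Answer generation: placeholder for an LLM call; here it only quotes the evidence."""
    if not passages:
        return "I don't know."
    return f"Q: {question}\nGrounded answer: {passages[0]}"


def answer(question):
    """Retrieve-then-read: fetch evidence first, then answer from it."""
    return generate(question, retrieve(question, CORPUS))


print(answer("What is the battery capacity of a Tesla Model 3?"))
```

A single retrieval pass like this is enough for the Simple question. The Complex and Deep Research questions need evidence about more than one vehicle and more than one dimension, which is where the modular, graph-enhanced, and agentic techniques in the schedule come in.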

Who Should Attend

Researchers

In NLP, data mining, and knowledge management seeking a structured map of the RAG landscape and its open problems.

Applied Scientists & Engineers

Building production LLM systems who need evidence-backed guidelines for choosing RAG architectures.

Graduate Students

Entering the field and looking for a comprehensive entry point to 2,000+ papers organized by topic and timeline.

Prerequisites. Familiarity with basic LLM concepts (pre-training, fine-tuning, prompting). No prior knowledge of RAG or hallucination detection is assumed.

Reading List

denotes recommended pre-reading. The complete ~2,000-paper list lives in the online companion.

  1. Lewis et al. (2020) — the original paper introducing RAG for knowledge-intensive NLP tasks.
  2. Gao et al. (2024) — a comprehensive survey of RAG for LLMs (architecture taxonomy & evaluation).
  3. Huang et al. (2023) — a survey on hallucination in LLMs, motivating the need for RAG.
  4. Guu et al. (2020) — REALM: retrieval-augmented language model pre-training.
  5. Asai et al. (2023) — Self-RAG: learning to retrieve, generate, and critique via self-reflection.
  6. Jiang et al. (2023) — FLARE: active retrieval-augmented generation with adaptive triggering.
  7. Edge et al. (2024) — GraphRAG: a graph-based approach to query-focused summarization.
  8. Jin et al. (2025) — Search-R1: training LLMs to reason and use search engines with RL.
  9. Yang et al. (2024) — CRAG: a comprehensive benchmark for end-to-end RAG systems.
  10. Ni et al. (2025) — a recent survey on trustworthy RAG (robustness & reliability).