Conversational Datasets Catalog
Curated list of dialogue datasets for training multi-turn instruction-following models
Research Summary: Datasets released in 2024-2026 are designed to better capture authentic conversational phenomena, including disfluencies, code-switching, and emotional nuance.
Multi-Turn Reasoning & Instruction-Following
MuTual
Reasoning dialogues • Multi-turn • Commonsense reasoning
Multi-turn dialogues that require commonsense reasoning across turns
DREAM
Reading comprehension • Dialogue-based • Multiple choice
Dialogue-based reading comprehension with multiple-choice questions
MultiWOZ (2.x)
Task-oriented • 7 domains • Wizard-of-Oz
Multi-domain Wizard-of-Oz dataset for task-oriented dialogues across 7 domains
Persona-Chat / ConvAI2
Persona-based • Multi-turn • Personality traits
Multi-turn conversations between speakers adopting specific personas with personality traits
Question Answering & Contextual Learning
CoQA
Conversational QA • Text passages • Question-answer pairs
Conversational Question Answering with question-answer pairs from text passages
HotpotQA
Multi-hop reasoning • Wikipedia-based • Complex questions
Wikipedia-based multi-hop question answering dataset with complex questions that require synthesizing information from multiple sources
QuAC
Information-seeking • Unanswerable questions • Contextual dialogue
Question Answering in Context with information-seeking dialogues and unanswerable questions
Daily Conversations & Social Interactions
DialogSum
Daily conversations • Summarization • High-quality dialogues
High-quality daily-life dialogues paired with summaries, for dialogue summarization
MASSIVE
Multilingual • 51 languages • Task-oriented
Multilingual dataset of task-oriented virtual-assistant utterances spanning 51 languages
DailyDialog
Everyday topics • Emotional labels • High-quality
High-quality daily conversations covering various everyday topics with emotional annotations
Instruction-Tuning & Fine-Tuning Collections
Flan Collection
Instruction-tuning • Multi-task • Large-scale
Large-scale instruction-tuning dataset collection for multi-task learning
ChatAlpaca
Conversational AI • Instruction-following • Multi-turn
Multi-turn instruction-following dataset for training conversational AI models
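As a rough illustration of what multi-turn instruction-following training data can look like, the sketch below converts a dialogue stored as an alternating list of utterances into role-tagged messages. The function name `to_chat_messages`, the `role`/`content` keys, and the assumption that speakers alternate strictly are illustrative choices, not the schema of ChatAlpaca or any other dataset listed here. The example dialogue is made up.

```python
from typing import Dict, List


def to_chat_messages(utterances: List[str], first_role: str = "user") -> List[Dict[str, str]]:
    """Tag an alternating list of utterances with chat roles.

    Assumption: speakers alternate strictly, one utterance per turn.
    """
    roles = ["user", "assistant"]
    if first_role not in roles:
        raise ValueError("first_role must be 'user' or 'assistant'")
    offset = roles.index(first_role)
    # Even positions get first_role, odd positions get the other role.
    return [
        {"role": roles[(offset + i) % 2], "content": text.strip()}
        for i, text in enumerate(utterances)
    ]


# Made-up dialogue used only to show the output shape.
dialogue = [
    "Can you recommend a restaurant nearby?",
    "Sure. What kind of food do you like?",
    "Something vegetarian, please.",
]
messages = to_chat_messages(dialogue)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Real datasets often carry explicit speaker labels or consecutive turns by the same speaker, so a production converter would need to read those fields rather than rely on position.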
Technical Note: Most datasets are available through the Hugging Face Hub, GitHub repositories, or academic portals.
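For datasets hosted on the Hugging Face Hub, loading might look like the following minimal sketch. It assumes the third-party `datasets` library is installed and network access is available. `dataset_id` is a placeholder: the identifiers are not given in this catalog and have to be looked up on the Hub for each entry. The helper names `stream_split` and `peek` are hypothetical, and the offline records are made up.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def stream_split(dataset_id: str, split: str = "train") -> Iterable[Dict[str, Any]]:
    """Stream one split of a Hub-hosted dataset instead of downloading it in full."""
    # Imported lazily so that peek() below works without the library installed.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(dataset_id, split=split, streaming=True)


def peek(records: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return the first n records of any iterable of records."""
    return list(islice(records, n))


# Offline demonstration with made-up records standing in for a real dataset.
sample = [{"turn": 1}, {"turn": 2}, {"turn": 3}, {"turn": 4}]
first_two = peek(sample, 2)
print(first_two)

# With network access, and after looking up the real identifier on the Hub:
# first_rows = peek(stream_split("<dataset-id-from-the-hub>"), 3)
```

Some datasets also require a configuration name (for example a language or task variant), which `load_dataset` accepts as its second argument. Inspecting a few records with `peek` before training is a cheap way to confirm field names, which differ from dataset to dataset.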
Source: Research compiled from conversational dialogue datasets taxonomy (Feb 2026)