Conversational Datasets Catalog

Curated list of dialogue datasets for training multi-turn instruction-following models

Research Summary: Datasets released in 2024-2026 are increasingly designed to capture authentic conversational phenomena, including disfluencies, code-switching, and emotional nuance.

Multi-Turn Reasoning & Instruction-Following

MuTual

Reasoning dialogues • Multi-turn • Commonsense reasoning
Multi-turn dialogue reasoning benchmark in which a model selects the correct next response, requiring commonsense reasoning across turns
Download: https://github.com/Nealcly/MuTual

DREAM

Reading comprehension • Dialogue-based • Multiple choice
Dialogue-based reading comprehension with multiple-choice questions
Download: https://github.com/nlpdata/dream

MultiWOZ (2.x)

Task-oriented • 7 domains • Wizard-of-Oz
Multi-domain Wizard-of-Oz dataset for task-oriented dialogues across 7 domains
Download: https://github.com/budzianowski/multiwoz

Persona-Chat / ConvAI2

Persona-based • Multi-turn • Personality traits
Multi-turn conversations between speakers adopting specific personas with personality traits
Download: https://parl.ai/projects/convai2/

Question Answering & Contextual Learning

CoQA

Conversational QA • Text passages • Question-answer pairs
Conversational Question Answering with question-answer pairs from text passages
Download: https://stanfordnlp.github.io/coqa/

HotpotQA

Multi-hop reasoning • Wikipedia-based • Complex questions
Multi-hop reasoning dataset with complex questions requiring information synthesis
Download: https://hotpotqa.github.io/

QuAC

Information-seeking • Unanswerable questions • Contextual dialogue
Question Answering in Context with information-seeking dialogues and unanswerable questions
Download: https://quac.ai/

Daily Conversations & Social Interactions

DialogSum

Daily conversations • Summarization • High-quality dialogues
High-quality daily-life dialogues paired with human-written summaries for dialogue summarization
Download: https://huggingface.co/datasets/knkarthick/dialogsum

MASSIVE

Multilingual • 51 languages • Task-oriented
Multilingual dataset of virtual-assistant utterances in 51 languages, annotated for intent classification and slot filling
Download: https://huggingface.co/datasets/amazon/massive

DailyDialog

Everyday topics • Emotional labels • High-quality
High-quality daily conversations covering various everyday topics with emotional annotations
Download: https://github.com/thu-coai/dailydialog
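
To use daily-conversation data for multi-turn instruction tuning, each dialogue is commonly converted into role-tagged messages. The following is a minimal sketch, assuming a dialogue is already available as a plain list of utterance strings with two strictly alternating speakers. The function name `to_chat_messages`, the `role`/`content` keys, and the sample utterances are illustrative assumptions, not the format of any dataset listed here.

```python
from typing import Dict, List


def to_chat_messages(utterances: List[str], first_role: str = "user") -> List[Dict[str, str]]:
    """Map alternating utterances to role-tagged chat messages.

    Assumption: exactly two speakers who take strict turns. Real datasets
    may need speaker IDs instead of simple alternation.
    """
    if first_role not in ("user", "assistant"):
        raise ValueError("first_role must be 'user' or 'assistant'")
    second_role = "assistant" if first_role == "user" else "user"
    roles = (first_role, second_role)

    messages = []
    for index, text in enumerate(utterances):
        # Even positions belong to the first speaker, odd positions to the second.
        messages.append({"role": roles[index % 2], "content": text.strip()})
    return messages


# Invented sample dialogue, for illustration only.
example = [
    "Hi, how was your weekend?",
    "Pretty relaxing, thanks. Yours?",
    "Busy, but fun.",
]
messages = to_chat_messages(example)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Per-turn annotations such as emotion labels could be carried along as an extra key on each message, if the training pipeline supports it.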

Instruction-Tuning & Fine-Tuning Collections

Flan Collection

Instruction-tuning • Multi-task • Large-scale
Large-scale instruction-tuning dataset collection for multi-task learning
Download: https://github.com/google-research/FLAN

ChatAlpaca

Conversational AI • Instruction-following • Multi-turn
Multi-turn instruction-following conversations for training conversational assistants
Download: https://github.com/cascaded-ai/ChatAlpaca

Technical Note: Most datasets are available through the Hugging Face Hub, GitHub repositories, or academic project pages.
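
The distribution channels mentioned in the note can be checked against the catalog's own links. This is a rough sketch that groups the download URLs listed above by host. The helper names `source_kind` and `group_by_source` and the category labels are hypothetical, and anything not hosted on huggingface.co or github.com is simply treated as a project page.

```python
from urllib.parse import urlparse

# Download links copied verbatim from the catalog entries.
DOWNLOAD_URLS = {
    "MuTual": "https://github.com/Nealcly/MuTual",
    "DREAM": "https://github.com/nlpdata/dream",
    "MultiWOZ": "https://github.com/budzianowski/multiwoz",
    "Persona-Chat / ConvAI2": "https://parl.ai/projects/convai2/",
    "CoQA": "https://stanfordnlp.github.io/coqa/",
    "HotpotQA": "https://hotpotqa.github.io/",
    "QuAC": "https://quac.ai/",
    "DialogSum": "https://huggingface.co/datasets/knkarthick/dialogsum",
    "MASSIVE": "https://huggingface.co/datasets/amazon/massive",
    "DailyDialog": "https://github.com/thu-coai/dailydialog",
    "Flan Collection": "https://github.com/google-research/FLAN",
    "ChatAlpaca": "https://github.com/cascaded-ai/ChatAlpaca",
}


def source_kind(url: str) -> str:
    """Classify a download URL by its host (labels are illustrative)."""
    host = urlparse(url).netloc.lower()
    if host == "huggingface.co":
        return "hugging_face"
    if host == "github.com":
        return "github"
    # Assumption: every other host is a project or academic page.
    return "project_page"


def group_by_source(urls: dict) -> dict:
    """Group dataset names by the kind of site that hosts them."""
    groups = {}
    for name, url in urls.items():
        groups.setdefault(source_kind(url), []).append(name)
    return groups


groups = group_by_source(DOWNLOAD_URLS)
for kind, names in groups.items():
    print(f"{kind}: {', '.join(names)}")
```

Sites served from a `*.github.io` subdomain fall under project pages here, since they are rendered websites rather than repositories that can be cloned directly.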

Source: Research compiled from conversational dialogue datasets taxonomy (Feb 2026)