Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning

Welcome to the LLM-Post-training-Survey repository! This repository is a curated collection of the most influential papers on fine-tuning, alignment, reasoning, and efficiency in post-training methodologies for Large Language Models (LLMs).

Our work is based on the following paper:
📄 Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning – Available on

Corresponding authors: Guiyao Tie, Zeli Zhao.

Feel free to ⭐ star and fork this repository to keep up with the latest advancements and contribute to the community.

Structural overview of post-training techniques surveyed in this study, illustrating the organization of methodologies, datasets, and applications.

Timeline of post-training technique development for Large Language Models (2018–2025), delineating key milestones in their historical progression.

📌 Contents

Section	Subsection
🤖 PoLMs for Fine-Tuning	Supervised Fine-Tuning, Adaptive Fine-Tuning, Reinforcement Fine-Tuning
🏆 PoLMs for Alignment	Reinforcement Learning with Human Feedback, Reinforcement Learning with AI Feedback, Direct Preference Optimization
🚀 PoLMs for Reasoning	Self-Refine for Reasoning, Reinforcement Learning for Reasoning
🧠 PoLMs for Efficiency	Model Compression, Parameter-Efficient Fine-Tuning, Knowledge-Distillation
🌀 PoLMs for Integration and Adaptation	Multi-Modal Integration, Domain Adaptation, Model Merging
🤝 Datasets	Human-Labeled Datasets, Distilled Dataset, Synthetic Datasets
📚 Applications	Professional Domains, Technical and Logical Reasoning, Understanding and Interaction

📖 Papers

🤖 PoLMs for Fine-Tuning

Fine-tuning constitutes a cornerstone of adapting pre-trained Large Language Models (LLMs) to specialized tasks, refining their capabilities through targeted parameter adjustments. This process leverages labeled or task-specific datasets to optimize performance, bridging the gap between general-purpose pre-training and domain-specific requirements. This chapter explores three principal fine-tuning paradigms: Supervised Fine-Tuning, which employs annotated datasets to enhance task-specific accuracy; Adaptive Fine-Tuning, which customizes model behavior via instruction tuning and prompt-based methods; and Reinforcement Fine-Tuning, which integrates reinforcement learning to iteratively refine outputs based on reward signals, fostering continuous improvement through dynamic interaction.
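As a rough illustration of the supervised objective described above, the sketch below computes a mean negative log-likelihood over the annotated response tokens only. The function name `sft_loss`, the `loss_mask` convention (0 for prompt tokens, 1 for response tokens), and the example numbers are assumptions made for illustration; masking the prompt is a common practice, not something this repository prescribes.

```python
import math

def sft_loss(token_logprobs, loss_mask):
    """Mean negative log-likelihood over positions where loss_mask is 1.

    token_logprobs: log-probability the model assigns to each target token.
    loss_mask: 1 for tokens that count toward the loss (the response),
               0 for tokens that are ignored (the prompt). Assumed convention.
    """
    picked = [-lp for lp, m in zip(token_logprobs, loss_mask) if m]
    if not picked:
        raise ValueError("loss_mask selects no tokens")
    return sum(picked) / len(picked)

# Hypothetical three-token sequence: one prompt token, two response tokens.
example_loss = sft_loss(
    [math.log(0.5), math.log(0.25), math.log(0.5)],
    [0, 1, 1],
)
```

In practice the same quantity is usually obtained from a framework's cross-entropy loss with ignored label positions; the plain-Python version is only meant to show which tokens contribute.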

Supervised Fine-Tuning

LLaMA: Open and efficient foundation language models Paper
GPT-4 technical report Paper
Beyond Goldfish Memory: Long-Term Open-Domain Conversation Paper
Don't stop pretraining: Adapt language models to domains and tasks Paper
Exploring the limits of transfer learning with a unified text-to-text transformer Paper
BERT: Pre-training of deep bidirectional transformers for language understanding Paper
Mixed precision training Paper
Training deep nets with sublinear memory cost Paper
Learning word vectors for sentiment analysis Paper

Adaptive Fine-Tuning

Instruction Mining: High-Quality Instruction Data Selection for Large Language Models Paper
Instruction Tuning for Large Language Models: A Survey Paper
Self-instruct: Aligning language model with self generated instructions Paper
Chain-of-thought prompting elicits reasoning in large language models Paper
LoRA: Low-Rank Adaptation of Large Language Models Paper
Prefix-tuning: Optimizing continuous prompts for generation Paper
Finetuned Language Models are Zero-Shot Learners Paper
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks Paper
The power of scale for parameter-efficient prompt tuning Paper
Language models are few-shot learners Paper
AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts Paper
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference Paper
How Can We Know What Language Models Know? Paper
Language models are unsupervised multitask learners Paper

Reinforcement Fine-Tuning

ReFT: Reasoning with Reinforced Fine-Tuning Paper
Training language models to follow instructions with human feedback Paper
Proximal policy optimization algorithms Paper

🏆 PoLMs for Alignment

Alignment in LLMs involves guiding model outputs to conform to human expectations and preferences, particularly in safety-critical or user-facing applications. This chapter discusses three major paradigms for achieving alignment: Reinforcement Learning with Human Feedback, which employs human-labeled data as a reward signal; Reinforcement Learning with AI Feedback, which leverages AI-generated feedback to address scalability issues; and Direct Preference Optimization, which learns directly from pairwise human preference data without requiring an explicit reward model. Each paradigm offers distinct advantages, challenges, and trade-offs in its pursuit of robust alignment. A concise comparison of these and related methods is given in Table 2 of the paper.
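To make the "no explicit reward model" point concrete, here is a minimal sketch of the per-pair loss from the Direct Preference Optimization paper listed below, written in plain Python. The function name `dpo_loss`, the argument names, and the default `beta=0.1` are illustrative assumptions; each input is taken to be the summed log-probability of a full response under either the policy being trained or a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    # The implicit reward of a response is its log-ratio between policy and reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch to avoid overflow for large |x|.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy equals the reference, both log-ratios are zero and the loss is log 2; it falls below that value as the policy shifts probability toward the preferred response relative to the reference.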

Reinforcement Learning with Human Feedback

DARD: Distributed Adaptive Reward Design for Deep RL Paper
Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation Paper
FREEHAND: Learning from Offline Human Feedback Paper
DCPPO: Deep Conservative Policy Iteration for Offline Reinforcement Learning Paper
PERL: Preference-based Reinforcement Learning with Optimistic Exploration Paper
Training language models to follow instructions with human feedback Paper
Robust Speech Recognition via Large-Scale Weak Supervision Paper
A Multi-Agent Benchmark for Studying Emergent Communication Paper
Offline Reinforcement Learning with Implicit Q-Learning Paper
PFERL: Preference-based Reinforcement Learning with Human Feedback Paper
A General Language Assistant as a Laboratory for Alignment Paper
A Minimalist Approach to Offline Reinforcement Learning Paper
PRFI: Preprocessing Reward Functions for Interpretability Paper
Guidelines for human-AI interaction Paper
Learning human objectives by evaluating hypothetical behaviors Paper
Social influence as intrinsic motivation for multi-agent deep reinforcement learning Paper
Learning from Physical Human Corrections, One Feature at a Time Paper
Deep reinforcement learning from human preferences Paper
Interactive learning from policy-dependent human feedback Paper
Proximal policy optimization algorithms Paper
Active preference-based learning of reward functions Paper
Emergence of locomotion behaviours in rich environments Paper
Asynchronous methods for deep reinforcement learning Paper
Cooperative inverse reinforcement learning Paper
Trust region policy optimization Paper
Continuous control with deep reinforcement learning Paper
Policy shaping: Integrating human feedback with reinforcement learning Paper
A reduction of imitation learning and structured prediction to no-regret online learning Paper
Interactively shaping agents via human reinforcement: The TAMER framework Paper
Rational and Convergent Learning in Stochastic Games Paper
Algorithms for inverse reinforcement learning Paper

Reinforcement Learning with AI Feedback

RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback Paper
Constitutional AI: Harmlessness from AI Feedback Paper

Direct Preference Optimization

Taxonomizing Failure Modes of Direct Preference Optimization Paper
Step-wise Direct Preference Optimization: A Rank-Based Approach to Alignment Paper
SimPO: Simple Preference Optimization with a Reference-Free Reward Paper
Token-level Direct Preference Optimization Paper
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning Paper
Improving and Generalizing Bandit Algorithms via Direct Preference Optimization Paper
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs Paper
Negating Negatives: Alignment without Human Positives via Automatic Negative Sampling Paper
Rethinking Reinforcement Learning from Human Feedback with Efficient Reward Optimization Paper
LiPO: Listwise Preference Optimization through Learning-to-Rank Paper
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment Paper
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint Paper
Preference Ranking Optimization for Human Alignment Paper
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper
Exploring Reward Model Evaluation through Distance Functions Paper
RRHF: Rank Responses to Align Language Models with Human Feedback without tears Paper
GPT-4 technical report Paper
Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection Paper
Claude Paper
Gemini: A Family of Highly Capable Multimodal Models Paper
Self-instruct: Aligning language model with self generated instructions Paper
Modeling purposeful adaptive behavior with the principle of maximum causal entropy Paper
Maximum entropy reinforcement learning Paper
Transfer learning for reinforcement learning domains: A survey Paper
Recent advances in hierarchical reinforcement learning Paper
Rank analysis of incomplete block designs: I. the method of paired comparisons Paper

🚀 PoLMs for Reasoning

Reasoning constitutes a central pillar for enabling LLMs to tackle tasks involving multi-step logic, intricate inference, and complex decision-making. This chapter examines two core techniques for enhancing model reasoning capabilities: Self-Refine for Reasoning, which guides the model to autonomously detect and remedy errors in its own reasoning steps; and Reinforcement Learning for Reasoning, which employs reward-based optimization to improve the consistency and depth of the model’s chain-of-thought. These approaches collectively enable more robust handling of long-horizon decision-making, logical proofs, mathematical reasoning, and other challenging tasks.
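The self-refine idea described above can be sketched as a generate, critique, revise loop. In the sketch below, `generate`, `critique`, and `refine` are hypothetical callables standing in for model calls, and the toy versions use a simple arithmetic check only so that the loop can run without a model; none of this is taken from a specific paper in the list.

```python
def self_refine(task, generate, critique, refine, max_rounds=3):
    """Produce a draft, then alternate critique and revision until the critic is satisfied."""
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:  # the critic found nothing left to fix
            break
        draft = refine(task, draft, feedback)
    return draft

# Toy stand-ins for model calls (hypothetical): the "task" is adding two numbers.
def toy_generate(task):
    return task[0] + task[1] - 2  # deliberately wrong first draft

def toy_critique(task, draft):
    target = task[0] + task[1]
    if draft == target:
        return None
    return "too low" if draft < target else "too high"

def toy_refine(task, draft, feedback):
    return draft + 1 if feedback == "too low" else draft - 1

result = self_refine((2, 3), toy_generate, toy_critique, toy_refine)
```

The `max_rounds` cap matters in practice because a model acting as its own critic may never declare the draft finished.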

Self-Refine for Reasoning

Accessing GPT-4 level mathematical olympiad solutions via Monte Carlo Tree Self-Refine with Llama-3 8B [Paper]
DeepseekMath: Pushing the limits of mathematical reasoning in open language models [Paper]
Language models can solve computer tasks [Paper]
Cycle: Learning to self-refine the code generation [Paper]
Self-Contrast: Better reflection through inconsistent solving perspectives [Paper]
Improving LLM-based machine translation with systematic self-correction [Paper]
Reflexion: Language agents with verbal reinforcement learning [Paper]
Selfee: Iterative self-revising LLM empowered by self-feedback generation [Paper]
SelfEvolve: A code evolution framework via large language models [Paper]
Self-Edit: Fault-aware code editor for code generation [Paper]
Self-critiquing models for assisting human evaluators [Paper]
RE³: Generating longer stories with recursive reprompting and revision [Paper]
Generating sequences by learning to self-correct [Paper]
RARR: Researching and revising what language models say, using language models [Paper]

Reinforcement Learning for Reasoning

QwQ: Reflect Deeply on the Boundaries of the Unknown [Paper]
On the Convergence Rate of MCTS for the Optimal Value Estimation in Markov Decision Processes [Paper]
Refiner: Reasoning feedback on intermediate representations [Paper]
CRITIC: Large language models can self-correct with tool-interactive critiquing [Paper]
Teaching large language models to self-debug [Paper]
MM-React: Prompting ChatGPT for multimodal reasoning and action [Paper]
DetGPT: Detect what you need via reasoning [Paper]
RL4F: Generating natural language feedback with reinforcement learning for repairing model outputs [Paper]
Logic-LM: Empowering large language models with symbolic solvers for faithful logical reasoning [Paper]
Baldur: Whole-proof generation and repair with large language models [Paper]
CodeRL: Mastering code generation through pretrained models and deep reinforcement learning [Paper]

🧠 PoLMs for Efficiency

Building on the post-training optimization techniques discussed in earlier chapters, post-training efficiency specifically targets the operational performance of LLMs after their initial pre-training. The principal goal is to optimize key deployment metrics (e.g., processing speed, memory usage, and resource consumption), thereby making LLMs more practical for real-world applications. Approaches to achieving post-training efficiency fall into three main categories: Model Compression, which reduces the overall computational footprint through techniques such as pruning and quantization; Parameter-Efficient Fine-Tuning, which updates only a fraction of a model’s parameters or employs specialized modules, thus minimizing retraining costs and accelerating adaptation to new tasks; and Knowledge Distillation, which transfers the knowledge from a larger, pre-trained model to a smaller model, enabling the smaller model to achieve comparable performance with reduced resource demands.
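As a small, hedged example of the quantization side of model compression, the sketch below applies symmetric absolute-maximum int8 quantization to a list of weights and then restores approximate values. This is one of the simplest schemes and is not the method of any particular paper listed below; the function names and the sample weights are assumptions for illustration.

```python
def quantize_absmax_int8(weights):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to guard against floating-point overshoot.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.0, 1.0]  # hypothetical weight values
codes, scale = quantize_absmax_int8(weights)
restored = dequantize(codes, scale)
```

Each restored weight differs from the original by at most about half of `scale`, which is why a single large outlier (it sets `max_abs`) can degrade precision for every other weight.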

Model Compression

Agents Thinking Fast and Slow: A Talker-Reasoner Architecture [Paper]
Qlora: Efficient finetuning of quantized LLMs [Paper]
Quip: 2-bit quantization of large language models with guarantees [Paper]
SVD-LLM: Truncation-aware singular value decomposition for large language model compression [Paper]
KvQuant: Towards 10 million context length LLM inference with KV cache quantization [Paper]
Kivi: A tuning-free asymmetric 2bit quantization for KV cache [Paper]
Wkvquant: Quantizing weight and key/value cache for large language models gains more [Paper]
SliceGPT: Compress large language models by deleting rows and columns [Paper]
One-shot sensitivity-aware mixed sparsity pruning for large language models [Paper]
Fluctuation-based adaptive structured pruning for large language models [Paper]
LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [Paper]
SparseGPT: Massive language models can be accurately pruned in one-shot [Paper]
SmoothQuant: Accurate and efficient post-training quantization for large language models [Paper]
Deja Vu: Contextual sparsity for efficient LLMs at inference time [Paper]
LoSparse: Structured compression of large language models based on low-rank and sparse approximation [Paper]
A simple and effective pruning approach for large language models [Paper]
Reorder-based posttraining quantization for large language models [Paper]
Owq: Lessons learned from activation outliers for weight quantization in large language models [Paper]
Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling [Paper]
Omniquant: Omnidirectionally calibrated quantization for large language models [Paper]
Flash-LLM: Enabling cost-effective and highly-efficient large generative model inference with unstructured sparsity [Paper]
LLM-Pruner: On the Structural Pruning of Large Language Models [Paper]
ASVD: Activation-aware singular value decomposition for compressing large language models [Paper]
TensorGPT: Efficient compression of the embedding layer in LLMs based on the tensor-train decomposition [Paper]
Sheared LLaMA: Accelerating language model pre-training via structured pruning [Paper]
Gpt3.int8(): 8-bit matrix multiplication for transformers at scale [Paper]
Language model compression with weighted low-rank factorization [Paper]
Optimal Brain Damage [Paper]

Parameter-Efficient Fine-Tuning

Dora: Weight-decomposed low-rank adaptation [Paper]
AutoPEFT: Automatic configuration search for parameter-efficient fine-tuning [Paper]
Conditional adapters: Parameter-efficient transfer learning with fast inference [Paper]
Mera: Merging pretrained adapters for few-shot learning [Paper]
When do prompting and prefix-tuning work? A theory of capabilities and limitations [Paper]
PTP: Boosting stability and performance of prompt tuning with perturbation-based regularizer [Paper]
DEPT: Decomposed prompt tuning for parameter-efficient fine-tuning [Paper]
SMoP: Towards Efficient and Effective Prompt Tuning with Sparse Mixture-of-Prompts [Paper]
On the effectiveness of parameter-efficient fine-tuning [Paper]
AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning [Paper]
Sparse low-rank adaptation of pre-trained language models [Paper]
Bayesian low-rank adaptation for large language models [Paper]
Vera: Vector-based random matrix adaptation [Paper]
Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning [Paper]
Scaling & shifting your features: A new baseline for efficient model tuning [Paper]
Xprompt: Exploring the extreme of prompt tuning [Paper]
IDPG: An instance-dependent prompt generation method [Paper]
Neural prompt search [Paper]
Inference-time policy adapters (IPA): Tailoring extreme-scale LMs without fine-tuning [Paper]
Training neural networks with fixed sparse masks [Paper]
Raise a child in large language model: Towards effective and generalizable fine-tuning [Paper]
BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models [Paper]
Compacter: Efficient low-rank hypercomplex adapter layers [Paper]
Dylora: Parameter efficient tuning of pre-trained models using dynamic search-free low-rank adaptation [Paper]
SPoT: Better frozen model adaptation through soft prompt transfer [Paper]
Towards a unified view of parameter-efficient transfer learning [Paper]
Intrinsic dimensionality explains the effectiveness of language model fine-tuning [Paper]
Diff pruning: Parameter-efficient transfer learning with diff pruning [Paper]
AdapterFusion: Non-destructive task composition for transfer learning [Paper]
Parameter-efficient transfer learning for NLP [Paper]

Knowledge-Distillation

Bitdistiller: Unleashing the potential of sub-4-bit LLMs via self-distillation [Paper]
Born Again Neural Networks [Paper]
Distilling the Knowledge in a Neural Network [Paper]
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation [Paper]
On information and sufficiency [Paper]

🌀 PoLMs for Integration and Adaptation

Integration and adaptation techniques are pivotal for enhancing the versatility and efficacy of LLMs across diverse real-world applications. These methodologies enable LLMs to seamlessly process heterogeneous data types, adapt to specialized domains, and leverage multiple architectural strengths, thereby addressing complex, multifaceted challenges. This chapter delineates three principal strategies: Multi-modal Integration, which equips models to handle diverse data modalities such as text, images, and audio; Domain Adaptation, which refines models for specific industries or use cases; and Model Merging, which amalgamates capabilities from distinct models to optimize overall performance. Collectively, these approaches enhance LLMs’ adaptability, efficiency, and robustness, broadening their applicability across varied tasks and contexts.
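Two of the simplest forms of model merging can be sketched in a few lines: uniform weight averaging and adding task vectors to a shared base model. The sketch below represents each model as a dict from parameter name to a flat list of floats; that representation, the function names, and the example values are assumptions for illustration, and both functions assume the models share identical parameter names and shapes.

```python
def average_weights(models):
    """Uniformly average parameters across models with matching names and shapes."""
    n = len(models)
    return {
        name: [sum(vals) / n for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }

def task_arithmetic(base, finetuned_models, scale=1.0):
    """Add scaled task vectors (fine-tuned minus base) back onto the base weights."""
    merged = {}
    for name, base_vals in base.items():
        merged[name] = [
            b + scale * sum(ft[name][i] - b for ft in finetuned_models)
            for i, b in enumerate(base_vals)
        ]
    return merged

# Hypothetical one-parameter models.
base = {"w": [1.0, 2.0]}
model_a = {"w": [2.0, 2.0]}
model_b = {"w": [1.0, 4.0]}

averaged = average_weights([model_a, model_b])
combined = task_arithmetic(base, [model_a, model_b])
```

Averaging keeps the merged weights inside the range spanned by the inputs, while task arithmetic can move beyond it; the `scale` factor is the usual knob for limiting interference between task vectors.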

Multi-Modal Integration

Modal Connection

What matters when building vision-language models? [Paper]
Claude 3.7 Sonnet [Paper]
Qwen2.5-VL Technical Report [Paper]
Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic [Paper]
OpenAI GPT-4.5 System Card [Paper]
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization [Paper]
A Comprehensive Overhaul of Multimodal Assistant with Small Language Models [Paper]
Vl-mamba: Exploring state space models for multimodal learning [Paper]
The llama 3 herd of models [Paper]
Vila: On pre-training for visual language models [Paper]
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation [Paper]
Cobra: Extending mamba to multi-modal large language model for efficient inference [Paper]
Anymal: An efficient and scalable any-modality augmented language model [Paper]
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning [Paper]
Sphinx-x: Scaling data and parameters for a family of multi-modal large language models [Paper]
Improved baselines with visual instruction tuning [Paper]
Voicecraft: Zero-shot speech editing and text-to-speech in the wild [Paper]
QVQ: To See the World with Wisdom [Paper]
Deepseek-vl2: Mixture-of-experts vision-language models for advanced multimodal understanding [Paper]
Obelics: An open web-scale filtered dataset of interleaved image-text documents [Paper]
Lion: Empowering multimodal large language model with dual-level visual knowledge [Paper]
X-instructblip: A framework for aligning x-modal instruction-aware representations to llms and emergent cross-modal reasoning [Paper]
Llama-adapter: Efficient fine-tuning of language models with zero-init attention [Paper]
X-llm: Bootstrapping advanced large language models by treating multi-modalities as foreign languages [Paper]
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks [Paper]
Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models [Paper]
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning [Paper]
Minigpt-4: Enhancing vision-language understanding with advanced large language models [Paper]
Grounding language models to images for multimodal inputs and outputs [Paper]
Llama-adapter v2: Parameter-efficient visual instruction model [Paper]
Lyrics: Boosting fine-grained language-vision alignment and comprehension via semantic-aware visual objects [Paper]
Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond [Paper]
Next-gpt: Any-to-any multimodal llm [Paper]
Video-llama: An instruction-tuned audio-visual language model for video understanding [Paper]
Speechgpt: Empowering large language models with intrinsic cross-modal conversational abilities [Paper]
Visual Instruction Tuning [Paper]
Openflamingo: An open-source framework for training large autoregressive vision-language models [Paper]
One for all: Video conversation is feasible without video instruction tuning [Paper]
Cogvlm: Visual expert for pretrained language models [Paper]
A survey on multimodal large language models [Paper]
Imagebind-llm: Multi-modality instruction tuning [Paper]
Otter: A multi-modal model with in-context instruction tuning [Paper]
Audiopalm: A large language model that can speak and listen [Paper]
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models [Paper]
mplug-owl: Modularization empowers large language models with multimodality [Paper]
Videochat: Chat-centric video understanding [Paper]
Detgpt: Detect what you need via reasoning [Paper]
Flamingo: a visual language model for few-shot learning [Paper]

Modal Encoder

Vl-mamba: Exploring state space models for multimodal learning [Paper]
A Comprehensive Overhaul of Multimodal Assistant with Small Language Models [Paper]
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation [Paper]
Cobra: Extending mamba to multi-modal large language model for efficient inference [Paper]
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning [Paper]
Sphinx-x: Scaling data and parameters for a family of multi-modal large language models [Paper]
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research [Paper]
X-llm: Bootstrapping advanced large language models by treating multi-modalities as foreign languages [Paper]
Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models [Paper]
ImageBind: One Embedding Space To Bind Them All [Paper]
Eva: Exploring the limits of masked visual representation learning at scale [Paper]
Next-gpt: Any-to-any multimodal llm [Paper]
Google usm: Scaling automatic speech recognition beyond 100 languages [Paper]
Speechgpt: Empowering large language models with intrinsic cross-modal conversational abilities [Paper]
Visual Instruction Tuning [Paper]
One for all: Video conversation is feasible without video instruction tuning [Paper]
Dinov2: Learning robust visual features without supervision [Paper]
Sigmoid loss for language image pre-training [Paper]
Imagebind-llm: Multi-modality instruction tuning [Paper]
Audiopalm: A large language model that can speak and listen [Paper]
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models [Paper]
Videochat: Chat-centric video understanding [Paper]
Hts-at: A hierarchical token-semantic audio transformer for sound classification and detection [Paper]
Learning transferable visual models from natural language supervision [Paper]
Panns: Large-scale pretrained audio neural networks for audio pattern recognition [Paper]
An image is worth 16x16 words: Transformers for image recognition at scale [Paper]

Domain Adaptation

Knowledge Editing

Mitigating Heterogeneous Token Overfitting in LLM Knowledge Editing [Paper]
Melo: Enhancing model editing with neuron-indexed dynamic lora [Paper]
Knowledge Editing for Large Language Model with Knowledge Neuronal Ensemble [Paper]
Aging with grace: Lifelong model editing with discrete key-value adaptors [Paper]
LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models [Paper]
A comprehensive study of knowledge editing for large language models [Paper]
Inspecting and editing knowledge representations in language models [Paper]
Pokemqa: Programmable knowledge editing for multi-hop question answering [Paper]
Eva-kellm: A new benchmark for evaluating knowledge editing of llms [Paper]
Transformer-patcher: One mistake worth one neuron [Paper]
Massive editing for large language models via meta learning [Paper]
Methods for measuring, updating, and visualizing factual beliefs in language models [Paper]
Calibrating factual knowledge in pretrained language models [Paper]
Memory-based model editing at scale [Paper]
Fast model editing at scale [Paper]
Editing factual knowledge in language models [Paper]
Editable neural networks [Paper]
Modifying memories in transformer models [Paper]

Retrieval-Augmented Generation

REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models [Paper]
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models [Paper]
Hipporag: Neurobiologically inspired long-term memory for large language models [Paper]
Toolformer: Language models can teach themselves to use tools [Paper]
Benchmarking retrieval-augmented generation for medicine [Paper]
Retrieval-augmented generation for ai-generated content: A survey [Paper]
Reward-RAG: Enhancing RAG with Reward Driven Supervision [Paper]
Retrieval-augmented generation for large language models: A survey [Paper]
Shall we pretrain autoregressive language models with retrieval? a comprehensive study [Paper]
Replug: Retrieval-augmented black-box language models [Paper]
Enhancing financial sentiment analysis via retrieval augmented large language models [Paper]
Ra-dit: Retrieval-augmented dual instruction tuning [Paper]
Active retrieval augmented generation [Paper]
Self-rag: Learning to retrieve, generate, and critique through self-reflection [Paper]
Learning to retrieve in-context examples for large language models [Paper]
Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy [Paper]
Optimizing science question ranking through model and retrieval-augmented generation [Paper]
Improving language models by retrieving from trillions of tokens [Paper]
Retrieval-augmented transformer for image captioning [Paper]
Dense passage retrieval for open-domain question answering [Paper]
Leveraging passage retrieval with generative models for open domain question answering [Paper]
Retrieval-augmented generation for knowledge-intensive nlp tasks [Paper]
Retrieval augmented language model pre-training [Paper]
Learning binary codes for maximum inner product search [Paper]

Model Merging

Model Merging at Hierarchical Levels

Knowledge fusion of large language models [Paper]
Language models are super mario: Absorbing abilities from homologous models as a free lunch [Paper]
Fusechat: Knowledge fusion of chat models [Paper]
Soft merging of experts with adaptive routing [Paper]
Llm-blender: Ensembling large language models with pairwise ranking and generative fusion [Paper]
From sparse to soft mixtures of experts [Paper]
Editing models with task arithmetic [Paper]
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [Paper]
Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity [Paper]
All You Need Is Low (Rank) Defending Against Adversarial Attacks on Graphs [Paper]

Pre-Merging Methods

Fusechat: Knowledge fusion of chat models [Paper]
Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic [Paper]
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models [Paper]
Tangent Transformers for Composition, Privacy and Removal [Paper]
Git Re-Basin: Merging Models modulo Permutation Symmetries [Paper]
REPAIR: REnormalizing Permuted Activations for Interpolation Repair [Paper]
Personalized Federated Learning using Hypernetworks [Paper]
GAN Cocktail: mixing GANs without dataset access [Paper]
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks [Paper]
Model Fusion via Optimal Transport [Paper]

During-Merging Methods

Arcee's MergeKit: A Toolkit for Merging Large Language Models [Paper]
Ties-merging: Resolving interference when merging models [Paper]
Language models are super mario: Absorbing abilities from homologous models as a free lunch [Paper]
AdaMerging: Adaptive Model Merging for Multi-Task Learning [Paper]
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [Paper]
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [Paper]
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging [Paper]
Representation Surgery for Multi-Task Model Merging [Paper]
Soft merging of experts with adaptive routing [Paper]
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [Paper]
Editing models with task arithmetic [Paper]

🤝 Datasets

Post-training techniques are meticulously engineered to refine the adaptability of LLMs to specialized domains or tasks, leveraging datasets as the cornerstone of this optimization process. A thorough examination of prior research underscores that the quality, diversity, and relevance of data profoundly influence model efficacy, often determining the success of post-training endeavors. To elucidate the critical role of datasets in this context, we present a comprehensive review and in-depth analysis of those employed in post-training phases, categorizing them into three principal types based on their collection methodologies: human-labeled data, distilled data, and synthetic data. These categories reflect distinct strategies in data curation, with models adopting either a singular approach or a hybrid methodology integrating multiple types to balance scalability, cost, and performance. Table 9 of the paper provides a detailed overview of these dataset types, encompassing their origins, sizes, languages, tasks, and post-training phases (e.g., SFT and RLHF), which we explore in subsequent sections to highlight their contributions and challenges in advancing LLM capabilities.

Human-Labeled Datasets

Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM [Paper]
OpenAssistant Conversations -- Democratizing Large Language Model Alignment [Paper]
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection [Paper]
Crosslingual Generalization through Multitask Finetuning [Paper]
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning [Paper]
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks [Paper]
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback [Paper]
Multitask Prompted Training Enables Zero-Shot Task Generalization [Paper]
WebGPT: Browser-assisted question-answering with human feedback [Paper]

Distilled Dataset

WildChat: 1M ChatGPT Interaction Logs in the Wild [Paper]
Instruction Tuning with GPT-4 [Paper]
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality [Paper]
Alpaca: A Strong, Replicable Instruction-Following Model [Paper]

Synthetic Datasets

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models [Paper]
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing [Paper]
WizardCoder: Empowering Code Large Language Models with Evol-Instruct [Paper]
GenQA: Generating Millions of Instructions from a Handful of Prompts [Paper]
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations [Paper]
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data [Paper]
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor [Paper]
Self-Instruct: Aligning Language Models with Self-Generated Instructions [Paper]

📚 Applications

Despite the strong foundational capabilities imparted by pre-training, Large Language Models (LLMs) often show persistent limitations when deployed in specialized domains, including constrained context lengths, a tendency to hallucinate, weak reasoning, and ingrained biases. These shortcomings matter most in real-world applications, where precision, reliability, and ethical alignment are paramount. They raise two fundamental questions: (1) How can LLM performance be systematically improved to meet domain-specific demands? (2) What strategies can effectively mitigate the practical obstacles of applied settings? Post-training offers an answer: it improves LLMs' adaptability by refining their grasp of domain-specific terminology and reasoning patterns while preserving their general competencies. This chapter surveys applications of post-trained LLMs across professional, technical, and interactive domains, showing how tailored post-training methods address these challenges and increase model utility in diverse contexts.

Professional Domains

LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM [Paper]
InternLM-Law: An Open Source Chinese Legal Large Language Model [Paper]
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain [Paper]
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [Paper]
SoulChat: Improving LLMs' Empathy, Listening, and Comfort Abilities through Fine-tuning with Multi-turn Empathy Conversations [Paper]
BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT [Paper]
AlpaCare: Instruction-tuned Large Language Models for Medical Application [Paper]
DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple Experts Fine-tuning [Paper]
DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services [Paper]
Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue [Paper]
EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education [Paper]
DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation [Paper]
Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model [Paper]
FinGPT: Open-Source Financial Large Language Models [Paper]
Towards the Exploitation of LLM-based Chatbot for Providing Legal Support to Palestinian Cooperatives [Paper]
HuatuoGPT, towards Taming Language Model to Be a Doctor [Paper]
XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters [Paper]
HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge [Paper]
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge [Paper]

Technical and Logical Reasoning

If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [Paper]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models [Paper]
Llemma: An Open Language Model for Mathematics [Paper]
WizardCoder: Empowering Code Large Language Models with Evol-Instruct [Paper]
Code Llama: Open Foundation Models for Code [Paper]

Understanding and Interaction

Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks [Paper]
CogAgent: A Visual Language Model for GUI Agents [Paper]
LLaMA-Omni: Seamless Speech Interaction with Large Language Models [Paper]
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception [Paper]
Mind2Web: Towards a Generalist Agent for the Web [Paper]
LLaRA: Large Language-Recommendation Assistant [Paper]
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding [Paper]

📌 Contributing

Contributions are welcome! If you have relevant papers, code, or insights, feel free to submit a pull request.

Citation

If you find our work useful or use it in your research, please consider citing:

@inproceedings{Tie2025ASO,
  title={Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning},
  author={Guiyao Tie and Zeli Zhao and Dingjie Song and Fuyang Wei and Rong Zhou and Yurou Dai and Wen Yin and Zhejian Yang and Jiangyue Yan and Yao Su and Zhenhan Dai and Yifeng Xie and Yihan Cao and Lichao Sun and Pan Zhou and Lifang He and Hechang Chen and Yu Zhang and Qingsong Wen and Tianming Liu and Neil Zhenqiang Gong and Jiliang Tang and Caiming Xiong and Heng Ji and Philip S. Yu and Jianfeng Gao},
  year={2025},
  url={https://api.semanticscholar.org/CorpusID:276902416}
}
