
Mixture of Strategies: Teaching LLMs to Meta-Think


Core Idea

When solving complex problems, humans often first think about “what approach should I use to solve this problem” before actually taking action. This meta-cognition ability is key to efficient problem-solving.

Mixture of Strategies (MoS) introduces this meta-thinking capability into LLMs: before answering a question, the model first generates a “Strategy Prompt” for itself as a guideline for solving the current problem.

[Figure: Mixture of Strategies overview]

Background & Motivation

The Problem: Code Generation ≠ Freeform QA

In 2023, Code LLMs like Code Llama, StarCoder, and WizardCoder demonstrated impressive capabilities in text-to-code (converting instructions into executable code) and code-to-code (refactoring existing code) tasks. However, a critical observation emerged:

Excellence in code generation does not inherently equate to comparable adeptness in freeform QA scenarios.

Freeform code QA—answering diverse, open-ended programming questions like debugging advice, API usage explanations, or architectural recommendations—remained a significant challenge for open-source Code LLMs. This gap motivated the search for methods that could harmonize both capabilities within a single model.

Prior Art: Fixed Strategies for Fixed Tasks

Before MoS, several prompting strategies had been proposed to enhance LLM reasoning:

| Work | Strategy | Approach |
|---|---|---|
| System 2 Attention (Weston et al., 2023) | Regenerate | Regenerate the input context to focus on pertinent information before responding |
| ChatCoder (Wang et al., 2023) | Paraphrase-and-Extend | Encourage LLMs to paraphrase and extend the context for accurate generation |
| Orca (Mukherjee et al., 2023) | Step-by-Step | Break down reasoning into sequential steps |
| Orca 2 (Mitra et al., 2023) | Recall-then-Generate, Recall-Reason-Generate | First recall relevant information, then reason and generate |
The critical limitation: These approaches hard-coded each strategy to a specific task type. A step-by-step strategy was always used for math; a recall-then-generate strategy was always used for knowledge-intensive questions. There was no mechanism for the model to dynamically choose the most appropriate strategy.

The Key Insight: Let LLMs Choose Their Own Strategy

The breakthrough idea behind MoS came from a simple observation about human cognition:

Experts don’t apply the same problem-solving approach to every problem. They first assess the problem, then select an appropriate methodology.

This led to two key innovations:

  1. Adaptive Strategy Selection: Instead of hard-coding strategies to tasks, let the LLM decide which strategy best fits each specific question
  2. Strategy Composition: Allow the LLM to combine “atomic elements” from existing strategies or even invent entirely new strategies when none of the existing ones fit

Why This Mattered in 2023 (Pre-Thinking Models)

It’s important to note that this work predates the emergence of “thinking models” like o1. In 2023, the standard paradigm was:

  • System prompt → fixed, human-designed
  • User query → variable
  • Response → direct answer

MoS introduced an intermediate step that was novel at the time:

  • Meta system prompt → instructs the model to think about strategy first
  • User query → variable
  • Strategy selection → model-generated “self system prompt”
  • Response → answer following the self-selected strategy

This effectively gave LLMs a form of meta-cognitive scaffolding—the ability to reason about how to reason—within the constraints of single-turn inference. The approach can be seen as an early precursor to the explicit “thinking” capabilities that would later be built into models like o1.

Method Overview

Meta-Strategies System Prompt

The core of MoS is a Meta-Strategies System Prompt that instructs the LLM:

“You are proficient in Cognitive Science. When facing questions, you should first meta-think to select the most appropriate strategy for the current problem, then solve the problem following that strategy.”

The specific Meta System Prompt structure is as follows:

📋 META SYSTEM PROMPT

You are proficient and knowledgeable in Cognitive Science, so when you are faced with questions, you FIRST META-THINK about the best ways and strategies for this PARTICULAR PROBLEM, then tackle the problem with the best methodology.

For example, several strategy thoughts are given below:
<strategy examples>

You can combine the strategy elements of the above examples or invent your own, if it is more suitable for a given question.

Now given the following question, please generate a strategy first, then treat this strategy as your SYSTEM PROMPT (instructions you should FOLLOW) and generate a solution for the question.

Strategy Library (Strategy Examples)

MoS provides six base strategies that the LLM can select from, combine, or use as a starting point for inventing new ones:

| Strategy Name | Use Case | Description |
|---|---|---|
| Step-by-Step | Mathematical calculations, logical reasoning | Break down problems step by step, since LLMs aren't good at complex calculations in one shot but can handle simple single-step calculations |
| Recall-then-Generate | Problems requiring multi-domain knowledge | First carefully recall and collect reliable information, then synthesize it to generate a thorough, logical, and trustworthy solution |
| Recall-Filter-Reason-Generate | Information overload, filtering required | Not only collect information, but also reason about which pieces are useful vs. redundant or interfering, then generate the answer from the useful information only |
| Generate-Examine-Regenerate | Code generation, complex programming problems | First generate a draft, then examine, debug, and repair it to finally produce a correct solution |
| Paraphrase-Extend-Generate | Abstract code requirement descriptions | Rephrase abstract problems into more detailed descriptions, considering key concepts, method purpose, input/output requirements, edge cases, possible exceptions, etc. |
| Custom Combination | Complex multi-faceted problems | Combine elements from the above strategies, or invent new strategies based on the specific problem |
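
For reference, the library above can also be written down as plain data and spliced into the <strategy examples> slot of the Meta System Prompt. The sketch below is illustrative: the dictionary keys and the helper function are my own naming, and the descriptions are paraphrased from the table rather than quoted from the paper.

PYTHON
# Illustrative sketch: the strategy library as data, plus a Meta System Prompt builder.
# Names and phrasing are paraphrased, not the exact text used in MoS.
STRATEGY_LIBRARY = {
    "step-by-step": "Break the problem into simple sequential steps and solve them one at a time.",
    "recall-then-generate": "First recall reliable information, then synthesize it into a thorough answer.",
    "recall-filter-reason-generate": "Recall information, filter out redundant or interfering pieces, reason over what remains, then answer.",
    "generate-examine-regenerate": "Draft a solution, examine and debug it, then produce a corrected final version.",
    "paraphrase-extend-generate": "Rephrase the abstract requirement in detail (inputs, outputs, edge cases, exceptions), then generate the solution.",
}

def build_meta_system_prompt(library: dict) -> str:
    """Fill the <strategy examples> placeholder of the Meta System Prompt."""
    examples = "\n".join(f"- {name}: {desc}" for name, desc in library.items())
    return (
        "You are proficient and knowledgeable in Cognitive Science, so when you are faced with "
        "questions, you FIRST META-THINK about the best ways and strategies for this PARTICULAR "
        "PROBLEM, then tackle the problem with the best methodology.\n\n"
        "For example, several strategy thoughts are given below:\n"
        f"{examples}\n\n"
        "You can combine the strategy elements of the above examples or invent your own, "
        "if it is more suitable for a given question.\n\n"
        "Now given the following question, please generate a strategy first, then treat this "
        "strategy as your SYSTEM PROMPT (instructions you should FOLLOW) and generate a solution "
        "for the question."
    )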

Workflow

The MoS method workflow consists of three phases:

1. Input Phase

  • Meta-Strategies System Prompt (fixed)
  • User’s specific question (Instruction)

2. Strategy Selection Phase

  • LLM analyzes problem characteristics
  • Selects the most appropriate strategy from the library, or combines/creates new ones
  • Uses the selected strategy as a “self-guiding System Prompt”

3. Answer Generation Phase

  • LLM follows its self-selected strategy
  • Generates structured response: #STRATEGY + #ANSWER

Output Format Example

🧠 #STRATEGY

To address this multifaceted problem, I will employ a recall-filter-reason-generate strategy because it requires not only careful information collection but also a strong reasoning procedure to first decide which pieces of information are critical…

1. Recall information about [relevant topics]
2. Filter the information to identify [key points]
3. Reason about [the core problem]
4. Generate [the solution]

✅ #ANSWER

Considering the complexity of your issue…
[Concrete solution]
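
Because the response is delimited by the #STRATEGY and #ANSWER markers, the two parts can be separated with a simple parser (this is what "parse strategy" in the ablation below refers to). A minimal sketch, assuming the markers appear exactly as in the format above; the function name is mine:

PYTHON
def split_strategy_and_answer(response: str) -> tuple:
    """Split an MoS-style response into its strategy and answer parts."""
    if "#ANSWER" not in response:
        # Fall back to treating the whole output as the answer.
        return "", response.strip()
    strategy_part, answer_part = response.split("#ANSWER", 1)
    strategy = strategy_part.replace("#STRATEGY", "", 1).strip()
    return strategy, answer_part.strip()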

Key Innovations

1. Strategy as “Self System Prompt”

In traditional approaches, system prompts are designed by humans and remain fixed. MoS allows LLMs to generate task-specific system prompts for themselves, achieving dynamic adaptation to different problem types.

2. Single-Round Achieves Multi-Round Depth

MoS enables a standard LLM to simulate the depth and nuance typically associated with the multi-round reflective interactions of an LLM agent, but within a single response cycle, which makes inference more efficient.

3. SFT Data Construction

During instruction fine-tuning, GPT-4 is used to generate training data in MoS format:

  • Input: Question + Meta System Prompt
  • Output: Strategy selection + Final answer

This allows the fine-tuned model to internalize meta-thinking capability.
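
A rough sketch of what one such training record could look like, reusing the parser sketched earlier; the field names are hypothetical, since the exact SFT schema is not published:

PYTHON
# Hypothetical schema for one MoS-format SFT example built from a GPT-4 response.
def build_sft_example(question: str, gpt4_response: str, meta_system_prompt: str) -> dict:
    strategy, answer = split_strategy_and_answer(gpt4_response)
    return {
        # Input: the fixed Meta System Prompt plus the user's question.
        "prompt": f"{meta_system_prompt}\n\nQuestion: {question}",
        # Output: keep the strategy ("keepStrategy") so the model learns to meta-think,
        # rather than stripping it out ("HideStrategy").
        "completion": f"#STRATEGY\n{strategy}\n\n#ANSWER\n{answer}",
    }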

Experimental Validation

Ablation Study: Effect of Meta System Prompt

| SFT Data | Inference System Prompt | FreeformQA Performance |
|---|---|---|
| GPT-4 answer (no strategy) | Normal | 42.42% |
| GPT4-MT-HideStrategy | Normal | 42.19% |
| GPT4-MT-HideStrategy | Meta Think + parse strategy | 37.59% |
| GPT4-MT-keepStrategy | Meta Think + parse strategy | 48.98% |

Key findings:

  • Training with strategy retention (keepStrategy) + Meta Think at inference = best performance
  • Using Meta Think only at inference without strategy in training data actually degrades performance

Ablation Study: Effect of Number of Strategies

| Number of Strategies | FreeformQA Performance |
|---|---|
| 0 (no strategy) | 42.42% |
| 1 | 45.90% |
| 2 | 45.70% |
| 3 | 45.32% |
| 4 | 43.41% |
| 5 | 46.01% |
| 6 (all) | 48.98% |

Key findings:

  • More strategy choices generally lead to better performance
  • Strategy diversity allows the model to select optimal methods for different problem types

Main Results

On InfiCoder-Eval (Freeform Code QA) benchmark:

| Model | Parameters | Score |
|---|---|---|
| GPT-3.5 (ChatGPT) | – | 47.0 |
| GPT-4 (0613) | – | 64.0 |
| Code Llama - Instruct | 13B | 38.7 |
| StarCoder Prompted | 15.5B | 40.8 |
| InfiCoder-1 (MoSE) | 13B | 51.3 |

InfiCoder-1 using the MoSE method surpasses GPT-3.5 with only 13B parameters and significantly outperforms open-source models of similar size.

Practical Implications

1. Prompt Engineering Level

When using LLMs, you can directly apply the MoS concept:

PYTHON
meta_prompt = """
Before answering, think about the best strategy for this problem:

- Is it a step-by-step reasoning problem?
- Do I need to recall and filter information?
- Should I generate, examine, and regenerate?

First output your strategy, then solve the problem following that strategy.
"""

2. Model Training Level

When constructing SFT data:

  • Have a strong model (e.g., GPT-4) generate strategy + answer for each question
  • Retain the strategy selection process as part of the training data
  • Use Meta System Prompt at inference as well

3. Agent Design Level

The MoS concept can be extended to Agent systems:

  • Introduce meta-strategy selection during task planning phase
  • Dynamically select execution strategies based on task type
  • Allow Agents to combine or create new problem-solving methods
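
As a sketch of how this could look inside an agent loop (purely illustrative; the function and prompt wording are hypothetical, not from the paper):

PYTHON
# Hypothetical sketch: a planning step that selects a strategy before execution.
def plan_with_meta_strategy(task: str, llm, strategy_library: dict) -> str:
    """Ask the LLM to pick or compose a strategy, then execute the task under it."""
    library_text = "\n".join(f"- {name}: {desc}" for name, desc in strategy_library.items())
    selection_prompt = (
        "Choose the most suitable strategy for the task below, combine elements from "
        "several, or invent a new one. Reply with the strategy only.\n\n"
        f"Strategies:\n{library_text}\n\nTask: {task}"
    )
    strategy = llm(selection_prompt)  # `llm` is any callable mapping prompt -> text
    # The chosen strategy acts as the agent's self system prompt for execution.
    return llm(f"{strategy}\n\nNow carry out the task: {task}")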

Summary

The core contribution of Mixture of Strategies is formalizing and injecting human meta-cognitive ability into LLMs:

  1. Meta-thinking first: Let LLMs think about “how to solve” before solving problems
  2. Strategy adaptation: Dynamically select optimal strategies based on problem type
  3. Single-round deep reasoning: Achieve multi-round reflection effects within a single response
  4. Trainability: Internalize meta-thinking capability through SFT

This approach not only improves Code LLM performance on Freeform QA tasks, but more importantly provides a general framework for teaching LLMs to “think about how to think”.


This article summarizes the Mixture of Strategies component from an internal project called MoSE (Mixture of Strategies and Experts), developed in late 2023.