Prompt Templates
CogBench evaluates models using four prompt templates of increasing specificity. Each template gives the LLM a different level of guidance about the target Bloom's Taxonomy level, testing how well the model responds to various instruction formats.
The Four Templates
Ordered from least to most guidance. Each template is described below, along with its typical performance.
The minimal baseline. The LLM receives only the Bloom's level name and the subject. No definitions, no examples, no scaffolding. Tests whether the model has internalized Bloom's Taxonomy from pretraining alone.
Typical performance: Models average ~55% exact accuracy. Without guidance, most LLMs default to higher cognitive levels (L4-L5) regardless of the target.
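As a rough illustration (CogBench's exact prompt wording is not shown here, so the function name and phrasing below are hypothetical), a name-only template might be assembled like this:

```python
def build_minimal_prompt(level_name: str, subject: str) -> str:
    """Hypothetical sketch of the minimal baseline: only the Bloom's
    level name and the subject, with no definition, verbs, or example."""
    return (
        f"Write one {subject} assessment question at the "
        f"Bloom's Taxonomy level '{level_name}'."
    )

prompt = build_minimal_prompt("Analyze", "cell biology")
```

Because the model receives no definition of the level, all of the cognitive-level knowledge has to come from pretraining, which is exactly what this baseline probes.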
Adds the Bloom's level definition and action verbs. The LLM now knows what cognitive process the level requires and which verbs characterize it. Tests whether explicit pedagogical scaffolding improves level targeting.
Typical performance: Significant jump to ~73% exact accuracy. The definitions help models distinguish between adjacent levels (especially L2 vs L3 and L4 vs L5).
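A definition-augmented template could extend the minimal prompt with the level's definition and action verbs. Again a hedged sketch with hypothetical names, not CogBench's actual prompt text:

```python
def build_definition_prompt(
    level_name: str, definition: str, action_verbs: list[str], subject: str
) -> str:
    """Hypothetical sketch: adds the Bloom's level definition and its
    characteristic action verbs on top of the name-only baseline."""
    return (
        f"Bloom's level: {level_name}\n"
        f"Definition: {definition}\n"
        f"Characteristic verbs: {', '.join(action_verbs)}\n\n"
        f"Write one {subject} assessment question at this level."
    )

prompt = build_definition_prompt(
    "Apply",
    "Use acquired knowledge and procedures in new situations.",
    ["execute", "implement", "solve"],
    "physics",
)
```

Spelling out the definition gives the model an explicit boundary between adjacent levels, which is consistent with the accuracy jump on L2 vs L3 and L4 vs L5 noted above.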
Adds a subject-specific example question at the target level. The LLM gets the definition, verbs, and a concrete demonstration of what a correct question looks like. This is the best-performing template for most models.
Typical performance: Best overall at ~87% exact accuracy. The exemplar anchors the model's understanding, dramatically reducing overshooting at lower levels (L1-L2).
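The exemplar template adds one more field: a subject-specific example question at the target level. A minimal sketch under the same assumptions (hypothetical names and phrasing):

```python
def build_exemplar_prompt(
    level_name: str,
    definition: str,
    action_verbs: list[str],
    subject: str,
    exemplar: str,
) -> str:
    """Hypothetical sketch: definition, verbs, plus a concrete example
    question at the target level to anchor the model's output."""
    return (
        f"Bloom's level: {level_name}\n"
        f"Definition: {definition}\n"
        f"Characteristic verbs: {', '.join(action_verbs)}\n"
        f"Example question at this level: {exemplar}\n\n"
        f"Write one new {subject} assessment question at this level."
    )

prompt = build_exemplar_prompt(
    "Remember",
    "Retrieve relevant knowledge from long-term memory.",
    ["recall", "list", "define"],
    "chemistry",
    "List the first five elements of the periodic table.",
)
```

Asking for a "new" question alongside the exemplar is the key design choice: the example anchors the cognitive level without being copied verbatim.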
Asks the model to reason step-by-step about what makes a question at the target level before generating one. The LLM must first explain the cognitive process, then produce a question. Tests whether explicit reasoning improves level precision.
Typical performance: ~77% exact accuracy. Reasoning helps with harder levels (L5-L6) but can introduce overthinking on simpler ones (L1-L2). Higher latency due to two-step output.
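A chain-of-thought template replaces the exemplar with an explicit two-step instruction: reason about the level first, then generate. As before, this is an illustrative sketch, not CogBench's actual wording:

```python
def build_cot_prompt(level_name: str, subject: str) -> str:
    """Hypothetical sketch of a chain-of-thought template: the model
    must explain the target cognitive process before writing a question."""
    return (
        f"Target Bloom's level: {level_name}\n"
        f"Subject: {subject}\n\n"
        "Step 1: Explain what cognitive process a question at this level "
        "must elicit from a student.\n"
        "Step 2: Then write exactly one question at this level."
    )

prompt = build_cot_prompt("Evaluate", "world history")
```

The two-step structure is what drives both the gain on hard levels and the extra latency: the model emits a reasoning passage before the question itself.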
Template Comparison
Average CCS-Control score across all models for each prompt template.
Recommendation
Use the with_exemplar template for the best CCS-Control scores. It provides the optimal balance of guidance — enough context to anchor the model's understanding of each Bloom's level, without the overthinking risk of chain-of-thought. When submitting to the CogBench leaderboard, results from any template are accepted, but we recommend evaluating with at least with_exemplar for comparability.