The Four Templates

Ordered from least to most guidance. Each template below shows the exact prompt format and a real generated example.

Template 1: Name Only

The minimal baseline. The LLM receives only the Bloom's level name and the subject. No definitions, no examples, no scaffolding. Tests whether the model has internalized Bloom's Taxonomy from pretraining alone.

Prompt Format
Generate an Analyze-level question about biology. Respond with ONLY the question. No explanation, no preamble.
Example Output (L4 Analyze, Biology)
"How does the structure of hemoglobin relate to its function in oxygen transport, and what would happen if the quaternary structure were disrupted?"
Target: L4 Analyze · CCS Predicted: L4 · Correct

Typical performance: Models average ~55% exact accuracy. Without guidance, most LLMs default to higher cognitive levels (L4-L5) regardless of the target.
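As a rough sketch, the Name Only prompt can be assembled from just two fields. The function and its name are hypothetical illustrations, not the benchmark's actual code:

```python
def name_only_prompt(level_name: str, subject: str) -> str:
    """Minimal baseline: only the Bloom's level name and the subject.

    NOTE: hypothetical helper for illustration; not the benchmark's API.
    """
    # Pick the right indefinite article ("an Analyze-level", "a Create-level").
    article = "an" if level_name[:1].upper() in "AEIOU" else "a"
    return (
        f"Generate {article} {level_name}-level question about {subject}. "
        "Respond with ONLY the question. No explanation, no preamble."
    )
```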

Template 2: With Definition

Adds the Bloom's level definition and action verbs. The LLM now knows what cognitive process the level requires and which verbs characterize it. Tests whether explicit pedagogical scaffolding improves level targeting.

Prompt Format
Generate a question about biology at Bloom's Taxonomy Analyze level. At the Analyze level, students should be able to break material into its constituent parts and determine how the parts relate to one another. Typical action verbs: compare, contrast, examine, differentiate, categorize, organize. The question should require students to examine and break information into parts, identify motives or causes, make inferences, and find evidence. Respond with ONLY the question. No explanation, no preamble.
Example Output (L4 Analyze, Biology)
"Examine how the structure of hemoglobin molecules allows for efficient oxygen transport in the blood, and differentiate this from myoglobin's role in muscle tissue."
Target: L4 Analyze · CCS Predicted: L4 · Correct

Typical performance: Significant jump to ~73% exact accuracy. The definitions help models distinguish between adjacent levels (especially L2 vs L3 and L4 vs L5).
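The With Definition prompt extends the baseline with three extra fields: the level's definition, its action verbs, and what the question should require of students. A hedged sketch (names are assumptions, not the benchmark's actual code):

```python
def with_definition_prompt(subject: str, level: str, definition: str,
                           verbs: list, requirement: str) -> str:
    """Baseline plus the Bloom's level definition and action verbs.

    NOTE: hypothetical helper for illustration; not the benchmark's API.
    """
    return (
        f"Generate a question about {subject} at Bloom's Taxonomy {level} level. "
        f"At the {level} level, students should be able to {definition}. "
        f"Typical action verbs: {', '.join(verbs)}. "
        f"The question should require students to {requirement}. "
        "Respond with ONLY the question. No explanation, no preamble."
    )
```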

Template 3: With Exemplar

Adds a subject-specific example question at the target level. The LLM gets the definition, verbs, and a concrete demonstration of what a correct question looks like. This is the best-performing template for most models.

Prompt Format
Generate a question about biology at Bloom's Taxonomy Analyze level. At the Analyze level, students should be able to break material into its constituent parts and determine how the parts relate to one another. Typical action verbs: compare, contrast, examine, differentiate, categorize, organize. Here is an example of an Analyze-level question: "Compare and contrast the processes of mitosis and meiosis in terms of their outcomes and biological significance." Generate a DIFFERENT question at the same cognitive level about biology. Respond with ONLY the question. No explanation, no preamble.
Example Output (L4 Analyze, Biology)
"Examine how the structure of hemoglobin molecules allows for efficient oxygen transport in the blood, and differentiate this from myoglobin's role in muscle tissue."
Target: L4 Analyze · CCS Predicted: L4 · Confidence: 0.927

Typical performance: Best overall at ~87% exact accuracy. The exemplar anchors the model's understanding, dramatically reducing overshooting at lower levels (L1-L2).
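The With Exemplar prompt is the definition prompt plus one concrete example question and an instruction to generate a different one. A sketch under the same caveat (hypothetical helper, not the benchmark's code):

```python
def with_exemplar_prompt(subject: str, level: str, definition: str,
                         verbs: list, exemplar: str) -> str:
    """Definition prompt plus a subject-specific exemplar question.

    NOTE: hypothetical helper for illustration; not the benchmark's API.
    """
    article = "an" if level[:1].upper() in "AEIOU" else "a"
    return (
        f"Generate a question about {subject} at Bloom's Taxonomy {level} level. "
        f"At the {level} level, students should be able to {definition}. "
        f"Typical action verbs: {', '.join(verbs)}. "
        f'Here is an example of {article} {level}-level question: "{exemplar}" '
        f"Generate a DIFFERENT question at the same cognitive level about {subject}. "
        "Respond with ONLY the question. No explanation, no preamble."
    )
```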

Template 4: Chain of Thought

Asks the model to reason step-by-step about what makes a question at the target level before generating one. The LLM must first explain the cognitive process, then produce a question. Tests whether explicit reasoning improves level precision.

Prompt Format
You are an expert in Bloom's Taxonomy (Anderson & Krathwohl, 2001). Task: Generate a question about biology at the Analyze level. Step 1: Briefly explain what makes a question Analyze-level (what cognitive process it requires from students). Step 2: Generate a question about biology that specifically targets the Analyze cognitive level. Format your response as: REASONING: [your explanation] QUESTION: [the generated question]
Example Output (L4 Analyze, Biology)
"Compare the mechanisms of active transport and passive transport across cell membranes, and analyze how each contributes to cellular homeostasis."
Target: L4 Analyze · CCS Predicted: L4 · Correct

Typical performance: ~77% exact accuracy. Reasoning helps with harder levels (L5-L6) but can introduce overthinking on simpler ones (L1-L2). Higher latency due to two-step output.
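Because the Chain of Thought template requests a two-part REASONING:/QUESTION: response, the question must be parsed out of the model's output before scoring. A minimal sketch of such a parser (the function name and fallback behavior are assumptions, not the benchmark's actual code):

```python
import re

# Non-greedy reasoning capture; DOTALL lets the reasoning span newlines.
COT_RE = re.compile(
    r"REASONING:\s*(?P<reasoning>.*?)\s*QUESTION:\s*(?P<question>.*)",
    re.DOTALL | re.IGNORECASE,
)

def parse_cot_output(text: str):
    """Split a Chain of Thought response into (reasoning, question).

    Falls back to treating the whole text as the question when the model
    ignores the requested REASONING:/QUESTION: format.
    NOTE: hypothetical helper for illustration; not the benchmark's API.
    """
    m = COT_RE.search(text)
    if m is None:
        return None, text.strip()
    return m.group("reasoning").strip(), m.group("question").strip()
```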

Template Comparison

Average CCS-Control score across all models for each prompt template.

Fig 1. Average CCS-Control score by template across all evaluated models. With Exemplar consistently outperforms other templates, achieving the highest mean score. Name Only serves as the minimal baseline.

Recommendation

Use the with_exemplar template for the best CCS-Control scores. It provides the best balance of guidance: enough context to anchor the model's understanding of each Bloom's level, without the overthinking risk of chain-of-thought. When submitting to the CogBench leaderboard, results from any template are accepted, but we recommend including with_exemplar runs for comparability.