Knowledge Resource

Learning Evaluation Framework Design

An exploration of the knowledge, considerations, and outcomes that characterise workplace learning evaluation — shaped by each organisation's learners, context, and culture.

For L&D professionals, HR practitioners, and organisational leaders. Explore at whatever depth is useful.

Five knowledge foundations

Five interconnected bodies of knowledge inform evaluation framework design. Together, they support evaluations that go well beyond what any single model or checklist can offer.

Evaluation theory and models

Kirkpatrick, Phillips ROI, Brinkerhoff's Success Case Method, and CIPP each reflect the context in which they were developed. Understanding this lineage helps practitioners recognise the assumptions embedded in any model — and consider how well those assumptions fit their own setting.

Measurement theory and psychometrics

Validity, reliability, and the distinction between statistical and practical significance separate data that is trustworthy from data that is merely abundant. These concepts help practitioners judge how well evaluation instruments measure what they are intended to measure.
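One common way to quantify internal consistency, an aspect of reliability, is Cronbach's alpha: α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₜ). Here k is the number of items, σ²ᵢ is the variance of item i, and σ²ₜ is the variance of respondents' total scores. The Python sketch below is a minimal illustration that assumes a small multi-item scale scored numerically. The function name `cronbach_alpha` and the ratings are hypothetical. Alpha speaks to consistency, not validity: a scale can be highly consistent while measuring the wrong thing.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Estimate Cronbach's alpha for a respondents-by-items matrix.

    Each inner list holds one respondent's scores on the same k items.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents (the columns of the matrix).
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of each respondent's total score across all items.
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 1-5 ratings from four respondents on three confidence items.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```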

Research methods

Quantitative and qualitative methods each answer different questions about what learning produces and why. The balance between them depends on the evaluation questions, the learner community, and what each approach can realistically offer in the given setting.

Adult learning and transfer research

Grounding in how adults learn and transfer knowledge shapes what evaluation looks for. Research on transfer conditions — including support structures, practice opportunities, and organisational climate — is particularly relevant when evaluation extends beyond immediate outcomes.

Data literacy and organisational context

Statistical fluency supports honest interpretation and communication of findings. Equally important is an understanding of the organisational context: who holds the data, who has a stake in the findings, and which decisions are genuinely on the table all shape what evaluation is possible and appropriate.

Expertise versus checklist compliance

Applying an established model mechanically is not the same as designing a framework. Considered design requires understanding which distinctions matter in a specific context, which levels of evaluation are genuinely answerable given real constraints, and what is gained or lost when particular elements are adapted or set aside. That judgement belongs to the people closest to the learners and the work.

Equity and decolonising data

Evaluation frameworks carry embedded assumptions about whose knowledge counts as evidence and who defines what success looks like. Scholarship in decolonising research methods — including work on Indigenous data governance, culturally responsive design, and participatory approaches — offers frameworks for examining and challenging these assumptions. Central to this work is the recognition that community members are holders of knowledge, not simply providers of data.

Design considerations

Every evaluation framework is shaped by the organisation it serves — its learners, its culture, and its particular context. What follows describes considerations that practitioners commonly navigate. They are not a sequence or a prescription, but an orientation to the kinds of questions that thoughtful framework design tends to address.

Purpose and scope

A framework designed to demonstrate regulatory accountability looks quite different from one oriented toward program improvement, strategic workforce decisions, or building evaluation capacity. These purposes can coexist, but they generate different priorities, and clarity about which matters most shapes every subsequent design choice.

Scope decisions — which programs, which levels, which learner communities — carry real consequences for what a framework can and cannot address. Making these choices explicitly, rather than defaulting to what is easiest to measure, is itself a meaningful act of design.

Questions this consideration addresses

  • Why is evaluation being undertaken, and for whom?
  • Which programs or interventions are in scope — and which are not?
  • Whose decisions will the findings inform?
  • What resources and timelines are realistic?

Frameworks without clarity on purpose risk producing findings that no one quite knows how to use.

Structure and methods

Established models offer different ways of organising evaluation levels. No structure fits every context. The levels a framework includes, and the methods used within each, reflect the programs being evaluated, the data available, and what the organisation can credibly claim to measure.

Method selection involves the same contextual judgement. Surveys, assessments, observations, interviews, and focus groups each suit different questions and different communities. The choice between more and less resource-intensive approaches involves real tradeoffs — frameworks that make those tradeoffs visible make it easier to revisit them when circumstances change.

Questions this consideration addresses

  • Which distinctions between outcome types matter in this context?
  • What methods are appropriate for the learner population and program type?
  • What evidence quality is needed to support the decisions being made?
  • What is actually measurable given available data and access?

Data collection

Data collection spans the learning lifecycle and involves learners, managers, and sometimes wider communities. The design of instruments and processes reflects both technical choices (what to ask, how, and when) and relational ones: who is being asked to share something of themselves, under what conditions, and with what understanding of how that information will be used and protected.

When learners understand why evaluation is happening and trust that their input matters, the resulting data tends to be more honest and more useful. Instruments that do not attend to these relational dimensions — or that are not culturally appropriate for the communities involved — risk producing data that reflects the instrument, not the learner.

Questions this consideration addresses

  • Who are the appropriate sources of data for each evaluation question?
  • How will participants understand the purpose and use of evaluation data?
  • Are instruments culturally appropriate and free of unexamined assumptions?
  • What are the ethical obligations to people providing data?

Analysis and interpretation

Combining quantitative and qualitative approaches tends to produce a more complete understanding than either alone, particularly when the question is not only whether outcomes occurred, but what enabled or constrained them.

Analysis methods are chosen in relation to the data type and the evaluation question. The more consequential work is interpretation: situating findings in context, distinguishing statistical from practical significance, and being clear about what the data does and does not support.
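The sketch below illustrates the distinction between statistical and practical significance with two hypothetical comparisons, using Python's standard library only. It computes Cohen's d (a standardised mean difference using a pooled standard deviation) as an indicator of practical size, and a Welch t statistic as an indicator of statistical detectability. The group summaries and the `GroupSummary` type are hypothetical; a real analysis would also report p-values and confidence intervals from a statistics package.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class GroupSummary:
    """Summary statistics for one group of assessment scores (hypothetical)."""
    mean: float
    sd: float
    n: int


def cohens_d(a: GroupSummary, b: GroupSummary) -> float:
    """Standardised mean difference using the pooled standard deviation."""
    pooled_sd = sqrt(((a.n - 1) * a.sd**2 + (b.n - 1) * b.sd**2) / (a.n + b.n - 2))
    return (a.mean - b.mean) / pooled_sd


def welch_t(a: GroupSummary, b: GroupSummary) -> float:
    """Welch t statistic: the mean difference divided by its standard error."""
    return (a.mean - b.mean) / sqrt(a.sd**2 / a.n + b.sd**2 / b.n)


# Large cohorts, one-point difference vs. small cohorts, eight-point difference.
large_trivial = (GroupSummary(71, 15, 5000), GroupSummary(70, 15, 5000))
small_meaningful = (GroupSummary(78, 15, 12), GroupSummary(70, 15, 12))

for label, (a, b) in [("large cohort", large_trivial), ("small cohort", small_meaningful)]:
    print(f"{label}: d = {cohens_d(a, b):.2f}, t = {welch_t(a, b):.2f}")
```

With 5,000 learners per group, a one-point difference on a scale with a standard deviation of 15 gives t ≈ 3.3, past the conventional threshold of roughly 2, yet d ≈ 0.07 is negligible by most conventions. With 12 learners per group, an eight-point difference gives d ≈ 0.53 but t ≈ 1.3. Which result matters depends on the decision at hand, not on the test statistic alone.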

A shift in scores might reflect the instrument, the curriculum, the delivery, the learning environment, or the particular community of learners. Identifying the most plausible explanation requires judgement grounded in the specific setting. This is where knowledge of measurement, adult learning, and organisational context matters most — and where respect for the complexity of people's experiences is essential.

Questions this consideration addresses

  • What analysis approach is appropriate for each data type and question?
  • What do the findings actually support as a conclusion — and what do they not?
  • What alternative explanations for observed patterns exist?
  • How do quantitative and qualitative findings relate to each other?

Reporting and communication

Findings serve different audiences with different needs. Those designing programs need insight into what is and is not working. Leadership needs strategic framing and clarity on what systemic questions the findings raise. A single report that tries to serve all audiences at once tends to serve none of them well.

Findings that are specific, contextualised, and honest about their limitations tend to be more trusted and more used. Connecting findings to decisions — rather than simply presenting data — makes it more likely that evaluation leads somewhere.

Questions this consideration addresses

  • Who needs to receive findings, and what will they do with them?
  • What format and level of detail serves each audience?
  • What decisions do the findings bear on, and how directly?
  • How are limitations and uncertainties communicated honestly?

Integration and improvement

Evaluation that does not inform anything is documentation. The value of evaluation lies in its integration: findings shaping program design, surfacing transfer barriers, or prompting more fundamental questions about what is being offered to learners and why. Frameworks that build in explicit connections between evaluation and design decisions tend to improve both over time.

Each organisation finds its own approach to this integration, shaped by its culture and governance. Brinkerhoff's Success Case Method offers one documented approach: identifying learners who achieved meaningful outcomes (and, for contrast, those who did not), understanding what conditions supported or constrained them, and sharing that understanding across the organisation in ways that honour their experience.

Questions this consideration addresses

  • How do evaluation findings reach the people who design and deliver programs?
  • What mechanisms exist for findings to inform decisions about program changes?
  • How is the framework itself reviewed and adapted as the organisation learns?
  • How is evaluation knowledge built and retained across the L&D team?

Key questions in framework design

Every evaluation framework reflects choices about scope, depth, timing, and whose perspectives are included. These choices are not universal — they are made in relation to each organisation's learners, programs, culture, and the decisions the framework is meant to serve.

What is evaluation for?

Purpose shapes every other design decision

  • Demonstrating accountability to regulators or funders
  • Improving program quality through ongoing evidence
  • Supporting strategic workforce and capability decisions
  • Building evaluation capacity within the L&D function

What is in scope?

Coverage reflects organisational priorities

  • Specific programs or program types
  • Particular learner populations
  • Certain outcome levels (immediate, behavioural, organisational)
  • Specific phases of the learning lifecycle

How much evidence is enough?

Evidence quality involves explicit tradeoffs

  • What decisions will this evidence support?
  • What confidence level is sufficient?
  • What resources does more rigorous evidence require?
  • What is lost when less rigorous approaches are used?
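The tradeoff between confidence and resources can be made concrete with a little arithmetic. Under a normal approximation, the half-width of a confidence interval for a mean score is z × s / √n, so precision improves only with the square root of the sample size. The sketch below assumes a hypothetical score standard deviation of 15 and uses `statistics.NormalDist` from Python's standard library to obtain the z value. The function name `ci_half_width` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd: float, n: int, confidence: float = 0.95) -> float:
    """Approximate half-width of a confidence interval for a mean (normal approximation)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / sqrt(n)


# Hypothetical assessment with a score standard deviation of 15 points.
for n in (25, 100, 400):
    print(f"n = {n:>3}: mean estimated to within about ±{ci_half_width(15, n):.1f} points")
```

Each fourfold increase in the number of learners assessed only halves the interval, so the question of what confidence level is sufficient is also a question of how much data collection the decision justifies.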

When does evaluation happen?

Timing determines what questions are answerable

  • Before development — informing design choices
  • During delivery — monitoring engagement and learning
  • Immediately after — assessing immediate outcomes
  • Over time — tracking transfer and performance change

Whose perspectives are included?

Inclusion is a design choice, not a default

  • Learners — as data sources, participants, or co-designers
  • Managers and supervisors
  • Subject matter experts and facilitators
  • Community members affected by the learning

Who uses the findings?

Use determines what reporting needs to look like

  • Instructional designers making program decisions
  • L&D and HR leadership overseeing programs
  • Executives and boards making investment decisions
  • Learners and communities whose outcomes are being evaluated

On standardised frameworks and contextual fit

Established models offer useful vocabulary and structure. They were also developed in particular organisational and cultural contexts, and carry those contexts' assumptions. Organisations that adapt models thoughtfully — making explicit what they are keeping, changing, and setting aside, and why — tend to produce evaluation that is more relevant to their own learners and more credible to the communities they serve.

There is no universally correct number of evaluation levels, no single best method, and no template that carries unchanged from one context to another. The value of engaging with the field lies in developing the judgement to design something that genuinely fits — the learners, the work, and the organisation.

What well-designed evaluation enables

Contextually appropriate evaluation tends to produce change at two horizons — some visible relatively quickly, others that take hold over time as evaluation becomes part of how an organisation understands and supports its learners.

Shorter term
What becomes visible

Evidence replaces assumption

Evaluation surfaces what learners are experiencing, knowing, and able to do — offering a more grounded picture than assumption or inference alone.

Investment is directed purposefully

Understanding which programs produce which outcomes for which learners makes it possible to direct resources where they are most needed.

Transfer barriers become visible

Evaluation beyond immediate outcomes often reveals that learning occurred but did not carry into practice — pointing toward environmental, relational, or structural factors that programs alone cannot resolve.

Evaluation competence grows

Practitioners who engage seriously with evaluation develop expertise in measurement, analysis, and evidence communication — capacities that strengthen their broader professional practice.

Longer term
What becomes embedded

Evaluation is part of design, not after it

Where evaluation is woven into program design from the beginning, learning from evidence becomes continuous. Programs tend to improve more quickly, and the cycle of design-deliver-evaluate becomes genuinely iterative.

L&D earns strategic credibility

Organisations able to speak credibly about learning impact — with evidence grounded in their own context — are better positioned to advocate for their learners in decisions about investment, policy, and workplace conditions.

Equity gaps become visible and addressable

Frameworks that disaggregate findings and centre diverse perspectives surface inequities that aggregate data tends to obscure — making it possible to address them with the specificity and care they deserve.
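A small example shows how aggregate figures can obscure differences. The sketch below uses hypothetical records and group labels to compare an overall rate of on-the-job application with the same rate broken down by learner group. It assumes each record pairs a group label with a 0/1 outcome; the function name `rate_by_group` is illustrative.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (learner group, 1 if learning was applied on the job, else 0).
records = [
    ("group A", 1), ("group A", 1), ("group A", 1), ("group A", 0),
    ("group A", 1), ("group A", 1), ("group A", 1), ("group A", 1),
    ("group B", 1), ("group B", 0), ("group B", 0), ("group B", 0),
]


def rate_by_group(rows):
    """Return the share of positive outcomes for each group."""
    outcomes = defaultdict(list)
    for group, outcome in rows:
        outcomes[group].append(outcome)
    return {group: mean(values) for group, values in outcomes.items()}


overall = mean(outcome for _, outcome in records)
print(f"overall: {overall:.0%}")
for group, rate in rate_by_group(records).items():
    print(f"{group}: {rate:.0%}")
```

An overall rate of about two-thirds conceals a gap between 7 in 8 and 1 in 4. In practice, small groups produce unstable estimates and can make individuals identifiable, so disaggregation also calls for decisions about minimum group sizes and about who governs the data.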

Organisational learning compounds

Careful documentation of what supports learning — for whom, and under what conditions — builds knowledge that accumulates across programs and cycles, informing future design with hard-won understanding.

Why distinctions between outcome levels matter

Models that compress meaningfully different outcomes into a single category can obscure where a program is succeeding and where it is not. Finer distinctions — between, for example, knowledge acquired, competence demonstrated under realistic conditions, and behaviour applied in practice — make it possible to locate a gap more precisely and respond more thoughtfully.

A learner may report a positive experience without having acquired the intended knowledge. Acquired knowledge does not guarantee the ability to apply it under realistic conditions. Demonstrated competence in a controlled assessment does not guarantee that learning carries into everyday work. Each gap is meaningful, and each points toward a different kind of response. Evaluation that can distinguish between them is better placed to serve learners and organisations well.
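One way to keep these distinctions visible in analysis is to record evidence for each outcome separately and count, across a cohort, how many learners met one outcome but not the next. The sketch below is a minimal Python illustration. The `LearnerEvidence` fields, the ordering of outcomes, and the cohort data are hypothetical; each field stands in for whatever measure an organisation uses at that level.

```python
from dataclasses import dataclass


@dataclass
class LearnerEvidence:
    """Hypothetical per-learner evidence, one flag per outcome distinction."""
    positive_reaction: bool
    knowledge_acquired: bool
    competence_demonstrated: bool
    applied_in_practice: bool


# Outcome distinctions in the order discussed above.
LEVELS = ["positive_reaction", "knowledge_acquired", "competence_demonstrated", "applied_in_practice"]


def transition_gaps(cohort):
    """Count learners who met one outcome but not the next one in the sequence."""
    gaps = {}
    for earlier, later in zip(LEVELS, LEVELS[1:]):
        gaps[f"{earlier} -> {later}"] = sum(
            1 for learner in cohort
            if getattr(learner, earlier) and not getattr(learner, later)
        )
    return gaps


cohort = [
    LearnerEvidence(True, True, True, True),
    LearnerEvidence(True, True, True, False),
    LearnerEvidence(True, True, True, False),
    LearnerEvidence(True, False, False, False),
    LearnerEvidence(False, True, True, True),
]
for transition, count in transition_gaps(cohort).items():
    print(f"{transition}: {count}")
```

In this hypothetical cohort the largest gap sits between demonstrated competence and everyday practice, which would point toward transfer conditions rather than program content. The fifth learner, who acquired the knowledge despite a negative reaction, is a reminder that the levels are not a strict causal chain.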

Evaluation framework lineage

Workplace learning evaluation frameworks have developed since the late 1950s. Each model built on, and sometimes pushed back against, what came before. Understanding this lineage helps practitioners read the assumptions embedded in any model they work with.

Kirkpatrick (1959): The foundational model
Core contribution: Established a four-level structure (reaction, learning, behaviour, results) and a shared vocabulary that has shaped the field ever since.
Structure: 4 levels
Limitations: Conflates outcomes that may be meaningfully distinct. Does not address transfer conditions, financial return, or broader social impact. Causality between levels is assumed rather than demonstrated.

Phillips ROI (1990s): Financial accountability
Core contribution: Added a fifth level expressing return on investment in monetary terms, making the financial case for L&D investment explicit.
Structure: 5 levels
Limitations: Resource-intensive. Requires attribution assumptions that are often difficult to substantiate. Most appropriate where isolating the financial impact of a specific program is genuinely feasible.

Kaufman's Five Levels: Societal accountability
Core contribution: Extended the framework to include societal and community impact, arguing that organisations exist in and affect broader social contexts.
Structure: 5 levels, including society
Limitations: Societal impact is difficult to attribute to specific programs with confidence. Few organisations have the data infrastructure to evaluate at this level reliably.

Brinkerhoff's Success Case Method: Narrative evidence
Core contribution: Shifted attention from average outcomes to individual experience, using structured interviews to understand what conditions enabled notable success or contributed to limited impact.
Structure: 6 stages
Limitations: Does not offer population-level data. Particularly valuable alongside quantitative methods when understanding enabling conditions matters as much as measuring outcomes.

CIPP Model (Stufflebeam): Comprehensive program evaluation
Core contribution: Evaluated Context, Input, Process, and Product, attending to the conditions that shape outcomes, not only the outcomes themselves.
Structure: 4 types
Limitations: Resource-intensive. Developed for educational program evaluation; less commonly adapted to workplace L&D settings.

On choosing models: No published model is universally appropriate. Each was developed in a particular context and carries the assumptions of that context. Understanding the lineage supports more informed decisions about what to carry forward, what to adapt, and what to set aside — in service of learners and the communities they belong to.
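The fifth level in the Phillips approach is usually reported as a benefit-cost ratio (program benefits ÷ program costs) and an ROI percentage ((net program benefits ÷ program costs) × 100, where net benefits are benefits minus costs). The sketch below applies those two formulas and shows how sensitive the result is to the attribution assumption listed among the Phillips limitations. The monetary figures and the `attribution_share` variable are hypothetical; `attribution_share` stands in for whatever isolation method an organisation uses to estimate how much of an observed improvement the program caused.

```python
def benefit_cost_ratio(benefits: float, costs: float) -> float:
    """Monetised program benefits per unit of fully loaded program cost."""
    return benefits / costs


def roi_percent(benefits: float, costs: float) -> float:
    """Net program benefits as a percentage of program costs."""
    return (benefits - costs) / costs * 100


# Hypothetical figures: an observed annual improvement worth 250,000 and
# fully loaded program costs of 100,000.
observed_improvement = 250_000
program_costs = 100_000

# The share of the improvement attributed to the program is an assumption.
for attribution_share in (0.4, 0.6, 0.8):
    benefits = observed_improvement * attribution_share
    print(
        f"attribution {attribution_share:.0%}: "
        f"BCR {benefit_cost_ratio(benefits, program_costs):.2f}, "
        f"ROI {roi_percent(benefits, program_costs):.0f}%"
    )
```

Moving the attribution assumption from 40% to 80% takes the reported ROI from 0% to 100% with no change in the underlying data, which is why the credibility of the isolation step matters as much as the arithmetic.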

Kraiger's three-dimensional framework

Kraiger, Ford, and Salas (1993) proposed a taxonomy of learning outcomes across three dimensions: cognitive (knowledge and understanding), skill-based (procedural competence), and affective (attitudes, motivation, confidence). Each dimension requires different measurement approaches and responds to different conditions in the learning environment. Affective outcomes are frequently underemphasised in published frameworks, despite evidence that shifts in learner confidence and motivation can be stronger predictors of on-the-job transfer than knowledge gains alone.

Reflection on current practice

This reflection surfaces questions about evaluation practice. There are no right or wrong answers; the aim is to describe the current situation as honestly as possible.

1. How does your organisation currently approach learning evaluation?
2. How does the L&D team currently relate to data and evidence?
3. To what extent do learners and other stakeholders inform evaluation design?
4. Where does evaluation responsibility currently sit?


    Academic citations

    Sources informing the knowledge claims and frameworks discussed in this resource, grouped by area.

    Evaluation theory and models

    Primary model

    Kirkpatrick, D.L. & Kirkpatrick, J.D. Evaluating Training Programs: The Four Levels (3rd ed.). Berrett-Koehler, 2006. First edition by D.L. Kirkpatrick, 1994.
    ROI methodology

    Phillips, J.J. Return on Investment in Training and Performance Improvement Programs. Gulf Publishing, 1997.
    Success case method

    Brinkerhoff, R.O. The Success Case Method. Berrett-Koehler, 2003.
    CIPP model

    Stufflebeam, D.L. & Shinkfield, A.J. Systematic Evaluation. Springer, 1985.

    Measurement theory and psychometrics

    Three-dimensional framework

    Kraiger, K., Ford, J.K., & Salas, E. "Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation." Journal of Applied Psychology, 78(2), 311-328, 1993.
    Scale development

    DeVellis, R.F. Scale Development: Theory and Applications. SAGE Publications, 2016.

    Research methods

    Mixed methods

    Creswell, J.W. & Plano Clark, V.L. Designing and Conducting Mixed Methods Research. SAGE Publications, 2017.
    Qualitative methods

    Patton, M.Q. Qualitative Research and Evaluation Methods. SAGE Publications, 2014.

    Adult learning and transfer

    Transfer conditions

    Baldwin, T.T. & Ford, J.K. "Transfer of training: A review and directions for future research." Personnel Psychology, 41(1), 63-105, 1988.
    Adult learning

    Merriam, S.B., Caffarella, R.S., & Baumgartner, L.M. Learning in Adulthood: A Comprehensive Guide. Jossey-Bass, 2007.

    Equity and decolonising data

    Decolonising research methods

    Quinless, J.M. Decolonizing Data: Unsettling Conversations about Social Research Methods. University of Toronto Press, 2022. ISBN 9781487523336. utppublishing.com
    Equitable data practice

    Zheng, L. FAIR Framework for equitable data practice. lilyzheng.co/fair-framework
    Indigenous data governance

    First Nations Information Governance Centre. OCAP Principles (Ownership, Control, Access, Possession). fnigc.ca

    This resource draws on scholarship in evaluation theory, psychometrics, research methods, adult learning, and decolonising data. It does not constitute professional or legal advice.