Learning Evaluation Framework Design — Knowledge Resource (Accessible linear version)

Knowledge foundations

Five interconnected bodies of knowledge inform evaluation framework design. Together, they support evaluations that go well beyond what any single model or checklist can offer.

1. Evaluation theory and models

Kirkpatrick, Phillips ROI, Brinkerhoff's Success Case Method, and CIPP each reflect the context in which they were developed. Understanding this lineage helps practitioners recognise the assumptions embedded in any model — and consider how well those assumptions fit their own setting.

2. Measurement theory and psychometrics

Validity, reliability, and the distinction between statistical and practical significance separate data that is trustworthy from data that is merely abundant. These concepts determine how well evaluation instruments measure what they are intended to measure.

3. Research methods

Quantitative and qualitative methods each answer different questions about what learning produces and why. The balance between them depends on the evaluation questions, the learner community, and what each approach can realistically offer in the given setting.

4. Adult learning and transfer research

Grounding in how adults learn and transfer knowledge shapes what evaluation looks for. Research on transfer conditions — including support structures, practice opportunities, and organisational climate — is particularly relevant when evaluation extends beyond immediate outcomes.

5. Data literacy and organisational context

Statistical fluency supports honest interpretation and communication of findings. Equally important is understanding the organisational context. Who holds the data, who has a stake in the findings, and which decisions are genuinely on the table all shape what evaluation is possible and appropriate.

Expertise versus checklist compliance

Applying an established model mechanically is not the same as designing a framework. Considered design requires understanding which distinctions matter in a specific context, which levels of evaluation are genuinely answerable given real constraints, and what is gained or lost when particular elements are adapted or set aside. That judgement belongs to the people closest to the learners and the work.

Equity and decolonising data

Evaluation frameworks carry embedded assumptions about whose knowledge counts as evidence and who defines what success looks like. Scholarship in decolonising research methods — including work on Indigenous data governance, culturally responsive design, and participatory approaches — offers frameworks for examining and challenging these assumptions. Central to this work is the recognition that community members are holders of knowledge, not simply providers of data.

Design considerations

Every evaluation framework is shaped by the organisation it serves — its learners, its culture, and its particular context. What follows describes considerations that practitioners commonly navigate. They are not a sequence or a prescription, but an orientation to the kinds of questions that thoughtful framework design tends to address.

Purpose and scope

What the framework is for, and what it covers

A framework designed to demonstrate regulatory accountability looks quite different from one oriented toward program improvement, strategic workforce decisions, or building evaluation capacity. These purposes can coexist, but they generate different priorities — and clarity about which matters most shapes every subsequent design choice.

Scope decisions — which programs, which levels, which learner communities — carry real consequences for what a framework can and cannot address. Making these choices explicitly, rather than defaulting to what is easiest to measure, is itself a meaningful act of design.

Questions this consideration addresses

Why is evaluation being undertaken, and for whom?
Which programs or interventions are in scope — and which are not?
Whose decisions will the findings inform?
What resources and timelines are realistic?

Frameworks without clarity on purpose risk producing findings that no one quite knows how to use.

Evaluation levels and method selection

What gets measured, and how

Established models offer different ways of organising evaluation levels. No structure fits every context. The levels a framework includes — and the methods used within each — reflect the programs being evaluated, the data available, and what the organisation can credibly claim to measure.

Method selection involves the same contextual judgement. Surveys, assessments, observations, interviews, and focus groups each suit different questions and different communities. The choice between more and less resource-intensive approaches involves real tradeoffs. Frameworks that make those tradeoffs visible make them easier to revisit when circumstances change.

Questions this consideration addresses

Which distinctions between outcome types matter in this context?
What methods are appropriate for the learner community and program type?
What evidence quality is needed to support the decisions being made?
What is actually measurable given available data and access?

Data collection design

How information is gathered, and from whom

Data collection spans the learning lifecycle and involves learners, managers, and sometimes wider communities. The design of instruments and processes reflects both technical choices — what to ask, how, and when — and relational ones: who is being asked to share something of themselves, under what conditions, and with what understanding of how that information will be used and protected.

When learners understand why evaluation is happening and trust that their input matters, the resulting data tends to be more honest and more useful. Instruments that do not attend to these relational dimensions — or that are not culturally appropriate for the communities involved — risk producing data that reflects the instrument, not the learner.

Questions this consideration addresses

Who are the appropriate sources of data for each evaluation question?
How will participants understand the purpose and use of evaluation data?
Are instruments culturally appropriate and free of unexamined assumptions?
What are the ethical obligations to people providing data?

Combining quantitative and qualitative approaches tends to produce more complete understanding than either alone — particularly when the question is not only whether outcomes occurred, but what enabled or constrained them.

Analysis and interpretation

Making sense of findings in context

Analysis methods are chosen in relation to the data type and the evaluation question. The more consequential work is interpretation: situating findings in context, distinguishing statistical from practical significance, and being clear about what the data does and does not support.

A shift in scores might reflect the instrument, the curriculum, the delivery, the learning environment, or the particular community of learners. Identifying the most plausible explanation requires judgement grounded in the specific setting. This is where knowledge of measurement, adult learning, and organisational context matters most — and where respect for the complexity of people's experiences is essential.
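The distinction between statistical and practical significance can be made concrete with a small calculation. The sketch below is illustrative only: the scores are made up, the function name `compare_groups` is hypothetical, and the large-sample z-test and Cohen's d are one common pairing among several that an evaluator might choose.

```python
import math
from statistics import NormalDist, mean, stdev


def compare_groups(group_a, group_b):
    """Return (p_value, cohens_d) for two independent groups of scores.

    Uses a large-sample z-test, an approximation chosen here for simplicity.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    sd_a, sd_b = stdev(group_a), stdev(group_b)

    # Statistical significance: how surprising is this difference
    # if there were no true difference between the groups?
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_b - mean_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Practical significance: how large is the difference,
    # expressed in standard-deviation units (Cohen's d)?
    pooled_sd = math.sqrt(
        ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    )
    cohens_d = (mean_b - mean_a) / pooled_sd
    return p_value, cohens_d


# Made-up scores: 4,200 learners per group, spread evenly between 60 and 80,
# with the program group scoring exactly one point higher.
comparison_group = [60 + (i % 21) for i in range(4200)]
program_group = [score + 1 for score in comparison_group]

p_value, cohens_d = compare_groups(comparison_group, program_group)
print(f"p-value: {p_value:.2g}")
print(f"Cohen's d: {cohens_d:.2f}")
```

With groups this large, a one-point difference is very unlikely to be chance, yet it amounts to less than a fifth of a standard deviation. Whether a difference of that size matters is a judgement about the setting, not something the calculation can settle.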

Questions this consideration addresses

What analysis approach is appropriate for each data type and question?
What do the findings actually support as a conclusion — and what do they not?
What alternative explanations for observed patterns exist?
How do quantitative and qualitative findings relate to each other?

Reporting and communication

Translating evidence into something different audiences can use

Findings serve different audiences with different needs. Those designing programs need insight into what is and is not working. Leadership needs strategic framing and clarity on what systemic questions the findings raise. A single report that tries to serve all audiences at once tends to serve none of them well.

Findings that are specific, contextualised, and honest about their limitations tend to be more trusted and more used. Connecting findings to decisions — rather than simply presenting data — makes it more likely that evaluation leads somewhere.

Questions this consideration addresses

Who needs to receive findings, and what will they do with them?
What format and level of detail serves each audience?
What decisions do the findings bear on, and how directly?
How are limitations and uncertainties communicated honestly?

Continuous improvement and integration

How evaluation findings feed back into program and framework design

Evaluation that does not inform anything is documentation. The value of evaluation lies in its integration — findings shaping program design, surfacing transfer barriers, or prompting more fundamental questions about what is being offered to learners and why. Frameworks that build in explicit connections between evaluation and design decisions tend to improve both over time.

Each organisation finds its own approach to this integration, shaped by its culture and governance. Brinkerhoff's Success Case Method offers one documented approach: identifying learners who achieved meaningful outcomes, understanding what conditions supported them, and sharing that understanding across the organisation in ways that honour their experience.

Questions this consideration addresses

How do evaluation findings reach the people who design and deliver programs?
What mechanisms exist for findings to inform decisions about program changes?
How is the framework itself reviewed and adapted as the organisation learns?
How is evaluation knowledge built and retained across the L&D team?

Key questions in framework design

Every evaluation framework reflects choices about scope, depth, timing, and whose perspectives are included. These choices are not universal — they are made in relation to each organisation's learners, programs, culture, and the decisions the framework is meant to serve.

What is evaluation for?

Purpose shapes every other design decision

Demonstrating accountability to regulators or funders
Improving program quality through ongoing evidence
Supporting strategic workforce and capability decisions
Building evaluation capacity within the L&D function

What is in scope?

Coverage reflects organisational priorities

Specific programs or program types
Particular learner communities
Certain outcome levels (immediate, behavioural, organisational)
Specific phases of the learning lifecycle

How much evidence is enough?

Evidence quality involves explicit tradeoffs

What decisions will this evidence support?
What confidence level is sufficient?
What resources does more rigorous evidence require?
What is lost when less rigorous approaches are used?

When does evaluation happen?

Timing determines what questions are answerable

Before development — informing design choices
During delivery — monitoring engagement and learning
Immediately after — assessing immediate outcomes
Over time — tracking transfer and performance change

Whose perspectives are included?

Inclusion is a design choice, not a default

Learners — as data sources, participants, or co-designers
Managers and supervisors
Subject matter experts and facilitators
Community members affected by the learning

Who uses the findings?

Use determines what reporting needs to look like

Instructional designers making program decisions
L&D and HR leadership overseeing programs
Executives and boards making investment decisions
Learners and communities whose outcomes are being evaluated

On standardised frameworks and contextual fit

Established models offer useful vocabulary and structure. They were also developed in particular organisational and cultural contexts, and carry those contexts' assumptions. Organisations that adapt models thoughtfully — making explicit what they are keeping, changing, and setting aside, and why — tend to produce evaluation that is more relevant to their own learners and more credible to the communities they serve.

There is no universally correct number of evaluation levels, no single best method, and no template that carries unchanged from one context to another. The value of engaging with the field lies in developing the judgement to design something that genuinely fits — the learners, the work, and the organisation.

What well-designed evaluation enables

Contextually appropriate evaluation tends to produce change over two horizons: some changes become visible relatively quickly, while others take hold over time as evaluation becomes part of how an organisation understands and supports its learners.

What becomes visible

Evidence replaces assumption

Evaluation surfaces what learners are experiencing, knowing, and able to do — offering a more grounded picture than assumption or inference alone.

Investment is directed purposefully

Understanding which programs produce which outcomes for which learners makes it possible to direct resources where they are most needed.

Transfer barriers become visible

Evaluation beyond immediate outcomes often reveals that learning occurred but did not carry into practice — pointing toward environmental, relational, or structural factors that programs alone cannot resolve.

Evaluation competence grows

Practitioners who engage seriously with evaluation develop expertise in measurement, analysis, and evidence communication — capacities that strengthen their broader professional practice.

What becomes embedded

Evaluation is part of design, not after it

Where evaluation is woven into program design from the beginning, learning from evidence becomes continuous. Programs tend to improve more quickly, and the cycle of design-deliver-evaluate becomes genuinely iterative.

L&D earns strategic credibility

Organisations able to speak credibly about learning impact — with evidence grounded in their own context — are better positioned to advocate for their learners in decisions about investment, policy, and workplace conditions.

Equity gaps become visible and addressable

Frameworks that disaggregate findings and centre diverse perspectives surface inequities that aggregate data tends to obscure — making it possible to address them with the specificity and care they deserve.

Organisational learning compounds

Careful documentation of what supports learning — for whom, and under what conditions — builds knowledge that accumulates across programs and cycles, informing future design with hard-won understanding.
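The earlier point about disaggregation, that aggregate data tends to obscure inequities, can be shown with a few lines of arithmetic. This is a minimal sketch with made-up records. The field names `learner_group` and `transfer_score` and the helper `disaggregate` are hypothetical, and a real analysis would also need to consider group sizes and privacy.

```python
from collections import defaultdict
from statistics import mean


def disaggregate(records, group_key, value_key):
    """Return the mean of value_key for each distinct value of group_key."""
    values_by_group = defaultdict(list)
    for record in records:
        values_by_group[record[group_key]].append(record[value_key])
    return {group: mean(values) for group, values in values_by_group.items()}


# Made-up follow-up scores for two learner groups.
records = [
    {"learner_group": "A", "transfer_score": 78},
    {"learner_group": "A", "transfer_score": 80},
    {"learner_group": "A", "transfer_score": 82},
    {"learner_group": "A", "transfer_score": 80},
    {"learner_group": "B", "transfer_score": 50},
    {"learner_group": "B", "transfer_score": 52},
    {"learner_group": "B", "transfer_score": 48},
]

overall = mean(record["transfer_score"] for record in records)
by_group = disaggregate(records, "learner_group", "transfer_score")

print(f"Overall mean: {overall:.1f}")
for group, group_mean in sorted(by_group.items()):
    print(f"Group {group} mean: {group_mean:.1f}")
```

The overall mean sits between the two groups and describes neither of them well. Only the disaggregated view shows that one group's outcomes are substantially lower.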

Why distinctions between outcome levels matter

Models that compress meaningfully different outcomes into a single category can obscure where a program is succeeding and where it is not. Finer distinctions — between, for example, knowledge acquired, competence demonstrated under realistic conditions, and behaviour applied in practice — make it possible to locate a gap more precisely and respond more thoughtfully.

A learner may report a positive experience without having acquired the intended knowledge. Acquired knowledge does not guarantee the ability to apply it under realistic conditions. Demonstrated competence in a controlled assessment does not guarantee that learning carries into everyday work. Each gap is meaningful, and each points toward a different kind of response. Evaluation that can distinguish between them is better placed to serve learners and organisations well.

Evaluation framework lineage

Workplace learning evaluation frameworks have developed over seven decades. Each model built on — and sometimes pushed back against — what came before. Understanding this lineage helps practitioners read the assumptions embedded in any model they work with.

Published evaluation frameworks: contribution, structure, and limitations
Model	Core contribution	Structure	Limitations
Kirkpatrick (1959): the foundational model	Established a four-level structure (reaction, learning, behaviour, results) and a shared vocabulary that has shaped the field ever since.	4 levels	Conflates outcomes that may be meaningfully distinct. Does not address transfer conditions, financial return, or broader social impact. Causality between levels is assumed rather than demonstrated.
Phillips ROI (1990s): financial accountability	Added a fifth level expressing return on investment in monetary terms, making the financial case for L&D investment explicit.	5 levels	Resource-intensive. Requires attribution assumptions that are often difficult to substantiate. Most appropriate where isolating the financial impact of a specific program is genuinely feasible.
Kaufman's Five Levels: societal accountability	Extended the framework to include societal and community impact, arguing that organisations exist in and affect broader social contexts.	5 levels including society	Societal impact is difficult to attribute to specific programs with confidence. Few organisations have the data infrastructure to evaluate at this level reliably.
Brinkerhoff's Success Case Method: narrative evidence	Shifted attention from average outcomes to individual experience, using structured interviews to understand what conditions enabled notable success or contributed to limited impact.	6 stages	Does not offer population-level data. Particularly valuable alongside quantitative methods when understanding enabling conditions matters as much as measuring outcomes.
CIPP Model (Stufflebeam): comprehensive program evaluation	Structured evaluation around Context, Input, Process, and Product, attending to the conditions that shape outcomes, not only the outcomes themselves.	4 types	Resource-intensive. Developed for educational program evaluation; less commonly adapted to workplace L&D settings.

On choosing models: No published model is universally appropriate. Each was developed in a particular context and carries the assumptions of that context. Understanding the lineage supports more informed decisions about what to carry forward, what to adapt, and what to set aside — in service of learners and the communities they belong to.
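The attribution concern noted for Phillips ROI can be illustrated with simple arithmetic. The sketch below uses the commonly cited calculation, net program benefits divided by program costs, expressed as a percentage. All figures are made up, and the `attribution_share` parameter is a hypothetical simplification introduced here to stand in for whatever method is used to isolate a program's effect.

```python
def roi_percent(monetary_benefits, program_costs, attribution_share=1.0):
    """Return ROI as a percentage: net benefits divided by costs, times 100.

    attribution_share is a hypothetical parameter: the fraction of the
    measured benefit that is credited to the program rather than to
    other influences.
    """
    if program_costs <= 0:
        raise ValueError("program_costs must be positive")
    if not 0.0 <= attribution_share <= 1.0:
        raise ValueError("attribution_share must be between 0 and 1")
    attributed_benefits = monetary_benefits * attribution_share
    return (attributed_benefits - program_costs) / program_costs * 100


# Made-up figures for illustration only.
benefits = 150_000
costs = 100_000

for share in (1.0, 0.8, 0.6):
    roi = roi_percent(benefits, costs, share)
    print(f"Attribution {share:.0%}: ROI {roi:.0f}%")
```

In this example the same program moves from a positive to a negative return as the share of benefit credited to it falls, which is why the attribution assumption deserves as much scrutiny as the final figure.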

Kraiger's three-dimensional framework

Kraiger, Ford, and Salas (1993) proposed a taxonomy of learning outcomes across three dimensions: cognitive (knowledge and understanding), skill-based (procedural competence), and affective (attitudes, motivation, confidence). Each dimension requires different measurement approaches and responds to different conditions in the learning environment. Affective outcomes are frequently underemphasised in published frameworks, despite evidence that shifts in learner confidence and motivation can be stronger predictors of on-the-job transfer than knowledge gains alone.

Reflection on current practice

This reflection surfaces questions about evaluation practice. There are no right or wrong answers; choose the response that most honestly reflects the current situation. It takes approximately two minutes.

1. How does your organisation currently approach learning evaluation?

We do not conduct formal evaluation of learning programs
We collect end-of-course feedback for most programs
We conduct multi-level evaluation for some programs using surveys and follow-up methods
Evaluation is integrated into the instructional design process and covers multiple levels systematically

2. How does the L&D team currently relate to data and evidence?

We rely primarily on participant feedback and completion rates
We track some metrics but do not analyse or act on them systematically
We analyse evaluation data and use it to inform program improvements
We use mixed-methods evidence to drive design decisions and communicate impact to leadership

3. To what extent do learners and other stakeholders inform evaluation design?

Instruments are designed internally without input from learners or managers
We gather informal feedback when issues are raised
We conduct structured follow-up with learners and managers at key intervals
Learners, managers, and stakeholders help shape evaluation questions, instruments, and interpretation

4. Where does evaluation responsibility currently sit?

It is unclear or handled informally by whoever delivers training
One or two people manage it alongside other responsibilities
A small team has defined evaluation roles with some integration into L&D workflows
Evaluation is embedded in governance, program design, and shared across the L&D team

Citations

Sources informing the knowledge claims and frameworks discussed in this resource, grouped by area.

Evaluation theory and models

Primary model
Kirkpatrick, D.L. & Kirkpatrick, J.D. Evaluating Training Programs: The Four Levels. Berrett-Koehler, 1994 (revised 2006).

ROI methodology
Phillips, J.J. Return on Investment in Training and Performance Improvement Programs. Gulf Publishing, 1997.

Success case method
Brinkerhoff, R.O. The Success Case Method. Berrett-Koehler, 2003.

CIPP model
Stufflebeam, D.L. & Shinkfield, A.J. Systematic Evaluation. Springer, 1985.

Measurement theory and psychometrics

Three-dimensional framework
Kraiger, K., Ford, J.K., & Salas, E. “Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation.” Journal of Applied Psychology, 78(2), 311–328, 1993.

Scale development
DeVellis, R.F. Scale Development: Theory and Applications. SAGE Publications, 2016.

Research methods

Mixed methods
Creswell, J.W. & Plano Clark, V.L. Designing and Conducting Mixed Methods Research. SAGE Publications, 2017.

Qualitative methods
Patton, M.Q. Qualitative Research and Evaluation Methods. SAGE Publications, 2014.

Adult learning and transfer

Transfer conditions
Baldwin, T.T. & Ford, J.K. “Transfer of training: A review and directions for future research.” Personnel Psychology, 41(1), 63–105, 1988.

Adult learning
Merriam, S.B., Caffarella, R.S., & Baumgartner, L.M. Learning in Adulthood: A Comprehensive Guide. Jossey-Bass, 2007.

Equity and decolonising data

Decolonising research methods
Quinless, J.M. Decolonizing Data: Unsettling Conversations about Social Research Methods. University of Toronto Press, 2022. ISBN 9781487523336. utppublishing.com

Equitable data practice
Zheng, L. FAIR Framework for equitable data practice. lilyzheng.co/fair-framework

Indigenous data governance
First Nations Information Governance Centre. OCAP Principles (Ownership, Control, Access, Possession). fnigc.ca

This resource draws on scholarship in evaluation theory, psychometrics, research methods, adult learning, and decolonising data. It does not constitute professional or legal advice.