Research in Practice: Analysis and Critical Thinking in Assessment

Social Care Online

The Social Care Online website closed in March 2024; its data was last updated in early 2023.

Social Care Online was first launched in May 2005 by the Social Care Institute for Excellence (SCIE). Previously it was known as the Electronic Library for Social Care (eLSC). Content originated from the National Institute for Social Work library and included resources dating back to the 1980s.

Social Care Online’s bibliographic data will continue to be available via the combined Social Policy and Practice database, a subscription-based service. Please refer to your organisation’s information and library service for further information about this database and about alternatives to Social Care Online relevant to your research interests.

Freely available alternative sources of information include:

  • King’s Fund Library Database – information relating to health and social care management and policy, systems, services and leadership.
  • NSPCC Library catalogue – safeguarding, child protection, child abuse and neglect.
  • AgeInfo – information resource provided by the Centre for Policy on Ageing Library Service, covering social gerontology.


Assessment of Critical Thinking

Dirk Jahn and Michael Cursio

First Online: 10 December 2023

The term “to assess” has various meanings, such as to judge, evaluate, estimate, gauge, or determine. Assessment is therefore a diagnostic inventory of certain characteristics of a section of observable reality on the basis of defined criteria. In a pedagogical context, assessments aim to make learners’ knowledge, skills, or attitudes observable in certain application situations and to assess them on the basis of observation criteria.


To give an example: the Holistic Critical Thinking Rubric from East Georgia College, available at https://studylib.net/doc/7608742/east-georgia-college-holistic-critical-thinking-rubric-cr… (accessed 04/03/2020).


Author information

Authors and Affiliations

Dirk Jahn, Friedrich-Alexander-Universität Erlangen-Nürnberg, Fortbildungszentrum Hochschullehre (FBZHL), Fürth, Bayern, Germany

Michael Cursio, Friedrich-Alexander-Universität Erlangen-Nürnberg, Fortbildungszentrum Hochschullehre (FBZHL), Fürth, Germany


Copyright information

© 2023 The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature

About this chapter

Jahn, D., & Cursio, M. (2023). Assessment of Critical Thinking. In: Critical Thinking. Springer VS, Wiesbaden. https://doi.org/10.1007/978-3-658-41543-3_8

Published: 10 December 2023

Publisher: Springer VS, Wiesbaden

Print ISBN: 978-3-658-41542-6

Online ISBN: 978-3-658-41543-3


Original Research Article

Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation


  • 1 Lynch School of Education and Human Development, Boston College, Chestnut Hill, MA, United States
  • 2 Graduate School of Education, Stanford University, Stanford, CA, United States
  • 3 Department of Business and Economics Education, Johannes Gutenberg University, Mainz, Germany

Enhancing students’ critical thinking (CT) skills is an essential goal of higher education. This article presents a systematic approach to conceptualizing and measuring CT. CT generally comprises the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion. We further posit that CT also involves dealing with dilemmas involving ambiguity or conflicts among principles and contradictory information. We argue that performance assessment provides the most realistic—and most credible—approach to measuring CT. From this conceptualization and construct definition, we describe one possible framework for building performance assessments of CT with attention to extended performance tasks within the assessment system. The framework is a product of an ongoing, collaborative effort, the International Performance Assessment of Learning (iPAL). The framework comprises four main aspects: (1) The storyline describes a carefully curated version of a complex, real-world situation. (2) The challenge frames the task to be accomplished. (3) A portfolio of documents in a range of formats is drawn from multiple sources chosen to have specific characteristics. (4) The scoring rubric comprises a set of scales each linked to a facet of the construct. We discuss a number of use cases, as well as the challenges that arise with the use and valid interpretation of performance assessments. The final section presents elements of the iPAL research program that involve various refinements and extensions of the assessment framework, a number of empirical studies, along with linkages to current work in online reading and information processing.

Introduction

In their mission statements, most colleges declare that a principal goal is to develop students’ higher-order cognitive skills such as critical thinking (CT) and reasoning (e.g., Shavelson, 2010 ; Hyytinen et al., 2019 ). The importance of CT is echoed by business leaders ( Association of American Colleges and Universities [AACU], 2018 ), as well as by college faculty (for curricular analyses in Germany, see e.g., Zlatkin-Troitschanskaia et al., 2018 ). Indeed, in the 2019 administration of the Faculty Survey of Student Engagement (FSSE), 93% of faculty reported that they “very much” or “quite a bit” structure their courses to support student development with respect to thinking critically and analytically. In a listing of 21st century skills, CT was the most highly ranked among FSSE respondents ( Indiana University, 2019 ). Nevertheless, there is considerable evidence that many college students do not develop these skills to a satisfactory standard ( Arum and Roksa, 2011 ; Shavelson et al., 2019 ; Zlatkin-Troitschanskaia et al., 2019 ). This state of affairs represents a serious challenge to higher education – and to society at large.

In view of the importance of CT, as well as evidence of substantial variation in its development during college, its proper measurement is essential to tracking progress in skill development and to providing useful feedback to both teachers and learners. Feedback can help focus students’ attention on key skill areas in need of improvement, and provide insight to teachers on choices of pedagogical strategies and time allocation. Moreover, comparative studies at the program and institutional level can inform higher education leaders and policy makers.

The conceptualization and definition of CT presented here is closely related to models of information processing and online reasoning, the skills that are the focus of this special issue. These two skills are especially germane to the learning environments that college students experience today when much of their academic work is done online. Ideally, students should be capable of more than naïve Internet search, followed by copy-and-paste (e.g., McGrew et al., 2017 ); rather, for example, they should be able to critically evaluate both sources of evidence and the quality of the evidence itself in light of a given purpose ( Leu et al., 2020 ).

In this paper, we present a systematic approach to conceptualizing CT. From that conceptualization and construct definition, we present one possible framework for building performance assessments of CT with particular attention to extended performance tasks within the test environment. The penultimate section discusses some of the challenges that arise with the use and valid interpretation of performance assessment scores. We conclude the paper with a section on future perspectives in an emerging field of research – the iPAL program.

Conceptual Foundations, Definition and Measurement of Critical Thinking

In this section, we briefly review the concept of CT and its definition. In accordance with the principles of evidence-centered design (ECD; Mislevy et al., 2003 ), the conceptualization drives the measurement of the construct; that is, implementation of ECD directly links aspects of the assessment framework to specific facets of the construct. We then argue that performance assessments designed in accordance with such an assessment framework provide the most realistic—and most credible—approach to measuring CT. The section concludes with a sketch of an approach to CT measurement grounded in performance assessment .

Concept and Definition of Critical Thinking

Taxonomies of 21st century skills ( Pellegrino and Hilton, 2012 ) abound, and it is neither surprising that CT appears in most taxonomies of learning, nor that there are many different approaches to defining and operationalizing the construct of CT. There is, however, general agreement that CT is a multifaceted construct ( Liu et al., 2014 ). Liu et al. (2014) identified five key facets of CT: (i) evaluating evidence and the use of evidence; (ii) analyzing arguments; (iii) understanding implications and consequences; (iv) developing sound arguments; and (v) understanding causation and explanation.

There is empirical support for these facets from college faculty. A 2016–2017 survey conducted by the Higher Education Research Institute (HERI) at the University of California, Los Angeles found that a substantial majority of faculty respondents “frequently” encouraged students to: (i) evaluate the quality or reliability of the information they receive; (ii) recognize biases that affect their thinking; (iii) analyze multiple sources of information before coming to a conclusion; and (iv) support their opinions with a logical argument ( Stolzenberg et al., 2019 ).

There is general agreement that CT involves the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion (e.g., Erwin and Sebrell, 2003 ; Kosslyn and Nelson, 2017 ; Shavelson et al., 2018 ). We further suggest that CT includes dealing with dilemmas of ambiguity or conflict among principles and contradictory information ( Oser and Biedermann, 2020 ).

Importantly, Oser and Biedermann (2020) posit that CT can be manifested at three levels. The first level, Critical Analysis, is the most complex of the three. Critical Analysis requires both knowledge in a specific discipline (conceptual) and procedural analytical knowledge (deduction, inclusion, etc.). The second level is Critical Reflection, which involves more generic skills “… necessary for every responsible member of a society” (p. 90). It is “a basic attitude that must be taken into consideration if (new) information is questioned to be true or false, reliable or not reliable, moral or immoral etc.” (p. 90). To engage in Critical Reflection, one must not only apply analytic reasoning, but also adopt a reflective stance toward the political, social, and other consequences of choosing a course of action. It also involves analyzing the potential motives of the various actors involved in the dilemma of interest. The third level, Critical Alertness, involves questioning one’s own or others’ thinking from a skeptical point of view.

Wheeler and Haertel (1993) distinguished two types of settings in which higher-order skills, such as CT, are exercised: (i) solving problems and making decisions in professional and everyday life, for instance, related to civic affairs and the environment; and (ii) situations in which various mental processes (e.g., comparing, evaluating, and justifying) are developed through formal instruction, usually in a discipline. Hence, in both settings, individuals must confront situations that typically involve a problematic event, contradictory information, and possibly conflicting principles. Indeed, there is an ongoing debate concerning whether CT should be evaluated using generic or discipline-based assessments (Nagel et al., 2020). Whether CT skills are conceptualized as generic or discipline-specific has implications for how they are assessed and how they are incorporated into the classroom.

In the iPAL project, CT is characterized as a multifaceted construct that comprises conceptualizing, analyzing, drawing inferences or synthesizing information, evaluating claims, and applying the results of these reasoning processes to various purposes (e.g., solve a problem, decide on a course of action, find an answer to a given question or reach a conclusion) (Shavelson et al., 2019). In the course of carrying out a CT task, an individual typically engages in activities such as specifying or clarifying a problem; deciding what information is relevant to the problem; evaluating the trustworthiness of information; avoiding judgmental errors based on “fast thinking”; avoiding biases and stereotypes; recognizing different perspectives and how they can reframe a situation; considering the consequences of alternative courses of action; and communicating decisions and actions clearly and concisely. The order in which activities are carried out can vary among individuals, and the processes can be non-linear and reciprocal.

In this article, we focus on generic CT skills. The importance of these skills derives not only from their utility in academic and professional settings, but also from the many situations involving challenging moral and ethical issues – often framed in terms of conflicting principles and/or interests – to which individuals have to apply these skills (Kegan, 1994; Tessier-Lavigne, 2020). Conflicts and dilemmas are ubiquitous in the contexts in which adults find themselves: work, family, civil society. Moreover, to remain viable in the global economic environment – one characterized by increased competition and advances in second-generation artificial intelligence (AI) – today’s college students will need to continually develop and leverage their CT skills. Ideally, colleges offer a supportive environment in which students can develop and practice effective approaches to reasoning about and acting in learning, professional, and everyday situations.

Measurement of Critical Thinking

Critical thinking is a multifaceted construct that poses many challenges to those who would develop relevant and valid assessments. For those interested in current approaches to the measurement of CT that are not the focus of this paper, consult Zlatkin-Troitschanskaia et al. (2018) .

In this paper, we have singled out performance assessment as it offers important advantages to measuring CT. Extant tests of CT typically employ response formats such as forced-choice or short-answer, and scenario-based tasks (for an overview, see Liu et al., 2014 ). They all suffer from moderate to severe construct underrepresentation; that is, they fail to capture important facets of the CT construct such as perspective taking and communication. High fidelity performance tasks are viewed as more authentic in that they provide a problem context and require responses that are more similar to what individuals confront in the real world than what is offered by traditional multiple-choice items ( Messick, 1994 ; Braun, 2019 ). This greater verisimilitude promises higher levels of construct representation and lower levels of construct-irrelevant variance. Such performance tasks have the capacity to measure facets of CT that are imperfectly assessed, if at all, using traditional assessments ( Lane and Stone, 2006 ; Braun, 2019 ; Shavelson et al., 2019 ). However, these assertions must be empirically validated, and the measures should be subjected to psychometric analyses. Evidence of the reliability, validity, and interpretative challenges of performance assessment (PA) are extensively detailed in Davey et al. (2015) .

We adopt the following definition of performance assessment:

A performance assessment (sometimes called a work sample when assessing job performance) … is an activity or set of activities that requires test takers, either individually or in groups, to generate products or performances in response to a complex, most often real-world task. These products and performances provide observable evidence bearing on test takers’ knowledge, skills, and abilities—their competencies—in completing the assessment ( Davey et al., 2015 , p. 10).

A performance assessment typically includes an extended performance task and short constructed-response and selected-response (i.e., multiple-choice) tasks (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). In this paper, we refer to both individual performance- and constructed-response tasks as performance tasks (PT) (For an example, see Table 1 in section “iPAL Assessment Framework”).


Table 1. The iPAL assessment framework.

An Approach to Performance Assessment of Critical Thinking: The iPAL Program

The approach to CT presented here is the result of ongoing work undertaken by the International Performance Assessment of Learning collaborative (iPAL 1 ). iPAL is an international consortium of volunteers, primarily from academia, who have come together to address the dearth in higher education of research and practice in measuring CT with performance tasks ( Shavelson et al., 2018 ). In this section, we present iPAL’s assessment framework as the basis of measuring CT, with examples along the way.

iPAL Background

The iPAL assessment framework builds on the Council for Aid to Education’s Collegiate Learning Assessment (CLA). The CLA was designed to measure cross-disciplinary, generic competencies, such as CT, analytic reasoning, problem solving, and written communication (Klein et al., 2007; Shavelson, 2010). Ideally, each PA contained an extended PT (e.g., examining a range of evidential materials related to the crash of an aircraft) and two short PTs in which students either critique an argument or provide a solution in response to a real-world societal issue.

Motivated by considerations of adequate reliability, the CLA was modified in 2012 to create the CLA+. The CLA+ includes two subtests: a PT and a 25-item Selected Response Question (SRQ) section. The PT presents a document or problem statement and an assignment based on that document which elicits an open-ended response. The SRQ section (which is not linked substantively to the PT scenario) was added to increase the number of student responses and thereby obtain more reliable estimates of performance at the student level than could be achieved with a single PT (Zahner, 2013; Davey et al., 2015).

iPAL Assessment Framework

Methodological Foundations

The iPAL framework evolved from the Collegiate Learning Assessment developed by Klein et al. (2007) . It was also informed by the results from the AHELO pilot study ( Organisation for Economic Co-operation and Development [OECD], 2012 , 2013 ), as well as the KoKoHs research program in Germany (for an overview see, Zlatkin-Troitschanskaia et al., 2017 , 2020 ). The ongoing refinement of the iPAL framework has been guided in part by the principles of Evidence Centered Design (ECD) ( Mislevy et al., 2003 ; Mislevy and Haertel, 2006 ; Haertel and Fujii, 2017 ).

In educational measurement, an assessment framework plays a critical intermediary role between the theoretical formulation of the construct and the development of the assessment instrument containing tasks (or items) intended to elicit evidence with respect to that construct ( Mislevy et al., 2003 ). Builders of the assessment framework draw on the construct theory and operationalize it in a way that provides explicit guidance to PT’s developers. Thus, the framework should reflect the relevant facets of the construct, where relevance is determined by substantive theory or an appropriate alternative such as behavioral samples from real-world situations of interest (criterion-sampling; McClelland, 1973 ), as well as the intended use(s) (for an example, see Shavelson et al., 2019 ). By following the requirements and guidelines embodied in the framework, instrument developers strengthen the claim of construct validity for the instrument ( Messick, 1994 ).

An assessment framework can be specified at different levels of granularity: an assessment battery (“omnibus” assessment, for an example see below), a single performance task, or a specific component of an assessment ( Shavelson, 2010 ; Davey et al., 2015 ). In the iPAL program, a performance assessment comprises one or more extended performance tasks and additional selected-response and short constructed-response items. The focus of the framework specified below is on a single PT intended to elicit evidence with respect to some facets of CT, such as the evaluation of the trustworthiness of the documents provided and the capacity to address conflicts of principles.

From the ECD perspective, an assessment is an instrument for generating information to support an evidentiary argument and, therefore, the intended inferences (claims) must guide each stage of the design process. The construct of interest is operationalized through the Student Model , which represents the target knowledge, skills, and abilities, as well as the relationships among them. The student model should also make explicit the assumptions regarding student competencies in foundational skills or content knowledge. The Task Model specifies the features of the problems or items posed to the respondent, with the goal of eliciting the evidence desired. The assessment framework also describes the collection of task models comprising the instrument, with considerations of construct validity, various psychometric characteristics (e.g., reliability) and practical constraints (e.g., testing time and cost). The student model provides grounds for evidence of validity, especially cognitive validity; namely, that the students are thinking critically in responding to the task(s).

In the present context, the target construct (CT) is the competence of individuals to think critically, which entails solving complex, real-world problems, and clearly communicating their conclusions or recommendations for action based on trustworthy, relevant and unbiased information. The situations, drawn from actual events, are challenging and may arise in many possible settings. In contrast to more reductionist approaches to assessment development, the iPAL approach and framework rests on the assumption that properly addressing these situational demands requires the application of a constellation of CT skills appropriate to the particular task presented (e.g., Shavelson, 2010 , 2013 ). For a PT, the assessment framework must also specify the rubric by which the responses will be evaluated. The rubric must be properly linked to the target construct so that the resulting score profile constitutes evidence that is both relevant and interpretable in terms of the student model (for an example, see Zlatkin-Troitschanskaia et al., 2019 ).

iPAL Task Framework

The iPAL ‘omnibus’ framework comprises four main aspects: A storyline , a challenge , a document library , and a scoring rubric . Table 1 displays these aspects, brief descriptions of each, and the corresponding examples drawn from an iPAL performance assessment (Version adapted from original in Hyytinen and Toom, 2019 ). Storylines are drawn from various domains; for example, the worlds of business, public policy, civics, medicine, and family. They often involve moral and/or ethical considerations. Deriving an appropriate storyline from a real-world situation requires careful consideration of which features are to be kept in toto , which adapted for purposes of the assessment, and which to be discarded. Framing the challenge demands care in wording so that there is minimal ambiguity in what is required of the respondent. The difficulty of the challenge depends, in large part, on the nature and extent of the information provided in the document library , the amount of scaffolding included, as well as the scope of the required response. The amount of information and the scope of the challenge should be commensurate with the amount of time available. As is evident from the table, the characteristics of the documents in the library are intended to elicit responses related to facets of CT. For example, with regard to bias, the information provided is intended to play to judgmental errors due to fast thinking and/or motivational reasoning. Ideally, the situation should accommodate multiple solutions of varying degrees of merit.
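To make the four framework aspects concrete, the sketch below records them as a small data structure. It is purely illustrative and hypothetical; the names (`PerformanceTaskSpec`, `storyline`, `challenge`, `documents`, `rubric_dimensions`) are ours and not part of any iPAL software, but the sketch shows how a performance-task specification following the omnibus framework could be captured and sanity-checked.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SourceDocument:
    """One item in the document library; flags mirror facets the task is meant to probe."""
    title: str
    format: str                  # e.g., "news article", "blog post", "data table"
    trustworthy: bool            # is the source reliable?
    relevant: bool               # does it bear on the challenge?
    invites_bias: bool = False   # designed to play to fast thinking / motivated reasoning?

@dataclass
class PerformanceTaskSpec:
    """Hypothetical record of one iPAL-style performance task (illustrative only)."""
    storyline: str                       # curated version of a complex, real-world situation
    challenge: str                       # what the respondent must produce
    documents: List[SourceDocument] = field(default_factory=list)
    rubric_dimensions: List[str] = field(default_factory=list)
    time_limit_minutes: int = 60

    def check(self) -> None:
        """Minimal sanity checks implied by the framework description."""
        assert self.documents, "document library must not be empty"
        assert any(not d.trustworthy or not d.relevant for d in self.documents), \
            "library should include material whose trustworthiness or relevance must be judged"
        assert self.rubric_dimensions, "scoring rubric dimensions must be specified"

# Example use (contents invented for illustration):
task = PerformanceTaskSpec(
    storyline="A town council must decide whether to approve a controversial housing project.",
    challenge="Write a recommendation to the council, justified with the provided evidence.",
    documents=[
        SourceDocument("Local newspaper report", "news article", trustworthy=True, relevant=True),
        SourceDocument("Anonymous blog post", "blog post", trustworthy=False, relevant=True,
                       invites_bias=True),
    ],
    rubric_dimensions=["information evaluation", "argument quality", "consequences",
                       "perspective taking", "communication", "overall coherence"],
)
task.check()
```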

The dimensions of the scoring rubric are derived from the Task Model and Student Model ( Mislevy et al., 2003 ) and signal which features are to be extracted from the response and indicate how they are to be evaluated. There should be a direct link between the evaluation of the evidence and the claims that are made with respect to the key features of the task model and student model . More specifically, the task model specifies the various manipulations embodied in the PA and so informs scoring, while the student model specifies the capacities students employ in more or less effectively responding to the tasks. The score scales for each of the five facets of CT (see section “Concept and Definition of Critical Thinking”) can be specified using appropriate behavioral anchors (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). Of particular importance is the evaluation of the response with respect to the last dimension of the scoring rubric; namely, the overall coherence and persuasiveness of the argument, building on the explicit or implicit characteristics related to the first five dimensions. The scoring process must be monitored carefully to ensure that (trained) raters are judging each response based on the same types of features and evaluation criteria ( Braun, 2019 ) as indicated by interrater agreement coefficients.
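Because scoring depends on trained raters applying the rubric consistently, agreement statistics are routinely reported. The snippet below is a minimal, self-contained sketch (ours, not the authors' analysis code) of two common checks on a pair of raters scoring the same responses on a six-point scale: exact agreement and quadratically weighted kappa. The score vectors are invented for illustration.

```python
import numpy as np

def exact_agreement(r1: np.ndarray, r2: np.ndarray) -> float:
    """Proportion of responses to which both raters assigned the same score."""
    return float(np.mean(r1 == r2))

def quadratic_weighted_kappa(r1: np.ndarray, r2: np.ndarray, n_levels: int = 6) -> float:
    """Cohen's kappa with quadratic weights for ordered rating scales (levels 1..n_levels)."""
    k = n_levels
    observed = np.zeros((k, k))
    for a, b in zip(r1, r2):
        observed[a - 1, b - 1] += 1          # cross-tabulate the two raters' scores
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    idx = np.arange(k)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (k - 1) ** 2
    return float(1.0 - (weights * observed).sum() / (weights * expected).sum())

# Invented scores from two raters on 12 responses, six-point scale:
rater1 = np.array([4, 5, 3, 2, 6, 4, 3, 5, 2, 4, 5, 3])
rater2 = np.array([4, 4, 3, 2, 5, 4, 3, 5, 3, 4, 5, 2])

print(f"exact agreement: {exact_agreement(rater1, rater2):.2f}")
print(f"quadratic weighted kappa: {quadratic_weighted_kappa(rater1, rater2):.2f}")
```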

The scoring rubric of the iPAL omnibus framework can be modified for specific tasks ( Lane and Stone, 2006 ). This generic rubric helps ensure consistency across rubrics for different storylines. For example, Zlatkin-Troitschanskaia et al. (2019 , p. 473) used the following scoring scheme:

Based on our construct definition of CT and its four dimensions: (D1-Info) recognizing and evaluating information, (D2-Decision) recognizing and evaluating arguments and making decisions, (D3-Conseq) recognizing and evaluating the consequences of decisions, and (D4-Writing), we developed a corresponding analytic dimensional scoring … The students’ performance is evaluated along the four dimensions, which in turn are subdivided into a total of 23 indicators as (sub)categories of CT … For each dimension, we sought detailed evidence in students’ responses for the indicators and scored them on a six-point Likert-type scale. In order to reduce judgment distortions, an elaborate procedure of ‘behaviorally anchored rating scales’ (Smith and Kendall, 1963) was applied by assigning concrete behavioral expectations to certain scale points (Bernardin et al., 1976). To this end, we defined the scale levels by short descriptions of typical behavior and anchored them with concrete examples. … We trained four raters in 1 day using a specially developed training course to evaluate students’ performance along the 23 indicators clustered into four dimensions (for a description of the rater training, see Klotzer, 2018).

Shavelson et al. (2019) examined the interrater agreement of the scoring scheme developed by Zlatkin-Troitschanskaia et al. (2019) and “found that with 23 items and 2 raters the generalizability (“reliability”) coefficient for total scores to be 0.74 (with 4 raters, 0.84)” ( Shavelson et al., 2019 , p. 15). In the study by Zlatkin-Troitschanskaia et al. (2019 , p. 478) three score profiles were identified (low-, middle-, and high-performer) for students. Proper interpretation of such profiles requires care. For example, there may be multiple possible explanations for low scores such as poor CT skills, a lack of a disposition to engage with the challenge, or the two attributes jointly. These alternative explanations for student performance can potentially pose a threat to the evidentiary argument. In this case, auxiliary information may be available to aid in resolving the ambiguity. For example, student responses to selected- and short-constructed-response items in the PA can provide relevant information about the levels of the different skills possessed by the student. When sufficient data are available, the scores can be modeled statistically and/or qualitatively in such a way as to bring them to bear on the technical quality or interpretability of the claims of the assessment: reliability, validity, and utility evidence ( Davey et al., 2015 ; Zlatkin-Troitschanskaia et al., 2019 ). These kinds of concerns are less critical when PT’s are used in classroom settings. The instructor can draw on other sources of evidence, including direct discussion with the student.
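The reported gain in the generalizability coefficient from 0.74 (two raters) to 0.84 (four raters) can be roughly reproduced with a simple decision-study projection. The sketch below assumes a simplified one-facet (persons x raters) design and applies the standard Spearman-Brown-type adjustment; it yields about 0.85 rather than the published 0.84, because the actual G-study also models item and interaction variance components, but it illustrates the logic of trading rater numbers against score reliability.

```python
def project_g_coefficient(g_current: float, n_current: int, n_new: int) -> float:
    """
    One-facet D-study projection: Erho^2 = var_p / (var_p + var_err / n_raters).
    Given the coefficient at n_current raters, recover the error-to-person variance
    ratio and project the coefficient for n_new raters (Spearman-Brown-type formula).
    """
    err_ratio = n_current * (1.0 / g_current - 1.0)   # var_err / var_p
    return 1.0 / (1.0 + err_ratio / n_new)

# Shavelson et al. (2019) report 0.74 with 2 raters; project to 4 raters:
print(round(project_g_coefficient(0.74, n_current=2, n_new=4), 3))  # ~0.85 under this simplification
```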

Use of iPAL Performance Assessments in Educational Practice: Evidence From Preliminary Validation Studies

The assessment framework described here supports the development of a PT in a general setting. Many modifications are possible and, indeed, desirable. If the PT is to be more deeply embedded in a certain discipline (e.g., economics, law, or medicine), for example, then the framework must specify characteristics of the narrative and the complementary documents as to the breadth and depth of disciplinary knowledge that is represented.

To date, preliminary field trials employing the omnibus framework (i.e., a full set of documents) have indicated that 60 min is generally an inadequate amount of time for students to engage with the full set of complementary documents and to craft a complete response to the challenge (for an example, see Shavelson et al., 2019). Accordingly, it would be helpful to develop modified frameworks for PT’s that require substantially less time. For an example, see a short performance assessment of civic online reasoning, requiring response times from 10 to 50 min (Wineburg et al., 2016). Such assessment frameworks could be derived from the omnibus framework by focusing on a reduced number of facets of CT, and specifying the characteristics of the complementary documents to be included – or, perhaps, choices among sets of documents. In principle, one could build a ‘family’ of PT’s, each using the same (or nearly the same) storyline and a subset of the full collection of complementary documents.

Paul and Elder (2007) argue that the goal of CT assessments should be to provide faculty with important information about how well their instruction supports the development of students’ CT. In that spirit, the full family of PT’s could represent all facets of the construct while affording instructors and students more specific insights on strengths and weaknesses with respect to particular facets of CT. Moreover, the framework should be expanded to include the design of a set of short answer and/or multiple choice items to accompany the PT. Ideally, these additional items would be based on the same narrative as the PT to collect more nuanced information on students’ precursor skills such as reading comprehension, while enhancing the overall reliability of the assessment. Areas where students are under-prepared could be addressed before, or even in parallel with the development of the focal CT skills. The parallel approach follows the co-requisite model of developmental education. In other settings (e.g., for summative assessment), these complementary items would be administered after the PT to augment the evidence in relation to the various claims. The full PT taking 90 min or more could serve as a capstone assessment.

As we transition from simply delivering paper-based assessments by computer to taking full advantage of the affordances of a digital platform, we should learn from the hard-won lessons of the past so that we can make swifter progress with fewer missteps. In that regard, we must take validity as the touchstone – assessment design, development and deployment must all be tightly linked to the operational definition of the CT construct. Considerations of reliability and practicality come into play with various use cases that highlight different purposes for the assessment (for future perspectives, see next section).

The iPAL assessment framework represents a feasible compromise between commercial, standardized assessments of CT (e.g., Liu et al., 2014 ), on the one hand, and, on the other, freedom for individual faculty to develop assessment tasks according to idiosyncratic models. It imposes a degree of standardization on both task development and scoring, while still allowing some flexibility for faculty to tailor the assessment to meet their unique needs. In so doing, it addresses a key weakness of the AAC&U’s VALUE initiative 2 (retrieved 5/7/2020) that has achieved wide acceptance among United States colleges.

The VALUE initiative has produced generic scoring rubrics for 15 domains including CT, problem-solving and written communication. A rubric for a particular skill domain (e.g., critical thinking) has five to six dimensions with four ordered performance levels for each dimension (1 = lowest, 4 = highest). The performance levels are accompanied by language that is intended to clearly differentiate among levels. 3 Faculty are asked to submit student work products from a senior level course that is intended to yield evidence with respect to student learning outcomes in a particular domain and that, they believe, can elicit performances at the highest level. The collection of work products is then graded by faculty from other institutions who have been trained to apply the rubrics.

A principal difficulty is that there is neither a common framework to guide the design of the challenge, nor any control on task complexity and difficulty. Consequently, there is substantial heterogeneity in the quality and evidential value of the submitted responses. This also causes difficulties with task scoring and inter-rater reliability. Shavelson et al. (2009) discuss some of the problems arising with non-standardized collections of student work.

In this context, one advantage of the iPAL framework is that it can provide valuable guidance and an explicit structure for faculty in developing performance tasks for both instruction and formative assessment. When faculty design assessments, their focus is typically on content coverage rather than other potentially important characteristics, such as the degree of construct representation and the adequacy of their scoring procedures ( Braun, 2019 ).

Concluding Reflections

Challenges to Interpretation and Implementation

Performance tasks such as those generated by iPAL are attractive instruments for assessing CT skills (e.g., Shavelson, 2010; Shavelson et al., 2019). The attraction mainly rests on the assumption that elaborated PT’s are more authentic (direct) and more completely capture facets of the target construct (i.e., possess greater construct representation) than the widely used selected-response tests. However, as Messick (1994) noted, authenticity is a “promissory note” that must be redeemed with empirical research. In practice, there are trade-offs among authenticity, construct validity, and psychometric quality such as reliability (Davey et al., 2015).

One reason for Messick’s (1994) caution is that authenticity does not guarantee construct validity. The latter must be established by drawing on multiple sources of evidence (American Educational Research Association et al., 2014). Following the ECD principles in designing and developing the PT, as well as the associated scoring rubrics, constitutes an important type of evidence. Further, as Leighton (2019) argues, response process data (“cognitive validity”) are needed to validate claims regarding the cognitive complexity of PT’s. Relevant data can be obtained through cognitive laboratory studies involving methods such as think-aloud protocols or eye-tracking. Although time-consuming and expensive, such studies can yield not only evidence of validity, but also valuable information to guide refinements of the PT.

Going forward, iPAL PT’s must be subjected to validation studies as recommended in the Standards for Psychological and Educational Testing by American Educational Research Association et al. (2014) . With a particular focus on the criterion “relationships to other variables,” a framework should include assumptions about the theoretically expected relationships among the indicators assessed by the PT, as well as the indicators’ relationships to external variables such as intelligence or prior (task-relevant) knowledge.

Complementing the necessity of evaluating construct validity, there is the need to consider potential sources of construct-irrelevant variance (CIV). One pertains to student motivation, which is typically greater when the stakes are higher. If students are not motivated, then their performance is likely to be impacted by factors unrelated to their (construct-relevant) ability ( Lane and Stone, 2006 ; Braun et al., 2011 ; Shavelson, 2013 ). Differential motivation across groups can also bias comparisons. Student motivation might be enhanced if the PT is administered in the context of a course with the promise of generating useful feedback on students’ skill profiles.

Construct-irrelevant variance can also occur when students are not equally prepared for the format of the PT or fully appreciate the response requirements. This source of CIV could be alleviated by providing students with practice PT’s. Finally, the use of novel forms of documentation, such as those from the Internet, can potentially introduce CIV due to differential familiarity with forms of representation or contents. Interestingly, this suggests that there may be a conflict between enhancing construct representation and reducing CIV.

Another potential source of CIV is related to response evaluation. Even with training, human raters can vary in accuracy and usage of the full score range. In addition, raters may attend to features of responses that are unrelated to the target construct, such as the length of the students’ responses or the frequency of grammatical errors ( Lane and Stone, 2006 ). Some of these sources of variance could be addressed in an online environment, where word processing software could alert students to potential grammatical and spelling errors before they submit their final work product.

Performance tasks generally take longer to administer and are more costly than traditional assessments, making it more difficult to reliably measure student performance ( Messick, 1994 ; Davey et al., 2015 ). Indeed, it is well known that more than one performance task is needed to obtain high reliability ( Shavelson, 2013 ). This is due to both student-task interactions and variability in scoring. Sources of student-task interactions are differential familiarity with the topic ( Hyytinen and Toom, 2019 ) and differential motivation to engage with the task. The level of reliability required, however, depends on the context of use. For use in formative assessment as part of an instructional program, reliability can be lower than use for summative purposes. In the former case, other types of evidence are generally available to support interpretation and guide pedagogical decisions. Further studies are needed to obtain estimates of reliability in typical instructional settings.

With sufficient data, more sophisticated psychometric analyses become possible. One challenge is that the assumption of unidimensionality required for many psychometric models might be untenable for performance tasks ( Davey et al., 2015 ). Davey et al. (2015) provide the example of a mathematics assessment that requires students to demonstrate not only their mathematics skills but also their written communication skills. Although the iPAL framework does not explicitly address students’ reading comprehension and organization skills, students will likely need to call on these abilities to accomplish the task. Moreover, as the operational definition of CT makes evident, the student must not only deploy several skills in responding to the challenge of the PT, but also carry out component tasks in sequence. The former requirement strongly indicates the need for a multi-dimensional IRT model, while the latter suggests that the usual assumption of local item independence may well be problematic ( Lane and Stone, 2006 ). At the same time, the analytic scoring rubric should facilitate the use of latent class analysis to partition data from large groups into meaningful categories ( Zlatkin-Troitschanskaia et al., 2019 ).
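As an illustration of what a multidimensional model buys in this setting, the sketch below (ours, not the authors') writes out the compensatory multidimensional two-parameter logistic (2PL) model, in which the probability of an acceptable response to a rubric indicator depends on a weighted combination of several latent traits, for example critical thinking and written communication. Fitting such a model to real response data would of course require an estimation package and a much larger design; the parameter values here are invented.

```python
import numpy as np

def multidim_2pl_probability(theta: np.ndarray, a: np.ndarray, d: float) -> float:
    """
    Compensatory multidimensional 2PL: P(X = 1 | theta) = 1 / (1 + exp(-(a . theta + d))),
    where theta holds the latent traits, a the discrimination (loading) parameters,
    and d the item intercept (easiness).
    """
    return float(1.0 / (1.0 + np.exp(-(np.dot(a, theta) + d))))

# Illustrative (invented) values: a student moderately strong in CT (theta1 = 0.8)
# but weaker in written communication (theta2 = -0.5), responding to an indicator
# that loads on both traits.
theta = np.array([0.8, -0.5])
a = np.array([1.2, 0.7])   # discriminations on CT and communication
d = 0.3                    # intercept
print(round(multidim_2pl_probability(theta, a, d), 3))  # ~0.71
```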

Future Perspectives

Although the iPAL consortium has made substantial progress in the assessment of CT, much remains to be done. Further refinement of existing PT’s and their adaptation to different languages and cultures must continue. To this point, there are a number of examples: The refugee crisis PT (cited in Table 1 ) was translated and adapted from Finnish to US English and then to Colombian Spanish. A PT concerning kidney transplants was translated and adapted from German to US English. Finally, two PT’s based on ‘legacy admissions’ to US colleges were translated and adapted to Colombian Spanish.

With respect to data collection, there is a need for sufficient data to support psychometric analysis of student responses, especially the relationships among the different components of the scoring rubric, as this would inform both task development and response evaluation ( Zlatkin-Troitschanskaia et al., 2019 ). In addition, more intensive study of response processes through cognitive laboratories and the like are needed to strengthen the evidential argument for construct validity ( Leighton, 2019 ). We are currently conducting empirical studies, collecting data on both iPAL PT’s and other measures of CT. These studies will provide evidence of convergent and discriminant validity.

At the same time, efforts should be directed at further development to support different ways CT PT’s might be used—i.e., use cases—especially those that call for formative use of PT’s. Incorporating formative assessment into courses can plausibly be expected to improve students’ competency acquisition ( Zlatkin-Troitschanskaia et al., 2017 ). With suitable choices of storylines, appropriate combinations of (modified) PT’s, supplemented by short-answer and multiple-choice items, could be interwoven into ordinary classroom activities. The supplementary items may be completely separate from the PT’s (as is the case with the CLA+), loosely coupled with the PT’s (as in drawing on the same storyline), or tightly linked to the PT’s (as in requiring elaboration of certain components of the response to the PT).

As an alternative to such integration, stand-alone modules could be embedded in courses to yield evidence of students’ generic CT skills. Core curriculum courses or general education courses offer ideal settings for embedding performance assessments. If these assessments were administered to a representative sample of students in each cohort over their years in college, the results would yield important information on the development of CT skills at a population level. For another example, these PA’s could be used to assess the competence profiles of students entering Bachelor’s or graduate-level programs as a basis for more targeted instructional support.

Thus, in considering different use cases for the assessment of CT, it is evident that several modifications of the iPAL omnibus assessment framework are needed. As noted earlier, assessments built according to this framework are demanding with respect to the extensive preliminary work required by a task and the time required to properly complete it. Thus, it would be helpful to have modified versions of the framework, focusing on one or two facets of the CT construct and calling for a smaller number of supplementary documents. The challenge to the student should be suitably reduced.

Some members of the iPAL collaborative have developed PT’s that are embedded in disciplines such as engineering, law and education ( Crump et al., 2019 ; for teacher education examples, see Jeschke et al., 2019 ). These are proving to be of great interest to various stakeholders and further development is likely. Consequently, it is essential that an appropriate assessment framework be established and implemented. It is both a conceptual and an empirical question as to whether a single framework can guide development in different domains.

Performance Assessment in Online Learning Environment

Over the last 15 years, increasing amounts of time in both college and work have been spent using computers and other electronic devices. This has led to the formulation of models of the new literacies that attempt to capture some key characteristics of these activities. A prominent example is a model proposed by Leu et al. (2020). The model frames online reading as a process of problem-based inquiry that calls on five practices during online research and comprehension:

1. Reading to identify important questions,

2. Reading to locate information,

3. Reading to critically evaluate information,

4. Reading to synthesize online information, and

5. Reading and writing to communicate online information.

The parallels with the iPAL definition of CT are evident and suggest there may be benefits to closer links between these two lines of research. For example, a report by Leu et al. (2014) describes empirical studies comparing assessments of online reading using either open-ended or multiple-choice response formats.

The iPAL consortium has begun to take advantage of the affordances of the online environment (for examples, see Schmidt et al. and Nagel et al. in this special issue). Most obviously, Supplementary Materials can now include archival photographs, audio recordings, or videos. Additional tasks might include the online search for relevant documents, though this would add considerably to the time demands. This online search could occur within a simulated Internet environment, as is the case for the IEA’s ePIRLS assessment ( Mullis et al., 2017 ).

The prospect of having access to a wealth of materials that can add to task authenticity is exciting. Yet it can also add ambiguity and information overload. Increased authenticity, then, should be weighed against validity concerns and the time required to absorb the content in these materials. Modifications of the design framework and extensive empirical testing will be required to decide on appropriate trade-offs. A related possibility is to employ some of these materials in short-answer (or even selected-response) items that supplement the main PT. Response formats could include highlighting text or using a drag-and-drop menu to construct a response. Students’ responses could be automatically scored, thereby containing costs. With automated scoring, feedback to students and faculty, including suggestions for next steps in strengthening CT skills, could also be provided without adding to faculty workload. Therefore, taking advantage of the online environment to incorporate new types of supplementary documents should be a high priority and, perhaps, to introduce new response formats as well. Finally, further investigation of the overlap between this formulation of CT and the characterization of online reading promulgated by Leu et al. (2020) is a promising direction to pursue.

Data Availability Statement

All datasets generated for this study are included in the article/supplementary material.

Author Contributions

HB wrote the article. RS, OZ-T, and KB were involved in the preparation and revision of the article and co-wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This study was funded in part by the Spencer Foundation (Grant No. 201700123).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank all the researchers who have participated in the iPAL program.

  • ^ https://www.ipal-rd.com/
  • ^ https://www.aacu.org/value
  • ^ When test results are reported by means of substantively defined categories, the scoring is termed “criterion-referenced”. This is in contrast to results reported as percentiles; such scoring is termed “norm-referenced”.

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (2014). Standards for Educational and Psychological Testing. Washington, D.C: American Educational Research Association.


Arum, R., and Roksa, J. (2011). Academically Adrift: Limited Learning on College Campuses. Chicago, IL: University of Chicago Press.

Association of American Colleges and Universities (n.d.). VALUE: What is Value? Available online at: https://www.aacu.org/value (accessed May 7, 2020).

Association of American Colleges and Universities [AACU] (2018). Fulfilling the American Dream: Liberal Education and the Future of Work. Available online at: https://www.aacu.org/research/2018-future-of-work (accessed May 1, 2020).

Braun, H. (2019). Performance assessment and standardization in higher education: a problematic conjunction? Br. J. Educ. Psychol. 89, 429–440. doi: 10.1111/bjep.12274


Braun, H. I., Kirsch, I., and Yamamoto, K. (2011). An experimental study of the effects of monetary incentives on performance on the 12th grade NAEP reading assessment. Teach. Coll. Rec. 113, 2309–2344.

Crump, N., Sepulveda, C., Fajardo, A., and Aguilera, A. (2019). Systematization of performance tests in critical thinking: an interdisciplinary construction experience. Rev. Estud. Educ. 2, 17–47.

Davey, T., Ferrara, S., Shavelson, R., Holland, P., Webb, N., and Wise, L. (2015). Psychometric Considerations for the Next Generation of Performance Assessment. Washington, DC: Center for K-12 Assessment & Performance Management, Educational Testing Service.

Erwin, T. D., and Sebrell, K. W. (2003). Assessment of critical thinking: ETS’s tasks in critical thinking. J. Gen. Educ. 52, 50–70. doi: 10.1353/jge.2003.0019


Haertel, G. D., and Fujii, R. (2017). “Evidence-centered design and postsecondary assessment,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education , 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 313–339. doi: 10.4324/9781315709307-26

Hyytinen, H., and Toom, A. (2019). Developing a performance assessment task in the Finnish higher education context: conceptual and empirical insights. Br. J. Educ. Psychol. 89, 551–563. doi: 10.1111/bjep.12283

Hyytinen, H., Toom, A., and Shavelson, R. J. (2019). “Enhancing scientific thinking through the development of critical thinking in higher education,” in Redefining Scientific Thinking for Higher Education: Higher-Order Thinking, Evidence-Based Reasoning and Research Skills , eds M. Murtonen and K. Balloo (London: Palgrave MacMillan).

Indiana University (2019). FSSE 2019 Frequencies: FSSE 2019 Aggregate. Available online at: http://fsse.indiana.edu/pdf/FSSE_IR_2019/summary_tables/FSSE19_Frequencies_(FSSE_2019).pdf (accessed May 1, 2020).

Jeschke, C., Kuhn, C., Lindmeier, A., Zlatkin-Troitschanskaia, O., Saas, H., and Heinze, A. (2019). Performance assessment to investigate the domain specificity of instructional skills among pre-service and in-service teachers of mathematics and economics. Br. J. Educ. Psychol. 89, 538–550. doi: 10.1111/bjep.12277

Kegan, R. (1994). In Over Our Heads: The Mental Demands of Modern Life. Cambridge, MA: Harvard University Press.

Klein, S., Benjamin, R., Shavelson, R., and Bolus, R. (2007). The collegiate learning assessment: facts and fantasies. Eval. Rev. 31, 415–439. doi: 10.1177/0193841x07303318

Kosslyn, S. M., and Nelson, B. (2017). Building the Intentional University: Minerva and the Future of Higher Education. Cambridge, MA: The MIT Press.

Lane, S., and Stone, C. A. (2006). “Performance assessment,” in Educational Measurement, 4th Edn, ed. R. L. Brennan (Lanham, MD: Rowman & Littlefield Publishers), 387–432.

Leighton, J. P. (2019). The risk–return trade-off: performance assessments and cognitive validation of inferences. Br. J. Educ. Psychol. 89, 441–455. doi: 10.1111/bjep.12271

Leu, D. J., Kiili, C., Forzani, E., Zawilinski, L., McVerry, J. G., and O’Byrne, W. I. (2020). “The new literacies of online research and comprehension,” in The Concise Encyclopedia of Applied Linguistics , ed. C. A. Chapelle (Oxford: Wiley-Blackwell), 844–852.

Leu, D. J., Kulikowich, J. M., Kennedy, C., and Maykel, C. (2014). “The ORCA Project: designing technology-based assessments for online research,” in Paper Presented at the American Educational Research Annual Meeting , Philadelphia, PA.

Liu, O. L., Frankel, L., and Roohr, K. C. (2014). Assessing critical thinking in higher education: current state and directions for next-generation assessments. ETS Res. Rep. Ser. 1, 1–23. doi: 10.1002/ets2.12009

McClelland, D. C. (1973). Testing for competence rather than for “intelligence.”. Am. Psychol. 28, 1–14. doi: 10.1037/h0034092

McGrew, S., Ortega, T., Breakstone, J., and Wineburg, S. (2017). The challenge that’s bigger than fake news: civic reasoning in a social media environment. Am. Educ. 4, 4-9, 39.

Mejía, A., Mariño, J. P., and Molina, A. (2019). Incorporating perspective analysis into critical thinking performance assessments. Br. J. Educ. Psychol. 89, 456–467. doi: 10.1111/bjep.12297

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educ. Res. 23, 13–23. doi: 10.3102/0013189x023002013

Mislevy, R. J., Almond, R. G., and Lukas, J. F. (2003). A brief introduction to evidence-centered design. ETS Res. Rep. Ser. 2003, i–29. doi: 10.1002/j.2333-8504.2003.tb01908.x

Mislevy, R. J., and Haertel, G. D. (2006). Implications of evidence-centered design for educational testing. Educ. Meas. Issues Pract. 25, 6–20. doi: 10.1111/j.1745-3992.2006.00075.x

Mullis, I. V. S., Martin, M. O., Foy, P., and Hooper, M. (2017). ePIRLS 2016 International Results in Online Informational Reading. Available online at: http://timssandpirls.bc.edu/pirls2016/international-results/ (accessed May 1, 2020).

Nagel, M.-T., Zlatkin-Troitschanskaia, O., Schmidt, S., and Beck, K. (2020). “Performance assessment of generic and domain-specific skills in higher education economics,” in Student Learning in German Higher Education , eds O. Zlatkin-Troitschanskaia, H. A. Pant, M. Toepper, and C. Lautenbach (Berlin: Springer), 281–299. doi: 10.1007/978-3-658-27886-1_14

Organisation for Economic Co-operation and Development [OECD] (2012). AHELO: Feasibility Study Report, Vol. 1: Design and Implementation. Paris: OECD.

Organisation for Economic Co-operation and Development [OECD] (2013). AHELO: Feasibility Study Report, Vol. 2: Data Analysis and National Experiences. Paris: OECD.

Oser, F. K., and Biedermann, H. (2020). “A three-level model for critical thinking: critical alertness, critical reflection, and critical analysis,” in Frontiers and Advances in Positive Learning in the Age of Information (PLATO) , ed. O. Zlatkin-Troitschanskaia (Cham: Springer), 89–106. doi: 10.1007/978-3-030-26578-6_7

Paul, R., and Elder, L. (2007). Consequential validity: using assessment to drive instruction. Found. Crit. Think. 29, 31–40.

Pellegrino, J. W., and Hilton, M. L. (eds) (2012). Education for Life and Work: Developing Transferable Knowledge and Skills in the 21st Century. Washington, DC: National Academies Press.

Shavelson, R. (2010). Measuring College Learning Responsibly: Accountability in a New Era. Redwood City, CA: Stanford University Press.

Shavelson, R. J. (2013). On an approach to testing and modeling competence. Educ. Psychol. 48, 73–86. doi: 10.1080/00461520.2013.779483

Shavelson, R. J., Zlatkin-Troitschanskaia, O., Beck, K., Schmidt, S., and Marino, J. P. (2019). Assessment of university students’ critical thinking: next generation performance assessment. Int. J. Test. 19, 337–362. doi: 10.1080/15305058.2018.1543309

Shavelson, R. J., Zlatkin-Troitschanskaia, O., and Marino, J. P. (2018). “International performance assessment of learning in higher education (iPAL): research and development,” in Assessment of Learning Outcomes in Higher Education: Cross-National Comparisons and Perspectives , eds O. Zlatkin-Troitschanskaia, M. Toepper, H. A. Pant, C. Lautenbach, and C. Kuhn (Berlin: Springer), 193–214. doi: 10.1007/978-3-319-74338-7_10

Shavelson, R. J., Klein, S., and Benjamin, R. (2009). The limitations of portfolios. Inside Higher Educ. Available online at: https://www.insidehighered.com/views/2009/10/16/limitations-portfolios

Stolzenberg, E. B., Eagan, M. K., Zimmerman, H. B., Berdan Lozano, J., Cesar-Davis, N. M., Aragon, M. C., et al. (2019). Undergraduate Teaching Faculty: The HERI Faculty Survey 2016–2017. Los Angeles, CA: UCLA.

Tessier-Lavigne, M. (2020). Putting Ethics at the Heart of Innovation. Stanford, CA: Stanford Magazine.

Wheeler, P., and Haertel, G. D. (1993). Resource Handbook on Performance Assessment and Measurement: A Tool for Students, Practitioners, and Policymakers. Palm Coast, FL: Owl Press.

Wineburg, S., McGrew, S., Breakstone, J., and Ortega, T. (2016). Evaluating Information: The Cornerstone of Civic Online Reasoning. Executive Summary. Stanford, CA: Stanford History Education Group.

Zahner, D. (2013). Reliability and Validity–CLA+. Council for Aid to Education. Available online at: https://pdfs.semanticscholar.org/91ae/8edfac44bce3bed37d8c9091da01d6db3776.pdf

Zlatkin-Troitschanskaia, O., and Shavelson, R. J. (2019). Performance assessment of student learning in higher education [Special issue]. Br. J. Educ. Psychol. 89, i–iv, 413–563.

Zlatkin-Troitschanskaia, O., Pant, H. A., Lautenbach, C., Molerov, D., Toepper, M., and Brückner, S. (2017). Modeling and Measuring Competencies in Higher Education: Approaches to Challenges in Higher Education Policy and Practice. Berlin: Springer VS.

Zlatkin-Troitschanskaia, O., Pant, H. A., Toepper, M., and Lautenbach, C. (eds) (2020). Student Learning in German Higher Education: Innovative Measurement Approaches and Research Results. Wiesbaden: Springer.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., and Pant, H. A. (2018). “Assessment of learning outcomes in higher education: international comparisons and perspectives,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education , 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 686–697.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., Schmidt, S., and Beck, K. (2019). On the complementarity of holistic and analytic approaches to performance assessment scoring. Br. J. Educ. Psychol. 89, 468–484. doi: 10.1111/bjep.12286

Keywords : critical thinking, performance assessment, assessment framework, scoring rubric, evidence-centered design, 21st century skills, higher education

Citation: Braun HI, Shavelson RJ, Zlatkin-Troitschanskaia O and Borowiec K (2020) Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation. Front. Educ. 5:156. doi: 10.3389/feduc.2020.00156

Received: 30 May 2020; Accepted: 04 August 2020; Published: 08 September 2020.



Analysis and Critical Thinking in Assessment: Change Project Pilot Resources

Brown, L., Moore, S., & Turney, D. J., School for Policy Studies, University of Bristol. Research output: Other contribution. Publisher: Research in Practice.

Note: Resources produced as part of the Change Project: Analysis and Critical Thinking in Assessment.

Available at: http://www.rip.org.uk/putting-it-into-practice/change-project-programme/analysis-and-assessment

APS Teaching Tips

A Brief Guide for Teaching and Assessing Critical Thinking in Psychology

In my first year of college teaching, a student approached me one day after class and politely asked, “What did you mean by the word ‘evidence’?” I tried to hide my shock at what I took to be a very naive question. Upon further reflection, however, I realized that this was actually a good question, for which the usual approaches to teaching psychology provided too few answers. During the next several years, I developed lessons and techniques to help psychology students learn how to evaluate the strengths and weaknesses of scientific and nonscientific kinds of evidence and to help them draw sound conclusions. It seemed to me that learning about the quality of evidence and drawing appropriate conclusions from scientific research were central to teaching critical thinking (CT) in psychology.

In this article, I have attempted to provide guidelines to psychology instructors on how to teach CT, describing techniques I developed over 20 years of teaching. More importantly, the techniques and approach described below are ones that are supported by scientific research. Classroom examples illustrate the use of the guidelines and how assessment can be integrated into CT skill instruction.

Overview of the Guidelines

Confusion about the definition of CT has been a major obstacle to teaching and assessing it (Halonen, 1995; Williams, 1999). To deal with this problem, we have defined CT as reflective thinking involved in the evaluation of evidence relevant to a claim so that a sound or good conclusion can be drawn from the evidence (Bensley, 1998). One virtue of this definition is that it can be applied to many thinking tasks in psychology. The claims and conclusions psychological scientists make include hypotheses, theoretical statements, interpretation of research findings, or diagnoses of mental disorders. Evidence can be the results of an experiment, case study, naturalistic observation study, or psychological test. Less formally, evidence can be anecdotes, introspective reports, commonsense beliefs, or statements of authority. Evaluating evidence and drawing appropriate conclusions, along with other skills such as distinguishing arguments from nonarguments and finding assumptions, are collectively called argument analysis skills. Many CT experts take argument analysis skills to be fundamental CT skills (e.g., Ennis, 1987; Halpern, 1998). Psychology students need argument analysis skills to evaluate psychological claims in their work and in everyday discourse.

Some instructors expect that their students will improve CT skills like argument analysis skills by simply immersing them in challenging course work. Others expect improvement because they use a textbook with special CT questions or modules, give lectures that critically review the literature, or have students complete written assignments. While these and other traditional techniques may help, a growing body of research suggests they are not sufficient to efficiently produce measurable changes in CT skills. Our research on acquisition of argument analysis skills in psychology (Bensley, Crowe, Bernhardt, Buckner, & Allman, in press) and on critical reading skills (Bensley & Haynes, 1995; Spero & Bensley, 2009) suggests that more explicit, direct instruction of CT skills is necessary. These results concur with results of an earlier review of CT programs by Chance (1986) and a recent meta-analysis by Abrami et al. (2008).

Based on these and other findings, the following guidelines describe an approach to explicit instruction in which instructors can directly infuse CT skills and assessment into their courses. With infusion, instructors can use relevant content to teach CT rules and concepts along with the subject matter. Directly infusing CT skills into course work involves targeting specific CT skills, making CT rules, criteria, and methods explicit, providing guided practice in the form of exercises focused on assessing skills, and giving feedback on practice and assessments. These components are similar to ones found in effective, direct instruction approaches (Walberg, 2006). They also resemble approaches to teaching CT proposed by Angelo (1995), Beyer (1997), and Halpern (1998). Importantly, this approach has been successful in teaching CT skills in psychology (e.g., Bensley et al., in press; Bensley & Haynes, 1995; Nieto & Saiz, 2008; Penningroth, Despain, & Gray, 2007). Directly infusing CT skill instruction can also enrich content instruction without sacrificing learning of subject matter (Solon, 2003). The following seven guidelines, illustrated by CT lessons and assessments, explicate this process.

Seven Guidelines for Teaching and Assessing Critical Thinking

1. Motivate your students to think critically

Critical thinking takes effort. Without proper motivation, students are less inclined to engage in it. Therefore, it is good to arouse interest right away and foster commitment to improving CT throughout a course. One motivational strategy is to explain why CT is important to effective, professional behavior. Often, telling a compelling story that illustrates the consequences of failing to think critically can motivate students. For example, the tragic death of 10-year-old Candace Newmaker at the hands of her therapists practicing attachment therapy illustrates the perils of using a therapy that has not been supported by good empirical evidence (Lilienfeld, 2007).

Instructors can also pique interest by taking a class poll posing an interesting question on which students are likely to have an opinion. For example, asking students how many think that the full moon can lead to increases in abnormal behavior can be used to introduce the difference between empirical fact and opinion or commonsense belief. After asking students how psychologists answer such questions, instructors might go over the meta-analysis of Rotton and Kelly (1985). Their review found that almost all of the 37 studies they examined showed no association between the phase of the moon and abnormal behavior, with only a few, usually poorly controlled, studies supporting it. The effect size over all studies was very small (.01). Instructors can use this to illustrate how psychologists draw a conclusion based on the quality and quantity of research studies as opposed to what many people commonly believe. For other interesting thinking errors and misconceptions related to psychology, see Bensley (1998; 2002; 2008), Halpern (2003), Ruscio (2006), Stanovich (2007), and Sternberg (2007).

Attitudes and dispositions can also affect motivation to think critically. If students lack certain CT dispositions such as open-mindedness, fair-mindedness, and skepticism, they will be less likely to think critically even if they have CT skills (Halpern, 1998). Instructors might point out that even great scientists noted for their powers of reasoning sometimes fail to think critically when they are not disposed to use their skills. For example, Alfred Russel Wallace, who used his considerable CT skills to help develop the concept of natural selection, also believed in spiritualistic contact with the dead. Despite considerable evidence that mediums claiming to contact the dead were really faking such contact, Wallace continued to believe in it (Bensley, 2006). Likewise, the great American psychologist William James, whose reasoning skills helped him develop the seeds of important contemporary theories, believed in spiritualism despite evidence to the contrary.

2. Clearly state the CT goals and objectives for your class

Once students are motivated, the instructor should focus them on what skills they will work on during the course. The APA task force on learning goals and objectives for psychology listed CT as one of 10 major goals for students (Halonen et al., 2002). Under critical thinking, they further specified outcomes such as evaluating the quality of information, identifying and evaluating the source and credibility of information, and recognizing and defending against thinking errors and fallacies. Instructors should publish goals like these in their CT course objectives in their syllabi and more specifically as assignment objectives in their assignments. Given the pragmatic penchant of students for studying what is needed to succeed in a course, this should help motivate and focus them.

To make instruction efficient, course objectives and lesson objectives should explicitly target CT skills to be improved. Objectives should specify the behavior that will change in a way that can be measured. A course objective might read, “After taking this course, you will be able to analyze arguments found in psychological and everyday discussions.” When the goal of a lesson is to practice and improve specific microskills that make up argument analysis, an assignment objective might read “After successfully completing this assignment, you will be able to identify different kinds of evidence in a psychological discussion.” Or another might read “After successfully completing this assignment, you will be able to distinguish arguments from nonarguments.” Students might demonstrate they have reached these objectives by showing the behavior of correctly labeling the kinds of evidence presented in a passage or by indicating whether an argument or merely a claim has been made. By stating objectives in the form of assessable behaviors, the instructor can test these as assessment hypotheses.

Sometimes when the goal is to teach students how to decide which CT skills are appropriate in a situation, the instructor may not want to identify specific skills. Instead, a lesson objective might read, “After successfully completing this assignment, you will be able to decide which skills and knowledge are appropriate for critically analyzing a discussion in psychology.”

3. Find opportunities to infuse CT that fit content and skill requirements of your course

To improve their CT skills, students must be given opportunities to practice them. Different courses present different opportunities for infusion and practice. Stand-alone CT courses usually provide the most opportunities to infuse CT. For example, the Frostburg State University Psychology Department has a senior seminar called “Thinking like a Psychologist” in which students complete lessons giving them practice in argument analysis, critical reading, critically evaluating information on the Internet, distinguishing science from pseudoscience, applying their knowledge and CT skills in simulations of psychological practice, and other activities.

In more typical subject-oriented courses, instructors must find specific content and types of tasks conducive to explicit CT skill instruction. For example, research methods courses present several opportunities to teach argument analysis skills. Instructors can have students critically evaluate the quality of evidence provided by studies using different research methods and designs they find in PsycINFO and Internet sources. This, in turn, could help students write better critical evaluations of research for research reports.

A cognitive psychology teacher might assign a critical evaluation of the evidence on an interesting question discussed in textbook literature reviews. For example, students might evaluate the evidence relevant to the question of whether people have flashbulb memories such as accurately remembering the 9-11 attack. This provides the opportunity to teach them that many of the studies, although informative, are quasi-experimental and cannot show causation. Or, students might analyze the arguments in a TV program such as the fascinating Nova program Kidnapped by Aliens on people who recall having been abducted by aliens.

4. Use guided practice, explicitly modeling and scaffolding CT

Guided practice involves modeling and supporting the practice of target skills, and providing feedback on progress towards skill attainment. Research has shown that guided practice helps students acquire thinking skills more efficiently than unguided and discovery approaches (Mayer, 2004).

Instructors can model the use of CT rules, criteria, and procedures for evaluating evidence and drawing conclusions in many ways. They could provide worked examples of problems, writing samples displaying good CT, or real-world examples of good and bad thinking found in the media. They might also think out loud as they evaluate arguments in class to model the process of thinking.

To help students learn to use complex rules in thinking, instructors should initially scaffold student thinking. Scaffolding involves providing product guidelines, rules, and other frameworks to support the process of thinking. Table 1 shows guidelines like those found in Bensley (1998) describing nonscientific kinds of evidence that can support student efforts to evaluate evidence in everyday psychological discussions. Likewise, Table 2 provides guidelines like those found in Bensley (1998) and Wade and Tavris (2005) describing various kinds of scientific research methods and designs that differ in the quality of evidence they provide for psychological arguments.

In the cognitive lesson on flashbulb memory described earlier, students use the framework in Table 2 to evaluate the kinds of evidence in the literature review. Table 1 can help them evaluate the kinds of evidence found in the Nova video Kidnapped by Aliens. Specifically, they could use it to contrast scientific authority with less credible authority. The video includes statements by scientific authorities like Elizabeth Loftus, based on her extensive research, contrasted with the nonscientific authority of Bud Hopkins, an artist turned hypnotherapist and author of popular books on alien abduction. Loftus argues that the memories of alien abduction in the children interviewed by Hopkins were reconstructed around the suggestive interview questions he posed. Therefore, his conclusion that the children and other people in the video were recalling actual abduction experiences was based on anecdotes, unreliable self-reports, and other weak evidence.

Modeling, scaffolding, and guided practice are especially useful in helping students first acquire CT skills. After sufficient practice, however, instructors should fade these and have students do more challenging assignments without these supports to promote transfer.

5. Align assessment with practice of specific CT skills

Test questions and other assessments of performance should be similar to practice questions and problems in the skills targeted but differ in content. For example, we have developed a series of practice and quiz questions about the kinds of evidence found in Table 1 used in everyday situations but which differ in subject matter from practice to quiz. Likewise, other questions employ research evidence examples corresponding to Table 2. Questions ask students to identify kinds of evidence, evaluate the quality of the evidence, distinguish arguments from nonarguments, and find assumptions in the examples with practice examples differing in content from assessment items.
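As a rough illustration of the bookkeeping this alignment requires, the sketch below (in Python, with invented items and tags rather than our actual question banks) draws a practice item and a quiz item that target the same skill but differ in content.

```python
# Sketch of skill-aligned, content-varied item selection.
# The item bank, skill labels, and content tags are invented for illustration only.
import random

item_bank = [
    {"id": 1, "skill": "identify_evidence_kind", "content": "memory"},
    {"id": 2, "skill": "identify_evidence_kind", "content": "therapy"},
    {"id": 3, "skill": "argument_vs_nonargument", "content": "memory"},
    {"id": 4, "skill": "argument_vs_nonargument", "content": "sleep"},
    {"id": 5, "skill": "find_assumptions", "content": "therapy"},
    {"id": 6, "skill": "find_assumptions", "content": "sleep"},
]

def matched_sets(bank, skill):
    """Return (practice, quiz) items targeting the same skill but different content."""
    pool = [item for item in bank if item["skill"] == skill]
    practice = random.choice(pool)
    quiz_pool = [item for item in pool if item["content"] != practice["content"]]
    return practice, random.choice(quiz_pool)

practice, quiz = matched_sets(item_bank, "identify_evidence_kind")
print("practice item:", practice["id"], "| quiz item:", quiz["id"])
```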

6. Provide feedback and encourage students to reflect on it

Instructors should focus feedback on the degree of attainment of CT skill objectives in the lesson or assessment. The purpose of feedback is to help students learn how to correct faulty thinking so that in the future they monitor their thinking and avoid such problems. This should increase their metacognition or awareness and control of their thinking, an important goal of CT instruction (Halpern, 1998).

Students must use their feedback for it to improve their CT skills. In the CT exercises and critical reading assignments, students receive feedback in the form of corrected responses and written feedback on open-ended questions. They should be advised that paying attention to feedback on earlier work and assessments should improve their performance on later assessments.

7. Reflect on feedback and assessment results to improve CT instruction

Instructors should use the feedback they provide to students and the results of ongoing assessments to ‘close the loop,’ that is, use these outcomes to address deficiencies in performance and improve instruction. In actual practice, teaching and assessment strategies rarely work optimally the first time. Instructors must be willing to tinker with these to make needed improvements. Reflection on reliable and valid assessment results provides a scientific means to systematically improve instruction and assessment.

Instructors may find the direct infusion approach as summarized in the seven guidelines to be efficient, especially in helping students acquire basic CT skills, as research has shown. They may especially appreciate how it allows them to take a scientific approach to the improvement of instruction. Although the direct infusion approach seems to efficiently promote acquisition of CT skills, more research is needed to find out if students transfer their skills outside of the classroom or whether this approach needs adjustment to promote transfer.

Table 1. Strengths and Weaknesses of Nonscientific Sources and Kinds of Evidence

Table 2. Strengths and Weaknesses of Scientific Research Methods/Designs Used as Sources of Evidence

Abrami, P. C., Bernard, R. M., Borokhovski, E., Wade, A., Surkes, M. A., Tamim, R., et al. (2008). Instructional interventions affecting critical thinking skills and dispositions: A stage 1 meta-analysis. Review of Educational Research, 78(4), 1102–1134.

Angelo, T. A. (1995). Classroom assessment for critical thinking. Teaching of Psychology , 22(1), 6–7.

Bensley, D.A. (1998). Critical thinking in psychology: A unified skills approach. Pacific Grove, CA: Brooks/Cole.

Bensley, D.A. (2002). Science and pseudoscience: A critical thinking primer. In M. Shermer (Ed.), The Skeptic encyclopedia of pseudoscience. (pp. 195–203). Santa Barbara, CA: ABC–CLIO.

Bensley, D.A. (2006). Why great thinkers sometimes fail to think critically. Skeptical Inquirer, 30, 47–52.

Bensley, D.A. (2008). Can you learn to think more like a psychologist? The Psychologist, 21, 128–129.

Bensley, D.A., Crowe, D., Bernhardt, P., Buckner, C., & Allman, A. (in press). Teaching and assessing critical thinking skills for argument analysis in psychology. Teaching of Psychology .

Bensley, D.A. & Haynes, C. (1995). The acquisition of general purpose strategic knowledge for argumentation. Teaching of Psychology, 22 , 41–45.

Beyer, B.K. (1997). Improving student thinking: A comprehensive approach . Boston: Allyn & Bacon.

Chance, P. (1986). Thinking in the classroom: A review of programs. New York: Teachers College Press.

Ennis, R.H. (1987). A taxonomy of critical thinking dispositions and abilities. In J. B. Baron & R. J. Sternberg (Eds.), Teaching thinking skills: Theory and practice (pp. 9–26). New York: Freeman.

Halonen, J.S. (1995). Demystifying critical thinking. Teaching of Psychology, 22 , 75–81.

Halonen, J.S., Appleby, D.C., Brewer, C.L., Buskist, W., Gillem, A. R., Halpern, D. F., et al. (APA Task Force on Undergraduate Major Competencies). (2002). Undergraduate psychology major learning goals and outcomes: A report. Washington, DC: American Psychological Association. Retrieved August 27, 2008, from http://www.apa.org/ed/pcue/reports.html

Halpern, D.F. (1998). Teaching critical thinking for transfer across domains: Dispositions, skills, structure training, and metacognitive monitoring. American Psychologist , 53 , 449–455.

Halpern, D.F. (2003). Thought and knowledge: An introduction to critical thinking . (3rd ed.). Mahwah, NJ: Erlbaum.

Lilienfeld, S.O. (2007). Psychological treatments that cause harm. Perspectives on Psychological Science , 2 , 53–70.

Mayer, R.E. (2004). Should there be a three-strikes rule against pure discovery learning? The case for guided methods of instruction. American Psychologist, 59, 14–19.

Nieto, A.M., & Saiz, C. (2008). Evaluation of Halpern’s “structural component” for improving critical thinking. The Spanish Journal of Psychology , 11 ( 1 ), 266–274.

Penningroth, S.L., Despain, L.H., & Gray, M.J. (2007). A course designed to improve psychological critical thinking. Teaching of Psychology , 34 , 153–157.

Rotton, J., & Kelly, I. (1985). Much ado about the full moon: A meta-analysis of lunar-lunacy research. Psychological Bulletin , 97 , 286–306.

Ruscio, J. (2006). Critical thinking in psychology: Separating sense from nonsense. Belmont, CA: Wadsworth.

Solon, T. (2007). Generic critical thinking infusion and course content learning in introductory psychology. Journal of Instructional Psychology , 34(2), 972–987.

Stanovich, K.E. (2007). How to think straight about psychology . (8th ed.). Boston: Pearson.

Sternberg, R.J. (2007). Critical thinking in psychology: It really is critical. In R. J. Sternberg, H. L. Roediger, & D. F. Halpern (Eds.), Critical thinking in psychology. (pp. 289–296) . Cambridge, UK: Cambridge University Press.

Wade, C., & Tavris, C. (2005) Invitation to psychology. (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

Walberg, H.J. (2006). Improving educational productivity: A review of extant research. In R. F. Subotnik & H. J. Walberg (Eds.), The scientific basis of educational productivity (pp. 103–159). Greenwich, CT: Information Age.

Williams, R.L. (1999). Operational definitions and assessment of higher-order cognitive constructs. Educational Psychology Review , 11 , 411–427.


About the Author

D. Alan Bensley is Professor of Psychology at Frostburg State University. He received his Master’s and PhD degrees in cognitive psychology from Rutgers University. His main teaching and research interests concern the improvement of critical thinking and other cognitive skills. He coordinates assessment for his department and is developing a battery of instruments to assess critical thinking in psychology. He can be reached by email at [email protected].

Association for Psychological Science, December 2010, Vol. 23, No. 10




Supplement to Critical Thinking

How can one assess, for purposes of instruction or research, the degree to which a person possesses the dispositions, skills and knowledge of a critical thinker?

In psychometrics, assessment instruments are judged according to their validity and reliability.

Roughly speaking, an instrument is valid if it measures accurately what it purports to measure, given standard conditions. More precisely, the degree of validity is “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (American Educational Research Association 2014: 11). In other words, a test is not valid or invalid in itself. Rather, validity is a property of an interpretation of a given score on a given test for a specified use. Determining the degree of validity of such an interpretation requires collection and integration of the relevant evidence, which may be based on test content, test takers’ response processes, a test’s internal structure, relationship of test scores to other variables, and consequences of the interpretation (American Educational Research Association 2014: 13–21). Criterion-related evidence consists of correlations between scores on the test and performance on another test of the same construct; its weight depends on how well supported is the assumption that the other test can be used as a criterion. Content-related evidence is evidence that the test covers the full range of abilities that it claims to test. Construct-related evidence is evidence that a correct answer reflects good performance of the kind being measured and an incorrect answer reflects poor performance.
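To make criterion-related evidence concrete, the following minimal sketch (hypothetical scores, assuming Python with NumPy and SciPy) correlates scores on a new critical thinking test with scores on an established test of the same construct treated as the criterion.

```python
# Minimal sketch of criterion-related validity evidence: correlate scores on a new
# test with scores on an established test of the same construct.
# The score arrays are hypothetical; in practice both come from the same test takers.
import numpy as np
from scipy import stats

new_test = np.array([24, 31, 18, 27, 35, 22, 29, 33, 20, 26])        # new instrument
criterion_test = np.array([55, 68, 41, 60, 75, 50, 63, 72, 45, 58])  # established instrument

r, p = stats.pearsonr(new_test, criterion_test)
print(f"criterion-related evidence: r = {r:.2f} (p = {p:.3f})")
# A strong correlation supports, but does not by itself establish, the proposed
# interpretation of the new test's scores; content-, response-process-, and
# consequence-related evidence would also be needed.
```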

An instrument is reliable if it consistently produces the same result, whether across different forms of the same test (parallel-forms reliability), across different items (internal consistency), across different administrations to the same person (test-retest reliability), or across ratings of the same answer by different people (inter-rater reliability). Internal consistency should be expected only if the instrument purports to measure a single undifferentiated construct, and thus should not be expected of a test that measures a suite of critical thinking dispositions or critical thinking abilities, assuming that some people are better in some of the respects measured than in others (for example, very willing to inquire but rather closed-minded). Otherwise, reliability is a necessary but not a sufficient condition of validity; a standard example of a reliable instrument that is not valid is a bathroom scale that consistently under-reports a person’s weight.
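As an illustration of these reliability types, the sketch below uses made-up data and the standard formulas: a correlation for test-retest reliability, Cronbach's alpha for internal consistency, and Cohen's kappa for inter-rater agreement.

```python
# Sketch of three common reliability estimates (all data here are hypothetical).
import numpy as np
from scipy import stats

# Test-retest reliability: correlate two administrations of the same test.
time1 = np.array([12, 15, 9, 18, 14, 11, 16])
time2 = np.array([13, 14, 10, 17, 15, 10, 16])
test_retest_r, _ = stats.pearsonr(time1, time2)

# Internal consistency: Cronbach's alpha over a persons-by-items score matrix.
def cronbach_alpha(item_scores):
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_var_sum = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

items = np.array([[3, 4, 3, 5],
                  [2, 2, 3, 2],
                  [4, 5, 4, 4],
                  [1, 2, 2, 1],
                  [3, 3, 4, 3]])

# Inter-rater reliability: Cohen's kappa for two raters scoring the same answers.
def cohen_kappa(r1, r2):
    r1, r2 = np.asarray(r1), np.asarray(r2)
    p_observed = np.mean(r1 == r2)
    p_expected = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in np.union1d(r1, r2))
    return (p_observed - p_expected) / (1 - p_expected)

rater_a = [1, 0, 1, 1, 0, 1, 0, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1]

print(f"test-retest r = {test_retest_r:.2f}")
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
print(f"Cohen's kappa = {cohen_kappa(rater_a, rater_b):.2f}")
```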

Assessing dispositions is difficult if one uses a multiple-choice format with known adverse consequences of a low score. It is pretty easy to tell what answer to the question “How open-minded are you?” will get the highest score and to give that answer, even if one knows that the answer is incorrect. If an item probes less directly for a critical thinking disposition, for example by asking how often the test taker pays close attention to views with which the test taker disagrees, the answer may differ from reality because of self-deception or simple lack of awareness of one’s personal thinking style, and its interpretation is problematic, even if factor analysis enables one to identify a distinct factor measured by a group of questions that includes this one (Ennis 1996). Nevertheless, Facione, Sánchez, and Facione (1994) used this approach to develop the California Critical Thinking Dispositions Inventory (CCTDI). They began with 225 statements expressive of a disposition towards or away from critical thinking (using the long list of dispositions in Facione 1990a), validated the statements with talk-aloud and conversational strategies in focus groups to determine whether people in the target population understood the items in the way intended, administered a pilot version of the test with 150 items, and eliminated items that failed to discriminate among test takers or were inversely correlated with overall results or added little refinement to overall scores (Facione 2000). They used item analysis and factor analysis to group the measured dispositions into seven broad constructs: open-mindedness, analyticity, cognitive maturity, truth-seeking, systematicity, inquisitiveness, and self-confidence (Facione, Sánchez, and Facione 1994). The resulting test consists of 75 agree-disagree statements and takes 20 minutes to administer. A repeated disturbing finding is that North American students taking the test tend to score low on the truth-seeking sub-scale (on which a low score results from agreeing to such statements as the following: “To get people to agree with me I would give any reason that worked”. “Everyone always argues from their own self-interest, including me”. “If there are four reasons in favor and one against, I’ll go with the four”.) Development of the CCTDI made it possible to test whether good critical thinking abilities and good critical thinking dispositions go together, in which case it might be enough to teach one without the other. Facione (2000) reports that administration of the CCTDI and the California Critical Thinking Skills Test (CCTST) to almost 8,000 post-secondary students in the United States revealed a statistically significant but weak correlation between total scores on the two tests, and also between paired sub-scores from the two tests. The implication is that both abilities and dispositions need to be taught, that one cannot expect improvement in one to bring with it improvement in the other.
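The item-screening step described above can be illustrated with a small sketch; corrected item-total correlation is one standard way to flag items that fail to discriminate or are inversely related to overall scores. The simulated responses and the 0.20 threshold below are invented for illustration and are not the CCTDI's actual data or criteria.

```python
# Sketch of pilot item screening by corrected item-total correlation (simulated data).
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 150, 10
trait = rng.normal(size=n_persons)                       # simulated disposition level
loadings = np.array([0.8, 0.7, 0.6, 0.7, 0.5, 0.6, 0.8, 0.1, -0.4, 0.7])
responses = trait[:, None] * loadings + rng.normal(scale=0.8, size=(n_persons, n_items))

def corrected_item_total(resp):
    """Correlation of each item with the total of the remaining items."""
    totals = resp.sum(axis=1)
    return np.array([np.corrcoef(resp[:, j], totals - resp[:, j])[0, 1]
                     for j in range(resp.shape[1])])

r_it = corrected_item_total(responses)
keep = r_it >= 0.20                                      # an arbitrary screening threshold
print("corrected item-total r:", np.round(r_it, 2))
print("items flagged for removal:", np.where(~keep)[0].tolist())
```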

A more direct way of assessing critical thinking dispositions would be to see what people do when put in a situation where the dispositions would reveal themselves. Ennis (1996) reports promising initial work with guided open-ended opportunities to give evidence of dispositions, but no standardized test seems to have emerged from this work. There are however standardized aspect-specific tests of critical thinking dispositions. The Critical Problem Solving Scale (Berman et al. 2001: 518) takes as a measure of the disposition to suspend judgment the number of distinct good aspects attributed to an option judged to be the worst among those generated by the test taker. Stanovich, West and Toplak (2011: 800–810) list tests developed by cognitive psychologists of the following dispositions: resistance to miserly information processing, resistance to myside thinking, absence of irrelevant context effects in decision-making, actively open-minded thinking, valuing reason and truth, tendency to seek information, objective reasoning style, tendency to seek consistency, sense of self-efficacy, prudent discounting of the future, self-control skills, and emotional regulation.

It is easier to measure critical thinking skills or abilities than to measure dispositions. The following currently available standardized tests purport to measure them: the Watson-Glaser Critical Thinking Appraisal (Watson & Glaser 1980a, 1980b, 1994), the Cornell Critical Thinking Tests Level X and Level Z (Ennis & Millman 1971; Ennis, Millman, & Tomko 1985, 2005), the Ennis-Weir Critical Thinking Essay Test (Ennis & Weir 1985), the California Critical Thinking Skills Test (Facione 1990b, 1992), the Halpern Critical Thinking Assessment (Halpern 2016), the Critical Thinking Assessment Test (Center for Assessment & Improvement of Learning 2017), the Collegiate Learning Assessment (Council for Aid to Education 2017), the HEIghten Critical Thinking Assessment (https://territorium.com/heighten/), and a suite of critical thinking assessments for different groups and purposes offered by Insight Assessment (https://www.insightassessment.com/products). The Critical Thinking Assessment Test (CAT) is unique among them in being designed for use by college faculty to help them improve their development of students’ critical thinking skills (Haynes et al. 2015; Haynes & Stein 2021). Also, for some years the United Kingdom body OCR (Oxford Cambridge and RSA Examinations) awarded AS and A Level certificates in critical thinking on the basis of an examination (OCR 2011). Many of these standardized tests have received scholarly evaluations at the hands of, among others, Ennis (1958), McPeck (1981), Norris and Ennis (1989), Fisher and Scriven (1997), Possin (2008, 2013a, 2013b, 2013c, 2014, 2020) and Hatcher and Possin (2021). Their evaluations provide a useful set of criteria that such tests ideally should meet, as does the description by Ennis (1984) of problems in testing for competence in critical thinking: the soundness of multiple-choice items, the clarity and soundness of instructions to test takers, the information and mental processing used in selecting an answer to a multiple-choice item, the role of background beliefs and ideological commitments in selecting an answer to a multiple-choice item, the tenability of a test’s underlying conception of critical thinking and its component abilities, the set of abilities that the test manual claims are covered by the test, the extent to which the test actually covers these abilities, the appropriateness of the weighting given to various abilities in the scoring system, the accuracy and intellectual honesty of the test manual, the interest of the test to the target population of test takers, the scope for guessing, the scope for choosing a keyed answer by being test-wise, precautions against cheating in the administration of the test, clarity and soundness of materials for training essay graders, inter-rater reliability in grading essays, and clarity and soundness of advance guidance to test takers on what is required in an essay. Rear (2019) has challenged the use of standardized tests of critical thinking as a way to measure educational outcomes, on the grounds that they (1) fail to take into account disputes about conceptions of critical thinking, (2) are not completely valid or reliable, and (3) fail to evaluate skills used in real academic tasks. He proposes instead assessments based on discipline-specific content.

There are also aspect-specific standardized tests of critical thinking abilities. Stanovich, West and Toplak (2011: 800–810) list tests of probabilistic reasoning, insights into qualitative decision theory, knowledge of scientific reasoning, knowledge of rules of logical consistency and validity, and economic thinking. They also list instruments that probe for irrational thinking, such as superstitious thinking, belief in the superiority of intuition, over-reliance on folk wisdom and folk psychology, belief in “special” expertise, financial misconceptions, overestimation of one’s introspective powers, dysfunctional beliefs, and a notion of self that encourages egocentric processing. They regard these tests along with the previously mentioned tests of critical thinking dispositions as the building blocks for a comprehensive test of rationality, whose development (they write) may be logistically difficult and would require millions of dollars.

A superb example of assessment of an aspect of critical thinking ability is the Test on Appraising Observations (Norris & King 1983, 1985, 1990a, 1990b), which was designed for classroom administration to senior high school students. The test focuses entirely on the ability to appraise observation statements and in particular on the ability to determine in a specified context which of two statements there is more reason to believe. According to the test manual (Norris & King 1985, 1990b), a person’s score on the multiple-choice version of the test, which is the number of items that are answered correctly, can justifiably be given either a criterion-referenced or a norm-referenced interpretation.

On a criterion-referenced interpretation, those who do well on the test have a firm grasp of the principles for appraising observation statements, and those who do poorly have a weak grasp of them. This interpretation can be justified by the content of the test and the way it was developed, which incorporated a method of controlling for background beliefs articulated and defended by Norris (1985). Norris and King synthesized from judicial practice, psychological research and common-sense psychology 31 principles for appraising observation statements, in the form of empirical generalizations about tendencies, such as the principle that observation statements tend to be more believable than inferences based on them (Norris & King 1984). They constructed items in which exactly one of the 31 principles determined which of two statements was more believable. Using a carefully constructed protocol, they interviewed about 100 students who responded to these items in order to determine the thinking that led them to choose the answers they did (Norris & King 1984). In several iterations of the test, they adjusted items so that selection of the correct answer generally reflected good thinking and selection of an incorrect answer reflected poor thinking. Thus they have good evidence that good performance on the test is due to good thinking about observation statements and that poor performance is due to poor thinking about observation statements. Collectively, the 50 items on the final version of the test require application of 29 of the 31 principles for appraising observation statements, with 13 principles tested by one item, 12 by two items, three by three items, and one by four items. Thus there is comprehensive coverage of the principles for appraising observation statements. Fisher and Scriven (1997: 135–136) judge the items to be well worked and sound, with one exception. The test is clearly written at a grade 6 reading level, meaning that poor performance cannot be attributed to difficulties in reading comprehension by the intended adolescent test takers. The stories that frame the items are realistic, and are engaging enough to stimulate test takers’ interest. Thus the most plausible explanation of a given score on the test is that it reflects roughly the degree to which the test taker can apply principles for appraising observations in real situations. In other words, there is good justification of the proposed interpretation that those who do well on the test have a firm grasp of the principles for appraising observation statements and those who do poorly have a weak grasp of them.
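The reported item-to-principle coverage can be checked with simple arithmetic; the sketch below uses only the counts given in the text, while the mapping of specific items to specific principles is not reproduced.

```python
# Coverage check for the final 50-item form, using the counts reported above:
# 13 principles tested by one item, 12 by two, three by three, and one by four.
items_per_principle = [1] * 13 + [2] * 12 + [3] * 3 + [4] * 1
assert len(items_per_principle) == 29      # 29 of the 31 principles are covered
assert sum(items_per_principle) == 50      # all 50 items are accounted for
print(sum(items_per_principle), "items covering", len(items_per_principle), "principles")
```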

To get norms for performance on the test, Norris and King arranged for seven groups of high school students in different types of communities and with different levels of academic ability to take the test. The test manual includes percentiles, means, and standard deviations for each of these seven groups. These norms allow teachers to compare the performance of their class on the test to that of a similar group of students.
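A norm-referenced interpretation of a raw score can be sketched as follows; the norm-group scores are hypothetical, and SciPy's percentileofscore is simply one convenient way to express a student's standing relative to a comparison group.

```python
# Sketch of a norm-referenced interpretation (hypothetical norm-group data):
# express a raw score on the 50-item test relative to a comparison group.
import numpy as np
from scipy import stats

norm_group = np.array([28, 31, 35, 22, 40, 33, 29, 37, 26, 30, 34, 38, 25, 32, 36])
student_score = 34

percentile = stats.percentileofscore(norm_group, student_score, kind="weak")
print(f"norm group: mean = {norm_group.mean():.1f}, sd = {norm_group.std(ddof=1):.1f}")
print(f"a score of {student_score} falls at the {percentile:.0f}th percentile of this group")
```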

Copyright © 2022 by David Hitchcock <hitchckd@mcmaster.ca>



Workshop menu

Our online learning workshops are an effective and engaging way to deliver learning and development to meet organisational needs.


Online learning delivered directly to your organisation. Delivered by experienced facilitators, our online learning workshops use a range of methods to engage participants in active learning. This includes PowerPoint presentations, breakout room discussions, practice scenario examples, videos, individual exercises, and group discussion, all aimed at addressing learning objectives.

Each workshop is delivered via Microsoft Teams and designed to enable interactive participation for up to 20 people. Participants will benefit from having a quiet space to work without interruption. Participants will be provided with workshop materials, additional resources, and references. 

Following the workshop participants will be asked to provide feedback that will be collated and shared with your organisation. Participants will also be sent slides from the workshop, follow-up resources with further activities, and links to relevant materials. 

View the children and families workshop menu.

View the adults workshop menu.

How to book a workshop for your organisation

Research in Practice members 

Research in Practice members can select one online learning workshop per day of their membership allocation. Link Officers are invited to:   

  • Review the menu and choose workshop(s) that meet your local needs.
  • Book via the booking request form. Please only use this form if you are authorised to choose your organisation's membership allocation.
  • Once your booking is received, our learning team will work with you to confirm a delivery date and provide relevant information.

The deadline for booking this year’s online workshops is Friday 4 October 2024. Please ensure you have submitted your request before this date to avoid losing your allocation. If you are a Research in Practice member and would like to commission additional workshops, please contact us.

Contact us 

For further information or if you are interested in commissioning an online workshop, please contact us:  [email protected] .



Analysis and critical thinking in assessment: a literature review

Turney, D., School of Social Sciences, Education and Social Work, Queen's University Belfast. Research output: Other contribution. Publisher: Research in Practice, Dartington.

Adv Med Educ Pract (PMC10408665)

Assessing the Critical Thinking and Deep Analysis in Medical Education Among Instructional Practices

Abdulaziz I. Alhassan

1 Department of Medical Education, College of Medicine, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia

2 King Abdullah International Medical Research Center, Riyadh, Saudi Arabia

The purpose of this study was to examine how faculty stimulate critical thinking and deep analysis in their students through instructional practices, including lecture design, assessment structure, and assignment instructions.

Faculty from multiple health colleges in Saudi Arabia were asked to respond to survey items about the activities they use in their classrooms with regard to designing lectures, assessment structures, and instructional assignments. A correlation analysis was performed to determine whether the level of applied critical thinking and deep analysis stimulated by faculty members was statistically related across lecture design, assessment structure, and instructional assignments. An analysis of variance (ANOVA) was also performed to determine whether there were significant differences in the level of applied critical thinking and deep analysis based on the demographic characteristics of the participants.
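As a rough sketch of that analysis pipeline (entirely simulated subscale scores; the study's actual data are not reproduced here), the pairwise correlations among the three practice areas and a one-way ANOVA comparing them could be computed as follows.

```python
# Sketch of the described analysis using simulated subscale scores.
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 60  # hypothetical number of faculty respondents
subscales = {
    "lecture design": rng.normal(67, 8, n),
    "assessment structure": rng.normal(65, 8, n),
    "instructional assignment": rng.normal(65, 8, n),
}

# Pairwise correlations between the three subscale scores.
for (name_a, a), (name_b, b) in combinations(subscales.items(), 2):
    r, p = stats.pearsonr(a, b)
    print(f"{name_a} vs {name_b}: r = {r:.2f}, p = {p:.3f}")

# One-way ANOVA comparing the three mean scores, as described in the text.
f_stat, p_val = stats.f_oneway(*subscales.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.3f}")
```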

A correlational analysis revealed that the mean score for designing lectures was 67.276, followed by a mean score of 65.233 for instructional assignment and 64.688 for assessment structure. The result of the ANOVA showed that there was a significant difference in the perceptions of the participants among lecture design, assessment structure, and instructional assignment (p<0.05).

The participants applied critical thinking and deep analysis more when designing their lectures than when constructing assessments and instructional assignments. They had the flexibility to stimulate critical thinking during lecture activities. In contrast, this flexibility was limited when structuring assessments, as they had instructions to follow and were required to provide a rubric with a unified answer key, a mandatory requirement of the assessment department. This is because answers demanding a high level of critical thinking lead to high subjectivity in student responses.

An ongoing debate within higher education is whether the goal of higher education should be merely to prepare students for jobs and employment or whether it should be to prepare students to engage in critical thinking regardless of specific content or course of study. 1 Some have argued that education has been reduced to memorization and tests when the focus should be on helping students to think critically about information and develop the skills to engage in deep thought and analysis. 2 From the arts and humanities to medical education, there are discussions and debates about the need to help students engage in more critical thinking and to provide teaching and course design that is based on helping students become critical thinkers. 3 , 4 However, if there is a focus within higher education for faculty to increase efforts to have students engage in critical thinking and deep analysis rather than simply to memorize content for tests, it is necessary to understand what faculty are doing at the present time to motivate critical thinking and deep analysis.

The purpose of this study was to examine how faculty members stimulate critical thinking and deep analysis in their students through instructional practices, including lecture design, assessment structure, and assignment instructions. If there is a concern across academic disciplines that higher education should not be solely about teaching to tests but should also help students learn critical thinking and deep analysis skills, then there is a need to understand the activities that faculty are currently using to motivate critical thinking and deep analysis among their students. The findings of this study provide some understanding of the critical thinking and deep analysis stimulated by faculty activities in the classroom and can be used to make suggestions for future studies and future changes related to motivating critical thinking in the higher education classroom.

Defining Critical Thinking and Deep Analysis

Before examining some of the recent literature on higher education activities aimed at increasing students’ critical thinking and deep analysis skills, it is useful to define what is meant by critical thinking and deep analysis. Kahlke and Eva argued that while the idea of critical thinking is ubiquitous within higher education, there is a lack of agreement about what is meant by critical thinking. 4 Unlu explained that one definition of critical thinking that is often cited is that of John Dewey, who defined critical thinking as the highest level of awareness that is possible to a person through both human senses and the mind. 5 Rear further explained that John Dewey also described critical thinking as reflective thinking in which people engage in attentive consideration of opinions and knowledge based on evidence that supports the conclusions they wish to make. 6

Dumitru (2019) argued that contemporary thinkers and philosophers on critical thinking generally base their ideas of critical thinking on the definition and explanation of the concept provided by Dewey. 3 In this regard, critical thinking can be viewed as the act of engaging with opinions and information to draw conclusions based on support and data for those conclusions. Another way of thinking about critical thinking might be to use data and facts to consider whether the opinions and information provided by others are indeed accurate and correct. Critical thinking is not merely about memorizing information provided by others but about engaging with the information in relation to other information and facts.

The definition of critical thinking also relates directly to the idea of deep analysis. The concept of deep analysis is defined as the process of engaging in reflection of ideas and connecting information and knowledge for a greater understanding. 7 Deep analysis is the process of using critical thinking to draw conclusions that are valid based on broader knowledge. 8 As with critical thinking, deeper analysis requires more than just memorizing information. Instead, deep analysis requires bringing information and knowledge together from a variety of sources and disciplines to draw informed conclusions.

Course Design

The way in which courses are designed has attracted increasing interest and concern in higher education. The concern around course design is whether higher education faculty are designing courses that require students to engage with information and take part in critical thinking and the innovation of ideas rather than simply listening to lectures and taking notes. 9 Rather than having students sit through traditional lectures in which the professor presents information and the students attempt to memorize it, courses should be designed so that students have to think critically about information and even develop new ideas from the information that is presented. 10

Johnke and Liebscher (2020) explained that even though most educators and researchers agree that creativity, in which students engage with problems and develop innovative solutions, is important, 11 higher education continues to teach students routines and replication, concentrates on problems and solutions that are already well defined, lacks a focus on current problems and on developing new solutions to them, and does not equip students to think and act creatively. 11 One of the problems that has been identified is that while higher education students have shown the ability to identify relevant and important information, they often lack the ability to justify solutions and critically assess information. 12

Ulger explained that when students are given problems that do not have routine solutions, or for which multiple solutions may be possible, they engage in greater critical thinking than students who are given a problem with a single, routine solution. 13 Rather than being given a problem for which a solution is already pre-determined, or for which there may be only one solution, students should be given problems that require exploration, critical reflection, and self-assessment. 14 In this regard, higher education courses should be designed so that students are not focused on finding a pre-determined correct answer to a problem, but on using information, assessing their actions, and reflecting on information to not only suggest a solution but also justify why the solution is valid.

Assessment

Assessment is also an important issue with regard to stimulating critical thinking and deep analysis in higher education students, given that faculty may not understand how to create assessments that require critical thinking. Rawlusyk (2018) noted that most higher education faculty learned about creating tests and assessments not through formal coursework but from personal experience and from information and advice provided by colleagues. 15 One of the problems in higher education is that motivating and measuring critical thinking often requires putting students in situations in which they have to solve real-world problems, as opposed to giving them standardized tests. 16 It is easier for higher education faculty, especially those who teach classes with large numbers of students, to rely on standardized tests to measure student performance. However, such tests are not likely to stimulate critical thinking in students.

Assessing students in a way that necessitates critical thinking requires giving them real-life problems and situations for which a solution is needed. For example, students might be given a problem such as whether increasing the number of migrants admitted into a country increases crime rates or whether a company is utilizing its financial resources in the best way to efficiently service its customers. Then, the students would be allowed to use course knowledge, statistics, data from other sources, and knowledge from other courses to address the problem and answer the question or provide a solution with justification. 17 In this way, students are not merely required to remember information, but are instead required to engage in real-world problem solving involving the use of various types of information, knowledge, reflection, and justification for solutions. 18

Another argument that has been made regarding assessment in higher education related to critical thinking and deep analysis is that the focus on critical thinking must occur in everyday practices before any assessments occur. In this regard, critical thinking activities such as reflective writings and problem solving need to be built into everyday lessons and assignments. 19 Students cannot be expected to engage in critical thinking and deep analysis on a formal test if critical thinking and deep analysis have not been stimulated in instructional practices.

Instruction

If instructional practice is a vital part of the ability to stimulate critical thinking and deep analysis in assessment in higher education, it is important to understand the means of instruction that are likely to lead students to engage in critical thinking and deep analysis. The teaching methods that stimulate critical thinking and deep analysis are those that diverge from the traditional lecture. Instructional methods that require students' active engagement, such as group projects, group discussions, and case studies, elicit critical thinking and deep analysis. 5

One instructional method that has received attention in recent years is the flipped classroom. The idea of the flipped classroom is that activities that would normally occur in the classroom, such as reading basic course information or receiving a lecture about new content, are completed at home, while classroom time is used to apply the content through problem solving and engagement. 20 The flipped classroom is promoted in higher education because students are more engaged with their instructors and with each other in actively using the information and knowledge that is part of a course. 21 However, a downside of the flipped classroom is that this instructional method generally requires more work for instructors, because lectures and other learning materials must be prepared ahead of time and made available to students outside of classroom time. Furthermore, problems arise in the classroom if students have not consumed and studied the learning materials outside of class in preparation for the classroom activities. 22

There are other instructional methods that are also used when the goal is to stimulate critical thinking and deep analysis. One method is the case study method in which students are given a narrative about a problem or situation and then asked to address the specific problem by creating a solution and justifying that solution with knowledge and data. 23 Another instructional method that has been found to stimulate critical thinking and deep analysis among higher education students is peer review. The process of peer review involves students critiquing and assessing the work of other students and discussing the content of the work and ways to improve upon it. 24 The peer review process serves as a way for students to actively share ideas and information, collaborate on how to improve the work that is performed, and better evaluate their own work. 25

The underlying theme in stimulating critical thinking through the way higher education instruction is carried out is to have students engage in problem solving. Students need to receive instruction in which they are asked to examine information and use knowledge from various sources to create solutions to problems that they are then required to justify. 25 While this may initially require additional work on the part of higher education faculty and be a change for those accustomed to traditional lecture instruction, it is what is needed if students are going to learn to engage in critical thinking and deep analysis.

Methodology

Sampling Method and Size

In order to evaluate the level of applied critical thinking and deep analysis stimulated by faculty members through their instructional style, lecture design, assessment structure, and assignment instructions, a cross-sectional quantitative study design was used. A questionnaire was distributed to faculty members at King Saud bin Abdulaziz University for Health Sciences (KSAU-HS) in Saudi Arabia. KSAU-HS is a university specializing in health sciences, with three campuses located in Riyadh, Jeddah, and Al Ahsa. These campuses run the same curriculum for each educational program, the faculty members at all campuses share the same academic responsibilities, and a unified criterion for student enrollment is applied at all campuses. A non-probability consecutive sampling technique was used in which faculty at the three campuses were potential participants. The potential sample included faculty of all academic titles (teaching assistant, lecturer, assistant professor, associate professor, and full professor) to obtain a more representative sample of the intended population. Faculty who did not have a teaching role were excluded from the study.

To achieve a confidence level of 95% with a margin of error of 5% and an assumed prevalence of 50% in a population of 1714 faculty members, the estimated required sample size was 314 faculty members. The sample size was calculated using Piface by Russell V. Lenth, version 1.76. The final sample consisted of 232 faculty members.
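
As a rough check on the reported figure, the following is a minimal sketch that computes the estimate under the assumption that a Cochran-style proportion formula with a finite-population correction underlies it; the study itself used Piface (version 1.76), which may compute the value differently.

```python
# Minimal sketch of the sample-size estimate, assuming the Cochran formula for a
# proportion with a finite-population correction (an assumption; the study used
# Piface v1.76, which may compute the figure differently).
import math

def required_sample_size(population: int, prevalence: float = 0.5,
                         margin_of_error: float = 0.05, z: float = 1.96) -> int:
    """Sample size for estimating a proportion, corrected for a finite population."""
    n0 = (z ** 2) * prevalence * (1 - prevalence) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(required_sample_size(1714))  # prints 314, matching the estimate reported above
```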

Data Collection

The participants were given a questionnaire designed around the hierarchy of Bloom's taxonomy of critical thinking and deep analysis, with items measuring the three areas of designing lectures, assessment structure, and instructional assignments. The questionnaire was administered via email to the targeted participants. Email administration made it easy to reach health science faculty members at the three campuses from which the participants were drawn. Three reminders were sent over a period of one month. The emails sent by the author included a link that redirected participants to the survey, so they could complete it without replying to the email or disclosing their identity, which ensured the confidentiality and anonymity of participants. In addition, it was hoped that administering the questionnaire via email would lead more faculty to complete it, because they could do so at their convenience.

The questionnaire consisted of a total of 18 items, with six items for each of the areas of designing lectures, assessment structure, and instructional assignments. A 5-point Likert scale ranging from Strongly Disagree to Strongly Agree was used as the response options for each survey item. The questionnaire also contained five demographic questions regarding the gender, academic job title, years of experience, academic role, and college in which the participants taught. Two open-ended questions were also included in which participants were asked to list any points that might stimulate or hinder critical thinking and deep analysis in instructional practices. The questionnaire was piloted with a smaller group of health science faculty members at another university to ensure the clarity and feasibility of the items. Face validity was assessed by medical education experts, while construct validity was supported by aligning each item with the levels of Bloom's taxonomy for evaluating critical thinking.
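
For illustration only, the sketch below shows one way the 18 Likert items could be coded numerically and aggregated into the three section scores. The item column names (DL1 to DL6, AS1 to AS6, IA1 to IA6) are hypothetical placeholders rather than the study's actual variable names, and any rescaling applied to the reported section totals is not specified here.

```python
# Hypothetical sketch: code Likert labels as 1-5 and sum the six items per section.
# Column names DL1..DL6, AS1..AS6, IA1..IA6 are placeholders, not the study's names.
import pandas as pd

LIKERT = {"Strongly Disagree": 1, "Disagree": 2, "Neutral": 3,
          "Agree": 4, "Strongly Agree": 5}

SECTIONS = {
    "designing_lectures": [f"DL{i}" for i in range(1, 7)],
    "assessment_structure": [f"AS{i}" for i in range(1, 7)],
    "instructional_assignment": [f"IA{i}" for i in range(1, 7)],
}

def score_sections(responses: pd.DataFrame) -> pd.DataFrame:
    """Convert Likert labels to 1-5 and sum the six items of each section."""
    numeric = responses.replace(LIKERT)
    return pd.DataFrame({name: numeric[cols].sum(axis=1)
                         for name, cols in SECTIONS.items()})
```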

Data Analysis

The data analysis consisted of both descriptive statistics and correlational analysis. Descriptive statistics are presented as means and standard deviations for the demographic variables and the questionnaire items related to designing lectures, assessment structure, and instructional assignments. A correlation analysis was performed to determine whether the level of applied critical thinking and deep analysis stimulated by faculty members was statistically related across designing lectures, assessment structure, and instructional assignments. An analysis of variance (ANOVA) was also performed to determine whether there were significant differences in the level of applied critical thinking and deep analysis stimulated by faculty members, in terms of designing lectures, assessment structure, and assignment instructions, based on the demographic characteristics of the participants.

Descriptive Statistics

Table 1 shows the descriptive statistics for the participants who took part in the study. The sample was split fairly evenly between males and females, with 52.6% of the participants being female and 47.4% being male. In terms of academic job titles, most of the participants were either Lecturers or Assistant Professors, at 34.7% and 43.4%, respectively. Another 12.8% of the participants held the job title of Teaching Assistant, while only 6.6% were Associate Professors and only 2.6% were Full Professors. In terms of years of experience, 27% of the participants had 1 to 5 years of experience, 29.6% had 5 to 10 years of experience, and 43.4% had 10 or more years of experience. Most of the participants, 54.4%, were from the Riyadh region, while 22.4% were from the Jeddah region and 23.2% were from the Al Ahsa region.

Table 1. Demographic Variables of Faculty Members

Abbreviations : COM, College of Medicine; COSHP, College of Science and Health Professions; CON, College of Nursing; CAMS, College of Applied Medical Sciences; COD, College of Dentistry; COPH, College of Public Health; COPHHI, College of Public Health and Health Informatics Research.

Table 2 shows the descriptive statistics for the 18 questionnaire items used to measure the designing lectures, assessment structure, and instructional assignment practices of the participants. The faculty members who took part in the study indicated that they agreed or strongly agreed with almost all of the questionnaire items. The item that received the highest mean response of 4.50 was “ask students questions to ensure their understanding”. The item that received the lowest mean response of 3.560 was “student recalling of information”. The only three questionnaire items that received a mean of less than 4.000 from the participants were “student recalling of information” at 3.560, “to recall knowledge from their memory” at 3.890, and “students’ ability to create and come up with innovative solutions” at 3.960.

Table 2. The Mean Response of Faculty Members for Questionnaire Items

Figure 1 shows the percentage of participants who indicated that they strongly agreed with each of the 18 items used to measure designing lectures, assessment structure, and instructional assignment. In total, 91% of the participants strongly agreed with the statement “to understand what they have learned”. The questionnaire item that received the lowest percentage of participants indicating that they strongly agreed was “students recalling of information” at 58.6%.

Figure 1. Percentage of participants that strongly agreed with each questionnaire item.

Item Consistency

Before more closely examining the perceptions of the participants along each of the three areas of designing lectures, assessment structure, and instructional assignment, it is important to determine if there was internal consistency among the six questions that made up each of the three areas of interest. Cronbach’s alpha was calculated for all 18 items together to determine if there was internal consistency between the items for the entire questionnaire. The Cronbach’s alpha for all 18 questions was 0.945, which indicated a very high level of internal consistency among the questions. Next, Cronbach’s alpha was calculated for each of the three sections of the questionnaire. Cronbach’s alpha was 0.899 for designing lectures, 0.856 for assessment structure, and 0.876 for instructional assignment. Based on these values, it was determined that internal consistency existed among the six questions that comprised each of the three areas of the questionnaire.
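
For readers who wish to reproduce this kind of reliability check on similar data, the following is a minimal sketch of Cronbach's alpha, assuming item responses are already coded 1 to 5 in a DataFrame with one column per item; the column names reuse the hypothetical placeholders from the scoring sketch above.

```python
# Minimal sketch of Cronbach's alpha for a set of item columns coded 1-5.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Usage with the hypothetical item columns from the scoring sketch:
# cronbach_alpha(df[[f"DL{i}" for i in range(1, 7)]])  # designing lectures
# cronbach_alpha(df[[f"AS{i}" for i in range(1, 7)]])  # assessment structure
# cronbach_alpha(df[[f"IA{i}" for i in range(1, 7)]])  # instructional assignment
```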

Designing Lectures

Figure 2 shows the percentage of participants who indicated that they strongly agreed with the six items related to designing lectures. The item that received the most agreement at 90.0% was “ask students questions to ensure their understanding”. In contrast, the item that received the lowest agreement related to designing lectures was “allow students to make judgements based on a set of criteria via evaluating evidence” at 76.2%.

Figure 2. Percent of participants who agreed on designing lecture items.

Assessment Structure

In terms of assessment structure, Figure 3 shows that the item that received the most agreement among the participants, with regard to the questions in their assessments examining students’ abilities, was “to understand what they have learned” at 91.0%. However, the item that received the lowest agreement with regard to assessment structure was “to create and come up with innovative solutions”: only 65.6% of the participants strongly agreed that the questions in their assessments involved having students come up with innovative solutions.

Figure 3. Percent of participants who agreed on assessment structure items.

Instructional Assignment

Regarding instructional assignment, Figure 4 shows that 88.3% of the participants strongly agreed that their assignments were structured to foster “application of learned knowledge”. In contrast, “student recalling of information” received the least agreement among the participants regarding instructional assignment; only 58.6% of the faculty who were surveyed strongly agreed that their assignments were structured to foster the ability of students to recall information.

Figure 4. Percent of participants who agreed on instructional assignment items.

Correlation Analysis

A correlation analysis was conducted between the three areas of designing lectures, assessment structure, and instructional assignment. The correlation analysis was performed by summing the scores for the six items in each of the three sections of the questionnaire and finding the mean total score for each section. Table 3 shows that the mean score for designing lectures was 67.276, followed by a mean score of 65.233 for instructional assignment and 64.688 for assessment structure. An analysis of variance (ANOVA) was performed to determine if there was a significant difference in the total mean scores between the three areas. The result of the ANOVA showed that there was a significant difference in the perceptions of the participants between designing lectures, assessment structure, and instructional assignment (p<0.05). Based on these results, the participants more strongly agreed with the items related to designing lectures for deep learning and critical thinking than with the items related to using assessment questions or assignments for deep learning and critical thinking.

Table 3. The Mean Response of Faculty Members for Questionnaire Sections

Table 4 shows the correlation matrix for the three sections of the questionnaire. The mean responses from the participants to each of the three sections were significantly correlated with each other: the perceptions of the participants regarding designing lectures were related to their perceptions of assessment structure, which in turn were related to their perceptions of instructional assignment. A computational sketch of this section-level analysis is given below.

Table 4. Correlation of Questionnaire Sections

Note : **p<0.01.
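
As a computational sketch of the section-level analysis described above, and under the same hypothetical scoring as earlier, the snippet below computes Pearson correlations between the three section scores and a one-way ANOVA comparing them. The article does not specify the exact ANOVA variant used, so the independent-groups form from SciPy is shown here as an assumption.

```python
# Hypothetical sketch: Pearson correlations between section scores and a
# one-way ANOVA comparing the three sections (independent-groups form assumed).
import pandas as pd
from scipy import stats

def section_analysis(scores: pd.DataFrame):
    """scores columns: designing_lectures, assessment_structure, instructional_assignment."""
    correlations = scores.corr(method="pearson")
    f_stat, p_value = stats.f_oneway(scores["designing_lectures"],
                                     scores["assessment_structure"],
                                     scores["instructional_assignment"])
    return correlations, f_stat, p_value
```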

Differences Based on Demographic Factors

One other analysis was performed: an analysis of variance (ANOVA) of the mean responses to the three areas of the questionnaire in relation to each of the demographic variables. The reason for performing an ANOVA between each of the demographic variables and each of the three areas of the questionnaire was to determine whether there were significant differences in the perceptions of the participants regarding designing lectures, assessment structure, and instructional assignment based on their gender, academic job title, years of experience, the city in which they worked, and the health science college in which they taught.

Table 5 shows the mean values for each of the three areas of the questionnaire in relation to each of the demographic variables, with the p-value of the corresponding ANOVA shown below each demographic variable. For every demographic variable and every area of the questionnaire, there was no significant difference among the participants in their perceptions of designing lectures, assessment structure, and instructional assignment. In other words, there was no significant difference in the application of stimulating critical thinking and deep analysis through designing lectures, assessment structure, or instructional assignment based on gender, job title, years of experience, the city in which the participants worked, or the health science college in which they taught (a computational sketch of this comparison is given below).

Table 5. Analysis of Variance of Mean Questionnaire Section Responses Based on Demographic Variables

Note : All the p-values (bold text) shown in the table are non-significant.
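
For completeness, the sketch below illustrates the kind of between-groups comparison summarized in Table 5: a one-way ANOVA of a section score across the levels of one demographic variable. The column names ("designing_lectures", "job_title") are hypothetical placeholders, not the study's variable names.

```python
# Hypothetical sketch: one-way ANOVA of a section score across demographic groups.
import pandas as pd
from scipy import stats

def anova_by_demographic(df: pd.DataFrame, score_col: str, group_col: str):
    """Return the F statistic and p-value across the levels of group_col."""
    groups = [group[score_col].dropna() for _, group in df.groupby(group_col)]
    return stats.f_oneway(*groups)

# Example: anova_by_demographic(df, "designing_lectures", "job_title")
```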

Discussion

The results of this study raise some questions as to whether the participants engaged in course design, assessment structure, and instructional assignment that stimulated critical thinking and deep analysis on the part of their students. While 84.1% of the participants strongly agreed that they designed lecture items to require students to apply what they had learned, only 76.2% strongly agreed that they designed lectures to allow students to make judgements based on a set of criteria by evaluating evidence. Activities such as requiring students to apply what they have learned and to make judgements by evaluating evidence are important aspects of course design that stimulate critical thinking and deep analysis on the part of students. 9 , 13

While this may seem like a large percentage of participants designing their lectures around such activities, it is also concerning that about one-fifth of the participants did not design their lectures around activities that require students to engage in critical thinking and deep analysis. Meanwhile, 90.0% of the participants strongly agreed that they designed their lectures to ask students questions to ensure their understanding. This would seem to be a traditional action when higher education faculty deliver lectures: an instructor may lecture for a few minutes before stopping to ask students whether they have any questions or understand the material. The argument can be made that asking questions to determine whether students understand course content, even if all students participate in some way, is merely an activity of memorization. If students are only asked to recall something presented in a lecture, without using that information for problem solving or another task that requires combining new information with other knowledge, then critical thinking and deep analysis are not occurring.

In terms of assessment structure, 87.0% of the participants strongly agreed that their assessment structures are created for students to apply what they have learned. In addition, 77.6% of the participants strongly agreed that their assessment structures allowed students to evaluate and make judgments based on a set of criteria or evidence and only 65.6% of participants strongly agreed that their assessment structures allowed students to create and come up with innovative solutions. From these figures, it seems appropriate to conclude that a large percentage of the faculty who took part in this study are not stimulating critical thinking and deep analysis among their students in their assessment structures.

Assessment structures that stimulate critical thinking and deep analysis in students require that students engage in analysis of knowledge and data and create solutions to problems. 18 About one-fourth of the participants in this study did not strongly agree that they create assessments to allow students to provide answers based on their own analysis. Even more troubling is that about 45% of the participants did not strongly agree that they create assessment structures to allow students to come up with innovative solutions. The conclusion that can be made is that the faculty who took part in this study are not fully incorporating activities into their assessment structures to stimulate critical thinking and deep analysis in their students.

Instructional assignments that stimulate critical thinking and deep analysis among students are those that require students to utilize information and knowledge to engage in problem solving and to justify solutions to problems. 23 In terms of instructional assignment, 88.3% of the participants strongly agreed that their instructional assignments were created so that students could apply learned knowledge. However, only 84.1% of the participants strongly agreed that their instructional assignments were designed around students’ ability to evaluate existing information or evidence, and only 73.8% strongly agreed that their instructional assignments were created around students’ ability to come up with innovative solutions.

The one relatively positive finding regarding the participants stimulating critical thinking and deep analysis through instructional assignments concerns creating assignments for students to recall information: only 58.6% of the participants strongly agreed that their instructional assignments were created for student recalling of information. Critical thinking and deep analysis do not occur when students merely have to recall information. 10 However, having nearly 60% of the participants strongly agree that their instructional assignments are created for students to recall information means that more than half of the participants give students assignments that do not stimulate critical thinking and deep analysis.

Overall, the responses provided by the participants show that some faculty are designing lectures, structuring assessments, and creating instructional assignments that are meant to stimulate critical thinking and deep analysis among higher education students. The problem, however, is that many of the activities and processes the participants reported using in their lecture designs, assessment structures, and instructional assignments do not stimulate critical thinking and deep analysis. Based on the data collected for this study, it appears that while the faculty who were surveyed engage in some activities that stimulate critical thinking and deep analysis, changes could be made to incorporate more activities that would further stimulate critical thinking and deep analysis among their students.

The strength of this study is that the participants were asked to respond to items that encompassed course design, assessment, and instruction. By collecting data that encompassed these three areas, it was possible to examine how faculty were stimulating critical thinking and deep analysis in their students across instructional practices. Fewer assumptions had to be made about how the participants may have interpreted survey items in relation to designing their courses as compared to instructional assignments they gave to students or the types of assessments they used.

A limitation of this study is that the participants were drawn from a single university in Saudi Arabia, albeit one with three campuses in three different cities. The findings therefore cannot be generalized to the larger population of higher education faculty in Saudi Arabia or even to a single area of Saudi Arabia. In addition, the sample of participants may not fully represent the health science faculty population from which it was drawn. Even with these limitations, the results of this study provide a basis for further research to understand whether faculty are using activities in their courses that stimulate critical thinking and deep analysis among their students. While a great deal of discussion occurs within academia about the importance of helping students move beyond memorization to critical thinking and deep analysis, it is necessary to understand whether higher education faculty are actually stimulating critical thinking and deep analysis among students. All the discussion about engaging students in critical thinking means little if faculty are not responding to those discussions.

Several recommendations can be made for future research. One recommendation for future research is to replicate this study with faculty at other universities and in other locations. By replicating this study, it would be possible to compare the responses of higher education faculty to determine if other factors may be present in the types of activities that higher education faculty use that stimulate critical thinking and deep analysis among students. Another recommendation is to ask faculty about whether certain activities and practices stimulate critical thinking and deep analysis in students. It is possible that higher education faculty do not fully understand the types of activities that are most likely to stimulate critical thinking in their students.

The purpose of this study was to examine the extent to which faculty members stimulate critical thinking and deep analysis in their students through instructional practice, including lecture design, assessment structure, and assignment instructions. The results showed that the faculty who were surveyed use some activities and processes in their lecture designs, assessment structures, and instructional assignments that stimulate critical thinking and deep analysis among their students. However, the results also showed that the surveyed faculty continue to rely on activities that do not stimulate critical thinking and deep analysis and instead require students only to memorize information gained in the classroom.

The significance of the results of this study is that higher education faculty still have work to do to utilize the types of activities that are likely to stimulate critical thinking and deep analysis among students. While faculty used some activities that encourage critical thinking in their students, there is still a reliance on activities, such as recalling information on instructional assignments, that do not stimulate critical thinking and deep analysis. If the goal for higher education institutions is to have faculty stimulate critical thinking and deep analysis in their students, then more work is needed to help faculty achieve that goal.

Acknowledgments

We would like to extend our gratitude to the faculty members who participated in this study. We would also like to thank King Abdullah International Medical Research Center for the support.

Data Sharing Statement

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Ethics Approval and Consent to Participate

The methods of the study were performed in accordance with the relevant guidelines and regulations. Participation in the study was voluntary, and all participants had the option to withdraw at any stage of the research without giving a reason. Informed consent was obtained from all participants; it included an explanation of the purpose and benefits of the study, and participants were reassured about anonymity. Information that could identify participants was stored securely. Ethics approval was obtained from the Institutional Review Board of King Abdullah International Medical Research Center, National Guard Health Affairs, Riyadh, Saudi Arabia (study number NRC21R/489/11).

The author declares that he has no competing interests.
