Assessment: How do we know what learners know?

Chapter 3 The Lights Are On, Is Anybody Home? Education In America

How Do We Know What Students Know? And Testing: How Dumb Can We Be?

Be patient…with the type of mind that cuts a poor figure in examinations. It may, in the long examination which life sets us, come out in the end in better shape than the glib and ready reproducer, its passions being deeper, its purposes more worthy, its combining power less commonplace, and its total mental output consequently more important.

— William James

Understanding how students learn is a critical piece of the educational puzzle. Just as significant, however, is knowing whether students are in fact learning. And like many other educational issues discussed in this book, we aren’t doing this part very well.

Assessment

In education, we tend to use the term “assessment” to describe the evaluation of learning. It is a catch-all word that is often misapplied and misunderstood, even within educational circles. The primary problem with assessment in education today is that, like the educational process in general, it is frequently not rational. In other words, the tools we use to measure “xyz” either don’t measure “xyz” at all, or they measure one manifestation of “xyz,” but we draw other conclusions far beyond the scope of the assessment. A conceptual analogy might be using a thermometer to determine the likelihood of snow. A thermometer cannot, in fact, tell us if it will snow. At best, it can only tell us if one required factor, temperature, is favorable for snow. It will tell us nothing about humidity, barometric pressure, or other criteria necessary for snowfall. In education, we frequently use a thermometer to predict rain and snow.

Standardized Tests

The most widely applied and politicized examples of this are “standardized tests” such as those used to measure achievement and intelligence. A common achievement test is the Iowa Test of Basic Skills, given to a zillion kids a year. A common example of an intelligence test is the Stanford-Binet.

The basic problem is not that these tests are inherently flawed (or evil as some of my colleagues believe), but that they are consistently misunderstood and misapplied. First, they are “standardized,” also known as “norm-referenced.” This means that when a person takes the test, he or she can be compared to other people who have taken the same test, theoretically under the same conditions. These tests can tell us roughly what a person knows based upon an isolated set of questions, e.g., at what temperature does water freeze, what is the proper punctuation at the end of an interrogative sentence, etc. However, even this basic concept assumes that the test taker understands the questions and is familiar with the format of the test. This is a BIG assumption. Standardized tests assume a certain level of what E.D. Hirsch (1987) calls “cultural literacy” on the part of test takers. For example, if a test has a word problem designed to assess the test taker’s knowledge of fractions, but uses American currency as the numerical basis of the problem, the test taker must have solid knowledge of American currency in addition to mathematical skill with fractions.

The degree to which such tests rely on cultural literacy cannot be overstated. Even this wouldn’t be as significant a problem as it is if we used standardized tests properly. For example, if we used them to compare a student’s understanding of basic information to that of the larger group (or norming sample), then adapted that student’s learning experience to build the skills that he or she seemed to be weak in, then there would be value in the test. This, of course, also assumes that the skills and information tested are important to have in the first place—another big assumption.

One of the worst misuses or abuses of standardized tests, however, is the attempt to measure the quality of teachers and schools based upon student test scores. Doing so is a blatantly invalid use of standardized tests for at least three reasons (Popham, 1999)*. First, although there is some general similarity in curriculum from state to state, there are often differences in what is taught between school districts and huge differences between individual classrooms. Therefore, what national standardized tests evaluate, by definition, is not closely correlated with what individual teachers teach. Second, because the folks who construct standardized tests want a broad distribution of test scores (a bell curve), they purposely throw out questions that too many students answer correctly. As a result, the things that teachers teach most effectively are removed from the tests! Finally, standardized test scores are strongly influenced by at least two factors besides what students learn in school: innate intelligence and “out of school” learning. In other words, two out of the three major influences on test scores are beyond the control of teachers and schools! The combination of the three factors stated above clearly shows that using standardized test scores to measure the quality of schools or the teachers in them is at best invalid and at worst negligent.

*The article cited here by James Popham, in the March, 1999 issue of Educational Leadership, is an excellent and eloquent explanation of why standardized tests are not valid instruments for assessing educational quality. I highly recommend reading the entire article.

As an aside, there is a statistical correlation between achievement tests and IQ tests. In other words, most people tend to score similarly on both kinds of tests. In some cases, there is also a correlation between test scores and success in school. However, this does not mean that people who do better on standardized tests are smarter and therefore more likely to succeed in school. What it means is that the kinds of skills, knowledge, and experiences necessary to do well on standardized tests are similar to the kinds of skills, knowledge, and experiences that have traditionally fostered success in school. As Howard Gardner (1983) says in his seminal book about multiple intelligences, Frames of Mind, “IQ tests have predictive power for success in schooling, but little predictive power outside the schooling process” (p. 16). This is partly because the same traditional assumptions about intelligence testing have found their way into assumptions about intelligence in students. As such, the schooling process validates students with traditional skills and knowledge and penalizes children without them. Gardner adds, “The IQ movement is blindly empirical…There is no view of process, of how one goes about solving a problem” (pp. 17, 18). The same can be said of standardized tests in general.

“High Stakes” Testing

Unfortunately, in addition to misusing standardized tests to measure educational quality, at least two other applications of such tests frequently happen that are bad for students and bad for the schools. The first is that these tests are used to include students in, or exclude them from, a myriad of educational offerings from Special Education to Honors English to college admissions. This is becoming more common and is often referred to as “high stakes” testing. In other words, students who do well on such tests are often provided what would be considered desirable learning opportunities such as honors classes, enriched curriculum, promotion to higher grade levels, admission to exclusive colleges, and others because the high test scores “qualify” students for these opportunities. Similarly, students who do poorly are not only often denied the “good stuff,” but are “tracked” into lower level classes and curricula such as special education, “resource” programs, remedial classes, etc. This would potentially be harmless if these lower level programs actually helped students attain the skills that standardized tests say they lack. Generally, they don’t. And unfortunately, once tracked, students rarely get out (Gay, 1993). This reality tends to disproportionately favor middle and upper class, white students, while disfavoring lower income students of color. This dynamic is a cultural phenomenon. Advantaged students are acculturated in many ways that make them “better matches” for norm-referenced tests. In other words, many of the values and experiences that advantaged students have are precisely the same values and experiences that the tests are based on. The erroneous assumption is that the standardized test is a measure of a student’s intellectual abilities. It is not. It simply measures a person’s ability to answer a discrete, isolated set of questions in the context and format generated by the test.

The second, and much more political, application of norm-referenced tests has to do with the comparisons and competition that arise between schools, districts, states, public vs. private, etc. This is a built-in feature of such tests. Because they are “standardized,” the results of one individual or group can be compared against other individuals or groups who’ve taken the test. The fundamental problem with such comparisons, despite the fact that they are invalid for comparing the efficacy of different schools, is that they do nothing to improve the educational opportunities of the students who score lower than others! As cynical as it sounds, the fact is that improving student learning isn’t the objective of standardized tests, and usually is far from the minds of those who require students to take them. And the manifestations of such competition can be insidious. In the last few years there has been a constant supply of news stories about teachers, principals and superintendents who’ve been caught doing everything from changing answer sheets to stealing copies of tests in an effort to improve their school’s or district’s scores. A more common scenario, discussed earlier in this chapter, is simply “teaching to the test,” or even to specific test problems, which robs children of time that should be spent learning the curriculum. Such testing is often seen as “high stakes” for educators as well as for children. Not only are “poor performing” students often penalized, but educators can lose promotions or pay increases, or even their jobs, and school districts can see funding and other resources cut as a result of “poor” test scores (Armstrong, 1991; Kantrowitz & McGinn, 2000). A Newsweek exposé on cheating by school officials describes the dilemma succinctly: “The problem is that high scores—not high standards—have become the holy grail” (Kantrowitz & McGinn, 2000, p. 48).
In effect, a political effort to manufacture accountability for school performance has resulted in the pervasive use of standardized tests in ways for which they were never designed nor intended.

One might legitimately argue that poor performing educators and schools should be “penalized,” but, remember, standardized test scores are totally invalid for measuring the quality of education in schools! So, if we want to hold educators accountable, fine, but we can’t use standardized test scores to do it.

At the classroom level, teachers are often coerced into “teaching to the test” so that the standardized test scores of their students will rise. Again, this wouldn’t be a problem if there were a solid relationship between the curriculum that most teachers are asked to follow and the tests they are asked to teach to, but this isn’t even possible without a nationally controlled curriculum—an idea that is fraught with its own nasty problems. If such a relationship existed, everyone would win with higher test scores and “teaching to the test” would be a natural, rational policy. Because of the insidious results of the competitive use of standardized tests, however, teachers are often required to devote precious instructional time to material (the test) that doesn’t match what they’re supposed to be teaching (the curriculum). And this has nothing to do with student learning. It is a way of responding to political pressure to increase test scores.

In terms of comparing students to each other via test scores, Eric Jensen, a leader in brain-based education theory, doesn’t pull any punches when he succinctly says, “Comparing one learner to another is one of the most irrelevant and damaging assessment strategies ever devised” (Jensen, 1996, p. 280). Whether one agrees with such an extreme position or not, the bottom line is that standardized tests are consistently misused and misunderstood, and young learners often bear the brunt of this practice.

Unfortunately, as irksome as “high stakes” standardized testing can be for teachers, it can be debilitating for students. There are several reasons for this. First, for the youngest test takers, these tests are a confusing and utterly foreign intrusion into the classroom. Many children in the early elementary grades do not have the fine motor or spatial skills to fill in the bubbles on answer sheets, nor do many of them understand the relationship between the question and possible answers in the test booklet and the corresponding bubbles on an answer sheet, even if it is located in the same booklet on the same page.

Moreover, neither the format of the tests, nor, in many cases the content, has anything whatsoever in common with a “normal” day in class. Anyone who has observed students in kindergarten through second or third grade in the midst of a standardized achievement test knows that even if all the challenges I’ve just described didn’t cause problems for the students, the test results are still worthless for many children. These are little kids who, very shortly after a testing session begins, are spending far more time staring out the window, looking at their friend’s test, doodling on their desks, making designs with the bubbles on the answer sheet, etc. than they are concentrating on the test. A parent commenting at a recent community forum on educational issues at which I was a panelist confided that when he was in elementary school, he used the bubble sheets for “bead work!”

For many small children, these testing sessions are confusing, frustrating, and demeaning. For these reasons, and others, organizations such as the National Commission on Testing and Public Policy, the National Association for the Education of Young Children, and the Association for Childhood Education International have taken a strong position against any such testing of young children (Association for Childhood Education International & Perrone, 1991). I would take such a position a step further and suggest that any standardized testing for children before third grade is unethical. Not only does it create a hostile, disconcerting environment for such students, but there is no defensible rationale for putting kids through this stress because the test results aren’t appropriately used anyway. Even if they were, the nature of such instruments isolates the child from any meaningful context for the assessment.

With older students, standardized tests can also be debilitating because the very nature of these instruments excludes varied or creative ways of establishing competence. Moreover, on such tests, how fast a student can answer a question is as important or more important than his or her knowledge. As a result, otherwise intelligent and creative students find themselves “incompetent” to do well on such tests and have weak test scores to “prove” their inability. Older students go into these tests knowing that their validity as intelligent students is on the line. They know that placement decisions are often made on the basis of their scores. They know the scores go in their student files and that their “rank” among students is based on their performance on these tests. All of these factors combine to create incredible stress and anxiety for many students. As a result, their performance is often further compromised.

One might argue that, “Well, if it’s the same for all students, then it should all come out in the wash.” There are a couple of problems with this argument. The first is that it isn’t the same for all students. Some students, primarily white, economically stable, and from educated homes, tend to do better on standardized tests, because, as I stated previously, they are acculturated toward such instruments. As a result, students who are already “advantaged” get their status cemented by their test scores, and the gap between educational haves and have-nots widens. Secondly, students who do not have highly formalized skills in language and mathematics, but who may have exceptional intellectual capabilities nonetheless, simply cannot manifest their abilities meaningfully on typical standardized tests. As such, their lower scores are interpreted as lack of intelligence or ability relative to the norm, when in fact, what the test has measured is the students’ ability (or inability) to negotiate the test. As an example, imagine if you were required to cook what you believe is your best meal. However, you have to cook it in someone else’s kitchen, with their utensils, spices, etc. You prefer fresh pressed garlic, they have powdered garlic salt; you normally use a glass baking dish, they have cast iron. Not only that, but while you usually take about an hour and a half to prepare and cook your favorite dish, in this case you must do it in 45 minutes. Rules are rules. How would it feel to have someone judge your cooking under these conditions? This is analogous to what students must do on standardized tests.

I’m not suggesting that there aren’t, in fact, distinctions in ability between students. Nor am I suggesting that there are no appropriate uses for standardized tests. What I am saying is that relying on standardized testing to tell us what students know and can do because it is fast and cheap (compared to more authentic assessment that is discussed below) is a mistake that ill serves everyone—students, educators, parents, politicians, etc. It is a mistake because, like the analogy used early in this chapter, it creates the illusion that a thermometer can predict snow.

Standardized tests make sense if we acknowledge several things. One, they do not determine intelligence or ability beyond the limited contexts of the tests. Two, they provide only limited information about what someone knows or doesn’t know relative to the isolated questions on the test. Three, for the test to be valid on any level, the test taker must understand the questions and the format of the test. Four, they can only be used as one small piece of the puzzle in educational decision-making. Five, unless we actually use the testing appropriately, we shouldn’t do it at all. In other words, standardized tests should be used sparingly, if at all. Testing shouldn’t drive curriculum (unless the tests actually measure what we want students to learn), and testing should never be a political weapon.

Other Kinds of Assessment

Ironically, high stakes political issues aside, the trouble with standardized testing described above is actually of less importance when trying to determine what kids know than other assessment problems, because it is usually a seasonal phenomenon that directly impacts classrooms for a couple of weeks a year. Of much greater concern on a day-to-day basis is the “criterion-referenced” assessment that dominates most classrooms.

Criterion-Referenced Tests

Criterion-referenced evaluation is a model for evaluating students that rates them based on how many questions they can answer or problems they can solve out of a total number of items on an assignment or test. Unlike norm-referenced or standardized tests, criterion-referenced tests compare students to themselves, usually with a percentage of “correct” answers as a benchmark, i.e., 90% and up is an “A,” 80% to 89% is a “B,” etc. So what’s the problem?

To begin with, this system usually has a minimally acceptable passing rate of 60%, which is a “D.” What this means is that a student can have barely more than 50% mastery and still “pass.” There is no provision for improving the level of mastery, because, technically, everything above 60% passes. Students who miss over a third of what they are supposed to learn are simply carried along, doing very poor but “passing” work, until the cumulative deficiencies become too great and many fail outright. The second significant problem is that it doesn’t recognize the improvement an individual might achieve relative to him or herself. For example, if a student is flunking a class for the first eight weeks of a term with a 20% average, then improves in the second eight weeks to a 50% average, that student has raised his or her performance to two and a half times its starting point and still fails the class. A third problem is that it doesn’t take long before many students are far more concerned about their grade than what they learned. Most of us are familiar with this phenomenon. I’ve seen second graders and graduate students who are so concerned about what grade they are going to get that the fact that the point was to learn something becomes completely lost. This obsession with grades can also produce quite a bit of anxiety, and that kind of stress doesn’t help most of us learn.
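For readers who like to see the arithmetic spelled out, the conventional grading scale and the improvement example above can be sketched in a few lines of code. This is only an illustration of the scale described in this chapter; the cutoffs are the customary ones, and the function name is mine, not part of any official grading standard.

```python
def letter_grade(percent):
    """Map a percentage score to a letter grade using the conventional
    criterion-referenced scale (90+ = A, 80-89 = B, 70-79 = C, 60-69 = D)."""
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"  # the minimally "passing" mark
    return "F"

# A student who raises a 20% average to a 50% average has multiplied
# his or her own performance two and a half times over...
ratio = 50 / 20             # 2.5
# ...yet the criterion-referenced scale still records an outright failure.
grade = letter_grade(50)    # "F"
print(ratio, grade)
```

The point of the sketch is the mismatch it exposes: the scale has no way to register dramatic individual growth that happens below the 60% line.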

Another, more “systemic” problem with criterion-referenced assessment is that the criteria are almost exclusively generated by teachers or publishers (in the form of chapter and unit tests, worksheets, etc.), which, of course, leaves students out of the loop. This is a problem because students cannot learn to critically analyze information; cannot learn to distinguish significance from trivia if they don’t participate in the process of determining what questions should drive inquiry and assessment, and how meaningful questions differ from inconsequential ones.

Developmental and Authentic Assessment

So what kinds of assessment make more sense? Within education, two widely used labels are “developmental” and “authentic” assessment. They are not widely practiced, however, because they take more time and effort than standardized or criterion-referenced assessment. Moreover, many teachers working today simply don’t know how to use such evaluation techniques.

At a fundamental level, both developmental and authentic assessment are challenging because they require teachers to be cognizant of where students are and what they need in the learning context. They both require that teachers have a broader understanding and knowledge of students than is typically required with conventional assessment methods. They also require teachers to be much more aware of how the lessons they teach are related to assessment and vice versa before they actually teach them. This is because in both types of assessment, the learning process itself is integral to evaluation.

In the case of developmental assessment, the teacher must evaluate students individually or “developmentally.” What this means is that students are not compared to a norm (standardized test) nor to the same criteria that everyone else is. Instead, the teacher must be aware of where each student’s learning is at any given time. The teacher, student, and often parents agree on goals for learning, then the student is evaluated based upon his or her performance relative to those goals. In this way, students of varying abilities can be assessed relative to their own development. This is, of course, much more meaningful. Developmental assessment is not new. It is common in kindergarten and has been the basis of individual education plans or “IEPs” in special education for years.

One typical mechanism for developmental assessment is the use of portfolios. Very briefly, portfolios are collections of student work over time that demonstrate the developmental improvement of individual students as their own work progresses. For example, a first or second grader would have multiple entries in the category of literacy or writing. One might see that in September, the child was writing simple, two or three sentence responses in a journal. The sentences may reflect a problem with syntax and limited creativity. Entries in December may show eight or ten sentence responses with some contractions and more creativity of thought. By May, the student might be writing an entire page for each journal entry with more complex grammatical structures and improved fluidity of thought. Such a portfolio would show graphic, developmental improvement in that child’s writing over a school year. Such information is far more valuable than spelling test scores or letter grades to teachers, students, and parents.

As anyone who has any experience in the classroom can imagine, evaluating students developmentally, and thus individually, is not easy in large classrooms. But, huge class sizes are at the heart of many problems in our schools today. If real education reform is to be successful, efforts must be integrated and holistic, i.e., they must involve curriculum, assessment, class size, governance, teacher training, funding, etc. All must be addressed together. All of these issues are discussed in subsequent chapters.

Authentic assessment is a broad-based term which essentially means evaluation of realistic, meaningful tasks in the contexts in which they occur. Developmental assessment can be authentic. Standardized tests, however, are not “authentic” because they actually isolate the test taker from the skill or knowledge being assessed. Conversely, performance-based tasks, in which a certain product or end result is indicative of successful learning, are “authentic” because the student cannot achieve the desired product or performance without the skills and information necessary to do so.

Most professionals experience relatively authentic assessment on the job. For example, a sales manager might be assessed on a sales quota, customer satisfaction or other criteria that are realities of the job. It would be ridiculous to base such a person’s job performance on his or her ability to answer esoteric questions about his or her sales techniques. The proof is in the pudding of sales. In an educational context, it makes sense to employ equally authentic evaluation strategies. In a real-life example, a group of at-risk high school students and their teacher, Bruce Kaiper, from Northern New Mexico, recently traveled to India to compete in an international robotics competition. These are kids who’ve been identified as “flunkies,” as real problem kids. They took second place in the world and won 14 classifications outright—more than any other team. Their work was the manifestation of their skills. There were certain criteria for effective robotics and the students were assessed on the products they created. This is authentic. A more typical classroom example might be something like a social studies lesson in which, rather than students simply reading about the separation of powers in American government and taking a test on the material (typical assessment), the students might read, research, and discuss the separation of powers in the context of an actual political issue such as the development of educational policy. By the way, this kind of lesson can happen at both the elementary and secondary levels, but the way the lesson is taught and evaluated would vary depending on the developmental level of the students. In this example, after carefully exploring what the separation of powers is, students might create a mock process involving a real-life issue in their own school such as whether to have open or closed campus. A “law” concerning the issue could be proposed (legislative bill) and acted on by all three branches of government in the mock process.
In such a scenario, there is ample opportunity for “authentic” assessment, and it can be a combination of more traditional methods such as essays, and more innovative performance based evaluation. For example, the instructor could have students define the separation of powers, either in writing or in small group presentations. The small group presentations allow for dialogue and clarification at the time the students are presenting. It facilitates far more meaningful assessment because both students and the instructor can address questions in detail as they arise. And for those of you who may be uncomfortable with such “open ended” assessment, keep in mind that it is always the instructor’s prerogative to structure student presentations such that they address points that the instructor feels are critical. In the present example, the instructor can require that as part of the presentation, students specifically address any issues related to separation of powers that the instructor feels are important.

As the lesson unfolds further, more opportunities for authentic assessment avail themselves. For example, as students take the open campus bill through the legislative process, then the executive and judicial branches of government, the teacher can evaluate the students’ understanding of each branch of government as they enact the mock process. The teacher can clearly see to what extent students correctly understand the process and can intervene as mistakes or questions arise. And, the teacher can devise evaluation criteria as generally or specifically as he or she chooses.

It is important to note two things at this point. One is that authentic assessment changes the nature of assessment itself, and two, that such evaluation techniques cannot be separated from the teaching techniques that go with them. Authentic assessment changes the nature of evaluation because the focus shifts from what a student can recapitulate to what a student has learned. There are practical implications for this. For example, quantitative grades become less significant. Letter grades and GPA become secondary to the bigger issue of what students are learning. In our example above, the primary issue becomes whether or not students understand the separation of powers well enough to effectively implement a mock governmental process, not what percentage they score on the test.

Admittedly, this is a stretch for some of us. And when evaluations are narrative instead of quantitative, it occasionally impacts other systemic realities such as transfers between schools and college admissions. “The system” likes numbers and letter grades because they are clean, and in theory, allow comparisons between students. Having said that, colleges and universities, in particular, are beginning to give far more credence to such authentic assessment in their applicants because they are finding that some of their best prospects come from innovative high schools that don’t use grade point averages, or have been home schooled. For the traditional thinkers in our midst, however, there are ways around the qualitative-quantitative conundrum such as rubrics, which will be discussed briefly later in this chapter.

The second significant implication of authentic assessment is the degree to which it requires close connection to instruction. With traditional kinds of assessment, cookie cutter assignments and tests accompany cookie cutter curricula. Because these kinds of evaluation tend to assess recall of facts and figures, they do not need to be as carefully integrated into instruction. With authentic assessment, however, teaching and evaluation are integrated. In the American government example above, the activities that make up the lesson have assessment built into them. As the students present their findings, the teacher evaluates their learning as part of their presentations. In a traditional model, the assessment would be a separate activity that is isolated from instruction and learning such as reading followed by a test on the reading.

Although assessment could be a book of its own (and is for some authors), the constraints of this chapter don’t allow for a more in-depth discussion of how developmental and authentic assessment are applied on a daily basis in the classroom. I would like to address one other issue of importance, however. Regardless of the specific modality of assessment, e.g., written, verbal, performance, etc., teachers must provide clear expectations of what is being assessed and how. Amazingly, teachers rarely communicate to students what criteria comprise the basis of the grades students receive from assignment to assignment. For example, most of us have had papers returned from instructors with a grade on them that seems totally arbitrary, because “we didn’t know what the teacher wanted.” There might even have been detailed comments about what was missing or what was wrong, but these comments come from a secret, unwritten set of expectations that only the teacher knows. What happens, frankly, is that teachers typically give guidelines for assignments, but do not provide a corresponding guideline for how they are evaluated.

One way this can be fixed is through rubrics. Rubrics are frameworks for assessment that contain two components: 1) a list of criteria, or what has to be done, and 2) gradations of quality (Andrade, 2000). See figure 3.1 as an example.5 Rubrics, then, let students know what is expected of them and how it will be evaluated. They can be qualitative or quantitative. That is, the teacher or peers can respond to each point on the rubric with a narrative about perceived strengths and weaknesses in student work, or can assign a numerical value to each criterion, with the values summing to a total score. In figure 3.1 there are multiple criteria, each worth between 1 and 5 points. Notice that this rubric is also qualitative, which provides the learner specific feedback. More importantly, students know exactly what is being evaluated. Rubrics can be individualized for students with disparate needs, and they can evolve as student abilities evolve. Additionally, as mentioned above, instructors can quantify rubrics to provide numerical grades if they are required.

In general, developmental and authentic assessment are more meaningful ways of determining what students know and what they can do. They are also more time-consuming and require greater knowledge of students on the part of teachers. However, as the diversity of student populations increases, the traditional cookie-cutter evaluation techniques referred to earlier will become less effective for an increasing percentage of students. If we hope to successfully evaluate student learning under the circumstances we now have and will have in American schools, we really don't have a choice but to work with children as individuals.

A Final Note on Assessment

I mentioned at the beginning of this chapter that a primary problem with typical assessment in schools today is that we aren't measuring what we say we are measuring. Although, for ease of discussion, I have repeatedly referred to the assessment of "learning," a secondary, but fundamental, problem with contemporary assessment is that it is based on the flawed assumption that we can, in fact, truly or fully assess what people have learned (Gardner, 1993; Jensen, 1996). To some extent we can evaluate whether or not someone has memorized information; likewise, we can evaluate an individual's ability to manipulate symbols within a prescribed framework. For example, a teacher can ask a student to read a chapter in a book about the American Revolution, then have the student answer a set of questions that demonstrate recall of information, or even the ability to make an abstract connection. Or a teacher can present an algebraic formula, then ask a student to solve a related algebraic equation. If the student has "learned" the formula, he or she should be able to solve the problem. Generally, that is how we assess learning. A student answers the questions or doesn't; solves the problems or doesn't. In this typical model, the number of questions answered or problems solved "correctly" reflects the level of learning; in some cases there is a slightly more sophisticated interpretation of student work, but the premise is fundamentally the same. In either case, for the most part, we are not assessing learning; we are assessing task completion. Even with developmental and authentic assessment, which provide a much more meaningful and contextualized picture of student progress, we have only scratched the surface in terms of determining what students are truly learning.

[Figure 3.1: Portfolio rubric. This rubric was designed by Dr. Jackie Gerstein and is used in a graduate course at the College of Santa Fe.]

In the example above, one might argue that we have also assessed the ability to think linearly, the ability to follow directions, and the capacity to negotiate the overt and hidden requirements of the learning environment. We may even be able to infer an improved understanding of a given subject matter. But we don't have a clue what the student truly knows or thinks or feels about the American Revolution or the particular algebraic equation, or about a thousand other things that the lessons may be related to in this learner's mind. We don't know what meaningful connections might have been made that can't possibly be reflected in the assessment. Likewise, we don't know what connections weren't made. We don't know what metacognitive (thinking about thinking) tools or processes the student used, and we don't know how else the student can apply what has been "learned."

We don’t know these things for two primary reasons. First, learning is an intensely personal and unique phenomenon. It is biologically and psychologically different from person to person and from setting to setting (Caine & Caine, 1994; Gardner, 1983; Jensen, 1996). Although we can “observe” learning in limited ways in terms of brain function (see chapter 2), we cannot, at the time it is happening, determine how a given learning experience will affect or change a given learner in terms of belief systems, self-concept, long-term problem solving, or even spirituality. And this leads to the second reason we don’t truly know what students (or any of us) are learning at any given time. Simply put, learning takes time. It is longitudinal. As Jensen explains it, “Biologically, the best, most valuable and deepest learning does not produce any tangible results for a considerable time” (1996, p. 115).

The connections that a learning experience creates in an individual may not have application for hours or days or years. One learning experience may interact with another (from the past, or in the future) in ways that are unimaginable. And ultimately, we simply don’t know what we know or what we can do until a relevant challenge arises that requires us to apply our knowledge and skills.

Having stated, rather unequivocally, that we can’t fully assess learning, let me suggest that we must, nonetheless, strive to assess the learning process. We can evaluate longitudinal progress, and we can, in limited ways, assess what knowledge students possess relative to specific contexts. And, if we work with students to create relevant, reasonable objectives for the learning process, while working closely with them to evaluate their progress toward those goals, then assessment, despite its limitations, will serve both teachers and students well. We can then at least infer that learning is taking place.