You teach, we’ll grade

Steven Hayward writes (WSJ),

Mr. Benson’s model is spreading in variations at other universities. A few similar programs already existed, such as the James Madison Program at Princeton and the Thomas Jefferson Center for the Study of Core Texts and Ideas at the University of Texas at Austin. Since 2016 I have been a visiting scholar at the University of California, Berkeley—an unimaginable prospect if not for my time at Boulder.

Hayward praises the reign of Bruce Benson as head of the University of Colorado at Boulder.

The question of what to do about higher education is top of mind for many conservatives these days, and it should be. Also from the WSJ, Allen C. Guelzo reviews a book by Richard Vedder.

some of the reforms Mr. Vedder puts forward—converting federal loan programs to vouchers and allowing students to assemble self-tailored programs across a variety of institutions; making a national Collegiate Learning Assessment the real credential for a degree rather than the mix of vacuous classes and inflated grading that now suffices; upping campus facility use to year-round schedules that will permit the completion of a degree program in three years rather than four.

I like the idea of using someone other than the professor to grade students. That was the way the Honors program worked at Swarthmore when I was there. The professor sent the syllabus to an examiner from another college or the world of work, and the examiner made up an exam and graded it.

The student cannot be certain that the examiner will bring the same prejudices as the professor. So students have to think about the material, not just aim to please their own professor. If there is a chance that the outside examiner has a conservative point of view, this might force professors and students to take conservative viewpoints seriously during the course.

Think about how this would change how students rate professors. If the professor is not doing the grading, then the students are not going to reward the professors who are easy graders.

I’m all for outside examiners.

18 thoughts on “You teach, we’ll grade”

  1. That will require students to unlearn much of what they have learned in high school about how schooling works.

    In high school (and middle school), it is important for students to feel that the teacher is “on their side”, that she wants them to pass. In fact, if they don’t feel that way, “classroom management” becomes almost impossible. Lots of students won’t care and won’t try, and some will be actively disruptive.

    One way teachers do that (often without realizing) is to signal what will be on the test. What the teacher emphasizes, what the teacher reviews. (Students often help the process along by asking, “Will this be on the test?”) Most students don’t have the time or the interest to think deeply about the course and to get a real understanding. They want to know enough at the end of the unit to get their target grade–and usually don’t care if they then forget, which they mostly do.

    If you test them on things you haven’t emphasized, or if you require independent thinking on the test, they will feel confused and betrayed. You have changed the rules in the middle of the game.

    To the extent that college is just a continuation of high school–and for the majority of young people it is–the independent grader idea won’t work. Or after a few years, a consensus will develop about what to emphasize and what to test, and you’re pretty much back where you started. As long as there is an “everybody should go to college” mentality, I don’t see this changing.

    (Why does it work at Swarthmore? Because students there are in the 99th percentile of intelligence and academic interest.)

    • I imagine the following dialog. The student did okay but not great in high school and is going to an okay but not great college:

      Student: Will this be on the test?

      Professor: I don’t know. I don’t make up or grade the tests.

      Student: Then how will I know what to study?

      Professor: If you’ve done the readings, come to the lectures, thought about them, and participated in the study sessions, I’m sure you’ll do fine.

      Student: (to himself) As if anyone does all that.

      • If someone takes an SAT prep class and asks the coach, “So, what’s on the SAT?”, would the coach say, “Well, I don’t write the test, so I have no idea”? No!

        • That’s because the SAT has been around for years and years and the College Board releases old tests. That gives SAT prep people a very good idea of what will be on the next test.

          I tried to suggest that after a while, the same thing might happen with “you teach; we’ll grade”: “Or after a few years, a consensus will develop about what to emphasize and what to test, and you’re pretty much back where you started.”

  2. Grading isn’t the problem, and to the extent it’s a problem, it can’t be fixed with that tweak.

    Even if the tweak ‘worked’, it certainly wouldn’t work for long.

    It’s kind of like peer review for exams. Sometimes peer review works, but as we all know too well now, it’s got its own problems, so sometimes it doesn’t. And even if it worked, who cares, if the syllabus is full of pernicious nonsense like a typical Grievance Studies class?

    There’s no escaping the need for good, honest judges (or graders, or whatever), and if the problem is bias among your set of judges that doesn’t statistically average out to neutrality, then using more or different bad judges won’t help. Actually, under a mean biased away from zero, the Law of Large Numbers makes getting an unbiased result less likely, the more judges you use.
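The Law of Large Numbers claim in the comment above can be checked with a small calculation. This is only a sketch under assumptions that are not in the original: each judge's error is modeled as independent and normally distributed around a shared bias, and the function name `prob_panel_near_truth` and all parameter values are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_panel_near_truth(n_judges: int, bias: float, noise_sd: float,
                          tolerance: float) -> float:
    """Probability that the panel's average error is within +/- tolerance.

    Assumption (illustrative only): each judge's error is an independent
    draw from Normal(bias, noise_sd**2), so the average of n_judges errors
    is Normal(bias, noise_sd**2 / n_judges).
    """
    sd_of_mean = noise_sd / math.sqrt(n_judges)
    upper = (tolerance - bias) / sd_of_mean
    lower = (-tolerance - bias) / sd_of_mean
    return normal_cdf(upper) - normal_cdf(lower)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        biased = prob_panel_near_truth(n, bias=1.0, noise_sd=1.0, tolerance=0.5)
        neutral = prob_panel_near_truth(n, bias=0.0, noise_sd=1.0, tolerance=0.5)
        print(f"judges={n:3d}  biased panel: {biased:.4f}  neutral panel: {neutral:.4f}")
```

With these particular numbers, adding judges makes a near-truth average steadily less likely for the biased panel and more likely for the neutral one. If the noise is large relative to the bias, the probability can rise for small panels before it falls, so the claim is best read as a statement about large panels.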

    Actually, it creates a much bigger and more insidious structural problem. I can agree with being charitable in interpretation for some people, in some situations, as if I merely misunderstood an angel.

    But when it comes to strategy, it’s typical best practice to consider the absolute worst and most dangerous thing one’s opponents could do in response to one’s actions, as if they were a devil.

    See, you can’t dam up a river at its mouth without it overflowing its banks. There will just be a new path of least resistance, and the pressure will build and the water will circumvent the obstacle and meet the sea eventually, one way or another.

    When you take some kind of important control away from a single individual and put it in the hands of a necessarily selective set of other individuals, then the next logical move is to try and cooperate with like-minded people to influence the selection process for that set. Which is how academic peer review systems fail when they are corrupted and come under control of a logrolling clique. What was supposed to improve accuracy instead cements inaccuracy and ensures contrary views will never see the light of day.

    Is there really any doubt that progressives would apply the same pressures to the grader selection system the minute it looked like such a system threatened to undermine any area perceived as important enough that it simply must be presented in a completely one-sided manner in the same way by everyone, to bolster the impression of a solid consensus with which disagreement is a sure sign of crankery or worse?

    It’s even worse than that. At this point we are lost in the forest of bureaucratic distributed responsibility where it becomes impossible to ferret out a rat and hold any blameworthy parties accountable, as opposed to what we have with an individualized process. All while the process is perversely perceived as having risen from bronze to gold standard, when it is inferior to what came before.

    Let’s say you are looking at a kind of complicated machine and there is some major engineering problem with its performance you’d like to solve.

    One approach could be optimally efficient tweaking: trying to figure out the easiest, quickest, and cheapest change – almost necessarily a relatively small change – that would have the biggest bang for the buck in terms of performance improvement. This is a kind of “starting where you are” gradual evolutionary approach, which can over time produce really impressive results. Though maybe you just don’t have that kind of time given the pressing nature of your concerns.

    A mere chain of tweaks can also hit diminishing returns, hit a wall, get painted into a corner, or get stuck at a dead-end local maximum with no good way of making any further progress towards adequately addressing the fundamental problem. Perhaps the machine started out with a core approach that turns out to be inherently irredeemable in terms of solving a problem that wasn’t important or anticipated in the original design phase.

    For example, you can’t get a piston-engined, propeller-driven aircraft to break the sound barrier, no matter what you do.

    Another approach would be to imagine how one would redesign an alternative machine or some key component from scratch, now that one has the advantage of the insight of the existence and importance of the Big Problem, to provide a reset and new starting point from which one can then re-initiate the evolutionary tweaking process, the first series of which are likely to produce major improvements quickly.

    From the F-86 jets barely breaking the barrier (70 years ago) to A-12s going Mach 3 only took 15 years, with nothing at all like modern computers to help! Progress has been much more gradual since then.

    For non-progressives, there are all kinds of Big Problems with contemporary higher education, and one could try to rank them in importance. Then we can see whether any particular small tweaks or big redesigns effectively address the top issues. If they don’t, well, it’s a first world problem when there are third world problems to deal with, and attention and effort are probably best placed elsewhere. Noise abatement is not much use when the enemy bombers are bouncing the rubble of what was once your home town.

    So for example, there is the “Case Against Education” problem. And there is the “Progressive Madrasa” problem. And there is the “Ideological Purity Test” problem for admissions and hiring and tenure. And there is the “Logrolling Clique” problem for entire fields.

    My impression is that personalized grading is a second-order issue at best.

    • > Grading isn’t the problem, and to the extent it’s a problem, it can’t be fixed with that tweak.

      Schools tweak their classes and grading methodologies all the time. This thread is talking about bigger changes.

      Why can’t you have one institution measure mathematical ability, and separate organizations offer coaching services? Why not?

      • There are two questions.

        One is about good practices for teaching and learning and testing in general, when ideological bias isn’t a problem.

        The other is what to do about biased professors, content, and grading which tends to implant that bias in students, and specifically whether an alternative grading strategy would have a direct or incidental benefit in reducing that bias.

        I am addressing the second question, and my answer is no. Who cares if I get schooled by a university, coached by BarBri, and tested by the state, if I’m taking the Bar Exam For Grievance Studies or SAT for Social Justice?

        Decision power is conserved – someone is always making the important choices. For example, whether the content will be biased. If you remove that choice from an individual, then the important choice just moves upstream, like which other individuals qualify to be on the board or committee or whatever.

        Our civilisation discovered only one ok answer to this problem, which is structural adversarialism, which is what one gets at an Anglosphere trial, and for obvious reasons how legal education is conducted, with the dissents and minority or historically losing viewpoints getting their guaranteed quota of time for study and fair consideration with every student in the position of judge for whether the arc of history really went the right way or not in any particular instance.

        My proposal has been to expand structural adversarialism and “influence quotas” across society (cf. bipartisan commissions) and especially in education. Have at least two rival professors share the platform and argue their cases in each class, and they only answer to their own camp as regards suitability.

        You could have two tests, with the grade the average of the two scores. If you want an A, you’d better be able to express the perspective of both sides with accuracy. An educational approach to train all college students how to pass Ideological Turing Tests. We could use some of that.

        • > what to do about biased professors, content, and grading which tends to implant that bias in students

          In STEM subjects, there is generally no ideological bias in the classroom. Have you taken a math class, for example? It’s strictly technical content. The administrative staff is the opposite, even in STEM departments. Dean of Engineering is fundamentally an administrative job, not a technical engineering job. Many Deans of Engineering are outrageously vocal on hot button political topics, and are highly political.

          My preferred solution is to give regular people avenues to get job skills and compete for evaluation scores without going through the present university system. Today, people have cheap access to STEM content, textbooks are cheap, online resources are cheap or even free with a laptop, but the university system has a monopoly on giving meaningful evaluations, grades, and credentials.

          Beyond the political bias issue, people should have much more flexibility in how they learn, and our institutions should be much more efficient at encouraging the right learning and skills.

          • Stalin said it’s not the people who vote that count, but the people who count the votes. (We’re looking at you, Cook and Broward counties).

            The two party system gets a bad rap, but one of its advantages is that it lends itself to a solution to the problem of how to pick neutral and honest polling place staffers and vote counters for a recount or close election.

            The answer is, you don’t. You can’t expect people to be neutral, so a better system is one which expects them to be partisan – which is more consistent with human reality – and even benefits from that partisanship as those partisans are strongly incentivized to detect any funny business from their opponents.

            So, each party gets to send their own observers and lawyers, and each ballot gets looked at by a counter from both sides.

            So, imagine having two ideologically adversarial Deans or two CEOs. Nothing is decided or done or said unless both agree – no tie-breaking, kind of like unanimous jury deliberations. The zone of agreement is reduced to the set of organizational strategies and solutions either without salient contemporary ideological controversy or which give ‘both’ sides fair expression, because nothing else gets through.

          • > You can’t expect people to be neutral, so a better system is one which expects them to be partisan

            When I buy goods/services, sure those people aren’t saintly neutral, but I don’t care.

            If I want to buy math coaching, why should I care about the political bias of the coach?

            If I buy fitness coaching or music lessons, I don’t really care about the political leanings of the coach, why should math or STEM or academics be different?

          • @Niko You’re the only one here who’s insisting on limiting the discussion to STEM subjects. The original WSJ article talks about visiting professors of political science and environmental studies and about things called James Madison Program at Princeton and the Thomas Jefferson Center for the Study of Core Texts and Ideas, not about calculus or electrical engineering.

            @Handle: Wasn’t it you who’d observed that these adversarial designs presuppose a certain (actually quite high, if one thinks of it) level of cooperation between adversaries? Stakes have to be sufficiently low to play adversarial games in law, two-party politics, or in games for that matter. Adversaries must abide by common rules, exercise a minimum level of honesty, and be able to agree on an outcome. Chess players are supposed to make only legal moves (there can be no argument about which moves are legal) and are not supposed to poke the other’s eyes out with chessmen or to upset the board (a move Russians call “the Chinese draw”). The higher the stakes, the closer the contest is to war, and the worse adversarial solutions work. It’s arguable that after a certain point they actually exacerbate the problem, accelerating polarization through brinksmanship, doubling down, etc., and in fact I doubt that they can ever avoid having this effect whatever the level of stakes is. The adversaries’ restraint has to come from somewhere else, and it’s the presence of that “something else” that distinguishes adversarial games from war. What little restraint there is on adversarial conduct in (modern) war arises from game-theoretical consideration of phenomena, such as MAD, possessed of immediate impact and saliency, rather than from vague concerns for the public good.

          • @Candide: You make a good point in general, though it needs to be addressed from several angles, and I’m leery about responding this far down a comment thread to a post on a different, specific topic, so I’ll defer to another day or place. I am borrowing the usual term used to describe Anglosphere trial processes, but perhaps I should be more specific in terms of emphasizing the quotas / “guaranteed equal opportunity for influence” aspect, which can be used, for example, to protect one’s side from getting railroaded by the other.

            The question of whether any particular adversarial set-up is more likely to provoke escalation or incentivize stability is indeterminate in general and depends on the details and context.

            The point of the quota approach is precisely to defuse a situation and lower the stakes, both in the perception of the group most worried about getting railroaded, and to disincentivize over-investment in winning the rivalry in terms of the limited gains achievable by any side. (As an aside, certain ‘pragmatic’ arguments for social intervention in resource-reallocation take this form, to include enforced monogamy, i.e., “seize the means of reproduction.”)

            Admittedly, any quota system is bound to come under stress the further the allocation deviates from the true underlying power differential of the parties.

            Furthermore, the need to impose quotas is usually reflective of the breakdown of mechanisms with superior equilibria, due to the absence (or sad loss) of certain essential inputs. It is a lower-trust, fall-back contingency for the collapse of a formerly high-trust arrangement.

            That being said, at least in the realms of law and education, I would draw a distinction between the categories of adversarial games to try and differentiate between those which present ‘cases’ directed at a third party’s judgment (fair commercial competition can be seen to be of this form), and those that are more absolute or objective contests with small or negligible roles for arbitrators or audiences.

            Certainly any conflict can always be escalated to somewhere in the spectrum of physical violence from a shove to mass homicide, and keeping any competition within any particular bounds is like you said a matter of stakes and incentives, some created by the strategic logic of the game itself, but also to include those imposed by third parties or state authorities in line with “political power grows out of the barrel of a gun,” which if deployed wisely will be used precisely to contain certain socially useful and productive games within optimal bounds.

            As an anecdote, the stakes of close-call elections are pretty high, and yet the rival-observers system seems to work pretty well. Behind all that of course is the prospect of escalation to the courts and then, if held in contempt, to the chain of enforcement leading eventually to military imposition.

            The composition of SCOTUS and the rest of the Federal Judiciary itself is very high stakes, leading to all kinds of bad behaviors in nomination processes, and dark-humor memes about the grim reaper looming over the oldest ones.

            But, thankfully, assassinations of judges have been truly and profoundly rare in American history, and almost all of them seem to be a matter of personal grievance as opposed to any organized conspiracy with a view to shaping political composition via death and replacement, or the kind of terror-based intimidation as is common in certain Latin American countries.

            (As a tangential historical note, things got pretty rough for many local judges – such as the case of Judge Bradley in Iowa – during the Great Depression when economic circumstances made them responsible for issuing a large number of foreclosures and evictions. At that time it was much more common for state Governors to declare martial law, usually just in particular counties, and deploy national guard units to impose order. Those units were very effective in quickly restoring peace, which to me means they posed a much more credible threat back then because they were unworried about the consequences of simply shooting unruly civilians. They didn’t have to shoot people because everybody knew they were very willing to shoot people. Things would be different today.)

  3. > upping campus facility use to year-round schedules that will permit the completion of a degree program in three years rather than four.

    Year-round schedules are also required for co-op programs which have been a spectacular success at the University of Waterloo in Canada.

  4. If you completely separate grading and evaluation from teaching, then teaching becomes coaching. Like an SAT test coach or an athletic sport coach.

    The phenomenon of grade inflation exists because A-F letter grades are inherently arbitrary. Some classes give easy A’s, others are much harder. A good evaluation system wouldn’t be so arbitrary. In sports, when we measure a runner’s time in the 500 meter dash, there is no concept of grade inflation or easy graders versus hard graders, because it’s a better objective measurement. It’s both precise and accurate. Can you imagine if instead of recording 500 meter dash times, we just gave each runner this somewhat arbitrary A-F letter grade?

    With the SAT, the institutions for evaluation and for test-prep coaching are entirely separate. Imagine if math was done this way, and some institutions measured people’s math ability, and other institutions offered coaching services.

    If you have good evaluation systems, you can compare people from different schools, countries, or time periods. A-F letter grades are much more arbitrary: you can have easy graders or hard graders, different curriculum choices, etc.

    With good evaluation systems, you could eliminate college admission. Everyone should be able to measure their performance. And everyone should be able to buy as many coaching services as they want and can afford.

  5. I laugh, when those who at the spear are bold
    And vent’rous, if that fail them, shrink and fear…
    -John Milton

  6. Maybe college classes become more like high school AP classes. And college professors who are reluctant to give up control over final grades would weight class participation and projects higher.

    Partially off topic: I was helping some rising college seniors at their summer internship and was surprised at how their college cs programs had left them poorly prepared to actually program. Neither had web development experience, and one had barely programmed because his classes were focused on teaching him agile methodology instead.

  7. Two very worthwhile links.

    Interestingly, according to the Voluntary System of Accountability’s College Portraits web site, the University of Colorado-Boulder participated in the Collegiate Learning Assessment once, in 2014, but did not participate in the value added measurement:

    “At CU-Boulder, senior students who completed the CLA+ Performance Task (n=99) scored higher than 92% of seniors at all other CLA+-participating institutions in Spring 2014.

    At CU-Boulder, senior students who completed the CLA+ Selected Response (n=99) scored higher than 91% of seniors at all other CLA+-participating institutions in Spring 2014.

    As CU-Boulder did not participate in a value-added administration, scores are not adjusted to account for the incoming ability of CU-Boulder students.”

    So the CLA in this case turns out to be less than informative.

    Separating the grading from the teaching would appear to be a step in the right direction, but it won’t reform the curriculum. Nor will it address the ideology problem. Nor will it address the ideological licensure processes in medicine, social work, and education. Nor will it give employers an alternative to using college degrees as an inferior substitute for intelligence tests.

    The CLA+ claims it “measures college students’ performance in analysis and problem solving, scientific and quantitative reasoning, critical reading and evaluation, and critiquing an argument, in addition to writing mechanics and effectiveness.” Rather than rely upon accreditation, the federal and state and local governments should offer such an examination and recognize a passing score as the equivalent of a degree; offer an associate’s or bachelor’s degree by examination if you will. A certificate with specific scores in subareas would accompany the recognition. This would not only offer an alternative to employers in assessing potential employees, it would also provide an avenue for the tens of millions of former students who were fleeced by the higher education system but never graduated.

    A real, objective measure of competence would also be helpful in opening occupations like education that have become hard-left cliques.

    In an excellent Summer 2019 National Affairs piece entitled “Busting the college-industrial complex,” Frederick M. Hess and J. Grant Addison lay out this approach persuasively.

    They identify the problem:

    “This all raises an obvious yet oft-overlooked question: Why are college-degree requirements treated differently from other employment tests?

    The burdens of degree inflation, of course, fall most heavily on those of modest means: low-income and working-class individuals who are less likely to attend college or to complete a degree. Degree requirements summarily disqualify non-credentialed workers with relevant skills and experience from desirable jobs. They impede young workers who could otherwise take entry-level jobs and build the skills and expertise needed to pursue new opportunities. And they hold students and families hostage, forcing them to spend substantial time and money on collecting degrees, regardless of whether students wish to attend college and whether the degree in question actually conveys relevant skills or knowledge. The privileged status of the degree, meanwhile, has insulated colleges from non-degree competition. As the de facto gatekeepers to “good” jobs, colleges have increasingly operated as an employer-sanctioned cartel.”

    And they offer solutions:

    “Broadly speaking, the playing field can be leveled in one of two ways. The first and far more desirable tack would be to deregulate: broaden the ability of employers to use professionally devised employment tests in the same manner they use degrees, without fear of legal liability based almost entirely on disproportionate outcomes. The second, far less desirable approach would be to regulate further: subject college degrees to the same stringent tests that are applied to other kinds of employment tests.”

    Conservatives should step up and do the heavy lifting that will be needed to advance such necessary reforms.

  8. In software testing, there are merits to both separation of testing and coding or combining the two functions. Where I work, we do both. I see merit in outside tests like the GRE and SAT and in teacher-created tests.

    If the coders (analogous to the teacher) write the tests, the tests will test the expected use of the software and will exercise more of the capabilities, but the tests will also embody the assumptions made by the coders about what the function is and how it will be used, and we won’t discover that our users have a completely different idea of how it works than we do until the product has been shipped.

    If the QA organization writes the test, they won’t cover all the function domain — that is, they will often write lots of tests which are essentially similar (e.g., imagine testing an addition function by calling it on pairs from {1..1000, 1..1000}), instead of testing the ‘edges’ of the function (e.g., addition of MAXINT to MAXINT, addition of negative MINREAL to positive MINREAL with a non-representable result, attempts to add character strings or structures or functions).
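The contrast between many near-identical tests and a handful of edge tests can be sketched in code. Everything here is an illustrative assumption, not something from the comment: `add_i32` is a hypothetical function under test that models 32-bit signed addition, the pair range is cut to 1..100 to keep the run short, and only the integer and wrong-type edges are covered (not the floating-point case the comment mentions).

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def add_i32(a, b):
    """Hypothetical function under test: 32-bit signed addition that
    rejects non-integer input and refuses to overflow silently."""
    for value in (a, b):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"expected int, got {type(value).__name__}")
        if not INT32_MIN <= value <= INT32_MAX:
            raise OverflowError("operand outside 32-bit range")
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError("result outside 32-bit range")
    return result


def raises(exc_type, func, *args):
    """Return True if func(*args) raises exc_type, False if it returns."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


def similar_tests():
    """Ten thousand cases that all exercise the same ordinary code path."""
    return all(add_i32(a, b) == a + b
               for a in range(1, 101) for b in range(1, 101))


def edge_tests():
    """A few cases aimed at the boundaries of the input domain."""
    return all([
        raises(OverflowError, add_i32, INT32_MAX, INT32_MAX),  # overflow
        raises(OverflowError, add_i32, INT32_MIN, -1),         # underflow
        add_i32(INT32_MAX, INT32_MIN) == -1,                   # extremes cancel
        add_i32(INT32_MAX, 0) == INT32_MAX,                    # exactly at limit
        raises(TypeError, add_i32, "1", "2"),                  # character strings
        raises(TypeError, add_i32, len, 1),                    # a function
    ])


if __name__ == "__main__":
    print("similar tests pass:", similar_tests())
    print("edge tests pass:", edge_tests())
```

An implementation that dropped the overflow check would still pass every case in `similar_tests` and would be caught only by `edge_tests`, which is the gap the comment describes.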
