Stanford Study: AI Outperforms Law Professors in Contract Law Q&A

What is scarce in legal education is shifting from "providing answers" to "evaluating answers."

In law school lecture halls, the most celebrated stories are often those about Socratic questioning. Legal education has long been regarded as a craft deeply reliant on human experience: finding boundaries in ambiguity, weighing opposing arguments, and cultivating judgment on questions that appear to have no single answer.

But a recent empirical study from Stanford Law School is challenging this picture.

This study is not about having AI take a law school exam or write a complete legal opinion. Instead, it examines a more specific and classroom-relevant scenario: when first-year law students ask questions after a contracts lecture or during office hours, would other professors prefer short AI-generated responses to those written by law professors?

The answer is quite striking.

One: A 75.33% win rate, with professors preferring AI in blind evaluations

The study, titled “Law Professors Prefer AI Over Peer Answers,” was led by Stanford Law School professor Julian Nyarko and his Legal Innovation through Frontier Technology Lab (liftlab). The research team includes scholars from Yale, New York University, the University of Chicago, and other institutions.

The research team invited 16 U.S. contract law professors to design 40 representative questions based on those students commonly raise during office hours in first-year contract law courses. Both human professors and large language models then answered the questions, and the professors evaluated the answers blind, without knowing which source had written each one.

The results show that, across 2,918 blind comparisons, large language model answers had an average win rate of 75.33%. The share of answers rated pedagogically misleading was 12.06% for human instructors and 3.53% for AI.

The impact of these figures does not come from the AI correctly answering a few conceptual questions about legal knowledge. Previous AI evaluations often focused on black-and-white facts: right or wrong. But the most challenging aspect of legal education lies not in memorizing rules, but in interpreting them, applying them, and analyzing arguments between two seemingly plausible positions. This experiment tests whether the AI can meet the subtle yet rigorous professional standards legal experts use to evaluate the quality of reasoning.

Two: A clash in the gray area, where AI wins on clarity, structure, and teaching ability

These questions require respondents to understand specific facts, identify student misconceptions, apply abstract legal rules to new scenarios, and explain them in a way suitable for teaching.

This is precisely the human advantage that legal education has long emphasized: not providing standard answers, but guiding students to develop analytical pathways. That AI outperformed professors in such scenarios is therefore no small matter.

In designing the experiment, the research team deliberately controlled answer length, format, and writing structure so that reviewers would not be biased simply because AI-generated answers were longer, neater, or stylistically closer to “machine-generated text.” During the human blind evaluation phase, the study primarily compared Gemini 2.5 Pro with Google NotebookLM based on relevant casebooks. The paper further extended the evaluation to additional models using the LLM-as-judge method.

The advantage of AI is not just having "more information" or "writing faster." In this specific experiment, its answers better matched what law professors look for in short-answer tutoring: clear structure, coherent reasoning, direct responses to the question, and a consistent teaching tone.

Law professors naturally bring richer experience and judgment to daily teaching, but in a Q&A scenario condensed into a few hundred words, a human's impromptu response is not always optimal. AI, by contrast, excels at breaking a question down into layers and delivering clear, reusable, and even-tempered answers.

Three: Not replacing professors, but shifting their focus of work

Of course, reading this study as "AI can replace law professors" would be an overinterpretation.

The scope of the paper is clearly defined: it evaluates short-answer, office-hours-style student Q&A in a contract law course, not full classroom instruction, thesis supervision, fact-finding, professional ethics judgments, or real-client representation skills.

Strong performance in blind evaluations does not mean AI has acquired all the capabilities required for legal education. It may still hallucinate, exhibit overconfidence, or mislead students in the absence of context. More importantly, the goal of legal education is not merely to help students “arrive at a seemingly good answer,” but to teach them how to question, deconstruct, and reconstruct answers.

This is precisely where professors remain irreplaceable.

But this study also reminds law schools that they can no longer retreat to the comfortable claim that "The law is too complex for AI to judge." At least in some everyday teaching scenarios, AI is already capable of generating explanations that are clear, well-structured, and even preferred by professorial peers.

The key question of the future may no longer be “Can AI answer questions?” but rather “How can law schools integrate AI into their teaching design?” AI can serve as a first-pass explainer for students' pre-class preparation, a supplementary tool for post-class Q&A, and a means of training students to evaluate the strengths and weaknesses of different answers. The most valuable classrooms may shift from “professors delivering answers” to “professors guiding students in judging answers.”

Four: The moat of legal education is shifting from answers to judgment

The most interesting aspect of this study is that it reveals how certain skills previously considered scarce in legal education—such as interpreting rules, analogizing cases, constructing preliminary arguments, and answering classroom questions—are now being revalued, as AI can consistently produce quite competent versions of these in specific contexts.

The professor's value will not disappear, but will be forced upward: from providing answers to designing questions; from explaining rules to training judgment; from correcting errors to helping students identify arguments that “seem correct but are still suspect.”

This may not be a bad thing for legal education. On the contrary, it may force law schools to confront a long-neglected question: If AI can provide clear preliminary explanations, then what truly deserves the time of human instructors in the classroom? The answer may lie in more complex facts, more authentic conflicts, more difficult-to-standardize value judgments, and more rigorous critical training.

AI outperforming professors in contract law Q&A doesn't mean professors have lost their significance—it means the scarcity in legal education is shifting: from "who can recite the answer" to "who can judge whether the answer is good enough."

References

Ashe, S. (2026, June 1). AI outperforms law professors in Stanford law study. Stanford Law School.

Salinas, A., Frieders, C., Guha, N., Ma, S., Sanga, S., Nyarko, J., et al. (2026). Law Professors Prefer AI Over Peer Answers. Stanford Law School / liftlab.