By Michael Borella --
Years ago, I was a proud parent when my children were invited to participate in an honors math program at their grade school. But this initial delight turned to confusion, and eventually frustration.
As just one example of why I was less than pleased with our school's pedagogy, one very highly emphasized part of the curriculum required that the kids memorize as many digits of pi as they could, with the minimum being 25. Sure, they also learned how pi defined the ratio of a circle's circumference to its diameter and how to use it in simple algebra, but this memorization task was the focus of the unit, with the child who memorized the most digits (130 one year) winning special accolades.
To me, this assignment missed the point. Pi is a critical value in many aspects of science and engineering, and can be taught directly or indirectly in a number of compelling and fun ways involving wheels, pizza, spirographs, and so on. And its importance in aviation and communications can at least be mentioned.
But the focus was on committing those 25-plus digits to memory and being able to recite them on demand. When I pointed out to the teachers that maybe -- just maybe -- this was not the best way to prepare children to have an appreciation for STEM fields, they looked at me like I was from another planet. The curriculum was designed around what was easy to test (can the kid produce the 25 digits when asked?) rather than the harder-to-evaluate skills (does the kid know how and when to use pi to solve problems?) that are actually important when using math in the real world.[1]
Thus, when the news broke that OpenAI's GPT-4 large language model passed the Uniform Bar Exam at the 90th percentile, I was less than impressed. In fact, this outcome is completely unremarkable given that the model was trained on billions of units of human text.
The bar exam is a memorization exam. Aspiring lawyers typically spend 10-12 weeks taking a bar exam review course, which involves committing massive amounts of legal rules and principles to memory, as well as learning how to write essays in a formulaic fashion (IRAC: Issue, Rule, Application, Conclusion). Then you sit for two days of testing in which you regurgitate as much as you can. If you manage to score highly enough, you pass and become a licensed attorney.
During the summer that I spent preparing, I remember at one point mentioning in frustration to my study partner that what the bar exam is actually testing is how much pain one is willing to accept to be a lawyer, and that rapping us across the knuckles a few times with a ruler would probably have the same effect. Indeed, I know of individuals who graduated law school in the top ten percent of their class (in terms of GPA), failed the exam on their first try, later passed, and went on to be excellent attorneys. Clearly, these folks were bright, but when I spoke with them, they attributed their failure (which was quite the source of shame) to not studying hard enough during bar review. Let that sink in -- top law students can fail to be licensed because they do not learn the mechanical proclivities of one specific exam.
A recent paper from Professor Daniel Katz evaluates GPT-4's bar exam performance and states that "These findings document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society."[2] The key word in this sentence is "potential," but even so, this statement is misleading.
GPT-4 scoring well on the bar exam is not because AI is achieving human levels of intelligence. It is because the bar exam tests a human's ability to perform like a robot. Missing from the bar exam are tests of executive function (e.g., staying organized, keeping to deadlines), soft skills (e.g., client interaction and counseling, interpersonal competencies), and law firm operation (e.g., finance, marketing, managing groups, how to be a good employer), all of which are more relevant to a lawyer's success than their ability to stuff facts into their brains.
Indeed, it is now widely accepted that GPA is much more predictive of a student's ultimate success than standardized test scores. This is because maintaining a high GPA requires more than the raw cognitive ability to do well on memorization-based exams -- the aforementioned executive functioning and soft skills play a significant role. Intellectual ability is important, but so is emotional intelligence.
Turning to patent law, there might be one multiple-choice question out of 200 addressing intellectual property on the typical year's bar exam. So for us patent attorneys, the bar exam is measuring our ability to regurgitate law that we are unlikely to ever apply in practice. To that point, the USPTO requires that we pass a separate patent bar exam. Admittedly, it is also memorization-based, but at least it is open book.
So, to the extent that Professor Katz is implying that GPT-4 or any other current-generation large language model can perform significant legal tasks, I have to disagree. Large language models are tools that lawyers can employ, not unlike search engines or Wikipedia. They may be able to carry out certain first-level research functions in place of a junior associate. But when it comes to crafting creative legal strategies that guide clients through complex transactions, they are still far from the mark.
Nonetheless, the strong performance of GPT-4 on memorization-based exams provides us with a golden opportunity to re-evaluate how we teach both children and law students. If the goal is to turn out humans with skills that can be easily replaced by automation, then maintaining the status quo will get us there. But we would be much better off recognizing and embracing large language models, while remaining cognizant of their strengths and weaknesses. Integrating these tools into a broad-spectrum education system with a flexible curriculum is much more likely to produce graduates who can adapt to the changing needs of the legal profession, or any other field for that matter.
The modern education system is still based too much on a paradigm established in the 1800s, one in which an instructor lectures and the students passively receive their lessons. Given that large language models can outperform most humans in these scenarios, we need to seriously consider changing the system to meet the demands of 21st century life.
And for anyone who absolutely needs to know the first 25 digits of pi, don't worry because GPT has you covered: "The first 25 digits of pi (π) are: 3.14159265358979323846264. Note that pi is an irrational number, meaning that its decimal representation goes on infinitely without repeating." Or, it almost has you covered, as its output contains only 24 digits; the 25th digit (a 3) is missing.
[1] To get a sense of how prevalent issues like this are in education, 60 years ago Nobel laureate physicist Richard Feynman was asked to help the state of California select math textbooks for its schools. He wrote about the process in "Judging Books by Their Covers" (from Surely You're Joking, Mr. Feynman!), which is both humorous and disheartening. From what I have seen, today’s textbooks are better than they were back then but still leave plenty of room for improvement . . . such as justifying why one needs a textbook, period.
[2] Katz, Daniel Martin and Bommarito, Michael James and Gao, Shang and Arredondo, Pablo, GPT-4 Passes the Bar Exam (March 15, 2023). Available at SSRN: https://ssrn.com/abstract=4389233 or http://dx.doi.org/10.2139/ssrn.4389233.
Nice article Michael, and your point is well taken. The hype around ChatGPT2-4 reminds me of when the calculator, and later the home computer, came on the scene (yes, I was alive then).
These NLPs won't replace attorneys, just as neither the calculator nor the computer did. But I do expect them to make us more efficient. Just as computers freed us from having to look at the "pocket part," I believe NLPs like ChatGPT will free us from Boolean searching of case law.
Are you listening Lexis and Westlaw?
Posted by: David Austin | April 09, 2023 at 11:33 PM
Very thought-provoking. My thoughts are that it is premature to test the skills needed to manage clients or a law firm in an examination of competence to practise as a lawyer. Here in Europe, the professions of i) patent attorney and ii) attorney at law are different but of equal standing. To qualify as a patent attorney, since the 1970s (much longer in the UK) one must satisfy the examiners that one is competent to safeguard the interests of a client. One does this by answering questions in which one first studies the client's prototype and the given prior art, and must then draft a set of claims that captures the full scope possible while at the same time satisfying all conditions of patentability, finding that narrow line that is not an inch too narrow nor an inch too wide. And then there is the exam paper in the UK, in which you get the client's patent, the prior art, and the accused infringing embodiment and are required as your answer to write an opinion covering i) infringement, ii) validity, and iii) what the client can do to optimise their situation.
Examining the answers and deciding who passes and who fails is a tough task for the (human) Examiners. But how else shall you protect the public from patent attorneys who put out their brass plate but are not competent to deliver reliable advice?
One reads that AI will replace members of the learned professions before it replaces humans with the skills to care for the bodily needs of other humans. As a patent attorney, I'm doubtful about that.
Posted by: Max Drei | April 10, 2023 at 04:25 AM
I have to disagree with perhaps an unwitting premise of the story (as well as the asserted general acceptance that GPA is anything other than being a memorization 'tool' in its own right).
The 'evolution' story here is NOT mere advance of a tool being used, but the 'tool' itself being unleashed on tasks of creativity for which no human 'in the loop' satisfies a legal definition of being the 'inventor' of the output of that loop.
Yes, I "get" that a spit-back memorization is a bit of a non-sequitur to the larger point of the intersection of Intellectual Property Law and the continued development of AI (at pure memorization - I have to scoff at ONLY achieving at the 90th percentile...?).
It is a disservice to dance in the weeds. We've already had several years since the DABUS case first came on the scene, and putting off serious discussion of the impacts of non-human innovation serves no one.
Posted by: skeptical | April 10, 2023 at 06:06 AM
Skeptical,
I am similarly not a fan of GPA, but it is slightly more helpful than just regurgitation-based standardized examination. Evaluating humans is hard but many of the ways we do it now are useless at best and harmful at worst.
Mike
Posted by: Michael Borella | April 10, 2023 at 10:34 AM
I agree Dr. Borella.
Evaluating more than rote learning practically necessitates nuance and context, and risks losing objectivity (interactions with the person doing the evaluation may well become more than passively interactive).
Add in the desire for repeatability and low cost and quick turnaround while desiring to maintain accuracy....
Posted by: skeptical | April 10, 2023 at 11:42 AM
Back in the dark ages, 1978, when I took the patent bar, it was a fully written exam, requiring drafting of claims/responses/client letters. Some memorization helped, for sure - the more you had in mind, the faster you could pick out the issues you needed to deal with and address them. Now and for some time it has been multiple guess - easier for the examiners, but far less helpful in testing competence.
And if you pass, you can hang out your shingle, even if you've never seen a patent application.
At least some other countries require an "apprenticeship" before you get licensed (cf. Max Drei's comment above).
Posted by: Derek Freyberg | April 10, 2023 at 12:35 PM
> Aspiring lawyers typically spend 10-12 weeks taking a bar exam review course, which involves committing massive amounts of legal rules and principles to memory, as well as learning how to write essays in a formulaic fashion (IRAC).
If one wants to ascribe this to malice, rather than to incompetence in assessment design, one might note that the "10-12 weeks of study" and utility of expensive "bar exam review courses" add a distinctly economic aspect to who is most able to pass the bar, while requiring nothing more than the ability to memorize and operate in a formulaic manner.
It's not unreasonable to suspect that the bar exam, as it exists now, is designed primarily to prevent less-privileged people from passing (and thus gaining access to a middle-class income) while still allowing the various flavors of Bush, Kennedy, and other famous idiot children of wealth to get a credential by tossing enough money and time at the problem.
Posted by: Benjamin Rellinger | April 11, 2023 at 02:36 PM
At least some of the famous children had to take the exam multiple times. John Kennedy Jr. and Richard M. Daley needed three tries each.
Posted by: Not a Bar Exam Grader | April 11, 2023 at 02:42 PM
Ascription to malice is unbecoming.
Many of us (myself included) belong to that purported "victim" class of "less-economically-well-off."
I passed the bar with my first attempt without the expensive bar exam review courses.
Of course, the bar IS set up to exclude and/or limit membership in 'the club' - the guild model (harkening back to feudal times) is ALL ABOUT controlling supply and demand. Sure, some 'token' arguments can (and will appropriately) be made as to controlling quality, but only the most naive do not consider the larger guild controls as providing very important reality 'checks.'
Posted by: skeptical | April 12, 2023 at 09:33 AM