By Michael Borella --
Years ago, I was a proud parent when my children were invited to participate in an honors math program at their grade school. But this initial delight turned to confusion, and eventually frustration.
As just one example of why I was less than pleased with our school's pedagogy, one heavily emphasized part of the curriculum required the kids to memorize as many digits of pi as they could, with a minimum of 25. Sure, they also learned that pi is the ratio of a circle's circumference to its diameter and how to use it in simple algebra, but this memorization task was the focus of the unit, with the child who memorized the most digits (130 one year) winning special accolades.
To me, this assignment missed the point. Pi is a critical value in many aspects of science and engineering, and can be taught directly or indirectly in a number of compelling and fun ways involving wheels, pizza, spirographs, and so on. And its importance in aviation and communications can at least be mentioned.
But the focus was on committing those 25-plus digits to memory and being able to recite them on demand. When I pointed out to the teachers that maybe -- just maybe -- this was not the best way to prepare children to have an appreciation for STEM fields, they looked at me like I was from another planet. The curriculum was designed around what was easy to test (can the kid produce the 25 digits when asked?) rather than the harder-to-evaluate skills (does the kid know how and when to use pi to solve problems?) that are actually important when using math in the real world.[1]
Thus, when the news broke that OpenAI's GPT-4 large language model passed the Uniform Bar Exam at the 90th percentile, I was less than impressed. In fact, this outcome is completely unremarkable given that the model was trained on billions of units of human text.
The bar exam is a memorization exam. Aspiring lawyers typically spend 10-12 weeks taking a bar exam review course, which involves committing a massive body of legal rules and principles to memory, as well as learning how to write essays in a formulaic fashion (IRAC: issue, rule, application, conclusion). Then you sit for two days of testing in which you regurgitate as much as you can. If you score highly enough, you pass and become a licensed attorney.
During the summer that I spent preparing, I remember at one point mentioning in frustration to my study partner that what the bar exam is actually testing is how much pain one is willing to accept to be a lawyer, and that rapping us across the knuckles a few times with a ruler would probably have the same effect. Indeed, I know of individuals who graduated law school in the top ten percent of their class (in terms of GPA), failed the exam on their first try, later passed, and went on to be excellent attorneys. Clearly, these folks were bright, but when speaking to them they attributed their failure (which was quite the source of shame) to not studying hard enough during bar review. Let that sink in -- top law students can fail to be licensed because they do not learn the mechanical proclivities of one specific exam.
A recent paper from Professor Daniel Katz and his co-authors evaluates GPT-4's bar exam performance and states that "These findings document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society."[2] The key word in that sentence is "potential," but even so, the statement is misleading.
GPT-4 scoring well on the bar exam is not evidence that AI is achieving human levels of intelligence; it is evidence that the bar exam tests a human's ability to perform like a robot. Missing from the bar exam are tests of executive function (e.g., staying organized, keeping to deadlines), soft skills (e.g., client interaction and counseling, interpersonal competencies), and law firm operation (e.g., finance, marketing, managing groups, how to be a good employer), all of which are more relevant to a lawyer's success than their ability to stuff facts into their brains.
Indeed, it is now widely accepted that GPA is much more predictive of a student's ultimate success than standardized test scores. This is because maintaining a high GPA requires more than the raw cognitive ability to do well on memorization-based exams -- the aforementioned executive functioning and soft skills play a significant role. Intellectual ability is important, but so is emotional intelligence.
Turning to patent law, there might be one multiple-choice question out of 200 addressing intellectual property on a typical year's bar exam. So for us patent attorneys, the bar exam measures our ability to regurgitate law that we are unlikely to ever apply in practice. For that reason, the USPTO requires that we pass a separate patent bar exam. Admittedly, it too is memorization-based, but at least it is open book.
So, to the extent that Professor Katz is implying that GPT-4 or any other current-generation large language model can perform significant legal tasks, I have to disagree. Large language models are tools that lawyers can employ, not unlike search engines or Wikipedia. They may be able to carry out certain first-level research functions in place of a junior associate. But when it comes to crafting creative legal strategies that guide clients through complex transactions, they are still far from the mark.
Nonetheless, the strong performance of GPT-4 on memorization-based exams provides us with a golden opportunity to re-evaluate how we teach both children and law students. If the goal is to turn out humans with skills that can be easily replaced by automation, then maintaining the status quo will get us there. But we would be much better off by recognizing and embracing large language models, while remaining cognizant of their strengths and weaknesses. Integrating these tools into a broad-spectrum education system with a flexible curriculum is much more likely to produce graduates who can adapt to the changing needs of the legal profession, or any other field for that matter.
The modern education system is still based too much on a paradigm established in the 1800s, one in which an instructor lectures and the students passively receive their lessons. Given that large language models can outperform most humans in these scenarios, we need to seriously consider changing the system to meet the demands of 21st century life.
And for anyone who absolutely needs to know the first 25 digits of pi, don't worry because GPT has you covered: "The first 25 digits of pi (π) are: 3.14159265358979323846264. Note that pi is an irrational number, meaning that its decimal representation goes on infinitely without repeating." Or, it almost has you covered, as the 25th digit is missing from its output.
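For anyone who wants to check that count rather than take my word (or GPT's) for it, here is a minimal Python sketch. The reference string of pi digits below is typed in by hand, so treat it as an assumption to verify independently rather than an authoritative source:

```python
# Compare GPT's quoted answer against a hand-typed reference value of pi.
# Both strings are entered manually (assumptions) -- double-check before reuse.
REFERENCE_PI = "3.141592653589793238462643383279"  # 31 digits of pi
GPT_OUTPUT = "3.14159265358979323846264"           # the answer quoted above

ref_digits = REFERENCE_PI.replace(".", "")
gpt_digits = GPT_OUTPUT.replace(".", "")

print(len(gpt_digits))                    # 24 -- one short of the requested 25
print(ref_digits.startswith(gpt_digits))  # True -- the digits given are correct
print(ref_digits[len(gpt_digits)])        # '3' -- the missing 25th digit
```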
[1] To get a sense of how prevalent issues like this are in education, 60 years ago Nobel laureate physicist Richard Feynman was asked to help the state of California select math textbooks for its schools. He wrote an account of the process that is both humorous and disheartening. From what I have seen, today's textbooks are better than they were back then but still leave plenty of room for improvement . . . such as justifying why one needs a textbook, period.
[2] Katz, Daniel Martin and Bommarito, Michael James and Gao, Shang and Arredondo, Pablo, GPT-4 Passes the Bar Exam (March 15, 2023). Available at SSRN: https://ssrn.com/abstract=4389233 or http://dx.doi.org/10.2139/ssrn.4389233.
Drugs May Cost Too Much, But Patents Are Not the Cause
By Kevin E. Noonan --
Not surprisingly, Mr. Gaugh asserts that the only way to reliably reduce drug prices is generic and biosimilar competition. This case can certainly be made for generic drugs, which have an almost 40-year track record leading to the statistic that "generics and biosimilars account for 91% of prescriptions filled in the U.S. but only 18% of prescription spending." But Mr. Gaugh argues that these gains are at risk from problems with the sustainability of the generic (and biosimilar) drug industries. As Mr. Gaugh explains, often "the price of generic medicines has fallen to an unsustainably low level, resulting in market exits and creating the optimal conditions for shortages," which shortages are appearing in the aftermath of the economic and supply chain disruptions caused by the pandemic. (This statement is ironic, albeit truthful, because Mr. Gaugh also quotes FDA statistics that generic competition results in "an astounding 95% price drop on a mature market." This suggests that the meme that high branded-drug prices are caused solely by pharmaceutical company greed was incorrect.)
Even the newer generics and biosimilars are "being squeezed" by "historically slow adoption," Mr. Gaugh writes (although the reasons are likely not the same for these two classes of drugs). With regard to biosimilars, the financial benefits are patent, with prices "on average more than 50% less than the brand price was when the biosimilar launched" (for drugs that, although representing only a fraction of prescriptions, drive almost half of all drug spending), and yet biosimilars are "woefully underutilized." Mr. Gaugh uses Humira® as an example, which, starting July 1st of this year, is subject to competition from several biosimilars (see "The New York Times Is at It Again Regarding Patents"). But the beneficiaries may not be patients; Mr. Gaugh identifies "middlemen" as being able to exact greater rebates from Humira® sales, while he expects formularies to "sideline" these biosimilar equivalents.
Mr. Gaugh uses Semglee, the first interchangeable insulin biosimilar, to illustrate how the market and its participants have prevented biosimilar substitution from delivering the benefits promised by passage of the Biologics Price Competition and Innovation Act (BPCIA) as part of Obamacare. According to the article:
Semglee has two different prices, one with a slight decrease in price compared to the brand and a high rebate, and another with a major (65%) decrease in price. Although the lower list price would have translated into lower costs to patients, PBMs have largely stuck with the higher priced brand insulin rather than encouraging use of the lowest list price.
In addition to these economic consequences, Mr. Gaugh also points to "manufacturing and regulatory challenges, runaway price deflation driven by middlemen market consolidation, and government policies in Medicaid, Medicare and 340B that reduce the financial viability of generic manufacturing."
While conceding that there is no "magic bullet" for correcting (or at least improving) these circumstances, Mr. Gaugh argues that adoption of the following options could provide some solutions:
• Improving collaboration within FDA between inspectors and its drug shortage staff (DSS), as well as between the agency and manufacturers working to avoid a shortage,
• Creating a reserve capacity supply of key medicines, as well as incentives for hospitals to purchase reserve supply under sustainable, long-term fixed-price and fixed-volume contracts,
• Improving Medicare drug formulary coverage of new generics and biosimilars, and
• Removing financial burdens such as the Medicaid inflation penalty and 340B that make continued production of low-margin generics unsustainable.
Mr. Gaugh concludes his article by noting that both the branded and the generic/biosimilar drug industries are businesses driven by investment, and that "if government policies continue to penalize low-cost generic medicines and block adoption of new generics and biosimilars," decreased investment may follow. That, of course, will only exacerbate the high drug prices and shortages that burden the health care system.
While the message of Mr. Gaugh's article is anything but hopeful, it was refreshing, for a change, to see a discussion of U.S. drug pricing problems that does not focus on (or even mention) patents as the cause. That may be a popular refrain from the media and some politicians (see, e.g., "The New York Times Is at It Again Regarding Patents"; "Faux-Populist Patent Fantasies from The New York Times"; "The More the Merrier: The Journal Joins the Times in Complaining about Patents"; "New York Times to Innovation: Drop Dead"; "Science Fiction in The New York Times"), but Mr. Gaugh's assessment provides a welcome, informed alternative to what people think "everybody knows."