About the Authors

  • The Authors and Contributors of "Patent Docs" are patent attorneys and agents, many of whom hold doctorates in a diverse array of disciplines.




February 19, 2024


The reason an LLM, if properly prompted, will provide a copy of some of its training data is that memorization can be part of training (see "Memorisation versus Generalisation in Pre-trained Language Models," Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Volume 1: Long Papers, pages 7564-7578, 2022).

As to note 1, it is also possible that a series of prompts was required to arrive at the output. If so, there are also contractual implications for the Times in how it arrived at its 'evidence.'

The bottom line here is that the Times does NOT have a strong case on the actual merits.

This will be an easy call.

Early on you criticize OpenAI for making "non-legal arguments," but then later, on the last fair use factor, you say that "OpenAI will have a tough time establishing that it is not *effectively free-riding* off of The Times' investment in journalism" and that "Bing will ... glean[] the *underlying information* used to formulate its answers" from, e.g., the NYT website. (Asterisks are my poor man's attempt at emphasis here.) I don't think those are really legal arguments either.

To be clear, I have zero canines in this confrontation, but color me a bit skeptical that NYT can somehow bootstrap some inadvertent, stray instances of verbatim copying against what is clearly its real target—the non-verbatim, deliberate training that incorporates the verbatim content.

Because the case was filed in the Southern District of NY, relevant Second Circuit precedent that might give OpenAI some comfort includes Authors Guild v. Google (2d Cir. 2015).

Referencing fairuse.stanford.edu, my Co-pilot states in part:

Key Points: The court concluded that Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals.

Anyone knowing anything about how the training is actually processed at the front end can easily point to the far more extensive transformation.

As I noted - this is an easy call.

While I have seen some SAY that the NY Times has a pretty good legal position, I have yet to actually see any compelling analysis by way of the FACTS PERTINENT TO AI being presented.

In this manner, the recent Warhol case is decidedly NOT on point.

The Warhol case did not pivot AT ALL on technical processing as a form of transformation. Rather, the facts of that case pivoted almost exclusively on a different factor: commercial use.

The facts in AI are much more aligned with technical transformation, and even the lesser technical transformation at issue in the Google cases EASILY saw Fair Use reached.

Again, I note that this is an easy call.

Seen the news?

Allegations against NYT: curious as to whether, if they hold up, this case will make it to a decision (or perhaps would be a directed decision).
