About the Authors

  • The Authors and Contributors of "Patent Docs" are patent attorneys and agents, many of whom hold doctorates in a diverse array of disciplines.
  • "Patent Docs" does not contain any legal advice whatsoever. This weblog is for informational purposes only, and its publication does not create an attorney-client relationship. In addition, nothing on "Patent Docs" constitutes a solicitation for business. This weblog is intended primarily for other attorneys. Moreover, "Patent Docs" is the personal weblog of the Authors; it is not edited by the Authors' employers or clients and, as such, no part of this weblog may be so attributed. All posts on "Patent Docs" should be double-checked for their accuracy and current applicability.

April 30, 2024


Will those who understand the nature of technical transformation in any AI application readily spot how easy a question of Fair Use it is to train AIs?

Perhaps I should change my moniker to shocked.

You state that “it stands to reason that you can use an article you purchased to train your own LLM, so long as the LLM doesn't output copyrighted material directly to third parties.” But what if one has not purchased the article, but picked it up for free from the Internet (where it was voluntarily posted by the author)?

@Ivan Poli - It would likely depend on whether there was a license associated with the article. Simply because something is accessible on the internet doesn't mean it is free of copyright protection. For example, if you find a video of a copyrighted song on YouTube, you do not necessarily have the right to retain a local copy merely because the owner of the copyright has yet to exercise a DMCA strike against the video. Generally, though, once you purchase a copyrighted work, you can do what you want with that particular copy (within certain limits, e.g., under the First Sale Doctrine - https://en.wikipedia.org/wiki/First-sale_doctrine).

@Ivan Poli, Andrew Velzen -

But if I train the neural network in my brain by viewing the copyrighted work, and then I use the trained neural network in my brain to create a new work, have I infringed any copyright? I think the answer is obviously "no", as long as my work is different in expression from the copyrighted work. Why, then, should the answer be "yes" if my neural network is not in my brain, but on my computer?

(Note -- perhaps the answer could become "yes" if Congress amends the Copyright Act, but as of today, it hasn't.)

@Extraneous Attorney - I would obviously agree that if you train your own biological neural net (i.e., your brain) and then create a new work, you have not committed copyright infringement. However, the New York Times (and other plaintiffs mentioned in footnote 27) would have courts believe that there is an inherent difference between a human mind doing it and a generative AI doing it.

One could argue that an AI trained on a copyrighted work includes some sort of durable copy of that underlying copyrighted work (even if in a very mutated form, e.g., spread out across multiple layers of a neural net and stored on a hard drive). So, the mere existence of the trained AI could, at least arguably, be considered copyright infringement (regardless of what it outputs). Obviously, though, if you owned an underlying copy of the original work (e.g., by virtue of purchasing it), having a copy (mutated or not) for your own use is acceptable.

To your point, there are certainly strong counter-arguments that, regardless of the method of training, so long as a copyrighted work isn't directly output by the generative AI, it's fair use.

Will be interesting to see what the district courts think of all this.


Quite apart from any human/machine analogy (I have seen arguments against such an analogy), I am still waiting for a cogent view that the ingest for training is NOT Fair Use.


You have a split actor problem. There are two actions involved, and the question of Fair Use (for training) is a distinct question.

The view that “a copy is in there” is factually untrue, as that is not how the training works.

@skeptical - Do you have a link to a source with arguments against the human/machine analogy in this context? I would be curious to see them, as perhaps there are good reasons why the analogy should not be valid.

@Andrew Velzen -

I agree, and I would go so far as to hope that OpenAI and Microsoft will not settle with the NYT. (They have good reasons not to, given the relief requested by the NYT, but that doesn't mean no settlement is possible.) The question will come up so often in the future (I think) that any clarity would be welcome.


Apologies as I am not at liberty to share that.

@skeptical -

No problem, I understand.
