By Michael Borella --
In August, the U.S. Patent and Trademark Office announced that it planned to raise various fees. One proposal was an additional $400 fee for non-provisional utility application filings with a PDF specification. This fee would be avoidable if the specification were filed in DOCX format. The USPTO's electronic filing system (EFS) has supported DOCX specification filings for over two years.
As of the time of this writing, the USPTO has not made a final decision on this issue. In anticipation of the fee increase, patent attorneys, agents, and paralegals have been trialing DOCX uploads. So far, the EFS DOCX parser has proven to work well with many files, but it is rather fragile in some cases, and outright buggy in others.
DOCX is an open standard for word processing files. Since Microsoft Word 2007, it has been that application's default save format. Unlike the proprietary binary DOC files that Microsoft Word used to produce, a DOCX file is a ZIP package of XML parts. This makes DOCX files more portable between word processing applications and easier to parse.
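To make the "easier to parse" point concrete, here is a minimal sketch in Python using only the standard library. It assumes the usual layout in which the main text lives in the word/document.xml part. The function names and the in-memory sample file are hypothetical, and the sample is a stripped-down stand-in rather than a file Word would open. Nothing here reflects how the USPTO's parser actually works.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_bytes):
    """Return the plain text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    texts = []
    for p in root.iter(f"{{{W}}}p"):
        # A paragraph's text is spread over one or more <w:t> elements.
        texts.append("".join(t.text or "" for t in p.iter(f"{{{W}}}t")))
    return texts


def make_sample_docx(paragraphs):
    """Build a tiny in-memory stand-in for a DOCX (document part only)."""
    body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
    xml = f'<w:document xmlns:w="{W}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


sample = make_sample_docx(["BACKGROUND", "A widget is described."])
print(paragraph_texts(sample))
```

To try this on a real file, read its bytes (for example with open(path, "rb").read()) and pass them to paragraph_texts.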
The USPTO justified its encouragement of DOCX adoption based on the format's ability to facilitate instant feedback in EFS regarding common document errors, improved searchability, metadata removal, and general compatibility. Behind the scenes, it is likely that the USPTO intends to automate some of its application intake processes, such as automatically detecting the number and type of claims and classifying applications into art units.
We have been testing the USPTO's DOCX upload options in EFS. What we found was not particularly encouraging. Most notable was that EFS will often reject a DOCX specification upload, stating that it found one or more particular types of errors in the application file. But upon inspection, we found that the indicated errors frequently did not exist. Instead, the parser flags false positives when it encounters legitimate formatting or content that it cannot properly handle.
One such issue relates to font support. For one particular application, we received dozens of error messages alleging that the file contained text in the unsupported Century font. We thoroughly reviewed the application and determined that there were no Century characters present. After some trial and error, we ultimately determined that this error was actually being caused by our custom Microsoft Word styles. These styles allow the drafter to rapidly format applications so that they are consistent with one another and pleasing to the eye. But some of our styles had been based on other styles. Apparently this was problematic, because the errors went away once we changed these styles to be based on "no style".
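A drafter who wants to check a file for this pattern before uploading could list which styles are based on other styles and which fonts each style names. The sketch below is one way to do that in Python, assuming style definitions sit in the word/styles.xml part with parent styles recorded in a basedOn element and fonts in rFonts attributes. The function name style_report and the sample style IDs are hypothetical. We do not know what the EFS parser actually checks, so this only surfaces the two things described above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Font attributes on <w:rFonts> that carry a literal font name.
FONT_ATTRS = ("ascii", "hAnsi", "cs", "eastAsia")


def style_report(docx_bytes):
    """Map each style id to its parent style (or None) and the fonts it names."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    report = {}
    for style in root.iter(f"{{{W}}}style"):
        based_on = style.find(f"{{{W}}}basedOn")
        fonts = set()
        for r_fonts in style.iter(f"{{{W}}}rFonts"):
            for attr in FONT_ATTRS:
                name = r_fonts.get(f"{{{W}}}{attr}")
                if name:
                    fonts.add(name)
        report[style.get(f"{{{W}}}styleId")] = {
            "based_on": None if based_on is None else based_on.get(f"{{{W}}}val"),
            "fonts": sorted(fonts),
        }
    return report


# Hypothetical stand-in for a real styles part: one base style, one derived style.
styles_xml = (
    f'<w:styles xmlns:w="{W}">'
    '<w:style w:type="paragraph" w:styleId="Normal">'
    '<w:rPr><w:rFonts w:ascii="Times New Roman"/></w:rPr></w:style>'
    '<w:style w:type="paragraph" w:styleId="ClaimText">'
    '<w:basedOn w:val="Normal"/></w:style>'
    '</w:styles>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/styles.xml", styles_xml)

report = style_report(buf.getvalue())
for style_id, info in report.items():
    print(style_id, info)
```

Any style whose based_on value is not None is a candidate for the "no style" change that made our errors go away.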
Even worse, one of our applications kept getting rejected because it allegedly contained two or more of the specification, claims, and abstract (EFS DOCX support requires these three sections of the application be uploaded in three separate DOCX files). Yet, the file clearly contained only the specification. After manually removing sections of the application in a systematic fashion, we found the culprit -- the USPTO's DOCX parser apparently will not accept the word "conclusion" on a line by itself. When placed in a sentence, no problem. But on its own, "conclusion" consistently resulted in a rejected upload. Again, the error provided had nothing to do with the purported problem with the DOCX file. Only after hours of manual debugging were we able to satisfy EFS.
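The systematic removal we did by hand can be partly automated once the trigger is known. The following Python sketch flags paragraphs whose entire text is a single given word. It assumes the body text is in word/document.xml, and the case-insensitive comparison is our guess, since we have not established whether the EFS check is case-sensitive. The function name and sample paragraphs are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def standalone_word_paragraphs(docx_bytes, word="conclusion"):
    """Return 1-based indexes of paragraphs whose only text is `word`."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    hits = []
    for index, p in enumerate(root.iter(f"{{{W}}}p"), start=1):
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        # Assumption: ignore case and surrounding whitespace when matching.
        if text.strip().casefold() == word.casefold():
            hits.append(index)
    return hits


# Hypothetical stand-in for a specification with a bare "CONCLUSION" heading.
paragraphs = ["In conclusion, the widget works.", "CONCLUSION", "Further details follow."]
body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
document_xml = f'<w:document xmlns:w="{W}"><w:body>{body}</w:body></w:document>'
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", document_xml)

print(standalone_word_paragraphs(buf.getvalue()))
```

The same function can be pointed at other suspect words by changing the word argument, which is handy when bisecting a rejected file.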
Needless to say, DOCX support is not ready for prime time. Practically speaking, an attorney or agent up against a bar date may find that he or she cannot upload a reasonably formatted DOCX file, and may be unable to address the issue in the necessary time frame due to the DOCX parser's obscure and misleading error messages. Instead, he or she may have to just eat the $400 fee and file a PDF.
But that's not all. When you can successfully upload a DOCX file, the USPTO converts it into a PDF. Afterward, EFS provides a link to the PDF and displays the message, "The PDF(s) have been generated from the docx file(s). Please review the PDF(s) for accuracy. By clicking the continue button, you agree to accept any changes made by the conversion and that it will become the final submission." This effectively puts the onus on the attorney or agent to manually check, line by line, that the USPTO's conversion from DOCX to PDF is correct. Doing so is especially important if your application contains complex mathematical expressions or chemical formulas.
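Part of that review can be assisted by a script, though not replaced by one. The Python sketch below assumes you have already extracted plain text from both the DOCX and the USPTO-generated PDF with tools of your choosing (extraction itself is not shown). It compares the two word by word, so that differences in line wrapping are ignored and only changed words are reported. The function name and both sample strings are made up for illustration.

```python
import difflib


def word_level_changes(source_text, converted_text):
    """Return (tag, source_words, converted_words) for each differing run of words."""
    a = source_text.split()
    b = converted_text.split()
    # autojunk=False keeps frequent words from being ignored in long documents.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, a[i1:i2], b[j1:j2]))
    return changes


# Invented example: symbols altered and a line break moved during conversion.
source = "The angle θ satisfies x² + y² = r² for all points."
converted = "The angle ? satisfies x2 + y2 = r2\nfor all points."

changes = word_level_changes(source, converted)
for change in changes:
    print(change)
```

A text comparison like this cannot see layout problems, such as a formula rendered as a distorted image or a chemical structure that has shifted, so a visual check of those parts is still needed.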
We are not the first to point out some of these issues. I highly recommend an article by Carl Oppedahl criticizing the USPTO's implementation of DOCX.
While the USPTO may have legitimate reasons for transitioning to DOCX, the fundamental defects in the DOCX parser reflect a lack of adequate software quality assurance. Thus, it needs to address these problems before imposing the $400 fee on non-DOCX specification filings. If anything, the USPTO's attempt to reduce its application intake burden currently transfers much of that burden to individuals preparing the applications.
Thanks for the Oppedahl link.
As I recall, once upon a time he ran a listserv forum. I wonder if he still runs it. It had some kludgy aspects to it, but the rapport between contributors was much better than in your typical blogging effort.
To the point of the article, I believe that Dave Boundy has provided feedback to the USPTO also critical of the DOCX (and associated charging of fees).
Will the USPTO listen?
I remain somewhat:______
Posted by: skeptical | January 08, 2020 at 07:31 AM
The DOCX uploads also fail to handle in-specification formulas well.
Posted by: EFS Pain | January 08, 2020 at 10:16 AM
Earlier today the USPTO reached out to me and I have provided them with some example files that cause the DOCX intake to fail, as well as "fixed" versions that succeed. I am glad to see that they are taking this issue seriously.
@EFS Pain: I tested some formulas and did not see a problem, but if you are seeing one, you should get in touch with the USPTO. That's a serious problem.
Posted by: Mike | January 08, 2020 at 12:25 PM
Quick update. I received this from the USPTO:
"If people have specific errors (i.e. files that won’t upload properly), please feel free to have them email the [email protected] address. The other option is for people to contact the EBC, either via email ([email protected]) or phone (866-217-9197). The agents that staff the EBC will create a ticket that is escalated directly to our team.
We greatly appreciate when people bring issues to our attention so that we can address them."
Posted by: Mike | January 08, 2020 at 01:29 PM
Why take chances with the USPTO cram down "acceptance" procedure? Does the USPTO make the filer do this at time of filing? Crazy. Folks with formulas and the like will just have to file in PDF and charge the $400 to the client....
Posted by: Blindman | January 08, 2020 at 02:12 PM
Great post. With the current framework, the $400 fee would frequently feel worth paying given the amount of time it takes to confirm that a .docx file was properly converted to PDF by the automated process. Given billable hour values, it may often just be cheaper to pay the fee, and even where it's a push or worse, it's much easier to sleep at night trusting Microsoft or Adobe to handle PDF conversion.
Posted by: J. Doerre | January 08, 2020 at 02:17 PM
Mike, thanks for sharing your experiences. "Only after hours of manual debugging were we able to satisfy EFS." Did you bill your clients for that time? :-)
Skeptical, the Oppedahl listservs still exist (although there have been problems receiving emails since the servers were switched in November), and the contributions there are uniformly of higher quality than those in the comments sections on blogs. For many of us, those listserv fora are far more informative and helpful than any CLE sessions. And a good number of listserv subscribers reported problems with docx similar to what Mike reported.
A large group of us listserv people signed onto a letter that David Boundy prepared criticizing the proposed rule changes, especially the docx requirement. That letter was submitted as a comment to the PTO.
I don't understand why the PTO didn't propose to do like WIPO, viz. continue to let us file pdfs that *we ourselves* create, but with the option of *also* submitting docx files for use in case of discrepancy or if something is unclear, but which won't be made public. One cynical listserv subscriber viewed the docx requirement as something the PTO knows conscientious practitioners will never comply with, and which therefore is just a money-grab. Personally I suspect it's not that sinister, and merely that the PTO personnel involved in this have little experience on the practitioner side, and don't realize how unrealistic the proposed requirement really is.
Posted by: Dan Feigelson | January 08, 2020 at 02:19 PM
"I don't understand why the PTO didn't propose to do like WIPO, viz. continue to let us file pdfs that *we ourselves* create"
Honestly, it's just much easier to extract data from the underlying XML in a DOCX file. I cannot even begin to tell you how much of a migraine PDF files are. So there is nothing nefarious going on, the USPTO is just trying to increase automation.
I'll get a parser up on corepatent.app that will replicate the USPTO's header matches (should be trivial) and identify any issues identified above. This shouldn't be too difficult to implement.
Mike - can you send me the problematic docx files? I'd love to take a look at them and see if I can get the parser to catch these issues.
Simon
Posted by: Simon Booth | January 08, 2020 at 08:53 PM
"Honestly, its just much easier to extract data from the underlying xml in a DOCX file. I cannot even begin to tell you how much of a migraine a PDF files are. So there is nothing nefarious going on, the USPTO is just trying to increase automation."
Simon, if your explanation of the rationale behind the USPTO's proposed rule is correct, then you've confirmed that unlike WIPO, the USPTO doesn't care a whit about how end users are affected. What's paramount is how easy it is for the USPTO to do what the USPTO *thinks* it needs.
I doubt that extracting data is any less important to WIPO than it is to the USPTO, yet WIPO has figured out how to manage with applicant-generated pdfs, with applicants having the option of also submitting documents in docx format.
Posted by: Dan Feigelson | January 09, 2020 at 12:56 AM
I do wonder how the $400 amount was settled on.
Posted by: skeptical | January 10, 2020 at 02:23 PM
On our first docx submission (and, for now, our last), EFS reformatted the specification, added three pages, and changed the margins, which made a line-by-line comparison quite time-consuming. The PTO had no explanation. I would have no confidence that a PTO-created PDF would accurately convert mathematical formulas, chemical structures, or Greek characters.
Posted by: DC | February 03, 2020 at 01:27 PM