A Warning
Protesters beware – the Government Accountability Office (“GAO”) wants you to know it can and will sanction misuse of Generative Artificial Intelligence (“GenAI” or “AI”). Four recent decisions issued between May and July 2025 all involve citations to fake decisions by protesters proceeding without legal counsel.[1] In each instance, GAO dismissed on other grounds but fired a warning shot, invoking GAO’s “inherent right to dismiss any protest and to impose sanctions” if the protester uses AI to “undermine the integrity and effectiveness of our process.”[2] Beyond the sanctions risk, recent reports raise concerns about the discoverability of GenAI conversations in lawsuits and the inadvertent disclosure of seemingly private chats.[3]
Popular GenAI systems are built on Large Language Models (“LLMs”) that essentially predict the next word based on enormous data sets. These remarkable systems are very good at many things. But in the legal domain, they suffer from at least three stubborn problems: (1) factual hallucination; (2) sycophancy; and (3) overconfidence. In other words, AI is different from most lawyers: it seldom says “well, it depends.” More importantly, it has the unusual trait of confidently making up information out of whole cloth. It is this characteristic that is leading to a deluge of sanctions from various courts and now GAO. AI may give you what you want almost immediately, and it looks very good. The second you dig in? You realize AI says a protest was sustained when it was denied. Or AI (inexplicably) gives the wrong case for a real protest. Or it completely fabricates information when it can't find the support it needs to make you happy.
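For readers curious what “predicting the next word” looks like in practice, the toy sketch below (my own illustration, not any vendor’s actual system) builds a crude next-word predictor from word-pair counts. Real LLMs use neural networks trained on vastly more data, but the core mechanic is the same: the system picks a statistically likely continuation, with no built-in notion of whether the resulting sentence is true.

```python
from collections import Counter, defaultdict

# A tiny corpus; real models train on trillions of words.
corpus = (
    "the protest was denied . the protest was dismissed . "
    "the protest was sustained ."
).split()

# Count how often each word follows each other word (a bigram model).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most frequently observed after `word`."""
    return bigrams[word].most_common(1)[0][0]

# The model fluently continues the phrase "the protest was ..." --
# whether the continuation is accurate for any particular case is
# simply not part of the calculation.
print(predict_next("was"))
```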
In the bid protest context, unhappy bidders draft a letter outlining the factual and legal grounds for their position. This includes supporting law, the Federal Acquisition Regulation, and legal precedent. It goes without saying that the citations submitted with any protest must be correct in form and must support the position the protester is asserting. When a lawyer submits a protest on behalf of a client, the lawyer is bound by the rules of professional conduct. That lawyer must take the time to ensure that what he or she is submitting is true. In any event, the protester or their lawyer must make sure they are not misleading GAO by providing false information, including citations. To do otherwise risks not only the success of the protest, but also sanctions and reputational damage.
Hallucination
First, AI has a strange and unique tendency to invent information. Suppose you’re talking with a lawyer and ask them an obscure legal question. They are unlikely to instantly respond with a perfectly plausible but entirely made-up reference to a case. Yet this is exactly what AI does on a regular basis. In my own practice, clients are now regularly sending me AI-generated documents. In each case, I find hallucinated case law, embedded with factual distortions and inaccurate summaries. From my perspective, the benefit of clients doing work with AI before engaging with me is outweighed by the time wasted untangling what the AI gets right from what it gets terribly wrong.
I commonly hear the refrain: “But even if there are minor problems, AI today is as bad as it will ever be.” The implication is that even if there are some kinks to work out, the problems will be resolved in short order. After all, technology improves exponentially, right? Not true. The newer and more complicated reasoning models, which have become ubiquitous, hallucinate significantly more than older models.[4] This problem, in other words, has only gotten worse, and it shows no signs of improving. Some experts believe this is a fundamentally intractable problem with LLMs: “These systems use mathematical probabilities to guess the best response. . . [and] ‘[d]espite our best efforts, they will always hallucinate.’”[5]
This is bad news for laymen who lack domain-specific expertise because they are unlikely to detect when AI spits out something that smells fishy. This leads directly into the next problem, which is sycophancy.
Sycophancy
Second, people intuitively understand what happens when you surround yourself with sycophants or “yes-men”: you lose the critical ability to detect flaws in your own logic. AI is the ultimate sycophant, designed to flatter and please the user. If you don’t believe me, go ask your AI what your chat history says about you.[6] If it says something critical about you, I’ll eat my hat.
OpenAI and other LLM developers admit that this is a problem.[7] In April, OpenAI wrote that after an update “GPT‑4o skewed towards responses that were overly supportive but disingenuous.” The issue caused the company to roll back the update and commit to building in more “guardrails to increase honesty and transparency.” Tweaking model behavior on the back end is all well and good, but right now baseline levels of sycophancy are very high.
Even with advanced prompting, experienced users are likely to get high levels of agreement from their chatbot with little critical pushback. And prompting and back-end fiddling are only marginally useful. These LLMs are so large and complex that they are difficult to direct. As a beta tester for legal GenAI tools, I’ve experienced serious failures with products that I brought to the platforms’ technical teams. They took my feedback but were unable to promise any improvements or real changes because these systems are a black box. To illustrate this point, consider a conversation I had with a C-Level executive a couple of months ago. No matter what sort of back-end prompting they tried, their AI wouldn’t stop telling its users “bless your heart.”
If you ask a lawyer about your odds for success on a case, they are likely to hedge because there is inherent uncertainty in litigation and there are many contingent facts, both known and unknown, that can intervene to change the outcome. A reasoned response from a lawyer to a client is based on that lawyer’s experience. A lawyer will tell a client what the client needs to hear. While a lawyer may want to please their client, the lawyer will (or at least should) give a thoughtful response, rather than a response that simply confirms what the client wants to hear.
Understandably, clients may prefer to hear that they are totally right. For better or worse, GenAI will seldom question underlying assumptions. This sycophancy is part of what likely contributes to AI hallucination. You want a case to support your argument – but there is none. What is a “yes-man” to do? Well, make it up, of course!
Confidence
Third, an underlying limitation of GenAI is that it expresses extremely high confidence in its answers, even under direct questioning. I’ve had exchanges with GenAI where it gives me a suspect citation. After searching for the case (which looks amazing and supports my viewpoint perfectly), I conclude that it is fake. If I challenge the AI and ask where it found the case, the response tends to go one of two ways. In some cases, the AI will repeatedly insist that the case is real, with absolute confidence. This doubling down is problematic for obvious reasons, and it is even worse when the user is a layperson (or a lawyer unfamiliar with the inherent issues of GenAI) who can’t or won’t check the AI’s accuracy. The other response relates back to sycophancy: the AI becomes incredibly apologetic, admits that it made up the citation, and tries to account for its mistake. In some cases, that means searching for a substitute citation that leads us back to square one: another fake case.
Here’s the issue: a word-probability machine does not have the ability to generate confidence intervals.[8] GenAI is not currently capable of reliable self-reflection, uncertainty, or doubt. It does not arrive at answers the way humans do. It is very easy to anthropomorphize this technology when it presents as an intelligent mind. This is even easier when it tells you exactly what you want to hear. I understand that it can be maddening to hear a lawyer answer “it depends” to a seemingly straightforward question. But unfortunately, the answers AI gives are often the equivalent of empty calories: all form and no substance.
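To make this concrete, here is a minimal, hypothetical sketch (the second docket number and the scores are invented for illustration) of why fluent output carries no reliability guarantee: the numbers a language model actually produces are probabilities over tokens, computed by a softmax over internal scores, not calibrated probabilities that a statement is true.

```python
import math

def softmax(scores):
    """Convert raw model scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate continuations and made-up model scores.
candidates = [
    "B-423447",    # a real docket number (cited in this article)
    "B-999999",    # an equally plausible-looking, invented docket number
    "it depends",  # the hedge a careful lawyer might offer
]
scores = [2.0, 1.9, 0.1]

for text, prob in zip(candidates, softmax(scores)):
    print(f"{prob:.2f}  {text}")

# A high probability for citation-shaped text reflects only that it
# resembles the training data; it says nothing about whether the
# citation exists, and no calibrated confidence interval is attached.
```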
Privacy
Fourth, on top of these three intractable problems are privacy, confidentiality, and related concerns. On the legal front, OpenAI CEO Sam Altman has made it clear that personal legal details should not be shared with ChatGPT because “[i]f you go talk to ChatGPT about your most sensitive stuff and then there's a lawsuit, we could be required to produce that.”[9] Altman is referring to an order by the judge in the copyright dispute between the New York Times and OpenAI that requires OpenAI to preserve, rather than delete, user chats.[10] Unlike attorney-client privilege, there is no established legal privilege between a chatbot and its user. Personally, I am interested in issuing a discovery request for “all relevant conversations with GenAI.” I do not see how an opposing party could justify withholding this information, and even if the party deleted its conversations,[11] OpenAI is telling us that these records now exist in the cloud and could be reached by subpoena.
A more recent concern arose when Fast Company broke the story that seemingly private conversations with ChatGPT were searchable after being indexed by Google.[12] Users have the option to “share” a link to a conversation by various means, including text. When users created such a link, even to share with a single other person, the generated link was publicly viewable. As a result, it was indexed by Google and searchable by anyone on the web. OpenAI has since moved aggressively to change this feature and scrub the contents from Google.[13] Although this “breach” has seemingly been resolved, many of these conversations have been permanently preserved by websites like Archive.org.[14]
Both of these stories make me think that contractors will use GenAI without understanding the confidentiality, privacy, and privilege risks. Even if you have hired a lawyer, disclosing information to a third party like OpenAI could waive attorney-client privilege. More importantly, these incidents are probably just the start of data-related breaches.
What Next at GAO?
For the record, I am optimistic about the use of AI despite the criticisms outlined here. And notably, I do not believe that hiring a lawyer is a silver bullet for these concerns. There are countless cases of lawyers sanctioned for misuse of AI.[15] In fact, more than one judge has now had to withdraw an order based on misuse of AI. But at least in the bid protest context, hiring a lawyer could be beneficial, especially if your lawyer has significant government contracts experience.
Back to the recent protest decisions: I’ve been tracking these since the first decision came out in May. So far, these decisions have only warned protesters that GAO can dismiss or sanction based on AI misuse. Put another way, GAO has not yet dismissed a protest as a sanction for misuse of AI. However, while the first three protests were probably doomed regardless, the latest one might have had a chance with a lawyer.[16]
Another intriguing question is whether the protester would have even filed the protest without access to AI. Maybe without AI this company never would have filed the protest in the first place, or perhaps it would have hired a lawyer to make the argument. But because it had access to, and (unwarranted) confidence in, AI, it decided to protest without counsel.
In any case, AI reliance is generally becoming much more normalized. In its recent decisions, GAO has indicated it is likely to impose sanctions when it sees citations to nonexistent or inappropriately cited case law in the future. GAO is just waiting for an opportunity to dismiss a protest purely based on AI misuse. The problems with relying on AI will persist, so there are likely to be an increasing number of decisions like these. I am aware of at least one case in which no public decision was issued but the agency argued that the protester relied on nonexistent case law. I’m confident there are other examples that are not public.
Conclusion
In an attempt to save costs and time, protesters are relying on AI rather than hiring legal counsel. There is no shortcut: rolling the dice on an overreliance on AI can create significant issues. You may have an excellent case. Do not let the issues set forth in this article torpedo what might otherwise be a clean win.
GAO does not like these false citations. And GAO is right. Tracking down false citations is a waste of time for everyone involved. Agency counsel are getting up to speed on this issue and are aggressively moving to dismiss based on AI misuse. Ultimately, protesters care about whether they can win their protest. If GAO sees these sorts of issues in your protest, it is likely to downgrade your credibility. It does not matter whether GAO dismisses your protest explicitly because it contains hallucinated case law or simply treats the remainder of your arguments more skeptically. The effect is the same: your chances of winning go down.
[1] To date, I am not aware of any AI hallucination discussion at the Court of Federal Claims (“COFC”) in the bid protest context. However, COFC has discussed AI misuse in an unusual $1B slavery reparations claim. See Sanders v. United States, No. 24-cv-1301 (March 31, 2025) (“The cases referenced by Plaintiff have the hallmarks of cases generated by AI found in other courts.”).
[2] Raven Investigations & Security Consulting, LLC, B-423447 (May 7, 2025); see also Assessment and Training Solutions Consulting Corporation, B-423398 (June 27, 2025); Wright Brothers Aero, Inc., B-423326.2 (July 7, 2025); BioneX, LLC, B-423630 (July 25, 2025).
[3] Schwartz, E. H. (2025, August 1). OpenAI pulls chat sharing tool after Google search privacy scare. TechRadar. https://www.techradar.com/ai-platforms-assistants/chatgpt/openai-pulls-chat-sharing-tool-after-google-search-privacy-scare; Al-Sibai, N. (2025, July 28). If you've asked ChatGPT a legal question, you may have accidentally doomed yourself in court. Futurism. https://futurism.com/chatgpt-legal-questions-court
[4] Yao, Z., Liu, Y., Chen, Y., Chen, J., Fang, J., Hou, L., Li, J., & Chua, T.-S. (2025). Are reasoning models more prone to hallucination? arXiv preprint arXiv:2505.23646 (“LRMs [are] consistently improving their ability to solve formal tasks, [but] bring inconsistent effects in terms of hallucination on fact-seeking tasks.”); see also Metz, C., & Weise, K., infra note 5; Zeff, M. (2025, April 18). OpenAI's new reasoning AI models hallucinate more. TechCrunch. https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-models-hallucinate-more/ (“According to OpenAI’s internal tests, o3 and o4-mini. . . hallucinate more often than the company’s previous reasoning models. . .”) (Emphasis in original).
[5] Metz, C., & Weise, K. (2025, May 5). AI is getting smarter, but hallucinations are getting worse. The New York Times. https://www.nytimes.com/2025/05/05/technology/ai-hallucinations-chatgpt-google.html (Emphasis added).
[6] ChatGPT says that I am “ambitious, reflective, strategic, and intentional. You pursue excellence while staying thoughtful about relationships, systems, and the broader impact of your actions. You value progress—both personal and societal—and you invest in it with care.”
[7] OpenAI. (2025, April 29). Sycophancy in GPT-4o: What happened and what we're doing about it. OpenAI Blog. https://openai.com/index/sycophancy-in-gpt-4o/
[8] Lewis, P. (2025, June 17). Can we trust generative AI to know and tell us when it doesn't know the answer? Ontario Tech University News. https://news.ontariotechu.ca/archives/2025/06/can-we-trust-generative-ai-to-know-and-tell-us-when-it-doesnt-know-the-answer.php (“research shows that AI systems are often overconfident in what they tell us, and are not able to judge their own ability very well.”).
[9] Von, T. (Host). (2025, July 24). Sam Altman [Podcast episode 599]. In This Past Weekend w/ Theo Von. https://open.spotify.com/episode/272maKnMzjm0Sb4bDqzZ2y
[10] Werth, T. B. (2025, June 5). Court orders OpenAI to save all ChatGPT chats. Mashable. https://mashable.com/article/court-orders-openai-to-save-all-chatgpt-chats
[11] Deleting discoverable evidence can result in sanctions for spoliation of evidence.
[12] Stokel-Walker, C. (2025, July 31). Exclusive: Google is indexing ChatGPT conversations, potentially exposing sensitive user data. Fast Company. https://www.fastcompany.com/91376687/google-indexing-chatgpt-conversations
[13] Shanklin, W. (2025, August 1). OpenAI is removing ChatGPT conversations from Google. Engadget. https://www.engadget.com/ai/openai-is-removing-chatgpt-conversations-from-google-194735704.html
[14] Tech Desk. (2025, August 3). ChatGPT Privacy Breach Exposes User Chats via Wayback Machine: Urgent Risks Revealed. ZoomBangla. https://inews.zoombangla.com/chatgpt-privacy-breach-wayback-machine-exposure/
[15] Patrice, J. (2025, July 23). Partner Who Wrote About AI Ethics, Fired For Citing Fake AI Cases. Above the Law. https://abovethelaw.com/2025/07/partner-who-wrote-about-ai-ethics-fired-for-citing-fake-ai-cases/
[16] BioneX argued that “FAR clause 52.219-4 operates by default.” The regulation includes language that the clause “shall” be inserted into solicitations like this one, and it was left out. GAO said that the dismissal was based on the failure to provide a “detailed statement of the legal and factual grounds for the protest.”