On robot judges: part 5 - in which we consider magic bricks.

Published on: 5 Aug 2023

4 min read

#notlegaladvice #LLM #AI

This article is part of a series. View related content below:

Photo credit: Madison Inouye; https://www.pexels.com/photo/photo-of-brickwall-1101125

On #robot judges: part 5 - in which we consider magic bricks.

In part 4,¹ I discussed how a Large Language Model (#LLM) generates output, and acknowledged that anecdotally, LLMs often generate accurate and useful output.

So isn't the proof of the pudding in the eating? What's stopping us from using LLM outputs in legal practice?

---

Let's try a thought experiment.

Suppose we want to build a house with bricks.

A mysterious figure² appears and gives us a brick-generating machine. When we press a button on the machine, a magic brick appears - no other input necessary.

The magic bricks look and feel good, and the gift-giver's intentions are purely benevolent: they believe that the machine does nothing apart from generating good bricks.

But.

For every hundred magic bricks generated, one brick is dodgy: one day, it will suddenly shatter without warning.

However, such dodgy bricks can be identified by a brick expert upon inspection.

So should we use the brick machine?

I suggest that this would depend on, among other things (a rough sketch of the trade-off follows this list):
a) what kind of project the magic bricks are for;
b) the consequences if a magic brick fails, and whether there are redundancies and safety measures in place;
c) the cost to engage a brick expert to inspect each magic brick; and
d) the cost of ordinary bricks.
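
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it - the cost of a brick, the inspection fee, the failure cost and the 1-in-100 failure rate - is a made-up assumption for illustration, not a figure from this article; the point is only that the answer turns on these numbers, not on how good the bricks look.

```python
# Illustrative sketch only - all figures below are made-up assumptions.

def cost_of_magic_bricks(n_bricks, brick_cost, inspection_cost,
                         failure_rate, failure_cost, inspect=True):
    """Expected total cost of building with n_bricks magic bricks."""
    total = n_bricks * brick_cost
    if inspect:
        # Pay a brick expert to check every brick; dodgy bricks are caught.
        total += n_bricks * inspection_cost
    else:
        # Skip inspection and absorb the expected cost of bricks that shatter later.
        total += n_bricks * failure_rate * failure_cost
    return total

# Hypothetical numbers: magic bricks are free, inspection costs $2 per brick,
# 1 in 100 bricks is dodgy, and a shattered brick in a finished wall costs $10,000.
print(cost_of_magic_bricks(1000, 0, 2, 0.01, 10_000, inspect=True))   # 2,000
print(cost_of_magic_bricks(1000, 0, 2, 0.01, 10_000, inspect=False))  # 100,000
print(1000 * 5)  # ordinary bricks at $5 each, no inspection needed: 5,000
```

Swap "magic brick" for "LLM output" and "brick expert" for "experienced lawyer reviewing the output", and the same arithmetic carries over.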

---

As you would have guessed by now, I'm not really talking about magic bricks.

I'm talking about LLM hallucinations, of course.³

At present, hallucinations are unavoidable. And more insidiously, we may not even know that the output is a hallucination.

But in our thought experiment, I did not suggest that we should never use the machine.

What I am suggesting, however, is that before we use the machine / LLM, we should consider whether:

a) the project that we are using the machine / LLM for can tolerate errors. For example, an unrepresented litigant using an LLM to generate a letter of demand may be able to live with a few errors in the letter. But for a practicing lawyer using an LLM to generate written submissions to be filed in Court,⁴ the level of tolerance for errors drops to zero; and

b) there is a system in place to catch errors - i.e. the brick expert. If an LLM is used to generate written submissions for Court, will the output be reviewed for errors by an experienced practicing lawyer who is familiar with this area of law?

---

So if we can eliminate hallucinations, does that mean that we can freely use LLM outputs without (a) worrying about errors, and (b) human review?

Sam Altman, the CEO of OpenAI, asserts that "we⁵ will get the hallucination problem to a much, much better place" within 1.5 or 2 years, and that by then, "we won’t still talk about these".⁶

But I suggest that getting the problem to a "much better place" is not the same as saying that it can, or will, be eliminated.

In part 6, we'll explore whether we can ever be sure that an LLM no longer hallucinates.

Disclaimer:

The content of this article is intended for informational and educational purposes only and does not constitute legal advice.

Footnotes:

¹ Part 1: https://www.linkedin.com/posts/khelvin-xu_robot-ai-llm-activity-7100325203108397056-Ghnn
Part 2: https://www.linkedin.com/posts/khelvin-xu_robot-llm-ai-activity-7102135406124548096-KPpB
Part 3: https://www.linkedin.com/posts/khelvin-xu_robot-llm-chatgpt-activity-7111997957616373760-vna5
Part 4: https://www.linkedin.com/posts/khelvin-xu_robot-llm-chatgpt-activity-7113371842815393792-2atP

² Say, an alien / fairy godmother / time traveller from the distant future.

³ In simple terms, LLMs often make up facts (and, when questioned, may even confidently assert that they are right). My colleague, Rajesh Sreenivasan, has a more direct (and controversial?) manner of referring to such inaccuracies.

I also make no comment as to the rate of hallucinations. It could be less, or more, than 1 in 100 outputs. But while a reduction in the hallucination rate may reduce the risks involved in using LLMs, and change what should be done to reduce risk, it does not change the underlying principle that so long as hallucinations continue to exist, there will be risks involved if LLM outputs are blindly used without human review and/or intervention.

Further reading on LLM hallucinations:

https://techcrunch.com/2023/09/04/are-language-models-doomed-to-always-hallucinate
https://machinelearningmastery.com/a-gentle-introduction-to-hallucinations-in-large-language-models
https://masterofcode.com/blog/hallucinations-in-llms-what-you-need-to-know-before-integration

⁴ Not that I condone this, mind you.

⁵ Although I'm not sure whether the "we" he's referring to is OpenAI, or humanity as a whole.

⁶ https://fortune.com/2023/08/01/can-ai-chatgpt-hallucinations-be-fixed-experts-doubt-altman-openai
