AI and copyright
Technology moves faster than the law. Always. And especially today, in relation to the rapid rise of Artificial Intelligence. One issue that is likely to prove highly problematic is the interaction between AI applications and the law of copyright.
Reduced to its absolute basics, copyright law in Australia works like this: a person who holds copyright in a literary, dramatic or musical work (either as the author or because they have acquired the copyright) holds the exclusive right to reproduce that work in a material form, or make an adaptation of that work. Anyone who reproduces a copyright work without permission infringes copyright, and may be liable to pay damages to the owner of the copyright.
This creates (at least) two issues for AI software, one concerning its inputs and the other concerning its outputs.
So far as the inputs are concerned, AI software operates by recognising and reproducing patterns in data. Feeding that data into the application, so that the software can learn those patterns, is a process known as “training”, and there’s a serious question as to whether that training amounts to a breach of copyright.
There’s no law relating directly to this – the law, remember, is much slower than the technology, and by the time it gets around to addressing this problem, there’ll be a new one. So the question needs to be approached by applying principles that exist in the present law, which has just about caught up with the idea of photocopying: does “training” an AI application amount to reproduction in a material form or an adaptation?
The answer to that question is likely to depend on the precise process used by each application. Many AI systems are trained by “scraping” data from the internet, and inevitably much of what they gather is material protected by copyright. In many cases, that process involves making a digital copy of the “scraped” material, and a digital copy is a “reproduction in a material form”. Since the amount of material scraped from the internet is so vast, it’s a practical impossibility to gather permissions from all the copyright holders, and even if it were possible, the cost of paying for that content would render the technology commercially unviable. So there’s a powerful argument that a breach of copyright occurs when an AI application is trained.
To take one example, which attracted a good deal of media attention, ChatGPT was recently invited to write a Nick Cave song. Presumably, the result was a song in which someone’s lover betrayed them and then someone died, slowly, in the dark – anyway, Nick Cave announced that it “sucked”. But the point is this: in order to write a Nick Cave song, ChatGPT needed to know what a Nick Cave song was like; to know that, it had to be trained on Nick Cave songs; and if that process involved the making of digital copies of Nick Cave’s lyrics or music, then Cave’s copyright may have been infringed.
The usual industry response to this is to rely on the defence of “fair dealing” (the Australian counterpart of the American doctrine of “fair use”). The Copyright Act sets out a variety of uses of copyright material that are deemed not to infringe copyright: a student can photocopy a specified portion of a book for the purpose of study (though not the whole book); a reviewer can quote from a work for the purpose of criticism; a satirist can parody a work without making an unauthorised adaptation. But it’s difficult to see how any of the statutory categories apply to what AI software does. Perhaps this is one reason why Shutterstock, which plans to sell AI-generated images, has said that it will set up a fund to compensate artists whose works are used to train its software.
For the time being, there’s no imminent change to the law, and the point remains untested before the courts, so all we can do is watch this space.
Nor are the outputs of AI free from copyright complications. The first question is: who owns the product? A machine can’t own copyright, and if a person has merely used a machine to generate a work, can he or she be said to be its author? Not a great deal turns on this, however, unless and until someone seeks to assert copyright over an entirely AI-produced work.
More dangerous territory is the risk that an AI output will infringe copyright. A copyright infringement does not need to be intentional: famously, George Harrison’s soporific (yet massive) hit My Sweet Lord was held to infringe the copyright in the Chiffons’ 1963 song He’s So Fine, even though the judge believed that Harrison could not recall the earlier song and that the plagiarism was “subconscious”. The test isn’t whether an attempt has been made to copy: it’s whether the copyright work has been substantially reproduced. Since AI software is trained by reproducing existing material, there’s a real risk that its outputs will substantially reproduce its inputs. Anecdotal evidence suggests that some applications substantially reproduce large amounts of copyright text available on the internet. And if the AI software hasn’t told you what its inputs were, you may have no way of checking whether the work it produces infringes copyright.
Again, all this remains to be tested in court. But, for now, users of AI should be very cautious about publishing its work as their own, unless they can be comfortably certain that no infringement will occur.