Thursday, 25 July 2024

NZ Fair Digital News Bargaining Bill: "The better underlying question seems to be why anyone thinks there's a problem here to be solved."


"David Harvey reports that AI scraping could wind up being part of the revised NZ Fair Digital News Bargaining Bill. ....
    "He [writes at length about] technical elements on whether the definitions work and whatnot.
    "The better underlying question seems to be why anyone thinks there's a problem here to be solved.
    "It's simple for a website to restrict against scraping. It would similarly be simple for a news site to licence its content for AI training, if anyone wanted to pay them enough to allow it. There is no obvious reason government needs to be involved in any of this."

~ Eric Crampton from his post 'Fun antitrust application'

4 comments:

Duncan Bayne said...

Actually, as someone who is professionally involved in licensing content to AI companies (for medical research ... it's a long story): there is a reason for Government to get involved.

Much of the content being scraped by AI is not licensed in a way that permits its use for training LLMs. If LLMs constitute a derivative work (derived from the scraped content), as I believe they do, this is a pretty blatant copyright violation.

Microsoft's AI CEO (yes that's a thing), Mustafa Suleyman, has stated that he doesn't think there's anything wrong with this:

"I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding."

Needless to say, that's not a popular opinion with copyright holders. It's also an odd position for an executive in a company that has historically been fiercely protective of its intellectual property.

What the Government needs to do is to enforce existing laws around intellectual property, as opposed to letting AI companies simply ignore them. It also needs to resolve the question of whether training an LLM constitutes creating a derivative work of the training material.

The latter is happening, through several court cases that will set very important precedents. As far as I can tell, though, most Governments are largely ignoring the former.

(Personal note: I'm generally skeptical of the philosophical underpinnings of intellectual property. But if you have intellectual property laws and an economy based in part upon them, you don't get a free pass to ignore those laws just because you're training AI models.)

Peter Cresswell said...

@Duncan, you make a good point. I'm *not* skeptical about intellectual property, and I should have made this point in a comment on the post. So-called AI is based on intellectual theft: sucking up others' intellectual content to regurgitate it, unattributed, without actual thought. Mr Suleyman is at least partially honest that that's what they're doing.

So I do think there's a role for govt to protect IP stolen by AI. But I don't think that's what the bill does.

And although Eric is an IP-atheist, he does point out that some organisations (Reddit, for example) already have mechanisms in place to bar others from scraping, or at least searching, their content.

Duncan Bayne said...

Yes, agreed - I should have made it clear that although I think there is a (large) role for legitimate Government action in this area, this Bill isn't it.

Re. mechanisms, yes. But they (a) post-date the creation of many training sets, (b) aren't being respected by all AI companies, and (c) require positive action on the part of organisations to protect their intellectual property (i.e. they make scraping opt-out, and cost time and money to implement).
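To make the opt-out nature of these mechanisms concrete, here is a minimal sketch of one of them, a robots.txt rule, checked with Python's standard-library parser. The robots.txt content, the site URL, and the "ExampleBrowser" agent name are hypothetical; GPTBot is used only as an example crawler name. Note that the check runs on the crawler's side: nothing forces a scraper to perform it, which is point (b) above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site: one named AI crawler is
# barred from everything, all other agents are allowed everywhere.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler is expected to run on
    itself; the site has no way to enforce it through robots.txt alone.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://news.example.com/story"  # placeholder URL
    print("GPTBot allowed:", may_fetch("GPTBot", url))
    print("ExampleBrowser allowed:", may_fetch("ExampleBrowser", url))
```

The site has to know each crawler's name and list it before the crawl happens, which is why a rule like this does nothing for content that was already collected into earlier training sets.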

Re. AI not actually being intelligent, the very best description I've read of how LLMs work is Jamie Zawinski's phrase "spicy autocomplete". It's literally what they do: probabilistic token generation. And it's all they do.

But! That can still be useful in the right circumstances. It's just woefully over-hyped and misunderstood by the public at large.
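For anyone wondering what "probabilistic token generation" looks like in miniature, here is a toy sketch. It is not how any real LLM is implemented: the vocabulary and the probabilities in the table are made up for illustration, and a real model computes its next-token probabilities with a neural network conditioned on the whole preceding context rather than looking them up from the last word.

```python
import random

# Toy next-token table. The tokens and probabilities are invented for
# illustration and are not taken from any real model.
NEXT_TOKEN_PROBS = {
    "the": {"news": 0.5, "bill": 0.3, "web": 0.2},
    "news": {"site": 0.6, "bill": 0.4},
    "bill": {"the": 1.0},
    "site": {"the": 1.0},
    "web": {"site": 1.0},
}


def generate(start, length, rng):
    """Extend [start] by `length` tokens, sampling each next token
    according to the probabilities listed for the current last token."""
    tokens = [start]
    for _ in range(length):
        options = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = rng.choices(
            list(options), weights=list(options.values()), k=1
        )[0]
        tokens.append(next_token)
    return tokens


if __name__ == "__main__":
    # A seeded generator makes the run repeatable.
    print(" ".join(generate("the", 8, random.Random(0))))
```

The output is always locally plausible, because every step picks a likely continuation, but nothing in the loop checks whether the result is true or meaningful.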

Duncan Bayne said...

Oh, and - some AI companies are so rude (term of art: http://catb.org/jargon/html/R/rude.html) that they cause downtime for site hosts:

https://chaos.social/@leah/112871670828981320