California’s A.B. 412: A Bill That Could Crush Startups and Cement A Big Tech AI Monopoly.

Tea@programming.dev · 1 day ago

California’s A.B. 412: A Bill That Could Crush Startups and Cement A Big Tech AI Monopoly.

NarrativeBear@lemmy.world · 1 day ago

AI owned by billionaires is something this world does not need. These should be accessible for anyone’s use.

Libra00@lemmy.world · 1 day ago

Nah this is bullshit, and I’m shocked to see it coming from the EFF. If you can’t build your ML model without stealing other peoples’ work to do it, don’t fucking do it. The purpose of IP law is to ensure that people who create are compensated for their work, and I never thought in a million years I’d see the EFF arguing against the protection of people over business.

Zarxrax@lemmy.world · 1 day ago

AI often gets painted as people vs businesses, but that’s not necessarily what it is in many cases. The EFF is arguing for fair use, which is something that they have stood for as long as I can remember. As the article argues, the businesses creating AIs can easily abide by this law, it’s the little guys training things that would be impacted the most.

Alphane Moon@lemmy.world · edit-2 1 day ago

Let’s say someone spends a decade plus on a small niche blog. The blog has decent readership and even modicum of commercial engagement in its niche.

Should I be allowed to openly use all the data on the blog to develop an AI powered AIBlog 2000 service that enables people to quickly and easily make SEO-optimized spam blogs (it wouldn’t be marketed that way, but that’s what it is) on a variety of topics; including the topic of the niche blog mentioned above?

Am I not giving the EFF enough benefit of the doubt? Is this more of a unique scenario that ignores the benefits of EFF’s approach?

What am I missing here?

Even_Adder@lemmy.dbzer0.com · 1 day ago

The fair use doctrine allows you to do just that. The alternative would be someone being able to publish a book and then shutting anyone else out of publishing, discussing, or building on their ideas without them getting a kick-back.

Alphane Moon@lemmy.world · edit-2 1 day ago

Not a legal expert, but this use case doesn’t seem very fair. Copying the content for a journalism class or for critique makes logical sense. You don’t need know anything about the details of a given legal doctrine to understand this.

This is just a tech-enabled copying device.

I strongly disagree with your analogy. Anyone can set up a blog covering the exact same niche topic; you would not have to give any kickback to anyone or ask for permission.

Am I missing something here?

Even_Adder@lemmy.dbzer0.com · 1 day ago

We’re saying the same thing here. It’s just your characterization of gen AI as a “tech-enabled copying device” isn’t accurate. You should read this which breaks down how all this works.

Alphane Moon@lemmy.world · 24 hours ago

I agree with the high level socio-political commentary around sectoral bargaining and the discussion around the technical and social limitations of copyright law.

I still disagree with the notion that developing AIBlog 2000 SEO-optimized slop generator falls under fair use (in terms of principles, not necessarily legal doctrine).

Academics programmatically going through the blog contents to analyze something about how perceptions of the niche topics changed. That sounds reasonable.

Someone creating a commercial review aggregation service that scraped the blog to find reviews and even includes review snippets (with links to the source) and metadata. Sure.

Spambot 3000, where the only goal is to leverage your work to shit out tech-enabled copies for monetization does not seem like fair use or even beneficial for broader society.

Perhaps the first two examples are not possible without the third one and we have to tolerate Spambot 3000 on that basis, but that’s not the argument that was provided in this thread.

Even_Adder@lemmy.dbzer0.com · 23 hours ago

One of the provisions of fair use is the effects on the market. If your spambot is really shitting up the place, you may very well run afoul of the doctrine.

AItoothbrush@lemmy.zip · 20 hours ago

Who cooked so hard on some random articles thumbnail?

Alphane Moon@lemmy.world · edit-2 18 hours ago

Can’t speak for the relative merits of the bill. To be honest it doesn’t really matter, since it’s a bad idea to use any American services, be it from big tech or from startups.

However, I do have issues with the characterization of small startups leveraging “AI” in the article. Vast majority of startups add “AI powered” both as consumer marketing and a fundraising method. Even if they do actually use ML powered features, it is likely these features would simply be part of their package and marketed something along the lines of “automated recommendation for configuring [X]”. Many such features cannot even leverage public works since startups tend to focus on more niche use use cases of ML tech since it’s difficult to competing around something like LLMs.

Something about their framing of startups just sounds off.

Zarxrax@lemmy.world · 1 day ago

Anyone have a link to the text of the bill?

Ebby@lemmy.ssba.com · edit-2 1 day ago

Wow, EFF. You’ve been a beacon of light in countless fights, but I did a doubletake on this article. Are you really implying that simply being on the internet is subject to business free-for-all?

I had to have read that wrong. It is absolutely the responsibility of any creative business to track and audit all copyrighted works used in deliverables.

AI, being the business of scooping up massive amounts of data, should absolutely have some sort of metadata log referencing copyrighted works. This is not the burden of small business, but standard practice for AI.

*AI is like reading and should be fair use

No, it certainly is not. Creating a compressed efficient database for search engines to reference and point users is fair use. Using that database to generate new work is not. AI is inherently generative.

BumpingFuglies@lemmy.zip · 1 day ago

Spoken like someone who either didn’t read the article or has a deep misunderstanding of what AI training is.

Ebby@lemmy.ssba.com · edit-2 1 day ago

Enlighten me. I hope I read it wrong.

It sounds like the EFF is advocating stripping/ignoring copyright information (as is currently done) when generating LLM’s to ease burden of small startups tracking down copyright owners. Something I had to do in productions and yeah, it sucked, but it’s how it works. (Radio is a tad different)

BumpingFuglies@lemmy.zip · 5 hours ago

It’s saying that copyright law doesn’t apply to AI training, because none of the data is copied. It’s more akin to a person reading an impossible amount at an impossible speed, then using what they read as inspiration for their own writing. Sure, you could ask an LLM trained on, say, Edgar Allen Poe’s works to recite the entirety of The Raven, but it can only “recall” similarly to a human, and will have just as many mistakes (probably more, really) in its recitation as a human would.

Even_Adder@lemmy.dbzer0.com · 1 day ago

I recommend reading this article by Cory Doctorow, and this one by Katherine Klosek, the director of information policy and federal relations at the Association of Research Libraries.

Ebby@lemmy.ssba.com · 1 day ago

The first article has some good points taken very literally. I see how they arrive at some conclusions. They break it down step by step very well. Copyright is merky as hell, I’ll give them that, but the final generated product is what’s important in court.

The second paper, while well written, is more of a press piece. But they do touch on one important part relevant to this conversation:

The LCA principles also make the careful and critical distinction between input to train an LLM, and output—which could potentially be infringing if it is substantially similar to an original expressive work.

This is important because a prompt “create a picture of ____ in the style of _____” can absolutely generate output from specific sampled copyright material, which courts have required royalty payments in the past. An LLM can also sample a voice of a voice actor so accurately as to be confused with the real thing. There have been Union strikes over this.

All in all, this is new territory, part of the fun of evolving laws. If you remove the generative part of AI, would that be enough?

Even_Adder@lemmy.dbzer0.com · 1 day ago

The funny part is most of the headlines want you to believe that using things without permission is somehow against copyright. When in reality, fair use is a part of copyright law, and the reason our discourse isn’t wholly controlled by mega-corporations and the rich. It’s sad watching people desperately trying to become the kind of system they’re against.

Ebby@lemmy.ssba.com · edit-2 20 hours ago

Fair use is based on a four-factor analysis that considers the purpose of the use, the nature of the copyrighted work, the amount used, and the effect on the market for the original work.

It is ambiguous, and limited, tested on a case-by-case basis which makes this time in Copyright so interesting.

Amoxtli@thelemmy.club · 1 day ago

California needs rich people to tax.