Having worked in licensing and the commercial arts for the better part of my career, and in digital innovation in the latter half, data provenance and IP are the major issues for me. LLMs and diffusion models are both stores of expressive content with no capability for compliance with privacy or property law, by design (no machine unlearning), and with direct market-replacement intent, again by design. They are distributed under terms of use for supposed research purposes that are predictably and demonstrably broken the instant they are released. Or worse, under permissive commercial terms despite no license ever having been acquired for the underlying property. All with proven, massive, and ongoing market harm to the rights holders of the underlying works.

This fails the three-step test and the four US fair use factors by default, and is patently absurd even at face value: how can property, displayed for human consumption, in any way, shape, or form be seen as fair game for unlicensed for-profit use?

Moral rights to informed consent and attribution were violated. Exclusive commercial exploitation rights were violated. These are foundational rights across 193 countries.

It’s the intellectual property rights heist of the century, and every professional rights holder organization protests.


It is interesting that both the Biden Administration's EO and the EU's AI Act use computing power as a risk threshold. For now, that mostly flags closed foundation models as "riskier," but with improvements in training efficiency and reductions in parameter count, we are likely to see very advanced open models fall below those thresholds.


Drs. Kapoor and Narayanan, apologies if I missed it, but have you published a position and plan on the issue of Substack platforming and monetizing Nazis?
