AI companies, including OpenAI, face numerous copyright infringement cases, many of them stemming from the use of pirated databases such as LibGen to train AI models. Internal communications, including emails and Slack messages, have revealed discussions about deleting LibGen data, prompting plaintiffs to seek access to OpenAI's legal communications under the “crime-fraud” exception to attorney-client privilege. If successful, this could expose OpenAI to claims of “willful infringement,” which raise potential statutory damages to as much as $150,000 per work. OpenAI counters that the decision-making behind the data removal is protected by attorney-client privilege. In earlier cases, internal communications at other tech giants such as Meta similarly revealed awareness that pirated data was being used. A ruling in California in June found that Anthropic did not infringe copyright, but only with respect to books it had legally acquired; a separate trial was set for works obtained from pirated datasets. Anthropic subsequently settled the class-action claims for $1.5 billion.