Protecting Academics’ Intellectual Property from Artificial Intelligence Models: A Free-Market Perspective
New laws are likely to establish ownership of, and transparency around, the data used to train generative AI models. In the meantime, there are a few steps that researchers can take to protect their intellectual property (IP) and safeguard sensitive data.
The paper argues for a “free market” approach in which data originators and artificial-intelligence companies negotiate directly. Other guidelines are less clear-cut. For example, the alliance suggests five potential compensation structures to ensure that creators and rights holders are paid appropriately for their data. These include a subscription-based model, “usage-based licensing”, in which fees are paid per use, and “outcome-based” licensing, in which royalties are tied to profit. Bestall says that these structures could apply to many kinds of content.
Academics today have little recourse in directing how their data are used or having them ‘unlearnt’ by existing AI models6. It is also harder to litigate the misuse of published papers than that of a piece of music or a work of art. Ben Zhao, a computer-security researcher at the University of Chicago in Illinois, says that most opt-out policies “are at best a hope and a dream”. And many researchers don’t even own the rights to their creative output, having signed them over to institutions or publishers, which in turn can enter partnerships with AI companies seeking to use their corpus to train new models and create products that can be marketed back to academics.
It’s only now that international policy is catching up with the new technology, and clear answers to questions such as who owns copyright and what kinds of data companies can use in their models will be years away. “We are now in this period where there are very fast technological developments, but the legislation is lagging,” says Christophe Geiger, a legal scholar at Luiss Guido Carli University in Rome. “The challenge is how we establish a legal framework that will not disincentivize progress, but still take care of our human rights.”
Tudorache sees the act as an acknowledgement of a new reality in which AI is here to stay. “We’ve had many other industrial revolutions in the history of mankind, and they all profoundly affected different sectors of the economy and society at large, but I think none of them have had the deep transformative effect that I think AI is going to have,” he says.
Academics often sign their IP over to institutions or publishers, giving them less leverage in deciding how their data are used. But Christopher Cornelison, the director of IP development at Kennesaw State University in Georgia, says it’s worth starting a conversation with your institution or publisher if you have concerns. And if noncompliance with a licensing agreement seems likely, these entities should be able to pursue litigation. “We certainly don’t want an adversarial relationship with our faculty, and the expectation is that we’re working towards a common goal,” he says.
Scientists can now detect whether visual products, such as images or graphics, have been used in a training set, and they have developed tools that ‘poison’ data so that models trained on them break. One such tool, Nightshade, corrupts images so that, when it is applied correctly, an artificial-intelligence model trained on them learns to associate a given pattern with the wrong type of image; Zhao says such tools, in effect, teach the models that a cow is something with four wheels and a fender. Unfortunately, there are not yet similar tools for poisoning writing.
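Nightshade’s pixel-level perturbations are far more sophisticated, but the underlying idea, in which subtly altered and mislabelled samples drag a model’s learnt associations in the wrong direction, can be sketched in a few lines. The Python snippet below is a deliberately crude illustration using made-up feature vectors rather than real images, and it is not Zhao’s algorithm: poisoned samples that resemble cars but are labelled ‘cow’ shift a toy model’s notion of a cow towards cars.

```python
# Toy illustration of training-data poisoning (NOT Nightshade's method).
import numpy as np

rng = np.random.default_rng(seed=0)
DIM = 64  # each toy "image" is a 64-dimensional feature vector

def sample(centre, n):
    """Draw n toy images clustered around a class centre."""
    return centre + 0.1 * rng.standard_normal((n, DIM))

cow_centre = rng.standard_normal(DIM)
car_centre = rng.standard_normal(DIM)

# Clean training data: correctly labelled cows and cars.
X = np.vstack([sample(cow_centre, 100), sample(car_centre, 100)])
y = ["cow"] * 100 + ["car"] * 100

# Poison: 150 extra samples that *look* like cars but are labelled "cow".
X = np.vstack([X, sample(car_centre, 150)])
y = np.array(y + ["cow"] * 150)

# "Train" a toy generator: the prototype image for each label is simply the
# mean of all training images carrying that label.
prototype = {label: X[y == label].mean(axis=0) for label in ("cow", "car")}

# Ask the poisoned model for a cow: its idea of a cow has drifted towards cars.
generated_cow = prototype["cow"]
closer_to_cars = np.linalg.norm(generated_cow - car_centre) < np.linalg.norm(
    generated_cow - cow_centre
)
print("poisoned model's 'cow' looks more like a car:", closer_to_cars)  # True
```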
Specialists broadly agree that it is nearly impossible to completely shield your data from web scrapers, the tools that extract data from the Internet. But steps such as hosting data locally on a private server, or making resources open but available only on request, can add an extra layer of oversight. Several companies, including OpenAI, Microsoft and IBM, allow customers to create their own chatbots, trained on their own data, which can be isolated in this way.
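As a rough sketch of what ‘open, but only on request’ hosting might look like in practice, the minimal Python server below releases a file only when a pre-shared access token is supplied, and refuses requests from self-identified AI crawlers such as GPTBot (OpenAI’s) and CCBot (Common Crawl’s). The token, port and blocked user agents here are illustrative choices rather than recommendations from the researchers quoted, and user-agent filtering only deters crawlers that identify themselves honestly, which is one reason specialists say data can never be completely shielded.

```python
# Minimal sketch of "open, but only by request" hosting on a private server.
from http.server import BaseHTTPRequestHandler, HTTPServer

ACCESS_TOKEN = "change-me"             # shared with collaborators on request
BLOCKED_AGENTS = ("GPTBot", "CCBot")   # self-identified AI-crawler user agents

class GatedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        token = self.headers.get("X-Access-Token", "")  # illustrative header

        # Turn away crawlers that announce themselves as AI data collectors.
        if any(bot in agent for bot in BLOCKED_AGENTS):
            self.send_error(403, "AI crawlers are not permitted")
            return
        # Serve data only to requesters who have been given the token.
        if token != ACCESS_TOKEN:
            self.send_error(401, "Data are available on request only")
            return

        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"example dataset contents\n")

if __name__ == "__main__":
    # Bind to localhost only; expose deliberately (e.g. via VPN) if needed.
    HTTPServer(("127.0.0.1", 8000), GatedHandler).serve_forever()
```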
Abstaining from using generative AI altogether might not be a bad idea. For certain disciplines, especially those that involve sensitive data, giving it a miss is the more ethical option. There are still constraints on the use of such models in health-care settings, for example, because there is no good way of making them forget what they have learnt.
Some publishers have brokered deals with AI companies. Taylor & Francis, for example, has a US$10-million agreement with Microsoft. Cambridge University Press (CUP) has not yet entered any partnerships, but is developing policies that will offer authors an ‘opt-in’ agreement under which they will receive remuneration. CUP’s managing director of academic publishing, who is based in Oxford, UK, said in a statement to The Bookseller that further details of the company’s plans would follow.
Representatives of the publishers Springer Nature, the American Association for the Advancement of Science (which publishes the Science family of journals), PLOS and Elsevier say they have not entered such licensing agreements, although some, including those for the Science journals, Springer Nature and PLOS, noted that the journals do disclose the use of AI in editing and peer review and to check for plagiarism. (Nature’s news team is editorially independent of its publisher, Springer Nature.)
Two studies this year found evidence that researchers are widely using artificial intelligence to write scientific manuscripts and peer-review comments, even as publishers try to limit its use. Legal scholars and researchers who spoke to Nature said that when academics use a chatbot, they open themselves up to risks that they might not anticipate. “People who are using these models have no idea what they’re really capable of, and I wish they’d take protecting themselves and their data more seriously,” says Zhao, who develops tools to shield creative work, such as art and photography, from being scraped or mimicked by AI.
When contacted for comment, an OpenAI spokesperson said that the company was looking into ways to improve its opt-out process. The spokesperson said that artificial intelligence brings huge benefits to science, that OpenAI recognizes some academics do not want their publicly available works used to help train its models, which is why it offers ways for them to opt out, and that the company is exploring what other tools might be useful.
Artificial intelligence as a tool to study biodiversity: Timothée Poisot’s work at the University of Montreal
Chatbots are powerful in part because they have learnt from nearly all the information on the Internet — obtained through licensing agreements with publishers such as the Associated Press and social-media platforms including Reddit, or through broad trawls of freely accessible content — and they excel at identifying patterns in mountains of data. For example, the GPT-3.5 model, which underlies one version of ChatGPT, was trained on roughly 300 billion words, which it uses to create strings of text on the basis of predictive algorithms.
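The predictive principle itself can be demonstrated at a toy scale. The sketch below stands in for no particular model: it counts which word follows which in a tiny hand-written corpus and then generates text by repeatedly emitting the most likely next word. Chatbots such as ChatGPT do this with neural networks trained on billions of words rather than with a frequency table, but the idea of producing text by predicting what comes next from observed patterns is the same.

```python
# Toy next-word prediction: a bigram frequency model over a tiny corpus.
from collections import Counter, defaultdict

corpus = (
    "the model learns patterns in data and the model predicts the next word "
    "from patterns in data"
).split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def generate(start, length=8):
    """Generate text by always choosing the most frequent next word."""
    words = [start]
    for _ in range(length):
        options = following.get(words[-1])
        if not options:          # no observed continuation: stop early
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))  # e.g. "the model learns patterns in data and the model"
```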
“There’s an expectation that the research and synthesis is being done transparently, but if we start outsourcing those processes to an AI, there’s no way to know who did what and where the information is coming from and who should be credited,” says Poisot.
Timothée Poisot, a computational ecologist at the University of Montreal in Canada, has made a successful career out of studying the world’s biodiversity. Poisot hopes that his work will be taken into account by the UN Convention on Biological Diversity when decisions are made later this year. “Every piece of science we produce that is looked at by policymakers and stakeholders is both exciting and a little terrifying, since there are real stakes to it,” he says.