
TikTok owner ‘scraping’ UK news sites to train ChatGPT rival



The Chinese owner of TikTok has been accused of using UK news sites to train up its rival to ChatGPT without permission or fair payment.

Publishers including The Guardian, Daily Mail and The Telegraph are believed to have been targeted by a bot operated by the Beijing-based tech giant Bytedance.

The company has said its bot, dubbed Bytespider, has been deployed for “search optimisation” purposes.

However, news organisations fear that their articles are being used without permission to train chatbots, in possible violation of copyright.

Publishers also complained of a lack of transparency around how Bytedance’s bot works, which has left them unable to block it.

Media outlets including the BBC, Guardian, New York Times and CNN have blocked ChatGPT-maker OpenAI from trawling their sites because of copyright concerns.

This can be done through the so-called “robots.txt” file, which tells web crawlers which parts of a site they are allowed to visit.

While it is believed that Bytespider abides by robots.txt files, the exact code required to activate the block is not known.
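To illustrate how such a block works in general, here is a minimal sketch using Python's standard `urllib.robotparser` module. The crawler name `ExampleBot`, the sample rules and the `example.com` URLs are all hypothetical; in particular, this does not show the rule Bytespider responds to, which, as noted above, is not known.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: bans one named crawler from the whole site
# and keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story"
    print(is_allowed(ROBOTS_TXT, "ExampleBot", story))  # named crawler
    print(is_allowed(ROBOTS_TXT, "OtherBot", story))    # any other crawler
```

A rule of this kind only works if the publisher knows the name the crawler announces and if the crawler chooses to honour the file, since robots.txt is a voluntary convention and not a technical barrier.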

Owen Meredith, chief executive of the News Media Association, said: “This is yet further demonstration of how big tech firms ride roughshod over creators’ and rights holders’ IP to take their content and extract value without permission, notification or transparency.

“No one should have to accept the wholesale theft of their content this way.”

Bytedance is reportedly gearing up for its own foray into artificial intelligence (AI) after similar moves by companies including OpenAI, Google and Microsoft.

The Chinese tech giant is said to be developing an open platform that will allow users to create their own chatbots.

In its submission to a House of Lords inquiry on AI, The Guardian said the lack of transparency highlighted why publishers should be able to opt in to web scraping, rather than being forced to opt out.

Industry sources said similar suspicious bot activity had been reported by the Independent, as well as local news publishers including National World and Tindle.

The rapid emergence of AI has sparked alarm among news outlets and other creative organisations amid growing evidence that tech giants have used intellectual property without permission.

The Daily Mail is currently gearing up for a legal battle with Google over claims the company used hundreds of thousands of its online news stories to train the Bard chatbot.

Tech companies have insisted that their use of copyrighted material is justified under fair use clauses.

But creative organisations have hit back at these claims, arguing that their intellectual property is being used for commercial gain.

The row deepened after executives at Google appeared to suggest that they were entitled to make use of content as long as it was not behind a paywall.

Free-to-access publishers such as The Guardian have warned this approach puts the principles of the open web at risk.

The Guardian added: “We should be very clear that even if these datasets were scraped on the basis of non-profit exploitations, the reality is that they have been used to advantage some of the most wealthy and powerful technology businesses that have ever existed.”

A spokesman for Bytedance declined to comment.

