An update to Google’s privacy policy suggests that the entire public internet is fair game for its AI projects. If Google can read your words, assume they belong to the company now, and expect that they’re nesting somewhere in the bowels of a chatbot.
There has always been a symbiotic relationship between search engines and content site owners. The deal being: “I (site owner) will let you index my site to make it easy for people to find my content. In exchange, you (Google) can make money by building user profiles and selling targeted advertising.” Conceptually this is no different, except that Google is now using the data to build new applications and businesses: AI rather than ads.
I believe that Google does respect robots.txt (though the file needs to be correctly written and placed at the root of the site), so it’s relatively easy for site owners to opt out of being indexed. Whether indexing should be opt-out (as opposed to opt-in) in the first place is perhaps the key question, one I’d argue should have been discussed 20 or 30 years ago.
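For anyone unfamiliar with how the robots.txt opt-out works mechanically, here's a rough sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the `example.com` URL, the `SomeOtherBot` agent, and the `crawler_allowed` helper are all made up for illustration. This only shows how a well-behaved crawler *could* interpret the file. Google runs its own parser, which may differ in details (for example, how conflicting Allow/Disallow rules are prioritized), and robots.txt is purely advisory either way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com: block Googlebot from the whole
# site, but let every other crawler fetch anything.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: does robots_txt permit user_agent to fetch url?"""
    parser = RobotFileParser()
    # parse() takes an iterable of lines instead of fetching the file over HTTP.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        allowed = crawler_allowed(ROBOTS_TXT, agent, "https://example.com/blog/post")
        print(agent, "allowed" if allowed else "blocked")
```

Running it should report Googlebot as blocked and the other bot as allowed. That is the whole opt-out mechanism: a plain-text file at the site root that crawlers are trusted to honor.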
Or they could maliciously opt in by including lots of hidden garbage text to poison the scrapers.