Can I ask what that means? The bots'll waste their time plowing through a 10gb file of nothing? Would that screw up the databases the bots are building, or just delay them? Also curious if that causes more energy consumption than would otherwise happen? Apologies for the overload of ?s and thanks!!!
To train a model, you take 10k pics of blue jays and tell the model it's a folder entirely of blue jay pics. It scans for the common denominator across all the pics and calls that a blue jay. If some pics don't have a blue jay, the model's accuracy will drop.
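Here's a toy sketch of that "common denominator" idea. The feature names and the 90% cutoff are made up for illustration; real training doesn't work on literal feature sets, but the effect of mislabeled pics is the same:

```python
# Toy sketch: a "model" that learns whatever feature appears in nearly every
# image of a labeled folder. Feature names and the 90% threshold are invented.

def learn_blue_jay_feature(folder, threshold=0.9):
    """Return a feature present in at least `threshold` of the folder's images."""
    counts = {}
    for img in folder:
        for feature in img:
            counts[feature] = counts.get(feature, 0) + 1
    for feature, c in counts.items():
        if c / len(folder) >= threshold:
            return feature
    return None  # no reliable common denominator found

# each "image" is just a set of visible features
jay = {"blue_crest", "wings", "beak"}
cat = {"whiskers", "fur"}

clean_folder = [set(jay) for _ in range(100)]
# same folder, but 30% of the "blue jay" pics are actually cats
poisoned_folder = [set(jay) for _ in range(70)] + [set(cat) for _ in range(30)]

print(learn_blue_jay_feature(clean_folder))     # one of the jay features
print(learn_blue_jay_feature(poisoned_folder))  # None: nothing clears 90% anymore
```

With the clean folder the model locks onto a jay feature; with 30% garbage mixed in, no single feature is common enough and the "learned" concept falls apart.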
Once the bad data is discovered, the model can likely be retrained from the point where it was introduced. It's also likely to become an arms race between scraping methods and the scraper-detection that triggers the redirect. As for energy: yes, in the sense that hours of AI processing get wasted on worthless data and have to be redone.
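The "retrain from the point of introduction" part looks roughly like this. The step numbers, and the assumption that you even know when the poison entered, are made up for the sketch:

```python
# Toy sketch of rolling back to the last checkpoint saved before bad data
# entered training. Step numbers and the known poison step are assumptions.

checkpoints = [0, 1000, 2000, 3000]   # training steps where the model was saved
bad_data_step = 1500                  # step where garbage data entered (assumed known)

# roll back to the latest checkpoint taken before the bad data appeared,
# then resume training from there on the cleaned dataset
rollback_step = max(step for step in checkpoints if step < bad_data_step)
print(rollback_step)  # 1000
```

Everything trained between step 1000 and wherever the poison was caught is the wasted compute: it gets thrown away and redone.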
It depends how far it gets in the process before anyone notices. If it's not discovered and millions of sites are feeding it garbage data, the model could be ruined. Something would have to go through the data to find the garbage, or they'd have to scrape everything all over again to get good training data.
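"Something would have to go through the data" might look like a filtering pass over the scraped corpus. This is a deliberately crude heuristic; the 0.5 cutoff and the "word-like token" rule are arbitrary assumptions, not a production cleaner:

```python
# Toy garbage filter: flag scraped docs where too few tokens look like real
# words. The cutoff and the word-like rule are arbitrary illustration values.

def looks_like_garbage(text, min_wordlike_ratio=0.5):
    tokens = text.split()
    if not tokens:
        return True
    wordlike = sum(1 for t in tokens if t.isalpha() and 2 <= len(t) <= 15)
    return wordlike / len(tokens) < min_wordlike_ratio

docs = [
    "Blue jays are songbirds native to North America",
    "x9f3 qq0z 88ab zz0 j2k9 0x7f",   # the kind of 10gb nothing-file above
]
kept = [d for d in docs if not looks_like_garbage(d)]
print(kept)  # only the real sentence survives the filter
```

The catch, as above, is cost: every filtering pass over a web-scale corpus is itself a big compute bill, which is exactly why feeding scrapers garbage hurts.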