
Greg Pak

@gregpak.bsky.social

Can I ask what that means? The bots'll waste their time plowing through a 10gb file of nothing? Would that screw up the databases the bots are building, or just delay them? Also curious if that causes more energy consumption than would otherwise happen? Apologies for the overload of ?s and thanks!!!



Ch0c0L4t3m1Lk @ch0c0l4t3m1lk.bsky.social

To train a model you take 10k pics of blue jays and tell the model it's a folder entirely of pics with blue jays in them. It scans them and looks for the common denominator across all the pics and calls that a blue jay. If some of the pics don't actually have a blue jay, the model's accuracy will drop.
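
A toy sketch of that folder-labeling idea, with everything made up for illustration: synthetic feature vectors stand in for the photos, class 1 means "blue jay", and the poison rates and nearest-neighbor model are assumptions, not anyone's real pipeline. The point it demonstrates is just that junk dropped into the labeled folder lowers accuracy on clean test pics.

```python
# Toy demo: poison a "blue jay" training folder and watch accuracy drop.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Fake "image features": class 1 = blue jay, class 0 = anything else.
X = np.vstack([rng.normal(0.0, 1.0, (5000, 20)),   # not blue jays
               rng.normal(1.5, 1.0, (5000, 20))])  # blue jays
y = np.array([0] * 5000 + [1] * 5000)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

for poison_rate in (0.0, 0.25, 0.5):
    y_poisoned = y_train.copy()
    # Poison the "blue jay folder": relabel a fraction of the
    # non-blue-jay pics as blue jays.
    not_jay = np.flatnonzero(y_poisoned == 0)
    flipped = rng.choice(not_jay, int(poison_rate * len(not_jay)),
                         replace=False)
    y_poisoned[flipped] = 1
    model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_poisoned)
    acc = model.score(X_test, y_test)  # accuracy measured on clean labels
    print(f"{poison_rate:.0%} junk in the folder -> test accuracy {acc:.3f}")
```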



Ch0c0L4t3m1Lk @ch0c0l4t3m1lk.bsky.social

The model can likely be retrained from the point where the bad data was introduced, once it's discovered. It's also likely to become a war between the scraping methods and the detection that triggers the redirect. It would mean more energy in the sense that hours of AI processing get wasted on worthless data and have to be redone.
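
A minimal sketch of that "retrain from the point the bad data got in" idea, assuming the training job keeps periodic weight checkpoints; the toy linear model, the batch where poisoning starts, and the checkpoint interval are all made up for illustration, not a real trainer. The batches redone after the rollback are the wasted energy mentioned above.

```python
# Toy demo: periodic checkpoints let you roll back to before the poison.
import numpy as np

rng = np.random.default_rng(1)
weights = np.zeros(8)
checkpoints = {}  # batch index -> copy of the weights at that point

def train_batch(w, batch_id):
    X = rng.normal(size=(32, 8))
    y = X @ np.ones(8) + rng.normal(scale=0.1, size=32)  # clean signal
    if batch_id >= 600:                                   # poisoned scrape
        y = rng.normal(size=32)                           # garbage targets
    grad = X.T @ (X @ w - y) / len(y)
    return w - 0.01 * grad

for batch_id in range(1000):
    if batch_id % 100 == 0:
        checkpoints[batch_id] = weights.copy()  # saved before training it
    weights = train_batch(weights, batch_id)

# Poisoning discovered: restore the last checkpoint saved before the
# poisoned batches were trained, then redo those batches on a cleaned feed.
rollback = max(b for b in checkpoints if b <= 600)
weights = checkpoints[rollback].copy()
print(f"rolled back to checkpoint at batch {rollback}; "
      f"redoing {1000 - rollback} batches on cleaned data")
```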



Ch0c0L4t3m1Lk @ch0c0l4t3m1lk.bsky.social

It depends how far it gets in the process; that determines the impact. If it's not discovered and millions of sites are feeding it garbage data, then the AI model would be ruined. Something would have to go through the data to find the garbage, or they'd have to scrape all over again for good training data.
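
A minimal sketch of "going through the data to find the garbage", assuming a crude compressibility check would catch tarpit files like the 10 GB file of nothing from the original question; the threshold and the sample blobs are assumptions, not a real cleaning pipeline.

```python
# Toy demo: flag scraped blobs that are suspiciously compressible,
# i.e. highly repetitive filler rather than real content.
import zlib

def looks_like_junk(data: bytes, min_ratio: float = 0.05) -> bool:
    """Flag blobs whose compressed size is a tiny fraction of the
    original -- e.g. endless runs of the same byte."""
    if not data:
        return True
    return len(zlib.compress(data)) / len(data) < min_ratio

scraped = {
    "real_page.html": b"<html><body>Blue jays cache acorns in autumn.</body></html>",
    "tarpit_file.bin": b"\x00" * 10_000_000,  # stand-in for the 10 GB of nothing
}
for name, blob in scraped.items():
    verdict = "drop" if looks_like_junk(blob) else "keep"
    print(f"{name}: {verdict}")
```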
