Hacker News

39 Comments:
mgamache said 14 days ago:

Don't mean to be snarky, but this is not how easy it is to get ad revenue. It's how easy it is to get approved for ad networks. She didn't even get AdSense. I find it completely unremarkable that anyone could set up a non-adult site that has human-generated content and get ads placed. The traffic needed for real ad revenue is a different story. I bet that site gets close to zero traffic (not even enough to cover hosting). SEO (black-hat or whatever) is the trick IMO, not getting ad revenue. Plagiarized new domains get no weight in the Google engine.

CM30 said 14 days ago:

Honestly, it'd be pretty interesting to see an article like this which continues after the 'approved for ad networks' part and shows how such a site could rank in Google, do well on social media sites, etc.

Could be interesting to see how scammers are doing that, and lead to some potentially interesting insights about black hat SEO, social media marketing, targeted ads, etc.

Because yeah, as you said, getting approved by an ad network is only part of the story, and not very much of it at that.

berbec said 14 days ago:

It may be a low bar, but being able to automate this (a site-scraping Ansible playbook?) makes the effort required near-nil. They only have to clear $50 to pay for a lot of domains and hosting.

notahacker said 14 days ago:

Sure but you need thousands of legitimate-seeming pageviews to get that $50 back, and the networks - even or especially the bottom tier ones - are likely to be hotter on click fraud than scraped content.

mgamache said 13 days ago:

You would have to get ~16,000 pageviews to make $50 (assuming a $3.00 CPM, which would be low for AdSense but not for these second-tier networks).
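The estimate above follows from the definition of CPM (revenue per 1,000 impressions). A minimal sketch of the arithmetic, assuming one paid ad impression per pageview and a 100% fill rate (both simplifications that are not stated in the thread); `pageviews_needed` is a hypothetical helper name:

```python
import math


def pageviews_needed(target_revenue: float, cpm: float) -> int:
    """Return the pageviews required to earn target_revenue at a given CPM.

    Assumption: each pageview yields exactly one paid impression,
    so revenue = pageviews / 1000 * cpm.
    """
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return math.ceil(target_revenue / cpm * 1000)


if __name__ == "__main__":
    # $50 target at a $3.00 CPM, the figures used in the comment above.
    print(pageviews_needed(50.0, 3.0))
```

With those inputs the exact figure is a little above 16,000, so the rounded number in the comment is in the right ballpark. A lower CPM or unfilled impressions would push the requirement higher.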

luckylion said 13 days ago:

And 16,000 page views without unique content and links might happen, but most likely not within a year or five.

If you're lucky, you trigger something in Google's black box and they rank your site better than others at the same level, but you'll still only do long tail, and even on long tail, you'll compete with the original source of the article, which has a billion links pointing to its domain. Since you'll also need to go for quantity, you'll have a giant number of pages as well, which will not help you even with niche rankings.

I doubt that the site would pull in 10 actual, human visitors per day on average with just scraped content.

terrycody said 13 days ago:

+1

It's really not easy to get ad revenue. Scraped websites can't rank well or get enough visitors. Of course one could build a click-fraud system, but anyone capable of that can easily find more interesting things to do than chase peanuts.

amelius said 13 days ago:

If the pirate bay can run on ad money, then so can plagiarized news sites, I suppose.

mrtksn said 13 days ago:

It’s a dirty business, no cost is too great for eyeballs.

In Turkish, whatever you search, the first few pages of results are from the largest Turkish news outlets, because they SEO’ed for everything and Google doesn’t care.

Do you want to learn how to renew your driver license? Good luck with that because your search results will bring you a wall of text articles that are almost the same for every search term.

“Lately people started to ask themselves how to renew their driver's license. But do they consider the risks of renewing drivers licenses? Experts agree that renewing the driver's license can be a complicated thing. Now strap on and get ready to learn how to renew your driver's license”

Imagine having pages like that on CNN, BBC and others. They are the top result for so many searches.

Plagiarism of news, on the other hand, is more nuanced IMHO. There’s nothing stopping you from saying “NBC reports that” anyway. As per the article, you cannot use their assets, but you can create or even generate articles about the news based on the news.

The ad business is dirty. I’m almost proud of blocking ads.

mprev said 13 days ago:

In the U.K. it’s supermarket opening times, especially near holidays.

Google for “Aldi opening times Easter Sunday” and you’ll get articles from the lower quality newspaper websites.

It’s pathetic.

mrtksn said 13 days ago:

Oh, definitely. Especially in these pandemic days, that was something I tried and failed at. On the Turkish web, apparently the news outlets gave up any hope of respect and now every single one of them is doing it. The biggest ones, the leftie ones, the right-wing ones, the ones cozy with the government. All of them.

No way Google isn't aware of this, there's a local Google office in Turkey, they have a large presence and full Turkish language support on most of the products.

Maybe it's simply part of the business model now. If a supermarket wants people to find their opening times, maybe they should buy an ad placement. There's no money in the high-quality organic search results I guess.

gokhan said 13 days ago:

I'm watching this exact SEO problem in Turkey and I can say that it was in full force way before the pandemic. Google doesn't care. Instead, they're busy flagging pages discussing "penisilin (penicillin in Turkish) application for kids" as an AdSense Policy Violation since the page contains "penis".

propter_hoc said 14 days ago:

I suspect one of the hard parts of this for Google is that many news sites legitimately publish the same articles because of wire services and correspondence arrangements, like AP and Reuters. Hard to tell whether the new site is plagiarizing or syndicating.

wobbly_bush said 13 days ago:

Don't they always include the source as AP or Reuters in the body of text somewhere?

ryanwaggoner said 13 days ago:

You can copy that text too...

bufferoverflow said 14 days ago:

Doesn't say how much she made from it. Guess: very close to zero, if not zero.

robocat said 13 days ago:

She did say: “I didn’t want to be taking ad revenue from legitimate advertisers, so I only briefly activated advertisements from the partners to see what surfaced and to take a few screenshots.”

benburleson said 14 days ago:

Including hosting? I'm sure it's actually in the red.

xwdv said 13 days ago:

If you include the hit to your professional reputation from actually plagiarizing a news site for revenue and getting blacklisted from the industry, then what did it cost? Everything.

waheoo said 13 days ago:

Or you get placed as a CTO with a fat raise.

xwdv said 12 days ago:

My dream job.

Supermancho said 13 days ago:

I'm not. I knew a developer who, in his spare time, developed a clever scraper. He scraped the top stories and results from Google, then scraped similar content based on Google's own ranking, then submitted that content to his own aggregator sites (all resolving to the same server). He ran ads on it. He got plenty of traffic and was net <$500/month in 2006.

It's not that expensive to run a site and the right advertising partners (cough Taboola cough) pay nicely.

NeutronStar said 12 days ago:

Yeah, getting ad revenue for this kind of website was possible in 2006. Now, good luck.

kayoone said 13 days ago:

I remember in 2004 or so when an agency I worked for had a Wikipedia clone running with AdSense and tons of SEO, which made 20k per month and basically kept the company afloat. I was a young junior dev and while I was impressed by it, it never felt right to me (which it obviously wasn't in many ways). As far as I remember, this only worked for about a year at best, until Google penalised those sites more and more.

gitgud said 13 days ago:

Unethical Continuation of This Idea:

Scrape existing news sites, and use machine learning to paraphrase everything so Google doesn't detect plagiarism.

zaphods3rdhead said 13 days ago:

The US military has already been working on this for ~ a decade. There was a contract out of Redstone Arsenal where they were writing "story spinners" to scrape and re-word war-time propaganda.

hedora said 13 days ago:

I wonder if Mechanical Turk is still cheap enough to have humans paraphrase and insert SEO keywords.

In this new dystopia, generating content for the machines to read could be a decent job for a human.

randomgoose said 13 days ago:

How easy is it to get distribution over social media sites like Facebook or Twitter? The distribution costs would be close to zero there, right?

peter_d_sherman said 13 days ago:

>"It all underscores the fact that the ad tech space is so convoluted, it’s easy to make money from legitimate advertisers just by setting up a web page.

That means there’s significant incentive to create sites with not just with low-quality clickbait or A.I.-generated nonsense, but sites filled with outright plagiarized content."

ipiz0618 said 13 days ago:

It's really easy to get any content on the internet but really hard to verify whether it is plagiarized. Basically anyone can place some ads on their website, but if the site posts nothing but copied content, I doubt it will last.

kevsim said 14 days ago:

> These firms mostly sold “popunder” ads, which pop up a new link in a browser tab when you click something

Who is buying ads on these networks? There cannot possibly be any returns, can there?

is_true said 14 days ago:

It might be a "victim filter": some scams are designed to avoid wasting time on people smart enough not to fall for the following steps.

9HZZRfNlpR said 13 days ago:

Often it's just affiliate fraud: you load affiliate links for casinos, AliExpress and whatnot, and hope for the payout. That is why they redirect like crazy to hide their tracks, since the sites offering affiliate programs don't want this kind of traffic.

nillium said 14 days ago:

We're working on another way of disseminating news. It might make plagiarism a little more difficult, while also working a little better for our audience: https://blog.nillium.com/what-can-napster-teach-local-news/

kevsim said 14 days ago:

How does that help prevent plagiarism?

nillium said 13 days ago:

Because it isn't full articles -- just updates as they happen, coming straight from the newsroom, more like tweets. It's not to say that people can't plagiarize, but it wouldn't be as easy or make as much sense as just copying and pasting an article.

rhizome said 13 days ago:

There's not much in the post so I'm gonna guess it's a form of content fingerprinting like we see with YouTube's Content ID, plus whatever is used in plagiarism-detection software used in schools and universities.
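To make "content fingerprinting" for text concrete, here is a rough sketch of one common textbook approach: word shingling plus Jaccard similarity. This is an assumption chosen for illustration only. It is not how YouTube's Content ID works, nor anything the linked post describes, and the function names and the shingle size `k` are hypothetical choices.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    light_edit = "the quick brown fox leaps over the lazy dog"
    print(jaccard(original, original))    # verbatim copy scores 1.0
    print(jaccard(original, light_edit))  # one changed word lowers the score
```

A verbatim copy scores 1.0, while a single swapped word already drops the score noticeably, which is why paraphrased or "spun" copies are much harder to catch with a scheme this simple.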

yummypaint said 13 days ago:

This model has seemingly taken over completely for online phonebooks. 10 years ago it was trivial to do things like reverse phone lookups online for free, but now it's an endless circlejerk of paywalls, search manipulation, and fraud. Almost makes me wish I still had an old-fashioned book delivered.

aaron695 said 14 days ago:

OK first try. But needs more work.

Not much proven so far.

Many sites seem to translate to language X and back to English to clean the data.

Research this.

Anyone using GANs yet?

How do you stop sites blocking your scraper?

There's money for the ad companies in letting you plod along and then taking your hard-earned money because you are breaking the rules. Are they?