Spend some time on YouTube and you may run into comments like this one:
Make money working from home, get paid $$$ to fill in surveys. Go here…
Needless to say, such comments add nothing to the video you are watching. More often than not it is exactly the same comment over and over; in other words, it’s YouTube spam.
In this post, we try to answer the following:
- How big of a problem is this spam for YouTube?
- How do the spammers monetize?
- What tools & tricks are employed by the spammers?
Scope of the Problem
If we were on YouTube’s backend, we could take a naive approach to sizing up this problem:
“These are all our videos (N). Each video is connected to a set of comments, and we consider a set tainted when it contains spam. Having defined a function to determine whether a set is tainted, we count the tainted sets (T) and get an idea of the scope of this problem by dividing T by N.”
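The naive measure can be sketched in a few lines of Python. The hard part, deciding whether a comment set is tainted, is left as a parameter; the keyword check below (`contains_spam` and its `SPAM_MARKERS`) is a hypothetical stand-in for illustration, not how YouTube actually classifies spam.

```python
from typing import Callable, Sequence

Comments = Sequence[str]


def naive_spam_ratio(videos: Sequence[Comments], is_tainted: Callable[[Comments], bool]) -> float:
    """Return T / N: the fraction of videos whose comment set is tainted."""
    n = len(videos)
    if n == 0:
        return 0.0
    t = sum(1 for comments in videos if is_tainted(comments))
    return t / n


# Hypothetical markers standing in for a real spam classifier.
SPAM_MARKERS = ("get paid", "paidsurveys")


def contains_spam(comments: Comments) -> bool:
    """A comment set is tainted if any comment contains a known marker."""
    return any(marker in c.lower() for c in comments for marker in SPAM_MARKERS)
```

For example, 9 tainted comment sets among 40 videos gives a ratio of 0.225, i.e. 22.5%.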
Of course, this doesn’t take into account the rank of each spammy comment, but that’s why it’s called a naive approach.
We’re not on YouTube’s backend, but we do have access to its very front end. So we try to get a rough idea of how much of a problem this is by looking only at the default page presented when visiting youtube.com. This approach should work well for us because:
- it’s a whole lot smaller than N above, so it’s reproducible for the folks at home
- it’s a page with massive traffic, so it will draw massive attention from the spammers
- it’s a page with massive traffic, so it will draw massive attention from the YouTube abuse team
The YouTube front page used for this analysis was loaded at approximately 5pm on 8/5/2013.
There are 40 videos presented on the front page. If you’re going to try this at home, you need to click on each of the videos and scroll down into the comments. Fortunately (or not), you don’t have to scroll very far, because the spammers have a knack for getting their comments placed right at the top. What you’re looking for is a comment like the survey spam quoted at the start of this post.
Now 22.5% of the front page videos (9 of the 40) having tainted comments may not sound like an awful lot, but when you consider that this is the front page of the third most popular site on earth (Alexa Rank #3), what’s going on here takes on a whole new perspective.
So what’s really going on here?
At the very least, we know that spammers are targeting a significant percentage of the videos on YouTube’s front page. Of course, they’re not doing this for their health, so how do they make their money?
Consider the comment on the first tainted video:
This is how i am making tons of money every single month working at my house..
Step 1: Follow the guide on this page: goo.gl\nb1Bak
Step 2: Get paid 5-20 bucks to answer each survey
Step 3: Retire and move overseas
A packet trace of the network activity on a machine browsing to goo.gl/nb1Bak shows the following:
- goo.gl is Google’s URL Shortener.
- goo.gl/nb1Bak redirects to 18.104.22.168/~leechtv/paidsurveys/?7, which redirects to trk.surveyjunkie.com/srd/klenzxcp
- This then redirects to www.surveyjunkie.com
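The chain above is just a sequence of HTTP redirects, each hop pointing at the next. A live tool would follow the `Location` headers with an HTTP client; as an offline sketch, the hops from the trace are recorded in a dict (schemes omitted, as in the text) and walked until a URL has no recorded redirect. `RECORDED_REDIRECTS` and `follow_redirects` are illustrative names.

```python
from typing import Dict, List

# Hops observed in the trace: URL -> URL it redirects to.
RECORDED_REDIRECTS: Dict[str, str] = {
    "goo.gl/nb1Bak": "18.104.22.168/~leechtv/paidsurveys/?7",
    "18.104.22.168/~leechtv/paidsurveys/?7": "trk.surveyjunkie.com/srd/klenzxcp",
    "trk.surveyjunkie.com/srd/klenzxcp": "www.surveyjunkie.com",
}


def follow_redirects(start: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Return the URLs visited from `start`, stopping when no redirect is recorded.

    Raises ValueError on a redirect loop; stops silently after `max_hops` hops.
    """
    chain = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        nxt = redirects.get(current)
        if nxt is None:
            break  # final destination reached
        if nxt in seen:
            raise ValueError(f"redirect loop at {nxt}")
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


print(" -> ".join(follow_redirects("goo.gl/nb1Bak", RECORDED_REDIRECTS)))
```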
“So surveyjunkie.com is the spammer?”
No, surveyjunkie.com is not the spammer. Surveyjunkie is an advertiser in a Cost Per Lead (CPL) advertising model. It has an affiliate program that rewards affiliates when users sign up (leads). The spammer in this scenario is one of surveyjunkie’s affiliates (specifically ‘klenzxcp’), who is paid a finder’s fee when YouTube users sign up with surveyjunkie.com.
Now this may or may not violate surveyjunkie’s terms of use, although I could not find a policy detailing these terms. Of interest from the packet trace is that the web request to trk.surveyjunkie.com does not contain a Referer header, so surveyjunkie never learns where the traffic comes from and won’t know that it’s YouTube spam. One could argue that they choose not to know, but who is going to argue that?
“Okay, but this is just a one-off; you’ve only analyzed one comment”
Actually, we analyzed all outbound links in all of the tainted comments. In this case, all roads lead to surveyjunkie.com via two affiliates: klenzxcp and gqrzv5sx.
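Attributing the tainted comments to affiliates is then a small parsing job over the tracking URLs at the end of each chain. The sketch assumes every tracking link has the `/srd/<affiliate-id>` layout seen in trk.surveyjunkie.com/srd/klenzxcp; that gqrzv5sx’s links follow the same layout is an assumption, since only the first URL appears in the trace.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlsplit

TRACKING_HOST = "trk.surveyjunkie.com"


def affiliate_id(tracking_url: str) -> Optional[str]:
    """Extract the affiliate ID from an assumed /srd/<affiliate-id> tracking URL."""
    # urlsplit only recognizes the host when a scheme is present.
    if "://" not in tracking_url:
        tracking_url = "http://" + tracking_url
    parts = urlsplit(tracking_url)
    if parts.hostname != TRACKING_HOST:
        return None
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) == 2 and segments[0] == "srd":
        return segments[1]
    return None


def count_affiliates(tracking_urls: Iterable[str]) -> Counter:
    """Count how many outbound links pay out to each affiliate."""
    return Counter(a for a in map(affiliate_id, tracking_urls) if a is not None)
```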
Obviously the spammers are capitalizing on a great source of traffic. You could argue that the traffic is free, but you would be wrong: it’s pretty cheap, but it’s not free. If you were going to pull this off yourself as a spammer new to the scene, you’d need a few things:
- A set of accounts (A) to post the initial spam as comments. Any spammer worth his salt will suggest using Phone Verified Accounts (PVAs). You could set these up yourself, or you could buy 10 for $5.
- A set of accounts (B) to give a thumbs up to the comments posted by set A. This is how the spammers get to the top of the comments section: for each comment posted by A, a group of approvers from B comes along and gives it a thumbs up, which quickly pushes it to the top. Naturally, B must be larger than A. You can buy 100 regular (non-PVA) YouTube accounts for $5.
- The tricky part is writing a tool that monitors the front page of YouTube and posts comments (with approval from set B) on each of the videos that have not yet been targeted. It’s not too difficult if you have Compsci 101 behind you (or even just a few weeks of fiddling with Python/Java/.Net…). You won’t have to write it yourself, though, because there are plenty of bots that already do this for you (with captcha support!). Expect to spend anywhere from $50 to $150.
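Using only the prices quoted above, a rough startup budget can be sketched. It assumes accounts are sold only in the bundle sizes listed (10 PVAs or 100 regular accounts per $5), uses the $50 to $150 range for the bot, and ignores the cost of replacing accounts that get taken down. The function and constant names are illustrative.

```python
import math
from typing import Tuple

PVA_BUNDLE = (10, 5.0)        # 10 phone verified accounts for $5 (set A)
REGULAR_BUNDLE = (100, 5.0)   # 100 regular accounts for $5 (set B)
BOT_PRICE_RANGE = (50.0, 150.0)


def bundles_needed(accounts: int, bundle_size: int) -> int:
    """Whole bundles required to cover `accounts`."""
    return math.ceil(accounts / bundle_size)


def startup_cost(size_a: int, size_b: int) -> Tuple[float, float]:
    """Return (low, high) estimated dollars for sets A and B plus a bot."""
    if size_b <= size_a:
        raise ValueError("set B must be larger than set A")
    accounts = (
        bundles_needed(size_a, PVA_BUNDLE[0]) * PVA_BUNDLE[1]
        + bundles_needed(size_b, REGULAR_BUNDLE[0]) * REGULAR_BUNDLE[1]
    )
    low, high = BOT_PRICE_RANGE
    return accounts + low, accounts + high


print(startup_cost(20, 200))  # 2 PVA bundles + 2 regular bundles + bot
```

The bot dominates the initial outlay; the account bundles only start to matter once sets A and B have to be replenished repeatedly.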
The costs above are not where it ends. If you refresh a video with tainted comments for a while, you will notice that the tainted comment does eventually disappear (feedback from the community marks it as bad). Of course, wait a little longer and the tainted comment returns. So as much as the YouTube abuse team is fighting back, the spammers are constantly growing sets A and B.
“It’s all-out war out there! What’s an abuse team to do?”
This is not a trivial problem to solve. What surprised me most when analyzing YouTube spam comments is that the same comment, after being taken down, quickly makes its way back to the top. I’d bet there’s low-hanging fruit here in combining user feedback on tainted comments with a unique hash of the comment itself. That way one could block the comment at the front door.
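One way to read that suggestion, sketched in Python: normalize each comment, hash it with SHA-256, count community flags per hash, and reject new comments whose hash has crossed a flag threshold. The normalization rules, the threshold of 3, and the `CommentBlocklist` class are assumptions for illustration, not anything YouTube is known to use.

```python
import hashlib
import re
import unicodedata
from typing import Dict, Set

FLAG_THRESHOLD = 3  # hypothetical: flags needed before a hash is blocked


def comment_fingerprint(text: str) -> str:
    """Hash a comment after normalizing case, Unicode forms and whitespace."""
    normalized = unicodedata.normalize("NFKC", text).lower()
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class CommentBlocklist:
    def __init__(self, threshold: int = FLAG_THRESHOLD) -> None:
        self.threshold = threshold
        self.flags: Dict[str, int] = {}
        self.blocked: Set[str] = set()

    def flag(self, text: str) -> None:
        """Record one piece of user feedback marking this comment as spam."""
        h = comment_fingerprint(text)
        self.flags[h] = self.flags.get(h, 0) + 1
        if self.flags[h] >= self.threshold:
            self.blocked.add(h)

    def allow(self, text: str) -> bool:
        """Front-door check: reject reposts of comments the community flagged."""
        return comment_fingerprint(text) not in self.blocked
```

Exact hashing only catches verbatim or near-verbatim reposts, which is precisely the weakness the spammers can exploit.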
“Yeah right, the spammers will then simply diversify each comment enough to avoid whatever filter is put in place”
Sure. The trick then is to get to the root of the problem and really put a dent in their armour: identify outbound CPL links.
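A sketch of that idea: pull link-like strings out of a comment (undoing the backslash trick seen in goo.gl\nb1Bak), follow each link’s redirect chain, and flag the comment if any hop lands on a known CPL tracking host. In a real system the chain would come from an HTTP client following redirects; here a dict of recorded hops stands in for it. `KNOWN_CPL_HOSTS`, the deliberately rough link regex, and all helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List
from urllib.parse import urlsplit

# Hypothetical list of CPL tracking hosts an abuse team might maintain.
KNOWN_CPL_HOSTS = {"trk.surveyjunkie.com"}

# Rough matcher for domain-style links, allowing "\" where "/" should be.
LINK_RE = re.compile(r"(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:[/\\]\S*)?", re.IGNORECASE)


def extract_links(comment: str) -> List[str]:
    links = []
    for match in LINK_RE.findall(comment):
        # Undo the backslash obfuscation and drop trailing punctuation.
        links.append(match.replace("\\", "/").rstrip(".,;:!?)"))
    return links


def host_of(url: str) -> str:
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def redirect_chain(url: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follow recorded redirects; the hop limit also stops redirect loops."""
    chain = [url]
    while len(chain) <= max_hops and chain[-1] in redirects:
        chain.append(redirects[chain[-1]])
    return chain


def has_cpl_link(comment: str, redirects: Dict[str, str],
                 cpl_hosts: Iterable[str] = KNOWN_CPL_HOSTS) -> bool:
    """True if any link in the comment passes through a known CPL host."""
    hosts = set(cpl_hosts)
    for link in extract_links(comment):
        if any(host_of(hop) in hosts for hop in redirect_chain(link, redirects)):
            return True
    return False
```

Because the check keys on where a link ends up rather than on the comment’s wording, rewording the spam does not evade it; the spammer would have to change the destination, that is, the affiliate’s payout link itself.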