There is a Ukrainian bot out there that is crawling millions of websites and distorting their stats, and to some degree it is affecting every single site I’ve looked at this past week.
Go to Google Analytics, look at your referrers, and I’ll bet you a beer that SEMalt and crawler.semalt.com are both listed with dozens of visits over this past month. In some cases we’re seeing a history with them dating back to January 2 of this year.
What is SEMalt Up To?
According to one of their employees:
Then he answered again, and fed me this load of BS:
An accident? That’s a lie.
A page on their website says this:
Semalt crawler bots visit website and gather statistical data for our service simulating real user behavior: unique IP, browser, display resolution etc. This information is used exclusively within the Semalt.com project and isn’t revealed to a third party.
On their “about” page (their menu is in their footer) they claim to offer various tools, like keyword ranking, brand monitoring, reports, competitor explorer, website analyzer and a report system.
Presumably, these crawls are feeding their “competitor explorer” with info they then provide to their paying subscribers, but I don’t know that to be true.
Here’s what the referrals looked like on SEMpdx for March, when they visited nearly every single day, for a total of 94 visits.
Does it Matter?
If you’re a medium-sized website you probably didn’t notice, but if you’re a local business that only gets a few hundred visitors a month, you may just find that they are your number one referrer, and that’s severely distorting your stats.
They appeared to be friendly enough when I first Tweeted at them the other day, but I’ve done some digging now, and I distrust them…
Why do I distrust them?
- They visit sites from no consistent IP address or IP range
- They are stealing your bandwidth
- They are using your server resources
- They are skewing your stats
- They do not follow robots.txt
Depending on the size of your site, they may be drastically skewing your overall statistics, from your visitor count to your conversion percentages and bounce rates.
For example, for one new local client with only about 300 visitors last month, SEMalt accounted for over 70 visits, which is more than 20% of all their traffic!
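If you want to run the same check on your own numbers, the arithmetic is simple. Here is a minimal Python sketch; the function name `referrer_share` is just something I made up for illustration, and the figures are the rounded ones from the client example (about 70 visits out of about 300).

```python
def referrer_share(referrer_visits: int, total_visits: int) -> float:
    """Return the percentage of total visits that came from one referrer."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return 100.0 * referrer_visits / total_visits


# Rounded figures from the client example: ~70 SEMalt visits, ~300 total.
share = referrer_share(70, 300)
print(f"SEMalt share of visits: {share:.1f}%")  # prints 23.3%
```

Plug in the visit counts from your own referral report to see how much of your traffic is really just this one crawler.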
A Special Message
SEMalt has managed to anger one site owner so much that he added a special message just for them in his website header:
How can SEMalt be stopped?
They put up a page where you can supposedly list your domain for removal, but again, I don’t think I trust them. Here’s a link to their “removal” tool.
Since blocking them via robots.txt didn’t work, and blocking their IP wasn’t possible, I began looking for the best way to edit our .htaccess file. I had to try a couple of options before I found something that would work on the SEMpdx server.
Rather than provide you with .htaccess code here, which may or may not work for you in your hosting environment, I’ll refer you to a very useful post where a lot of folks are discussing the SEMalt situation and sharing several options for .htaccess editing.
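The general idea behind blocking by referrer, rather than by IP, is easy to show outside of any particular server config. What follows is only a rough Python sketch of that idea, not the rules we used on the SEMpdx server: the function name `is_blocked_referrer` and the domain list are my own assumptions, so adjust the list to whatever actually shows up in your reports.

```python
from urllib.parse import urlsplit

# Assumption: the referrer domains you want to turn away.
BLOCKED_REFERRER_DOMAINS = ("semalt.com",)


def is_blocked_referrer(referer_header: str) -> bool:
    """True if the Referer URL's host is a blocked domain or one of its subdomains."""
    if not referer_header:
        return False
    # hostname is already lowercased by urlsplit; it is None for malformed values.
    host = urlsplit(referer_header).hostname or ""
    return any(
        host == domain or host.endswith("." + domain)
        for domain in BLOCKED_REFERRER_DOMAINS
    )


print(is_blocked_referrer("http://crawler.semalt.com/"))  # True
print(is_blocked_referrer("https://www.google.com/"))     # False
```

A site would run a check like this against the incoming request’s Referer header and refuse the request (for example with a 403) when it returns True. Matching on the host rather than on a substring of the whole URL avoids turning away legitimate visitors whose referring URL merely mentions the word somewhere in its path.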
If they would simply obey a site’s robots.txt, I think a lot of people would not worry about it, and might even try their service. Until they do, though, we’re aggressively blocking them.