SEMalt is Skewing Your Stats


There is a Ukrainian bot out there that is crawling and distorting the stats for millions of websites, and to some degree it has affected every single site I’ve looked at this past week.

Go to Google Analytics, look at your referrers, and I’ll bet you a beer that SEMalt and crawler.semalt.com are both listed with dozens of visits over this past month.  In some cases we’re seeing a history with them dating back to January 2 of this year.

What is SEMalt Up To?

According to one of their employees: 

[Image: screenshot of the employee’s reply, 4/6/2014 9:07:16 AM]

Then he answered again, and fed me this load of BS: 

[Image: screenshot of the employee’s second reply, 4/6/2014 2:03:50 PM]

An accident? That’s a lie.

In a page on their website it says this:

Semalt crawler bots visit website and gather statistical data for our service simulating real user behavior: unique IP, browser, display resolution etc. This information is used exclusively within the Semalt.com project and isn’t revealed to a third party.

On their “about” page (their menu is in their footer) they claim to offer various tools, like keyword ranking, brand monitoring, reports, competitor explorer, website analyzer and a report system.

Presumably, these crawls are feeding their “competitor explorer” with info they then provide to their paying subscribers, but I don’t know that to be true.

Here on SEMpdx, this is what the referrals looked like for March, when they visited nearly every single day, for a total of 94 visits.

[Image: SEMpdx referral report for March, screenshot 4/6/2014 9:53:58 AM]

Does it Matter?

If you’re a medium-sized website you probably didn’t notice, but if you’re a local business that only gets a few hundred visitors a month, you may just find that they are your number one referrer, and that’s severely distorting your stats.

They appeared to be friendly enough when I first Tweeted at them the other day, but I’ve done some digging now,  and I distrust them…

Why do I distrust them?

  • They visit sites from no consistent IP address or IP range
  • They are stealing your bandwidth
  • They are using your server resources
  • They are skewing your stats
  • They do not follow robots.txt

Depending on the size of your site, they may be drastically skewing your overall statistics, from your overall visitor count to your conversion percentages and bounce rates.

For example, for one new local client with only about 300 visitors last month, SEMalt accounted for over 70 visits, which is more than 20% of all their traffic!
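To put a number on that kind of distortion, here is a minimal Python sketch that computes what share of visits came from a given referrer domain. The per-referrer breakdown is made up for illustration (only the rough totals, about 70 out of about 300, come from the client example above), and `referrer_share` is a hypothetical helper, not anything built into Google Analytics.

```python
def referrer_share(visits_by_referrer, domain):
    """Return the fraction of all visits referred by `domain` or one of its subdomains."""
    total = sum(visits_by_referrer.values())
    if total == 0:
        return 0.0
    matched = sum(
        count
        for host, count in visits_by_referrer.items()
        if host == domain or host.endswith("." + domain)
    )
    return matched / total


# Hypothetical monthly numbers: 70 of 300 visits come from Semalt referrers.
visits = {
    "semalt.com": 40,
    "crawler.semalt.com": 30,
    "google.com": 150,
    "(direct)": 80,
}

share = referrer_share(visits, "semalt.com")
print(f"{share:.1%}")  # prints 23.3%
```

The same function can be pointed at an exported referrer report to see how much of a small site’s traffic is really just the bot.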

A Special Message

SEMalt has managed to anger one site owner so much that he added a special message just for them in his website header:

[Image: screenshot of the site owner’s header message, 4/6/2014 8:01:55 AM]

How can SEMalt be stopped?

They put up a page where you can supposedly list your domain for removal, but again, I don’t think I trust them.  Here’s a link to their “removal” tool

Since I discovered that blocking them via robots.txt didn’t work, and found that blocking their IP wasn’t possible, I began looking for the best way to edit our .htaccess file. I had to try a couple of options before I found something that would work on the SEMpdx server.

Rather than provide you with .htaccess code here, which may or may not work for you in your hosting environment, I’ll refer you to a very useful post where a lot of folks are discussing the Semalt situation and showing several options for .htaccess editing.
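For sites served by a Python application rather than through Apache, the same idea (refuse any request whose referrer points at their domain) can be sketched as a small WSGI middleware. This is only an illustration under assumptions, not the .htaccess rules from the post mentioned above: the names `BLOCKED_REFERRER_DOMAINS`, `is_blocked_referrer` and `BlockReferrerMiddleware` are hypothetical, and it assumes the bot keeps sending a referrer on semalt.com or one of its subdomains.

```python
from urllib.parse import urlparse

# Assumption: the unwanted visits arrive with a referrer on this domain
# or one of its subdomains (for example crawler.semalt.com).
BLOCKED_REFERRER_DOMAINS = ("semalt.com",)


def is_blocked_referrer(referrer, blocked=BLOCKED_REFERRER_DOMAINS):
    """Return True if the referrer URL's host is a blocked domain or a subdomain of one."""
    if not referrer:
        return False
    host = (urlparse(referrer).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


class BlockReferrerMiddleware:
    """WSGI middleware that answers 403 to requests with a blocked referrer."""

    def __init__(self, app, blocked=BLOCKED_REFERRER_DOMAINS):
        self.app = app
        self.blocked = blocked

    def __call__(self, environ, start_response):
        # WSGI exposes the Referer request header as HTTP_REFERER.
        if is_blocked_referrer(environ.get("HTTP_REFERER"), self.blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

Wrapping an existing WSGI app would look like `app = BlockReferrerMiddleware(app)`. Matching on the hostname rather than on a plain substring avoids blocking unrelated domains that merely contain the same letters.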

If they would simply obey a site’s robots.txt, I think a lot of people would not worry about it, and might even try their service. Until they do though, we’re aggressively blocking them.
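For contrast, this is roughly what obeying robots.txt looks like from the crawler’s side, sketched with Python’s standard `urllib.robotparser`. The user agent name `ExampleBot`, the robots.txt contents and the example.com URLs are all placeholders; I don’t know what user agent their crawler actually sends.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder robots.txt: shut out one bot entirely, keep everyone else
# out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/page"))  # False
print(is_allowed(ROBOTS_TXT, "OtherBot", "http://example.com/page"))    # True
```

A polite crawler runs a check like this before every fetch and skips anything that comes back False.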

Scott Hendison is the CEO of Search Commander, Inc. and one of the founding board members of SEMpdx.   Here is his Google profile.

14 Comments

  1. We noticed it a few weeks ago and up to 300 visits within that time. Either way, they quickly made a name for themselves by penetrating statistics over more traditional approaches. It would be fascinating to see how their link profile grows now.

    • Scott Hendison

      Yes, they can be filtered, good point, and we had to do that for a handful of client reports.

  2. I know a lot of people have been talking about Semalt. I know they are not respecting robots.txt files. I will try to block them via the .htaccess file. I wish they could code their software better so we didn’t have to block them.

    • Scott Hendison

      They could easily choose to follow robots.txt if they cared to, and the .htaccess option seems like a drastic option but is likely best – however – that’s a hell of a lot of work editing those files for every single domain or client.

      In some (many) cases we don’t even have access to .htaccess or robots, but from their own “bulk blocking” page, we were able to drop in 130+ domain names all at once, and within two days all the referral visits seemed to stop.

  3. Thanks for the helpful information. All our sites in Germany are polluted by Semalt yet.

  4. I’d like to clarify why Semalt has negative response. Daily our technical robots visit many websites. These robots harvest statistical data for our service and don’t cause any harm to the users’ web resources.

    We understand each user who asks us to disable crawling activity on their websites. We bring apologies to each user and honor the request. So we’ve created a special tool to remove sites from the list of web resources we visit – Semalt Crawler http://semalt.com/project_crawler.php

    Anyway, we’re always glad to answer all the questions regarding to Semalt.

    • Scott Hendison

      Thanks for responding –

      You claim that “these robots… don’t cause any harm to the users’ web resources”. I would argue that is false. Merely by using the resources and distorting referrer stats you are causing problems.

      You also claim you are “glad to answer all the questions regarding to Semalt”. Well, here’s mine: why don’t you follow users’ robots.txt?

  5. You’re definitely not alone, especially in being threatened by their Twitter PR.

    http://thenewfr0ntier.blogspot.nl/2014/03/anyone-running-blogger-or-wordpress.html
    http://blog.nabble.nl/post/93306955157/semalt-infecting-computers-to-spam-the-web
    and my own
    https://blog.flameeyes.eu/2014/08/antibiotics-for-the-internet-or-why-blocking-semalt-crawlers

    I wish I had read Nabble’s post before writing mine, they are much worse than I made them out to be.

  6. Unless you manage websites for local businesses and Semalt accounts for well over 50% of your monthly traffic. I’ve got one site now where Semalt accounts for 72.25% of the traffic and has a 100% bounce rate – and that’s not counting youtube.downloader and kambasoft which skew the data even further.

    Filtering this stuff is a huge pain.

    • Scott Hendison

      Sorry, Alena, that’s not something I can help with. It’s probably easiest to just go to their site and add your domain to their “do not crawl” list.

