Google’s efforts to clean up the Internet and provide useful advisories to Internet users have been very successful. Nearly every modern browser now incorporates Google’s Safe Browsing list to prevent users from inadvertently visiting malware-infested websites and phishing websites.
In this article we analyze the Google malware hash lists published over the past few months in order to answer these important questions:
These are practical questions posed time and again by frustrated, confused, and sometimes angry website owners at help forums and via our contact page.
Google has done a good job creating detailed help content describing the blacklisting process, as well as a group where website owners can ask for help. Additionally, there are excellent resources like BadwareBusters, where users can find volunteers to help them. We also participate in these groups.
Yet there is still a demand for clear-cut answers to basic questions like the ones above. In this vein, we want to provide a scientifically sound and statistically significant analysis of freely available information to answer these questions. A small FAQ is also available on our site to answer questions from website owners and admins.
This series of experiments is split into multiple parts. This article (part 1) presents a first look at openly available data. The goal of the experiment is to understand:
For the purposes of this experiment, Google malware hash lists were collected every 30 minutes from March 3, 2010 to June 1, 2010 (91 days). Each malware hash list contains the information described in the Google malware hash specification. All hash lists were parsed, and unique hashes were extracted, time-stamped, and correlated with the malware hash list version.
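As a rough sketch of the bookkeeping described above (the function and field names here are our own, for illustration, not from Google's specification), per-hash first-seen and last-seen timestamps can be maintained across the 30-minute snapshots like this:

```python
from datetime import datetime

def record_snapshot(seen, hashes, ts, version):
    """Update per-hash bookkeeping from one 30-minute snapshot.

    seen    -- dict mapping hash -> {"first_seen", "last_seen", "versions"}
    hashes  -- set of hashes present in this snapshot
    ts      -- timestamp at which the snapshot was taken
    version -- version number of the malware hash list
    """
    for h in hashes:
        if h not in seen:
            seen[h] = {"first_seen": ts, "last_seen": ts, "versions": {version}}
        else:
            seen[h]["last_seen"] = ts
            seen[h]["versions"].add(version)
    return seen

# Two consecutive snapshots: hash "aa" drops off after the first one.
seen = {}
record_snapshot(seen, {"aa", "bb"}, datetime(2010, 3, 3, 0, 0), 1001)
record_snapshot(seen, {"bb"}, datetime(2010, 3, 3, 0, 30), 1002)
```

With this structure, a hash whose `last_seen` stops advancing has left the blacklist, which is the raw material for the duration figures discussed below.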
Subsequently, an analysis was conducted to answer the questions posed above. At no point was any attempt made to identify a website name from the hashes. Note also that a single website can generate more than one unique hash. For example: “www.abcd.com”, “abcd.com”, and “www.abcd.com/infected/” can all generate different hashes.
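To illustrate why those variants yield different hashes, here is a minimal sketch. The actual hash function and URL-canonicalization rules are defined in Google's specification; MD5 is used below purely as a stand-in:

```python
import hashlib

def url_hash(url: str) -> str:
    """Hash a URL expression. MD5 is illustrative only; the real
    function and canonicalization rules come from Google's spec."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()

variants = ["www.abcd.com/", "abcd.com/", "www.abcd.com/infected/"]
hashes = {v: url_hash(v) for v in variants}
# All three strings differ as byte sequences, so all three digests differ.
assert len(set(hashes.values())) == 3
```

This is why counting unique hashes is not the same as counting unique websites: one infected site can contribute several entries to the list.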
It is clear from these initial results that a very large number of websites (nearly one quarter of the roughly 6,000 hashes added per day) never make it off the Google blacklist. There are several reasons for this. One is that many webmasters, while skilled at website design and layout, lack the technical skills required to clean websites infected by malware and code-injection attacks. We have also met website owners who are extremely business savvy but lack the technical expertise to recover from a blacklisting event. The income lost to business interruption in these cases is considerable.
We see that 43% of blacklisted websites manage to make it off the blacklist, but only after suffering for an average of 13 days.
Some websites manage to get off the blacklist and then fall back onto it. These “repeat offenders” spend more time on the blacklist, on average, than the previous group, and the time they stay clean between incidents is short: an average of just 17 days.
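The “repeat offender” pattern can be derived from the 30-minute snapshots by grouping each hash’s timestamps into contiguous stays on the blacklist; the gap between two stays is the time the site remained clean. A hypothetical sketch (the slack threshold is our own choice, not part of the published data):

```python
from datetime import datetime, timedelta

SNAPSHOT_GAP = timedelta(minutes=30)

def presence_intervals(timestamps, slack=2 * SNAPSHOT_GAP):
    """Group sorted snapshot timestamps for one hash into contiguous
    stays on the blacklist; a gap larger than `slack` ends a stay."""
    intervals = []
    start = prev = timestamps[0]
    for t in timestamps[1:]:
        if t - prev > slack:          # hash disappeared, then returned
            intervals.append((start, prev))
            start = t
        prev = t
    intervals.append((start, prev))
    return intervals
```

A hash with a single interval either got cleaned once or never left; a hash with two or more intervals is a repeat offender, and the gaps between its intervals are its clean periods.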
These numbers clearly show the sorry state of website security today. It is unfortunate that thousands of websites are affected every day. At stopthehacker.com, we strive to help combat this trend. These issues need to be addressed by services that are not currently available to the masses. To fill this vacuum in the service space, and to disrupt the security market, stopthehacker.com provides its advanced Health Monitoring and Vulnerability Assessment services for website owners. Our services take away the anguish business owners face when their websites are attacked. Please visit our services page to find out how we can help you; you can even sign up for free services.
Further analysis will be presented in the second part of this series, where we will take a more detailed look at the data and provide more insight into the implications of these observations.
Stay tuned for Part 2!