Blacklists are published by many security groups and organizations around the world to share knowledge about malicious websites, IP addresses and other security features which allow others to insulate themselves from the dark side of the Internet.
In recent years, the number of blacklist being published by web-centric organizations have grown by leaps and bounds. Large Internet based companies such as Google, Yahoo and Microsoft have been providing cues to their users about malicious websites in trying to make the Internet a safer place. Google provides much more in-depth information than the other two, Yahoo and Bing, and seems to have sophisticated virtual machine based analysis tools which can detect misbehaving malicious code. Yahoo employs McAfee’s Search scan service while Bing potentially uses Microsoft specific technologies.
The aim of this experiment is to compare the coverage for each of the blacklists published by Google, Yahoo and Bing and compare them to what users in the Internet believe. To do this we will compare the results of Google, Yahoo, Bing and Malware Patrol with Web of Trust (WOT). Furthermore, we have also tried to see how many of these malicious URLs are also involved in Phishing. We have done this by looking up each URL/domain via Phishtank’s API.
Blacklists provide an easy mechanism for users (via browsers) and developers (via APIs) to assimilate security information about websites, IPs and such in order to make an informed decision about whether to allow or deny access to an IP or website.
We have collected 1095 confirmed malicious links from MalwareURL. Each of these links was tested to determine if they are listed on blacklists supplied by Google, Yahoo and Bing. Note that Yahoo and Bing unlike Google do not provide any direct APIs to probe their databases. Thereby each link, and its associated domain was pushed via an HTTP request to Yahoo and Bing to analyze if the results indicated that the domain/link was infected.
To determine if a website is present in the Google malware blacklist, the domain name along with the link and its variations, as defined here, are converted to MD5 hashes and checked using Google’s Safe Browsing API. For Malware Patrol, the aggressive version of their blacklist is downloaded and comparisons are made locally. For WOT, we employ their XML based API to gather information about the belief of users in the Internet. For Phishtank we have used their XML based API. The tests were conducted on Mar 22 2010.
Note: 1095 unique, malicious URLs were tested with each service.
Interestingly, Web Of Trust (WOT) marked 99% of the URLs with “poor” or “very poor” or “unsatisfactory” reputation. We have to assume that when users will see such a rating they will not visit the website in question and hence treat this kind of rating as unsafe, for the purposes of this test. It remains to be determined if WOT uses a data feed from a malware URL which we have used to prime the test set. Nonetheless, it is surprising to see that a company which specializes in collating the trust and opinions of web surfers performs better orders of magnitude than large Internet companies and established blacklist providers.
One must keep in mind though that Google’s approach to maintaining an ever changing blacklist is slightly different from the other actors in the game. Google publishes an updated version of its list every 30 minutes or so and specifies which MD5 hashes need to be purged and which ones need to be inserted. Some blacklist services do not take this approach and hence may claim to store information on millions of sites, which were infected at one point in time. The probability of this happening in the Google blacklist is low, because they have opened up a review process via their webmaster central area to update their blacklist.
In contrast, Bing and Yahoo do not provide public APIs for developers and applications to use.
Also, we see that none of the URL/domains were actually listed on Phishtank. It seems that websites which aim to infect users with malware are quite different from the set of sites used for phishing. It does not seem that malware laced websites are also used to commit phishing.
Large Internet companies, some of whom have published effective blacklists, used by many developers and application all over the world, still have a long way to go in order to become truly effective. As we have seen, only minuscule numbers of malicious websites are identified by the blacklist services. WOT seems to be extremely effective at identifying unsafe websites. It remains to be determined whether the data-set used for this test has a large overlap with any of the sources WOT uses to classify websites.
Another interesting result is that it does not seem that websites which aim to infect users with malware are actively involved in phishing campaigns.