Archive

Posts Tagged ‘bing’

Is User Trust More Effective Than Blacklisting?

April 6th, 2010

Blacklists are published by many security groups and organizations around the world to share knowledge about malicious websites, IP addresses and other security features which allow others to insulate themselves from the dark side of the Internet.

In recent years, the number of blacklists published by web-centric organizations has grown by leaps and bounds. Large Internet-based companies such as Google, Yahoo and Microsoft have been providing cues to their users about malicious websites in an effort to make the Internet a safer place. Google provides much more in-depth information than Yahoo and Bing, and seems to have sophisticated virtual-machine-based analysis tools which can detect misbehaving malicious code. Yahoo employs McAfee’s SearchScan service, while Bing potentially uses Microsoft-specific technologies.

Experiment Goal

The aim of this experiment is to compare the coverage of the blacklists published by Google, Yahoo and Bing with what users on the Internet believe. To do this, we compare the results of Google, Yahoo, Bing and Malware Patrol with those of Web of Trust (WOT). Furthermore, we have tried to see how many of these malicious URLs are also involved in phishing, by looking up each URL/domain via Phishtank’s API.

Blacklists provide an easy mechanism for users (via browsers) and developers (via APIs) to assimilate security information about websites, IPs and such in order to make an informed decision about whether to allow or deny access to an IP or website.

Methodology

We collected 1095 confirmed malicious links from MalwareURL. Each of these links was tested to determine whether it is listed on the blacklists supplied by Google, Yahoo and Bing. Note that Yahoo and Bing, unlike Google, do not provide any direct APIs to probe their databases. Therefore, each link and its associated domain were submitted to Yahoo and Bing via an HTTP request, and the results were analyzed to see whether they flagged the domain/link as infected.

To determine if a website is present in the Google malware blacklist, the domain name along with the link and its variations, as defined here, are converted to MD5 hashes and checked using Google’s Safe Browsing API. For Malware Patrol, the aggressive version of their blacklist is downloaded and comparisons are made locally. For WOT, we employ their XML-based API to gather information about what users on the Internet believe. For Phishtank, we have used their XML-based API. The tests were conducted on March 22, 2010.
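The hash-based lookup described above can be sketched roughly as follows. This is a minimal illustration, not Google's actual canonicalization rules or API: the helper names (`candidate_expressions`, `md5_hex`, `is_listed`) are hypothetical, the way variations are generated is a simplifying assumption, and the blacklist is assumed to be a locally held set of MD5 hex digests.

```python
import hashlib
from urllib.parse import urlsplit


def candidate_expressions(url):
    """Return simplified host/path variations of a URL (an assumption,
    not the exact variation rules used in the experiment)."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    path = parts.path or "/"

    # Host variations: the full host, then shorter suffixes,
    # always keeping at least the last two labels.
    labels = host.split(".")
    hosts = [".".join(labels[i:]) for i in range(max(len(labels) - 1, 1))]

    # Path variations: the root, each parent directory, and the full path.
    paths = ["/"]
    segments = [s for s in path.split("/") if s]
    prefix = "/"
    for segment in segments[:-1]:
        prefix += segment + "/"
        paths.append(prefix)
    if path not in paths:
        paths.append(path)

    return [h + p for h in hosts for p in paths]


def md5_hex(expression):
    """MD5 digest of an expression, as a hex string."""
    return hashlib.md5(expression.encode("utf-8")).hexdigest()


def is_listed(url, blacklist_hashes):
    """True if any variation of the URL hashes to a blacklisted digest."""
    return any(md5_hex(e) in blacklist_hashes for e in candidate_expressions(url))


# Hypothetical local blacklist containing a single hashed entry.
blacklist = {md5_hex("example.com/")}
print(is_listed("http://www.example.com/a/b.html", blacklist))
print(is_listed("http://other.org/", blacklist))
```

Checking several variations means that a blacklist entry for a whole domain also matches deeper links on that domain, which is why the link and its domain are both tested.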

Popular blacklists cover only a minuscule percentage of malicious sites.


Highlights

  • Google marked 0.18% of the URLs as unsafe.
  • Yahoo marked 1.0% of the URLs as unsafe.
  • Bing marked 0.09% of the URLs as unsafe.
  • Malware Patrol marked 0.63% of the URLs as unsafe.
  • Phishtank marked 0% of the URLs as unsafe.
  • WOT marked 99% of URLs as unsafe.

Note: 1095 unique, malicious URLs were tested with each service.
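To put the percentages above in perspective, they can be converted back into approximate URL counts. The sketch below is only a back-of-the-envelope calculation: the article reports rounded percentages rather than raw counts, so the resulting numbers are estimates, and the names `reported_percent` and `approximate_count` are introduced here purely for illustration.

```python
TOTAL_URLS = 1095  # size of the test set stated in the note above

# Percentage of URLs marked unsafe by each service, as reported.
reported_percent = {
    "Google": 0.18,
    "Yahoo": 1.0,
    "Bing": 0.09,
    "Malware Patrol": 0.63,
    "Phishtank": 0.0,
    "WOT": 99.0,
}


def approximate_count(percent, total=TOTAL_URLS):
    """Convert a rounded percentage to the nearest whole number of URLs."""
    return round(percent / 100 * total)


approximate_counts = {
    name: approximate_count(percent) for name, percent in reported_percent.items()
}

for name, count in approximate_counts.items():
    print(f"{name}: about {count} of {TOTAL_URLS} URLs")
```

Under these assumptions, the traditional blacklists each flagged only a handful of the 1095 URLs, while WOT flagged nearly all of them.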

Observations

Interestingly, Web of Trust (WOT) marked 99% of the URLs with a “poor,” “very poor” or “unsatisfactory” reputation. We assume that users who see such a rating will not visit the website in question, and hence treat this kind of rating as “unsafe” for the purposes of this test. It remains to be determined whether WOT uses a data feed from MalwareURL, which we used to build the test set. Nonetheless, it is surprising to see that a company which specializes in collating the trust and opinions of web surfers performs orders of magnitude better than large Internet companies and established blacklist providers.

One must keep in mind, though, that Google’s approach to maintaining an ever-changing blacklist is slightly different from that of the other actors in the game. Google publishes an updated version of its list every 30 minutes or so and specifies which MD5 hashes need to be purged and which ones need to be inserted. Some blacklist services do not take this approach and hence may claim to store information on millions of sites, many of which were infected only at one point in time. The probability of such stale entries in the Google blacklist is low, because Google has opened up a review process via its Webmaster Central area to update the blacklist.

In contrast, Bing and Yahoo do not provide public APIs for developers and applications to use.

We also see that none of the URLs/domains were listed on Phishtank. It seems that the set of websites which aim to infect users with malware is quite different from the set of sites used for phishing; malware-laced websites do not appear to be used for phishing as well.

Conclusion

Large Internet companies, some of which have published blacklists used by many developers and applications all over the world, still have a long way to go before those blacklists become truly effective. As we have seen, only a minuscule number of the malicious websites are identified by the blacklist services. WOT seems to be extremely effective at identifying unsafe websites. It remains to be determined whether the data set used for this test has a large overlap with any of the sources WOT uses to classify websites.

Another interesting result is that it does not seem that websites which aim to infect users with malware are actively involved in phishing campaigns.

Report, Security

Yes, Search Engines Can Infect Your Computer

March 8th, 2010

Search engines like Google, Yahoo and Bing offer users the ability to scour the plethora of information on the Internet. These search engines index content on websites and often maintain cached copies of these sites so that, in the event that a site is unavailable, visitors can still view its contents.

Unfortunately, the idea of page caching has not been implemented well. In fact, page caching has opened up new opportunities for malware. The primary problem, from a security perspective, is that when search engines cache copies of websites, they also store any malware present on those sites on their own infrastructure.

Hackers Exploit Search Engine Page Caches

Most large search engines use some kind of malware analysis to determine whether a website is compromised. Google, for example, has a well-tuned system with high accuracy. In our meeting with the Google malware team some months ago, we were glad to find that they were already aware of this problem. In the weeks following our interaction, cached copies of infected websites were no longer easily available via searches.

Not so long ago, we wrote an article about our efforts to alert Yahoo of the presence of malware in the cached versions of various web pages served up by their search engine. Our efforts were not successful, although the occurrence of malware in Yahoo cached pages seems to have gone down significantly. Perhaps our messages were not entirely ignored.

Recently, an article came up on ISC SANS discussing this very same issue.

Recently, we have found instances of Bing serving up malware in its cached pages. It seems that Bing’s malware detection methods cannot reliably detect malware on cached web pages, so Bing does not protect its users from cached pages which contain malware. We have provided screenshots below as an example of the issue. In this particular case, the strain of malware found in Bing’s cached pages has been around since 2009.

Search Engines Ignore the Problem

Consider the case where a malicious individual deliberately infects a website with malware and Bing (or another search engine) indexes it. The malicious individual can then send out hyperlinks pointing to the cached web pages hosted by Bing. Any kind of “reputation-checking” for the cached link will confirm that the page is hosted by a reputable company, in this case, Bing (Microsoft). However, the malware will still be able to deliver its payload. Just in case you’re thinking, “my antivirus will protect me from the malware on the cached page,” you may like to read this article.

It is surprising to see that search engines like Bing, which claim to implement malware detection, cannot correctly determine whether a cached copy of a web page hosts malware! In these cases, Bing ends up as an excellent attack vector for malicious individuals.

It remains to be seen if search engine companies will continue to serve up cached pages laced with malware at the same time as they are touting active scan and detection mechanisms. Let’s hope this article can get attention in the upper echelons of management at these large search giants and they start to pay attention to this problem.

Screen shots follow below:

Report, Security

How Safe are Internet Website Directories?

January 23rd, 2010

Recently, we told you that Dmoz.org, one of the largest user-edited directories on the Internet, is also one of the safest directories. Directories such as Dmoz.org contain links to hundreds of thousands to millions of sites. These directories are categorized by volunteers or through automated means. Many search engines, including Google, Hotbot and others, potentially use data from these directories. These directories are also used as efficient lookup services by thousands of web-surfers who want to locate sites which belong to a very specific category.

Given the important role that these directories play in the Internet, one would expect that they would make an attempt to point only to websites which are “safe.” By “safe,” we mean sites which have not been injected with malware, via code-injection attacks or other attack vectors.

We are not picking on Dmoz.org here. We were very impressed to see that none of the 2.8 million sites we profiled were present on the Google Safe Browsing List. This could indicate that the owners of sites listed on Dmoz.org are concerned about their image, care about their visitors, and take appropriate precautions against malware.

To follow up on our previous article, we have further analyzed 10,000 sites, randomly chosen from the Dmoz.org corpus of nearly 2.8 million websites. Each of the 10,000 sites was tested against each of the website reputation services listed below.

Note: When analyzing a domain name or URL for verification with the Google Safe Browsing List, we have calculated the hash of the website name to match against the list. The test was conducted between January 19th and January 21st, 2010. The list of domain names tested is presented at the end of this article.

We identify the most interesting results below:

  1. McAfee SiteAdvisor marked 0.39% of domains as Unsafe, 84.23% as Safe, 15.08% as Untested and 0.3% as Potentially-Unsafe.
  2. Norton Safe Web marked 0.39% of domains as Unsafe, 59.02% as Safe, 39.79% as Untested and 0.8% as Potentially-Unsafe.
  3. Google Safe Browsing marked 0.02% of domains as Unsafe, 99.98% as Safe.
    Note: The presence of the hash of the domain name being tested, on the Google Safe Browsing List, is interpreted as “Unsafe” while its absence is interpreted as “Safe.”
  4. Microsoft Bing marked 0.06% of domains as Unsafe, 93.2% as Safe, and 6.74% as Untested.
  5. Comodo Site Inspector marked 0.08% of domains as Unsafe, 99.46% as Safe, and 0.44% as Unreachable.
    Note: We were only able to test the first 5000 URLs with Comodo Site Inspector.

McAfee SiteAdvisor and Norton Safe Web each seem to mark about 19 times as many websites as “Unsafe to Visit” as Google does, and about 6 times as many as Bing does. It is interesting to note that the number of websites marked as “Unsafe to Visit” by these competing services differs by up to an order of magnitude.
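The ratios quoted above follow directly from the reported “Unsafe” percentages. The short sketch below reproduces that arithmetic; the dictionary and the helper `times_as_many` are illustrative names, and because the inputs are rounded percentages the ratios are approximate.

```python
# "Unsafe" percentages for the 10,000 Dmoz.org sites, as reported above.
unsafe_percent = {
    "McAfee SiteAdvisor": 0.39,
    "Norton Safe Web": 0.39,
    "Google Safe Browsing": 0.02,
    "Microsoft Bing": 0.06,
}


def times_as_many(service, baseline):
    """How many times as many sites `service` marks unsafe compared to `baseline`."""
    return round(unsafe_percent[service] / unsafe_percent[baseline], 1)


ratio_vs_google = times_as_many("McAfee SiteAdvisor", "Google Safe Browsing")
ratio_vs_bing = times_as_many("McAfee SiteAdvisor", "Microsoft Bing")

print(f"McAfee vs. Google: about {ratio_vs_google}x")
print(f"McAfee vs. Bing: about {ratio_vs_bing}x")
```

Since Norton Safe Web reports the same “Unsafe” percentage as McAfee SiteAdvisor, the same ratios apply to it.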

We would like to know how long McAfee, Norton or Bing cache results for a particular site. Google allows webmasters to request reviews when they believe the site has been disinfected, and Comodo’s service seems to be an on-demand service. This makes an interesting starting point for a future experiment. Further, it would be interesting to see how sites listed in the Yahoo Directory and other directories are classified by these services.

Report, Security

Website-Reputation Services Agree to Disagree

January 17th, 2010

We recently published statistics comparing various website reputation services and have received good feedback over private channels regarding our article. In this sequel, we add Microsoft’s Bing malware filter to the comparison with other website reputation services.

At StopTheHacker.com (Jaal LLC) we have conducted tests of 721 URLs, all of which have been reported as malicious by volunteers of various blacklists. We follow a similar format for presentation of results as in the last post.

Website Reputation services: agree to disagree.


Note: All 721 domains/URLs, were reported as malicious, and were collected from malware.com.br on January 14, 2010. The blue column (maximum 100) indicates the percentage of sites that the website-reputation service correctly identified as unsafe. The orange column (maximum 100) indicates the percentage of sites that the website-reputation services incorrectly identified as safe.

The aim of the test:

  1. Identify the accuracy of the website reputation service
  2. Identify the overlap in terms of safe/unsafe websites

We present the most interesting results in this article. First we detail the parameters of the testing procedure to provide an idea of how the test was set up.

First, 721 URLs were collected from malware.com.br (mbr) on January 14, 2010. These URLs are reported for listing by one or more of the following: individuals, organizations, agencies and software products or services. For the purposes of this test, we assume that all the URLs obtained from the “regular” list on mbr are malicious and hence deemed “unsafe” to visit.

We compare the reputation provided by each website-reputation service and observe how many websites are marked unsafe, safe, untested, maybe-unsafe/caution/potentially-unsafe, and unreachable.
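The comparison described above amounts to tallying verdicts per service and intersecting the sets of URLs marked unsafe. The following is a minimal sketch of that bookkeeping, assuming each service's results are available as a dictionary mapping a URL to a verdict label. The function names and the toy data are hypothetical and are not the results of this test.

```python
from collections import Counter


def verdict_percentages(verdicts):
    """Map each verdict label to its share of the tested URLs, in percent."""
    if not verdicts:
        return {}
    counts = Counter(verdicts.values())
    total = len(verdicts)
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


def unsafe_overlap(verdicts_a, verdicts_b):
    """Return the URLs that both services marked as unsafe."""
    unsafe_a = {url for url, verdict in verdicts_a.items() if verdict == "unsafe"}
    unsafe_b = {url for url, verdict in verdicts_b.items() if verdict == "unsafe"}
    return unsafe_a & unsafe_b


# Hypothetical toy data for two services; all four URLs are assumed malicious.
service_a = {
    "bad1.example": "unsafe",
    "bad2.example": "safe",
    "bad3.example": "untested",
    "bad4.example": "unsafe",
}
service_b = {
    "bad1.example": "unsafe",
    "bad2.example": "unsafe",
    "bad3.example": "safe",
    "bad4.example": "safe",
}

print(verdict_percentages(service_a))
print(unsafe_overlap(service_a, service_b))
```

Because every URL in the test set is assumed to be malicious, the “unsafe” share is the detection rate, and any “safe” verdict counts as a miss.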

Website-reputation services tested:

Note that when analyzing a domain name/URL for checking with the Google Safe Browsing API, we have calculated the MD5 hash of the website name to match against the malware hash list. This test was conducted on January 15, 2010. The list of domain names tested is presented below, and a graph representing the statistics for the 721 sites tested is above.

We identify the most interesting results below:

  1. McAfee SiteAdvisor marked 36.75% of domains as Unsafe, 27.18% as Safe, 32.32% as Untested and 3.74% as Potentially-Unsafe.
  2. Norton Safe Web marked 41.75% of domains as Unsafe, 45.49% as Safe, 4.3% as Untested and 8.32% as Potentially-Unsafe.
  3. Google Safe Browsing marked 5.96% of domains as Unsafe, 94.04% as Safe.
    Note: The presence of the hash of the domain name being tested on the Google malware hash list is interpreted as “unsafe,” while its absence is interpreted as “safe.”
  4. Microsoft Bing marked 0.69% of domains as Unsafe, 34.26% as Safe, and 65.05% as Untested.
  5. Comodo SiteInspector marked 0.19% of domains as Unsafe, 95.82% as Safe, and 4.08% as Unreachable.

This follow-up experiment also shows that the variance between the website reputation services currently offered by large Internet-services and security companies continues to be very large indeed.

After discussions with representatives of the companies mentioned in this article, and after getting a better idea of their behind-the-scenes methodologies, it seems that these website reputation services will continue to “agree to disagree.” We welcome their comments.

A note on differences between website reputation services:

Some of the services scan individual pages and some scan parts of a site. Some scan for potential “signs” of an infection, while others scan for the “postmortem” effect of an infection, such as an exploit being launched. Furthermore, the time difference between one service testing a web page or site and another service testing the same page can also complicate matters. At StopTheHacker.com we recognize the current limitations of the website reputation services that are being offered by the industry.

In conclusion, while website reputation services have come a long way, they still have an even longer path to tread in order to become something that users should trust implicitly.

News, Report, Security