
Analyzing the Google Blacklist, Part 2

June 30th, 2010

Building on our first article in the series, we continue to analyze the Google Safe Browsing List. In this part, we present more detailed statistics about the hashes seen on the blacklist and try to provide insight into what we observe.

Motivation
Understanding the behavior of infected websites is very important. It gives security researchers strategies to help deal a blow to the bad guys and, at the same time, gives website owners and administrators an idea of the current state of website security.

Since the publication of our last article in this series, we have received good feedback from our colleagues in security. We will attempt to incorporate their comments and concerns in this part of the series.

Methodology
We discussed the aim of this experiment and methodology in the last part of this series. We won’t repeat them here, but we encourage you to take a look at our first article in this series if you haven’t already read it!

Analysis
Below we present some graphs which provide more information about the analysis.

  • Websites have a high probability of getting hacked on a Wednesday!

  • Websites have a high probability of getting hacked between 7-8 PM PDT.

  • On Monday, websites get hacked most between 11 AM and 12 Noon, PDT
  • On Tuesday, websites get hacked most between 9 AM and 10 AM, PDT
  • On Wednesday, websites get hacked most between 7 PM and 8 PM, PDT
  • On Thursday, websites get hacked most between 10 PM and 11 PM, PDT
  • On Friday, websites get hacked most between 11 AM and 12 Noon, PDT
  • On Saturday, websites get hacked most between 1 PM and 2 PM, PDT
  • On Sunday, websites get hacked most between 11 AM and 12 Noon, PDT

Note: Most hashes that stayed on the blacklist for the entire 113-day period appear to have been added to it on a Wednesday.
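The per-day and per-hour figures above amount to histograms of when each hash first appeared on the list. Here is a minimal sketch of that binning. It assumes each hash's first-seen time is available as a UTC timestamp; the function name and input format are hypothetical, not our actual pipeline. It also uses a fixed UTC-7 offset to match the PDT labels used here.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset to match the "PDT" labels above. (Dates before
# March 14, 2010 were actually PST; a tz database would account for that.)
PDT = timezone(timedelta(hours=-7), "PDT")
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def busiest_slots(first_seen_utc):
    """Return (busiest weekday, {weekday: busiest hour}) in PDT.

    first_seen_utc: iterable of timezone-aware datetimes, one per hash,
    marking when that hash first appeared on the blacklist (hypothetical input).
    """
    by_day = Counter()
    by_day_hour = Counter()
    for ts in first_seen_utc:
        local = ts.astimezone(PDT)
        day = DAYS[local.weekday()]  # weekday(): 0 = Monday
        by_day[day] += 1
        by_day_hour[(day, local.hour)] += 1
    busiest_day = by_day.most_common(1)[0][0]
    # For each weekday, the hour (0-23) with the most first appearances;
    # ties go to the earliest hour.
    busiest_hour = {
        day: max(range(24), key=lambda h, d=day: by_day_hour[(d, h)])
        for day in by_day
    }
    return busiest_day, busiest_hour


sample = [
    datetime(2010, 6, 3, 2, 15, tzinfo=timezone.utc),   # Wednesday 7:15 PM PDT
    datetime(2010, 6, 3, 2, 45, tzinfo=timezone.utc),   # Wednesday 7:45 PM PDT
    datetime(2010, 6, 1, 16, 30, tzinfo=timezone.utc),  # Tuesday 9:30 AM PDT
]
print(busiest_slots(sample))  # ('Wednesday', {'Wednesday': 19, 'Tuesday': 9})
```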

Conclusions
We have presented more interesting statistics regarding the appearance of website hashes on the Google Safe Browsing List. Website administrators and owners can use these statistics to better arm themselves against attackers. We will continue analyzing the dataset to provide more interesting information. If you have any questions, please add a comment.

At stopthehacker.com, we work hard to help you combat malicious hackers. If you would like to work with us, please drop us an email. You can also visit our services page to find out how we can help you; in fact, you can even sign up for free services!

Till next time…

News, Report, Security

Analyzing the Google Blacklist, Part 1

June 28th, 2010

Google’s efforts to clean up the Internet and provide a useful advisory to Internet users have been very successful. Nearly every modern browser now incorporates Google’s Safe Browsing List information to prevent users from inadvertently visiting malware-infested websites and phishing websites.

Motivation
In this article we will be analyzing the Google malware hash lists that have been published over the past few months in order to answer these important questions:

  • How many websites get blacklisted each day?
  • How many websites manage to get off the blacklist?
  • How soon do websites get off the blacklist?
  • How many never get off the blacklist?

These are practical questions that frustrated, sometimes confused and angry website owners pose time and time again on help forums and via our contact page.

Resources
Google has done a good job creating detailed help content describing the blacklisting process, as well as a group where website owners can ask for help. Additionally, there are excellent resources like BadwareBusters where users can find volunteers to help them. We also participate in these groups.

Yet there is still demand for clear-cut answers to basic questions like the ones detailed above. In this vein, we want to provide a scientifically sound and statistically significant analysis of freely available information that answers these questions clearly. A small FAQ is also available on our site to answer questions from website owners and admins.

Goals
This series of experiments is split into multiple parts. This article presents a first look (part 1) at openly available data. The goal of the experiment is to understand:

  • How many websites get blacklisted each day?
  • How many websites manage to get off the blacklist?
  • How soon do websites get off the blacklist?
  • How many never get off the blacklist?
  • How many websites fall back onto the blacklist?
  • How much time elapses before a website falls back into the blacklist?

Methodology
For the purposes of this experiment, Google malware hash lists were collected every 30 minutes from March 3, 2010 to June 1, 2010 (113 days). Each malware hash list contains the information described in the Google malware hash specification. All hash lists were parsed; unique hashes were extracted, timestamped, and correlated with the malware hash list version.
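One way to turn half-hourly downloads into the per-hash "how long, how often" data used below is to treat each download as a snapshot and record every contiguous run in which a hash is present. The following is a minimal sketch under that assumption. The function name and the snapshot format (a timestamp plus a set of hashes) are hypothetical illustrations, not the exact tooling used for this study.

```python
from datetime import datetime, timedelta


def listing_intervals(snapshots):
    """Convert periodic blacklist snapshots into per-hash listing intervals.

    snapshots: list of (timestamp, set_of_hashes) pairs sorted by time,
    e.g. one per 30-minute download (hypothetical input format).
    Returns {hash: [(first_seen, last_seen), ...]}, one tuple per listing.
    """
    intervals = {}
    open_since = {}  # hash -> timestamp at which its current listing began
    last_seen = {}   # hash -> most recent snapshot timestamp containing it
    for ts, hashes in snapshots:
        for h in hashes:
            open_since.setdefault(h, ts)
            last_seen[h] = ts
        # Hashes listed earlier but absent from this snapshot were removed.
        for h in [h for h in open_since if h not in hashes]:
            intervals.setdefault(h, []).append((open_since.pop(h), last_seen[h]))
    # Listings still open when collection stopped.
    for h, start in open_since.items():
        intervals.setdefault(h, []).append((start, last_seen[h]))
    return intervals


t0 = datetime(2010, 3, 3)
step = timedelta(minutes=30)
snaps = [(t0, {"a", "b"}), (t0 + step, {"a"}), (t0 + 2 * step, {"a", "b"}), (t0 + 3 * step, {"b"})]
result = listing_intervals(snaps)
for h, runs in sorted(result.items()):
    print(h, len(runs))  # "a" is listed once, "b" twice
```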

Subsequently, an analysis was conducted to answer the questions posed above. At no point was an attempt made to identify a website name from the hashes. Also note that a single website can have more than one unique hash. For example, “www.abcd.com”, “abcd.com”, and “www.abcd.com/infected/” can all generate different hashes.
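To see why one site can occupy several entries, hash the three example expressions with MD5. The list actually stores digests of canonicalized expressions, with exact forms such as trailing slashes set by Google's specification, so these particular digests are illustrative only.

```python
import hashlib

# The three expressions from the example; each yields a different MD5 digest,
# so a single site can appear on the list under several hashes.
expressions = ["www.abcd.com", "abcd.com", "www.abcd.com/infected/"]
digests = {e: hashlib.md5(e.encode("utf-8")).hexdigest() for e in expressions}
for expr, digest in digests.items():
    print(digest, expr)
```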

Brief Highlights

  • Total number of unique hashes tracked: 688,602.
  • Average number of unique hashes per day (over 113 day period): 6093.
  • 25.8% of hashes never got off the Google blacklist.
    Each one of these unique hashes was deemed infected for over 3 months (greater than 113 days).
  • 43% of hashes were listed exactly once as infected and managed to get off the Google blacklist.
    The average time each of these hashes was blacklisted was 13 days (89 days max).
  • 2% of hashes were blacklisted exactly twice.
    Each one of these hashes was blacklisted, was then removed from the blacklist and then fell back in (the sites were hacked again). These sites remained infected for an average of 19 days (89 days max), and remained clean for an average of 17 days before being hacked again.
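The highlights above bucket hashes by how many separate listings they had. Here is a minimal sketch of that bucketing, assuming per-hash listing intervals like those produced from the snapshots described in the methodology. The function and key names are hypothetical. Two definitions are our assumptions: "never got off" means a single listing spanning the whole collection window, and for twice-listed hashes the "time infected" is read as the average length of each individual listing.

```python
from datetime import datetime
from statistics import mean


def _days(delta):
    return delta.total_seconds() / 86400


def _avg(values):
    values = list(values)
    return mean(values) if values else 0.0


def summarize(intervals, window_start, window_end):
    """Bucket hashes by number of listings and average their durations.

    intervals: {hash: [(start, end), ...]} sorted by start (hypothetical input).
    """
    total = len(intervals)
    never_off, once, twice = [], [], []
    for runs in intervals.values():
        if len(runs) == 1:
            start, end = runs[0]
            if start <= window_start and end >= window_end:
                never_off.append(runs)  # listed for the whole window
            elif end < window_end:
                once.append(runs)  # listed once, then removed
        elif len(runs) == 2:
            twice.append(runs)
    return {
        "never_off_pct": 100 * len(never_off) / total,
        "once_pct": 100 * len(once) / total,
        "once_avg_days_listed": _avg(_days(r[0][1] - r[0][0]) for r in once),
        "twice_pct": 100 * len(twice) / total,
        # Assumption: average length of each individual listing.
        "twice_avg_days_listed": _avg(_days(e - s) for r in twice for s, e in r),
        # Gap between the end of the first listing and the start of the second.
        "twice_avg_days_clean": _avg(_days(r[1][0] - r[0][1]) for r in twice),
    }


w_start, w_end = datetime(2010, 3, 3), datetime(2010, 6, 1)
demo = {
    "h1": [(w_start, w_end)],
    "h2": [(datetime(2010, 4, 1), datetime(2010, 4, 11))],
    "h3": [(datetime(2010, 3, 10), datetime(2010, 3, 20)),
           (datetime(2010, 4, 6), datetime(2010, 4, 16))],
    "h4": [(datetime(2010, 5, 20), w_end)],  # still listed at the end; neither bucket
}
print(summarize(demo, w_start, w_end))
```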

Analysis
It is clear from these initial results that a very large number of websites (nearly one quarter of the roughly 6,000 hashes added per day) never make it off the Google blacklist. There are a number of reasons for this. One is that most webmasters, while they may be good at website design and layout, may not have the technical skills required to clean websites infected by malware and code injection attacks. We have also met website owners who are extremely business savvy but lack the technical expertise to recover from a blacklisting event. The income lost due to business interruption in these cases is considerable.

We see that 43% of websites which get blacklisted manage to make it off the blacklist, but these websites suffer for an average period of 13 days.

Some websites manage to get off the blacklist and then fall back onto it. These “repeat offenders” spend longer on the blacklist than the previous group (an average of 19 days versus 13), and they do not stay clean for long: an average of just 17 days.

Conclusion
These numbers clearly show the current sorry state of website security. It is unfortunate that thousands of websites are affected every day. At stopthehacker.com, we strive to help combat this trend. These issues need to be addressed by dedicated services that are currently not readily available to the masses. To fill this vacuum in the service space, and to disrupt the security market, stopthehacker.com provides its advanced Health Monitoring and Vulnerability Assessment services for website owners. Our services take away the anguish that business owners face when their websites are attacked. Please visit our services page to find out how we can help you. In fact, you can even sign up for free services.

Further detailed analysis will be presented in the second part of this series. We will show detailed analysis of the data and will provide more insight on the implications of these observations.

Stay tuned for Part 2!

News, Report, Security

Why Did My PageRank Go Down? – SEO Poisoning

May 10th, 2010

Search engines like Google drive the majority of traffic to websites. Therefore, it is important for websites to rank highly and appear prominently in search results. To this effect, website owners often spend large sums of money on Search Engine Optimization (SEO) strategies: using the right keywords, getting linked to by popular sites, getting a dialogue about the website going on good forums and much more.

Overview

The popularity, relevance and importance of a website, which determine where in the search rankings it should appear, can simplistically be thought of as represented by one magic number: the Google PageRank. This article is not about how to calculate, improve or tune your Google PageRank.

This article will discuss how a hacker can break into your site without your knowledge and reduce your Google PageRank, causing your website to plummet from the top of search engine rankings and making your business lose money and visibility.

An Example

On May 7th, 2010, we reviewed a compromise of one of the many sites we scan on a daily basis. This site was attacked by a hacker who had exploited a vulnerability in the web application used to host the website. Once the hacker had identified the specific vulnerability, which was in WordPress, he injected spam links into the source code of the pages on the site.

All the spam links are neatly placed after the main body of the legitimate HTML, and the injected block even starts with the comment tag “<!-- google -->”!
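Because this injected block began with a fixed marker comment, one simple check is to look for that marker and list the links that follow it. This is only a sketch, assuming the marker appears exactly as in this case. Attackers can trivially change it, so this is not a general detector, and the function name is hypothetical. It uses a regular expression rather than a full HTML parser for brevity.

```python
import re

MARKER = "<!-- google -->"
# Captures the href value of <a ...> tags, single- or double-quoted.
HREF = re.compile(r"""<a\s[^>]*href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def links_after_marker(html, marker=MARKER):
    """Return href values appearing after the marker comment ([] if absent)."""
    pos = html.find(marker)
    if pos == -1:
        return []
    return HREF.findall(html, pos + len(marker))


page = ('<html><body><a href="/about">About</a></body></html>'
        '<!-- google --><a href="http://spam.example/pills">x</a>'
        "<A HREF='http://spam.example/b'>y</A>")
print(links_after_marker(page))  # ['http://spam.example/pills', 'http://spam.example/b']
```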

Conclusion

The effect of this spam link injection was that the PageRank of the legitimate site was potentially reduced, since many links on the website now pointed to spam or malicious pages. This could result in lower positioning in search results on various search engines. This is yet another case where webmasters and administrators, who are already overloaded with many tasks, were either unaware of the security breach or could not pay attention to it.

At stopthehacker.com we are always available to help. If you have suffered from a breach of this kind and would like to share your experience, please contact us.

Report, Security

Is User Trust More Effective Than Blacklisting?

April 6th, 2010

Blacklists are published by many security groups and organizations around the world to share knowledge about malicious websites, IP addresses and other security features which allow others to insulate themselves from the dark side of the Internet.

In recent years, the number of blacklists being published by web-centric organizations has grown by leaps and bounds. Large Internet-based companies such as Google, Yahoo and Microsoft have been providing cues to their users about malicious websites in an effort to make the Internet a safer place. Google provides much more in-depth information than the other two, Yahoo and Bing, and seems to have sophisticated virtual-machine-based analysis tools that can detect misbehaving malicious code. Yahoo employs McAfee’s Search scan service, while Bing potentially uses Microsoft-specific technologies.

Experiment Goal

The aim of this experiment is to compare the coverage of each of the blacklists published by Google, Yahoo and Bing, and to compare them with what Internet users believe. To do this, we compare the results of Google, Yahoo, Bing and Malware Patrol with Web of Trust (WOT). Furthermore, we have also tried to see how many of these malicious URLs are also involved in phishing, by looking up each URL/domain via Phishtank’s API.

Blacklists provide an easy mechanism for users (via browsers) and developers (via APIs) to assimilate security information about websites, IPs and such in order to make an informed decision about whether to allow or deny access to an IP or website.

Methodology

We collected 1095 confirmed malicious links from MalwareURL. Each of these links was tested to determine whether it is listed on the blacklists supplied by Google, Yahoo and Bing. Note that, unlike Google, Yahoo and Bing do not provide any direct APIs to probe their databases. Therefore, each link and its associated domain were submitted via HTTP requests to Yahoo and Bing, and the results were inspected to see whether they indicated that the domain/link was infected.

To determine whether a website is present in the Google malware blacklist, the domain name along with the link and its variations, as defined here, are converted to MD5 hashes and checked using Google’s Safe Browsing API. For Malware Patrol, the aggressive version of their blacklist is downloaded and comparisons are made locally. For WOT, we employ their XML-based API to gather information about what Internet users believe about each site. For Phishtank, we used their XML-based API. The tests were conducted on March 22, 2010.
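The "link and its variations" step can be sketched as generating host-suffix and path-prefix expressions for a URL, hashing each with MD5, and testing the digests against a local copy of the hash list. The sketch below is a simplified reading of the suffix/prefix scheme in Google's Safe Browsing documentation as we understand it. It omits URL canonicalization (escaping, IP-address hosts and so on), and it checks a local set rather than calling the remote API. All function names are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def host_suffixes(host):
    """Exact host plus up to four suffixes built from its last five labels."""
    labels = host.split(".")
    out = [host]
    for i in range(max(1, len(labels) - 5), len(labels) - 1):
        out.append(".".join(labels[i:]))  # never the bare top-level domain
    return out


def path_prefixes(path, query):
    """Exact path (with and without query) plus up to four prefixes from '/'."""
    out = [path + "?" + query] if query else []
    candidates = [path, "/"]
    prefix = "/"
    for segment in path.split("/")[1:-1][:3]:  # directory components only
        prefix += segment + "/"
        candidates.append(prefix)
    for c in candidates:
        if c not in out:
            out.append(c)
    return out


def lookup_expressions(url):
    parts = urlsplit(url)
    path = parts.path or "/"
    return [h + p for h in host_suffixes(parts.hostname or "")
            for p in path_prefixes(path, parts.query)]


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def is_listed(url, malware_hashes):
    """True if any expression's MD5 is in a local copy of the hash list."""
    return any(md5_hex(e) in malware_hashes for e in lookup_expressions(url))


print(lookup_expressions("http://www.abcd.com/infected/"))
```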

Popular blacklists cover only a minuscule percentage of malicious sites.

Highlights

  • Google marked 0.18% of the URLs as unsafe.
  • Yahoo marked 1.0% of the URLs as unsafe.
  • Bing marked 0.09% of the URLs as unsafe.
  • Malware Patrol marked 0.63% of the URLs as unsafe.
  • Phishtank marked 0% of the URLs as unsafe.
  • WOT marked 99% of the URLs as unsafe.

Note: 1095 unique, malicious URLs were tested with each service.

Observations

Interestingly, Web of Trust (WOT) marked 99% of the URLs with a “poor”, “very poor” or “unsatisfactory” reputation. We assume that users who see such a rating will not visit the website in question, and hence, for the purposes of this test, we treat this kind of rating as unsafe. It remains to be determined whether WOT uses a data feed from MalwareURL, which we used to prime the test set. Nonetheless, it is surprising to see that a company which specializes in collating the trust and opinions of web surfers performs orders of magnitude better than large Internet companies and established blacklist providers.
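The "treat as unsafe" rule above amounts to thresholding a reputation score. Here is a minimal sketch, assuming WOT's 0-100 reputation scale with the band boundaries we recall from its public API documentation at the time (80 excellent, 60 good, 40 unsatisfactory, 20 poor, 0 very poor). Those boundaries are an assumption to verify, and the function names are hypothetical.

```python
# Assumed WOT reputation bands on a 0-100 scale; verify against WOT's
# API documentation before reuse.
BANDS = [(80, "excellent"), (60, "good"), (40, "unsatisfactory"), (20, "poor"), (0, "very poor")]
UNSAFE_LABELS = {"unsatisfactory", "poor", "very poor"}


def wot_label(reputation):
    """Map a 0-100 reputation score to its (assumed) band label."""
    if not 0 <= reputation <= 100:
        raise ValueError(f"reputation out of range: {reputation}")
    for floor, label in BANDS:
        if reputation >= floor:
            return label


def treated_as_unsafe(reputation):
    """The rule used in this test: any of the three lowest bands counts as unsafe."""
    return wot_label(reputation) in UNSAFE_LABELS


print(wot_label(35), treated_as_unsafe(35))  # poor True
```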

One must keep in mind, though, that Google’s approach to maintaining an ever-changing blacklist is slightly different from that of the other actors in the game. Google publishes an updated version of its list every 30 minutes or so and specifies which MD5 hashes need to be purged and which need to be inserted. Some blacklist services do not take this approach and hence may claim to store information on millions of sites that were infected at some point in time, whether or not they still are. Such stale entries are less likely in the Google blacklist, because Google has opened up a review process via its webmaster central area to update the blacklist.
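Applying such an incremental update to a local copy is just a set insert/purge. Here is a minimal sketch, assuming a hypothetical line format in which "+" marks a hash to insert and "-" a hash to purge. The function name and format are for illustration only, not a specification of Google's wire format.

```python
def apply_update(local_hashes, update_lines):
    """Apply one incremental update to a local copy of a hash blacklist.

    update_lines: strings like "+<md5>" (insert) or "-<md5>" (purge);
    this line format is an assumption for illustration.
    Returns a new set so the previous snapshot is left untouched.
    """
    updated = set(local_hashes)
    for raw in update_lines:
        line = raw.strip()
        if not line:
            continue
        op, digest = line[0], line[1:]
        if op == "+":
            updated.add(digest)
        elif op == "-":
            updated.discard(digest)
        else:
            raise ValueError(f"unexpected update line: {line!r}")
    return updated


before = {"aa", "bb"}
after = apply_update(before, ["+cc", "-aa", ""])
print(sorted(after))  # ['bb', 'cc']
```

Because purges are applied on every update, a hash stays in the local copy only while the publisher still lists it. This is what keeps a frequently updated list from accumulating long-cleaned sites.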

In contrast, Bing and Yahoo do not provide public APIs for developers and applications to use.

Also, none of the URLs/domains were actually listed on Phishtank. It seems that websites which aim to infect users with malware are quite different from the set of sites used for phishing; malware-laced websites do not appear to be used to commit phishing as well.

Conclusion

Large Internet companies, some of whom have published effective blacklists used by many developers and applications all over the world, still have a long way to go to become truly effective. As we have seen, the blacklist services identify only a minuscule fraction of malicious websites. WOT seems to be extremely effective at identifying unsafe websites. It remains to be determined whether the data set used for this test has a large overlap with any of the sources WOT uses to classify websites.

Another interesting result is that it does not seem that websites which aim to infect users with malware are actively involved in phishing campaigns.

Report, Security

Hackers Use Google Trends to Poison Searches

April 5th, 2010

Hackers are using a relatively new technique to lure users into visiting malicious websites. SEO poisoning is a method by which hackers get a malicious link or URL indexed by a search engine. When users search for terms that match the context of the malicious link, unsuspecting web surfers are often served malicious links that divert them to harmful websites, which commit all kinds of nasty deeds ranging from ID theft to installing malware.

Overview

SEO poisoning is not new, but it is definitely a growing trend. It is becoming a vector of choice for hackers. The procedure to commit this crime is actually quite similar to the method of code-injection. First, find a vulnerability in the website or hosting infrastructure which will allow a hacker to upload malicious code or modify the behavior of the web application. Once this is achieved a hacker can insert URLs into a web page which will be indexed by search engines such as Google.

Below, we provide a screen shot to illustrate that hackers are reverse-engineering popular keywords from Google search trends to exploit unsuspecting users. In this particular example, the search query is extracted from Google Trends and results clearly show URLs which redirect users to fake anti-virus websites. Unfortunately, few of these URLs are even blacklisted by Google and hence users do not even have the luxury of making a decision to visit an unsafe website or not.

Experiment Goal

The aim of this experiment is to identify URLs which are using SEO poisoning.

Methodology

Trending search queries were collected from Google Trends. Once the search queries were collected, searches were performed via Google and the first 10 results were collected for each search query.

Each search result was analyzed to determine whether its URL contained the complete search query in the exact same order. We also determined whether the structure of the URL matched patterns of SEO poisoning. Furthermore, the IP associated with the URL was looked up on Spamcop to verify whether the IP had been used for sending spam or had participated in zombie networks. Finally, using a geo-location API from IPinfo DB, the country of origin for the URL was determined. The test was conducted on March 23, 2010. Google Trends results for the period of January 1, 2010 to March 22, 2010 were used for searches.
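Two of these checks are easy to sketch: whether a URL contains the query's words in the same order, and how the SpamCop lookup name for an IP is formed. For the first, we assume "exact same order" means the query's words appear consecutively among the URL's words, with "-", "_", "+", "/", "." and "%20" acting as separators. For the second, as we understand SpamCop's DNS blocklist convention, resolving the name (e.g. with socket.gethostbyname) returns an address, conventionally 127.0.0.2, if the IP is listed, and fails otherwise. The SEO-poisoning URL patterns themselves are not specified here, so they are not sketched. All function names are hypothetical.

```python
import re
from urllib.parse import unquote


def words(text):
    """Lower-case alphanumeric tokens; any other character acts as a separator."""
    return re.findall(r"[a-z0-9]+", unquote(text).lower())


def contains_query_in_order(url, query):
    """True if the query's words appear consecutively and in order among the URL's words."""
    q, u = words(query), words(url)
    n = len(q)
    return n > 0 and any(u[i:i + n] == q for i in range(len(u) - n + 1))


def spamcop_query_name(ipv4):
    """Name to resolve against the SpamCop DNS blocklist: reversed octets + bl.spamcop.net."""
    return ".".join(reversed(ipv4.split("."))) + ".bl.spamcop.net"


print(contains_query_in_order("http://example.com/spring-break-2010-photos.html", "spring break 2010"))  # True
print(spamcop_query_name("192.0.2.10"))  # 10.2.0.192.bl.spamcop.net
```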

Highlights

  • 59.5% of search results returned by Google had URLs which contained the entire search string in the same exact order.
  • 26.07% of search results returned by Google had URLs which matched SEO poisoning patterns.
  • 14.1% of search results returned by Google had URLs which matched SEO poisoning patterns and contained the entire search string in the same exact order.
  • Only one IP seemed to be involved in spam related activity.
  • Some of the most popular locations for websites returned as search results are: US, Canada, Netherlands, Germany, UK, France, Czech Republic, Australia and Singapore.

Note: 10,559 search results were analyzed.

Percentage of sites from different countries affected by SEO poisoning.

Countries which seem to have the highest number of SEO poisoned links indexed by Google:

  • 86.1% of URLs from Singapore based sites.
  • 74% of URLs from Netherlands based sites.
  • 30.5% of URLs from UK based sites.
  • 25.1% of URLs from Germany based sites.
  • 12.6% of URLs from Canada based sites.
  • 12.42% of URLs from US based sites.
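Each country's figure above is a share of that country's own search results, not of all 10,559 results. Here is a minimal sketch of that per-country normalization, assuming each analyzed result has already been labeled with a country code and a poisoned/not-poisoned flag. The input format and function name are hypothetical.

```python
from collections import Counter


def poisoned_share_by_country(results):
    """Percentage of each country's results that were flagged as SEO poisoned.

    results: iterable of (country_code, is_poisoned) pairs, one per search
    result (hypothetical input).
    """
    total, poisoned = Counter(), Counter()
    for country, is_poisoned in results:
        total[country] += 1
        if is_poisoned:
            poisoned[country] += 1
    return {c: 100 * poisoned[c] / total[c] for c in total}


print(poisoned_share_by_country([("SG", True), ("SG", True), ("SG", False), ("US", False)]))
```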
Fluctuation in the number of SEO poisoned results.

Note the fluctuations in the number of search results which are SEO poisoned.

Conclusion

It is clear that even the world’s most popular search engine is not immune to SEO poisoning. This is not for lack of trying, but because of the myriad ways hackers can break into a website and take advantage of it. We have seen that large numbers of search results match SEO poisoning patterns. Furthermore, it is clear that hackers are injecting malicious URLs into compromised websites to latch onto Google Trends.

Report, Security

Yes, Search Engines Can Infect Your Computer

March 8th, 2010

Search engines, like Google, Yahoo and Bing offer users the ability to scour the plethora of information on the Internet. These search engines index content on websites and often maintain cached copies of these sites so that, in the event that the site is unavailable, visitors can still view the contents of the website.

Unfortunately, the idea of page caching has not been implemented well. In fact, page caching has opened up new opportunities for malware. The primary problem is that, from a security perspective, when search engines cache copies of websites, they also store any malware present on those sites on their own infrastructure.

Hackers Exploit Search Engine Page Caches

Most large search engines use some kind of malware analysis to determine if a website is compromised or not. Google for example, has a well tuned system with high accuracy. In our meeting with the Google malware team, some months ago, we were glad to find that they were already aware of this problem. In the weeks following our interaction, cached copies of infected websites were no longer easily available via searches.

Not so long ago, we wrote an article about our efforts to alert Yahoo of the presence of malware in the cached versions of various web pages served up by their search engine. Our efforts were not successful, although the occurrence of malware in Yahoo cached pages seems to have gone down significantly. Perhaps our messages were not entirely ignored.

Recently, an article came up on ISC SANS discussing this very same issue.

We have also recently found instances of Bing serving up malware in its cached pages. It seems that Bing’s malware detection methods cannot reliably detect malware on cached web pages, which keeps Bing from protecting its users from cached pages that contain malware. We have provided screen shots below as an example of the issue. In this particular case, the strain of malware found in Bing cached pages has been around since 2009.

Search Engines Ignore the Problem

Consider the case where a malicious individual deliberately infects a website with malware and Bing (or another search engine) indexes it. The malicious individual can then send out hyperlinks pointing to the cached web pages hosted by Bing. Any kind of “reputation-checking” for the cached link will confirm that the page is hosted by a reputable company, in this case, Bing (Microsoft). However, the malware will still be able to deliver its payload. Just in case you’re thinking, “my antivirus will protect me from the malware on the cached page,” you may like to read this article.

It is surprising to see that search engines like Bing, which claim to implement malware detection, cannot correctly determine whether a cached copy of a web page hosts malware! In these cases, Bing ends up being an excellent attack vector for malicious individuals.

It remains to be seen if search engine companies will continue to serve up cached pages laced with malware at the same time as they are touting active scan and detection mechanisms. Let’s hope this article can get attention in the upper echelons of management at these large search giants and they start to pay attention to this problem.

Screen shots follow below:

Report, Security

“Online Pharmacy” Spam Stalks Internet Forums/Boards

January 26th, 2010

Malicious hackers have, for many years, been offering services to unscrupulous individuals and companies for monetary compensation. With the growth of email spam advertising everything from medical supplements to cars and lottery tickets, email scrubbers and filters have taken the game up a notch by implementing ever-increasing layers of complexity to cut down on such spam. In turn, hackers have started to advertise spam, such as medication offers and fraudulent scams, by compromising web-based message boards and forums.

Hackers employ two basic techniques:

  • Creating large numbers of users on forums. These accounts are then used to post spam on the message boards.
  • Exploiting Web Application vulnerabilities in the software used to run the forum.

Approximately two weeks ago, Lenny Zeltser, from ISC SANS, posted an informative article about online pharmacy ads popping up on message boards. In this vein we have conducted a limited experiment with about 14,000 websites which contain spam announcing online pharmacies.

The aim of the experiment:

  • What percentage of websites which advertise online pharmacies are message boards and Internet forums?
  • What Web Applications, e.g. CMS packages, are used on the message boards that are compromised?

We believe this will provide a rough estimate of how focused hackers are on using message boards and forums on the Internet to advertise spam. From another perspective, it will give us some idea of how vulnerable websites that host a message board or forum are to abuse by hackers.

Testing methodology:

We have used Google to mine the websites which contain certain keyword patterns such as “buy zocor online”, or “buy brand kamagra online” etc. Once the links suggested by Google were mined, each of the websites was tested against Google’s Safe Browsing List to determine if they had hosted malware (according to Google). Next, an analysis was done to determine if the link(s) mined from Google pointed to a forum or message board. This was done by identifying the presence of multiple strings inside a link. For example, if a link has the keywords “topic”, “view”, “thread” or similar keywords, including characters associated with dynamic page generation, it is probably hosting a message board or forum.
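The forum/message-board test described above can be sketched as a keyword-plus-dynamic-page heuristic. In this sketch, "topic", "view" and "thread" come from the description, while "forum", "board", the list of dynamic-page suffixes, and the requirement that both signals be present are our own illustrative assumptions. The function name is hypothetical.

```python
from urllib.parse import urlsplit

# "topic", "view" and "thread" come from the methodology; "forum" and "board"
# are illustrative additions.
FORUM_KEYWORDS = ("topic", "view", "thread", "forum", "board")
DYNAMIC_SUFFIXES = (".php", ".asp", ".aspx", ".cgi", ".pl")


def looks_like_forum(url):
    """Heuristic: a forum keyword plus a sign of dynamic page generation."""
    parts = urlsplit(url.lower())
    has_keyword = any(k in parts.path or k in parts.query for k in FORUM_KEYWORDS)
    is_dynamic = bool(parts.query) or parts.path.endswith(DYNAMIC_SUFFIXES)
    return has_keyword and is_dynamic


print(looks_like_forum("http://example.com/phpBB/viewtopic.php?t=123"))  # True
print(looks_like_forum("http://example.com/topics/health.html"))  # False
```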

The test was conducted between January 21st and January 23rd, 2010.

Popular software packages installed on compromised forums and message boards.

We present the most interesting results below:

  • 47.9% of websites displaying “online pharmacy” spam are message boards and forums.
  • None of the websites advertising “online pharmacy” spam were listed on Google Safe Browsing List.
  • 20.28% of forums displaying “online pharmacy” spam were using jQuery.
  • 15.73% of forums displaying “online pharmacy” spam were using phpBB.
  • 11.54% of forums displaying “online pharmacy” spam were using WordPress.
  • 10.84% of forums displaying “online pharmacy” spam were using MooTools.

These results, along with other software packages, helper scripts and tracking code, are depicted in the graph presented above.

This small experiment shows that a high percentage of websites displaying online spam campaigns are message boards or forums. This indicates that many unsecured and outdated software installations are still in use and are often exploited by malicious individuals to post spam. Further, jQuery is the package most commonly found on the compromised forums (20.28%). This supports our previous observations regarding jQuery scripts being used to push malware to unsuspecting visitors.

Company, News, Report

Website-Reputation Services Agree to Disagree

January 17th, 2010

We have recently published statistics comparing various website reputation services and have received good feedback over private channels regarding our article. In this sequel, we add Microsoft’s Bing malware filter to the comparison with other website reputation services.

At StopTheHacker.com (Jaal LLC) we have conducted tests of 721 URLs, all of which have been reported as malicious by volunteers of various blacklists. We follow a similar format for presentation of results as in the last post.

Website Reputation services: agree to disagree.

Note: All 721 domains/URLs were reported as malicious and were collected from malware.com.br on January 14, 2010. The blue column (maximum 100) indicates the percentage of sites that the website-reputation service correctly identified as unsafe. The orange column (maximum 100) indicates the percentage of sites that the website-reputation services incorrectly identified as safe.

The aim of the test:

  1. Identify the accuracy of the website reputation service
  2. Identify the overlap in terms of safe/unsafe websites

We present the most interesting results in this article. First we detail the parameters of the testing procedure to provide an idea of how the test was set up.

First, 721 URLs were collected from malware.com.br (mbr) on January 14, 2010. These URLs are reported for listing by one or more of the following: individuals, organizations, agencies and software products or services.  For the purposes of this test we assume that all the URLs obtained from the “regular” list on mbr are malicious and hence deemed “unsafe” to visit.

We compare the reputation provided by each website-reputation service and observe how many websites are marked unsafe, safe, untested, maybe-unsafe/caution/potentially-unsafe, and unreachable.

Website-reputation services tested:

Note that when checking a domain name/URL with the Google Safe Browsing API, we calculated the MD5 hash of the website name to match against the malware hash list. We conducted this test on January 15, 2010. The list of domain names tested is presented below, and a graph representing the statistics for the 721 sites tested is above.

We identify the most interesting results below:

  1. McAfee SiteAdvisor marked 36.75% of domains as Unsafe, 27.18% as Safe, 32.32% as Untested and 3.74% as Potentially-Unsafe.
  2. Norton Safe Web marked 41.75% of domains as Unsafe, 45.49% as Safe, 4.3% as Untested and 8.32% as Potentially-Unsafe.
  3. Google Safe Browsing marked 5.96% of domains as Unsafe, 94.04% as Safe.
    Note: The presence of the hash of the tested domain name on the Google malware hash list is interpreted as “unsafe”, while its absence is interpreted as “safe.”
  4. Microsoft Bing marked 0.69% of domains as Unsafe, 34.26% as Safe, and 65.05% as Untested
  5. Comodo SiteInspector marked 0.19% of domains as Unsafe, 95.82% as Safe, and 4.08% as Unreachable.
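The per-service breakdowns above are tallies over the 721 domains. Here is a minimal sketch, assuming each service's verdict per domain has already been normalized to a label such as "unsafe", "safe" or "untested". The labels and function names are hypothetical. The Google rule from the note above is applied as a set-membership test.

```python
from collections import Counter


def google_verdict(domain_hash, malware_hashes):
    """Presence on the malware hash list means "unsafe"; absence means "safe"."""
    return "unsafe" if domain_hash in malware_hashes else "safe"


def verdict_breakdown(verdicts):
    """Percentage of domains per verdict label, rounded to two decimals."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


print(verdict_breakdown(["unsafe", "safe", "safe", "untested"]))
# {'unsafe': 25.0, 'safe': 50.0, 'untested': 25.0}
```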

This follow-up experiment also shows that the variance between website reputation services that are currently being offered by large Internet-services/security companies continues to be very large indeed.

After discussions with representatives of the companies mentioned in this article, and after getting a better idea of their behind-the-scenes methodologies, it seems that these website reputation services will continue to “agree to disagree.” We welcome their comments.

A note on differences between website reputation services:

Some of the services scan pages and some scan parts of a site. Some scan for potential “signs” of an infection, while others scan for the “postmortem” effect of an infection, such as an exploit being launched. Furthermore, the time difference between when one service tests a web page or site and when another tests the same page can also complicate matters. At StopTheHacker.com we recognize the current limitations of the website reputation services being offered by the industry.

In conclusion, while website reputation services have come a long way, they still have an even longer path to tread in order to become something that users should trust implicitly.

News, Report, Security

Profiling Autonomous Systems Hosting Blacklisted Websites

January 1st, 2010

An Autonomous System (AS) is a routing construct that represents a group of networks under the control of an organization (credit for edit: Max@badwarebusters.org). ASes form the “structure” of the Internet. The organizations behind them can be thought of as web-hosting companies, large Internet-based companies, or resellers of bandwidth and IP addresses. These are usually large organizations for whom simply getting an Internet connection and a hosting company for their website is not enough.

In recent months, the trend of benign websites being affected by code injection clearly shows that attacks injecting malware into unsuspecting websites are on the rise. It is important to understand the profile of the ASes that provide transit to infected websites hosted within their systems, since each such AS provides the bandwidth and resources that support the downloading of malware onto the computers of unsuspecting visitors to a compromised website. The organizations behind these ASes, more specifically hosting companies and other network operators, should therefore play a pivotal role in addressing compromised websites.

At StopTheHacker.com, we have conducted extensive experiments to analyze and profile over 20,000 ASes and identify which ones are the worst offenders in terms of hosting blacklisted websites. We used Google Safe Browsing data, also accessible via StopBadware.org (which sources data from Google and Sunbelt), to identify which ASes are responsible for the proliferation of badware on the Internet and to track trends over time. We also correlated AS size with data available from CAIDA to determine whether larger ASes are more at fault.
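The per-AS bookkeeping behind these numbers can be sketched roughly as follows. This is a minimal illustration, not our production pipeline: it assumes each analyzed site has already been resolved to an origin AS number and flagged as blacklisted or not, and the `SiteRecord` layout, function name, and sample values are all hypothetical (the AS numbers come from the range reserved for documentation).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SiteRecord:
    # Hypothetical record: one analyzed site, its origin AS, and its blacklist status.
    domain: str
    asn: int
    blacklisted: bool


def blacklist_stats_by_as(records):
    """Return {asn: (sites_analyzed, sites_blacklisted, percent_blacklisted)}."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for rec in records:
        totals[rec.asn] += 1
        if rec.blacklisted:
            bad[rec.asn] += 1
    return {
        asn: (totals[asn], bad[asn], 100.0 * bad[asn] / totals[asn])
        for asn in totals
    }


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        SiteRecord("a.example", 64500, True),
        SiteRecord("b.example", 64500, False),
        SiteRecord("c.example", 64501, False),
        SiteRecord("d.example", 64500, False),
    ]
    for asn, (n, k, pct) in sorted(blacklist_stats_by_as(sample).items()):
        print(f"AS{asn}: {k}/{n} blacklisted ({pct:.1f}%)")
```

Per-AS totals like these are what a comparison against CAIDA's AS size data would start from.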

We present some brief results below:

  1. The average percentage of blacklisted websites in:
    • the top 10 ASes (according to number of sites noted by Google) is 3.5%
    • ASes with ranks 11-23 (according to number of sites noted by Google) is 3.75%
    • ASes with ranks 24-40 (according to number of sites noted by Google) is 5.01%
  2. The AS with the highest percentage of blacklisted sites is AS 16557 (Colo Solutions, Inc.), with close to 60% of its 10,000 sites blacklisted.
  3. The top 50 ASes, which host more than 10,000 sites each and have at least 6% of their websites blacklisted, host a combined 151,000 blacklisted sites.
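Statistics like items 1 and 3 can be derived, in outline, from per-AS counts. The sketch below ranks ASes by the number of sites analyzed (rank 1 = most sites, matching the ranking convention used in this article), averages the per-AS blacklisted percentage over a band of ranks, and sums blacklisted sites for ASes that pass size and percentage thresholds. The input tuples are hypothetical, and taking the band "average" as an unweighted mean of per-AS percentages is our assumption, not a documented detail of the analysis.

```python
def rank_ases(stats):
    """stats: list of (asn, sites_analyzed, sites_blacklisted).
    Returns a new list where index 0 is rank 1 (most sites analyzed)."""
    return sorted(stats, key=lambda row: row[1], reverse=True)


def band_average_percent(ranked, first_rank, last_rank):
    """Unweighted mean of per-AS blacklisted percentages for ranks
    first_rank..last_rank (1-based, inclusive)."""
    band = ranked[first_rank - 1:last_rank]
    if not band:
        return 0.0
    return sum(100.0 * bad / total for _, total, bad in band) / len(band)


def blacklisted_total(stats, min_sites, min_percent):
    """Sum blacklisted sites over ASes hosting more than min_sites sites
    with at least min_percent of them blacklisted."""
    return sum(
        bad
        for _, total, bad in stats
        if total > min_sites and 100.0 * bad / total >= min_percent
    )


if __name__ == "__main__":
    # Hypothetical (asn, sites analyzed, sites blacklisted) tuples.
    data = [(64500, 100, 10), (64501, 300, 3), (64502, 200, 20)]
    ranked = rank_ases(data)
    print(band_average_percent(ranked, 1, 2))
    print(blacklisted_total(data, 150, 6.0))
```

On real data, thresholds of 10,000 sites and 6% would mirror the selection described in item 3.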

Interesting observations:

  1. AS 16557 (Colo Solutions, Inc.) is well known for popping up on blacklists related to peer-to-peer networks [Is someone tracking P2P users]. It seems that this AS, which is not really concerned about P2P traffic emanating from within its systems (traffic that is potentially used to exchange copyrighted material), is also not paying attention to malware-infected websites hosted within its networks.
  2. AS 15169 (Google Inc.), had 590734 sites analyzed and 6046 of them were found to contain malware.
  3. AS 14173 (Photobucket), had zero sites infected out of 399424 sites analyzed.
  4. The largest AS by connection degree (see CAIDA’s AS listing), Level 3 Communications, was hosting 571 infected sites out of 136305 sites analyzed by Google.
  5. AS 7018 (AT&T), was hosting 97 infected sites out of 7947 sites analyzed by Google.
  6. AS 701 (Verizon), was hosting 117 infected sites out of 7248 sites analyzed by Google.
  7. AS 1239 (Sprint), was hosting 117 infected sites out of 3958 sites analyzed by Google.
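To put the raw counts above on a common scale, the snippet below converts each pair into a percentage of analyzed sites. The figures are copied from the list above; the dictionary layout and the helper name are only for illustration.

```python
# (infected sites, sites analyzed by Google), as reported in the list above.
AS_COUNTS = {
    "AS 15169 (Google Inc.)": (6046, 590734),
    "AS 14173 (Photobucket)": (0, 399424),
    "Level 3 Communications": (571, 136305),
    "AS 7018 (AT&T)": (97, 7947),
    "AS 701 (Verizon)": (117, 7248),
    "AS 1239 (Sprint)": (117, 3958),
}


def infected_percent(infected, analyzed):
    """Share of analyzed sites that were found to be infected, in percent."""
    return 100.0 * infected / analyzed


for name, (infected, analyzed) in AS_COUNTS.items():
    print(f"{name}: {infected_percent(infected, analyzed):.2f}%")
```

This prints 1.02% for Google, 0.00% for Photobucket, 0.42% for Level 3, 1.22% for AT&T, 1.61% for Verizon and 2.96% for Sprint. Sprint has the highest ratio of these, while Level 3, despite hosting 571 infected sites, has one of the lowest.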

Making Sense of the Results

Below we present some graphs to highlight the percentage of blacklisted websites hosted by the top few ASes. Note that all AS rankings below are based on the number of websites analyzed by Google: an AS with rank 1 hosts more Google-analyzed websites than an AS with rank 2.


News, Report, Security

How Good Are Website-Reputation Services?

December 21st, 2009

Using websites on the Internet to spread malicious software has now become the standard modus operandi for infecting personal and corporate environments. A large number of benign, well-meaning websites are compromised every day by hackers who insert malicious code that, in turn, infects the computers of visitors to the hacked site. One way to combat this is a website reputation mechanism that warns users of potential threats before they visit a compromised site.

Website-reputation services vary wildly in their opinions

Note that all 350 domains had been reported as malicious and were collected from malware.com.br on December 18, 2009. The blue column (maximum 350) indicates the number of reported bad sites that each website-reputation service correctly identified as unsafe. The orange column (maximum 350) indicates the number of reported malicious sites that each service incorrectly identified as safe.

Website reputation services have been around for roughly 5-7 years now. They began as a niche product line that offered an opinion of a site’s reputation and have grown into full-fledged offerings that advise whether websites are distributing malware and, if they are, what kind and via which Autonomous Systems.

At StopTheHacker.com (Jaal LLC) we have conducted tests with 350 domain names, all of which have been reported as malicious by volunteers of various blacklists.

The aim of the test is to:

  1. Identify how accurate the website reputation services are
  2. Measure the overlap between the services in terms of safe/unsafe verdicts

We have found some interesting results which we present in this article. First we detail the parameters of the testing procedure to provide an idea of how the test was set up.

350 URLs were collected from malware.com.br (mbr) on December 18, 2009. These URLs are submitted to mbr for listing by one or more of the following: individuals, organizations, agencies, and software products or services. For the purposes of this test, we assume that all the URLs obtained from the “regular” list on mbr are malicious and hence “unsafe” to visit.

We compare the reputation provided by each website-reputation service and count how many websites are marked as unsafe, safe, untested, maybe-unsafe/caution/potentially-unsafe, or unreachable.
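The comparison described above boils down to tallying each service's verdicts and expressing them as a share of the domains tested. A minimal sketch, assuming the verdicts for one service have already been collected into a list of labels; the label spellings, the function name, and the sample data are hypothetical.

```python
from collections import Counter

# Verdict labels used in this comparison (hypothetical spellings); a service that
# never returns a given label simply scores 0% in that category.
CATEGORIES = ("unsafe", "safe", "untested", "potentially-unsafe", "unreachable")


def verdict_breakdown(verdicts):
    """Return {category: percentage of tested domains that received that verdict}."""
    if not verdicts:
        raise ValueError("no verdicts to tally")
    counts = Counter(verdicts)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    total = len(verdicts)
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}


if __name__ == "__main__":
    # Hypothetical verdicts returned by one service for four domains.
    sample = ["unsafe", "safe", "untested", "unsafe"]
    for cat, pct in verdict_breakdown(sample).items():
        print(f"{cat}: {pct:.2f}%")
```

Because every domain in this test is assumed to be malicious, the "unsafe" share doubles as the service's detection rate, and the "safe" share is its miss rate.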

Note that, to check a domain name with the Google Safe Browsing API, we had to calculate the MD5 hash of each website name and match it against the malware hash list. We conducted this test on December 21, 2009. The list of domain names tested is presented below, and the graph above represents the statistics for the first 350 sites tested.
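The hash-matching step can be illustrated with Python's standard `hashlib`. This is a simplified sketch, assuming the downloaded malware list is held as a set of hex-encoded MD5 digests of domain names. The normalization used here (trimming whitespace, lower-casing, dropping a trailing dot) is our own simplification; the Safe Browsing protocol defines its own URL canonicalization and host/path expressions, which are not reproduced. A domain is labeled "unsafe" when its hash is present and "safe" otherwise, matching how the Google results below are interpreted.

```python
import hashlib


def domain_hash(domain):
    """MD5 hex digest of a normalized domain name (simplified, assumed normalization)."""
    normalized = domain.strip().lower().rstrip(".")
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def safe_browsing_verdict(domain, malware_hashes):
    """Return 'unsafe' if the domain's hash is on the list, 'safe' otherwise."""
    return "unsafe" if domain_hash(domain) in malware_hashes else "safe"


if __name__ == "__main__":
    # Hypothetical list built locally for illustration; the real list comes
    # from the Safe Browsing API.
    malware_hashes = {domain_hash("bad.example")}
    print(safe_browsing_verdict("BAD.example.", malware_hashes))
    print(safe_browsing_verdict("good.example", malware_hashes))
```

Note that "safe" here really means "not on the list": a domain the service has never seen gets the same verdict as one it has checked and cleared.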

We have identified some of the most interesting results below:

  1. McAfee SiteAdvisor marked 32.5% of domains as Unsafe, 22% as Safe, 43% as Untested and 1.7% as Potentially-Unsafe.
  2. Norton Safe Web marked 50.86% of domains as Unsafe, 43.71% as Safe, 2.29% as Untested and 3.14% as Potentially-Unsafe.
  3. Google Safe Browsing marked 10.86% of domains as Unsafe, 89.14% as Safe. Note: the presence of the hash of the domain name being tested on the Google malware hash list is interpreted as “unsafe,” while its absence is interpreted as “safe.”
  4. Comodo SiteInspector marked 0.29% of domains as Unsafe, 98.86% as Safe and 0.86% as Unreachable. Note: after feedback from Comodo, a retest was conducted, and accuracy improved from 0.29% to 1.2%.

This limited test is a first step towards showing how much variance there is among the website reputation services currently offered by large Internet-services/security companies. To highlight this point, we present immediately below the relatively few domains (~6% of the total domains tested) that were marked as bad by all three major services: Norton, McAfee, and Google.

In brief:

  • 6% of domains tested were marked as “unsafe” by all 3, McAfee, Norton and Google
  • 10% of domains tested were marked as “unsafe” by Norton and Google
  • 22% of domains tested were marked as “unsafe” by Norton and McAfee
  • 5.7% of domains tested were marked as “unsafe” by Google and McAfee
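Overlap figures like these are set intersections over the domains each service flagged as unsafe, divided by the number of domains tested. A minimal sketch, assuming each service's "unsafe" verdicts are held as a set of domain names; the service keys, function name, and sample domains are hypothetical.

```python
from itertools import combinations


def overlap_percentages(unsafe_by_service, total_tested):
    """For every pair of services, and for the full group, return the percentage
    of tested domains that all services in that combination flagged as unsafe."""
    names = sorted(unsafe_by_service)
    result = {}
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            common = set.intersection(*(unsafe_by_service[n] for n in combo))
            result[combo] = 100.0 * len(common) / total_tested
    return result


if __name__ == "__main__":
    # Hypothetical "unsafe" sets for three services over 10 tested domains.
    unsafe = {
        "google": {"a.example", "b.example"},
        "mcafee": {"a.example", "c.example"},
        "norton": {"a.example", "b.example", "c.example"},
    }
    for combo, pct in overlap_percentages(unsafe, 10).items():
        print(" & ".join(combo), f"{pct:.1f}%")
```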

Update: December 28, 2009

After receiving helpful feedback from representatives at Comodo, we learned that Comodo’s service could provide more accurate answers if complete web page locations were checked instead of just the domain name. We followed this advice and saw a definite increase in Comodo’s accuracy: Comodo marked 1.2% of the websites/pages as malicious, whereas prior to this retest the same service marked 0.2% of the websites as unsafe. The graph at the beginning of this article does not reflect the results of this retest.

News, Report, Security