Car Accidents, Investment Analysis and other Machine Learning News

BigML News, Issue #2, March 2013

Hi <<Full Name>>,

There are many exciting updates at BigML that we want to share with you:

Donaldson Capital Management (DCM) has been an investment advisory firm for almost 20 years. With a total portfolio of over $600 million and historical performance that beats the S&P 500 index, DCM knows that it takes data-driven decisions to be successful in the investment business. Greg Donaldson, DCM's founder, has been using BigML to analyze the performance of S&P 500 companies, and he came to some interesting conclusions. "Dividend growth is a better predictor of winning stocks if the growth is consistent and persistent. [...] Over the last 5-6 years, a time as chaotic as any we have seen since the 1930s, the winning recipe in the stock market has been dividend-related." This analysis has confirmed the dividend-centric investment strategy already in place at Donaldson Capital Management. DCM has many data analysis tools, like Bloomberg and Value-Line, but using BigML on top of these tools yields new insights and opens up new data mining possibilities. You can read more on the insights Greg found.

When car accidents happen, a lot of data is gathered, especially when the crash ends in a fatality. The US National Highway Traffic Safety Administration tracks these accidents in its ‘Fatality Analysis Reporting System’. This system is a rich source of data for all kinds of analytics. We have transformed some of that data into a more readable dataset and used it to create a series of models. ‘Car accident injury types’ shows you the injury severity of the various participants in an accident. We have made the prepared data available for you at the Windows Azure Data Marketplace.

Your model is based on the training data you supplied. As you would expect, it will do a decent job predicting the cases that you fed into it. Overfitting occurs when a model does too good a job: it branches out into very specific cases that fit your training data well, but that are useless when you use the model on new data. Pruning is a technique used to prevent overfitting. A common way to check whether your model is overfitted is to test (or evaluate) it with data that you have not used for training. More on overfitting and other machine learning basics in “Everything you wanted to know about machine learning, but were too afraid to ask”, parts one and two.
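To make the idea of testing on unseen data concrete, here is a minimal Python sketch. It is not BigML code: the function names, the toy data and the "memorizer" model are all hypothetical, chosen only to show an extreme case of overfitting. The memorizer scores perfectly on the rows it was trained on and poorly on the rows that were held out.

```python
import random
from collections import Counter


def holdout_split(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and split them into a training part and a held-out test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_memorizer(train_rows):
    """A deliberately overfitted 'model': it memorizes every training row.

    For inputs it has never seen it falls back to the most common training label.
    """
    table = {features: label for features, label in train_rows}
    fallback = Counter(label for _, label in train_rows).most_common(1)[0][0]
    return lambda features: table.get(features, fallback)


def accuracy(model, rows):
    """Fraction of rows for which the model predicts the recorded label."""
    return sum(model(features) == label for features, label in rows) / len(rows)


# Hypothetical toy data: one numeric feature, label depends on a threshold.
rows = [((i,), "high" if i >= 50 else "low") for i in range(100)]

train_rows, test_rows = holdout_split(rows)   # 80 rows to train, 20 held out
model = train_memorizer(train_rows)

train_acc = accuracy(model, train_rows)  # perfect: every row was memorized
test_acc = accuracy(model, test_rows)    # much lower: nothing generalizes

print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

The gap between the two numbers is the warning sign: a model that only looks good on its own training data has probably overfitted.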

Some trees are like the proverbial haystack: new insights can be difficult to find amid all the nodes, branches and confidences. We added some new filters to help you locate points of interest in your tree. For instance, there’s the confidence filter (or error filter for a regression tree). If you only want to see nodes that have 90%+ confidence, you slide the filter to 90%. The tree visualization is reduced to the 90%+ confidence part of the tree. Likewise with support: show me the part of the tree that has a specific level of instances flowing through its nodes. There are even buttons to help you find interesting, rare spots in the tree. Read more about it here.
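The logic behind the confidence and support filters can be sketched in a few lines of Python. This is only an illustration, not how the BigML interface is implemented: the Node class, its fields and the sample nodes are hypothetical, and support is assumed here to mean the share of all instances that reach a node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical tree node: a label, a confidence in [0, 1] and an instance count."""
    name: str
    confidence: float
    instances: int


def filter_nodes(nodes, total_instances, min_confidence=0.0, min_support=0.0):
    """Keep nodes whose confidence and support both reach the given thresholds.

    Support is computed as the fraction of all instances flowing through the node.
    """
    return [
        node for node in nodes
        if node.confidence >= min_confidence
        and node.instances / total_instances >= min_support
    ]


# Made-up nodes for illustration only.
nodes = [
    Node("root", 0.55, 1000),
    Node("speed > 60", 0.92, 400),
    Node("night and rain", 0.97, 12),
    Node("speed <= 60", 0.71, 600),
]

# Sliding the confidence filter to 90% keeps only the confident nodes.
confident = filter_nodes(nodes, 1000, min_confidence=0.9)

# Adding a 10% support threshold drops the rare but confident node.
confident_and_common = filter_nodes(nodes, 1000, min_confidence=0.9, min_support=0.1)

print([node.name for node in confident])
print([node.name for node in confident_and_common])
```

Lowering the support threshold while keeping confidence high is one way to surface the rare, interesting spots mentioned above.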

Two months ago we introduced our command line tool, BigMLer. It lets you create and process predictive models with even greater ease. Recently we added two important members to the BigMLer family: evaluations and ensembles. With BigMLer, it only takes one command to create a source from your data, create a dataset, create a model using 80% of your dataset and create an evaluation of your model using the remaining 20% of your dataset! All in one simple statement. Easy as that. Read more about it and how to create ensembles using BigMLer.

BigML is pleased to introduce our Early Adopter Program, which we are launching to stay close to a select number of active BigML users. Our goal is to learn from how you are using our service and to get your feedback on how well we are addressing your predictive modeling needs. Customers who are selected to take part in the program will be granted free access to our service and early access to new features and functionality. If you would like to apply for the Early Adopter Program, please drop us a line at eap@bigml.com.

Our mailing address is:

BigML, Inc

2851 NW 9th Street

Suite D, Conifer Plaza Building

Corvallis, Oregon 97330
