Jewel or Junk? (Data Balancing and Outlier Detection with DataModeler) -- from Wolfram Library Archive

Title

Jewel or Junk? (Data Balancing and Outlier Detection with DataModeler)

Author

Mark Kotanchek

Organization

Evolved Analytics, LLC

Conference

International Mathematica User Conference 2008

Conference location

Champaign, IL

Description

Lots of variables. Not much in the way of data. The data variables are correlated and coupled. Truth is unknown. What does it all mean?

This is a common situation for many real-world data sets. In this talk, Mathematica's CountryData[ ] is used as a surrogate for typical social science data to demonstrate how DataModeler's unique capabilities can be used to attack such data efficiently and effectively. Although basic SymbolicRegression and data exploration features will be briefly reviewed, the emphasis will be on recently added capabilities:

Data Balancing--which records and attributes contain the most information? The SMITS algorithm lets us explore the information content of data sets and identify key data records. This lets us quickly build balanced data sets from real-world data, which leads to improved empirical models as well as faster model development.

Variable Selection--which variables count from a nonlinear perspective? Symbolic regression is attractive because it naturally handles correlated inputs, pulls out the driving variables, and presents them without artificial constraints and assumptions. In other words, the data tells us which inputs and input combinations are important.

Niching--what is the trade-off on number of variables or variable combinations? The ability to explore the trade-off between the number of variables or possible variable combinations leads to increased insight into the underlying mechanisms in the modeled system.
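One way to picture the trade-off between number of variables and model quality is a Pareto front: keep only those candidate models for which no other candidate is at least as good on both counts and strictly better on one. The sketch below is a generic illustration in plain Python, not DataModeler's niching implementation; the function name `pareto_front`, the model labels, and the error values are all hypothetical and chosen only to show the mechanics.

```python
def pareto_front(models):
    """Return the models not dominated in (n_vars, error).

    Each model is a (name, n_vars, error) tuple; lower is better for both
    n_vars and error. A model is dominated if another model is no worse on
    both measures and strictly better on at least one.
    """
    front = []
    for name, n_vars, error in models:
        dominated = any(
            (n2 <= n_vars and e2 <= error) and (n2 < n_vars or e2 < error)
            for _, n2, e2 in models
        )
        if not dominated:
            front.append((name, n_vars, error))
    # Sort by number of variables so the trade-off reads left to right.
    return sorted(front, key=lambda m: m[1])


# Hypothetical candidates: (label, number of variables used, fit error).
candidates = [
    ("m1", 1, 0.40),
    ("m2", 2, 0.22),
    ("m3", 2, 0.30),
    ("m4", 3, 0.21),
    ("m5", 4, 0.25),
]

front = pareto_front(candidates)
for name, n_vars, error in front:
    print(name, n_vars, error)
```

With these toy values, the step from one variable to two buys a large drop in error, while the third variable buys very little; that kind of knee in the front is what suggests how many variables the data can actually support.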

Outlier Detection--outliers are either the most interesting nuggets of data or a nuisance. However, identifying outliers is difficult in multivariate data sets--especially when the outliers are inliers, that is, when each of their values falls within the normal range of its own variable. Although the data balancing process can identify some outliers, model-based outlier detection lets us quickly identify the data records that warrant special attention.
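The idea behind model-based outlier detection can be sketched in a few lines: fit a model, then flag the records whose residuals are unusually large. The example below is a minimal illustration under stated assumptions, not DataModeler's method. It swaps in an ordinary least-squares line for the symbolic regression models, scores residuals with a robust z-score based on the median absolute deviation, and uses a conventional cutoff of 3.5. The helper names `fit_line` and `flag_outliers` and the toy data are hypothetical.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x (a stand-in for a real model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def flag_outliers(xs, ys, threshold=3.5):
    """Return indices of records whose model residual is unusually large."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    if mad == 0:
        return []  # residuals too uniform to score robustly
    # 0.6745 rescales the MAD so scores are comparable to z-scores
    # for normally distributed residuals.
    scores = [0.6745 * (r - med) / mad for r in residuals]
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


# Toy data following roughly y = 2x, except the record at index 4.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.0, 8.1, 4.0, 12.1, 13.9, 16.0]

suspects = flag_outliers(xs, ys)
print(suspects)
```

The flagged record has y = 4.0, which sits comfortably inside the overall range of y values, so a check on each variable alone would not catch it. It only stands out relative to what the model predicts for x = 5, which is the sense in which an outlier can also be an inlier.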

Subject

Wolfram Technology

URL

http://www.wolfram.com/news/events/userconf2008

Downloads

JewelOrJunk_Abstract.nb (254.6 KB) - Mathematica Notebook

JewelOrJunk_Presentation.nb (6.6 MB) - Mathematica Notebook
