Wolfram Library Archive


Title

A New Application for Data Mining and Analytics
Author

Ariel Sepúlveda
Organization: Pronto Analytics, Inc
Conference

2007 Wolfram Technology Conference
Conference location

Champaign, IL
Description

Abstract

Data analysts need quick answers to their questions, but in many cases the answers are not ready in time to support decision-making. Important questions therefore remain unanswered for lack of efficient data analysis software. For this reason, when evaluating data analysis software, analysts are usually interested in two main characteristics:
  1. Capabilities: can this software do what I need to do?
  2. User friendliness: how difficult is it to get to the desired results? How much do I need to learn to get there, and, once I have completed an analysis, how simple is it to repeat the same procedure on a different dataset?
Unfortunately, most available software sits close to one of two opposite extremes: a wonderful user experience with modest analysis capabilities, or a poor user experience with a great number of capabilities. Analysts must therefore choose a solution at one of these extremes or somewhere in between. As an attempt to resolve this dilemma, this presentation introduces MXL Plus, a data analysis application designed to be simple to use, powerful, and flexible.

Among the most significant characteristics of the software are:
  1. Highly intuitive user interface.
  2. No software limits on dataset size.
  3. Powerful number cruncher.
  4. Analyzes datasets of numbers, strings, graphics, symbols, etc. (e.g. simple text mining capabilities)
  5. Highly efficient on categorical data analysis.
  6. Imports data and merges many datasets by combining any of the following: databases, xls, csv, txt, etc.
  7. Exports output to csv, xls, gif, Metafile, etc.
The software has the following capabilities:
  1. A data importing interface for loading data or merging datasets from many data sources: databases, Excel, text files, etc. When importing from databases, the user can create and continuously update a selectable and editable list of SQL commands. When importing from Excel, the user can preview any of the spreadsheets in the workbook and define the region of interest to be imported. After a dataset is imported, the system automatically characterizes it by reporting, for each field, information such as data type(s), missing elements, total elements, a frequency distribution of the top elements, and, for numeric variables, the mean, standard deviation, min, max, histogram, etc.
  2. An internal spreadsheet for data visualization and manipulation (calculate new fields, summarize data, query data, sort, etc.) where you can define your set of queries, procedures, and equations and save them as an analysis template for future use with different datasets. The current spreadsheet analysis can be repeated for each level of one or more categorical variables by clicking the desired categorical variable(s). The contents of a cell in the spreadsheet can be a number, a string, a list of data, a graphic, a GUI-controlled object, or any other Mathematica object.
  3. Several analytics applications where you can calculate and plot data summaries as defined by levels of the selected categorical variables. Data summaries can be represented as cross tabs in any of the following formats: bar charts, Pareto plots, box and whisker plots, confidence intervals, and line plots. Other analyses include histograms, scatter plots, statistical control charts, hypothesis tests, etc. The user can shape all of these analyses with many options, such as image size, aspect ratio, colors, analysis variable, and categories, and then store all definitions as an analysis template for future use with other datasets. The software also contains a module specialized in determining and plotting associations found in one or more analysis variables (market basket analysis).
  4. A report scheduler where you can set up tasks to produce real-time or other scheduled reports. The report scheduler can be run in Mathematica or from webMathematica. Stacks of jobs can be prioritized so that, when jobs compete for computing resources, those with the highest priorities are executed first. Also, in the case of a system crash (e.g. a power failure), a file containing the queue of active jobs remains available for automatic execution after the system is restored.
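The prioritized, crash-tolerant job queue described in item 4 can be sketched as follows. This is a minimal illustration in Python, not the MXL Plus implementation (which runs in Mathematica or webMathematica); the class name `JobQueue`, its methods, and the JSON file format are all assumptions made for the example.

```python
import heapq
import json
import os


class JobQueue:
    """Hypothetical priority queue of report jobs, mirrored to a JSON file."""

    def __init__(self, path):
        self.path = path
        self._heap = []   # entries: (-priority, sequence number, job name)
        self._seq = 0     # tie-breaker: keeps submission order within a priority
        if os.path.exists(path):
            self._load()  # pick up jobs left over from a previous run

    def _load(self):
        with open(self.path) as f:
            self._heap = [tuple(entry) for entry in json.load(f)]
        heapq.heapify(self._heap)
        self._seq = max((entry[1] for entry in self._heap), default=-1) + 1

    def _save(self):
        # Write to a temporary file and rename it, so an interrupted write
        # does not leave a half-written queue file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self._heap, f)
        os.replace(tmp, self.path)

    def submit(self, name, priority):
        """Add a job; larger priority values run first."""
        heapq.heappush(self._heap, (-priority, self._seq, name))
        self._seq += 1
        self._save()

    def run_next(self, runner):
        """Run the highest-priority job; return its name, or None if idle."""
        if not self._heap:
            return None
        entry = self._heap[0]
        runner(entry[2])  # if the process dies here, the job is still on disk
        self._heap.remove(entry)
        heapq.heapify(self._heap)
        self._save()
        return entry[2]
```

A job is removed from the file only after `runner` returns, so constructing a new `JobQueue` on the same path after a crash resumes with the unfinished jobs, highest priority first.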
In summary, MXL Plus is designed to complement Mathematica with the best data manipulation and analysis capabilities found in spreadsheets and database languages.
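To make the automatic per-field characterization mentioned in the importing capability more concrete, here is a rough Python sketch. It is not MXL Plus code; the function name `characterize`, the list-of-dicts input format, and the use of `None` for missing elements are assumptions chosen for illustration, and the histogram is omitted.

```python
from collections import Counter
from statistics import mean, stdev


def characterize(rows, top_n=3):
    """Summarize each field of a dataset given as a list of dicts.

    Assumes missing elements are stored as None and values are hashable.
    """
    # Collect field names in order of first appearance.
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)

    report = {}
    for field in fields:
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        summary = {
            "types": sorted({type(v).__name__ for v in present}),
            "total": len(values),
            "missing": len(values) - len(present),
            "top": Counter(present).most_common(top_n),  # frequency distribution
        }
        # Numeric statistics only when every present value is a number.
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric and len(numeric) == len(present):
            summary["mean"] = mean(numeric)
            summary["std"] = stdev(numeric) if len(numeric) > 1 else 0.0
            summary["min"] = min(numeric)
            summary["max"] = max(numeric)
        report[field] = summary
    return report
```

Calling `characterize` on a small dataset with one categorical field and one numeric field returns counts and top elements for both, plus the mean, standard deviation, min, and max for the numeric field only.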
Subject

*Wolfram Technology
URL

http://www.wolfram.com/news/events/techconf2007/
Downloads

NewApplicationForDataMiningAndAnalytics.nb (2 MB) - Mathematica Notebook [for Mathematica 6.0]