Wolfram Library Archive


Title

MXLPlus: Data Mining eXtensible Library
Author

Ariel Sepúlveda
Organization: Pronto Analytics, Inc
Conference

2006 Wolfram Technology Conference
Conference location

Champaign IL
Description

MXLPlus (MXL) is an application designed to simplify and improve the use of Mathematica for data analysis by (a) providing functions that operate over labeled data sets to emulate and improve on spreadsheet- or database-style operations, (b) providing new methods and graphical tools for data analysis, and (c) providing functions for scheduling tasks, so that Mathematica can easily be turned into a live data analysis and reporting system or a real-time process control and feedback environment.

Simplified operations for labeled data sets

Database languages and other data analysis software allow operations on field names, such as “sales” = “price” * “quantity”. The functions developed in this application are mainly designed to let users perform data operations by referring to a column or field by name instead of by its position in the target data set. The user keeps all the known functionality of Mathematica, but in an environment specially designed to simplify data analysis.

This way of expressing operations is easy to understand. More importantly, it makes programs more readable and thus easier to debug or modify, especially when a considerable amount of time has elapsed between writing the program and modifying it. Another advantage is that code written with these functions is dynamic: because data is referenced by field name, adding or deleting fields in the source data set does not affect code written with MXL functions.

In summary, functions like MapMXL, SelectMXL, SplitMXL, TakeMXL, DropMXL, and PartMXL are modifications of the corresponding Mathematica functions that accept field names in their arguments instead of numerical references to column or row positions. More importantly, MXL functions operate seamlessly over rectangular data sets or over multirectangular data sets. For example, a data set representing sales can easily be partitioned by “account_number”, and in the same operation, field-name-referenced analyses of the partitioned data set are automatically performed for every account number.
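The idea of field-name references can be sketched outside Mathematica. The following is a minimal illustration in plain Python, not the MXL API: the helper names `add_field`, `split_by`, and `map_groups` and the sample rows are hypothetical, chosen only to mirror the “sales” = “price” * “quantity” and “account_number” examples above.

```python
# Illustrative sketch only: plain Python, not the MXL API.
# All function names and sample data here are hypothetical.

def add_field(rows, name, fn):
    """Return new rows with an extra field computed from existing fields by name."""
    return [{**row, name: fn(row)} for row in rows]

def split_by(rows, field):
    """Partition rows into a dict keyed by the distinct values of `field`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[field], []).append(row)
    return groups

def map_groups(groups, fn):
    """Apply the same field-name-based analysis to every partition."""
    return {key: fn(rows) for key, rows in groups.items()}

data = [
    {"account_number": "A1", "price": 2.0, "quantity": 3},
    {"account_number": "A2", "price": 5.0, "quantity": 1},
    {"account_number": "A1", "price": 1.5, "quantity": 4},
]

# "sales" = "price" * "quantity", expressed by field name rather than column position.
with_sales = add_field(data, "sales", lambda r: r["price"] * r["quantity"])

# Partition by account, then run one analysis over every partition.
by_account = split_by(with_sales, "account_number")
total_sales = map_groups(by_account, lambda rows: sum(r["sales"] for r in rows))
print(total_sales)  # {'A1': 12.0, 'A2': 5.0}
```

Because every step refers to fields by name, adding an unrelated field to the rows in `data` would leave the rest of the sketch unchanged, which is the property the description attributes to field-wise references.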

Plus... new methods and graphical tools for data analysis and report generation

Several functions, including PivotAnalysis, MarketBasketAnalysis, SortByColumnMXL, TakeFractionMXL, TransformVariablesMXL, DefineIndexMXL, SelectMembersMXL, SelectDistinctMXL, and TakeRandomSample, have been designed to provide simplified data analysis capabilities. In particular, PivotAnalysis is a versatile function for analyzing and reporting from data sets with categorical fields. PivotAnalysis filters data by setting conditions on its fields and produces several types of plots for the resulting filtered data (BoxPlot, BarChart, ListPlot, PieChart, etc.).

Similarly, all MXL functions can automatically operate either over rectangular data sets or over multi-rectangular data sets defined by splitting data by the elements of one of its categorical fields. For example, if a data set has been split by one categorical field so that each subset corresponds to one distinct value of that field, then the entire split data set can be analyzed as if it were a rectangular data set, with the added advantage of getting the corresponding results for each data subset. Additional capabilities cover prioritization, multi-intervals, and MBA or cross relationships.
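The filter-then-summarize pattern described for PivotAnalysis can be illustrated with a small sketch. This is plain Python under stated assumptions, not the PivotAnalysis function itself: the name `pivot_summary`, its parameters, and the sample rows are hypothetical, and the sketch returns numbers where PivotAnalysis is described as producing plots.

```python
# Illustrative sketch only: plain Python, not the MXL PivotAnalysis function.
# The function name, parameters, and sample data are hypothetical.
from collections import defaultdict
from statistics import mean

def pivot_summary(rows, where, category, value, agg=mean):
    """Keep rows satisfying `where`, group them by `category`, aggregate `value`."""
    groups = defaultdict(list)
    for row in rows:
        if where(row):
            groups[row[category]].append(row[value])
    return {key: agg(values) for key, values in groups.items()}

orders = [
    {"region": "East", "product": "tea", "sales": 10.0},
    {"region": "East", "product": "coffee", "sales": 30.0},
    {"region": "West", "product": "tea", "sales": 20.0},
    {"region": "West", "product": "tea", "sales": 40.0},
]

# Condition on one field ("product"), summarize another ("sales") per category ("region").
tea_by_region = pivot_summary(orders, lambda r: r["product"] == "tea", "region", "sales")
print(tea_by_region)  # {'East': 10.0, 'West': 30.0}
```

The resulting per-category values are the kind of input a bar chart or pie chart would then display.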

Plus... scheduling functionality

Finally, MXL provides scheduling functions that transform the Mathematica environment from a static to a dynamic one by allowing users to schedule tasks to run at predetermined times and frequencies. For example, users can easily create a multitask system that (a) monitors a production process in real time (data analysis and graphical representations such as control charts), (b) creates hourly reports summarizing key information from several data sources, and (c) creates other high-level summary analyses and reports at desired frequencies (e.g., daily, weekly, monthly, and quarterly).
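As a rough illustration of tasks running at different frequencies, the sketch below uses Python's standard `sched` module. It is not MXL's scheduling interface: `SimulatedClock`, `schedule_repeating`, and the task names are hypothetical, and a simulated clock stands in for real time so the example finishes instantly.

```python
# Illustrative sketch only: Python's standard sched module, not MXL's scheduler.
# SimulatedClock and schedule_repeating are hypothetical helpers.
import sched

class SimulatedClock:
    """Fake clock so the example runs instantly; use time.time and time.sleep for real use."""
    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds

def schedule_repeating(scheduler, clock, interval, priority, task, until):
    """Run `task` every `interval` time units, stopping once the next run would exceed `until`."""
    def run():
        task(clock.time())
        if clock.time() + interval <= until:
            scheduler.enter(interval, priority, run)
    scheduler.enter(interval, priority, run)

clock = SimulatedClock()
scheduler = sched.scheduler(clock.time, clock.sleep)
log = []

# A frequent "monitor" task and a less frequent "report" task share one scheduler.
# A lower priority number runs first when two tasks are due at the same time.
schedule_repeating(scheduler, clock, 1, 1, lambda t: log.append(("monitor", t)), until=4)
schedule_repeating(scheduler, clock, 2, 2, lambda t: log.append(("report", t)), until=4)
scheduler.run()

for name, t in log:
    print(name, t)
```

Each task reschedules itself after it runs, which is one simple way to obtain a fixed frequency from a one-shot scheduler.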

Plus... users are still inside Mathematica and thus still get all its power:
- Programming, graphics, string manipulation, symbolic processing, etc. remain available.
- MXL handles mixed data sets of numbers, strings, symbols, graphics, etc.
- MXL functions have been carefully designed to provide excellent processing speed.
- The resulting code is more readable and, at the same time, more condensed.
Subject

Mathematics > Probability and Statistics
Keywords

Data mining
Downloads

TechConf2006_ArielSepulveda.nb (4.5 MB) - Mathematica Notebook [for Mathematica 5.2]