Wolfram Library Archive


Gazing at New Paradigms in Data Mining

Ariel Sepúlveda
Organization: Pronto Analytics, Inc.

2005 Wolfram Technology Conference
Conference location: Champaign, IL

The author will present his experiences using Mathematica as the main tool for data analysis in a working environment where the prevailing IT and data mining (DM) paradigms do not consider Mathematica an alternative for data processing, analysis, or reporting. Several graphical examples of work done in Mathematica will be presented, along with evidence of its effectiveness and efficiency.

An example of using Mathematica to perform Market Basket Analysis on a large data set will be used to compare the traditional approach to DM, as established by statistical software, with the use of Mathematica as an integrated system. Such a system can read from databases, perform efficient and sophisticated data transformations and analysis, produce customized, elegant output in several formats so that results are easier to interpret, and even generate reporting systems in many output formats. DM paradigms regarding the access and use of data will also be evaluated. For example, certain algorithms in statistical software require huge input data sets to be preprocessed into a specific format before the software can process them. This usually takes one person to transform the data and another to receive it and run the analysis in the statistical software, a lengthy and inefficient process. The author will show how an equivalent but much more efficient method can be built by taking advantage of the many data transformation functions in Mathematica.
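As a rough illustration of the kind of transformation described above, the sketch below takes hypothetical (transaction_id, item) rows, groups them into baskets, and computes support and confidence for item pairs in one step, with no separate hand-off of a preprocessed file. It is written in Python only as a stand-in (the talk's Mathematica code is not reproduced here), it is a simple pairwise count rather than a full Apriori implementation, and the data, function names, and `min_support` threshold are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical line-item rows, one per (transaction_id, item), as a database
# query might return them. This "long" layout is what tools often need reshaped.
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "butter"), (2, "milk"),
    (3, "milk"), (3, "butter"),
    (4, "bread"), (4, "milk"),
]

def to_baskets(rows):
    """Group (transaction_id, item) rows into one set of items per transaction."""
    baskets = {}
    for tid, item in rows:
        baskets.setdefault(tid, set()).add(item)
    return list(baskets.values())

def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence of a -> b)} for pairs meeting min_support.

    Pairwise counting only (not full Apriori): support is the fraction of baskets
    containing both items; confidence is that count divided by baskets containing a.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Record the rule in both directions; confidence differs by antecedent.
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules

for (a, b), (s, c) in sorted(pair_rules(to_baskets(rows)).items()):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Reshaping and analysis live in the same program here, which is the point the abstract makes about avoiding a separate preprocessing step; a production version would read `rows` from the database instead of a literal list.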

The author will also evaluate, in simple terms, whether it is necessary to run SQL statements to transform data in a data mart, as opposed to bringing the data to the local computer and performing the transformations in Mathematica. The results will compare the two approaches in terms of the complexity of the relations in the data mart, speed, and the data transfer process. These comparisons will be framed by the following questions: Why is standard data mining software so popular? Is it because it is better, or because it is already the standard?
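A minimal sketch of the two options being compared, assuming Python's built-in `sqlite3` module as a stand-in for a real data mart: the same per-item basket counts are computed once with a SQL `GROUP BY` inside the database and once by pulling the raw rows and aggregating locally. The table `line_items` and its columns are hypothetical, and a toy in-memory table says nothing about real-world speed; it only shows that both routes give the same answer while moving different amounts of data.

```python
import sqlite3
from collections import Counter

def make_mart():
    """Build a tiny in-memory table standing in for a data mart (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE line_items (transaction_id INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO line_items VALUES (?, ?)",
        [(1, "bread"), (1, "milk"), (2, "bread"), (2, "butter"), (2, "milk"),
         (3, "milk"), (3, "butter"), (4, "bread"), (4, "milk")],
    )
    return conn

def counts_in_database(conn):
    """Option 1: the database aggregates; only one summary row per item is returned."""
    query = """
        SELECT item, COUNT(DISTINCT transaction_id)
        FROM line_items
        GROUP BY item
    """
    return dict(conn.execute(query).fetchall())

def counts_locally(conn):
    """Option 2: every raw row is transferred and the aggregation happens locally."""
    rows = conn.execute("SELECT transaction_id, item FROM line_items").fetchall()
    # Deduplicate (transaction, item) pairs, then count baskets per item.
    return dict(Counter(item for _, item in set(rows)))

conn = make_mart()
print(counts_in_database(conn) == counts_locally(conn))
```

In the first option only one row per item crosses the connection; in the second every line item does. That transfer cost is what has to be weighed against the flexibility of doing the transformations locally.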

* Applied Mathematics > Information Theory
* Mathematics > Probability and Statistics

Ariel_Sepulveda_WTC_2005.zip (728.7 KB) - ZIP archive