MathGroup Archive 2010

Re: Speed Up of Calculations on Large Lists

  • To: mathgroup at smc.vnet.net
  • Subject: [mg108881] Re: Speed Up of Calculations on Large Lists
  • From: Ray Koopman <koopman at sfu.ca>
  • Date: Mon, 5 Apr 2010 08:02:36 -0400 (EDT)

Zach,

The point I was trying to make was that inefficiencies in the code
that was wrapped around MovingAverage were costing substantially more
than compiling was saving.

Ray
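
One rough way to check the claim above is to time the MovingAverage calls on their own and then again with the padding, joining and transposing around them. This is only a sketch: the names coreTime, wrappedTime and windows are made up for illustration, the data are generated as in the original post, and the timings will vary by machine and version.

```mathematica
(* Sample data in the same form as in the original post *)
data = 100 + Accumulate[RandomReal[{-1, 1}, 10000]];
windows = Range[30, 250, 10];

(* Time the MovingAverage calls alone, repeated 20 times *)
coreTime = First@AbsoluteTiming[
    Do[MovingAverage[data, r], {20}, {r, windows}]];

(* Time the same calls plus padding, joining and transposing *)
wrappedTime = First@AbsoluteTiming[
    Do[Transpose@PadRight@Join[{data},
        Table[MovingAverage[data, r], {r, windows}]], {20}]];

{coreTime, wrappedTime}
```

The gap between the two numbers gives an estimate of what the wrapper code costs, separately from the moving average itself.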

----- Zach Bjornson <bjornson at mit.edu> wrote:
> Ray,
> 
> Critical statement there is "under your test conditions." I played with
> Stefan's problem for quite a while and came up with a few moving average
> functions, and tried them all with and without compiling. His function
> in particular was only 15% slower compiled than uncompiled on my computer with
> his data set. The functions I came up with were usually faster when
> compiled, depending on the data set. Also depending on the data set,
> some were faster than the built-in MovingAverage function. They were
> never faster than the inbuilt function with his data set however, so I
> never sent my functions along. Since this came up though, my futzing is
> below.
> 
> My initial response to Stefan's inquiry was the thought that Compile
> would have no effect on MovingAverage, or would just add kernel time
> while Mathematica decides to execute it with normal Mathematica code, but I'm
> not sure that's true.
> 
> -Zach
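
A possible way to look into the question of whether Compile actually compiles a call to MovingAverage is to print the compiled instructions. This sketch assumes Mathematica 8 or later, where the CompiledFunctionTools package is available (it postdates this thread); the name cf is hypothetical.

```mathematica
(* CompiledFunctionTools ships with Mathematica 8 and later *)
Needs["CompiledFunctionTools`"]

(* A compiled wrapper that does nothing but call MovingAverage *)
cf = Compile[{{x, _Real, 1}, {n, _Integer}}, MovingAverage[x, n]];

(* A "MainEvaluate" entry in the listing marks a call back
   to the ordinary evaluator rather than compiled code *)
CompilePrint[cf]
```

If the listing shows MainEvaluate around MovingAverage, the compiled function is handing that call back to the normal evaluator, so compiling would add overhead without speeding up the average itself.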
> 
> (*data-set dependencies are illustrated between the top and bottom half 
> of this*)
> 
> $HistoryLength=0 (*to prevent artificially high speeds*)
> 
> 1.1 Your function
> movAverageOwn2FCorig =
>   Compile[{{dataInput, _Real,
>      1}, {days, _Integer}, {length, _Integer}},
>    N[Mean[dataInput[[1 + # ;; days + #]]]] & /@
>     Range[0, length - days, 1]]
> 
> In[165]:=
> First@Timing[
>     Do[movAverageOwn2FCorig[Range[1000000], 2, 1000000];, {10}]]/10
> Out[165]= 1.7347
> 
> 1.2 Inbuilt Mathematica function
> In[164]:= First@Timing[Do[MovingAverage[Range[1000000], 2];, {10}]]/10
> Out[164]= 1.6942
> 
> 1.3 My variation #1
> movAverageOwn2Fa =
>   Compile[{{dataInput, _Real, 1}, {days, _Integer}},
>    Table[Mean[dataInput[[i ;; i + days - 1]]], {i,
>      Length@dataInput - days + 1}]]
> 
> In[166]:=
> First@Timing[Do[movAverageOwn2Fa[Range[1000000], 2];, {10}]]/10
> Out[166]= 1.6146
> 
> Non-compiled function version gives a time of 4.0311
> for this same data set.
> 
> 1.4 My variation #2
> movAverageOwn2Fb =
>   Compile[{{dataInput, _Real, 1}, {days, _Integer}},
>    With[{innerdata = Partition[dataInput, days, 1]},
>     Table[Mean[innerdata[[i]]], {i, Length@innerdata}]
>     ]]
> 
> In[167]:=
> First@Timing[Do[movAverageOwn2Fb[Range[1000000], 2];, {10}]]/10
> Out[167]= 1.6287
> 
> Note that this *is* data-set dependent... for example, the same 
> functions tested on your data symbol give:
> In[169]:= First@Timing[Do[MovingAverage[data, 2];, {10}]]/10
> 
> Out[169]= 0.0015
> 
> In[170]:= First@Timing[Do[movAverageOwn2Fa[data, 2];, {10}]]/10
> 
> Out[170]= 0.0171
> 
> In[171]:= First@Timing[Do[movAverageOwn2Fb[data, 2];, {10}]]/10
> 
> Out[171]= 0.0156
> 
> In[173]:=
> First@Timing[Do[movAverageOwn2FCorig[data, 2, Length@data];, {10}]]/10
> 
> Out[173]= 0.0171
> 
> On 4/4/2010 7:45 AM, Ray Koopman wrote:
>> Your compiled movAverageC takes 25% more time than the uncompiled
>>
>> movAv[data_, start_, end_, incr_] := Transpose@PadRight@Join[{data},
>>        Table[MovingAverage[data, r], {r, start, end, incr}]]
>>
>> under your test conditions.
>>
>> On Apr 1, 3:59 am, sheaven<shea... at gmx.de>  wrote:
>>    
>>> Hello everyone!
>>>
>>> I am new to Mathematica and am trying to get an understanding of its
>>> power. I plan to use Mathematica mainly for financial data analysis
>>> (large lists...).
>>>
>>> Currently, I am trying to optimize calculation time for calculations
>>> based on some sample data. I started with a moving average of
>>> share prices, because Mathematica already has a built-in moving
>>> average function for benchmarking.
>>>
>>> I know that the built-in functions are always more efficient than any
>>> user built function. Unfortunately, I have to create functions not
>>> built in (e.g. something like "moving variance") in the future.
>>>
>>> I have tried numerous ways to calc the moving average as efficiently
>>> as possible. So far, I found that a function based on Span (or
>>> List[[x;;y]]) is most efficient. Below are my test results.
>>> Unfortunately, my UDF is still more than 5x slower than the built in
>>> function.
>>>
>>> Do you have any ideas to further speed up the function? I am already
>>> using Compile and Parallelize.
>>>
>>> This is what I got so far:
>>>
>>> 1. Functions for moving average:
>>>
>>> 1.1. Moving average based on built in function:
>>>
>>> (*Function calcs moving average based on built in function for
>>> specified number of days, e.g. 30 days to 250 days in steps of 10*)
>>> movAverageC = Compile[{{inputData, _Real, 1}, {start, _Integer}, {end,
>>> _Integer}, {incr, _Integer}}, Module[{data, size, i},
>>>     size = Length[inputData];
>>>     Transpose[Join[{inputData}, PadRight[MovingAverage[inputData, #],
>>> size]&  /@ Table[x, {x, start, end, incr}]]]
>>>     ]
>>>    ]
>>>
>>> 1.2. User defined function based on Span:
>>> (*UDF for moving average based on Span*)
>>> movAverageOwn2FC = Compile[{{dataInput, _Real, 1}, {days, _Integer},
>>> {length, _Integer}},
>>>    N[Mean[dataInput[[1 + # ;; days + #]]]]&  /@ Range[0, length - days,
>>> 1]
>>> ]
>>>
>>> (*Function calcs moving average based on UDF "movAverageOwn2FC" for
>>> specified number of days, e.g. 30 days to 250 days in steps of 10*)
>>> movAverageOwn2C = Compile[{{dataInput, _Real, 1}, {start, _Integer},
>>> {end, _Integer}, {incr, _Integer}}, Module[{length},
>>>     length = Length[dataInput];
>>>     Transpose[Join[{dataInput}, PadRight[movAverageOwn2FC[dataInput, #,
>>> length], length]&  /@ Range[start, end, incr]]]
>>>     ]
>>>    ]
>>>
>>> 2. Create sample data:
>>> data = 100 + #&  /@ Accumulate[RandomReal[{-1, 1}, {10000}]];
>>>
>>> 3. Test if functions yield same results:
>>> Test1 = movAverageC[data, 30, 250, 10]; (*Moving average for 30 days
>>> to 250 days in steps of 10*)
>>>
>>> Test2 = movAverageOwn2C[data, 30, 250, 10]; (*Moving average for 30
>>> days to 250 days in steps of 10*)
>>>
>>> Test1 == Test2
>>> Out = True
>>>
>>> 4. Performance testing (single core):
>>> AbsoluteTiming[Table[movAverageC[data, 30, 250, 10], {n, 1, 20, 1}];]
>>> (*Repeat function 20x for testing purposes*)
>>> Out = {1.3030000, Null}
>>>
>>> AbsoluteTiming[Table[movAverageOwn2C[data, 30, 250, 10], {n, 1, 20,
>>> 1}];] (*Repeat function 20x for testing purposes*)
>>> Out = {11.4260000, Null}
>>>
>>> =>  Result UDF 9x slower
>>>
>>> 5. Performance testing (multi core):
>>> LaunchKernels[]
>>>
>>> Out = {KernelObject[1, "local"], KernelObject[2, "local"]}
>>>
>>> DistributeDefinitions[data, movAverageOwn2C, movAverageOwn2FC,
>>> movAverageC]
>>>
>>> AbsoluteTiming[Parallelize[Table[movAverageC[data, 30, 250, 10], {n,
>>> 1, 20, 1}]];]
>>> Out = {1.3200000, Null}
>>>
>>> AbsoluteTiming[Parallelize[Table[movAverageOwn2C[data, 30, 250, 10],
>>> {n, 1, 20, 1}]];]
>>> Out = {6.7170000, Null}
>>>
>>> =>  Result UDF 5x slower
>>> Very strange that the built in function does not get faster with
>>> Parallelize
>>>
>>> I would very much appreciate any input on how to decrease calculation
>>> time based on the user defined function.
>>>
>>> Many thanks
>>> Stefan
>>>      
>>    
> 
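
For functions that are not built in, such as the "moving variance" mentioned in the original question, one option that avoids looping over windows is to work from cumulative sums. The following is a minimal sketch, not something proposed in this thread: the names movAvgAccum and movVarAccum are hypothetical, and the variance formula is the usual sample variance rewritten in terms of moving averages of x and x^2.

```mathematica
(* Moving average from cumulative sums: a few vectorized passes
   over the data, regardless of the window length n *)
movAvgAccum[data_?VectorQ, n_Integer?Positive] :=
  With[{acc = Prepend[Accumulate[data], 0.]},
   (Drop[acc, n] - Drop[acc, -n])/n]

(* Moving sample variance from moving averages of x and x^2;
   this form can lose precision when the variance is small
   relative to the mean *)
movVarAccum[data_?VectorQ, n_Integer /; n > 1] :=
  (movAvgAccum[data^2, n] - movAvgAccum[data, n]^2) n/(n - 1)

(* Compare against the built-in and against a direct computation *)
data = 100 + Accumulate[RandomReal[{-1, 1}, 10000]];
Max@Abs[movAvgAccum[data, 30] - MovingAverage[data, 30]]
Max@Abs[movVarAccum[data, 30] - (Variance /@ Partition[data, 30, 1])]
```

Both comparisons should come out as small rounding-level differences rather than exact zeros, since the cumulative sums are accumulated in a different order than the windowed ones. Whether this beats the Span-based versions on a given data set would need to be timed.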

