foldParallel() doesn't work as inject(). Is that correct?


Mario Garcia

Let's say I have a text file with the following information:

JANUARY     100.23
FEBRUARY    23.34
MARCH       45.56
APRIL       67
MAY         78.2
JUNE        23.3
JULY        92.2
AUGUST      802.2
SEPTEMBER   87.3
OCTOBER     2.2
NOVEMBER    3.2
DECEMBER    150.4

I want to read the file and populate a given map with some information. I came up with the following solution using inject(...):

def firstQ = ['JAN','FEB','MAR','APR']  // 'APR', not 'APRIL': the month is cut to 3 letters below
def secondQ = ['MAY','JUN','JUL','AUG']
def thirdQ = ['SEP','OCT','NOV','DEC']

def file2Process = new File("/pathtoFile/file.txt")
def data = file2Process.withReader{r->
    r.inject([q1:0,q2:0,q3:0,lines:0,total:0]){map,val->
        def lineInfo = val.split(/\s{1,}/)
        def month = lineInfo?.getAt(0)?.take(3)
        def amount = lineInfo?.getAt(1)?.toBigDecimal()
        switch(month){
            case firstQ:
                map.q1 += amount
                break
            case secondQ:
                map.q2 += amount
                break
            case thirdQ:
                map.q3 += amount
                break
        }
        map.total += amount
        map.lines++
        /* Don't forget to return the map: it becomes the accumulator for the next line */
        map
    }
}

Using Groovy 2.1.0, I couldn't make it work with GPars' foldParallel() method. It doesn't seem to work the same way inject() does. Is that correct?
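A likely reason for the difference, as far as I understand the GPars API: foldParallel() combines pairs of elements (or already-combined partial results) in no fixed order, and a seed, if one is passed, is treated as just one more element. So a closure shaped like inject()'s (accumulator map, raw line) doesn't fit. One way around it is to first turn every line into a small partial-result map and then fold those maps with an associative merge. A minimal sketch under those assumptions follows; the GPars version in the @Grab line, the inline sample lines and the names quarterOf, toPartial and merge are illustrative only and not part of the original code.

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')   // version is an assumption
import groovyx.gpars.GParsPool

// Lookup keyed by the first three letters of the month name.
def quarterOf = [JAN: 'q1', FEB: 'q1', MAR: 'q1', APR: 'q1',
                 MAY: 'q2', JUN: 'q2', JUL: 'q2', AUG: 'q2',
                 SEP: 'q3', OCT: 'q3', NOV: 'q3', DEC: 'q3']

// Step 1: turn one line into a partial result shaped like the final map.
def toPartial = { String line ->
    def (name, amount) = line.trim().split(/\s+/)
    def value = amount.toBigDecimal()
    def partial = [q1: 0, q2: 0, q3: 0, lines: 1, total: value]
    def quarter = quarterOf[name.take(3)]
    if (quarter) partial[quarter] = value
    partial
}

// Step 2: associative merge of two partial results.
// It builds a new map instead of mutating either argument.
def merge = { Map a, Map b ->
    a.collectEntries { key, value -> [(key): value + b[key]] }
}

// Stand-in for the file contents (hypothetical sample).
def lines = ['JANUARY     100.23',
             'MAY         78.2',
             'SEPTEMBER   87.3',
             'APRIL       67']

def data
GParsPool.withPool {
    data = lines.collectParallel(toPartial).foldParallel(merge)
}
println data
```

The merge closure returns a fresh map on purpose: partial results may be combined on different threads, so mutating a shared accumulator the way the inject() version does would not be safe here.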

This isn't a very complex sample, but I need to parse a really big file and apply some computational algorithms to every line, which is why I thought about using GPars.

Is there any other way to accomplish that using the parallel methods? 

Thanks
Mario

Re: foldParallel() doesn't work as inject(). Is that correct?

Vaclav
Administrator
Hi Mario,

You say you're planning to process large files, so parallel collections are not the best choice, IMHO: for parallel collections to work, all the elements have to be loaded into memory first.
I would rather model the process around the dataflow abstraction, with a reader process feeding the elements into a DataflowQueue, which is then read by a group of processing operators. At the end of the computation the operators would share and combine their local statistics.
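A rough sketch of that design, using plain dataflow tasks in place of operators to keep it short. The reader pushes lines into a DataflowQueue, a few workers each keep their own local statistics, and the local maps are merged at the end. Everything named here is an assumption for illustration: the GPars version, the worker count, the END sentinel, and the inline sample that stands in for the real file.

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')   // version is an assumption
import groovyx.gpars.dataflow.DataflowQueue
import static groovyx.gpars.dataflow.Dataflow.task

def quarterOf = [JAN: 'q1', FEB: 'q1', MAR: 'q1', APR: 'q1',
                 MAY: 'q2', JUN: 'q2', JUL: 'q2', AUG: 'q2',
                 SEP: 'q3', OCT: 'q3', NOV: 'q3', DEC: 'q3']

// Stand-in for the big file; a real run would stream from a File instead.
def sample = '''\
JANUARY     100.23
FEBRUARY    23.34
MARCH       45.56
APRIL       67
MAY         78.2
JUNE        23.3
JULY        92.2
AUGUST      802.2
SEPTEMBER   87.3
OCTOBER     2.2
NOVEMBER    3.2
DECEMBER    150.4
'''

final END = new Object()        // hypothetical end-of-input marker
int workerCount = 3
def queue = new DataflowQueue()

// Reader: feeds lines one at a time, then one END marker per worker.
task {
    new StringReader(sample).eachLine { queue << it }
    workerCount.times { queue << END }
}

// Workers: each one accumulates into its own map, so nothing is shared.
def workers = (1..workerCount).collect {
    task {
        def local = [q1: 0, q2: 0, q3: 0, lines: 0, total: 0]
        while (true) {
            def line = queue.val            // blocks until an item arrives
            if (line.is(END)) break
            def (name, amount) = line.trim().split(/\s+/)
            def value = amount.toBigDecimal()
            def quarter = quarterOf[name.take(3)]
            if (quarter) local[quarter] += value
            local.total += value
            local.lines++
        }
        local                               // becomes the task's result
    }
}

// Combine the local statistics once every worker has finished.
def data = workers*.val.inject { a, b ->
    a.collectEntries { key, value -> [(key): value + b[key]] }
}
println data
```

Note that a DataflowQueue is unbounded, so a reader that is much faster than the workers can still pile lines up in memory; for a really big file some form of throttling on the reader side would probably be needed.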

Vaclav




In reply to Mario Garcia's message of Fri, Apr 19, 2013, 4:23 PM.




--
Blog: http://www.jroller.com/vaclav
Linkedin page: http://www.linkedin.com/in/vaclavpech