best practices on choosing concurrency paradigm

Stefan Armbruster
Hi,

consider the following use case:

* read and parse multiple large (GB range) CSV files
* perform some operation on each row. The operation is side-effect free,
so applying parallelism should be easy
* the output of the per-row operations should be further processed by
a single-threaded consumer

Which of the multiple concurrency paradigms offered by GPars would you
prefer for this kind of scenario and why?

Cheers,
Stefan

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email


Re: best practices on choosing concurrency paradigm

Vaclav
Administrator
Hi Stefan,

I'm happy to see you looking into GPars ;-)

What you describe is a model use case for dataflow processing: a (most likely) sequential file reader (perhaps a dataflow task) feeds the rows into a DataflowQueue, which is consumed by a group of DataflowOperators sharing a thread pool of a certain size. The operators feed their output into a second DataflowQueue, consumed by another operator that does the final polish.
So dataflow is the answer. You may also consider some way of throttling the file reader to avoid holding too much data in memory, using either SyncDataflowQueues or KanbanFlow.
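
The shape of the pipeline described above can be sketched in plain Java. This is only an analogue built on java.util.concurrent, not the GPars API: bounded ArrayBlockingQueues stand in for the dataflow queues and for the throttling, and the names RowPipeline, run, workers and capacity are made up for illustration. It assumes the per-row operation never returns null and never throws; error handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch: reader -> bounded queue -> worker pool -> bounded queue -> single consumer.
final class RowPipeline {

    static <R> List<R> run(Iterable<String> rows, Function<String, R> op,
                           int workers, int capacity) {
        // Bounded queues block the producer when full, which throttles the reader.
        BlockingQueue<Optional<String>> input = new ArrayBlockingQueue<>(capacity);
        BlockingQueue<Optional<R>> output = new ArrayBlockingQueue<>(capacity);
        // One thread for the reader plus one per worker.
        ExecutorService pool = Executors.newFixedThreadPool(workers + 1);

        // Sequential reader: pushes rows, then one empty marker per worker.
        pool.execute(() -> {
            try {
                for (String row : rows) {
                    input.put(Optional.of(row));
                }
                for (int i = 0; i < workers; i++) {
                    input.put(Optional.empty());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Workers: apply the side-effect-free operation to each row.
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    Optional<String> row;
                    while ((row = input.take()).isPresent()) {
                        output.put(Optional.of(op.apply(row.get())));
                    }
                    output.put(Optional.empty()); // tell the consumer this worker is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Single-threaded consumer, run on the calling thread.
        List<R> results = new ArrayList<>();
        try {
            int finished = 0;
            while (finished < workers) {
                Optional<R> item = output.take();
                if (item.isPresent()) {
                    results.add(item.get());
                } else {
                    finished++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("consumer interrupted", e);
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

With more than one worker the results arrive in no particular order. In real use the Iterable would wrap a line-by-line file reader, so that at most roughly 2 * capacity + workers rows are in flight at any time.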

It may be tempting to use parallel collections, which, however, require all the data to fit into memory. Also, since the algorithm cannot start before all the data is in memory, the time the file reader spends loading the data is wasted.

Best regards,

Vaclav

 


--
E-mail: [hidden email]
Blog: http://www.jroller.com/vaclav
Linkedin page: http://www.linkedin.com/in/vaclavpech
Re: best practices on choosing concurrency paradigm

Russel Winder
In reply to this post by Stefan Armbruster
On Mon, 2013-07-15 at 16:48 +0200, Stefan Armbruster wrote:

> Hi,
>
> consider the following use case:
>
> * read and parse multiple large (GB range) CSV files
> * perform some operation on each row. The operation is side-effect free,
> so applying parallelism should be easy
> * the output of the per-row operations should be further processed by
> a single-threaded consumer
>
> Which of the multiple concurrency paradigms offered by GPars would you
> prefer for this kind of scenario and why?
Architecturally (at the abstract level) this is a one-level
scatter-gather, so a simple parallel reduce. Of course, this presupposes
the whole file can be in memory at once. Given the size, this seems
unlikely. In another email Václav has proposed using dataflow as a way
of realizing an explicitly managed version of a parallel reduce: a single
sequential reader offering tasks per row to workers that then send
results to an accumulator operator that undertakes the reduction. Sounds
eminently workable to me, and should result in something as efficient as
a given machine can manage.
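
For comparison, the in-memory scatter-gather form might look like the sketch
below. It uses Java parallel streams as a stand-in, not GPars parallel
collections, and it assumes the rows already sit in a List and that the
per-row operation yields a number to be summed. ScatterGather and reduce
are made-up names.

```java
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch: one-level scatter-gather as a parallel map plus sum.
final class ScatterGather {

    // Requires every row to be in memory before any work can start.
    static long reduce(List<String> rows, ToLongFunction<String> op) {
        return rows.parallelStream() // scatter: rows are split across worker threads
                   .mapToLong(op)    // side-effect-free per-row operation
                   .sum();           // gather: partial sums are combined
    }
}
```

The need to hold the whole list in memory is exactly the limitation that the
dataflow variant avoids.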

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:[hidden email]
41 Buckmaster Road    m: +44 7770 465 077   xmpp: [hidden email]
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
