Is it reasonable to use GPars to split one massive SQL query into n parallel ones?
Hi, I've been reading a lot of the GPars docs and samples but have yet to find a straightforward example of this.
My problem is the following: I have a query that runs in a JMS listener. It's a BIG query that takes hours to run. It involves finding the top 3 most similar users for every user in my schema and then storing them somewhere, a bit like computing the results for Twitter's 'Suggested people to follow'.
I'm not using Hadoop or any of that jazz yet, merely a Hibernate session on top of a datasource pointing at a single MySQL database.
So, is it possible or reasonable to use GPars to break my query into n smaller queries, each on a subset of the database (i.e. each query having a 'where id >= :minId and id <= :maxId' clause), have all of these queries execute in parallel, and then do a reduce on the multiple result sets that the queries return? Or does this actually just put the same load on the DB at the end of the day?
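To make the idea concrete, here is a rough sketch of the split / run in parallel / merge shape I have in mind. It is plain Java with an ExecutorService rather than GPars itself, since I haven't found the right GPars idiom yet. All the names (PartitionedQuery, IdRange, split, runInParallel) are made up for illustration, and the lambda in main is only a stand-in for the real Hibernate query with the 'where id >= :minId and id <= :maxId' clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class PartitionedQuery {

    /** Inclusive id range handled by one sub-query (hypothetical helper type). */
    public static final class IdRange {
        public final long minId;
        public final long maxId;

        public IdRange(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    /** Splits [minId, maxId] into at most n contiguous, non-overlapping ranges. */
    public static List<IdRange> split(long minId, long maxId, int n) {
        List<IdRange> ranges = new ArrayList<>();
        long total = maxId - minId + 1;
        long chunk = (total + n - 1) / n; // ceiling division, so no ids are dropped
        for (long lo = minId; lo <= maxId; lo += chunk) {
            ranges.add(new IdRange(lo, Math.min(lo + chunk - 1, maxId)));
        }
        return ranges;
    }

    /**
     * Runs queryForRange once per range on a fixed-size pool, then "reduces"
     * by concatenating the partial results in range order.
     */
    public static <T> List<T> runInParallel(List<IdRange> ranges, int poolSize,
                                            Function<IdRange, List<T>> queryForRange)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<List<T>>> futures = new ArrayList<>();
            for (IdRange range : ranges) {
                Callable<List<T>> task = () -> queryForRange.apply(range);
                futures.add(pool.submit(task));
            }
            List<T> merged = new ArrayList<>();
            for (Future<List<T>> future : futures) {
                merged.addAll(future.get()); // blocks until that sub-query is done
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<IdRange> ranges = split(1, 10, 3);
        // Stand-in for the real sub-query: just echoes each range's bounds.
        List<String> results = runInParallel(ranges, 3, r -> {
            List<String> rows = new ArrayList<>();
            rows.add("ids " + r.minId + ".." + r.maxId);
            return rows;
        });
        System.out.println(results);
    }
}
```

In the real thing each task would presumably need its own Hibernate session, since as far as I know a session should not be shared between threads, and the pool size would cap how many queries hit MySQL at once.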