Oct 13 2010

3 ways for splitting a collection into pages with Linq

Category: C# — Duke @ 18:27
There are situations where you have a collection of objects (IEnumerable&lt;T&gt;) and you need to split it into chunks, or pages, so that you can process the resulting parts at different moments.
There are a lot of ways to achieve this, but the nicest is probably using Linq (it also makes your code look a lot more modern and cool!)
The focus of this post is to discuss some of these ways and their pros and cons.
Digging around the web I've found this first example (unfortunately I've lost the link; anyway, the code in this first sample is not mine):
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> enumerable, int chunkSize) 
{ 
    // Total number of elements, used only to know when to stop.
    var total = enumerable.Count(); 
    for (int i = 0; i * chunkSize < total; i++) 
    { 
        // Each page re-scans the source from the start: skip the pages
        // already returned, then take one page worth of elements.
        yield return enumerable.Skip(i * chunkSize).Take(chunkSize); 
    } 
}
 
This allows you to write code like:
foreach(var page in myCollection.Chunk(30))
{
    foreach(var item in page)
    {
         process(item);
    }
}
Anyway, this works well only in limited scenarios, when the number of records is quite low. To prove it, imagine you have a collection with 1 million items in it, and you want to split it into chunks of 1000 elements each.
The Chunk method will first skip zero elements (hey, pretty fast!) and then take 1000: the total amount of work is c, where c is the size of the chunk.
But on the next round the method has to skip the first 1000 (as they were already taken) and then take 1000. Hmm... still not so bad, as the first 1000 were already taken and we just have to process the next 1000.
Third round... skip 2000 this time... and then take 1000. OK, but we have skipped 0 the first time, 1000 the second time and 2000 the third time... if we continue this way, by the 10th page we will have skipped 0, 1000, 2000, 3000, 4000 ... 9000 elements! Oh no, this means that by the kth page we will have had to skip
((k * (k - 1)) / 2) * c
elements. Our old friend Gauss appears again! Plus, of course, k * c for the take part.
That's really bad: as you can see, we have a term that grows as O(k^2), and you certainly don't want such complexity if your collection has 1 million elements. Moreover, one usually wants to split things into chunks precisely when the number of elements is large, which is exactly our scenario. Therefore this algorithm is quite poor in the general case: we are not going to use it in our programs, are we?
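To put a number on it, here is a quick back-of-the-envelope check (the figures are mine, just plugging our scenario into the formula above):

```csharp
using System;

class SkipCost
{
    static void Main()
    {
        // k pages of c elements each: page i skips i * c elements,
        // so the total skipped is c * (0 + 1 + ... + (k - 1)) = ((k * (k - 1)) / 2) * c.
        long k = 1000;   // 1,000,000 items / 1000 per chunk = 1000 pages
        long c = 1000;   // chunk size

        long totalSkipped = ((k * (k - 1)) / 2) * c;

        // Almost half a billion elements walked just to skip them!
        Console.WriteLine(totalSkipped);   // 499500000
    }
}
```

Half a billion elements traversed only to be thrown away, on top of the million we actually want to process.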
So, let's move forward and try to find something better. We have to solve the problem of restarting every time from the beginning, so let's try some caching; it usually works, right?
 
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> enumerable, int chunkSize) 
{ 
    var total = enumerable.Count(); 
    // Keep a "cursor" over the sequence instead of restarting from the beginning.
    var current = enumerable; 

    for (int i = 0; i * chunkSize < total; i++) 
    { 
        yield return current.Take(chunkSize); 
        // Advance the cursor past the page we just returned.
        current = current.Skip(chunkSize); 
    } 
}
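The call site is identical to the first version. A minimal usage sketch (the sample data and page size are mine, not from the original example):

```csharp
using System;
using System.Linq;

class ChunkDemo
{
    static void Main()
    {
        // Hypothetical example: 10 items split into pages of 3.
        var numbers = Enumerable.Range(1, 10).ToList();

        foreach (var page in numbers.Chunk(3))
        {
            // Pages: 1,2,3 / 4,5,6 / 7,8,9 / 10
            Console.WriteLine(string.Join(", ", page));
        }
    }
}
```

Note that Take and Skip are deferred, so each yielded page is only materialized when you enumerate it.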
