In this short guide, I'll show you how to iterate over data in fixed-size chunks in Python. We will cover three approaches: itertools.islice() with iter(), a grouper() helper built on itertools.zip_longest(), and a generator function.

itertools + iter()

To iterate over a list in chunks, we can use the itertools module together with a while loop. Each pass of the loop takes the next chunk of a fixed size from an iterator:

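import itertools

my_iterator = iter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
chunk_size = 3

while True:
    # take up to chunk_size items, starting where the previous pass stopped
    chunk = list(itertools.islice(my_iterator, chunk_size))
    if not chunk:  # empty list: the iterator is exhausted
        break
    print(chunk)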

Result:

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]

Where:

  • my_iterator is an iterator created with iter() from a list of integers
  • chunk_size is set to 3
  • the while loop runs once per chunk
  • itertools.islice() returns an iterator over the next chunk_size elements
    • because my_iterator is shared between passes, each call resumes where the previous one stopped
    • with a plain list instead of iter(...), every call would start over at 1 and the loop would never end
  • list() consumes that islice iterator into a list holding the chunk
  • if the chunk is empty, the iterator is exhausted and the loop breaks; otherwise the chunk is printed
  • the last chunk ([10]) is simply shorter than chunk_size
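The same loop can be wrapped in a reusable helper. This is a minimal sketch, assuming Python 3.8+ for the := operator; the name chunked is hypothetical and not part of itertools. Calling iter() inside the helper means it accepts a plain list as well as an iterator.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of up to `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)  # one shared iterator, so each islice() resumes where the last stopped
    # islice() takes at most `size` items; an empty list means the input is exhausted
    while chunk := list(islice(it, size)):
        yield chunk


print(list(chunked([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3)))
# [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```

The size check is an added assumption: with size 0, islice() would return nothing and the helper would silently yield no chunks at all.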

itertools + grouper()

Alternatively, we can split an iterable into chunks with a grouper() function, based on the grouper recipe from the itertools documentation:

import itertools

def grouper(iterable, n, fillvalue=None):
    # n references to ONE iterator, so zip_longest() takes n consecutive items per tuple
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

my_iterator = iter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
chunk_size = 3

for chunk in grouper(my_iterator, chunk_size):
    print(chunk)

Result:

(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
(10, None, None)

Where:

  • grouper() groups an iterable into tuples of size n
  • args is a list of n references to the same iterator object created from iterable
  • itertools.zip_longest() pulls one element from each entry of args in turn; since they are all the same iterator, each tuple receives n consecutive elements
  • when the data runs out, the last tuple is padded with fillvalue (None by default), which is why the output ends with (10, None, None)
  • the for loop iterates over my_iterator in chunks of chunk_size
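The shared-iterator trick behind grouper() can be seen in isolation below, along with one way to drop the padding. This is a minimal sketch; grouper_trimmed is a hypothetical name, and the private sentinel object is an assumption used so that a genuine None in the data is not mistaken for padding.

```python
import itertools

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Three references to ONE iterator: zip() advances it three times per tuple.
it = iter(data)
pairs = list(zip(it, it, it))
print(pairs)
# [(1, 2, 3), (4, 5, 6), (7, 8, 9)]  (plain zip() silently drops the leftover 10)

_PAD = object()  # private sentinel; cannot collide with real data


def grouper_trimmed(iterable, n):
    """Group into tuples of n, shortening the last tuple instead of padding it."""
    args = [iter(iterable)] * n
    for group in itertools.zip_longest(*args, fillvalue=_PAD):
        yield tuple(x for x in group if x is not _PAD)


print(list(grouper_trimmed(data, 3)))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10,)]
```

Using None as the fill value and filtering it out would also work, but only when the data itself never contains None.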

generator function

Finally, we can write a generator function that yields the chunks one at a time:

def chunk_iterator(iterator, chunk_size):
    chunk = []
    for item in iterator:
        chunk.append(item)
        if len(chunk) == chunk_size:  # chunk is full: hand it to the caller
            yield chunk
            chunk = []  # start a new list rather than clearing the yielded one
    if chunk:  # leftover items form a final, shorter chunk
        yield chunk

my_iterator = iter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
chunk_size = 3

for chunk in chunk_iterator(my_iterator, chunk_size):
    print(chunk)

Result:

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]
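Because chunk_iterator() is a generator, it builds each chunk only when the caller asks for it, so it also works on a stream that never ends. This is a minimal sketch: the function is repeated so the snippet runs on its own, and itertools.count() stands in for an unbounded data source.

```python
from itertools import count, islice


def chunk_iterator(iterator, chunk_size):
    # Same generator as above: collect items until the chunk is full, then yield it.
    chunk = []
    for item in iterator:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, shorter chunk
        yield chunk


# count(1) yields 1, 2, 3, ... forever; islice() requests only the first two chunks,
# so the generator is never asked for more and the program terminates.
first_two = list(islice(chunk_iterator(count(1), 3), 2))
print(first_two)
# [[1, 2, 3], [4, 5, 6]]
```

A list-slicing approach such as data[i:i + 3] would not work here, since it needs the whole sequence in memory first.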