February 27, 2023

Iterators & Generators

1 Iterators

1.1 Another look at `range()`

The range() function returns an iterator that will iterate through the values that we specify when we call range().

The iterator only calculates and yields one value at a time. It does not calculate or store those numbers in memory up front, it calculates them only when they are needed, one at a time.

my_range = range(11, 23)
for x in my_range:
    print(x)

1.2 File Object Are Also Iterators

You may have remembered from the prerequisite course how we opened files and iterated through the lines one at a time. The main takeaway is that each line is read in (iterated through) one at a time. The entire file is not read up-front, each line is read into this notebook one at a time, as needed.

filepath = os.path.join(os.getcwd(), 'AAA_Fuel_Prices.csv')
count = 0
with open(filepath, 'r') as my_file:
    for line in my_file:
        line = line.strip()
        print(line)
        count += 1
        if count > 10:
            break

1.3 Iterators Can Also Be Made From Lists (as well as other data types)

Iterators can also be made out of lists. This is what happens when we use a list in a for loop.

my_list = [1, 2, 3, 4, 5]
for x in my_list:
    print(x)

my_list_iterator = iter(my_list)

print(type(my_list_iterator))
print(my_list_iterator.__next__())
print(my_list_iterator.__next__())
print(my_list_iterator.__next__())

2 Generators

You may wonder how can we write functions, like range(), that return iterators that we can iterate through. We can! In order to do so, we must use the keyword yield instead of return.

2.1 Our Own Version Of Range

Let’s write our own version of the range() function. We need to write a function the will yield numbers between a beginning and ending number. Note that when the function reaches the yield keyword, it will return that value (in this case, the value of i) and it will cease execution until it is asked for the next value.

def my_range(beg, end):
    "Generate numbers from start to stop"
    i = beg
    while i < end:
        yield i
        i += 1

# Let's call the function to return a generator that we can iterate through. 
range_of_nums = my_range(0, 10)

# Now, let's call the __next__() method to get each value is the it is 
# "yielded" by the generator

# This executes the code in the generator until it hits the yield statement.  It then stops until __next__() is called again.
print(range_of_nums.__next__())
print(range_of_nums.__next__())
print(range_of_nums.__next__())

output:

0
1
2

We do not usually use the __next__() method directly. We are usually looping over the iterator, or passing the iterator to another iterative process (in these case,__next__() is still used “under the hood”, but we are not using it directly as programmers). Below, we simple use range_of_nums in a loop, we also use the my_range() function directly in a for loop, just like you would use range().

for num in my_range(30, 33):
    print(num)

2.2 A Fibonacci Series Generator.

A Fibonacci Series is a series of numbers in which the next number is the sum of the two preceding numbers. If we start with 0 and 1, then the series is 0, 1, 1, 2, 3, 5, 8, 13, etc. This is a fun series that is often used in computer science lessons. Let’s code a function that will return a generator that iterates through the Fibonacci Series (starting with 0 and 1).

def fibonacci_series(N):
    """Generate the Fibonacci series starting at 0 and 1"""
    # We start by seeding 0 and 1 as the first two numbers
    i_prev = 0
    i = 1
    yield i_prev  # we yield 0 first
    # now in the following loop, we yield "i" and then calculate i_next by
    # summing the two previous
    for _ in range(N-1):
        yield i 
        i, i_prev = i + i_prev, i

f_s = fibonacci_series(10)
for x in f_s:
    print(x)

Iterators Can Only Be Iterated Over Once

Once we create an iterator, it can only be iterated over once. For example, in the above cell we looped over the entirety of f_s. Below, we try to loop over it again, but nothing prints. This is because we have already looped over the iterator to its end. If we need to iterate again, we will have to create a new iterator.

2.3 File Word Counts

Another example, which will become more meaningful if you take the course Python Data Structures, Data Mining and Big Data, is producing a word counts from a file, line by line.

Let’s write a function that will generate word counts, from a file, line by line.

def file_word_count(filepath):
    """Generate word counts from ta file, line by line"""
    # First, open the file
    with open(filepath, 'r') as my_file:
        # Loop through the lines
        for line in my_file:
            line = line.strip()  # strip whitespace from the line
            words = line.split()  # split the line into words
            # create a dictionary that we will store the word counts in
            word_count_dict = {}
            # loop through the words in the line and tally them in the
            # dictionary
            for word in words:
                if word in word_count_dict:
                    word_count_dict[word] += 1
                else:
                    word_count_dict[word] = 1
            # now loop through the dictionary and yield up the word counts 
            for word in word_count_dict:
                yield word, word_count_dict[word]

aesopa10_path = os.path.join(os.getcwd(), 'aesopa10.txt')
counter = 0
for word, count in file_word_count(aesopa10_path):
    print(word, count)
    counter += 1
    if counter > 2000:
        break

3 Enumerate and Zip

3.1 Enumerate

Enumerate takes in an iterable object (like a list, tuple, or some other iterator) and outputs tuples that are enumerated. The first element in the tuple is the number and the second is the value from the original iterable object.

my_list = ['a', 'b', 'c']

for item in enumerate(my_list):
    print(item)

(0, 'a')
(1, 'b')
(2, 'c')

3.2 Zip

Zip will take multiple iterable objects as inputs and output tuples that contain items from each iterable, in order. That is the first tuple will contain the first item from each iterable, the second tuple will contain the second items, etc…

tuple_1 = (2012, 2012, 2012)
tuple_2 = ('01', '02', '03')

new_list = []
for val1, val2 in zip(tuple_1, tuple_2):
    print(val1, val2)
    new_list.append(str(val1) + '_' + val2)
print(new_list)

2012 01
2012 02
2012 03
['2012_01', '2012_02', '2012_03']

for x in zip(tuple_1, tuple_2):
    print(x)

(2012, '01')
(2012, '02')
(2012, '03')

# Python

Iterators & Generators

1 Iterators

1.1 Another look at `range()`

1.2 File Object Are Also Iterators

1.3 Iterators Can Also Be Made From Lists (as well as other data types)

2 Generators

2.1 Our Own Version Of Range

2.2 A Fibonacci Series Generator.

2.3 File Word Counts

3 Enumerate and Zip

3.1 Enumerate

3.2 Zip

9 Data Prep: Discretization & 1Hot Encoding

8 kNN Model

Iterators & Generators

1 Iterators

1.1 Another look at range()

1.2 File Object Are Also Iterators

1.3 Iterators Can Also Be Made From Lists (as well as other data types)

2 Generators

2.1 Our Own Version Of Range

2.2 A Fibonacci Series Generator.

2.3 File Word Counts

3 Enumerate and Zip

3.1 Enumerate

3.2 Zip

9 Data Prep: Discretization & 1Hot Encoding

8 kNN Model

1.1 Another look at `range()`