Data Structures: Tuples, Lists, Dictionaries and more...
While developing software professionally, you will need to go beyond working with variables that store a single value of data, and start working with constructs that can store multiple values (or elements) in a common, easy-to-access format. In this chapter, you will learn to do just that. You will be introduced to various important data-structures in Python. With lists, you will learn to store and access data sequentially, while with dictionaries you will learn how to develop elaborate data-storage objects that can be accessed with unique keywords. You will also learn to initialize data-structures by using elegant constructs known as comprehensions that can help you reduce the number of lines of code you write, and make your code faster and smarter.
4.1: What are Data Structures?
In the second chapter, you explored variables and different data-types. We saw that variables can hold only a single value at a time, be it numeric or textual. But often, when developing programs professionally, you'll come across the need to organize data and values into elaborate structures that enable you to identify, access and manipulate data efficiently.

Data-structures are all about just that. They are (in most cases) variables that can hold two or more data elements or items of different values. In most cases, these elements can be accessed from within the data-structure with the help of index values, just like you can access characters from within a string. The different data structures available in Python are: Tuples, Lists, Sets, and Dictionaries.

Of these, tuples and lists store their elements in order and use index values to locate them. Sets are unordered collections of unique elements, so they do not support indexing. Dictionaries, unlike the former three, use the concept of key-value pairs to store and access data.

One of the things that makes data-structures fantastic is the fact that not only can they hold more than single valued data, but they can also hold data belonging to different data-types. This means that you can store textual and numeric data within the same data-structure.
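
As a quick illustration of that last point, here is a minimal sketch that stores text and numbers side by side in each of the four data-structures. The variable names and values are made up for this example.

```python
# Hypothetical sample values, chosen only for illustration.
mixedTuple = ('One', 2, 3.0)
mixedList = ['One', 2, 3.0]
mixedSet = {'One', 2, 3.0}
mixedDictionary = {'name': 'One', 'value': 2}

# Collect the name of the data-type of each element stored in the tuple.
elementTypes = []
for element in mixedTuple:
    elementTypes.append(type(element).__name__)

print('Element types:', elementTypes)   # Element types: ['str', 'int', 'float']
```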
4.2: Tuples and Lists...
Alrighty then, let's start with exploring Tuples. They are simply immutable, sequentially ordered collections of Python objects, written inside parentheses ( ). Immutable means that once a tuple is created, it is not possible to alter its contents.

NOTE: Before you start, remember to launch the Python shell in your CMD!

Here's how you initialize a tuple:


>>> # A tuple that contains only string objects: the names of my friends
>>> myFriends = ('Diana', 'Annet', 'Meghan', 'Alome', 'Divyesh', 'Abhinava', 'Bipin', 'Senthil', 'Ashwantha')
>>> print('My Friends:', myFriends)
My Friends: ('Diana', 'Annet', 'Meghan', 'Alome', 'Divyesh', 'Abhinava', 'Bipin', 'Senthil', 'Ashwantha')              
>>> 
            

Well, so we just created a tuple and printed it. Let's now initialize one that contains both numeric and textual data:


>>> # Here's one that contains numbers and strings...
>>> numbersAndWords = ('One', 2, 'Three', 4, 'Five', 6, 'Seven', 8, 'Nine', 10)
>>> print('Mixed Tuple:', numbersAndWords)
Mixed Tuple: ('One', 2, 'Three', 4, 'Five', 6, 'Seven', 8, 'Nine', 10)
>>>                
            

And, you can also create a tuple with more tuples (and other data-structures) within it! These are called nested tuples. Say you want to create a tuple that contains tuples made up of your friends' names, age and their favorite anime.


>>> myFriends = (('Diana', 23, 'Beyblade'),        # Hit enter
... ('Annet', 25, 'Alice Academy'),                # Hit enter
... ('Meghan', 25, 'Dragon Ball Super'),           # Hit enter
... ('Alome', 26, 'Dragon Ball Z'),                # Keep hitting enter after every tuple...
... ('Divyesh', 25, 'Dragon Ball Z'),
... ('Abhinava', 24, 'Doesn\'t watch anime'),
... ('Bipin', 25, 'Is an anime character himself'),
... ('Senthil', 26, 'Doesn\'t watch anime'),
... ('Ashwantha', 24, 'Attack on Titan'))          # Notice the double parentheses marking the end of the tuple of tuples...
>>> print('My Friends:' + '\n' + str(myFriends))   # Convert the tuple to a string before printing
My Friends:
(('Diana', 23, 'Beyblade'), ('Annet', 25, 'Alice Academy'), ('Meghan', 25, 'Dragon Ball Super'), ('Alome', 26, 'Dragon Ball Z'), ('Divyesh', 25, 'Dragon Ball Z'), ('Abhinava', 24, "Doesn't watch anime"), ('Bipin', 25, 'Is an anime character himself'), ('Senthil', 26, "Doesn't watch anime"), ('Ashwantha', 24, 'Attack on Titan'))
>>> 
            

So, we've created a new tuple myFriends, which consists of 9 tuples within it, and each of these tuples consists of numeric and textual elements. Let's take another example, of a tuple that contains information on different smartphones:


>>> allMySmartphones = ((1, 'Samsung Galaxy S Plus', 18500, (122.4, 64.2, 9.9)),   # Hit enter
... (2, 'Samsung Galaxy Grand', 15000, (144.80, 72.10, 8.60)),                     # Hit enter...
... (3, 'Microsoft Lumia', 8100, (144, 73.3, 8.6)),
... (4, 'Moto G4 Plus', 14999, (153, 76.6, 9.8)),
... (5, 'Oppo R11 Plus', 34600, (165.8, 81.5, 7.8)))                               # Notice the parentheses marking the end of the tuple...                    
>>> 
            

The tuple allMySmartphones consists of five tuples. Each of these tuples consists of textual and numeric objects, and another tuple within it. If you look closely, all these tuples inside allMySmartphones share a similar structure, which demonstrates that data-structures can be used efficiently for organizing data.

Now that you know what tuples are, let's learn to access the data within them. As mentioned earlier, data inside tuples is organized sequentially from 0 to (total number of elements - 1), and their positions are identified with the help of index values. Consider the tuple allMySmartphones. It has five elements inside of it (each element is a tuple itself).

Here's how you use index values to fetch data from within the tuple:


>>> # Print one element (tuple) at a time...
>>> print('My First Smartphone:', allMySmartphones[0])
My First Smartphone: (1, 'Samsung Galaxy S Plus', 18500, (122.4, 64.2, 9.9))
>>> 
>>> print('My Second Smartphone:', allMySmartphones[1])
My Second Smartphone: (2, 'Samsung Galaxy Grand', 15000, (144.8, 72.1, 8.6))
>>> 
>>> print('My Third Smartphone:', allMySmartphones[2])
My Third Smartphone: (3, 'Microsoft Lumia', 8100, (144, 73.3, 8.6))
>>> 
            

You can also fetch multiple elements at a time. Say, you want to fetch the 2nd (index position 1) to the 4th (index position 4 - 1) element in the tuple:


>>> # Fetch elements from index position '1' to index position '4 - 1':
>>> print('My 2nd, 3rd and 4th Smartphone:' + '\n' + str(allMySmartphones[1:4]))  
My 2nd, 3rd and 4th Smartphone:
((2, 'Samsung Galaxy Grand', 15000, (144.8, 72.1, 8.6)), (3, 'Microsoft Lumia', 8100, (144, 73.3, 8.6)), (4, 'Moto G4 Plus', 14999, (153, 76.6, 9.8)))
>>> 
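
To make the slice notation a little more concrete, here is a small sketch using a short tuple of made-up values (the name sampleTuple is an assumption for this example). A slice written as [start:stop] returns the elements from index position start up to, but not including, index position stop.

```python
sampleTuple = ('a', 'b', 'c', 'd', 'e')

middle = sampleTuple[1:4]      # index positions 1, 2 and 3
firstTwo = sampleTuple[:2]     # leaving out start means "from the beginning"
lastTwo = sampleTuple[3:]      # leaving out stop means "up to the end"

print(middle)     # ('b', 'c', 'd')
print(firstTwo)   # ('a', 'b')
print(lastTwo)    # ('d', 'e')
```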
            

Pretty cool huh? Now, let's try fetching the names of some of the smartphones in the tuple. Note that these smartphone names are elements inside tuples, so that means we need to use the index positions of elements inside tuples inside a tuple. Here's the syntax for it:


>>> # Name of the smartphone in the second tuple:
>>> print('My Second Smartphone:', allMySmartphones[1][1])
My Second Smartphone: Samsung Galaxy Grand
>>> 
>>> # Name of the smartphone in the fourth tuple:
>>> print('My Fourth Smartphone:', allMySmartphones[3][1])
My Fourth Smartphone: Moto G4 Plus
>>>                  
            

Consider the code for fetching the name of the smartphone from the fourth tuple. The code uses two index values, one for index position of the element tuple, followed by the index position of the name string inside the element tuple.

Now, there's a tuple inside each of these tuples, which holds data regarding dimensions (length, breadth, and thickness) of the corresponding smartphones. Try fetching these values. If you're paying attention, you'll know that you're going to need to use three index values: One for the five element tuples, the next for the tuple of dimensions, and the third for the dimension values inside the tuple. Here's an example that you can use:


>>> # Fetch the name and dimensions of the first phone:
>>> name, dimensions = allMySmartphones[0][1], allMySmartphones[0][3]
>>> print('Dimensions of', name, 'are', dimensions)
Dimensions of Samsung Galaxy S Plus are (122.4, 64.2, 9.9)
>>> 
            

Saw what I did there in the second line?! Python allows you to initialize more than one variable in a single line of code with the help of commas.

Now, the dimensions are stored as a tuple, so fetching data from allMySmartphones[0][3] returns a tuple with three elements inside it. You can initialize three variables for each of these in a single line of code. Check this:


>>> # Fetch the name and dimensions and store them in four different variables.
>>> # The inner parentheses unpack the dimensions tuple into three variables:
>>> name, (length, breadth, width) = allMySmartphones[0][1], allMySmartphones[0][3]
>>> print('Dimensions of', name, 'are', length, 'x', breadth, 'x', width)
Dimensions of Samsung Galaxy S Plus are 122.4 x 64.2 x 9.9
>>>                 
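
For completeness, here is a sketch of the three-index approach described earlier. It uses a single smartphone record copied from the example above so that it runs on its own; the name onePhoneTuple is made up for this sketch.

```python
# One record copied from allMySmartphones, wrapped in an outer tuple.
onePhoneTuple = ((1, 'Samsung Galaxy S Plus', 18500, (122.4, 64.2, 9.9)),)

# [element tuple][dimensions tuple][value inside the dimensions tuple]
length = onePhoneTuple[0][3][0]
breadth = onePhoneTuple[0][3][1]
thickness = onePhoneTuple[0][3][2]

print(length, 'x', breadth, 'x', thickness)   # 122.4 x 64.2 x 9.9
```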
            

Before we move to the next data-structure in this session, have a look at a few other ways to initialize / create tuples:


>>> # You can create a tuple without using brackets...
>>> tupleWithoutBrackets = 'Kevin', 'Annet', 'Diana'
>>> print("Here's a tuple created without brackets:", tupleWithoutBrackets)
Here's a tuple created without brackets: ('Kevin', 'Annet', 'Diana')
>>>
>>> # And here's a tuple that contains a single element...
>>> singleElementTupleWithoutBrackets = 'Kevin', 
>>> singleElementTuplesWithBrackets = ('Annet', )
>>> 
>>> # You can also concatenate tuples to form a new tuple...
>>> annetAndKevin = singleElementTuplesWithBrackets + singleElementTupleWithoutBrackets
>>> print('Concatenated tuple:', annetAndKevin)
Concatenated tuple: ('Annet', 'Kevin')
>>> 
>>> # And, you can turn strings into tuples...
>>> myNameTuple = tuple('Kevin Sequeira')
>>> print('My name as a tuple:', myNameTuple)
My name as a tuple: ('K', 'e', 'v', 'i', 'n', ' ', 'S', 'e', 'q', 'u', 'e', 'i', 'r', 'a')
>>>       
            

Lists, the next data-structure we're going to explore in this section, are very much like tuples, except in one regard: they are mutable, while tuples are immutable. And also, lists are written inside square brackets [ ], while tuples are written inside parentheses ( ).

Just like tuples, lists can be a mixture of various data-types, containing both numerical and textual data, and also other data structures within them. But before we dive into all that, let's create a simple tuple and a simple list and compare them:


>>> # A tuple is created using parentheses '( )'
>>> tupleOfPokemon = ('Charizard', 'Squirtle', 'Bulbasaur', 'Pikachu', 'Psyduck', 'Togepi')
>>>
>>> # A list is created using square brackets '[ ]'
>>> listOfPokemon = ['Charizard', 'Squirtle', 'Bulbasaur', 'Pikachu', 'Psyduck', 'Togepi']
>>>                 
            

Lists store data sequentially with index values, just like tuples. This means that the data inside lists can be accessed in the same manner as with tuples, which we saw earlier. Yet, lists are special because they can be altered, i.e. the data can be added to a list, or deleted from a list. Data at specific index locations can also be updated. This makes lists extremely powerful and dynamic data-structures.

Data manipulation in lists can be done using special list-specific functions: append() and pop() are two extremely useful functions. Have a look:


>>> # Add elements to the end of a list using 'append()'
>>> id(listOfPokemon)
2187180204616
>>> 
>>> listOfPokemon.append('Clefairy')
>>> print('List of Pokemon:', listOfPokemon)
List of Pokemon: ['Charizard', 'Squirtle', 'Bulbasaur', 'Pikachu', 'Psyduck', 'Togepi', 'Clefairy']
>>> id(listOfPokemon)
2187180204616
>>> 
>>> # Remove the last element from a list and display the element using 'pop()'
>>> listOfPokemon.pop()
'Clefairy'
>>> print('List of Pokemon:', listOfPokemon)
List of Pokemon: ['Charizard', 'Squirtle', 'Bulbasaur', 'Pikachu', 'Psyduck', 'Togepi']
>>> id(listOfPokemon)
2187180204616
>>>               
            

In the above example, we added a new element 'Clefairy' to the list using the .append() function. This element gets added as the last element in the list. The .pop() function removes the last element from the list and returns it, which is why the shell displays 'Clefairy'. We see that after both operations, the id assigned to the list object is unchanged: the list was modified in place, which is what it means for lists to be mutable data-structures.

Try adding an element to a tuple. You cannot use functions like .append() so here's how you can do it:


>>> # Create a new tuple using the existing tuple and the new element as a tuple
>>> id(tupleOfPokemon)
2187180530704
>>> 
>>> # Add a new element as a 'single tuple' to the old tuple
>>> tupleOfPokemon = tupleOfPokemon + ('Clefairy', )
>>> print('Tuple of Pokemon:', tupleOfPokemon)
Tuple of Pokemon: ('Charizard', 'Squirtle', 'Bulbasaur', 'Pikachu', 'Psyduck', 'Togepi', 'Clefairy')
>>> id(tupleOfPokemon)
2187180297576
            

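Adding the element ('Clefairy', ) does not modify the original tuple. The + operator builds a new tuple object, and the name tupleOfPokemon is then pointed at that new object, which is why the id changes. This demonstrates that tuples are immutable data-structures.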
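
Another way to see the difference, sketched below with two short made-up sequences, is to try assigning to an index position. The list accepts the change, while the tuple raises a TypeError.

```python
sampleList = ['Charizard', 'Squirtle']
sampleTuple = ('Charizard', 'Squirtle')

sampleList[1] = 'Pikachu'          # works: lists are mutable

try:
    sampleTuple[1] = 'Pikachu'     # fails: tuples are immutable
except TypeError as error:
    errorMessage = str(error)
    print('TypeError:', errorMessage)

print(sampleList)    # ['Charizard', 'Pikachu']
print(sampleTuple)   # ('Charizard', 'Squirtle')
```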

Here are some more data-manipulations that you can carry out on your lists:


>>> # Change an element at a given index position...
>>> print('List of Pokemon:', listOfPokemon)
List of Pokemon: ['Charizard', 'Squirtle', 'Bulbasaur', 'Pikachu', 'Psyduck', 'Togepi']
>>> listOfPokemon[4] = 'Pigeot'
>>> print('List of Pokemon:', listOfPokemon)
List of Pokemon: ['Charizard', 'Squirtle', 'Bulbasaur', 'Pikachu', 'Pigeot', 'Togepi']
>>> 
>>> # Remove the first instance of any value in the list...
>>> listOfPokemon.append('Squirtle')
>>> print('List of Pokemon:', listOfPokemon)
List of Pokemon: ['Charizard', 'Squirtle', 'Bulbasaur', 'Pikachu', 'Pigeot', 'Togepi', 'Squirtle']
>>> listOfPokemon.remove('Squirtle')
>>> print('List of Pokemon:', listOfPokemon)
List of Pokemon: ['Charizard', 'Bulbasaur', 'Pikachu', 'Pigeot', 'Togepi', 'Squirtle']
>>>          
            

Simply using the index position, we changed the element 'Psyduck' at index position 4 to 'Pigeot'. And then, we demonstrated how to remove an element from the list. Notice we added the element 'Squirtle', giving us a list with repeated elements: The element 'Squirtle' appears at index positions 1 and 6. We then used the .remove() function and passed the string 'Squirtle' in it. We see that the element at index position 1 is removed, while the 'Squirtle' at position 6 is retained.

List manipulation functions are case-sensitive: if the element you want to remove is a string, the string passed into the .remove() function must match the case of each letter of that element. Here's what would happen if you passed a string to the .remove() function and it did not match any of the elements in the list:


>>> listOfPokemon.remove('squirtle')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list
>>>                 
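
If you are not sure whether a value is present, one option (a sketch, not the only approach) is to test for membership with the in operator before calling .remove(). The list below is a shortened, made-up copy so that the example runs on its own.

```python
sampleList = ['Charizard', 'Bulbasaur', 'Squirtle']

for name in ('squirtle', 'Squirtle'):
    if name in sampleList:
        sampleList.remove(name)
        print('Removed:', name)
    else:
        print('Not in list:', name)

print(sampleList)   # ['Charizard', 'Bulbasaur']
```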
            

You can also find the index position of the first occurrence of an element in a list, using the .index() function. It also works for tuples. Try it out:


>>> # Get the index position of the element 'Pigeot' in the list...
>>> listOfPokemon.index('Pigeot')
3
>>> 
>>> # Get the index position of the element 'Bulbasaur' in the tuple...
>>> tupleOfPokemon.index('Bulbasaur')
2
>>>               
            

So you've now learnt all the basic stuff that you can do with Python lists. Let's now look at lists that contain elements of different data-types and data-structures. Consider a list that contains details of a student, such as first name, last name, age, Year-Division, list of subjects, etc.


>>> # Let's create a list 'studentDetails001' using other variables, tuples and lists
>>> # A tuple containing the student's first and last name...
>>> fullNameTuple = ('Kevin', 'Sequeira')
>>> 
>>> # A variable containing the student's age...
>>> studentAge = 26
>>> 
>>> # A variable containing the student's year-division
>>> studentClass = 'Year 2-Full Time'
>>> 
>>> # A tuple for the program name and code...
>>> programCode = ('Master of Data Science', 'LMDS')
>>> 
>>> # A list of lists, also known as 'nested-list', containing the student's selected subjects and his or her grades...
>>> subjectDetails = [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'C'], ['Big Data Basics', 'C'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'C']]
>>> 
>>> # We'll now use these variables, tuple and list to create a new list...
>>> studentDetails001 = [fullNameTuple, studentAge, studentClass, programCode, subjectDetails]
>>> print('Student Details 001: ' + '\n' + str(studentDetails001))
Student Details 001:
[('Kevin', 'Sequeira'), 26, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'C'], ['Big Data Basics', 'C'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'C']]]
>>> 
            

Let's say we create a few more lists in this way. I'm going to skip the details and just print the lists for you:


>>> # Here's student details for Divyesh...
>>> print('Student Details 002: ' + '\n' + str(studentDetails002))  
Student Details 002:
[('Divyeshkumar', 'Lad'), 25, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'C'], ['Big Data Basics', 'C'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'D']]]
>>> 
>>> # Here's Bipin's details...
>>> print('Student Details 003: ' + '\n' + str(studentDetails003))  
Student Details 003:
[('Bipin', 'Karki'), 25, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'D'], ['Big Data Basics', 'D'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'D']]]
>>> 
>>> # And finally, Abhinava's...
>>> print('Student Details 004: ' + '\n' + str(studentDetails004))
Student Details 004:
[('Abhinava', 'Barthakur'), 24, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'HD'], ['Big Data Basics', 'HD'], ['Unsupervised Methods in Analytics', 'HD'], ['Research Methods', 'HD'], ['Predictive Analytics', 'HD'], ['Data Visualisation', 'HD']]]
>>>   
            

Can we store all of these student details in a list to create some form of Student Database? Yes Siree, we can...


>>> # We create a list called 'studentDetails' containing 4 lists...
>>> studentDetails = [studentDetails001, studentDetails002, studentDetails003, studentDetails004]
>>> 
>>> # Here's the database, consisting of four similarly structured lists with details of different students...
>>> print(studentDetails)
[[('Kevin', 'Sequeira'), 26, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'C'], ['Big Data Basics', 'C'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'C']]], [('Divyeshkumar', 'Lad'), 25, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'C'], ['Big Data Basics', 'C'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'D']]], [('Bipin', 'Karki'), 25, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'D'], ['Big Data Basics', 'D'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'D']]], [('Abhinava', 'Barthakur'), 24, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'HD'], ['Big Data Basics', 'HD'], ['Unsupervised Methods in Analytics', 'HD'], ['Research Methods', 'HD'], ['Predictive Analytics', 'HD'], ['Data Visualisation', 'HD']]]]
>>> 
>>> # Let's run a 'for' loop on this list and print the details of each student one at a time:
>>> for studentList in studentDetails:
...     print(studentList, '\n')
...
[('Kevin', 'Sequeira'), 26, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'C'], ['Big Data Basics', 'C'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'C']]]

[('Divyeshkumar', 'Lad'), 25, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'C'], ['Big Data Basics', 'C'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'D']]]

[('Bipin', 'Karki'), 25, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'D'], ['Big Data Basics', 'D'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'D']]]

[('Abhinava', 'Barthakur'), 24, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'HD'], ['Big Data Basics', 'HD'], ['Unsupervised Methods in Analytics', 'HD'], ['Research Methods', 'HD'], ['Predictive Analytics', 'HD'], ['Data Visualisation', 'HD']]]

>>> 
            

Cool. As we just demonstrated, a list is an iterable data-structure, meaning that you can iterate through the elements of a list (or a tuple) using loops. In this case, we iterated through the list studentDetails which consisted of four elements: studentDetails001, studentDetails002, studentDetails003 and studentDetails004.

So how about we use the 'for' loop to print the names of all students in the studentDetails list? In each of the individual lists, the student names are packed within a tuple. This tuple appears in index position 0 inside each individual list. We'll need to use multi-level indexing:


>>> # Let's first get the total number of iterations that we'll need for the loop...
>>> numberOfStudents = len(studentDetails)
>>> 
>>> # Now, let's iterate through the list 'studentDetails' using a 'for' loop:
>>> for counter in range(0, numberOfStudents):
...    firstName = studentDetails[counter][0][0]
...    lastName = studentDetails[counter][0][1]
...    print('Name of Student 00' + str(counter + 1) + ': ' + firstName + ' ' + lastName)
... 
Name of Student 001: Kevin Sequeira
Name of Student 002: Divyeshkumar Lad
Name of Student 003: Bipin Karki
Name of Student 004: Abhinava Barthakur
>>> 
            

What we did in the example above was really simple. Unlike the previous example, where we iterated through the list elements themselves, in this one we found the number of elements in the list studentDetails and then iterated through the list using index values. Within each of the lists, we have a tuple that holds the first name and last name. The tuple is found at index position 0 inside each list, and the first and last name strings are found at index positions 0 and 1, respectively. That's how we carried out indexing in the above example:

[index position of list][index position of tuple within list][index position of element within tuple].
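
The same result can be reached without index counters. The sketch below uses a cut-down, made-up version of the student lists (name tuple and age only) together with the built-in enumerate() function, which yields a running count alongside each element.

```python
# Shortened stand-in for the studentDetails list, for illustration only.
sampleStudents = [
    [('Kevin', 'Sequeira'), 26],
    [('Divyeshkumar', 'Lad'), 25],
]

labels = []
for number, student in enumerate(sampleStudents, start=1):
    firstName, lastName = student[0]      # unpack the name tuple at index position 0
    label = 'Name of Student 00' + str(number) + ': ' + firstName + ' ' + lastName
    labels.append(label)
    print(label)
```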


There's so much to learn when it comes to lists and tuples that we could go on and on without end. However, let's stop for now and move on to the next data-structure: Dictionaries. We'll continue to learn about and use lists extensively as we go ahead...
4.3: The Magic of Dictionaries
Dictionaries are amazing data-structures: I like them! First, dictionaries are mutable, which means you can change the values of elements within dictionaries, without having to create a new variable. Second, dictionaries use key-value pairs for mapping data stored in them, unlike index values used in lists and tuples.

Using key-value pairs means you can map values or variables or data-structures to specific keys. Keys are unique names in dictionaries, and these are used to store, extract, and manipulate data within dictionaries.

Since dictionaries are so different from tuples and lists, let me show you a small example before we go ahead.


>>> # A simple dictionary with letters as keys, and each letter's position in the alphabet as its value...
>>> simpleDictionary = {'a': 1, 'b': 2, 'c': 3}
>>> 
>>> # Let's print this dictionary, shall we?
>>> print('Simple Dictionary:', simpleDictionary)
Simple Dictionary: {'a': 1, 'b': 2, 'c': 3}
>>> 
>>> # You can fetch data from a dictionary using the 'key'...
>>> print('Value for key "a":', simpleDictionary['a'])
Value for key "a": 1
>>> print('Value for key "b":', simpleDictionary['b'])
Value for key "b": 2
>>> print('Value for key "c":', simpleDictionary['c'])
Value for key "c": 3
>>> 
>>> # You can add a new 'key-value' pair to the dictionary...
>>> simpleDictionary['d'] = 4
>>> print('Simple Dictionary:', simpleDictionary)
Simple Dictionary: {'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> 
>>> # Adding a duplicate key will replace the previous value for the key...
>>> simpleDictionary['c'] = 5
>>> print('Simple Dictionary:', simpleDictionary)
Simple Dictionary: {'a': 1, 'b': 2, 'c': 5, 'd': 4}
>>> 
>>> # You can fetch all the keys in a dictionary using the 'keys()' function...
>>> print('Keys of Simple Dictionary:', simpleDictionary.keys())
Keys of Simple Dictionary: dict_keys(['a', 'b', 'c', 'd'])
>>> 
>>> # You can also get the values assigned to all keys using the '.values()' function...
>>> print('Values in Simple Dictionary:', simpleDictionary.values())
Values in Simple Dictionary: dict_values([1, 2, 5, 4])
>>> 
>>> # And finally, you can remove a 'key-value' pair from a dictionary...
>>> del(simpleDictionary['c'])
>>> print('Simple Dictionary:', simpleDictionary)
Simple Dictionary: {'a': 1, 'b': 2, 'd': 4}
>>> 
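
Two more things are worth knowing at this point. Asking for a missing key with square brackets raises a KeyError, while the .get() function returns a default value instead. The sketch below also walks over the key-value pairs with .items(). The dictionary is re-created here so that the example runs on its own.

```python
simpleDictionary = {'a': 1, 'b': 2, 'd': 4}

hasKeyC = 'c' in simpleDictionary            # membership test on the keys
valueForC = simpleDictionary.get('c', 0)     # 0 is the default for a missing key

print(hasKeyC)     # False
print(valueForC)   # 0

pairs = []
for key, value in simpleDictionary.items():
    pairs.append((key, value))
    print(key, '->', value)
```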
            

Dictionary key-value pairs can also contain other data-types and data-structures as values. For example, take a dictionary in which the key is the lottery draw date, and the value is a list containing the winning numbers.


>>> lotteryWins = {                             # Hit 'Enter' to go to next line
... '20-Mar-2019': [5, 22, 25, 27, 32, 33],
... '21-Mar-2019': [14, 16, 22, 26, 27, 32],
... '22-Mar-2019': [3, 9, 19, 22, 24, 25]
... }
>>> 
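
Once the dictionary exists, the list stored under a key behaves like any other list. Here is a small sketch, with the dictionary repeated so that it runs on its own (the names firstDraw and drawnOnSecondDay are made up for this example):

```python
lotteryWins = {
    '20-Mar-2019': [5, 22, 25, 27, 32, 33],
    '21-Mar-2019': [14, 16, 22, 26, 27, 32],
    '22-Mar-2019': [3, 9, 19, 22, 24, 25],
}

# The key returns the whole list; an index then picks one number from it.
firstDraw = lotteryWins['20-Mar-2019']
print('First winning number:', firstDraw[0])                 # First winning number: 5

# The 'in' operator checks whether a number was drawn on a given date.
drawnOnSecondDay = 22 in lotteryWins['21-Mar-2019']
print('Was 22 drawn on 21-Mar-2019?', drawnOnSecondDay)      # Was 22 drawn on 21-Mar-2019? True
```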
            

Now all that's fine, but here's my favorite part! You can have key-value pairs with dictionaries as values! These are called nested dictionaries, just like the lists within lists and tuples within tuples that we created earlier. Let's consider a dictionary in which the key is a student's roll number, and the value is a dictionary containing the first name and last name of the student.


>>> # For the sake of simplicity, let's consider one 'key-value' pair...
>>> studentNames = {                   # Parent dictionary begins
...    '1001': {                       # Inner dictionary for key '1001' begins
...        'First Name': 'Kevin',
...        'Middle Name': '',
...        'Last Name': 'Sequeira'
...    }                               # Inner dictionary for key '1001' ends
... }                                  # Parent dictionary ends
>>> 
>>> # Let's fetch the last name for roll number '1001', shall we?
>>> print('Last Name for student roll no. 1001:', studentNames['1001']['Last Name'])
Last Name for student roll no. 1001: Sequeira
            

You see? It's that simple. Don't let the indentation worry you. It's just there so you can follow the structure of the nested-dictionary.
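
Nested dictionaries can be updated and looped over with the same key syntax. The sketch below rebuilds the one-student dictionary and adds a second student. The roll number '1002' and the name stored under it are hypothetical values used only for illustration.

```python
studentNames = {
    '1001': {'First Name': 'Kevin', 'Middle Name': '', 'Last Name': 'Sequeira'}
}

# Add a second inner dictionary under a new key (hypothetical data).
studentNames['1002'] = {'First Name': 'Annet', 'Middle Name': '', 'Last Name': 'Example'}

# Each iteration yields a roll number and the inner dictionary stored under it.
for rollNumber, nameParts in studentNames.items():
    print(rollNumber, nameParts['First Name'], nameParts['Last Name'])
```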

Let's go back to the example of a Student Database, which we created in the previous section using lists. Say, you want to associate the data for each of the four students with a roll number, such that you can access each student's data using the roll number as a key. Here's how you'll do it by following the examples we've just seen:


>>> # Let's initialize a dictionary called 'studentDatabase'...
>>> studentDatabase = {
... '1001': [('Kevin', 'Sequeira'), 26, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'C'], ['Big Data Basics', 'C'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'C']]],
... '1002': [('Divyeshkumar', 'Lad'), 25, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'C'], ['Big Data Basics', 'C'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'D']]],
... '1003': [('Bipin', 'Karki'), 25, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'D'], ['Big Data Basics', 'D'], ['Unsupervised Methods in Analytics', 'D'], ['Research Methods', 'D'], ['Predictive Analytics', 'D'], ['Data Visualisation', 'D']]],
... '1004': [('Abhinava', 'Barthakur'), 24, 'Year 2-Full Time', ('Master of Data Science', 'LMDS'), [['Statistics for Data Science', 'HD'], ['Statistical Programming for Data Science', 'HD'], ['Probabilities and Data', 'HD'], ['Big Data Basics', 'HD'], ['Unsupervised Methods in Analytics', 'HD'], ['Research Methods', 'HD'], ['Predictive Analytics', 'HD'], ['Data Visualisation', 'HD']]]
... }
>>>          
            

Looks real messy, doesn't it? Let's say you want to extract the age of the student with roll number '1003'. You see that within the list attached to key '1003', the age is found as a single element in index position '1'. So here's what you'd need to do:


>>> print('Age of student roll no. 1003:', studentDatabase['1003'][1])
Age of student roll no. 1003: 25
>>>                 
            

How about extracting the Masters' Program Code for student with roll number '1002'? The code is found at index position '1' within the tuple at index position '3' of the list... Urgh. It keeps getting messier...


>>> print('Program Code that student roll no. 1002 is enrolled in:', studentDatabase['1002'][3][1])
Program Code that student roll no. 1002 is enrolled in: LMDS
            

Whew! This is wicked. You can't always expect people to remember so many index position numbers when maintaining a student database. Perhaps restructuring the database into a Nested Dictionary might help. Let's reinitialize the dictionary then:


>>> # Let's start with only filling in details for one student, roll no. 1001...
>>> 
>>> studentDetails = {             # Parent dictionary begins
...    '1001': {                   # Dictionary for key '1001' begins
...        'name': {'first name': 'Kevin', 'last name': 'Sequeira'},
...        'age': 26,
...        'program': {'program name': 'Master of Data Science', 'program code': 'LMDS'},
...        'duration': {'years': 2, 'mode': 'full-time'},
...        'courses': {            # Inner-dictionary inside key '1001' for 'courses' begins
...            'MATH-4044': {'course name': 'Statistics for Data Science', 'grade': 'HD'},
...            'COMP-5070': {'course name': 'Statistical Programming for Data Science', 'grade': 'HD'},
...            'MATH-4073': {'course name': 'Probabilities and Data', 'grade': 'C'},
...            'INFS-5095': {'course name': 'Big Data Basics', 'grade': 'C'},
...            'INFS-5102': {'course name': 'Unsupervised Methods in Analytics', 'grade': 'D'},
...            'INFT-4017': {'course name': 'Research Methods', 'grade': 'D'},
...            'INFS-5100': {'course name': 'Predictive Analytics', 'grade': 'D'},
...            'INFS-5116': {'course name': 'Data Visualisation', 'grade': 'C'}
...        }                       # Inner-dictionary for key '1001' for 'courses' ends
...    }                           # Dictionary for key '1001' ends
... }                              # Parent dictionary ends
>>> 
            

If you look closely now, you'll see that the nested-dictionary provides a definitive structure to how data is stored inside it. So now, we are no longer depending on index values, but we can use well defined keys to fetch our data. You can now explore the different keys inside each level of the nested-dictionary.


>>> print('Keys inside "studentDetails":', studentDetails.keys())
Keys inside "studentDetails": dict_keys(['1001'])
>>> 
>>> print('Keys inside "1001":', studentDetails['1001'].keys())
Keys inside "1001": dict_keys(['name', 'age', 'program', 'duration', 'courses'])
>>> 
>>> print('Keys inside "name":', studentDetails['1001']['name'].keys())
Keys inside "name": dict_keys(['first name', 'last name'])
>>> 
>>> print('Keys inside "program":', studentDetails['1001']['program'].keys())
Keys inside "program": dict_keys(['program name', 'program code'])
>>> 
>>> print('Keys inside "courses":', studentDetails['1001']['courses'].keys())
Keys inside "courses": dict_keys(['MATH-4044', 'COMP-5070', 'MATH-4073', 'INFS-5095', 'INFS-5102', 'INFT-4017', 'INFS-5100', 'INFS-5116'])
>>> 
            

Now, if you wanted to fetch 'program' information for student roll no '1001', you simply could...


>>> # Let's use a 'for' loop and make this interesting, shall we?
>>> for keyName in studentDetails['1001']['program'].keys():
...    print(keyName + ': ' + studentDetails['1001']['program'][keyName])
... 
program name: Master of Data Science
program code: LMDS
>>>                 
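Here's a small sketch of the same idea applied to the 'courses' level. It assumes a trimmed-down, hypothetical copy of studentDetails (only two courses are kept so the example stays short), and the names gradeReport and missingStudent are made up for illustration. It uses .items() to walk through keys and values together, and .get() to ask for a key that might not exist.

```python
# Hypothetical, trimmed-down copy of the 'studentDetails' dictionary
# (only two courses kept so the example stays short).
studentDetails = {
    '1001': {
        'name': {'first name': 'Kevin', 'last name': 'Sequeira'},
        'courses': {
            'MATH-4044': {'course name': 'Statistics for Data Science', 'grade': 'HD'},
            'MATH-4073': {'course name': 'Probabilities and Data', 'grade': 'C'},
        },
    },
}

# .items() yields (key, value) pairs, so no second lookup is needed.
gradeReport = []
for courseCode, courseInfo in studentDetails['1001']['courses'].items():
    gradeReport.append(courseCode + ': ' + courseInfo['grade'])

print('Grades for roll no. 1001:', gradeReport)

# .get() returns a default value instead of raising KeyError for a missing key.
missingStudent = studentDetails.get('9999', 'No such student')
print('Lookup for roll no. 9999:', missingStudent)
```

Using .get() with a default is handy when you are not sure whether a roll number has been entered into the database yet.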
            

You can now see how simple dictionaries make it to store data. So you might think, "Heck, it's smart to use dictionaries all the time!" But that's not really the case. At least not with Python. You see, in programming, the point of using different data-structures is not only a matter of storing data in elaborate and reproducible structures, but it also comes down to speed of execution.

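As a rule of thumb, tuples are slightly faster than lists, and lists are lighter than dictionaries for simple sequential access, but dictionaries win when it comes to lookups. Which one is fastest depends on your use case.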

When it comes to simple queueing and indexing, tuples are slightly faster than lists. Because tuples are immutable, their size never changes, so (in CPython, the standard Python interpreter) a tuple can be stored in a single block of memory. This makes tuples a little quicker to create and loop through.

Lists are slightly slower than tuples. This is because lists, unlike tuples, are stored in two blocks of memory. The first block stores information about the list object. The second block stores references to the elements inside the list, and can be resized as the list grows or shrinks. This allows lists to be mutable, but it slows down execution time a little.
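You can get a rough feel for this difference with sys.getsizeof(), which reports the size of an object in bytes. This is only a sketch: the variable names are made up, the exact numbers depend on your Python version and platform, and the comparison assumes CPython.

```python
import sys

# The same five elements, stored once as a tuple and once as a list.
sampleTuple = (1, 2, 3, 4, 5)
sampleList = [1, 2, 3, 4, 5]

# getsizeof() measures the container itself, not the integers it refers to.
tupleSize = sys.getsizeof(sampleTuple)
listSize = sys.getsizeof(sampleList)

print('Tuple size in bytes:', tupleSize)
print('List size in bytes:', listSize)
```

On CPython the tuple should come out smaller, since the list carries extra bookkeeping so that it can grow and shrink.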

When it comes to looking up elements in a data-structure, dictionaries are faster than lists. This is because of Time Complexity. The time complexity for searching for an element in a list of length 'n' is O(n). However, for a dictionary, the time complexity is O(1) on average. The reason for this is that in order to search for an element in a list, Python may have to go through all 'n' elements in the worst case. However, dictionaries store data in key-value pairs, where the keys are stored using hash tables. This allows Python to look up just the right key and present its value instead of iterating through all elements like in lists.
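If you'd like to see the lookup difference for yourself, here's a minimal sketch using the standard timeit module. The names numberList, numberDict and target are made up for illustration, and the timings you get will vary from machine to machine, so treat the numbers as a rough indication only.

```python
import timeit

# 100,000 numbers stored as list elements and as dictionary keys.
numberList = list(range(100000))
numberDict = dict.fromkeys(numberList)

# Worst case for the list: the element we want is the very last one.
target = 99999

# Run each membership test 100 times and record the total time in seconds.
listSeconds = timeit.timeit(lambda: target in numberList, number=100)
dictSeconds = timeit.timeit(lambda: target in numberDict, number=100)

print('List lookup (100 runs):', listSeconds)
print('Dict lookup (100 runs):', dictSeconds)
```

The list has to be scanned element by element, while the dictionary jumps to the key via its hash, so the second timing is typically far smaller.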

>>> NOTE: For more information on time complexity for different data-structure operations, visit this link.

>>> NOTE: For more information on hashing or hash tables, visit this link.

Since this chapter is running way too long, I am going to prepare a separate post where we will build a basic student database using dictionaries and JSON files. Here's the link. It'll be fun. In fact we'll work on more than one example. But for now, let's move on to the next data-structure.
4.4: Sets and Frozen Sets
Sets and Frozen Sets are interesting data-structures. In sets, each element is its own key and its own value. They do not support indexing like lists; instead, they rely on the hashability of their elements. This means that sets can only store unique (and hashable) elements.

Sets are mutable data-structures. While you cannot add duplicate elements (adding an element that is already present simply has no effect), you can add and remove elements from a set, and carry out set operations such as union and intersection.

Frozen Sets, as you might have guessed already, are immutable data-structures. This means that once a frozen set is initialized, it cannot be altered in any way, just like a tuple.

Let's explore sets:


>>> # Let's start with sets of odd and even numbers...
>>> oddNumbers = set([9, 5, 7, 1, 3])
>>> print('Set of Odd Numbers:', oddNumbers)
Set of Odd Numbers: {1, 3, 5, 7, 9}
>>> 
>>> evenNumbers = set([6, 2, 0, 4, 8, 2])
>>> print('Set of Even Numbers:', evenNumbers)
Set of Even Numbers: {0, 2, 4, 6, 8}
>>> 
            

If you observe the examples, you'll see that sets do not keep the order in which you initialized them, as seen for the set oddNumbers. Sets are unordered: in this case we're in luck as the set is displayed in ascending order, but you can't rely on that. Also, you can see that for the set evenNumbers, the duplicate element 2 was dropped. This is because elements in sets are hashed, so each value can be stored only once.
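A common use of this behaviour is removing duplicates from a list. Here's a small sketch with a made-up list of roll numbers. Since a set gives no ordering guarantee, the example passes the set to sorted() to get a predictable result.

```python
# A hypothetical list of roll numbers with a few repeats.
rollNumbers = [1003, 1001, 1004, 1001, 1002, 1003]

# Converting to a set drops the duplicates (but does not guarantee any order).
uniqueRollNumbers = set(rollNumbers)

# sorted() returns a new list in ascending order.
sortedRollNumbers = sorted(uniqueRollNumbers)

print('Unique roll numbers:', sortedRollNumbers)   # [1001, 1002, 1003, 1004]
```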

How about an example of a set that contains both letters and numeric elements?


>>> alphaNumericSet = set([3, 'b', 'A', 1, 'c', 'B', 2, 'C', 'a'])
>>> print('Set with Alpha Numeric Elements:', alphaNumericSet)
Set with Alpha Numeric Elements: {1, 2, 3, 'b', 'B', 'c', 'C', 'a', 'A'}
>>> 
            

You know, you can add to and remove from sets using two set-specific methods, add() and remove(). Check these out:


>>> # Let's add an impurity to the set of even numbers...
>>> evenNumbers.add(3)
>>> print('Set of Even Numbers:', evenNumbers)
Set of Even Numbers: {0, 2, 3, 4, 6, 8}
>>> 
>>> # Urgh. I hate it. Let's take it out...
>>> evenNumbers.remove(3)
>>> print('Set of Even Numbers:', evenNumbers)
Set of Even Numbers: {0, 2, 4, 6, 8}
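One thing to watch out for: remove() raises a KeyError if the element isn't in the set. Sets also offer discard(), which quietly does nothing in that case. Here's a short sketch (the variable removeOutcome is just for illustration):

```python
evenNumbers = {0, 2, 4, 6, 8}

# remove() complains when the element is not in the set...
try:
    evenNumbers.remove(3)
    removeOutcome = 'removed'
except KeyError:
    removeOutcome = 'KeyError'

print('Outcome of remove(3):', removeOutcome)   # KeyError

# ...whereas discard() silently does nothing.
evenNumbers.discard(3)
print('Set of Even Numbers:', sorted(evenNumbers))
```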
            

You can combine two sets using the | operator (the same symbol as bitwise OR). Mathematically, that is the union of two sets.


>>> # Let's combine the sets for even and odd numbers...
>>> setsUnion = oddNumbers | evenNumbers
>>> print('Union of Sets of Even and Odd Numbers:', setsUnion)
Union of Sets of Even and Odd Numbers: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>>                
            

You can extract the common elements of two sets using the & operator (the same symbol as bitwise AND). Mathematically, that is the intersection of two sets.


>>> # Let's find the intersection of sets for even and odd numbers...
>>> setsIntersection = oddNumbers & evenNumbers
>>> print('Intersection of Sets of Even and Odd Numbers:', setsIntersection)
Intersection of Sets of Even and Odd Numbers: set()
>>>                
            

Well, that gives you an empty set because, obviously, numbers can't be both even and odd! But I'd like for you to try a few examples of your own.
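To get you started, here's one sketch where the intersection is not empty: multiples of 2 and multiples of 3 below 20. The set names are made up. The example also uses the - operator (set difference), which isn't covered above; it keeps the elements of the first set that are not in the second.

```python
multiplesOfTwo = set(range(0, 20, 2))     # 0, 2, 4, ..., 18
multiplesOfThree = set(range(0, 20, 3))   # 0, 3, 6, ..., 18

# Elements present in both sets (the multiples of 6).
commonMultiples = multiplesOfTwo & multiplesOfThree
print('In both sets:', sorted(commonMultiples))          # [0, 6, 12, 18]

# Elements of the first set that are NOT in the second set.
onlyMultiplesOfTwo = multiplesOfTwo - multiplesOfThree
print('Only multiples of 2:', sorted(onlyMultiplesOfTwo))  # [2, 4, 8, 10, 14, 16]
```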

Alright, now that we understand what Sets are, let's move on to Frozen Sets. Well, as we have discussed earlier, frozen sets are simply sets that are immutable. Have a look...


>>> # Do you like the Justice League?
>>> theJusticeLeague = frozenset(['Diana', 'Clark', 'Bruce', 'Barry', 'Hal', 'Victor', 'Arthur'])
>>> 
>>> # Looks like Billy Batson wants to join the gang...
>>> theJusticeLeague.add('Billy')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
>>>              
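So how does Billy get in? You can't change a frozen set, but you can build a new one from it. Here's a sketch using a shortened roster. The names expandedLeague and teamLabels, and the label string, are made up for illustration. It also shows a perk of immutability: a frozen set is hashable, so it can be used as a dictionary key, which a regular set cannot.

```python
theJusticeLeague = frozenset(['Diana', 'Clark', 'Bruce'])

# The | operator builds a NEW frozen set; the original is left untouched.
expandedLeague = theJusticeLeague | {'Billy'}
print('Original roster size:', len(theJusticeLeague))   # 3
print('Expanded roster size:', len(expandedLeague))     # 4

# Frozen sets are hashable, so they can serve as dictionary keys.
teamLabels = {theJusticeLeague: 'founding members'}
print('Label:', teamLabels[theJusticeLeague])
```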
            
4.5: Making Loops Shorter: Comprehensions
Alright, let's tackle the obvious question first: What are Comprehensions in Python? Answer: Comprehensions are a way to quickly and elegantly initialize sequence or collection data-structures in Python. They help you reduce the number of lines of code, and can also save on runtime and memory when possible.

Comprehensions can be used to initialize different sequence or collection objects, like tuples, lists, dictionaries, sets, etc.

Let's start with list comprehensions. Let's say you wanted to create a list that holds the squares of integers between 1 and 20 that are divisible by 3. Here's how you'd do it using regular 'for' loops and 'if' statements:


>>> squaresList001 = []
>>> for number in range(1, 21):                # Range from 1 to 21 - 1...
...    if (number % 3) == 0:                   # If 'number' divisible by 3 returns a remainder of '0'...
...        squaresList001.append(number ** 2)
... 
>>> print("List of Squares of multiples of 3 below 20:", squaresList001)
List of Squares of multiples of 3 below 20: [9, 36, 81, 144, 225, 324]
>>> 
            

Whew! That's 4 whole lines of code. Now what if I told you it was possible to do all of that in a single line? Check it out:


>>> # If this doesn't blow your mind, you're probably a boring person :P                
>>> squaresList002 = [number ** 2 for number in range(1, 21) if (number % 3) == 0]
>>> print("List of Squares of multiples of 3 below 20:", squaresList002)
List of Squares of multiples of 3 below 20: [9, 36, 81, 144, 225, 324]
>>> 
            

Comprehensions make your code shorter and smarter (while making you sexier).

Now, what if I asked you to create a list that holds tuples of all combinations of three six-sided dice thrown, such that none of the combinations are repeated? For example, (1, 1, 2) mustn't be repeated as (1, 2, 1) or (2, 1, 1). Let's try doing that with 'for' loops first:


>>> threeDiceComboList001 = []
>>> for dice01 in range(1, 7):
...     for dice02 in range(dice01, 7):
...         for dice03 in range(dice02, 7):
...             threeDiceComboList001.append((dice01, dice02, dice03))
... 
>>> # Print the first ten combinations...
>>> print('First ten combinations of three dice:', threeDiceComboList001[0:10])
First ten combinations of three dice: [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 1, 5), (1, 1, 6), (1, 2, 2), (1, 2, 3), (1, 2, 4), (1, 2, 5)]
>>> 
            

Too many 'nested-for loops', don't you think? Now, witness the magic of comprehensions...

     
>>> threeDiceComboList002 = [(dice01, dice02, dice03) for dice01 in range(1, 7) for dice02 in range(dice01, 7) for dice03 in range(dice02, 7)]
>>> print('First ten combinations of three dice:', threeDiceComboList002[0:10])
First ten combinations of three dice: [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 1, 5), (1, 1, 6), (1, 2, 2), (1, 2, 3), (1, 2, 4), (1, 2, 5)]
>>> 
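As an aside, the standard library can build the same list for you. This sketch swaps the nested loops for itertools.combinations_with_replacement, which is a different technique from the comprehension above, and checks that both give the same result. The names diceFaces, threeDiceComboList003 and nestedVersion are made up for illustration.

```python
from itertools import combinations_with_replacement

diceFaces = range(1, 7)   # faces 1 to 6

# All non-decreasing triples of dice faces, e.g. (1, 1, 2) but never (1, 2, 1).
threeDiceComboList003 = list(combinations_with_replacement(diceFaces, 3))

# The same list built with the nested comprehension, for comparison.
nestedVersion = [(dice01, dice02, dice03)
                 for dice01 in range(1, 7)
                 for dice02 in range(dice01, 7)
                 for dice03 in range(dice02, 7)]

print('Number of combinations:', len(threeDiceComboList003))       # 56
print('Same as nested version:', threeDiceComboList003 == nestedVersion)
```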
            

Well, cool, isn't it? Now, let's turn our attention to dictionary comprehensions. Here's an interesting example: How would you create a dictionary that stores all the unique characters of the word 'ANTITRANSUBSTANTIATIONALIST' as keys and the count of their occurrence as values? Let's try it with a 'for' loop and 'if' statement first...


>>> characterCounter001 = {}
>>> for character in 'ANTITRANSUBSTANTIATIONALIST':
...     if character in characterCounter001.keys():
...          characterCounter001[character] = characterCounter001[character] + 1
...     else:
...          characterCounter001[character] = 1
... 
>>> print('Count of characters in "ANTITRANSUBSTANTIATIONALIST":', characterCounter001)
Count of characters in "ANTITRANSUBSTANTIATIONALIST": {'A': 5, 'N': 4, 'T': 6, 'I': 4, 'R': 1, 'S': 3, 'U': 1, 'B': 1, 'O': 1, 'L': 1}
>>> 
            

And now, let's do it the sexier way...


>>> characterCounter002 = {character: len([position for position, letter in enumerate('ANTITRANSUBSTANTIATIONALIST') if letter == character]) for character in 'ANTITRANSUBSTANTIATIONALIST'}
>>> print('Count of characters in "ANTITRANSUBSTANTIATIONALIST":', characterCounter002)
Count of characters in "ANTITRANSUBSTANTIATIONALIST": {'A': 5, 'N': 4, 'T': 6, 'I': 4, 'R': 1, 'S': 3, 'U': 1, 'B': 1, 'O': 1, 'L': 1}
>>>          
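That one-liner works, but it re-scans the whole word for every character. Here are two shorter alternatives, offered as a sketch rather than the "right" way: a dictionary comprehension built on str.count(), and collections.Counter from the standard library. The names word, characterCounter003 and characterCounter004 are made up for illustration.

```python
from collections import Counter

word = 'ANTITRANSUBSTANTIATIONALIST'

# str.count() returns how many times a character occurs in the string.
# Repeated characters simply overwrite their own key with the same count.
characterCounter003 = {character: word.count(character) for character in word}

# Counter does the counting in a single pass over the word.
characterCounter004 = dict(Counter(word))

print('Using str.count():', characterCounter003)
print('Using Counter:', characterCounter004)
```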
            

How about we create a tuple of all 3-bit binary numbers using comprehensions?


>>> threeBitTuple = tuple(str(highestBit) + str(middleBit) + str(lowestBit) for highestBit in range(0, 2) for middleBit in range(0, 2) for lowestBit in range(0, 2)) 
>>> print('All three bit numbers:', threeBitTuple) 
All three bit numbers: ('000', '001', '010', '011', '100', '101', '110', '111')
>>>               
            

If you're paying attention, you'll see that when declaring a tuple comprehension I explicitly called tuple(), unlike when I declared list or dictionary comprehensions. This is because merely using parentheses ( ) around a comprehension creates a generator object, instead of a tuple. We'll be looking at generators in the next chapter, so don't worry about it.
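Here's a quick sketch that shows the difference, along with a set comprehension (curly braces, no key-value pairs) for completeness. All variable names are made up for illustration.

```python
# Parentheses alone give a generator object, not a tuple.
bitGenerator = (str(bit) for bit in range(0, 2))
generatorTypeName = type(bitGenerator).__name__
print('Type with parentheses only:', generatorTypeName)   # generator

# Wrapping the generator in tuple() consumes it and builds the tuple.
bitTuple = tuple(bitGenerator)
print('After calling tuple():', bitTuple)                 # ('0', '1')

# A generator can be consumed only once; a second pass yields nothing.
leftoverTuple = tuple(bitGenerator)
print('Second pass:', leftoverTuple)                      # ()

# Set comprehension: the remainders of 1..20 divided by 3, duplicates dropped.
uniqueRemainders = {number % 3 for number in range(1, 21)}
print('Unique remainders:', sorted(uniqueRemainders))     # [0, 1, 2]
```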

And whew! That's all for this chapter. It was a long one, so thanks for staying with me! See ya in chapter 5.