Fundamentals of DataScience Week 3 - Object Mapping
PadhAIdata_objects_mappings_03

drawing

by One Fourth Labs

For more reading or to go indepth refer official reference of python language https://docs.python.org/3/reference/

Also refer to the Padhai-One Github repo for updates & other notebooks

Dictionary

Dictionaries are mutable data objects that has mapping between the keys and values. Keys are similar to set elements that would be hashed and stored. Each key will a have a value (object) associated with it. Dictionary do not preserve ordering.

Keys has to be immutable objects like int, float, string, tuple; whereas values can be any mutable or immutable object like even lists. This is because only keys are hashed, not values.

Creating Dictionary

List can be created in 2 ways

  1. using {} curly brackets
  2. using dict() function, which takes any sequence type or iterable objects (like other list, tuple, set) whose elements are objects with pair of elements
  3. using fromkeys() method for initializing dictionary

A dictioanry element has two parts represented as {key : value}. Key and associated value should be seperated by :.
The Keys will/need_to be unique while values will/need not.

In [1]:
# empty dictionary
a = {}
b = dict()

# initalized dictionary
c = {"a":1,"b":2, "c":3, "d":4}
d = dict([ (1,"a"), (2,"b") ]) # create dict from sequence of paired elements
e = c # shallow copy from a list

# Print
print("a = ",a, "\nb = ",b, "\nc = ",c, "\nd = ",d, "\ne = ",e )
a =  {} 
b =  {} 
c =  {'a': 1, 'b': 2, 'c': 3, 'd': 4} 
d =  {1: 'a', 2: 'b'} 
e =  {'a': 1, 'b': 2, 'c': 3, 'd': 4}

As one can see a list with elements made of tuple with element pairs is converted into Dictionary.

For dict() function, it is required to have paired elements or else it will throw error. Check the below cases.

In [2]:
a = [1,2,3,4]
try:
    out = dict(a) # throw error
except Exception as error:
    print("`a` convertion TypeError:", error)

b = (1,2)
try:
    out = dict(b) # throw error
except Exception as error:
    print("`b` convertion TypeError:", error)

c = {1,2,3}
try:
    out = dict(c) # throw error
except Exception as error:
    print("`c` convertion TypeError:", error)

d = "hello World"
try:
    out = dict(d) # throw error
except Exception as error:
    print("`d` convertion TypeError:", error)
`a` convertion TypeError: cannot convert dictionary update sequence element #0 to a sequence
`b` convertion TypeError: cannot convert dictionary update sequence element #0 to a sequence
`c` convertion TypeError: cannot convert dictionary update sequence element #0 to a sequence
`d` convertion TypeError: dictionary update sequence element #0 has length 1; 2 is required

It is mandatory for data sequences and sets to have elements as clubbed pairs of objects, that would form key and corresponding value in dictionary.

The elements can be clubbed using data objects like list, tuple or set; Each nested object should have exactly two elements.

In [3]:
a = [[1,2], [2,3], [4,5]]
b = dict(a)
c = [{1,2}, [2,3], (4,5)]
d = dict(c)
e= [['hello','world'] ]
f=dict(e)

print("b =", b)
print("d =", d)
print("f =", f)


g = [ [1,2,3], [4,5,6] ]
try :
    h = dict(g) # this will throw error
except Exception as error:
    print("Error at g:", error)
b = {1: 2, 2: 3, 4: 5}
d = {1: 2, 2: 3, 4: 5}
f = {'hello': 'world'}
Error at g: dictionary update sequence element #0 has length 3; 2 is required

Dictionary keys will be hashed thus they must be immutable objects, where as values can be any mutable or immutable object

In [4]:
a = {
    (1,2): "tuple key",
    "test": "string key",
    frozenset((1,2)): "Frozen set key",
    1:      "int key",
    1.222:  "float key",
    1+2j:   "complex key",
} 

print("a =", a)

try:
    b = { [1,2]: "list key" }
except Exception as error:
    print("Error at b:", error)

try:
    c = { {1,2}: "list key" }
except Exception as error:
    print("Error at c:", error)
a = {(1, 2): 'tuple key', 'test': 'string key', frozenset({1, 2}): 'Frozen set key', 1: 'int key', 1.222: 'float key', (1+2j): 'complex key'}
Error at b: unhashable type: 'list'
Error at c: unhashable type: 'set'
In [5]:
a = {
    "tuple": (1,2),
    "string": "test",
    "frozenset": frozenset((1,2)),
    "int": 1,
    "float": 1.222,
    "complex number": 1+2j,
    "list": [1,2,3],
    "set": {1,2,3},
} 

print("a =", a)
a = {'tuple': (1, 2), 'string': 'test', 'frozenset': frozenset({1, 2}), 'int': 1, 'float': 1.222, 'complex number': (1+2j), 'list': [1, 2, 3], 'set': {1, 2, 3}}

As the keys of dictionaries are hashed there will not be duplicated keys in a dict. The value of key will be updated to the latest instance of key's value.

In [6]:
z = {
    "a": 1, "b": 1, 
    "a": 2, "c": 2,
    "a": 3, "d": 3,
}

print("z =", z)
z = {'a': 3, 'b': 1, 'c': 2, 'd': 3}

One can see that the last occurence of key ahad value as 3 which is retained. Also one can see that both the value of a and c are same(3), thus values can have duplicate objects.


There could be cases where one may need to have just the keys in dict and values would be created at a future point.

But a dictionary key cannot exist without a value, in such cases one can initialize the value to None

In [7]:
a = {"a": None, "b":None, "c": None}
print(a)
{'a': None, 'b': None, 'c': None}

Or there could be case where we may want to initialize all the values tp some constant values or empty objects and would modify later point in code.

This pattern is frequently used in may applications, so there is dedicated function in dictionary called dict.fromkeys().
This two arguments, firstly a sequence data or set as argument and secondly optional value to which each keys will be initalized, if not given None will be taken as default.

In [8]:
a = [1,2,3,4,5,6,7]
b = dict.fromkeys(a) # values will be none
print("b =", b)

c = ["apple", "banana", "cucumber" ]
d = [1,2,3]
e = dict.fromkeys(c,d)
print("e =", e)
b = {1: None, 2: None, 3: None, 4: None, 5: None, 6: None, 7: None}
e = {'apple': [1, 2, 3], 'banana': [1, 2, 3], 'cucumber': [1, 2, 3]}

Note that function did not iterate over values argument instead it took entire list and made it as value for all the keys




Accessing elements

Dictionary elements cannot be indexed like list or tuple with numbers. Each value has to be accessed using keys. This is because keys are stored based on hashing, not sequentially in memory.

Subscripting with [] square bracket is used for accessing values, using keys for referencing.

In [9]:
z = {"a": 1, "b": 2, "c": 3, "d": 4}
y = z["a"]
x = z ["b"]
print("value at \"a\" :", y)
print("value at \"b\" :", x)

try:
    w = z[1]
except KeyError as error:
    print("KeyError:", error)
value at "a" : 1
value at "b" : 2
KeyError: 1

Referenced with 1 dict does not consider it as index instead takes it as key and looks at corresponding hashing, since it does not exist it throws KeyError.

When one tries to reference a key that is not present using [], it will throw error.
For certain program logic, it might be useful to get value if present else silently move to next step. In such cases one could use get(). It will return value if key exists else will return None

In [10]:
z = {"a": 1, "b": 2, "c": 3, "d": 4}

try:
    y = z["e"]
except KeyError as error:
    print("KeyError:" , error)

y = z.get("e")
print('using get:', y)
KeyError: 'e'
using get: None


One can use in statement to ckeck if a specific key is present in a dictionary or not.

Only membership of keys can be checked with in, one cannot use to check if a value exists in dict.

In [11]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
a_dog = "dog" in a
a_wolf = "wolf" in a
a_1 = 1 in a

print("is dog present:", a_dog)
print("is wolf present:", a_wolf)
print("is 1 present:", a_1)
is dog present: True
is wolf present: False
is 1 present: False

Iterating with for and in and accessing the dict elements for updating them can be done as shown below.

In [12]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}

for i in a:
    a[i] +=10

print("updated a =", a) 
updated a = {'ant': 11, 'bear': 12, 'cat': 13, 'dog': 14, 'elephant': 15}

View Objects

Refer Docs for more details

One could use in for checking if key exists. But what if one needs to check if a value exists in a dict?

A simple work around could be to use values() method. it returns all the values of a dictionary as a dict_value class which is view object (i.e gives view like a list).
One can use this with in to check if the value present in the object returned by values()

In [13]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
a_vals = a.values()

a_1 = 1 in a_vals
print("is 1 present:", a_1)
print("a_vals = ", a_vals)
is 1 present: True
a_vals =  dict_values([1, 2, 3, 4, 5])

Notice that though the object seems like list it is actually a dict_values object. The object is not subscriptable with index like list.

In [14]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
a_vals = a.values()
print("a_vals before = ", a_vals)
a_vals before =  dict_values([1, 2, 3, 4, 5])

Similarly there is keys() method which will return dict_keys class which again is a view object.

In [15]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
a_keys = a.keys()
print("a_keys = ", a_keys)
a_keys =  dict_keys(['ant', 'bear', 'cat', 'dog', 'elephant'])

Previously we converted sequence type data with paired elements into a dictionary. We could also do the inverse, we could convert dictionary into dict_items view object, which equivalent to tuple pairs.

For which we could use items() method which returns dict_items which again is a view object.

In [16]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
a_items = a.items()
print("a_keys = ", a_items)
a_keys =  dict_items([('ant', 1), ('bear', 2), ('cat', 3), ('dog', 4), ('elephant', 5)])

One Significance of having view object is any changes or updates to the dictionary object will reflect in view object. These objects are references to the actual dictionary.

In Python2 values() items() keys() methods returned seperate list object, that did not preservethe link. Only in Python3 this was changed to view object.

In [17]:
a = {"ant": 1, "bear":2, "cat": 3}
a_vals = a.values()
a_keys = a.keys()
a_items = a.items()
print("Before changing:\n", a_vals, "\n", a_keys, "\n", a_items)

a["ant"] = 10 # change value
a["dog"] = 40 # addd new element

print("After changing:\n", a_vals, "\n", a_keys, "\n", a_items)
Before changing:
 dict_values([1, 2, 3]) 
 dict_keys(['ant', 'bear', 'cat']) 
 dict_items([('ant', 1), ('bear', 2), ('cat', 3)])
After changing:
 dict_values([10, 2, 3, 40]) 
 dict_keys(['ant', 'bear', 'cat', 'dog']) 
 dict_items([('ant', 10), ('bear', 2), ('cat', 3), ('dog', 40)])

keys(), values() and items() would useful in many cases involving complex operations with dictionary data. Limiting the usage to simple examples here.

One could convert these view objects into other data structures easily with creation functions like list(), tuple() and set() and use them further.



Inserting elements

Dictionary are mutable object, so one can add elements to existing object without creating new object.

For adding new element one can directly subscript the new key with [] and assign the corresponding values.

In [18]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
a["fox"] = 6

print('a=', a)
a= {'ant': 1, 'bear': 2, 'cat': 3, 'dog': 4, 'elephant': 5, 'fox': 6}

One alternative way would be to use update() to add new elements.

Update takes a dictionary or sequences with paired elements. So this could be used for bulk updates as well.

In [19]:
a = {"ant": 1, "bear":2, "cat": 3}
a.update({'dog': 4})
a.update( [("dog", 4), ("elephant", 5)] )
print(a)
{'ant': 1, 'bear': 2, 'cat': 3, 'dog': 4, 'elephant': 5}

The keys of dictionar are unique, thus using subscript [] or update() on existing keys will change the values of the keys.

The subscripting [] will be faster for adding single element compared to update() method.
But update() will be and faster useful for bulk update with large number of elements compared to iterating.

In [20]:
%%timeit -n1 -r10
## Using Subscripting
a = {}
for i in range(100000):
    a["apple"] = 1
1 loop, best of 10: 4.53 ms per loop
In [21]:
%%timeit -n1 -r10
## Using update()
a = {}
for i in range(100000):
    a.update({"apple": 1})
1 loop, best of 10: 17.5 ms per loop

One can see that subscripting [] is faster for updating single element compare to update().

But when large umber of elements has to be updated update() will be more readable and faster.

In [22]:
# variables initalization
a = {}
b = [(i,i*100) for i in range(10)] #list comprehension
print('list = ',b)
print('after converting to dict =', dict(b))
list =  [(0, 0), (1, 100), (2, 200), (3, 300), (4, 400), (5, 500), (6, 600), (7, 700), (8, 800), (9, 900)]
after converting to dict = {0: 0, 1: 100, 2: 200, 3: 300, 4: 400, 5: 500, 6: 600, 7: 700, 8: 800, 9: 900}
In [0]:
b = [(i,i*100) for i in range(1000)]
In [24]:
%%timeit -n1 
## Using Subscripting
for i in range(100000): #repeat 1M times 
    for (k, v) in b:
        a[k] = v
1 loop, best of 3: 6.01 s per loop
In [25]:
%%timeit -n1 
## Using Subscripting
for i in range(100000): #repeat 1M times 
    a.update(b)    
1 loop, best of 3: 2.62 s per loop

It can be seen that update() is faster than iterating with subscripting [].



There is a setdefault() method referenced to a key, does 2 different things based on condition.

  • If key is present - returns the value of that key -if key is not present - creates a new element with the key and initalizes to value passed as argument and returns the value.
In [26]:
a = {"ant": 1, "bear":2, "cat": 3}

n_val = a.setdefault('cat', 22)
print("If value exists returned:", n_val)
print("a =", a )

n_val = a.setdefault('wolf', 22)
print("If value does not exist returned:", n_val)
print("a =", a )
If value exists returned: 3
a = {'ant': 1, 'bear': 2, 'cat': 3}
If value does not exist returned: 22
a = {'ant': 1, 'bear': 2, 'cat': 3, 'wolf': 22}


Deleting elements

If one wants to remove any element, one can use pop().

In [27]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
a.pop("bear")

print("a =", a)
a = {'ant': 1, 'cat': 3, 'dog': 4, 'elephant': 5}

There also exists popitem() which removes the item that got inserted last. One would recollect that dict doesn't preserve ordering, hence popitem() will be useful to remove elements in Last-In-First-Out basis from dict.

This behaviour of removing newly added elemt is from Python3.7, in older versions popitem() remove a random element from dict.

In [28]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
print("Initial a =", a)
a.popitem()
print("1st pop: a =", a)
a.popitem()
print("2nd pop: a =", a)
Initial a = {'ant': 1, 'bear': 2, 'cat': 3, 'dog': 4, 'elephant': 5}
1st pop: a = {'ant': 1, 'bear': 2, 'cat': 3, 'dog': 4}
2nd pop: a = {'ant': 1, 'bear': 2, 'cat': 3}

Though initailized together, the elemets are created sequentially. Thus last entry is created at last, so popitem() removes it.

When some new element is added it gets removed when popitem() is called.

In [29]:
a = {"ant": 1, "bear":2, "cat": 3}
print("Initial a =", a)
a.popitem()
print("1st pop: a =", a)

a.update({"dog": 4, "elephant": 5})
print("New a =", a)
a.popitem()
print("2nd pop: a =", a)
Initial a = {'ant': 1, 'bear': 2, 'cat': 3}
1st pop: a = {'ant': 1, 'bear': 2}
New a = {'ant': 1, 'bear': 2, 'dog': 4, 'elephant': 5}
2nd pop: a = {'ant': 1, 'bear': 2, 'dog': 4}


One can use del to remove specific elements refereced by keys or to remove entire object.

In [30]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
del a['ant'] 
print('After deletion: a =', a)
After deletion: a = {'bear': 2, 'cat': 3, 'dog': 4, 'elephant': 5}

When used for removing elements the existing object is modified. As one can see below that the object id remains the same.

In [31]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
print("Id of `a` before:", id(a))
del a["bear"] # deletes elements from index 2 to 5 in the object.
print('`a` after modification =', a)
print("Id of `a` after:", id(a))
Id of `a` before: 140113098949832
`a` after modification = {'ant': 1, 'cat': 3, 'dog': 4, 'elephant': 5}
Id of `a` after: 140113098949832

One can use del on the object as whole.

In [32]:
c = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
del c # reference to object is removed

try: 
    print(c) #This will throw ERROR since c is not defined
except NameError as error:
    print("Error:", error)
Error: name 'c' is not defined

When using del on the entire object, the object remains same and exists in memory. Only the reference between variable and object is broken and variable becomes undefined.

In [33]:
d = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
print("Id of `d`:", id(d))
e = d
del d # reference to object is removed

print("Id of `e`:", id(e))
print("Object in e:", e) # `e` still references the object
Id of `d`: 140113099014936
Id of `e`: 140113099014936
Object in e: {'ant': 1, 'bear': 2, 'cat': 3, 'dog': 4, 'elephant': 5}


To remove all elements one can use clear() method, this will retain the old object and only remove the elements

In [34]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
a.clear()
print("a =", a)
a = {}

At this point a question might come to mind that why should we clear elements of a dict object and reuse the same object

Check the below case where we link the object in variable a to another variable b by shallow copy. And we are changing the dict object in a by assigning a new dict.

In [35]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
b = a # a & b refers to same object
print("before modifying a: \na =", a, " id= ",id(a) )
print("b =", b, " id= ",id(b))

a = {"Zebra": 26} # a now point to new object
print("after modifying a: \na =", " id= ",id(a) )
print("b =", " id= ",id(b))
before modifying a: 
a = {'ant': 1, 'bear': 2, 'cat': 3, 'dog': 4, 'elephant': 5}  id=  140113098951704
b = {'ant': 1, 'bear': 2, 'cat': 3, 'dog': 4, 'elephant': 5}  id=  140113098951704
after modifying a: 
a =  id=  140113098948968
b =  id=  140113098951704

The object initially a had, is not destroyed or modified, it still exists in memory and b still references the old object, which might not be desired in some case.

Also the link between objects a and b i.e object referenced by b is no longer associated to a.The link is broken which could be bad in some case.

In [36]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
b = a # a & b refers to same object

print("Initially: \na =", a," id= ",id(a))
print(" b =", b," id= ",id(b))

a.clear()

print("clearing a: \na =", a, " id= ",id(a))
print("b =", b, " id= ",id(b))
Initially: 
a = {'ant': 1, 'bear': 2, 'cat': 3, 'dog': 4, 'elephant': 5}  id=  140113099015440
 b = {'ant': 1, 'bear': 2, 'cat': 3, 'dog': 4, 'elephant': 5}  id=  140113099015440
clearing a: 
a = {}  id=  140113099015440
b = {}  id=  140113099015440



Duplicating

It may not be the case always to have same object in two different variables.

Sometimes one may want to create a copy of a dictionary and do some modification independent of other. In such cases it is better create two different objects. For which one can use copy() which creates a deep copy of the object.
Alternatively, one can use dict() which is used to create new dict from sequence type object.

In [37]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
b = a.copy() # deep copy
b.pop("ant")

c = dict(a)
c.pop("cat")

print("a =", a, " id=",id(a))
print("b =", b, " id=",id(b))
print("b =", c, " id=",id(c))
a = {'ant': 1, 'bear': 2, 'cat': 3, 'dog': 4, 'elephant': 5}  id= 140113098950120
b = {'bear': 2, 'cat': 3, 'dog': 4, 'elephant': 5}  id= 140113099016448
b = {'ant': 1, 'bear': 2, 'dog': 4, 'elephant': 5}  id= 140113098820488

But one would wonder what could be the difference between copy() and dict().

dict() function takes any iteratable data objects with paired elements, iterates through elements converts to dict type object.

whereas copy() method directly creates a copy of dict object. This makes copy() method faster than list()

In [38]:
%%timeit -n1 -r10
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
for i in range(1000000):
    b = a.copy
1 loop, best of 10: 45.4 ms per loop
In [39]:
%%timeit -n1 -r10
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
for i in range(1000000):
    b = dict(a)
1 loop, best of 10: 267 ms per loop


Other functionalities

One can find the total number of elements in a dict using len() function, which will return the number of elements (i.e) number of keys present in the dictionary

In [40]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
b = { 1: {"A": 11, "B": 22}, 
      2: {"C": 11, "D": 22}   } #dict with dict objects

print("Num of elements in a:", len(a))
print("Num of elements in b:", len(b))
Num of elements in a: 5
Num of elements in b: 2

It can be seen that for b it show 2, len() considers only objects it contains, does not take into account elements present in the object. So b has two dict elements which is returned.



There would be cases where one will be in need to find sum of all keys in a dict.
It may not seem meaningful but there may come a necessacity

For which one can use sum() function

In [41]:
b = { 1: "mango", 2: "apple"   } 
s = sum(b)
print("sum is ", s)
sum is  3

But the keys has to have arithmetically summable object like float, int; else one will get error.

In [42]:
a = {"ant": 1, "bear":2, "cat": 3, "dog": 4, "elephant": 5}
try:
    sum(a) # this will throw error
except Exception as error:
    print("Error in a:", error)


b = { (1,2,3): "First", (4,5,6):  "Second"  } #list object
try:
    sum(b) # this will throw error
except Exception as error:
    print("Error in b:", error)
Error in a: unsupported operand type(s) for +: 'int' and 'str'
Error in b: unsupported operand type(s) for +: 'int' and 'tuple'


Operators

No arithmetic operators can be used with dictionary.



Please refer to the Padhai-One Github repo for updates & other notebooks