Difference makes the DIFFERENCE
For further reading, or to go in depth, refer to the official Python language reference: https://docs.python.org/3/reference/
Also refer to the Padhai-One Github repo for updates & other notebooks
Sets are mutable objects that hold unique elements, i.e. a set cannot contain duplicates. A set stores its elements by their hash values, which makes it faster than other structures for many operations, such as membership tests.
Sets are efficient for computing mathematical set operations such as intersection, union and difference.
Sets can be created in two ways:
{} curly brackets, with elements separated by commas.
set() function, which takes any iterable object (such as a list or tuple) as its argument and converts it to a set.
# set with values
c = {1,2,3,4,5}
d = set((6,7,8,8,9,9)) # create set out of tuple (sequence type)
e = c # e is another name for the same set object as c (no copy is made)
# Print
print("\nc = ",c, "\nd = ",d, "\ne = ",e )
Notice that the duplicate elements from the tuple were removed in the set; a set cannot hold duplicate elements.
There is a catch when creating empty sets.
# empty set
a = set()
b = {}
c = {1,2,3,4,5}
print("Type of a:", type(a), "\nType of b:", type(b), "\nType of c:", type(c) )
It can be seen that the type of a is set, whereas b is a dictionary.
When {} is used empty, Python treats it as an empty dictionary. Only when it contains elements separated by commas is it treated as a set, as seen in c.
One way to remember this is to think of a set as a dictionary with just keys and no values. The interpreter takes an empty {} to be a dictionary, and explicitly adding elements as keys with no values makes it treat the literal as a set.
From a language evolution perspective, in Python 2 releases before 2.7 the {} syntax was reserved for dictionaries and sets could not be written with it; one had to use set() to define a set. The set literal was introduced in Python 3.0 (and backported to Python 2.7) for convenience.
What other objects can we convert to a set?
a = (1,2,3,4) # tuple
b = {5,6,7,8} # set
c = {'a': 1, 'b':2, 'c':3} #dictionary
d = "truth alone triumphs"
p = set(a)
q = set(b)
r = set(c)
s = set(d)
print("p = ", p, "\nq =", q, "\nr =", r, "\ns =" ,s)
Notice that for a dictionary only the keys are used to create the set, not the values. For a string, every character, including the space, becomes an element of the set.
Two things are easy to observe: first, the ordering of the elements is not preserved; second, all data is deduplicated and only unique elements are present.
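Because a set does not preserve order, a common way to get a predictable order for display is to pass it to sorted(), which returns a new list. A minimal sketch (the variable names are illustrative only):

```python
# Duplicates are dropped when the set is built from the string.
letters = set("banana")

# sorted() returns a new list in ascending order;
# the set itself is left unchanged.
ordered = sorted(letters)
print(ordered)  # ['a', 'b', 'n']
```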
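Sets restrict the types of objects they can hold. As mentioned earlier, all elements of a set are hashed, so only hashable objects can be elements of a set. Python's built-in immutable types (such as strings, numbers and tuples of hashable items) are hashable; the built-in mutable containers are not.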
a = {"apple", "orange", 'papaya', "tomato"} # strings types are immutable, hence hashable
b = {(1,2), (3,4), (5,6)} # Tuple are immutable
print("a:", a)
print("b:", b)
try:
    c = {[1,2], [4,5], [6,7]} # Lists are mutable(unhashable) so cannot be used as set element
except Exception as error:
    print("Error @@ c:", error)
try:
    d = {{1,2}, {4,5}, {6,7}} # Sets are mutable(unhashable) so cannot be used as set element
except Exception as error:
    print("Error @@ d:", error)
Even a set object cannot be nested inside a set.
Mutable objects are considered unhashable because they can change at any point, so their hash would not be preserved. The hash values of the objects in a set must stay constant.
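One way to check whether an object could be a set element is to try the built-in hash() on it. The sketch below uses a hypothetical helper named is_hashable (not part of Python itself) to illustrate this; note that a tuple is only hashable when everything inside it is hashable too.

```python
def is_hashable(obj):
    """Hypothetical helper: return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable("apple"))      # True
print(is_hashable((1, 2)))       # True
print(is_hashable([1, 2]))       # False
print(is_hashable({1, 2}))       # False
print(is_hashable((1, [2, 3])))  # False: the tuple holds a list
```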
Sets do not preserve order, and where an element is stored is determined by its hash value. Hence one cannot use an index to access set elements.
a = {1,2,3,4,5}
try:
    b = a[0]
except Exception as error:
    print("Error:", error)
To access the elements of a set, one has to iterate over it with a for ... in loop.
c = {8,2,9,4,1,6,7}
for i in c:
    print(i) # i takes each element of c in turn
Can one find out whether an element is present in a set?
One can do that using the in operator. Since set elements are hashed, membership testing (checking whether an element exists or not) is much faster than in other structures such as lists.
a = {"cat", "dog", "parrot"}
a_dog = "dog" in a
a_wolf = "wolf" in a
print("is dog present:", a_dog)
print("is wolf present:", a_wolf)
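Below is a comparison between lists and sets for the membership test.
Note: for very small collections such as these, the timings for a list and a set can be close, and the list may sometimes even be faster; the advantage of the set shows as the number of elements grows.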
%%timeit -n1 -r10
a = {1,2,3,4,5,6,7,8,9}
for i in range(100000):
    t = 7 in a
%%timeit -n1 -r10
a = [1,2,3,4,5,6,7,8,9]
for i in range(100000):
    t = 7 in a
In sets the check is faster since it is based on hashing, whereas in a list the elements have to be checked sequentially.
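Outside a notebook, where the %%timeit magic is not available, the same comparison can be sketched with the standard timeit module. The collection size and repeat count below are arbitrary assumptions, chosen so that the sequential scan of the list becomes noticeable.

```python
import timeit

n = 10_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the last element

# Each call runs the membership test 1000 times and returns total seconds.
list_time = timeit.timeit(lambda: target in as_list, number=1000)
set_time = timeit.timeit(lambda: target in as_set, number=1000)

print("list:", list_time)
print("set: ", set_time)
```

The exact numbers depend on the machine, so none are shown here.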
For nested objects, the check looks only at the top-level elements and does not look inside them.
b = { (1,2,3,4,5),
(11,12,13,14,15,16,17),
(21,22,23,24,25,26)}
b1 = 11 in b
print("Is 11 in b:", b1)
As shown, checking b for 11 fails because every element of b is a tuple and none of them equals 11.
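If one does want to look inside the nested objects, each top-level element has to be tested in turn. A minimal sketch using the built-in any():

```python
b = {(1, 2, 3, 4, 5),
     (11, 12, 13, 14, 15, 16, 17),
     (21, 22, 23, 24, 25, 26)}

# A whole tuple can be found, because the tuple is a top-level element.
whole_tuple_found = (1, 2, 3, 4, 5) in b

# To look inside the tuples, test each one in turn.
inner_found = any(11 in t for t in b)

print(whole_tuple_found)  # True
print(inner_found)        # True
```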
A set is a mutable object, so one can add elements to an existing set without creating a new object.
One can use the add() method to add an element to a set.
Note: Remember that only hashable objects can be added to a set.
# Appending new element
a = {1,2,3}
a.add(55)
b = {(1,2), (5,3), (4,5)}
b.add((8,9))
# Print
print("a = ",a)
print("b = ",b)
What if one wants to append multiple elements to a set?
a = {10, 20, 30, 40}
b = {40,50,60} # Set
c = [60, 70, 70, 80] # List
a.update(b)
a.update(c)
print("a =", a)
One can see that all the elements are deduplicated.
Both add() and update() behave like a mathematical union, owing to the nature of sets.
Note that both methods modify the existing set object.
If one needs to remove an element, one can use remove().
a = {11, 12, 13, 14, 15, 16, 17}
a.remove(13) # removes 13 from set
print("a =", a)
remove() raises KeyError when one tries to remove an element that does not exist in the set.
To avoid that, one can use discard(), which removes the element if it exists and otherwise does nothing, without raising an error.
a = {11, 12, 13, 14, 15, 16, 17}
try:
    a.remove(19) # This will throw error
except KeyError as error:
    print("KeyError:", error)
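For comparison, the same kind of removal with discard() can be sketched as follows; no try/except is needed because a missing element is silently ignored.

```python
a = {11, 12, 13, 14, 15, 16, 17}

a.discard(13)  # 13 is present, so it is removed
a.discard(19)  # 19 is absent: nothing happens and no error is raised

print("a =", a)
```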
But why do we need remove() in the first place? Couldn't we always use discard()?
In some cases it is useful to get an error. One might be removing an element that is expected to be in the set; if it is not there, something may be wrong elsewhere in the code, and one may want to handle that case in an except block.
One could test membership before every removal instead, but remove() is simpler to use.
One can use pop() on a set. But unlike with a list, where one can provide an index, pop() on a set takes no argument, since indexing is meaningless for sets.
So what does it do?
It removes and returns an arbitrary element of the set.
a = {22,33,44,55, 66,77,88}
a.pop()
print("a =", a)
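Since pop() also returns the element it removed, the value can be captured. The sketch below also shows that calling pop() on an empty set raises KeyError; the variable names are illustrative.

```python
a = {22, 33, 44}
original = set(a)  # keep a copy for comparison

removed = a.pop()  # removes one element and returns it
print("removed:", removed)
print("a =", a)

# Popping from an empty set raises KeyError.
empty = set()
try:
    empty.pop()
    raised = False
except KeyError:
    raised = True
print("KeyError on empty set:", raised)  # True
```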
One can use del on the object as a whole.
c = {5,6,7,8,9}
del c # reference to object is removed
try:
    print(c) # This raises NameError since c is no longer defined
except NameError as error:
    print("Error:", error)
Using del on the entire object does not itself modify the object. It only removes the binding between the variable and the object, so the variable becomes undefined; the object stays in memory as long as something else still references it.
d = {10, 20, 30, 40}
print("Id of `d`:", id(d))
e = d
del d # reference to object is removed
print("Id of `e`:", id(e))
print("Object in e:", e) # `e` still references the object
To remove all elements one can use the clear() method, which retains the existing object and only removes its elements.
a = {100, 200, 300, 400, 500, 600}
a.clear()
print("a =", a)
At this point one might ask why we should clear the elements of a set object and reuse the same object.
Consider the case below, where the object in variable a is also bound to another variable b by assignment, and we then change a by assigning it a new set.
a = {11, 22, 33, 44, 55, 66, 77, 88}
b = a # a & b refers to same object
print("before modifying a: \na =", a, " id= ",id(a) )
print("b =", b, " id= ",id(b))
a = {11} # a now point to new object
print("after modifying a: \na =", a, " id= ",id(a) )
print("b =", b, " id= ",id(b))
The object that a initially referred to is not destroyed or modified; it still exists in memory and b still references it, which might not be desired in some cases.
Also, the link between a and b is broken: the object referenced by b is no longer associated with a, which could be bad in some cases.
a = {11, 22, 33, 44, 55, 66, 77, 88}
b = a # a & b refers to same object
print("Initially: \na =", a," id= ",id(a))
print(" b =", b," id= ",id(b))
a.clear()
print("clearing a: \na =", a, " id= ",id(a))
print("b =", b, " id= ",id(b))
Having the same object in two different variables is not always what one wants.
Sometimes one may want to create a copy of a set and modify it independently of the original. In such cases it is better to create two different objects, for which one can use copy(), which creates a new set object containing the same elements (a shallow copy).
Alternatively, one can use set(), which creates a new set from any iterable object.
a = {10, 20 , 30, 40 ,50}
b = a.copy() # new set object with the same elements (shallow copy)
b.remove(10)
c = set(a)
c.remove(50)
print("a =", a, " id=",id(a))
print("b =", b, " id=",id(b))
print("c =", c, " id=",id(c))
But one may wonder what the difference between copy() and set() is.
The set() function takes any iterable object, iterates through its elements and builds a set from them. This is the same when the argument is itself a set.
The copy() method, in contrast, directly creates a copy of the set object. This can make copy() faster than set().
%%timeit -n1 -r10
a = {1,2,3,4,5,6,7,8,9}
for i in range(1000000):
    b = a.copy() # call the method; a.copy without () would only look it up
%%timeit -n1 -r10
a = {1,2,3,4,5,6,7,8,9}
for i in range(1000000):
    b = set(a)
One can find the total number of elements in a set using the len() function.
a = {1.1, 2.2, 3.3, 4.4, 5.5, 6.6}
b = {(1,2), (4,5,6)} #set with tuple objects
print("Num of elements in a:", len(a))
print("Num of elements in b:", len(b))
It can be seen that for b it shows 2. len() counts only the objects the set directly contains and does not take into account the elements inside those objects. So b has two objects, and that is what is returned.
There are cases where one needs the sum of all elements in a set, for which one can use the sum() function.
a ={1.1, 2, 3.3, 4}
s = sum(a)
print("sum is ", s)
But the set must contain arithmetically summable objects such as float or int; otherwise one will get an error.
a = {"dog", "cat"} # string objects
try:
    sum(a) # this will throw error
except Exception as error:
    print("Error in a:", error)
b = {1,2, (1,2) }
try:
    sum(b) # this will throw error
except Exception as error:
    print("Error in b:", error)
In the second case the set has two int objects and a tuple object. Though the inner tuple holds int objects, sum() does not look inside it; it only considers the objects at the first level, and adding an int and a tuple is not possible.
Sets are data structures intended to capture the nature of mathematical sets. Thus the Python set has many methods for set operations.
All of these set operations return their results as separate objects and do not update the existing objects.
For the union of two sets, A ∪ B, use union()
a = {1,2,3,4}
b = {4,5,6,7}
# returns result as separate object
c = a.union(b)
d = b.union(a)
print("c=", c)
print("d=", d)
Notice that a.union(b) and b.union(a) are the same. Mathematically, set union and intersection are both commutative.
For intersection, A ∩ B, use intersection()
a = {1,2,3,4,5}
b = {4,5,6,7,8}
# returns result as separate object
c = a.intersection(b) # same as b.intersection(a)
print("c=", c)
For checking if two sets are disjoint (A∩B=∅) or not use isdisjoint()
a = {1,2,3,4,5}
b = {4,5,6,7,8}
c = {10, 11, 12, 13}
## a.isdisjoint(b) same as b.isdisjoint(a)
print("A and B are Disjoint sets:", a.isdisjoint(b))
print("B and C are Disjoint sets:", b.isdisjoint(c))
For checking subsets, A ⊆ B, use issubset()
Note that the subset relation is not symmetric, unlike union and intersection. So a.issubset(b) is in general not equal to b.issubset(a).
a = {1,2,3,4,5}
b = {2,3}
## a.issubset(b) not equal to b.issubset(a)
print("A is a subset of B:", a.issubset(b))
print("B is a subset of A:", b.issubset(a))
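Note that issubset() also returns True when the two sets are equal. The sketch below illustrates this, along with the <= and < comparison operators that Python's set type also provides (< tests for a proper subset).

```python
a = {1, 2, 3}
b = {1, 2, 3}
c = {1, 2}

print(a.issubset(b))  # True: a set is a subset of any equal set
print(a <= b)         # True: operator form of issubset()
print(a < b)          # False: a is not a proper subset of b
print(c < a)          # True: c is a subset of a and c != a
```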
For checking supersets, A ⊇ B, use issuperset()
Note that the superset relation is not symmetric either. So a.issuperset(b) is in general not equal to b.issuperset(a).
a = {1,2,3,4,5}
b = {2,3}
## a.issuperset(b) not equal to b.issuperset(a)
print("A is a superset of B:", a.issuperset(b))
print("B is a superset of A:", b.issuperset(a))
For finding the set difference between A and B, use difference()
A\B = {x ∈ A | x ∉ B}
Set difference is not commutative.
a = {1,2,3,4}
b = {3,4,5,6}
## a.difference(b) not equal to b.difference(a)
print("A difference B:", a.difference(b))
print("B difference A:", b.difference(a))
For finding the symmetric difference of sets, use symmetric_difference()
A △ B = (A\B) ∪ (B\A)
Symmetric difference is commutative
a = {1,2,3,4}
b = {3,4,5,6}
## a.symmetric_difference(b) is the same as b.symmetric_difference(a)
print("A symmetric_difference B:", a.symmetric_difference(b))
print("B symmetric_difference A:", b.symmetric_difference(a))
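The definition A △ B = (A\B) ∪ (B\A) can be checked directly with the methods already introduced. A short sketch, with illustrative variable names:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

by_method = a.symmetric_difference(b)
by_definition = a.difference(b).union(b.difference(a))

print(sorted(by_method))                       # [1, 2, 5, 6]
print(by_method == by_definition)              # True
print(by_method == b.symmetric_difference(a))  # True (commutative)
```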
There will be requirements to update an existing set with the result of a union, intersection or difference operation between two sets.
A typical way to do this is shown below.
a = {1,2,3,4,5}
b = {4,5,6,7}
temp = a.intersection(b)
a.clear() # clear elements without changing object ID
a.update(temp)
print("a=", a)
Since this pattern is needed in many cases, there is a direct update method for each of the methods that create a new set.
For union we can directly use update(), seen earlier for inserting elements: since set elements are unique, only elements not already present in the set get added, which is equivalent to the union operation. For the other methods there are intersection_update(), difference_update() and symmetric_difference_update().
a = {1,2,3,4,5}
b = {4,5,6,7}
a.update(b) # union and update
c = {11,12,13,14,15}
d = {14,15,16,17}
c.intersection_update(d) # intersection and update
e = {11,22,33,44,55}
f = {44,55,66,77}
e.difference_update(f) # difference and update
g = {10,20,30,40,50}
h = {40,50,60,70}
g.symmetric_difference_update(h) # symmetric difference and update
print("a =", a)
print("c =", c)
print("e =", e)
print("g =", g)
Note that every update method changes the object on which it is invoked, i.e. a.update(b) changes a, not b.
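The practical difference between an update method and its non-update counterpart shows up when a second variable refers to the same set. A minimal sketch (the name alias is illustrative):

```python
a = {1, 2, 3, 4, 5}
alias = a            # second name for the same object
id_before = id(a)

a.intersection_update({4, 5, 6, 7})  # modifies the object in place

print(id(a) == id_before)  # True: still the same object
print(alias == {4, 5})     # True: the alias sees the change

a = a.intersection({5})    # builds a new set and rebinds the name a
print(a is alias)          # False: alias still refers to the old object
```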
When an operator is used, the result is stored in a new set object; the existing objects are not modified.
Of the arithmetic operators, only - can be used on sets. The result is equivalent to the set difference, A\B = {x ∈ A | x ∉ B}.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
c = a - b
print("c =", c)
One cannot use the /, * or + operators on sets.
a = {4, 5}
b = {1,2}
try:
    c = a/b
except Exception as error:
    print("Error on /:", error)
try:
    c = a+b
except Exception as error:
    print("Error on +:", error)
try:
    c = a*b
except Exception as error:
    print("Error on *:", error)
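Although +, * and / are not defined, Python's set type does provide operator forms for the other set operations covered earlier: | for union, & for intersection and ^ for symmetric difference. A short sketch comparing them with the equivalent methods:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print((a | b) == a.union(b))                 # True
print((a & b) == a.intersection(b))          # True
print((a - b) == a.difference(b))            # True
print((a ^ b) == a.symmetric_difference(b))  # True

# The methods accept any iterable; the operators need sets on both sides.
with_list = a.union([7, 8])
try:
    a | [7, 8]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(operator_accepts_list)  # False
```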
A frozen set is an immutable set type. All other properties remain the same as for set, except those that rely on the mutable nature of sets.
A frozen set can be created only with the frozenset() function, which takes any iterable object (such as a list or tuple) as its argument and converts it to a frozenset.
# frozenset with values
c = frozenset([1,2,3,4,5]) # create frozenset out of list
d = frozenset((6,7,8,8,9,9)) # create frozenset out of tuple (sequence type)
e = c # e is another name for the same frozenset object as c (no copy is made)
# Print
print("\nc = ",c, "\nd = ",d, "\ne = ",e )
Frozen sets have the same restrictions as sets on what type of objects they can hold. Only hashable objects can be elements of a frozenset.
a = frozenset(["apple", "orange", 'papaya', "tomato"]) # strings are immutable, hence hashable
b = frozenset([(1,2), (3,4), (5,6)]) # tuples of hashable items are hashable
print("a:", a)
print("b:", b)
try:
    c = frozenset([[1,2], [4,5], [6,7]]) # lists are mutable (unhashable), so they cannot be frozenset elements
except Exception as error:
    print("Error @@ c:", error)
try:
    d = frozenset([{1,2}, {4,5}, {6,7}]) # sets are mutable (unhashable), so they cannot be frozenset elements
except Exception as error:
    print("Error @@ d:", error)
Recall that a set cannot be nested in a set, since sets are mutable.
Since frozensets are immutable and hashable, they can be nested as elements of both set and frozenset.
a = frozenset((1,2,3))
b = frozenset((4,5,6))
c = {a , b} # Frozen set inside set
d = frozenset((a,b))
print("c =", c, "\nd =", d)
Accessing the elements of a frozen set works the same way as for regular sets.
c = frozenset((8,2,9,4,1,6,))
for i in c: # accessing elements
    print(i) # i takes each element of c in turn
d = 8 in c # Membership test
print("Is 8 in c: ", d)
Since frozen sets are immutable, inserting elements is not possible. One has to convert to a set, update the elements and convert it back to a frozenset.
a = frozenset([1,2,3])
b = set(a)
b.add(4)
a = frozenset(b)
print("a =", a)
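An alternative sketch that avoids the intermediate set: because union() returns a new object rather than modifying the existing one, calling it on a frozenset yields a new frozenset that includes the extra elements.

```python
a = frozenset([1, 2, 3])

b = a.union([4])  # new frozenset; a itself is unchanged

print(type(b).__name__)  # frozenset
print(sorted(b))         # [1, 2, 3, 4]
print(sorted(a))         # [1, 2, 3]
```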
Only the entire object can be deleted, since frozen sets are immutable.
c = frozenset((5,6,7,8,9))
del c # reference to object is removed
try:
    print(c) # This raises NameError since c is no longer defined
except NameError as error:
    print("Error:", error)
As with all other data objects, del removes the reference to the object from the variable.
Assignment and copy() can be used in the same way as with sets.
a = frozenset((1,2,3))
c = a.copy() # copy; CPython may return the same object, since a frozenset cannot change
d = a # another reference to the same object
del a
print("c =", c)
print("d =", d)
The set methods for mathematical operations are also available on frozenset. Refer to the set section for usage.
The operations can also be carried out between sets and frozensets. The return type will be the type of the object on which the method is invoked.
a = {1, 2, 3, 4}
b = frozenset({3, 4, 5, 6})
c = b.union(b) # between frozenset and frozenset
d = a.union(b) # returns set type object
e = b.union(a) # returns frozenset type object
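The return types can be confirmed by printing them. A short sketch repeating the same two calls:

```python
a = {1, 2, 3, 4}
b = frozenset({3, 4, 5, 6})

d = a.union(b)  # invoked on a set
e = b.union(a)  # invoked on a frozenset

print(type(d).__name__)  # set
print(type(e).__name__)  # frozenset
print(d == e)            # True: same elements, so they compare equal
```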
None of the update() based methods are available, since frozensets are immutable.
a = frozenset((1,2,3))
try:
    a.update({1})
except Exception as error:
    print("Error:", error)
Both the len() and sum() functions behave the same as they do for sets.
a = frozenset([1,2,3,4])
print("Length:", len(a) )
print("Sum:", sum(a) )
As with set, of the arithmetic operators only - can be used, giving the set difference. Other operators such as +, * and / are not applicable.
a = {1, 2, 3, 4}
b = frozenset({3, 4, 5, 6})
c = a - b
print("c =", c)
As with the methods, the return type is based on the first operand (in this case the type of a).