Python Comparison Techniques
While conventional equality operators like == and inequalities like < or > are the main actors in comparison operations, knowing how to use these tools well is often what separates good code from exceptional code.
In this blog post, I'd like to introduce simple and more advanced Pythonic comparison techniques that are especially useful when handling data.
I've divided the article into two main sections: pure Pythonic techniques and techniques that use NumPy.
Pure Pythonic
Identity Check
In Python, the is operator is used for identity checks: it tests whether two variables reference the same object in memory. Unlike the equality operator (==), which checks whether the values are equal, is checks whether the variables refer to the exact same object.
Example of identity check:
a = [1, 2, 3]
b = [1, 2, 3]
c = a
print(a is b) # False, different objects
print(a is c) # True, same object as 'a'
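The example above compares two separately built lists. The sketch below contrasts == and is on a copied list and then shows the most common everyday use of is: checking for None, which is a singleton, so identity is the right test. The find_user helper and the users data are hypothetical, made up for illustration.

```python
# Equality (==) compares values; identity (is) compares objects.
original = [1, 2, 3]
duplicate = list(original)          # a new list object holding the same values

same_values = original == duplicate   # True: the contents match
same_object = original is duplicate   # False: two distinct objects

# Two names refer to the same object exactly when their id() values match.
ids_match = id(original) == id(duplicate)  # False

def find_user(users, name):
    """Hypothetical helper: return the matching user dict, or None."""
    for user in users:
        if user["name"] == name:
            return user
    return None

users = [{"name": "Alice"}, {"name": "Bob"}]
missing = find_user(users, "Charlie")

# Idiomatic None check: use "is", not "==".
print(missing is None)                      # True
print(same_values, same_object, ids_match)  # True False False
```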
Chained Comparison
Chained comparisons let you make multiple comparisons in a single expression. For example, instead of writing a < b and b < c, you can write the chained comparison a < b < c. A chained comparison is equivalent to joining the individual comparisons with and, except that each middle operand is evaluated only once.
Example of chained comparison:
x = 5
print(0 < x < 10) # True, equivalent to (0 < x) and (x < 10)
y = 15
print(x < 10 < y < 20) # True, equivalent to (x < 10) and (10 < y) and (y < 20)
Using NumPy
Any/All Array Elements
import numpy as np
arr = np.array([1, 2, 3, 4])
# Check if any element is greater than 3
any_greater_than_3 = np.any(arr > 3)
# Check if all elements are greater than 0
all_greater_than_0 = np.all(arr > 0)
print(any_greater_than_3) # True
print(all_greater_than_0) # True
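np.any and np.all also accept an axis argument, and boolean arrays expose the same checks as methods. The sketch below, on a small made-up 2-D array, reduces per row and per column, and shows why these reductions are needed at all: using a multi-element boolean array directly in an if statement raises a ValueError.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Method forms, equivalent to np.any(...) / np.all(...)
print((grid > 5).any())   # True
print((grid > 0).all())   # True

# axis=1 reduces along each row, axis=0 along each column.
row_has_big = np.any(grid > 4, axis=1)
col_all_above_1 = np.all(grid > 1, axis=0)
print(row_has_big)        # [False  True]
print(col_all_above_1)    # [False  True  True]

# The truth value of a multi-element array is ambiguous, so this raises.
raised = False
try:
    if grid > 3:
        pass
except ValueError:
    raised = True
print(raised)             # True
```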
Composite Conditions
import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([2, 2, 3, 4])
# Element-wise comparison using logical operators
result = np.logical_and(arr1 > 1, arr2 == 2)
print(result) # [False True False False]
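The same composite conditions are usually written with the element-wise operators & (and), | (or) and ~ (not). Because & and | bind more tightly than > and ==, each comparison needs its own parentheses. A short sketch reusing the arrays above:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([2, 2, 3, 4])

# Same result as np.logical_and(arr1 > 1, arr2 == 2)
both = (arr1 > 1) & (arr2 == 2)
print(both)     # [False  True False False]
print(np.array_equal(both, np.logical_and(arr1 > 1, arr2 == 2)))  # True

# Element-wise OR and NOT
either = (arr1 == 1) | (arr2 == 4)
print(either)   # [ True False False  True]
neither = ~either
print(neither)  # [False  True  True False]

# Composite masks are typically used to select elements.
print(arr1[(arr1 > 1) & (arr2 > 2)])  # [3 4]
```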
Structured Array Comparison
Structured arrays in NumPy allow you to define arrays with multiple fields (columns), each having a specified data type. Structured array comparison involves comparing arrays based on specific fields or conditions.
import numpy as np
# Define a structured array with fields 'name' and 'age'
data = np.array([('Alice', 25), ('Bob', 30), ('Charlie', 22)], dtype=[('name', 'U10'), ('age', int)])
# Sort the records by comparing their 'age' field
result = np.sort(data, order='age')
print(result)
# [('Charlie', 22) ('Alice', 25) ('Bob', 30)]
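Sorting is one use of field comparisons; filtering is another. Comparing a single field of a structured array yields an ordinary boolean array that can select whole records, and conditions on several fields combine with &. The sketch below reuses the same data; np.argsort with order= returns the sorted positions instead of a sorted copy.

```python
import numpy as np

data = np.array([('Alice', 25), ('Bob', 30), ('Charlie', 22)],
                dtype=[('name', 'U10'), ('age', int)])

# Comparing one field gives a boolean mask over the records.
older_than_24 = data['age'] > 24
print(older_than_24)                  # [ True  True False]
print(data[older_than_24]['name'])    # ['Alice' 'Bob']

# Combine conditions on different fields.
mask = (data['age'] < 28) & (data['name'] != 'Alice')
print(data[mask]['name'])             # ['Charlie']

# Positions that would sort the records by age.
order = np.argsort(data, order='age')
print(order)                          # [2 0 1]
```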
String Array Comparison
NumPy's np.char module provides functions for comparing arrays of strings element-wise, using lexicographic order.
import numpy as np
# String array comparison
arr1 = np.array(['apple', 'banana', 'cherry'])
arr2 = np.array(['apple', 'orange', 'banana'])
# Element-wise lexicographic comparison
# (np.char.less is the public name; the np.core namespace is deprecated in NumPy 2.0)
result_lex = np.char.less(arr1, arr2)
print(result_lex) # [False True False]
# Using np.char.equal for equality comparison
result_eq = np.char.equal(arr1, arr2)
print(result_eq) # [ True False False]
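For the basic comparisons, the ordinary operators also work element-wise on string arrays, and np.char offers string predicates that return boolean arrays. A small sketch; the mixed-case array is made up for illustration.

```python
import numpy as np

arr1 = np.array(['apple', 'banana', 'cherry'])
arr2 = np.array(['apple', 'orange', 'banana'])

# Operators compare string arrays lexicographically, element by element.
print(arr1 < arr2)    # [False  True False]
print(arr1 == arr2)   # [ True False False]

# A string predicate from np.char
starts_with_b = np.char.startswith(arr1, 'b')
print(starts_with_b)  # [False  True False]

# Comparisons are case-sensitive; normalise first if that matters.
mixed = np.array(['Apple', 'apple'])
print(np.char.lower(mixed) == 'apple')  # [ True  True]
```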
Custom Comparison Functions
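You can use custom comparison functions with NumPy's np.vectorize or np.frompyfunc to apply element-wise comparisons based on your own criteria. Keep in mind that np.vectorize is provided for convenience rather than performance: internally it is essentially a Python for loop.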
import numpy as np
# Custom comparison function
def custom_comparison(x, y):
    # True when x and y differ by less than 2
    return np.abs(x - y) < 2
# Create a vectorized function
vectorized_comparison = np.vectorize(custom_comparison)
arr1 = np.array([1, 5, 9])
arr2 = np.array([3, 5, 8])
result = vectorized_comparison(arr1, arr2)
print(result) # [False True True]
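The other option mentioned above, np.frompyfunc(func, nin, nout), turns a Python function into a ufunc whose results have dtype=object, so it is worth converting them to bool explicitly. When the rule can be written with NumPy operations directly, as this one can, the plain array expression gives the same answer without a Python-level loop. A sketch reusing the same inputs:

```python
import numpy as np

def custom_comparison(x, y):
    # True when x and y differ by less than 2
    return abs(x - y) < 2

arr1 = np.array([1, 5, 9])
arr2 = np.array([3, 5, 8])

# 2 inputs, 1 output; the result array has dtype=object.
close_ufunc = np.frompyfunc(custom_comparison, 2, 1)
result_obj = close_ufunc(arr1, arr2)
result = result_obj.astype(bool)
print(result)       # [False  True  True]

# Pure NumPy equivalent: no per-element Python call.
result_fast = np.abs(arr1 - arr2) < 2
print(result_fast)  # [False  True  True]
```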
Membership Test
np.isin is useful for testing whether each element of an array is contained in another array.
import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([2, 4, 6])
result = np.isin(arr1, arr2)
print(result) # [False True False True]
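The membership mask is most useful for filtering. The sketch below keeps the shared values and, with invert=True (equivalent to negating the mask with ~), the values that do not appear in arr2.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([2, 4, 6])

# Keep only elements of arr1 that also appear in arr2.
shared = arr1[np.isin(arr1, arr2)]
print(shared)        # [2 4]

# invert=True flags elements NOT found in arr2.
only_in_arr1 = arr1[np.isin(arr1, arr2, invert=True)]
print(only_in_arr1)  # [1 3]
```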
Broadcasting
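Broadcasting lets NumPy operate on arrays of different but compatible shapes, for example comparing a whole array against a single scalar. The resulting boolean array can then be used as a mask to select elements from another array of the same shape.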
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Broadcasting comparison with a scalar
bigger_than_three = arr > 3
print(bigger_than_three)
# [[False False False]
# [ True True True]]
arr2 = np.array([[4, 5, 6], [1, 2, 3]])
# Use the boolean result as a mask on another array of the same shape
print(arr2[bigger_than_three])
# [1 2 3]
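Broadcasting is not limited to scalars. Adding a new axis to one array lets you compare every element of it against every element of another, producing a full comparison table. The values and thresholds arrays below are made up for illustration.

```python
import numpy as np

values = np.array([1, 5, 9])     # shape (3,)
thresholds = np.array([2, 6])    # shape (2,)

# values[:, np.newaxis] has shape (3, 1); against shape (2,) both
# broadcast to (3, 2), so row i compares values[i] with each threshold.
above = values[:, np.newaxis] > thresholds
print(above)
# [[False False]
#  [ True False]
#  [ True  True]]

# How many thresholds each value exceeds
print(above.sum(axis=1))  # [0 1 2]
```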
Almost Equals
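The np.isclose() function compares two arrays element-wise within a tolerance, which is particularly relevant for floating-point numbers in numerical computations. It treats a and b as close when absolute(a - b) <= atol + rtol * absolute(b), so the relative tolerance rtol is scaled by the second argument (atol defaults to 1e-08).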
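import numpy as np
arr1 = np.array([1.0, 2.0, 3.0])
arr2 = np.array([1.1, 2.0, 3.1])
# Each difference (at most about 0.1) is within rtol * |arr2|: 0.11, 0.2 and 0.31
result = np.isclose(arr1, arr2, rtol=0.1)
print(result)
# Output: [ True True True]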
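How much the tolerance matters is easiest to see side by side. The sketch below shows why exact equality is fragile for floats, uses np.allclose to collapse the element-wise check into a single bool, and sets rtol=0 to get a purely absolute tolerance (the atol=0.05 value is an arbitrary choice for illustration).

```python
import numpy as np

# Rounding error makes exact float equality unreliable.
print(0.1 + 0.2 == 0.3)            # False
print(np.isclose(0.1 + 0.2, 0.3))  # True (defaults: rtol=1e-05, atol=1e-08)

arr1 = np.array([1.0, 2.0, 3.0])
arr2 = np.array([1.1, 2.0, 3.1])

# np.allclose answers "are all elements close?" with one bool.
print(np.allclose(arr1, arr2, rtol=0.1))  # True
print(np.allclose(arr1, arr2))            # False with the default tolerances

# Absolute tolerance only
print(np.isclose(arr1, arr2, rtol=0, atol=0.05))  # [False  True False]
```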