Topics: object orientation, identity, equality, mutability, immutability, hashability, Python internals
Topics: object orientation, functional programming, closures, generators, generator coroutines, inheritance, composition, dataclasses.dataclass
Topics: decorators, higher-order decorators, class decorators, metaclasses, __init_subclass__, metaprogramming, exec, eval
Subscribe to our newsletter https://bit.ly/expert-python to stay up to date on our offerings and receive exclusive discounts.
We offer a wide variety of private training courses for your team on topics such as:
Our courses and seminars are designed with the “why” at the forefront of everything we do. As a result, the courses balance information, exercises, and case studies that help encourage attendee success.
Courses are developed to fit the needs of multiple levels of mastery. We strive to ensure that every attendee is taught personally and that all the time they commit to learning is magnified.
Once per quarter, we hold our Developing Expertise in Python course, open to the public! This course is three full days of intensively personalized hands-on instruction within a small cohort (≤10). Sessions begin with individual interviews with each attendee to assess current levels of understanding and set specific, measurable goals for their individual growth and professional development.
No lecture, no slides—the sessions are driven entirely by discussion around concrete live-coded examples with detailed prep & supplementary review materials (≥50 pages of background course notes and ≥10 hours of background videos) provided.
See our Organizer Page for info on upcoming dates!
Don’t see a course you need? Contact us at learning@dutc.io to get the curricula you’re looking for!
James Powell is the founder and lead instructor at Don’t Use This Code. He currently serves as Chairman of the NumFOCUS Board of Directors, helping to oversee the governance and sustainability of all of the major tools in the Python data analysis ecosystem (i.e., pandas, NumPy, Jupyter, Matplotlib). At NumFOCUS, he helps build global open source communities for data scientists, data engineers, and business analysts. He helps NumFOCUS run the PyData conference series and has sat on speaker selection and organizing committees for 18 conferences. James is also a prolific speaker: since 2013, he has given over seventy conference talks at over fifty Python events worldwide. In fact, he is the second most prolific speaker in the PyData and Python ecosystem (source: pyvideo.org).
print("Let's take a look!")
Python variable names are just that: names. They are names that we can use to refer to some underlying data.
The == operator in Python determines whether the objects referred to by two
names are “equal.” For a container like list, this means that the two objects
contain the same elements in the same order.
xs = [1, 20, 300]
ys = [1, 20, 300]
zs = [4_000, 50_000, 600_000]
print(
f'{xs == ys = }',
f'{xs == zs = }',
sep='\n',
)
For a container like dict, this means that the two objects contain the same
key value pairs; however, order is not considered.
d0 = {'a': 1, 'b': 20, 'c': 300}
d1 = {'c': 300, 'b': 20, 'a': 1}
d2 = {'d': 4_000, 'e': 5, 'f': 600_000}
print(
f'{d0 == d1 = }',
f'{d0 == d2 = }',
sep='\n',
)
For a collections.OrderedDict, however, the order is considered when
determining equality.
from collections import OrderedDict
d0 = OrderedDict({'a': 1, 'b': 20, 'c': 300})
d1 = OrderedDict({'c': 300, 'b': 20, 'a': 1})
print(
f'{d0 == d1 = }',
sep='\n',
)
The is operator in Python determines whether the objects referred to by two
names are, in fact, the same object. Unlike ==, this has consistent meaning
irrespective of the type of the object.
You can specify what it means for two instances of a user-defined object to be
equal (“equivalent”; ==,) but there is no way to specify an alternate or
custom meaning for identity (is.)
from dataclasses import dataclass, field
from typing import Any
@dataclass
class T:
name : str
value : int
metadata : dict[str, Any] = field(default_factory=dict)
# do not consider `.metadata` for equality
def __eq__(self, other):
return self.name == other.name and self.value == other.value
x = T('abc', 123)
y = T('abc', 123, metadata={...: ...})
z = T('def', 456)
print(
f'{x == y = }',
f'{x == z = }',
sep='\n',
)
Similarly, while it is possible to overload many operators in Python, the
assignment and assignment-expression operators (= and :=) cannot be customised
in any fashion.
These operations are also called “name-bindings.”
x = y always means “x is a new name for the object that is currently
referred to by y.” Unlike in other programming languages, x = y cannot
directly trigger any other form of computation (e.g., a copy computation.)
However, since performing a name-binding sometimes involves assignment into a
dict representing the active scope, the assignment into the dict can
trigger other computations.
from collections.abc import MutableMapping
from logging import getLogger, basicConfig, INFO
logger = getLogger(__name__)
basicConfig(level=INFO)
class namespace(dict, MutableMapping):
def __setitem__(self, key, value):
logger.info('namespace.__setitem__(key = %r, value = %r)', key, value)
super().__setitem__(key, value)
class TMeta(type):
@staticmethod
def __prepare__(name, bases, **kwds):
return namespace()
class T(metaclass=TMeta):
x = [1, 2, 3]
y = x
An alternate way to determine whether two names refer to an identical object is
to check their id(...) values. The id(...) return value is a (locally,
temporally) unique identifier for an object. In current versions of CPython,
this corresponds to the memory address of the PyObject* for the object (but
this is not guaranteed.)
xs = [1, 20, 300]
ys = [1, 20, 300]
zs = xs
print(
# f'{xs is ys = }',
# f'{xs is zs = }',
f'{id(xs) = :#_x}',
f'{id(ys) = :#_x}',
f'{id(zs) = :#_x}',
sep='\n',
)
Another way to determine if two names refer to an identical object is to perform a mutation via one name and see whether the object referred to by the other name has changed or not!
xs = [1, 20, 300]
ys = [1, 20, 300]
zs = xs
xs.append(4_000)
print(
f'{xs = }',
f'{ys = }',
f'{zs = }',
sep='\n',
)
Note that if two names refer to immutable objects, then those objects cannot be changed; therefore, we will not be able to observe a useful difference between these two names referring to identical objects or merely referring to equivalent objects. As a consequence, the CPython interpreter will try to save memory by “interning” commonly found immutable objects, such as short strings and small numbers. When “interning,” all instances of the same value are, in fact, instances of an identical object.
print(
# f'{id(eval("123")) = :#_x}',
# f'{id(eval("123")) = :#_x}',
# f'{id(eval("123_456")) = :#_x}',
# f'{id(eval("123_456")) = :#_x}',
f'{id(123) = :#_x}',
f'{id(123) = :#_x}',
f'{id(123_456) = :#_x}',
f'{id(123_456) = :#_x}',
sep='\n',
)
We have to use eval in the above example, since (C)Python code in a script
will undergo the “constant folding” optimisation.
from pathlib import Path
from sys import path
from tempfile import TemporaryDirectory
from textwrap import dedent
with TemporaryDirectory() as d:
d = Path(d)
with open(d / '_module.py', 'w') as f:
print(dedent('''
def h():
x = 123_456_789
y = 123_456_789
''').strip(), file=f)
path.append(f'{d}')
from _module import h
def f():
x = 123_456_789
y = 123_456_789
def g():
x = 123_456_789
y = 123_456_789
print(
f'{f.__code__.co_consts = }',
f'{g.__code__.co_consts = }',
f'{h.__code__.co_consts = }',
f'{f.__code__.co_consts[-1] is g.__code__.co_consts[-1] = }',
f'{f.__code__.co_consts[-1] is h.__code__.co_consts[-1] = }',
sep='\n',
)
The qualifications on “unique” are necessary. Recall that the CPython value for
id(...) is currently implemented as the memory address of the object the name
refers to (i.e., the value of the PyObject*.)
/* Python/bltinmodule.c */
static PyObject *
builtin_id(PyModuleDef *self, PyObject *v)
{
PyObject *id = PyLong_FromVoidPtr(v);
if (id && PySys_Audit("builtins.id", "O", id) < 0) {
Py_DECREF(id);
return NULL;
}
return id;
}
This can be used to do things we’re otherwise not supposed to, such as directly accessing Python objects.
from numpy import array
from numpy.lib.stride_tricks import as_strided
def setitem(t, i, v):
xs = array([], dtype='uint64')
if (loc := xs.__array_interface__['data'][0]) > id(t):
raise ValueError(f'`numpy.ndarray` @ {id(xs):#_x} allocated after `tuple` @ {id(t):#_x}')
xs = as_strided(xs, strides=(1,), shape=((off := id(t) - loc) + 1,))
ys = as_strided(xs[off:], strides=(8,), shape=(4,))
zs = as_strided(ys[3:], strides=(8,), shape=(i + 1,))
ys[2] += max(0, i - (end := len(t)) + 1)
zs[min(i, end):] = id(v)
t = 0, 1, 2, None, 4, 5
print(f'Before: {t = !r:<24} {type(t) = }')
setitem(t, 3, 3)
print(f'After: {t = !r:<24} {type(t) = }')
As a consequence of using the memory address as the value for id(…) coupled
with the finiteness of memory, we would expect that memory addresses would
eventually be reüsed. Therefore, across an arbitrary span of time, two
objects with the same id(…) may, in fact, be distinct.
xs = [1, 2, 3]
print(f'{id(xs) = :#_x}')
del xs
ys = [1, 2, 3, 4]
print(f'{id(ys) = :#_x}')
We should not store id(…) values for comparison later. We may be tempted to
do this in the case of unhashable objects, but the result will not be
meaningful.
class T:
def __hash__(self):
raise NotImplementedError()
obj0, obj1 = T(), T()
print(
# f'{obj0 in {obj0, obj1} = }',
# f'{id(obj0) in {id(obj0): obj0, id(obj1): obj1} = }',
sep='\n',
)
(We see a very similar problem with child processes upon termination of the parent process; in general, since PIDs are a finite resource and may be reüsed, it is incorrect for us to store and refer to child processes across a span of time in the absence of some isolation mechanism such as a PID namespace.)
### (unsafely?) reduce maximum PID
# <<< $(( 2 ** 7 )) /proc/sys/kernel/pid_max
typeset -a pids=()
() { } & pids+=( "${!}" )
until
() { } & pids+=( "${!}" )
(( pids[1] == pids[${#pids}] ))
do :; done
printf 'pid[%d]=%d\n' 1 "${pids[1]}" \
"${#pids}" "${pids[${#pids}]}"
Therefore, the following code may be incorrect (since the PID we are killing
may not necessarily be the process we think!)
sleep infinity & pid="${!}"
: ...
kill "${pid}"
Up to a name (re-)binding, equality is a transient property but identity is a permanent property. In other words, if two names refer to equal (“equivalent”) objects at some point in time, they may or may not remain equal at some later point in time. However, if two names refer to identical objects at some point in time, the only intervening action that can alter their identicalness is a name (re-)binding.
xs = [1, 2, 3]
ys = [1, 2, 3]
assert xs == ys
...
xs.clear()
...
assert xs != ys
xs = ys = [1, 2, 3]
assert xs is ys
...
# xs = ...
# (xs := ...)
# globals()['xs'] = ...
# from module import xs
...
assert xs is ys
Of course, if the two names refer to immutable objects, then their equivalence is also a permanent property!
Note that identity and equality are separate properties. Identicalness does not necessarily imply equivalence, nor does equivalence imply identicalness.
# i. equal and identical
xs = ys = [1, 2, 3]
assert xs == ys and xs is ys
# ii. equal but not identical
xs, ys = [1, 2, 3], [1, 2, 3]
assert xs == ys and xs is not ys
# iii. identical and equal
x = y = 2.**53
assert x is y and x == y
# iv. identical but not equal
x = y = float('nan')
# x = y = None
class T:
def __eq__(self, other):
return False
assert x is y and x != y
However, note that if two names refer to identical objects, then we are
guaranteed that the id(…) values (when captured at a single point in
time during the lifetime of both objects) must have equivalent value.
x = y = object()
# two ways to state the same thing
assert x is y and id(x) == id(y)
# since id(…) returns an `int`,
# since (in CPython) large `int`s are not interned,
# since (in CPython) `id(…)` gives the memory address, and
# since (in CPython) these memory addresses are in the upper ranges
# the `int` that `id(x)` returns will be allocated separately from the `int` that `id(y)`
# returns, leading to the following…
assert x is y and id(x) is not id(y)
Since equality can be implemented via the object model (but identity cannot,) it is possible for an object to not be equivalent to even itself!
class T:
def __eq__(self, other):
return False
obj = T()
assert obj is obj and obj != obj
Note that since == can be implemented but is cannot, and that (in CPython)
is is a pointer comparison, is checks are very likely to be consistently
faster than == checks.
/* Include/object.h */
#define Py_Is(x, y) ((x) == (y))
Therefore, the use of an enum.Enum may prove faster than an equivalent string
equality comparison in some cases. (Note, however, that object equality comparison
may just as well implement an identity “fast-path,” minimising the performance
improvement.)
from time import perf_counter
from contextlib import contextmanager
from enum import Enum
@contextmanager
def timed(msg):
before = perf_counter()
try:
    yield
finally:
    after = perf_counter()
    print(f'{msg:<48} \N{mathematical bold capital delta}t: {after - before:.6f}s')
def f(x):
return x == 'abcdefg'
Choice = Enum('Choice', 'Abcdefg')
def g(x):
return x is Choice.Abcdefg
with timed('f'):
x = 'abcdefg'
for _ in range(100_000):
f(x)
with timed('g'):
x = Choice.Abcdefg
for _ in range(100_000):
g(x)
Generally, whether two containers are equivalent is determined by checking whether their contents are equivalent.
def __eq__(xs, ys):
if len(xs) != len(ys):
return False
for x, y in zip(xs, ys, strict=True):
if x != y:
return False
return True
xs = [1, 2, 3]
ys = [1, 2, 3]
print(
f'{xs == ys = }',
f'{__eq__(xs, ys) = }',
sep='\n',
)
In the actual implementation of list, however, there is a shortcut: we first perform a
(quicker) identity check to skip over identical items; only for non-identical items
do we fall back to an equality check.
def __eq__(xs, ys):
if len(xs) != len(ys):
return False
for x, y in zip(xs, ys, strict=True):
if x is y:
continue
if x != y:
return False
return True
z = float('nan')
xs = [1, 2, 3, z]
ys = [1, 2, 3, z]
print(
f'{xs == ys = }',
f'{__eq__(xs, ys) = }',
sep='\n',
)
This is distinct from how numpy.ndarray equality works!
from numpy import array
z = float('nan')
xs = [1, 2, 3, z]
ys = [1, 2, 3, z]
assert xs == ys
z = float('nan')
xs = array([1, 2, 3, z])
ys = array([1, 2, 3, z])
assert not (xs == ys).all()
So why should I care…?
xs = ...
ys = ...
print(f'{xs is ys = }')
print("Let's take a look!")
A “snapshot (copy)” is a static copy of some state at some point in time; a “live view” is a dynamic reference to some state.
xs = [1, 2, 3]
xs.append(4)
ys = xs.copy()
xs.append(5)
print(
f'{xs = }',
f'{ys = }',
sep='\n',
)
Whereas…
xs = [1, 2, 3]
ys = xs
xs.append(4)
xs.append(5)
print(
f'{xs = }',
f'{ys = }',
sep='\n',
)
We may desire a “live view” to eliminate “update anomalies”: cases where an update to one part of the system should be reflected in another part of the system, cases where we want a “single source of truth.”
from dataclasses import dataclass
from copy import copy
@dataclass
class Employee:
name : str
role : str
salary : float
@dataclass
class Entitlement:
employee : Employee
access : bool
employees = {
'alice': Employee('alice', 'programmer', 250_000),
'bob': Employee('bob', 'programmer', 225_000),
}
entitlements = {
k: Entitlement(employee=v, access=False)
for k, v in employees.items()
}
payroll_by_year = {
2020: {
k: copy(v) for k, v in employees.items()
},
}
employees['alice'].role = 'manager'
employees['alice'].salary *= 1.5
print(
f'{employees["alice"].role = }',
f'{entitlements["alice"].employee.role = }',
f'{payroll_by_year[2020]["alice"].role = }',
sep='\n',
)
Copies can be made explicitly or implicitly in a number of different ways.
from copy import copy
xs = [1, 2, 3]
# ys = xs
ys = [*xs]
# ys = list(xs)
# ys = xs.copy()
# ys = copy(xs)
xs.append(4)
print(
f'{xs = }',
f'{ys = }',
sep='\n',
)
We often want to distinguish between “shallow” and “deep” copies. A “shallow copy” is a copy of only the top “level” of a nested container structure. A “deep copy” copies all levels of the nested structure.
xs = [
[1, 2, 3],
[4, 5, 6, 7],
]
ys = xs.copy() # or `copy.copy(xs)`
xs[0].insert(0, 0)
xs.append([8, 9])
print(
f'{xs = }',
f'{ys = }',
sep='\n',
)
Whereas with a copy.deepcopy…
from copy import deepcopy
xs = [
[1, 2, 3],
[4, 5, 6, 7],
]
ys = deepcopy(xs)
xs[0].insert(0, 0)
xs.append([8, 9])
print(
f'{xs = }',
f'{ys = }',
sep='\n',
)
Given the two changes made to xs (i. a mutation of an inner list; ii. an append to the outer list), we can distinguish between an alias (which observes both changes), a shallow copy (which observes only the deeper change), and a deep copy (which observes neither).
(There is a necessary asymmetry here: we cannot observe only the shallow change but not the deeper change.)
from copy import copy, deepcopy
xs = [
[1, 2, 3],
[4, 5, 6, 7],
]
ys = {
# i. ii.
(True, True): xs,
(True, False): copy(xs),
# (False, True): ...,
(False, False): deepcopy(xs),
}
xs[0].insert(0, 0) # i.
xs.append([8, 9]) # ii.
print(
f'{xs = }',
*ys.values(),
sep='\n',
)
Clearly, we want a “snapshot” if we want to capture the state as of a certain point in time and not observe later updates (i.e., mutations.) We want a “live view” if we do want to see later updates.
The .keys() on a dict (which used to be called .viewkeys() in Python 2,)
is a live view of the keys of a dict. As a consequence, if we capture a
reference to it, then subsequently mutate the dict, we will see that mutation
when iterating over the reference we have captured.
d = {'abc': 123, 'def': 456, 'xyz': 789}
keys = d.keys() # “live view”
d['ghi'] = 999
for k in keys:
print(f'{k = }')
However, if we wanted a snapshot, we may need to explicitly trigger a copy.
d = {'abc': 123, 'def': 456, 'xyz': 789}
keys = [*d.keys()] # “snapshot”
d['ghi'] = 999
for k in keys:
print(f'{k = }')
Similarly, we can consider the different import styles to be an instance
of “early”- vs “late”-binding, which parallels the distinction between
“snapshots” and “live views.”
from textwrap import dedent
from math import cos, sin, pi
print(
f'before {sin(pi) = :>2.0f}',
f' {cos(pi) = :>2.0f}',
sep='\n',
)
# don't “pollute” namespace
exec(dedent('''
import math
math.sin, math.cos = math.cos, math.sin
'''))
print(
f'after {sin(pi) = :>2.0f}',
f' {cos(pi) = :>2.0f}',
sep='\n',
)
However…
from textwrap import dedent
import math
print(
f'before {math.sin(math.pi) = :>2.0f}',
f' {math.cos(math.pi) = :>2.0f}',
sep='\n',
)
# don't “pollute” namespace
exec(dedent('''
import math
math.sin, math.cos = math.cos, math.sin
'''))
print(
f'after {math.sin(math.pi) = :>2.0f}',
f' {math.cos(math.pi) = :>2.0f}',
sep='\n',
)
In fact, we can think of dotted __getattr__ lookup as being a key
mechanism in getting a “live view” of some data.
from dataclasses import dataclass
@dataclass
class T:
x : int
obj = T(123)
x = obj.x
print(f'before {obj.x = } · {x = }')
obj.x = 456
print(f'after {obj.x = } · {x = }')
There are many subtle design distinctions we can make in our code that differ in terms of whether they provide us with a “live view“ or a “snapshot.”
These four variations have some subtle distinctions:
class Base:
def __repr__(self):
return f'{type(self).__name__}({self.values!r})'
# i.
class T1(Base):
def __init__(self, values):
self.values = values
# ii.
class T2(Base):
def __init__(self, values):
self.values = [*values]
# iii.
class T3(Base):
def __init__(self, values):
self.values = values.copy()
# iv.
class T4(Base):
def __init__(self, *values):
self.values = values
values = [1, 2, 3]
obj = T1(values)
values.clear()
print(f'i. {values = } · {obj = }')
values = [1, 2, 3]
obj = T2(values)
values.clear()
print(f'ii. {values = } · {obj = }')
values = [1, 2, 3]
obj = T3(values)
values.clear()
print(f'iii. {values = } · {obj = }')
values = [1, 2, 3]
obj = T4(*values)
values.clear()
print(f'iv. {values = } · {obj = }')
However, this is not the only distinction between the above!
from collections import deque
class Base:
def __repr__(self):
return f'{type(self).__name__}({self.values!r})'
# i.
class T1(Base):
def __init__(self, values):
self.values = values
# ii.
class T2(Base):
def __init__(self, values):
self.values = [*values]
# iii.
class T3(Base):
def __init__(self, values):
self.values = values.copy()
# iv.
class T4(Base):
def __init__(self, *values):
self.values = values
values = deque([1, 2, 3], maxlen=3)
obj = T1(values)
values.append(4)
print(f'i. {values = } · {obj = }')
values = deque([1, 2, 3], maxlen=3)
obj = T2(values)
values.append(4)
print(f'ii. {values = } · {obj = }')
values = deque([1, 2, 3], maxlen=3)
obj = T3(values)
values.append(4)
print(f'iii. {values = } · {obj = }')
values = deque([1, 2, 3], maxlen=3)
obj = T4(*values)
values.append(4)
print(f'iv. {values = } · {obj = }')
We can think of “inheritance” as a mechanism for “live updates.”
class Base:
pass
class Derived(Base):
pass
Base.f = lambda _: ...
print(
f'{Derived.f = }',
sep='\n',
)
In fact, if we extend the idea of changes to changes across versions of our code, we can see a material distinction between “inheritance,” “composition,” and alternate approaches.
class Base:
def f(self):
pass
# statically added (e.g., in a later version)
Base.g = lambda _: ...
class Derived(Base):
pass
class Composed:
def __init__(self, base : Base = None):
self.base = Base() if base is None else base
def f(self, *args, **kwargs):
return self.base.f(*args, **kwargs)
class Constructed:
locals().update(Base.__dict__)
### alternatively…
# f = Base.f
# g = Base.g
# dynamically added (e.g., via monkey-patching)
Base.h = lambda _: ...
print(
' Derived '.center(40, '\N{box drawings light horizontal}'),
f'{hasattr(Derived, "f") = }',
f'{hasattr(Derived, "g") = }',
f'{hasattr(Derived, "h") = }',
' Composed '.center(40, '\N{box drawings light horizontal}'),
f'{hasattr(Composed, "f") = }',
f'{hasattr(Composed, "g") = }',
f'{hasattr(Composed, "h") = }',
' Constructed '.center(40, '\N{box drawings light horizontal}'),
f'{hasattr(Constructed, "f") = }',
f'{hasattr(Constructed, "g") = }',
f'{hasattr(Constructed, "h") = }',
sep='\n',
)
Consider the collections.ChainMap, which allows us to isolate writes to the
top “level” of a multi-level structure. This mechanism is closely related
to how scopes work and to how __getattr__ and __setattr__ work; a small sketch of that parallel follows the ChainMap examples below.
base = {
'abc': 123
}
snapshot = {
**base,
'def': 456,
}
# base['abc'] *= 2
# snapshot['abc'] *= 10
print(
f'{base = }',
f'{snapshot = }',
sep='\n',
)
from collections import ChainMap
base = {
'abc': 123
}
layer = {
'def': 456,
}
live = ChainMap(layer, base)
# live['abc'] *= 10
base['abc'] *= 2
print(
f'{base = }',
f'{live = } · {live["abc"] = }',
sep='\n',
)
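As a rough sketch of that parallel (not part of the examples above): attribute lookup on an instance falls through to the class much like a ChainMap falls through to its lower maps, while attribute assignment lands in the instance __dict__ much like ChainMap writes land in the top map.
class Base:
    x = 123
obj = Base()
print(f'{obj.x = }')    # read falls through to the class (the lower “layer”)
obj.x = 456             # write is isolated to the instance (the top “layer”)
print(f'{obj.x = }')
print(f'{Base.x = }')   # the lower “layer” is untouched
print(f'{obj.__dict__ = }')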
It is important that we be aware of “shadowing” where something that may appear to be a “live view” may become a “snapshot.”
Recall the subtle distinction between clearing a list via the following
approaches. If we have captured a “live view” of xs with ys, then we must
mutate xs with .clear() or del xs[:] for the clearing to be visible on
ys.
# i.
xs = ys = [1, 2, 3]
xs = []
print(f'{xs = } · {ys = }')
# ii.
xs = ys = [1, 2, 3]
xs.clear()
print(f'{xs = } · {ys = }')
# iii.
xs = ys = [1, 2, 3]
del xs[:]
print(f'{xs = } · {ys = }')
Similarly, manipulating sys.path requires that we manipulate the actual
sys.path. A name binding of path = … in the module scope doesn’t change
the actual sys.path.
from tempfile import TemporaryDirectory
from pathlib import Path
with TemporaryDirectory() as d:
d = Path(d)
with open(d / '_module.py', mode='w') as f:
pass
from sys import path
path.append(f'{d}') # works!
from sys import path
path.insert(0, f'{d}') # works!
from sys import path
path = path + [f'{d}'] # does not work!
from sys import path
path = [f'{d}'] + path # does not work!
import sys
sys.path.append(f'{d}') # works!
import sys
sys.path.insert(0, f'{d}') # works!
import sys
sys.path = sys.path + [f'{d}'] # works!
# what about [*sys.path, f'{d}']… ?
import sys
sys.path = [f'{d}'] + sys.path # works!
“Shadowing” is how we can describe what happens when we create a “shadow” (“snapshot (copy)”) of some data at some higher level of a scoped lookup. This can easily happen in our OO hierarchies if we are not careful.
class Base:
x = []
class Derived(Base):
pass
# Base.x.append(1)
# Derived.x.append(2)
# Base.x = [1, 2, 3, 4]
Derived.x = [1, 2, 3, 4, 5, 6]
print(
f'{Base.x = }',
f'{Derived.x = }',
f'{Base.__dict__.keys() = }',
f'{Derived.__dict__.keys() = }',
sep='\n',
)
But… what if the value is immutable? If the value is immutable, then we have to be particularly careful to update it at the right level!
class Base:
x = 123
class Derived(Base):
pass
Derived.x = 789
Base.x = 456
# del Derived.x
print(
f'{Base.x = }',
f'{Derived.x = }',
f'{Base.__dict__.keys() = }',
f'{Derived.__dict__.keys() = }',
sep='\n',
)
So why does this matter…?
print("Let's take a look!")
Obviously, mutable data is data that we can change, and immutable data is data that we cannot change. However, an important qualifier is whether we can change the data in place.
s = 'abc'
print(f'before {s = } {id(s) = :#_x}')
s = s.upper()
print(f'after {s = } {id(s) = :#_x}')
xs = [1, 2, 3]
print(f'before {xs = } {id(xs) = :#_x}')
xs.append(4)
print(f'after {xs = } {id(xs) = :#_x}')
In both cases, the values changed, but only for xs (a mutable list) did
the value change in place. If we captured a reference to the list in another
name, we would be able to observe this change in two places.
s0 = s1 = 'abc'
xs0 = xs1 = [1, 2, 3]
print(
f'before {s0 = } · {xs0 = }',
f' {s1 = } · {xs1 = }',
sep='\n',
)
s0 = s0.upper()
xs0.append(4)
print(
f'after {s0 = } · {xs0 = }',
f' {s1 = } · {xs1 = }',
sep='\n',
)
We can litigate the mechanisms used to enforce immutability, and there are many choices. However, while the exact mechanism may have some performance or some narrow correctness consequences, it is largely irrelevant to our purposes. (Recall that the “real world” appears to be fundamentally mutable.)
t = 'abc', 123
# t[0] = ...
class T:
def __init__(self, x):
self._x = x
@property
def x(self):
return self._x
obj = T(123)
# obj.x = ...
obj._x = ...
Mutability allows us to have “action at a distance”: a change in one part of the code can change some other, non-local part of the code.
from threading import Thread
from time import sleep
class T:
def __init__(self, values):
self.values = values
def __call__(self):
while True:
sleep(1)
self.values.append(sum(self.values))
values = [1, 2, 3]
Thread(target=T(values)).start()
for _ in range(3):
print(f'{values = }')
sleep(1)
This can readily lead to code that is hard to understand using only local information.
One way to avoid this is to aggressively make copies any time we pass data around. However, we will have to be careful to make “deep copies.”
from threading import Thread
from time import sleep
class T:
def __init__(self, values):
self.values = values.copy()
def __call__(self):
while True:
sleep(1)
self.values[-1].append(sum(self.values[-1]))
values = [1, 2, 3, [4]]
Thread(target=T(values)).start()
for _ in range(3):
print(f'{values = }')
sleep(1)
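One possible fix, sketched here with copy.deepcopy (and a daemon thread, only so that the snippet terminates):
from copy import deepcopy
from threading import Thread
from time import sleep
class T:
    def __init__(self, values):
        self.values = deepcopy(values)  # deep copy: the inner list is no longer shared
    def __call__(self):
        while True:
            sleep(1)
            self.values[-1].append(sum(self.values[-1]))
values = [1, 2, 3, [4]]
Thread(target=T(values), daemon=True).start()
for _ in range(3):
    print(f'{values = }')  # the caller’s data is no longer mutated at a distance
    sleep(1)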
Note that just as there is a distinction between a “deep” and a “shallow” copy, we can make a distinction between a “shallowly” and “deeply” immutable structure.
t = 'abc', [0, 1, 2]
print(f'before {t = }')
t[-1].append(3)
print(f'after {t = }')
Alternatively, we could design around immutable data structures, using mechanisms
such as a collections.namedtuple or dataclasses.dataclass. This can help
us ensure that we do not inadvertently mutate data non-locally. Of course, we
will still have to be careful if these structures are only “shallowly” immutable.
from collections import namedtuple
from dataclasses import dataclass
@dataclass(frozen=True)
class T:
value : int
obj = T(123)
T = namedtuple('T', 'value')
obj = T(123)
When we want to change our data, we will use mechanisms such as ._replace or
dataclasses.replace() to replace and copy the entities as a whole.
from collections import namedtuple
from dataclasses import dataclass, replace
@dataclass(frozen=True)
class T:
value : int
obj0 = obj1 = T(123)
obj2 = replace(obj0, value=obj0.value * 10)
print(f'{obj0 = } · {obj1 = } · {obj2 = }')
T = namedtuple('T', 'value')
obj0 = obj1 = T(123)
obj2 = obj0._replace(value=obj0.value * 10)
print(f'{obj0 = } · {obj1 = } · {obj2 = }')
Note that we can keep references to the parts of the data that did not change, and we can rely on the Python garbage collector to keep those references alive only as long as they are needed. As a consequence, we may not necessarily see significantly increased memory usage from these copies.
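For instance, a small sketch of this structural sharing (the field names here are purely illustrative):
from dataclasses import dataclass, replace
@dataclass(frozen=True)
class T:
    big   : tuple
    value : int
obj0 = T(tuple(range(1_000_000)), 123)
obj1 = replace(obj0, value=456)
# only the outer instance is copied; the large, unchanged field is shared
assert obj0 is not obj1 and obj0.big is obj1.big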
We can use other tricks, like a collections.ChainMap, to reduce the amount
of copied information (though at the loss of functionality, such as the ability
to del an entry.)
from dataclasses import dataclass, replace, field
from collections import ChainMap
from random import Random
from string import ascii_lowercase
@dataclass(frozen=True)
class T:
values : ChainMap[dict[str, int]] = field(default_factory=ChainMap)
def __call__(self, *, random_state=None):
rnd = random_state if random_state is not None else Random()
new_entries = {
''.join(rnd.choices(ascii_lowercase, k=4)): rnd.randint(-100, +100)
for _ in range(10)
}
return replace(self, values=ChainMap(new_entries, self.values))
def __getitem__(self, key):
return self.values[key]
rnd = Random(0)
obj = T()
for _ in range(3):
obj = obj(random_state=rnd)
print(
f'{obj = }',
f'{obj["fudo"] = }',
sep='\n{}\n'.format('\N{box drawings light horizontal}' * 40),
)
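For instance, a minimal sketch of the del limitation mentioned above: deletions on a ChainMap only ever affect the first mapping.
from collections import ChainMap
base = {'abc': 123}
layer = {'def': 456}
live = ChainMap(layer, base)
del live['def']      # fine: the key lives in the first mapping
try:
    del live['abc']  # the key lives in a lower mapping → KeyError
except KeyError as e:
    print(f'{e = }')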
However, some very useful parts of Python are inherently mutable. For example,
a generator or generator coroutine cannot be copied—at most, we can tee them,
and that may not even necessarily work or be meaningful. (Of course, for many
generators and generator coroutines, mutations may not be particularly
problematic.)
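For instance (a small sketch), copy.deepcopy on a generator fails outright, and itertools.tee merely buffers values from the same underlying generator rather than copying its suspended state:
from copy import deepcopy
from itertools import count, tee
g = (x ** 2 for x in count())
try:
    deepcopy(g)
except TypeError as e:
    print(f'{e = }')
# after tee-ing, we should no longer advance `g` directly
g0, g1 = tee(g)
print(f'{next(g0) = }', f'{next(g1) = }', sep='\n')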
Additionally, with a strictly immutable design, we have to be very clear about how the parts of our code share state. If we do not design two parts of our code to share state upfront, we may later discover that it is very disruptive to thread that state through later.
from dataclasses import dataclass
from functools import wraps
from itertools import count
from threading import Thread
from time import sleep
from typing import Iterator
@dataclass
class T:
it : Iterator
def __call__(self):
while True:
next(self.it)
# self.it.send(True)
sleep(1)
@lambda coro: wraps(coro)(lambda *a, **kw: [ci := coro(*a, **kw), next(ci)][0])
def resettable_count(start=0):
while True:
for state in count():
if (reset := (yield start + state)):
break
# from inspect import currentframe, getouterframes
# print(f'{getouterframes(currentframe())[1].lineno = }')
rc = resettable_count(start=100)
print(f'{rc.send(True) = }')
print(f'{next(rc) = }')
Thread(target=(obj := T(rc))).start()
print(f'{next(rc) = }')
print(f'{next(rc) = }')
print(f'{rc.send(True) = }')
print(f'{next(rc) = }')
How can I use this to improve my code?
print("Let's take a look!")
We know that the keys of a dict and the elements of a set must be hashable.
# hashable → ok ✓
d = {'a': ..., 'b': ..., 'c': ...}
s = {'a', 'b', 'c'}
# hashable → ok ✓
d = {('a', 'b', 'c'): ...}
s = {('a', 'b', 'c')}
# not hashable → not ok ✗
# d = {['a', 'b', 'c']: ...}
# s = {['a', 'b', 'c']}
This leads to clumsiness such as not being able to model set[set]—“sets of
sets.” Since set is not hashable, we cannot create a set that contains
another set. However, we can create set[frozenset]—“sets of frozensets.”
# s = {{'a', 'b', 'c'}} # not ok ✗
s = {frozenset({'a', 'b', 'c'})} # ok ✓
Similarly, the keys of a dict can be frozenset but not set.
d = {
frozenset({'a', 'b', 'c'}): ...
}
d[frozenset({'a', 'b', 'c'})]
This may be useful in cases where we want a compound key that has unique components where order does not matter.
d = {
'a,b,c': ...,
}
print(f"{d['a,b,c'] = }")
for k in d:
k.split(',')
d = {
('a', 'b', 'c', 'd,e'): ...
}
print(f"{d['a', 'b', 'c', 'd,e'] = }")
for x, y, z, w in d:
pass
d = {
frozenset({'a', 'b', 'c'}): ...
}
print(f"{d[frozenset({'a', 'b', 'c'})] = }")
print(f"{d[frozenset({'c', 'b', 'a'})] = }")
for k in d:
pass
Naïvely, we may assume that the difference between set and frozenset that
leads to frozenset being hashable is immutability. We may naïvely (and incorrectly)
assert that hashability implies immutability (and vice-versa.)
In fact, for many of the common built-in types, we will see that those that are immutable are hashable and those that are mutable are not hashable.
xs = [1, 2, 3] # `list` mutable; not hashable
s = {1, 2, 3} # `set` mutable; not hashable
d = {'a': 1} # `dict` mutable; not hashable
t = 'a', 1 # `tuple` immutable; is hashable
s = frozenset({1, 2, 3}) # `frozenset` immutable; is hashable
x = 12, 3.4, 5+6j, False # `int`, `float`, `complex`, `bool` immutable; is hashable
x = 'abc', b'def' # `str`, `bytes` immutable; is hashable
x = range(10) # `range` immutable; is hashable
When we discover that slice is immutable but not hashable, we may chalk this
up to a corner-case driven by syntactical ambiguity. (In fact, in later versions
slice becomes hashable.)
x = slice(None)
# x.start = 0 # AttributeError
hash(x) # TypeError
It may be ambiguous to support __getitem__ with a slice, since you cannot distinguish
between a single-item lookup where that item is a slice and a multi-item
sliced lookup. In the case of the builtin dict (which does not support multi-item
lookup,) this isn’t much of a problem; however, note that pandas.Series.loc supports
both modalities.
d = {
slice(None): ...
}
print(
f'{d[slice(None)] = }',
f'{d[:] = }',
sep='\n',
)
from pandas import Series
s = Series({
None: ...,
slice(None): ...,
})
print(
f'{s.loc[None]}',
f'{s.loc[slice(None)]}',
f'{s.loc[:]}',
sep='\n{}\n'.format('\N{box drawings light horizontal}' * 40),
)
Additionally, since we can implement __hash__, we can create mutable objects
that are hashable. Again, we may assume that this does not materially affect
the relationship between hashability and immutability.
from dataclasses import dataclass
@dataclass
class T:
value : list[int]
def __hash__(self):
return hash(id(self))
obj = T([1, 2, 3])
print(f'{hash(obj) = :#_x}')
obj.value.append(4)
print(f'{hash(obj) = :#_x}')
However, if we consider more deeply the relationship between the two, we will discover the true nature of mutability and hashability.
Let’s consider two different ways to compute the hash of a piece of data:
class Base:
def __init__(self, value):
self.value = value
def __repr__(self):
return f'T({self.value!r})'
class T0(Base):
def __hash__(self):
return hash(id(self))
class T1(Base):
def __hash__(self):
return hash(self.value)
obj0, obj1 = T0(123), T1(123)
print(
f'{hash(obj0) = }',
f'{hash(obj1) = }',
sep='\n',
)
Note that the hash when computed on identity changes across runs. In general,
since the underlying mechanism of hash is an internal implementation detail,
hash values may readily change across versions of Python.
from random import Random
rnd = Random(0)
x = (
rnd.random(),
rnd.random(),
)
print(
f'x = { x }',
f'hash(x) = {hash(x)}',
sep='\n',
)
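Note also that str hashes are randomised per interpreter process (see PYTHONHASHSEED); a small sketch, assuming a python3 executable is available on the PATH:
from subprocess import run
cmd = ['python3', '-c', "print(hash('abc'))"]
print(
    run(cmd, capture_output=True, text=True).stdout.strip(),
    run(cmd, capture_output=True, text=True).stdout.strip(),
    sep='\n',
)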
Assume that the value is immutable. If we were to compute the hash based on
identity, then we might accidentally “lose” an object in a dict.
from dataclasses import dataclass
@dataclass(frozen=True)
class T:
value : int
def __hash__(self):
return hash(id(self))
def f(d):
d[obj := T(123)] = ...
d = {}
f(d)
# d[T(123)] # KeyError
for k in d:
print(f'{k = }')
Therefore, we must hash immutable objects based on their value (on equality.) This is a matter of practicality.
Assume that the value is mutable. If we were to compute the hash based on
equality, then we might accidentally “lose” an object in a dict.
from dataclasses import dataclass
@dataclass
class T:
value : int
def __hash__(self):
return hash(self.value)
d = {}
d[obj := T(123)] = ...
obj.value = 456
# d[obj] # KeyError
# d[T(123)] # KeyError
# d[T(456)] # KeyError
# for k in d: print(f'{k = }')
This is a serious problem, because the hash that was used to determine the
location of the entry in the dict is no longer accurate. There will be
no way to retrieve the value via __getitem__!
Therefore, we must hash mutable objects based on their identity. However,
we still have the problem of “losing” a value in the dict if we hash on
identity.
Except the value is still in the dict; we simply cannot access it via
__getitem__. We can still iterate over dict in both cases!
from dataclasses import dataclass
@dataclass
class T:
value : int
def __hash__(self):
return hash(self.value)
d = {}
d[obj := T(123)] = ...
obj.value = 456
for k in d:
print(f'{k = }')
We may then extend our understanding of this topic as follows: immutable objects must be hashed on value to support direct retrieval with equivalent objects; mutable objects must be hashed on identity and cannot support direct retrieval. In other words, hashability implies immutability if-and-only-if we need direct (or “non-intermediated” access.)
In fact, it is relatively common to see hashed mutable objects. Consider the use
of a networkx.DiGraph with a custom, rich node type. (Our Node class must
be hashable, since the networkx.DiGraph is implemented as a “dict of dict of dicts.”)
from dataclasses import dataclass
from itertools import pairwise
from networkx import DiGraph
@dataclass
class Node:
name : str
value : int = 0
def __hash__(self):
return hash(id(self))
nodes = [Node('a'), Node('b'), Node('c')]
g = DiGraph()
g.add_edges_from(pairwise(nodes))
for n in nodes:
n.value += 1
for n in g.nodes:
...
Consider, however, that all access to the nodes of the networkx.DiGraph will
likely be intermediated by calls such as .nodes that allow us to iterate over
all of the nodes. We may also subclass networkx.DiGraph to allow direct access
to nodes by name, further intermediating between the __getitem__ syntax and
the hash-lookup mechanism.
from dataclasses import dataclass
from itertools import pairwise
from networkx import DiGraph
@dataclass
class Node:
name : str
value : int = 0
def __hash__(self):
return hash(id(self))
nodes = [Node('a'), Node('b'), Node('c')]
class MyDiGraph(DiGraph):
class by_name_meta(type):
def __get__(self, instance, owner):
return self(instance)
@dataclass
class by_name(metaclass=by_name_meta):
instance : 'MyDiGraph'
def __getitem__(self, key):
nodes_by_name = {k.name: k for k in self.instance.nodes}
return nodes_by_name[key]
g = MyDiGraph()
g.add_edges_from(pairwise(nodes))
for n in nodes:
n.value += 1
print(f"{g.by_name['a'] = }")
Note that it is not a good idea to store object id(…)s in structures, since
(in CPython) the memory addresses for these objects (and their corresponding
id(…) values) may be reüsed. However, over the lifetime of an object, its
id(…) will not change, so it is safe to store the id(…) if the lifetime
of this storage is tied to the lifetime of the object. This will be the case
with hashing an object on id(…) and putting it into a set or dict. While
the __hash__(…) will be implicitly stored and is a dependent value of id(…),
the lifetime of that storage will necessarily match the lifetime of the object itself.
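A minimal sketch of that lifetime argument:
class T:
    def __hash__(self):
        return hash(id(self))
d = {}
obj = T()
d[obj] = ...
del obj      # our name goes away…
(key,) = d   # …but the dict still holds a reference to the very same object,
print(f'{key in d = }')  # so its id(…)-derived hash cannot go stale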
Furthermore, the hash is used only to find the approximate location of the entry
in the set or dict. Since hash values are finite (in CPython, constrained to the
valid range of Py_hash_t values, where Py_hash_t is typedef’d to Py_ssize_t, which
is generally typedef’d to ssize_t,) then by the “pigeonhole principle,” multiple
distinct objects must share the same hash. Therefore, after performing any necessary
additional “probing,” the set or dict will perform an == comparison to confirm
that it has found the right item. This further ensures that computing __hash__ on id(…)
won’t lead to stale entries.
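A small sketch of that confirmation step: even if every instance collides on the same hash value, the == comparison still disambiguates the entries.
class T:
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 0  # force every instance into the same “pigeonhole”
    def __eq__(self, other):
        return isinstance(other, T) and self.value == other.value
d = {T(1): 'one', T(2): 'two'}
print(f'{d[T(2)] = }')  # equality, not the hash, picks the right entry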
It also means that objects which are not equivalent to themselves trivially get
lost in dicts! For example, float('nan') can be the key of a dict, but you
will not be able to later retrieve the value via direct __getitem__!
d = {
float('nan'): ...,
}
d[float('nan')] # KeyError
How does this affect my design?
print("Let's take a look!")
In Python, we have first-class functions: functions can be treated like any other data. For example, we can put functions into data structures.
def f(x, y):
return x + y
def g(x, y):
return x * y
for f in [f, g]:
print(f'{f(123, 456) = :,}')
break
for rv in [f(123, 456), g(123, 456)]:
print(f'{rv = :,}')
break
We can also dynamically define new functions at runtime.
def f():
def g(x):
return x ** 2
return g
g = f()
print(f'{g(123) = :,}')
Often, we may use lambda syntax if those functions are short (consisting
of a single expression with no use of the ‘statement grammar.’)
for f in [lambda x, y: x + y, lambda x, y: x * y]:
print(f'{f(123, 456) = :,}')
def f():
return lambda x: x ** 2
g = f()
print(f'{g(123) = :,}')
We know that these functions are being defined dynamically, because every definition creates a new, distinct version of that function.
def f():
def g(x):
return x ** 2
return g
g0, g1 = f(), f()
print(
f'{g0(123) = :,}',
f'{g1(123) = :,}',
f'{g0 is not g1 = }',
sep='\n',
)
Note that, in Python, we cannot meaningfully compare functions for equality: == on functions is just an identity comparison, irrespective of their names or code.
def f(x, y):
return x + y
def g(x, y):
return x + y
print(f'{f == g = }')
print(f'{f.__name__ == g.__name__ = }')
print(f'{f.__code__.co_code == g.__code__.co_code = }')
# funcs = {*()}
# for _ in range(3):
# def f(x, y):
# return x + y
# funcs.add(f)
# for _ in range(3):
# def f(x, y):
# return x + y
# print(f'{funcs = }')
When we dynamically define functions in Python, a function object is created that consists of the function’s name (whether anonymous or not,) its docstring (if provided,) its default values, its code object, and any non-local, non-global data it needs to operate (its closure.)
def f(x, ys=[123, 456]):
'''
adds x to each value in ys
'''
return [x + y for y in ys]
from dis import dis
dis(f)
print(
# f'{f.__name__ = }',
# f'{f.__doc__ = }',
# f'{f.__code__ = }',
# f'{f.__code__.co_code = }',
# f'{f.__defaults__ = }',
# f'{f.__closure__ = }',
sep='\n',
)
Note that the defaults are created when the function is defined; this is why when we have “mutable default arguments,” there is only one copy of these defaults that is reüsed across all invocations of the function.
def f(xs=[123, 456]):
xs.append(len(xs) + 1)
return xs
print(
f'{f() = }',
f'{f() = }',
f'{f.__defaults__ = }',
f'{f() is f() = }',
sep='\n',
)
When the bytecode for a function is created, the Python compiler performs
scope-determination. In order to generate the bytecodes for local variable
access (LOAD_FAST,) for global variable access (LOAD_GLOBAL,) or for
closure variable access (LOAD_DEREF,) the Python compiler statically determines
the scope of any variables that are used.
from dis import dis
def f():
return x
# dis(f)
def f(x):
# import x
return x
dis(f)
def f(x):
def g():
nonlocal x
x += 1
return x
return g
dis(f(...))
For variables that are neither local nor global but instead in the “enclosing
environment,” we generate a LOAD_DEREF bytecode for access and capture a
reference to that variable.
def f(x):
def g():
return x
return g
xs = [1, 2, 3]
g = f(xs)
print(
f'{g.__closure__ = }',
f'{g.__closure__[0] = }',
f'{g.__closure__[0].cell_contents = }',
f'{g.__closure__[0].cell_contents is xs = }',
sep='\n',
)
It is not a coïncidence that this is reminiscent of object orientation in Python. Just as an object “encapsulates” some (hidden) state and some behaviour that operates on such state, a dynamically defined function “closes over” some state that it can operate on.
class T:
def __init__(self, state):
self.state = state
def __call__(self):
self.state += 1
return self.state
def __repr__(self):
return f'T({self.state!r})'
obj = T(123)
print(
# f'{obj = }',
f'{obj() = }',
f'{obj() = }',
f'{obj() = }',
sep='\n',
)
def create_obj(state):
def f():
nonlocal state
state += 1
return state
return f
obj = create_obj(123)
print(
f'{obj = }',
f'{obj() = }',
f'{obj() = }',
f'{obj() = }',
sep='\n',
)
In fact, we can see the correspondence quite clearly when we look at what sits underneath.
class T:
def __init__(self, state):
self.state = state
def __call__(self):
self.state += 1
return self.state
def __repr__(self):
return f'T({self.state!r})'
def create_obj(state):
def f():
nonlocal state
state += 1
return state
return f
obj0 = T(123)
obj1 = create_obj(123)
print(
f'{obj0.__dict__ = }',
f'{obj1.__closure__ = }',
f"{obj0.__dict__['state'] = }",
f'{obj1.__closure__[0].cell_contents = }',
sep='\n',
)
This tells us that an object created with the class keyword and a dynamically
defined function created with a closure are two ways to accomplish the same goal
of encapsulation.
When we create an instance of a generator coroutine, it maintains its local state in between iterations.
def coro(state):
while True:
state += 1
yield state
ci = coro(123)
print(
f'{next(ci) = }',
f'{next(ci) = }',
f'{next(ci) = }',
f'{next(ci) = }',
f'{ci.gi_frame.f_locals = }',
sep='\n',
)
Indeed, this appears to be yet another way to accomplish the same goal!
class T:
def __init__(self, state):
self.state = state
def __call__(self):
self.state += 1
return self.state
def __repr__(self):
return f'T({self.state!r})'
def f(state):
def g():
nonlocal state
state += 1
return state
return g
def coro(state):
while True:
state += 1
yield state
obj0 = T(123)
obj1 = f(123)
obj2 = coro(123).__next__
print(
# f'{obj0() = } {obj0() = } {obj0() = }',
# f'{obj1() = } {obj1() = } {obj1() = }',
# f'{obj2() = } {obj2() = } {obj2() = }',
# f'{obj0.__dict__ = }',
# f'{obj1.__closure__ = }',
# f'{obj2.__self__.gi_frame.f_locals = }',
f"{obj0.__dict__['state'] = }",
f'{obj1.__closure__[0].cell_contents = }',
f"{obj2.__self__.gi_frame.f_locals['state'] = }",
sep='\n',
)
Facing three ways to accomplish the same goal, which do we choose?
If it makes sense for someone to be able to dig around in the internal
details of the object, then maybe we should choose class.
class T:
def __init__(self, state):
self.state = state
def __call__(self):
self.state += 1
return self.state
def __repr__(self):
return f'T({self.state!r})'
def __dir__(self):
return ['state']
obj = T(123)
print(
f'{obj = }',
f'{dir(obj) = }',
sep='\n',
)
def f(state):
def g():
nonlocal state
state += 1
return state
return g
obj = f(123)
print(
f'{obj = }',
f'{dir(obj) = }',
f'{obj.__closure__ = }',
sep='\n',
)
If it makes sense for the object to support multiple named methods, then class
is probably less clumsy.
class T:
def __init__(self, state):
self.state = state
def inc(self):
self.state += 1
return self.state
def dec(self):
self.state -= 1
return self.state
def __repr__(self):
return f'T({self.state!r})'
obj = T(123)
print(
f'{dir(obj) = }',
# f'{obj.inc() = }',
# f'{obj.dec() = }',
sep='\n',
)
from collections import namedtuple
def f(state):
def inc():
nonlocal state
state += 1
return state
def dec():
nonlocal state
state -= 1
return state
# return inc, dec
return namedtuple('T', 'inc dec')(inc, dec)
# obj = f(123)
# print(
# # f'{dir(obj) = }',
# f'{obj[0]() = }',
# f'{obj[1]() = }',
# sep='\n',
# )
inc, dec = f(123)
print(
f'{inc() = }',
f'{dec() = }',
sep='\n',
)
# obj = f(123)
# print(
# f'{obj.inc() = }',
# f'{obj.dec() = }',
# sep='\n',
# )
If we need to implement any other parts of the Python vocabulary, then we must
write class (or use some boilerplate elimination tool like contextlib.contextmanager.)
class T:
def __init__(self, state):
self.state = state
def __call__(self, value):
self.state.append(value)
def __len__(self):
return len(self.state)
def __getitem__(self, idx):
return self.state[idx]
def __repr__(self):
return f'T({self.state!r})'
obj = T([1, 2, 3])
obj(4)
print(
f'{len(obj) = }',
f'{obj[0] = }',
sep='\n',
)
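For instance, a sketch of the contextlib.contextmanager case mentioned above: the generator form stands in for a class implementing __enter__ and __exit__.
from contextlib import contextmanager
class Managed:
    def __enter__(self):
        print('enter')
        return self
    def __exit__(self, *exc):
        print('exit')
@contextmanager
def managed():
    print('enter')
    try:
        yield
    finally:
        print('exit')
with Managed(): pass
with managed(): pass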
If we want to “hide” data from our users to limit them in some antagonistic or coërcive way, we should not expect the closure to add anything but a few easily circumvented steps.
def f(state):
def g():
nonlocal state
state += 1
return state
return g
g = f(123)
g.__closure__[0].cell_contents = 456
print(
f'{g() = }',
f'{g.__closure__[0].cell_contents = }',
sep='\n',
)
This is not too dissimilar from our guidance around @property.
class T:
def __init__(self, x):
self._x = x
@property
def x(self):
return self._x
def __repr__(self):
return f'T({self._x})'
obj = T(123)
# obj.x = ...
obj._x = ...
No matter how deeply we try to hide some data, it’s only a few dirs away.
def f(x):
class T:
@property
def x(self):
return x
def __repr__(self):
return f'T({x!r})'
return T()
obj = f(123)
print(
f'{obj.x = }',
f'{type(obj).x.fget.__closure__[0].cell_contents = }',
sep='\n',
)
If we want to non-antagonistically reduce clutter or noise, we may choose to use a closure.
class T:
def __init__(self, state):
self.state = state
def __call__(self):
self.state += 1
return self.state
def __repr__(self):
return f'T({self.state!r})'
def f(state):
def g():
nonlocal state
state += 1
return state
return g
obj0 = T(123)
obj1 = f(123)
print(
f'{obj0 = }',
f'{obj1 = }',
f'{obj0() = }',
f'{obj1() = }',
sep='\n',
)
If we have a heterogeneous computation, we generally do not want a generator coroutine if the computation will be triggered manually.
from dataclasses import dataclass
@dataclass
class State:
a : int = None
b : int = None
c : int = None
class T:
def __init__(self, state : State = None):
self.state = state if state is not None else State()
def f(self, value):
self.state.a = value
return self.state
def g(self, value):
self.state.b = self.state.a + value
return self.state
def h(self, value):
self.state.c = self.state.b + value
return self.state
obj = T()
print(
f'{obj.f(123) = }',
f'{obj.g(456) = }',
f'{obj.h(789) = }',
sep='\n',
)
from dataclasses import dataclass
@dataclass
class State:
a : int = None
b : int = None
c : int = None
def coro(state : State = None):
state = state if state is not None else State()
state.a = yield ...
state.b = (yield state) + state.a
state.c = (yield state) + state.b
yield state
obj = coro(); next(obj)
print(
f'{obj.send(123) = }',
...
f'{obj.send(456) = }', # ???
...
...
...
f'{obj.send(789) = }',
sep='\n',
)
If we have a single, homogeneous decomposition of a computation, we may
find a generator coroutine is less conceptual overhead than a class-style
object.
def coro():
while True:
_ = yield
ci = coro()
print(
# f'{dir(ci) = }',
f'{next(ci) = }',
f'{ci.send(...) = }',
# f'{ci.throw(Exception()) = }',
# f'{ci.close() = }',
sep='\n',
)
In fact, we may find that pumped generator coroutines with __call__-interface
unification give us an extremely simple API we can present to our users.
from functools import wraps
def f(x):
pass
def g():
def f(x):
pass
return f
class T:
def __call__(self, x):
pass
@lambda coro: wraps(coro)(lambda *a, **kw: [ci := coro(*a, **kw), next(ci), ci.send][-1])
def coro():
while True:
_ = yield
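As a usage sketch: a plain function, a closure, a class instance, and a pumped generator coroutine can now all be driven with exactly the same call syntax.
for callable_ in (f, g(), T(), coro()):
    callable_(123)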
How does this affect usability?
print("Let's take a look!")
If we have a base class, we can inherit from the base class in a derived class.
If the base class later adds methods, the derived class automatically sees
those methods. If the derived class wants to customise the behaviour, it can
do so and use super() to refer to the base class’s implementation.
class Base:
def f(self):
pass
class Derived(Base):
# pass
def f(self):
return super().f()
Base.g = lambda self: None
obj = Derived()
obj.f()
obj.g()
Note that when we inherit, we inherit both methods and the metaclass.
class BaseMeta(type):
def __new__(cls, name, bases, body):
print(f'BaseMeta.__new__({cls!r}, {name!r}, {bases!r}, {body!r})')
return super().__new__(cls, name, bases, body)
class Base(metaclass=BaseMeta):
pass
class Derived(Base):
pass
assert type(Base) is type(Derived)
If we have a class, we can pass an instance of that class to serve as a constituent of another class. This is called composition. With composition, if the constituent later adds methods, we must explicitly expose those. If the composed class wants to customise the behaviour, it can do so as it sees fit.
class Component:
def f(self):
pass
class Composed:
def __init__(self, component=None):
self.component = component if component is not None else Component()
def f(self):
return self.component.f()
def g(self):
return self.component.g()
Component.g = lambda self: None
obj = Composed()
obj.f()
# obj.g()
In the case of inheritance, we also get a default implementation of the
__instancecheck__/__subclasscheck__ (isinstance/issubclass) protocol.
class Base: pass
class Derived(Base): pass
obj = Derived()
assert isinstance(obj, Derived)
assert isinstance(obj, Base)
assert issubclass(Derived, Base)
However, we can implement this protocol as we see fit.
class TMeta(type):
def __instancecheck__(self, instance):
return True
class T(metaclass=TMeta):
pass
print(
f'{isinstance(123, T) = }',
)
Note that there is an inherent directionality to this implementation.
class TMeta(type):
def __instancecheck__(self, instance):
return True
class T(metaclass=TMeta):
pass
obj = T()
print(
f'{isinstance(obj, int) = }',
)
In the case of isinstance(…, int), we must subclass from int.
class T(int):
pass
obj = T()
print(
f'{isinstance(obj, int) = }',
)
In some (very limited) cases, we can patch __class__, but this probably
won’t work in general.
class T0:
pass
class T1:
pass
obj = T1()
assert not isinstance(obj, T0)
class T0Meta(T0.__class__):
def __instancecheck__(self, instance):
return True
T0.__class__ = T0Meta
assert isinstance(obj, T0)
Note that __class__ patching is much easier on regular class instances.
class A:
def f(self):
pass
class B:
def g(self):
pass
obj = A()
obj.f()
obj.__class__ = B
obj.g()
There are other options than just inheritance or composition, such as “object construction.”
def f(self): pass
def g(self): pass
methods = {
'f': f,
'g': g,
}
class A:
# locals().update(methods)
f = f
g = g
class B:
locals().update(methods)
obj0, obj1 = A(), B()
obj0.f(); obj0.g()
obj1.f(); obj1.g()
In fact, the “object construction” approach looks very similar to how we might use a class decorator.
def dec(cls):
cls.f = lambda self: None
cls.g = lambda self: None
return cls
@dec
class T:
pass
obj = T()
obj.f()
obj.g()
We may prefer the use of a class decorator over inheritance in cases where we want to have a very “light touch.”
In fact, given that inheritance is often about creating a categorisation, and categorisation schemes are intimately related to use-case, we may often prefer to avoid inheritance in a library if the library is not the centre of attention for a given system.
from networkx import DiGraph
class MyDiGraph(DiGraph):
pass
class MiDiGraph:
def __init__(self, g):
self.g = g
def nodes(self):
...
def edges(self):
...
class T:
def f(self):
pass
def g(self):
pass
class Composed:
def __init__(self, component):
self.component = component
def f(self):
...
# Cow, Pig, Chicken
# vet
class Mammal: pass
class Cow(Mammal): pass
class Pig(Mammal): pass
# sommelier
class WhiteMeat: pass
class Chicken(WhiteMeat): pass
class Pig(WhiteMeat): pass
Which should I pick?
print("Let's take a look!")
The Python object model is very “mechanical,” and our understanding of many of the protocol methods may be little more than a reflection of this mechanical understanding.
For example, when instances are created, __new__ is called to create the instance
and __init__ is called afterwards to initialise it. This immediately gives us an indication
for when we may want to implement __new__ vs __init__.
class TMeta(type):
def __call__(cls):
obj = cls.__new__(cls) # ...
cls.__init__(obj) # ...
return obj
class T(metaclass=TMeta):
def __new__(cls):
return super().__new__(cls)
def __init__(self):
pass
obj = T()
print(f'{obj = }')
For other protocol methods, we may need to dig a bit deeper to discover
the underlying meaning. For example, __repr__ is documented as the “official,” unambiguous representation
of an object, and __str__ as the “informal,” human-readable printable representation.
However, when we consider that __str__ is triggered by str(…), we can derive
an alternate meaning for __str__: it is the data represented in the form of a str.
from dataclasses import dataclass
from enum import Enum
class Op(Enum):
Eq = '='
Lt = '<'
Gt = '>'
...
def __str__(self):
return self.value
@dataclass
class Where:
column : str
op : Op
value : ...
def __str__(self):
return f'{self.column} {self.op} {self.value}'
@dataclass
class Select:
exprs : list[str]
table : str
where : Where | None
def __str__(self):
where = f' where {self.where}' if self.where else ''
return f'select {", ".join(self.exprs)} from {self.table}{where}'
stmt = Select(
['name', 'value'],
'data',
Where('value', Op.Gt, 0),
)
from pathlib import Path
d = Path('/tmp')
print(
# f'{str(d) = }',
# f'{str(123.456) = }',
# f'{repr(stmt) = }',
# f'{str(stmt) = }',
sep='\n',
)
Some protocol methods are misleading. For example, it may appear that
__hash__ means “a pigeonholed identifier,” but its meaning is far narrower.
from dataclasses import dataclass
@dataclass
class T:
value : ...
def __hash__(self):
return hash(self.value)
obj = T((1, 2, 3))
print(
f'hash(obj) = {hash(obj)}',
)
For some protocol methods, we need to pay close attention to the implementation
rules. For example, __len__ means the “size” of an object, and that size
must be a non-negative integer.
class T:
def __len__(self):
# return -2
# return 2.5
return 2
obj = T()
print(f'{len(obj) = }')
Sometimes, there is disagreement about the implicit rules of implementation.
python -m pip install numpy
class T:
def __bool__(self):
raise ValueError(...)
# return ...
bool(T())
from enum import Enum
from numpy import array
class Directions(Enum):
North = array([+1, 0])
South = array([-1, 0])
East = array([ 0, +1])
West = array([ 0, -1])
print(
array([0, 0]) + Directions.North * 2
)
In fact, even PEP-8 makes this mistake:
xs = [1, 2, 3]
if len(xs) > 0: pass
if not len(xs): pass
if xs: pass # preferred
from numpy import array
xs = array([1, 2, 3])
if len(xs) > 0: pass
if not len(xs): pass
if xs.size > 0: pass
# if xs: pass # ValueError
As we can see __bool__ should return True or False but, in the case of
a numpy.ndarray or pandas.Series, instead raises a ValueError.
python -m pip install numpy pandas
from numpy import array
from pandas import Series
xs = array([1, 2, 3])
s = Series([1, 2, 3])
print(
# f'{bool(xs) = }',
# f'{bool(s) = }',
sep='\n',
)
Of course, in the PEP-8 example, this isn’t altogether that meaningful of a problem.
from numpy import array
xs = [1, 2, 3]
xs.append(4)
xs.clear()
if not xs:
pass
for x in xs:
pass
xs = array([1, 2, 3])
xs = xs[xs > 10]
if not xs:
pass
Note that the entire reason we are choosing to interact with the Python “vocabulary” is to be able to write code that is obvious to the reader.
...
...
...
...
# try:
v = obj[k]
# except LookupError:
# pass
...
...
...
...
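For example, one piece of that shared vocabulary is that a failed []-lookup signals a LookupError: KeyError for mappings, IndexError for sequences, so generic code can catch either uniformly. A brief sketch (an assumed example, not from the original):
d = {'a': 1}
xs = [10, 20, 30]
for obj, k in [(d, 'missing'), (xs, 99)]:
    try:
        v = obj[k]
    except LookupError as e:  # the common base class of KeyError and IndexError
        print(f'{type(obj).__name__}[{k!r}] raised {type(e).__name__}')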
This means that when we implement data model methods, we should implement them only where their meaning is unambiguous. This suggests that the implementation of these methods should be to support a singular, unique, or privileged operation.
from pandas import Series, date_range
s = Series([10, 200, 3_000], index=date_range('2020-01-01', periods=3))
print(
    s[2], # positional
    s[:'2020-01-01'], # label
sep='\n',
)
from pandas import Series
s = Series([10, 200, 3_000], index=[0, 1, 2])
print(
s.loc[0],
s.loc[:1],
s.iloc[0],
s.iloc[:1],
sep='\n',
)
Similarly, consider len on a pandas.DataFrame.
from pandas import DataFrame, date_range
from numpy.random import default_rng
rng = default_rng(0)
df = DataFrame(
index=(idx := date_range('2020-01-01', periods=3)),
data={
'a': rng.normal(size=len(idx)),
'b': rng.integers(-10, +10, size=len(idx)),
},
)
for x in df.columns:
print(f'{x = }')
print(
df,
# f'{len(df) = }',
# f'{len(df.index) = }',
# f'{len(df.columns) = }',
# f'{df.size = }',
# f'{df.shape = }',
sep='\n{}\n'.format('\N{box drawings light horizontal}' * 20),
)
Where we break this intuition, we can see how it can impede understandability.
For example, when reviewing code, what transformations are safe? If we rely
on assumptions about how __getitem__ typically works, a transformation such as
the one below should be fine:
from dataclasses import dataclass, field
from random import Random
@dataclass
class T(dict):
random_state : Random = field(default_factory=Random)
def __missing__(self, k):
return self.random_state.random()
def f(x, y): pass
def g(x): pass
obj = T(random_state=Random(0))
k = ...
# f(obj[k], g(obj[k]))
v = obj[k]
f(v, g(v))
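Here, though, that assumption does not hold: every lookup of a missing key draws a fresh random number, so the rewritten form calls __missing__ once rather than twice. A quick check (a sketch, reusing the class above):
obj = T(random_state=Random(0))
print(f'{obj[...] == obj[...] = }')  # False: each missing-key lookup produces a new value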
However, consider dict.__or__, which breaks a mathematical assumption of
commutativity. Does this impede understandability?
d0 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
d1 = { 'c': 30, 'd': 40}
print(
f'{d0 | d1 = }',
f'{d1 | d0 = }',
sep='\n',
)
Of course…
s0 = {True}
s1 = {1}
print(
f'{s0 | s1 = }',
f'{s1 | s0 = }',
f'{s0 == s1 = }',
sep='\n',
)
… and also…
s0 = 'abc'
s1 = 'def'
print(
f'{s0 + s1 = }',
f'{s1 + s0 = }',
sep='\n',
)
Does this make things more understandable?
print("Let's take a look!")
In order to actually make a class-style object in Python useful, we need to
write a lot of “boilerplate.”
class T:
def __init__(self, value):
self._value = value
@property
def value(self):
return self._value
def __hash__(self):
return hash(self.value)
def __eq__(self, other):
return self.value == other.value
def __repr__(self):
return f'T({self.value!r})'
obj0, obj1 = T(123), T(123)
print(
f'{obj0.value = }',
f'{obj1.value = }',
f'{obj0 == obj1 = }',
f'{({obj0, obj1}) = }',
sep='\n',
)
We can reduce this boilerplate in a couple of ways. One way is the use of a
collections.namedtuple:
from collections import namedtuple
T = namedtuple('T', 'value')
obj0, obj1 = T(123), T(123)
print(
f'{obj0.value = }',
f'{obj1.value = }',
f'{obj0 == obj1 = }',
f'{({obj0, obj1}) = }',
sep='\n',
)
Another option is a dataclasses.dataclass:
from dataclasses import dataclass
@dataclass(frozen=True)
class T:
value : int
obj0, obj1 = T(123), T(123)
print(
f'{obj0.value = }',
f'{obj1.value = }',
f'{obj0 == obj1 = }',
f'{({obj0, obj1}) = }',
sep='\n',
)
However, beyond just the reduction in lines-of-code, consider the “escalation pathway” that these tools provide:
entities = [
('abc', 123),
('def', 456),
('xyz', 789),
]
...
...
...
for ent in entities:
print(f'{ent[0].upper() = }', f'{ent[1] + 1 = }', sep='\N{middle dot}'.center(3))
for name, value in entities:
print(f'{name.upper() = }', f'{value + 1 = }', sep='\N{middle dot}'.center(3))
Using a list[tuple] is a very simple and quick way to start our programme,
but as our code grows, the poor ergonomics show themselves quickly.
It is at this point we may “graduate” the code to use a collections.namedtuple.
We may first create the new collections.namedtuple type:
from collections import namedtuple
Entity = namedtuple('Entity', 'name value')
Then we may apply it to our existing data:
from collections import namedtuple
Entity = namedtuple('Entity', 'name value')
entities = [
Entity('abc', 123),
Entity('def', 456),
Entity('xyz', 789),
]
for ent in entities:
print(f'{ent[0].upper() = }', f'{ent[1] + 1 = }', sep='\N{middle dot}'.center(3))
for name, value in entities:
print(f'{name.upper() = }', f'{value + 1 = }', sep='\N{middle dot}'.center(3))
Then we may rewrite any code that uses unpacking or indexing syntax to use
__getattr__ (named-lookup) syntax:
from collections import namedtuple
Entity = namedtuple('Entity', 'name value')
entities = [
Entity('abc', 123),
Entity('def', 456),
Entity('xyz', 789),
]
# for ent in entities:
# print(f'{ent[0].upper() = }', f'{ent[1] + 1 = }', sep='\N{middle dot}'.center(3))
# for name, value in entities:
# print(f'{name.upper() = }', f'{value + 1 = }', sep='\N{middle dot}'.center(3))
for ent in entities:
print(f'{ent.name.upper() = }', f'{ent.value + 1 = }', sep='\N{middle dot}'.center(3))
This allows us to add fields:
from collections import namedtuple
Entity = namedtuple('Entity', 'name value flag')
entities = [
Entity('abc', 123, True),
Entity('def', 456, False),
Entity('xyz', 789, True),
]
for ent in entities:
print(f'{ent.name.upper() = }', f'{ent.value + 1 = }', sep='\N{middle dot}'.center(3))
We may subclass the collections.namedtuple to support validation and defaults:
from collections import namedtuple
class Entity(namedtuple('EntityBase', 'name value flag')):
def __new__(cls, name, value, flag=False):
if value < 0:
raise ValueError('value should not be negative')
return super().__new__(cls, name, value, flag)
entities = [
Entity('abc', 123),
Entity('def', 456),
Entity('xyz', 789, flag=True),
]
for ent in entities:
print(
f'{ent.name.upper() = }',
f'{ent.value + 1 = }',
f'{ent.flag = }',
sep='\N{middle dot}'.center(3),
)
We may further raise this into a dataclasses.dataclass if we need to add
instance methods, to add additional protocols, to customise protocol
implementation, or to support mutability.
from dataclasses import dataclass
@dataclass
class Entity:
name : str
value : int
flag : bool = False
def __post_init__(self):
if self.value < 0:
raise ValueError('value should not be negative')
def __call__(self):
self.value += 1
def __eq__(self, other):
return self.name == other.name and self.value == other.value
entities = [
Entity('abc', 123),
Entity('def', 456),
Entity('xyz', 789, flag=True),
]
for ent in entities:
ent()
print(
f'{ent.name.upper() = }',
f'{ent.value + 1 = }',
f'{ent.flag = }',
sep='\N{middle dot}'.center(3),
)
Finally, we may rewrite it as a class-style object with all of the boilerplate.
class Entity:
def __init__(self, name, value, flag=False):
if value < 0:
raise ValueError('value should not be negative')
self.name, self.value, self.flag = name, value, flag
def __call__(self):
self.value += 1
def __eq__(self, other):
return self.name == other.name and self.value == other.value
def __repr__(self):
return f'Entity({self.name!r}, {self.value!r}, {self.flag!r})'
entities = [
Entity('abc', 123),
Entity('def', 456),
Entity('xyz', 789, flag=True),
]
for ent in entities:
ent()
print(
f'{ent.name.upper() = }',
f'{ent.value + 1 = }',
f'{ent.flag = }',
sep='\N{middle dot}'.center(3),
)
There are other boilerplate-elimination tools in the Python standard library.
For example, enum.Enum allows us to create enumerated types easily.
from enum import Enum
Choice = Enum('Choice', 'A B C')
print(
f'{Choice.A = }',
f'{Choice.B = }',
f'{Choice.C = }',
sep='\n',
)
functools.total_ordering allows us to implement comparison operators without
having to write them all out (assuming the object supports mathematical properties
associated with a total ordering.)
from dataclasses import dataclass
from functools import total_ordering
@total_ordering
@dataclass
class T:
value : int
def __eq__(self, other):
return self.value == other.value
def __lt__(self, other):
return self.value < other.value
# def __gt__(self, other):
# return self.value > other.value
# def __ne__(self, other):
# return self.value != other.value
    # def __le__(self, other):
    #     return self.value <= other.value
    # def __ge__(self, other):
    #     return self.value >= other.value
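Given only __eq__ and __lt__, the decorator derives the remaining comparisons for us; a quick check (a sketch, not part of the original):
a, b = T(1), T(2)
print(
    f'{a <= b = }',  # derived by functools.total_ordering
    f'{a > b = }',   # derived
    f'{a >= b = }',  # derived
    sep='\n',
)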
A contextlib.contextmanager allows us to situate a generator into the
contextmanager __enter__/__exit__ protocol.
class Context:
def __enter__(self):
print(f'T.__enter__')
    def __exit__(self, exc_type, exc_value, traceback):
print(f'T.__exit__')
with Context():
print('block')
from contextlib import contextmanager
@contextmanager
def context():
print(f'__enter__')
    try:
        yield
    finally:
        print(f'__exit__')
with context():
print('block')
How can eliminating it help me work faster?
print("Let's take a look!")
Python function definitions are executed at runtime.
def f():
pass
print(f'{f = }')
This is why we can conditionally define functions or define functions in other functions. In Python, we can treat functions like any other data.
from random import Random
from inspect import signature
rnd = Random(0)
if rnd.choice([True, False]):
def f(x, y):
return x + y
else:
def f(x):
return -x
print(
f'{f = }',
f'{signature(f) = }',
sep='\n'
)
from types import FunctionType
def f(): pass
f = FunctionType(
f.__code__,
f.__globals__,
name=f.__name__,
argdefs=f.__defaults__,
closure=f.__closure__,
)
print(
f'{f = }',
f'{f.__code__ = }',
f'{f.__globals__ = }',
f'{f.__defaults__ = }',
f'{f.__closure__ = }',
sep='\n'
)
def f(x): return x + 1
def g(x): return x * 2
def h(x): return x ** 3
for func in [f, g, h]:
print(f'{func(123) = :,}')
for rv in [f(123), g(123), h(123)]:
print(f'{rv = :,}')
FUNCS = {
'eff': f,
'gee': g,
'aich': h,
}
for name in 'eff eff gee aich'.split():
print(f'{FUNCS[name](123) = :,}')
When we define a function in Python, it “closes” over its defining environment. In other words, if the function accesses data that is neither in the global scope nor the local scope (but is in the enclosing function’s scope), we create a means to access this data. Note that this does not mean that we capture a reference to the data; the closure is its own indirection.
from dis import dis
def f(y):
def g(z):
return x + y + z
return g
x = 1
g = f(y=20)
print(
f'{g(z=300) = }',
sep='\n',
)
# dis(g)
def f(y):
def g(z):
return x + y + z
return g
x = 1
g = f(y=20)
print(
f'{g.__closure__ = }',
f'{g.__closure__[0].cell_contents = }',
sep='\n',
)
def f(x):
def g0():
return x
def g1():
return x
return g0, g1
g0, g1 = f(123)
print(
f'{g0.__closure__ = }',
f'{g1.__closure__ = }',
f'{g0.__closure__[0].cell_contents = }',
f'{g1.__closure__[0].cell_contents = }',
sep='\n',
)
from math import prod
def f(xs):
def g0():
xs.append(sum(xs))
return xs
def g1():
xs.append(prod(xs))
return xs
return g0, g1
g0, g1 = f([1, 2, 3])
print(
f'{g0() = }',
f'{g1() = }',
f'{g0() = }',
f'{g1() = }',
sep='\n',
)
from math import prod
def f(x):
def g0():
nonlocal x
x += 2
return x
def g1():
nonlocal x
x *= 2
return x
return g0, g1
g0, g1 = f(123)
print(
f'{g0() = }',
    # f'{g0.__closure__[0] = } · {g1.__closure__[0] = }',
    f'{g1() = }',
    # f'{g0.__closure__[0] = } · {g1.__closure__[0] = }',
    f'{g0() = }',
    # f'{g0.__closure__[0] = } · {g1.__closure__[0] = }',
    f'{g1() = }',
    # f'{g0.__closure__[0] = } · {g1.__closure__[0] = }',
sep='\n',
)
This will be important later.
Recall that functions are a means by which we can eliminate “update anomalies.” They represent a “single source of truth” for how to perform an operation.
We want to distinguish between “coïncidental” and “intentional” repetition. In the case of “intentional” repetition, we want to write a function; in the case of “coïncidental” repetition, we may not want to write a function.
# library.py
from random import Random
from statistics import mean, pstdev
from string import ascii_lowercase
from itertools import groupby
def generate_data(*, random_state=None):
rnd = Random() if random_state is None else random_state
return {
''.join(rnd.choices(ascii_lowercase, k=2)): rnd.randint(-100, +100)
for _ in range(100)
}
def normalise_data(data):
μ,σ = mean(data.values()), pstdev(data.values())
return {k: (v - μ) / σ for k, v in data.items()}
def process_data(data):
return groupby(sorted(data.items(), key=(key := lambda k_v: k_v[0][0])), key=key)
def report(results):
for k, g in results:
g = dict(g)
print(f'{k:<3} {min(g.values()):>5.2f} ~ {max(g.values()):>5.2f}')
# script0.py
if __name__ == '__main__':
rnd = Random(0)
raw_data = generate_data(random_state=rnd)
data = normalise_data(raw_data)
results = process_data(data)
report(results)
# script1.py
if __name__ == '__main__':
rnd = Random(0)
raw_data = generate_data(random_state=rnd)
data = normalise_data(raw_data)
results = process_data(data)
report(results)
def do_report():
rnd = Random(0)
raw_data = generate_data(random_state=rnd)
data = normalise_data(raw_data)
results = process_data(data)
report(results)
# script0.py
if __name__ == '__main__':
do_report()
# script1.py
if __name__ == '__main__':
do_report()
def do_report(normalise=True):
rnd = Random(0)
raw_data = generate_data(random_state=rnd)
    if normalise:
        data = normalise_data(raw_data)
    else:
        data = raw_data
results = process_data(data)
report(results)
# script0.py
if __name__ == '__main__':
do_report(normalise=False)
# script1.py
if __name__ == '__main__':
do_report()
def report(results, prec=2):
for k, g in results:
g = dict(g)
print(f'{k:<3} {min(g.values()):>{2+1+prec}.{prec}f} ~ {max(g.values()):>{2+1+prec}.{prec}f}')
def do_report(normalise=True, digits_prec=None):
rnd = Random(0)
raw_data = generate_data(random_state=rnd)
    if normalise:
        data = normalise_data(raw_data)
    else:
        data = raw_data
results = process_data(data)
if digits_prec is not None:
report(results, prec=digits_prec)
else:
report(results)
# script0.py
if __name__ == '__main__':
do_report(normalise=False)
# script1.py
if __name__ == '__main__':
do_report(digits_prec=5)
If the functions provided by our analytical libraries represent the base-most, atomic units of our work, we could describe the common progression of effort as starting with manual composition of these units. Where patterns arise and intentional repetition is found, our primary work may move to managing this composition: writing classes and functions. Our work may continue to grow more abstract and we may discover patterns and intentional repetition across the writing of functions.
f()
g()
f()
h()
def func0():
f()
g()
f()
def func1():
f(g())
func0()
func1()
Mechanically, the @ syntax in Python is simple shorthand.
@dec
def f():
pass
# … means…
def f():
pass
f = dec(f)
This is key to understanding all of the mechanics behind decorators.
The simplest example of decorators is a system in which we need to instrument some code.
from random import Random
from time import sleep
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __name__ == '__main__':
print(f'{fast(123, 456) = :,}')
print(f'{slow(123) = :,}')
print(f'{slow(456) = :,}')
print(f'{fast(456, 789) = :,}')
from random import Random
from time import sleep, perf_counter
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __name__ == '__main__':
before = perf_counter()
print(f'{fast(123, 456) = :,}')
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
before = perf_counter()
print(f'{slow(123) = :,}')
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
    before = perf_counter()
    print(f'{slow(456) = :,}')
    after = perf_counter()
    print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
before = perf_counter()
print(f'{fast(456, 789) = :,}')
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
from random import Random
from time import sleep, perf_counter
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __name__ == '__main__':
if __debug__: before = perf_counter()
print(f'{fast(123, 456) = :,}')
if __debug__:
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
if __debug__: before = perf_counter()
print(f'{slow(123) = :,}')
if __debug__:
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
if __debug__: before = perf_counter()
print(f'{slow(456) = :,}')
if __debug__:
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
if __debug__: before = perf_counter()
print(f'{fast(456, 789) = :,}')
if __debug__:
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
from random import Random
from time import sleep, perf_counter
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __debug__:
def bef():
global before
before = perf_counter()
def aft():
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
else:
def bef(): pass
def aft(): pass
if __name__ == '__main__':
bef()
print(f'{fast(123, 456) = :,}')
aft()
bef()
print(f'{slow(123) = :,}')
aft()
bef()
print(f'{slow(456) = :,}')
aft()
bef()
print(f'{fast(456, 789) = :,}')
aft()
from random import Random
from time import sleep, perf_counter
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __debug__:
def timed(func, *args, **kwargs):
before = perf_counter()
rv = func(*args, **kwargs)
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
return rv
else:
def timed(func, *args, **kwargs):
return func(*args, **kwargs)
if __name__ == '__main__':
print(f'{timed(fast, 123, 456) = :,}')
print(f'{timed(slow, 123) = :,}')
print(f'{timed(slow, 456) = :,}')
print(f'{timed(fast, 456, 789) = :,}')
from random import Random
from time import sleep, perf_counter
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __debug__:
def timed(func):
def inner(*args, **kwargs):
before = perf_counter()
rv = func(*args, **kwargs)
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
return rv
return inner
else:
def timed(func):
return func
if __name__ == '__main__':
print(f'{timed(fast)(123, 456) = :,}')
print(f'{timed(slow)(123) = :,}')
print(f'{timed(slow)(456) = :,}')
print(f'{timed(fast)(456, 789) = :,}')
from random import Random
from time import sleep, perf_counter
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __debug__:
def timed(func):
def inner(*args, **kwargs):
before = perf_counter()
rv = func(*args, **kwargs)
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
return rv
return inner
fast, slow = timed(fast), timed(slow)
if __name__ == '__main__':
print(f'{fast(123, 456) = :,}')
print(f'{slow(123) = :,}')
print(f'{slow(456) = :,}')
print(f'{fast(456, 789) = :,}')
from random import Random
from time import sleep, perf_counter
def timed(func):
def inner(*args, **kwargs):
before = perf_counter()
rv = func(*args, **kwargs)
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
return rv
return inner
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
if __debug__: fast = timed(fast)
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __debug__: slow = timed(slow)
if __name__ == '__main__':
print(f'{fast(123, 456) = :,}')
print(f'{slow(123) = :,}')
print(f'{slow(456) = :,}')
print(f'{fast(456, 789) = :,}')
from random import Random
from time import sleep, perf_counter
def timed(func):
def inner(*args, **kwargs):
before = perf_counter()
rv = func(*args, **kwargs)
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
return rv
return inner
@timed if __debug__ else lambda f: f
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
@timed if __debug__ else lambda f: f
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __name__ == '__main__':
print(f'{fast(123, 456) = :,}')
print(f'{slow(123) = :,}')
print(f'{slow(456) = :,}')
print(f'{fast(456, 789) = :,}')
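Note that a decorator expression like timed if __debug__ else lambda f: f relies on the relaxed decorator grammar from PEP 614 and therefore requires Python ≥ 3.9. On older versions, one workaround (a sketch under that assumption) is to bind the chosen decorator to a plain name first:
maybe_timed = timed if __debug__ else (lambda f: f)
@maybe_timed
def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y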
from random import Random
from time import sleep, perf_counter
def timed(func):
def inner(*args, **kwargs):
before = perf_counter()
rv = func(*args, **kwargs)
after = perf_counter()
print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
return rv
inner.orig = func
return inner
@timed if __debug__ else lambda f: f
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
@timed if __debug__ else lambda f: f
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __name__ == '__main__':
print(f'{fast(123, 456) = :,}')
print(f'{slow(123) = :,}')
print(f'{slow.orig(456) = :,}')
print(f'{fast(456, 789) = :,}')
# help(fast)
from random import Random
from time import sleep, perf_counter
from functools import wraps, cached_property
from collections import deque, namedtuple
from datetime import datetime
class Call(namedtuple('CallBase', 'timestamp before after func args kwargs')):
@cached_property
def elapsed(self):
return self.after - self.before
def timed(telemetry):
def dec(func):
@wraps(func)
def inner(*args, **kwargs):
before = perf_counter()
rv = func(*args, **kwargs)
after = perf_counter()
telemetry.append(
Call(datetime.now(), before, after, func, args, kwargs)
)
return rv
inner.orig = func
return inner
return dec
telemetry = []
@timed(telemetry) if __debug__ else lambda f: f
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
@timed(telemetry) if __debug__ else lambda f: f
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __name__ == '__main__':
print(f'{fast(123, 456) = :,}')
print(f'{slow(123) = :,}')
print(f'{slow.orig(456) = :,}')
print(f'{fast(456, 789) = :,}')
for x in telemetry:
print(f'{x.func.__name__} \N{mathematical bold capital delta}t: {x.elapsed:.2f}s')
from random import Random
from time import sleep, perf_counter
from functools import wraps, cached_property
from collections import deque, namedtuple
from datetime import datetime
from contextvars import ContextVar
from contextlib import contextmanager, nullcontext
from inspect import currentframe, getouterframes
def instrumented(func):
if not __debug__:
return func
@wraps(func)
def inner(*args, **kwargs):
ctx = inner.context.get(nullcontext)
frame = getouterframes(currentframe())[1]
with ctx(frame, func, args, kwargs) if ctx is not nullcontext else ctx():
return func(*args, **kwargs)
@contextmanager
def with_measurer(measurer):
token = inner.context.set(measurer)
        try:
            yield
        finally:
            inner.context.reset(token)
inner.with_measurer = with_measurer
inner.context = ContextVar('context')
return inner
@instrumented
def fast(x, y, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.1, .2))
return x + y
@instrumented
def slow(x, *, random_state=None):
rnd = Random() if random_state is None else random_state
sleep(rnd.uniform(.25, .5))
return x**2
if __name__ == '__main__':
class Call(namedtuple('CallBase', 'lineno timestamp before after func args kwargs')):
@cached_property
def elapsed(self):
return self.after - self.before
def __str__(self):
if self.args and self.kwargs:
params = (
f'{", ".join(f"{x!r}" for x in self.args)}, '
f'{", ".join(f"{k}={v!r}" for k, v in self.kwargs.items())}'
)
elif self.args:
params = f'{", ".join(f"{x!r}" for x in self.args)}'
elif self.kwargs:
params = f'{", ".join(f"{k}={v!r}" for k, v in self.kwargs.items())}'
else:
params = ''
return f'{self.func.__name__}({params})'
telemetry = []
@classmethod
@contextmanager
def timed(cls, frame, func, args, kwargs):
before = perf_counter()
            try:
                yield
            finally:
                after = perf_counter()
                cls.telemetry.append(
                    cls(frame.lineno, datetime.now(), before, after, func, args, kwargs)
                )
with fast.with_measurer(Call.timed), slow.with_measurer(Call.timed):
print(f'{fast(123, 456) = :,}')
print(f'{slow(123) = :,}')
print(f'{slow(456) = :,}')
print(f'{fast(456, 789) = :,}')
for x in Call.telemetry:
print(f'@line {x.lineno}: {x!s:<20} \N{mathematical bold capital delta}t {x.elapsed:.2f}s')
When would I actually write a decorator or a higher-order decorator… and why?
print("Let's take a look!")
A def-decorator performs the following syntactical transformation:
def dec(f): pass
@dec
def f(): pass
def f(): pass
f = dec(f)
Note that the common description of a decorator as a “function that takes a function and returns a function” is imprecise.
class T:
def __init__(self, g):
self.g = g
@T
def g():
yield
print(f'{g = }')
A class-decorator performs the following syntactical transformation:
def dec(f): pass
@dec
class cls: pass
class cls: pass
cls = dec(cls)
Just as Python functions are defined and created at runtime, Python classes are also defined and created at runtime.
from random import Random
from inspect import signature
rnd = Random(0)
if rnd.choice([True, False]):
class T:
def f(self, x, y):
return x * y
else:
class T:
def f(self, x):
return x ** 2
print(f'{signature(T.f) = }')
Unlike the body of a function, for which bytecode is generated but not executed
at function definition time, the body of a class is executed at class definition
time.
from random import Random
from inspect import signature
rnd = Random(0)
class T:
if rnd.choice([True, False]):
def f(self, x, y):
return x * y
else:
def f(self, x):
return x ** 2
print(f'{signature(T.f) = }')
A Python class can have attributes added at runtime.
Unlike in Python 2, Python 3 doesn’t distinguish between bound and unbound
methods. Instead, all Python functions support the __get__ descriptor
protocol. The __get__ method is invoked when an attribute is looked up via
the __getattr__/getattr protocol and is found on a class. When a
function’s __get__ is invoked, it returns a method which binds the instance
argument. Therefore, all Python 3 functions act as unbound methods, which makes
it relatively easy to add new methods to Python classes.
class T:
pass
T.f = lambda self: ...
obj = T()
print(f'{obj.f() = }')
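We can see the descriptor machinery at work: looked up on the class, f is a plain function; its __get__ binds an instance and produces the method (a brief sketch, reusing T and obj from above):
print(
    f'{T.f = }',                    # a plain function when looked up on the class
    f'{T.f.__get__(obj, T) = }',    # __get__ binds the instance, producing a method
    f'{obj.f.__func__ is T.f = }',  # the bound method wraps the original function
    sep='\n',
)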
A class decorator receives the fully-constructed class and can therefore add,
remove, or inspect attributes on that class. Note that a class decorator
cannot distinguish the code that was statically written in the body of the
class from code that was added to the class afterwards.
def dec(cls):
print(f'{cls = }')
return cls
@dec
class A:
pass
@dec
class B(A):
pass
Just as a def-decorator is used anytime we need to eliminate the risk
of update anomaly associated with the definition of a function, a class
decorator is about eliminating the risk of update anomaly associated with
the definition of a class.
A class decorator could be used instead of inheritance to add functionality
to a class without disrupting the inheritance hierarchy while potentially
introducing modalities.
class A:
def f(self):
pass
class B(A):
def g(self):
pass
obj = B()
print(
f'{obj.f() = }',
f'{obj.g() = }',
sep='\n',
)
def dec(cls):
cls.f = lambda _: None
return cls
@dec
class A:
pass
@dec
class B(A):
def g(self):
pass
obj = B()
print(
f'{obj.f() = }',
f'{obj.g() = }',
sep='\n',
)
def add_func(*funcs):
def dec(cls):
for name in funcs:
setattr(cls, name, lambda _: None)
return cls
return dec
@add_func('f', 'g')
class A:
pass
@add_func('f', 'h')
class B(A):
def g(self):
pass
obj = B()
print(
f'{obj.f() = }',
f'{obj.g() = }',
f'{obj.h() = }',
sep='\n',
)
A class-decorator can check that a class has certain contents (though
it won’t be able to determine precisely how those contents were provided.)
def dec(cls):
if not hasattr(cls, 'f'):
raise TypeError('must define f')
return cls
class A:
def f(self):
pass
@dec
class B(A):
def f(self):
pass
When would I actually write a class decorator… and is this really better than other approaches?
print("Let's take a look!")
A Python class allows us to implement the Python “vocabulary” by writing
special __-methods.
class T:
def __getitem__(self, key):
pass
def __len__(self):
return 0
obj = T()
print(
f'{obj[...] = }',
f'{len(obj) = }',
sep='\n',
)
These special __-methods are not looked up via the __getattr__ protocol. In
CPython, they are looked up by direct C-struct access on type(…).
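We can observe this directly: assigning a __-method on an instance has no effect on the corresponding operation, because the lookup goes straight to the type (a brief sketch, not from the original):
class T:
    def __len__(self):
        return 0
obj = T()
obj.__len__ = lambda: 99
print(
    f'{obj.__len__() = }',  # ordinary attribute access finds the instance attribute
    f'{len(obj) = }',       # but len(…) looks __len__ up on type(obj) and ignores it
    sep='\n',
)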
If we wanted to implement the Python vocabulary on a class object, we would
need to implement these methods on whatever type(cls) is. This entity is
called the “metaclass.”
A Python class is responsible for constructing its instances. A Python metaclass is responsible for constructing its instances, which happen to be Python classes.
from logging import getLogger, basicConfig, INFO
logger = getLogger(__name__)
basicConfig(level=INFO)
class TMeta(type):
def __getitem__(self, key):
logger.info('TMeta.__getitem__(%r, %r)', self, key)
pass
def __len__(self):
logger.info('TMeta.__len__(%r)', self)
return 0
class T(metaclass=TMeta):
def __getitem__(self, key):
logger.info('T.__getitem__(%r, %r)', self, key)
pass
def __len__(self):
logger.info('T.__len__(%r)', self)
return 0
obj = T()
obj[...]
len(obj)
T[...]
len(T)
This is not altogether that useful, in practice.
from logging import getLogger, basicConfig, INFO
logger = getLogger(__name__)
basicConfig(level=INFO)
class TMeta(type):
def __call__(self, *args, **kwargs):
obj = self.__new__(self, *args, **kwargs)
obj.__init__(*args, **kwargs)
obj.__post_init__()
return obj
class T(metaclass=TMeta):
def __new__(cls, value):
return super().__new__(cls)
def __init__(self, value):
self.value = value
def __post_init__(self):
self.value = abs(self.value)
def __repr__(self):
return f'T({self.value!r})'
obj = T(-123)
print(f'{obj = }')
Metaclasses are inherited down the class hierarchy. This is why, historically, they were used for enforcing constraints from base types to derived types.
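We can see this inheritance directly: the type of a subclass is still the base class’s metaclass, even though the subclass never names it (a small sketch, not from the original):
class Meta(type):
    pass
class Base(metaclass=Meta):
    pass
class Derived(Base):
    pass
print(
    f'{type(Base) = }',
    f'{type(Derived) = }',  # the metaclass travels down the hierarchy
    sep='\n',
)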
Consider that Derived needs to constrain Base in order to operate
correctly. However, this can be done trivially in app.py without touching any
code in library.py.
from inspect import signature
# library.py
class Base:
def helper(self):
...
# app.py
print(
f'{signature(Base.helper) = }',
)
class Derived(Base):
def func(self):
return self.helper()
But if Base needs to constrain Derived, then this cannot be done so easily
without putting code in app.py. Instead, we need to find some mechanism
that operates at a higher level.
# library.py
class Base:
def func(self):
return self.implementation()
# app.py
class Derived(Base):
def implementation(self):
...
The highest level mechanism we can employ to add a hook into the class
construction process is builtins.__build_class__.
from functools import wraps
import builtins
@lambda f: setattr(builtins, f.__name__, f(getattr(builtins, f.__name__)))
def __build_class__(orig):
@wraps(orig)
def inner(func, name, *bases, **kwargs):
print(f'{func, name, bases, kwargs = }')
return orig(func, name, *bases)
# return orig(func, name, *bases, **kwargs)
return inner
class Base: pass
class Derived(Base): pass
class MoreDerived(Base, x=...): pass
What is the function that is passed to __build_class__?
from functools import wraps
import builtins
@lambda f: setattr(builtins, f.__name__, f(getattr(builtins, f.__name__)))
def __build_class__(orig):
@wraps(orig)
def inner(func, name, *bases, **kwargs):
print(f'{func, name, bases, kwargs = }')
print(f'{func() = }')
# exec(func.__code__, globals(), ns := {})
# print(f'{ns = }')
return orig(func, name, *bases, **kwargs)
return inner
class T:
def f(self):
pass
There’s not much we can do with __build_class__ other than debugging or
instrumentation.
from functools import wraps
import builtins
@lambda f: setattr(builtins, f.__name__, f(getattr(builtins, f.__name__)))
def __build_class__(orig):
@wraps(orig)
def inner(func, name, *bases, **kwargs):
print(f'{func, name, bases, kwargs = }')
return orig(func, name, *bases, **kwargs)
return inner
import json
# import pandas, matplotlib
Since a metaclass is inherited down the class hierarchy, it gives us a narrower
hook-point. Additionally, the metaclass gets the partially constructed class,
which is, in practice, more useful to work with.
class BaseMeta(type):
def __new__(cls, name, bases, body, **kwargs):
print(f'{cls, name, bases, body, kwargs = }')
# return super().__new__(cls, name, bases, body, **kwargs)
return super().__new__(cls, name, bases, body)
class Base(metaclass=BaseMeta):
pass
class Derived(Base, x=...):
pass
We can use this to enforce constraints.
# library.py
from inspect import signature
class BaseMeta(type):
def __new__(cls, name, bases, body, **kwargs):
rv = super().__new__(cls, name, bases, body, **kwargs)
if rv.__mro__[-2::-1].index(rv):
rv.check()
return rv
class Base(metaclass=BaseMeta):
@classmethod
def check(cls):
if not hasattr(cls, 'implementation'):
raise TypeError('must implement method')
if 'x' not in signature(cls.implementation).parameters:
raise TypeError('method must take parameter named x')
def func(self):
return self.implementation()
# app.py
class Derived(Base):
def implementation(self, x):
...
However, metaclasses tend to be tricky to write correctly, especially if you need to compose them.
# library.py
from inspect import signature
class BaseMeta(type):
pass
class Base(metaclass=BaseMeta):
pass
# app.py
class DerivedMeta(type):
pass
class Derived(Base, metaclass=DerivedMeta):
pass
# library.py
from inspect import signature
class Base0Meta(type):
def __new__(cls, name, bases, body, **kwargs):
print(f'Base0Meta.__new__({cls!r}, {name!r}, {bases!r}, {body!r}, **{kwargs!r})')
return super().__new__(cls, name, bases, body, **kwargs)
class Base0(metaclass=Base0Meta):
pass
class Base1Meta(type):
def __new__(cls, name, bases, body, **kwargs):
print(f'Base1Meta.__new__({cls!r}, {name!r}, {bases!r}, {body!r}, **{kwargs!r})')
return super().__new__(cls, name, bases, body, **kwargs)
class Base1(metaclass=Base1Meta):
pass
# app.py
class Derived(Base0, Base1):
pass
class Derived(Base1, Base0):
pass
# library.py
from inspect import signature
class Base0Meta(type):
def __new__(cls, name, bases, body, **kwargs):
print(f'Base0Meta.__new__({cls!r}, {name!r}, {bases!r}, {body!r}, **{kwargs!r})')
return super().__new__(cls, name, bases, body, **kwargs)
class Base0(metaclass=Base0Meta):
pass
class Base1Meta(type):
def __new__(cls, name, bases, body, **kwargs):
print(f'Base1Meta.__new__({cls!r}, {name!r}, {bases!r}, {body!r}, **{kwargs!r})')
return super().__new__(cls, name, bases, body, **kwargs)
class Base1(metaclass=Base1Meta):
pass
# app.py
class Derived(Base0):
pass
class MoreDerived(Base1, Derived):
pass
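When two bases bring unrelated metaclasses, Python refuses to guess and raises a metaclass-conflict TypeError; the conventional remedy is to define a combined metaclass that derives from both and name it explicitly. A sketch of that workaround (assuming the Base0Meta and Base1Meta definitions above):
class CombinedMeta(Base0Meta, Base1Meta):
    pass
class Combined(Base0, Base1, metaclass=CombinedMeta):
    pass
print(f'{type(Combined) = }')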
In Python 3.6, the __init_subclass__ mechanism was introduced. Like a metaclass,
it is inherited down the class hierarchy. Unlike the metaclass, it gets the fully
constructed class. __init_subclass__ doesn’t have the same compositional difficulties that
metaclasses have.
class Base:
def __init_subclass__(cls, **kwargs):
print(f'{cls, kwargs = }')
class Derived(Base, x=...):
pass
# library.py
from inspect import signature
class Base:
def __init_subclass__(cls):
if not hasattr(cls, 'implementation'):
raise TypeError('must implement method')
if 'x' not in signature(cls.implementation).parameters:
raise TypeError('method must take parameter named x')
def func(self):
return self.implementation()
# app.py
class Derived(Base):
def implementation(self, x):
...
class Base0:
def __init_subclass__(cls):
print(f'Base0.__init_subclass__({cls!r})')
super().__init_subclass__()
class Base1:
def __init_subclass__(cls):
print(f'Base1.__init_subclass__({cls!r})')
super().__init_subclass__()
class Derived0(Base0, Base1):
pass
class Derived1(Base1, Base0):
pass
class Base0:
def __init_subclass__(cls):
print(f'Base0.__init_subclass__({cls!r})')
super().__init_subclass__()
class Base1:
def __init_subclass__(cls):
print(f'Base1.__init_subclass__({cls!r})')
super().__init_subclass__()
class Derived(Base0):
pass
# print(f'{Derived.__mro__ = }')
class MoreDerived0(Derived, Base1):
pass
# print(f'{MoreDerived0.__mro__ = }')
class MoreDerived1(Base1, Derived):
pass
# print(f'{MoreDerived1.__mro__ = }')
However, an __init_subclass__ requires that we interact with the inheritance
hierarchy. But with a class-decorator, we do not. In the case of a
class-decorator, we also get the fully-constructed class, but we don’t
get any keyword arguments.
class Base:
def __init_subclass__(cls, **kwargs):
print(f'Base.__init_subclass__({cls!r}, **{kwargs!r})')
class Derived(Base, x=...):
pass
def dec(cls):
print(f'dec({cls!r})')
return cls
@dec
class T:
pass
However, we can write a higher-order class-decorator to introduce modalities.
class Base:
def __init_subclass__(cls, **kwargs):
print(f'Base.__init_subclass__({cls!r}, **{kwargs!r})')
class Derived(Base, x=...):
pass
def d(**kwargs):
def dec(cls):
print(f'dec({cls!r}, **{kwargs!r})')
return cls
return dec
@d(x=...)
class T:
pass
When would I actually write a metaclass… and is there a better way?
print("Let's take a look!")
In Python, the builtin eval and exec functions allow us to execute code
encoded as an str. eval allows us to evaluate a single expression and
returns its result; exec allows us to execute a suite of statements but
does not return anything.
from textwrap import dedent
code = '1 + 1'
print(f'{eval(code) = }')
code = dedent('''
x = 1
y = 1
x + y
''').strip()
print(f'{exec(code) = }')
With both exec and eval, you can pass in a namespace; with exec, you can
capture results by inspecting the names bound in that namespace.
from textwrap import dedent
code = '1 + 1 + z'
print(f'{eval(code, globals(), ns := {"z": 123}) = }')
print(f'{ns = }')
code = dedent('''
x = 1
y = 1
w = x + y + z
''').strip()
print(f'{exec(code, globals(), ns := {"z": 123}) = }')
print(f'{ns = }')
Obviously, eval('1 + 1') is inferior to evaluating 1 + 1. We don’t get
syntax highlighting. We don’t get any static mechanisms provided by the
interpreter (such as constant folding.)
However, by encoding the executed or evaluated code as a string, that means we can use string manipulation to create code snippets. Obviously, in most cases, this is inferior to other programmatic or meta-programmatic techniques.
x, y, z = 123, 456, 789
var0, var1 = 'x', 'y'
code = f'{var0} + {var1}'
res = eval(code, globals(), locals())
print(f'{res = }')
if ...:
res = x + y
print(f'{res = }')
var0, var1 = 'x', 'y'
res = globals()[var0] + globals()[var1]
print(f'{res = }')
But there are also clearly metaprogramming situations where string manipulation may be superior.
from dataclasses import dataclass
from datetime import datetime
from typing import Any
@dataclass
class Propose:
ident : int
timestamp : datetime
payload : Any
@dataclass
class Accept:
ident : int
timestamp : datetime
@dataclass
class Reject:
ident : int
timestamp : datetime
@dataclass
class Commit:
ident : int
timestamp : datetime
payload : Any
print(
f'{Propose(..., ..., ...) = }',
f'{Accept(..., ...) = }',
f'{Reject(..., ...) = }',
f'{Commit(..., ..., ...) = }',
sep='\n',
)
from csv import reader
from textwrap import dedent
from dataclasses import dataclass
message_definitions = dedent('''
name,*fields
Propose,ident,timestamp,payload
Acc ept,ident,timestmap
Reject,ident,timestmap
Commit,ident,timestamp,payload
''').strip()
messages = {}
for lineno, (name, *fields) in enumerate(reader(message_definitions.splitlines()), start=1):
if lineno == 1: continue
messages[name] = name, fields
class MessageBase: pass
for name, fields in messages.values():
globals()[name] = dataclass(type(name, (MessageBase,), {
'__annotations__': dict.fromkeys(fields)
}))
print(
globals(),
f'{Propose(..., ..., ...) = }',
# f'{Accept(..., ...) = }',
f'{Reject(..., ...) = }',
f'{Commit(..., ..., ...) = }',
sep='\n',
)
from csv import reader
from textwrap import dedent, indent
from dataclasses import dataclass
message_definitions = dedent('''
name,*fields
Propose,ident,timestamp,payload
Accept,ident,timestmap
Reject,ident,timestmap
Commit,ident,timestamp,payload
''').strip()
messages = {}
for lineno, (name, *fields) in enumerate(reader(message_definitions.splitlines()), start=1):
if lineno == 1: continue
messages[name] = name, fields
class MessageBase: pass
for name, fields in messages.values():
    code = dedent(f'''
        @dataclass
        class {name}(MessageBase):
        {{fields}}
    ''').strip().format(fields=indent('\n'.join(f"{f} : ..." for f in fields), ' ' * 4))
print(code)
exec(code, globals(), locals())
print(
f'{Propose(..., ..., ...) = }',
f'{Accept(..., ...) = }',
f'{Reject(..., ...) = }',
f'{Commit(..., ..., ...) = }',
sep='\n',
)
There is nothing inherently wrong with eval or exec (in most execution environments.)
from tempfile import TemporaryDirectory
from sys import path
from pathlib import Path
from textwrap import dedent
with TemporaryDirectory() as d:
d = Path(d)
code = dedent('''
class T: pass
''').strip()
with open(d / 'module.py', mode='wt') as f:
print(code, file=f)
path.insert(0, f'{d!s}')
import module
del path[0]
print(f'{module.T = }')
We can think of all code creation mechanisms as lying on a spectrum:
from tempfile import TemporaryDirectory
from inspect import getsource
from collections import namedtuple
from textwrap import dedent
from sys import path
from pathlib import Path
class T0: pass
T1 = namedtuple('T1', '')
class T1(namedtuple('T1', '')): pass
...
T2 = type('T2', (tuple,), {'__call__': lambda _: ...})
...
exec(dedent('''
class T3:
pass
'''), globals(), locals())
with TemporaryDirectory() as d:
d = Path(d)
code = dedent('''
class T4:
pass
''').strip()
with open(d / 'module.py', mode='wt') as f:
print(code, file=f)
path.insert(0, f'{d!s}')
from module import *
del path[0]
print(
f'{T0 = }', # getsource(T0),
f'{T1 = }', # getsource(T1),
f'{T2 = }', # getsource(T2),
f'{T3 = }', # getsource(T3),
f'{T4 = }', getsource(T4),
sep='\n',
)
When would I actually use eval or exec… and should I feel as guilty when I do it?