seminar.notes.dutc.io

More Python Basics for Experts!

Overview

Session ⓪: What is even real? (June 6, 11:30am US/Eastern)

Topics: object orientation, identity, equality, mutability, immutability, hashability, Python internals

Session ①: How do I make it real? (June 13, 11:30am US/Eastern)

Topics: object orientation, functional programming, closures, generators, generator coroutines, inheritance, composition, dataclasses.dataclass

Session ②: How do I do less work? (June 20, 11:30am US/Eastern)

Topics: decorators, higher-order decorators, class decorators, metaclasses, init_subclass, metaprogramming, exec, eval

Other Offerings

Subscribe to our newsletter https://bit.ly/expert-python to stay up to date on our offerings and receive exclusive discounts.

Team Courses & Private Bookings

We offer a wide variety of private training courses for your team on topics such as:

Our courses and seminars are designed with the “why” at the forefront of everything we do. As a result, the courses balance information, exercises, and case studies that help encourage attendee success.

Courses are developed to fit the needs of multiple levels of mastery. We strive to ensure that every attendee receives personal instruction and that the time they commit to learning is put to the fullest possible use.

Open to the Public

Once per quarter, we hold our Developing Expertise in Python course, open to the public! This course is three full days of intensively personalized hands-on instruction within a small cohort (≤10). Sessions begin with individual interviews with each attendee to assess current levels of understanding and set specific, measurable goals for their individual growth and professional development.

No lecture, no slides—the sessions are driven entirely by discussion around concrete live-coded examples with detailed prep & supplementary review materials (≥50 pages of background course notes and ≥10 hours of background videos) provided.

See our Organizer Page for info on upcoming dates!

Don’t see a course you need? Contact us at learning@dutc.io to get the curricula you’re looking for!

Presenter (James Powell)

James Powell is the founder and lead instructor at Don’t Use This Code. He currently serves as Chairman of the NumFOCUS Board of Directors, helping to oversee the governance and sustainability of all of the major tools in the Python data analysis ecosystem (e.g., pandas, NumPy, Jupyter, Matplotlib). At NumFOCUS, he helps build global open source communities for data scientists, data engineers, and business analysts. He helps NumFOCUS run the PyData conference series and has sat on speaker selection and organizing committees for 18 conferences. James is also a prolific speaker: since 2013, he has given over seventy conference talks at over fifty Python events worldwide. In fact, he is the second most prolific speaker in the PyData and Python ecosystem (source: pyvideo.org).

Session ⓪: What is even real?

What is the difference between identity and equality… and why should I care?

print("Let's take a look!")

Python variable names are just that: names. They are names that we can use to refer to some underlying data.
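
For example (a minimal sketch): binding a second name does not create new data; both names simply refer to the same underlying object.

xs = [1, 2, 3] # the name `xs` refers to a `list` object
ys = xs        # `ys` is just another name for that same object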

The == operator in Python determines whether the objects referred to by two names are “equal.” For a container like list, this means that the two objects contain the same elements in the same order.

xs = [1, 20, 300]
ys = [1, 20, 300]
zs = [4_000, 50_000, 600_000]

print(
    f'{xs == ys = }',
    f'{xs == zs = }',
    sep='\n',
)

For a container like dict, this means that the two objects contain the same key value pairs; however, order is not considered.

d0 = {'a': 1, 'b': 20, 'c': 300}
d1 = {'c': 300, 'b': 20, 'a': 1}
d2 = {'d': 4_000, 'e': 5, 'f': 600_000}

print(
    f'{d0 == d1 = }',
    f'{d0 == d2 = }',
    sep='\n',
)

For a collections.OrderedDict, however, the order is considered when determining equality.

from collections import OrderedDict

d0 = OrderedDict({'a': 1, 'b': 20, 'c': 300})
d1 = OrderedDict({'c': 300, 'b': 20, 'a': 1})

print(
    f'{d0 == d1 = }',
    sep='\n',
)

The is operator in Python determines whether the objects referred to by two names are, in fact, the same object. Unlike ==, this has a consistent meaning irrespective of the type of the object.
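
For example (a minimal sketch): two separately constructed but equal lists are not the same object, whereas a second name bound to an existing list is.

xs = [1, 20, 300]
ys = [1, 20, 300]
zs = xs

print(
    f'{xs == ys = }',
    f'{xs is ys = }',
    f'{xs is zs = }',
    sep='\n',
)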

You can specify what it means for two instances of a user-defined object to be equal (“equivalent”; ==,) but there is no way to specify an alternate or custom meaning for identity (is.)

from dataclasses import dataclass, field
from typing import Any

@dataclass
class T:
    name     : str
    value    : int
    metadata : dict[str, Any] = field(default_factory=dict)

    # do not consider `.metadata` for equality
    def __eq__(self, other):
        return self.name == other.name and self.value == other.value

x = T('abc', 123)
y = T('abc', 123, metadata={...: ...})
z = T('def', 456)

print(
    f'{x == y = }',
    f'{x == z = }',
    sep='\n',
)

Similarly, while it is possible to overload many operators in Python, the assignment and assignment-expression operators (= and :=) cannot be customised in any fashion.

These operations are also called “name-bindings.”

x = y always means “x is a new name for the object that is currently referred to by y.” Unlike in other programming languages, x = y cannot directly trigger any other form of computation (e.g., a copy computation.)

However, since performing a name-binding sometimes involves assignment into a dict representing the active scope, the assignment into the dict can trigger other computations.

from collections.abc import MutableMapping
from logging import getLogger, basicConfig, INFO

logger = getLogger(__name__)
basicConfig(level=INFO)

class namespace(dict, MutableMapping):
    def __setitem__(self, key, value):
        logger.info('namespace.__setitem__(key = %r, value = %r)', key, value)
        super().__setitem__(key, value)

class TMeta(type):
    @staticmethod
    def __prepare__(name, bases, **kwds):
        return namespace()

class T(metaclass=TMeta):
    x = [1, 2, 3]
    y = x

An alternate way to determine whether two names refer to an identical object is to check their id(...) values. The id(...) return value is a (locally, temporally) unique identifier for an object. In current versions of CPython, this corresponds to the memory address of the PyObject* for the object (but this is not guaranteed.)

xs = [1, 20, 300]
ys = [1, 20, 300]
zs = xs

print(
    # f'{xs is ys = }',
    # f'{xs is zs = }',
    f'{id(xs) = :#_x}',
    f'{id(ys) = :#_x}',
    f'{id(zs) = :#_x}',
    sep='\n',
)

Another way to determine if two names refer to an identical object is to perform a mutation via one name and see whether the object referred to by the other name has changed or not!

xs = [1, 20, 300]
ys = [1, 20, 300]
zs = xs

xs.append(4_000)

print(
    f'{xs = }',
    f'{ys = }',
    f'{zs = }',
    sep='\n',
)

Note that if two names refer to immutable objects, then those objects cannot be changed; therefore, we will not be able to observe a useful difference between these two names referring to identical objects or merely referring to equivalent objects. As a consequence, the CPython interpreter will try to save memory by “interning” commonly found immutable objects, such as short strings and small numbers. When “interning,” all instances of the same value are, in fact, instances of an identical object.

print(
    # f'{id(eval("123"))     = :#_x}',
    # f'{id(eval("123"))     = :#_x}',
    # f'{id(eval("123_456")) = :#_x}',
    # f'{id(eval("123_456")) = :#_x}',

    f'{id(123)             = :#_x}',
    f'{id(123)             = :#_x}',
    f'{id(123_456)         = :#_x}',
    f'{id(123_456)         = :#_x}',
    sep='\n',
)

We have to use eval in the above example, since (C)Python code in a script will undergo the “constant folding” optimisation.

from pathlib import Path
from sys import path
from tempfile import TemporaryDirectory
from textwrap import dedent

with TemporaryDirectory() as d:
    d = Path(d)
    with open(d / '_module.py', 'w') as f:
        print(dedent('''
            def h():
                x = 123_456_789
                y = 123_456_789
        ''').strip(), file=f)
    path.append(f'{d}')
    from _module import h

def f():
    x = 123_456_789
    y = 123_456_789

def g():
    x = 123_456_789
    y = 123_456_789

print(
    f'{f.__code__.co_consts = }',
    f'{g.__code__.co_consts = }',
    f'{h.__code__.co_consts = }',
    f'{f.__code__.co_consts[-1] is g.__code__.co_consts[-1] = }',
    f'{f.__code__.co_consts[-1] is h.__code__.co_consts[-1] = }',
    sep='\n',
)

The qualifications on “unique” are necessary. Recall that the CPython value for id(...) is currently implemented as the memory address of the object the name refers to (i.e., the value of the PyObject*.)

/* Python/bltinmodule.c */

static PyObject *
builtin_id(PyModuleDef *self, PyObject *v)
{
    PyObject *id = PyLong_FromVoidPtr(v);

    if (id && PySys_Audit("builtins.id", "O", id) < 0) {
        Py_DECREF(id);
        return NULL;
    }

    return id;
}

This can be used to do things we’re otherwise not supposed to, such as directly accessing (and even mutating) the memory underlying Python objects.

from numpy import array
from numpy.lib.stride_tricks import as_strided

def setitem(t, i, v):
    xs = array([], dtype='uint64')
    if (loc := xs.__array_interface__['data'][0]) > id(t):
        raise ValueError(f'`numpy.ndarray` @ {id(xs):#_x} allocated after `tuple` @ {id(t):#_x}')
    xs  = as_strided(xs, strides=(1,), shape=((off := id(t) - loc) + 1,))
    ys  = as_strided(xs[off:], strides=(8,), shape=(4,))
    zs  = as_strided(ys[3:], strides=(8,), shape=(i + 1,))
    ys[2] += max(0, i - (end := len(t)) + 1)
    zs[min(i, end):] = id(v)

t = 0, 1, 2, None, 4, 5
print(f'Before: {t = !r:<24} {type(t) = }')
setitem(t, 3, 3)
print(f'After:  {t = !r:<24} {type(t) = }')

As a consequence of using the memory address as the value for id(…) coupled with the finiteness of memory, we would expect that memory addresses would eventually be reüsed. Therefore, across an arbitrary span of time, two objects with the same id(…) may, in fact, be distinct.

xs = [1, 2, 3]
print(f'{id(xs) = :#_x}')
del xs

ys = [1, 2, 3, 4]
print(f'{id(ys) = :#_x}')

We should not store id(…) values for comparison later. We may be tempted to do this in the case of unhashable objects, but the result will not be meaningful.

class T:
    def __hash__(self):
        raise NotImplementedError()

obj0, obj1 = T(), T()

print(
    # f'{obj0     in {obj0, obj1}         = }',
    # f'{id(obj0) in {id(obj0): obj0, id(obj1): obj1} = }',
    sep='\n',
)

(We see a very similar problem with child processes upon termination of the parent process; in general, since PIDs are a finite resource and may be reüsed, it is incorrect for us to store and refer to child processes across a span of time in the absence of some isolation mechanism such as a PID namespace.)

### (unsafely?) reduce maximum PID
# <<< $(( 2 ** 7 )) > /proc/sys/kernel/pid_max

typeset -a pids=()

() { } & pids+=( "${!}" )

until
    () { } & pids+=( "${!}" )
    (( pids[1] == pids[${#pids}] ))
do :; done

printf 'pid[%d]=%d\n'     1      "${pids[1]}" \
                      "${#pids}" "${pids[${#pids}]}"

Therefore, the following code may be incorrect (since the PID we are killing may not necessarily be the process we think!)

sleep infinity & pid="${!}"

: ...

kill "${pid}"

Up to a name (re-)binding, equality is a transient property but identity is a permanent property. In other words, if two names refer to equal (“equivalent”) objects at some point in time, they may or may not remain equal at some later point in time. However, if two names refer to identical objects at some point in time, the only intervening action that can alter their identicalness is a name (re-)binding.

xs = [1, 2, 3]
ys = [1, 2, 3]

assert xs == ys
...
xs.clear()
...
assert xs != ys

xs = ys = [1, 2, 3]

assert xs is ys
...
# xs = ...
# (xs := ...)
# globals()['xs'] = ...
# from module import xs
...
assert xs is ys

Of course, if the two names refer to immutable objects, then their equivalence is also a permanent property!
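
For example, two names bound to equal tuples must stay equal: since tuples are immutable, no intervening mutation can make them unequal; only a (re-)binding of one of the names could.

xs = (1, 2, 3)
ys = (1, 2, 3)

assert xs == ys
... # no mutation of either tuple is possible here
assert xs == ys
# only a rebinding such as `xs = (4, 5, 6)` could change this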

Note that identity and equality are separate properties. Identicalness does not necessarily imply equivalence, nor does equivalence imply identicalness.

# i. equal and identical
xs = ys = [1, 2, 3]
assert xs == ys and xs is ys

# ii. equal but not identical
xs, ys = [1, 2, 3], [1, 2, 3]
assert xs == ys and xs is not ys

# iii. identical and equal
x = y = 2.**53
assert x is y and x == y

# iv. identical but not equal
x = y = float('nan')
# x = y = None
# (alternatively: instances of a class whose __eq__ always returns False)
# class T:
#     def __eq__(self, other):
#         return False
# x = y = T()
assert x is y and x != y

However, note that if two names refer to identical objects, then we are guaranteed that the id(…) values (when captured at a single point in time during the lifetime of both objects) must have equivalent value.

x = y = object()

# two ways to state the same thing
assert x is y and id(x) == id(y)

# since id(…) returns an `int`,
#   since (in CPython) large `int`s are not interned,
#   since (in CPython) `id(…)` gives the memory address, and
#   since (in CPython) these memory addresses are in the upper ranges
#   the `int` that `id(x)` returns will be allocated separately from the `int` that `id(y)`
#   returns, leading to the following…
assert x is y and id(x) is not id(y)

Since equality can be implemented via the object model (but identity cannot,) it is possible for an object to not be equivalent to even itself!

class T:
    def __eq__(self, other):
        return False

obj = T()
assert obj is obj and obj != obj

Note that since == can be implemented but is cannot, and that (in CPython) is is a pointer comparison, is checks are very likely to be consistently faster than == checks.

/* Include/object.h */

#define Py_Is(x, y) ((x) == (y))

Therefore, the use of an enum.Enum may prove faster than an equivalent string equality comparison in some cases. (Note, however, that object equality comparison may just as well implement an identity “fast-path,” minimising the performance improvement.)

from time import perf_counter
from contextlib import contextmanager
from enum import Enum

@contextmanager
def timed(msg):
    before = perf_counter()
    try:
        yield
    finally:
        after = perf_counter()
        print(f'{msg:<48} \N{mathematical bold capital delta}t: {after - before:.6f}s')

def f(x):
    return x == 'abcdefg'

Choice = Enum('Choice', 'Abcdefg')
def g(x):
    return x is Choice.Abcdefg

with timed('f'):
    x = 'abcdefg'
    for _ in range(100_000):
        f(x)

with timed('g'):
    x = Choice.Abcdefg
    for _ in range(100_000):
        g(x)

Generally, whether two containers are equivalent is determined by checking whether their contents are equivalent.

def __eq__(self, other):
    if len(self) != len(other):
        return False
    for x, y in zip(self, other, strict=True):
        if x != y:
            return False
    return True

xs = [1, 2, 3]
ys = [1, 2, 3]

print(
    f'{xs == ys       = }',
    f'{__eq__(xs, ys) = }',
    sep='\n',
)

However, in the actual implementation of list, there is a shortcut: we first perform a (quicker) identity check, and only for elements that are not identical do we fall back to an equality check.

def __eq__(self, other):
    if len(self) != len(other):
        return False
    for x, y in zip(self, other, strict=True):
        if x is y:
            continue
        if x != y:
            return False
    return True

z = float('nan')
xs = [1, 2, 3, z]
ys = [1, 2, 3, z]

print(
    f'{xs == ys       = }',
    f'{__eq__(xs, ys) = }',
    sep='\n',
)

This is distinct from how numpy.ndarray equality works!

from numpy import array

z = float('nan')
xs = [1, 2, 3, z]
ys = [1, 2, 3, z]

assert xs == ys

z = float('nan')
xs = array([1, 2, 3, z])
ys = array([1, 2, 3, z])

assert not (xs == ys).all()

So why should I care…?

xs = ...
ys = ...

print(f'{xs is ys = }')

What is the difference between a live view and a snapshot… and why does it matter?

print("Let's take a look!")

A “snapshot (copy)” is a static copy of some state at some point in time; a “live view” is a dynamic reference to some state.

xs = [1, 2, 3]

xs.append(4)
ys = xs.copy()
xs.append(5)

print(
    f'{xs = }',
    f'{ys = }',
    sep='\n',
)

Whereas…

xs = [1, 2, 3]
ys = xs

xs.append(4)
xs.append(5)

print(
    f'{xs = }',
    f'{ys = }',
    sep='\n',
)

We may desire a “live view” to eliminate “update anomalies”: cases where an update to one part of the system should be reflected in another part of the system, cases where we want a “single source of truth.”

from dataclasses import dataclass
from copy import copy

@dataclass
class Employee:
    name   : str
    role   : str
    salary : float

@dataclass
class Entitlement:
    employee : Employee
    access   : bool

employees = {
    'alice': Employee('alice', 'programmer', 250_000),
    'bob':   Employee('bob',   'programmer', 225_000),
}

entitlements = {
    k: Entitlement(employee=v, access=False)
    for k, v in employees.items()
}

payroll_by_year = {
    2020: {
        k: copy(v) for k, v in employees.items()
    },
}

employees['alice'].role = 'manager'
employees['alice'].salary *= 1.5

print(
    f'{employees["alice"].role             = }',
    f'{entitlements["alice"].employee.role = }',
    f'{payroll_by_year[2020]["alice"].role = }',
    sep='\n',
)

Copies can be made explicitly or implicitly in a number of different ways.

from copy import copy

xs = [1, 2, 3]
# ys = xs
ys = [*xs]
# ys = list(xs)
# ys = xs.copy()
# ys = copy(xs)

xs.append(4)

print(
    f'{xs = }',
    f'{ys = }',
    sep='\n',
)

We often want to distinguish between “shallow” and “deep” copies. A “shallow copy” is a copy of only the top “level” of a nested container structure. A “deep copy” copies all levels of the nested structure.

xs = [
    [1, 2, 3],
    [4, 5, 6, 7],
]
ys = xs.copy() # or `copy.copy(xs)`

xs[0].insert(0, 0)
xs.append([8, 9])

print(
    f'{xs = }',
    f'{ys = }',
    sep='\n',
)

Whereas with a copy.deepcopy

from copy import deepcopy

xs = [
    [1, 2, 3],
    [4, 5, 6, 7],
]
ys = deepcopy(xs)

xs[0].insert(0, 0)
xs.append([8, 9])

print(
    f'{xs = }',
    f'{ys = }',
    sep='\n',
)

Given the two changes made to xs (i. a mutation of a nested element; ii. a mutation of the top level,) we can distinguish which references observe each change:

(There is a necessary asymmetry here: we cannot observe only the shallow change but not the deeper change.)

from copy import copy, deepcopy

xs = [
    [1, 2, 3],
    [4, 5, 6, 7],
]

ys = {
    # i.    ii.
    (True, True):   xs,
    (True, False):  copy(xs),
    # (False, True):  ...,
    (False, False): deepcopy(xs),
}

xs[0].insert(0, 0) # i.
xs.append([8, 9])  # ii.

print(
    f'{xs = }',
    *ys.values(),
    sep='\n',
)

Clearly, we want a “snapshot” if we want to capture the state as of a certain point in time and not observe later updates (i.e., mutations.) We want a “live view” if we do want to see later updates.

The .keys() on a dict (which used to be called .viewkeys() in Python 2,) is a live view of the keys of a dict. As a consequence, if we capture a reference to it, then subsequently mutate the dict, we will see that mutation when iterating over the reference we have captured.

d = {'abc': 123, 'def': 456, 'xyz': 789}

keys = d.keys() # “live view”

d['ghi'] = 999

for k in keys:
    print(f'{k = }')

However, if we wanted a snapshot, we may need to explicitly trigger a copy.

d = {'abc': 123, 'def': 456, 'xyz': 789}

keys = [*d.keys()] # “snapshot”

d['ghi'] = 999

for k in keys:
    print(f'{k = }')

Similarly, we can consider the different import styles to be an instance of “early”- vs “late”-binding, which parallels the idea of “snapshots” vs “live views.”

from textwrap import dedent
from math import cos, sin, pi

print(
    f'before {sin(pi) = :>2.0f}',
    f'       {cos(pi) = :>2.0f}',
    sep='\n',
)

# don't “pollute” namespace
exec(dedent('''
    import math
    math.sin, math.cos = math.cos, math.sin
'''))

print(
    f'after  {sin(pi) = :>2.0f}',
    f'       {cos(pi) = :>2.0f}',
    sep='\n',
)

However…

from textwrap import dedent
import math

print(
    f'before {math.sin(math.pi) = :>2.0f}',
    f'       {math.cos(math.pi) = :>2.0f}',
    sep='\n',
)

# don't “pollute” namespace
exec(dedent('''
    import math
    math.sin, math.cos = math.cos, math.sin
'''))

print(
    f'after  {math.sin(math.pi) = :>2.0f}',
    f'       {math.cos(math.pi) = :>2.0f}',
    sep='\n',
)

In fact, we can think of dotted __getattr__ lookup as being a key mechanism in getting a “live view” of some data.

from dataclasses import dataclass

@dataclass
class T:
    x : int

obj = T(123)
x = obj.x

print(f'before {obj.x = } · {x = }')
obj.x = 456
print(f'after  {obj.x = } · {x = }')

There are many subtle design distinctions we can make in our code that differ in terms of whether they provide us with a “live view“ or a “snapshot.”

These four variations have some subtle distinctions:

class Base:
    def __repr__(self):
        return f'{type(self).__name__}({self.values!r})'

# i.
class T1(Base):
    def __init__(self, values):
        self.values = values

# ii.
class T2(Base):
    def __init__(self, values):
        self.values = [*values]

# iii.
class T3(Base):
    def __init__(self, values):
        self.values = values.copy()

# iv.
class T4(Base):
    def __init__(self, *values):
        self.values = values

values = [1, 2, 3]
obj = T1(values)
values.clear()
print(f'i.   {values = } · {obj = }')

values = [1, 2, 3]
obj = T2(values)
values.clear()
print(f'ii.  {values = } · {obj = }')

values = [1, 2, 3]
obj = T3(values)
values.clear()
print(f'iii. {values = } · {obj = }')

values = [1, 2, 3]
obj = T4(*values)
values.clear()
print(f'iv.  {values = } · {obj = }')

However, this is not the only distinction between the above!

from collections import deque

class Base:
    def __repr__(self):
        return f'{type(self).__name__}({self.values!r})'

# i.
class T1(Base):
    def __init__(self, values):
        self.values = values

# ii.
class T2(Base):
    def __init__(self, values):
        self.values = [*values]

# iii.
class T3(Base):
    def __init__(self, values):
        self.values = values.copy()

# iv.
class T4(Base):
    def __init__(self, *values):
        self.values = values

values = deque([1, 2, 3], maxlen=3)
obj = T1(values)
values.append(4)
print(f'i.   {values = } · {obj = }')

values = deque([1, 2, 3], maxlen=3)
obj = T2(values)
values.append(4)
print(f'ii.  {values = } · {obj = }')

values = deque([1, 2, 3], maxlen=3)
obj = T3(values)
values.append(4)
print(f'iii. {values = } · {obj = }')

values = deque([1, 2, 3], maxlen=3)
obj = T4(*values)
values.append(4)
print(f'iv.  {values = } · {obj = }')

We can think of “inheritance” as a mechanism for “live updates.”

class Base:
    pass

class Derived(Base):
    pass

Base.f = lambda _: ...

print(
    f'{Derived.f = }',
    sep='\n',
)

In fact, if we extend the idea of changes to changes across versions of our code, we can see a material distinction between “inheritance,” “composition,” and alternate approaches.

class Base:
    def f(self):
        pass

# statically added (e.g., in a later version)
Base.g = lambda _: ...

class Derived(Base):
    pass

class Composed:
    def __init__(self, base : Base = None):
        self.base = Base() if base is None else base
    def f(self, *args, **kwargs):
        return self.base.f(*args, **kwargs)

class Constructed:
    locals().update(Base.__dict__)

    ### alternatively…
    # f = Base.f
    # g = Base.g

# dynamically added (e.g., via monkey-patching)
Base.h = lambda _: ...

print(
    ' Derived '.center(40, '\N{box drawings light horizontal}'),
    f'{hasattr(Derived,     "f") = }',
    f'{hasattr(Derived,     "g") = }',
    f'{hasattr(Derived,     "h") = }',
    ' Composed '.center(40, '\N{box drawings light horizontal}'),
    f'{hasattr(Composed,    "f") = }',
    f'{hasattr(Composed,    "g") = }',
    f'{hasattr(Composed,    "h") = }',
    ' Constructed '.center(40, '\N{box drawings light horizontal}'),
    f'{hasattr(Constructed, "f") = }',
    f'{hasattr(Constructed, "g") = }',
    f'{hasattr(Constructed, "h") = }',
    sep='\n',
)

Consider the collections.ChainMap, which allows us to isolate writes to the top “level” of a multi-level structure. This mechanism is closely related to how scopes work and to how __getattr__ and __setattr__ work.

base = {
    'abc': 123
}

snapshot = {
    **base,
    'def': 456,
}

# base['abc'] *= 2
# snapshot['abc'] *= 10

print(
    f'{base     = }',
    f'{snapshot = }',
    sep='\n',
)

from collections import ChainMap

base = {
    'abc': 123
}

layer = {
    'def': 456,
}

live = ChainMap(layer, base)

# live['abc'] *= 10
base['abc'] *= 2

print(
    f'{base = }',
    f'{live = } · {live["abc"] = }',
    sep='\n',
)

It is important that we be aware of “shadowing” where something that may appear to be a “live view” may become a “snapshot.”

Recall the subtle distinction between clearing a list via the following approaches. If we have captured a “live view” of xs with ys, then we must mutate xs with .clear() or del xs[:] for the clearing to be visible on ys.

# i.
xs = ys = [1, 2, 3]
xs = []
print(f'{xs = } · {ys = }')

# ii.
xs = ys = [1, 2, 3]
xs.clear()
print(f'{xs = } · {ys = }')

# iii.
xs = ys = [1, 2, 3]
del xs[:]
print(f'{xs = } · {ys = }')

Similarly, manipulating sys.path requires that we manipulate the actual sys.path. A name binding of path = … in the module scope doesn’t change the actual sys.path.

from tempfile import TemporaryDirectory
from pathlib import Path

with TemporaryDirectory() as d:
    d = Path(d)
    with open(d / '_module.py', mode='w') as f:
        pass

    from sys import path
    path.append(f'{d}') # works!
    from sys import path
    path.insert(0, f'{d}') # works!

    from sys import path
    path = path + [f'{d}'] # does not work!
    from sys import path
    path = [f'{d}'] + path # does not work!

    import sys
    sys.path.append(f'{d}') # works!
    import sys
    sys.path.insert(0, f'{d}') # works!

    import sys
    sys.path = sys.path + [f'{d}'] # works!
    # what about [*sys.path, f'{d}']… ?
    import sys
    sys.path = [f'{d}'] + sys.path # works!

“Shadowing” is how we can describe what happens when we create a “shadow” (“snapshot (copy)”) of some data at some higher level of a scoped lookup. This can easily happen in our OO hierarchies if we are not careful.

class Base:
    x = []

class Derived(Base):
    pass

# Base.x.append(1)
# Derived.x.append(2)
# Base.x = [1, 2, 3, 4]
Derived.x = [1, 2, 3, 4, 5, 6]

print(
    f'{Base.x                  = }',
    f'{Derived.x               = }',
    f'{Base.__dict__.keys()    = }',
    f'{Derived.__dict__.keys() = }',
    sep='\n',
)

But… what if the value is immutable? If the value is immutable, then we have to be particularly careful to update it at the right level!

class Base:
    x = 123

class Derived(Base):
    pass

Derived.x = 789
Base.x = 456
# del Derived.x

print(
    f'{Base.x                  = }',
    f'{Derived.x               = }',
    f'{Base.__dict__.keys()    = }',
    f'{Derived.__dict__.keys() = }',
    sep='\n',
)

So why does this matter…?

What is the difference between mutable and immutable data… and how can I use this to improve my code?

print("Let's take a look!")

Obviously, mutable data is data that we can change, and immutable data is data that we cannot change. However, an important qualifier is whether we can change the data in place.

s = 'abc'

print(f'before {s = } {id(s) = :#_x}')
s = s.upper()
print(f'after  {s = } {id(s) = :#_x}')

xs = [1, 2, 3]

print(f'before {xs = } {id(xs) = :#_x}')
xs.append(4)
print(f'after  {xs = } {id(xs) = :#_x}')

In both cases, the values changed, but only for xs (a mutable list) did the value change in place. If we captured a reference to the list in another name, we would be able to observe this change in two places.

s0 = s1 = 'abc'
xs0 = xs1 = [1, 2, 3]

print(
    f'before {s0 = } · {xs0 = }',
    f'       {s1 = } · {xs1 = }',
    sep='\n',
)

s0 = s0.upper()
xs0.append(4)

print(
    f'after  {s0 = } · {xs0 = }',
    f'       {s1 = } · {xs1 = }',
    sep='\n',
)

We can litigate the mechanisms used to enforce immutability, and there are many choices. However, while the exact mechanism may have some performance or some narrow correctness consequences, it is largely irrelevant to our purposes. (Recall that the “real world” appears to be fundamentally mutable.)

t = 'abc', 123
# t[0] = ...

class T:
    def __init__(self, x):
        self._x = x
    @property
    def x(self):
        return self._x

obj = T(123)
# obj.x = ...
obj._x = ...

Mutability allows us to have “action at a distance”: a change in one part of the code can change some other, non-local part of the code.

from threading import Thread
from time import sleep

class T:
    def __init__(self, values):
        self.values = values # no copy: a reference is shared, so mutations are visible at a distance

    def __call__(self):
        while True:
            sleep(1)
            self.values.append(sum(self.values))

values = [1, 2, 3]
Thread(target=T(values)).start()

for _ in range(3):
    print(f'{values = }')
    sleep(1)

This can readily lead to code that is hard to understand using only local information.

One way to avoid this is to aggressively make copies any time we pass data around. However, we will have to be careful to make “deep copies.”

from threading import Thread
from time import sleep

class T:
    def __init__(self, values):
        self.values = values.copy()

    def __call__(self):
        while True:
            sleep(1)
            self.values[-1].append(sum(self.values[-1]))

values = [1, 2, 3, [4]]
Thread(target=T(values)).start()

for _ in range(3):
    print(f'{values = }')
    sleep(1)
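
A minimal sketch (without the threading, and using a hypothetical .work() method in place of the thread’s __call__): a copy.deepcopy in the constructor copies the nested list as well, so the mutation can no longer be observed at a distance.

from copy import deepcopy

class T:
    def __init__(self, values):
        self.values = deepcopy(values) # deep copy: nested lists are copied too

    def work(self):
        self.values[-1].append(sum(self.values[-1]))

values = [1, 2, 3, [4]]
obj = T(values)
obj.work()

print(
    f'{values     = }', # the caller’s data is untouched
    f'{obj.values = }', # only the deep copy was mutated
    sep='\n',
)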

Note that just as there is a distinction between a “deep” and a “shallow” copy, we can make a distinction between a “shallowly” and “deeply” immutable structure.

t = 'abc', [0, 1, 2]

print(f'before {t = }')
t[-1].append(3)
print(f'after  {t = }')

Alternatively, we could design around immutable data structures, using mechanisms such as a collections.namedtuple or dataclasses.dataclass. This can help us ensure that we do not inadvertently mutate data non-locally. Of course, we will still have to be careful if these structures are only “shallowly” immutable.

from collections import namedtuple
from dataclasses import dataclass

@dataclass(frozen=True)
class T:
    value : int
obj = T(123)

T = namedtuple('T', 'value')
obj = T(123)

When we want to change our data, we will use mechanisms such as ._replace or dataclasses.replace() to replace and copy the entities as a whole.

from collections import namedtuple
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class T:
    value : int
obj0 = obj1 = T(123)
obj2 = replace(obj0, value=obj0.value * 10)
print(f'{obj0 = } · {obj1 = } · {obj2 = }')

T = namedtuple('T', 'value')
obj0 = obj1 = T(123)
obj2 = obj0._replace(value=obj0.value * 10)
print(f'{obj0 = } · {obj1 = } · {obj2 = }')

Note that we can keep references to the parts of the data that did not change, and we can rely on the Python garbage collector to keep those references alive only as long as they are needed. As a consequence, we may not necessarily see significantly increased memory usage from these copies.
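
For example, dataclasses.replace constructs the new instance from the existing field values, so a field we did not change is shared (identical,) not copied (a minimal sketch):

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class T:
    name    : str
    payload : tuple # potentially large data that we do not change

obj0 = T('before', tuple(range(1_000)))
obj1 = replace(obj0, name='after')

print(
    f'{obj1 is obj0                 = }', # a new instance…
    f'{obj1.payload is obj0.payload = }', # …but the unchanged field is shared, not copied
    sep='\n',
)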

We can use other tricks, like a collections.ChainMap, to reduce the amount of copied information (though at the loss of functionality, such as the ability to del an entry.)

from dataclasses import dataclass, replace, field
from collections import ChainMap
from random import Random
from string import ascii_lowercase

@dataclass(frozen=True)
class T:
    values : ChainMap[str, int] = field(default_factory=ChainMap)
    def __call__(self, *, random_state=None):
        rnd = random_state if random_state is not None else Random()
        new_entries = {
            ''.join(rnd.choices(ascii_lowercase, k=4)): rnd.randint(-100, +100)
            for _ in range(10)
        }
        return replace(self, values=ChainMap(new_entries, self.values))
    def __getitem__(self, key):
        return self.values[key]

rnd = Random(0)
obj = T()
for _ in range(3):
    obj = obj(random_state=rnd)

print(
    f'{obj = }',
    f'{obj["fudo"] = }',
    sep='\n{}\n'.format('\N{box drawings light horizontal}' * 40),
)

However, some very useful parts of Python are inherently mutable. For example, a generator or generator coroutine cannot be copied—at most, we can tee them, and that may not even necessarily work or be meaningful. (Of course, for many generators and generator coroutines, mutations may not be particularly problematic.)
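
For example (a minimal sketch): copy.copy of a generator fails (in CPython, with a TypeError,) and itertools.tee gives us two iterators fed from one underlying generator rather than a true, independent copy.

from copy import copy
from itertools import tee

def gen():
    yield from range(5)

g = gen()
try:
    copy(g) # generators cannot be copied
except TypeError as e:
    print(f'{e = }')

g0, g1 = tee(gen(), 2) # two iterators over the same underlying generator
print(f'{next(g0) = } · {next(g0) = } · {next(g1) = }')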

Additionally, with a strictly immutable design, we have to be very clear about how the parts of our code share state. If we do not design two parts of our code to share state upfront, we may later discover that it is very disruptive to thread that state through later.

from dataclasses import dataclass
from functools import wraps
from itertools import count
from threading import Thread
from time import sleep
from typing import Iterator

@dataclass
class T:
    it : Iterator
    def __call__(self):
        while True:
            next(self.it)
            # self.it.send(True)
            sleep(1)

@lambda coro: wraps(coro)(lambda *a, **kw: [ci := coro(*a, **kw), next(ci)][0])
def resettable_count(start=0):
    while True:
        for state in count():
            if (reset := (yield start + state)):
                break
            # from inspect import currentframe, getouterframes
            # print(f'{getouterframes(currentframe())[1].lineno = }')

rc = resettable_count(start=100)
print(f'{rc.send(True) = }')
print(f'{next(rc)      = }')
Thread(target=(obj := T(rc))).start()
print(f'{next(rc)      = }')
print(f'{next(rc)      = }')
print(f'{rc.send(True) = }')
print(f'{next(rc)      = }')

How can I use this to improve my code?

What is the difference between immutability and hashability… and how does this affect my design?

print("Let's take a look!")

We know that the keys of a dict and the elements of a set must be hashable.

# hashable → ok ✓
d = {'a': ..., 'b': ..., 'c': ...}
s = {'a', 'b', 'c'}

# hashable → ok ✓
d = {('a', 'b', 'c'): ...}
s = {('a', 'b', 'c')}

# not hashable → not ok ✗
# d = {['a', 'b', 'c']: ...}
# s = {['a', 'b', 'c']}

This leads to clumsiness such as not being able to model set[set]—“sets of sets.” Since set is not hashable, we cannot create a set that contains another set. However, we can create set[frozenset]—“sets of frozensets.”

# s = {{'a', 'b', 'c'}}          # not ok ✗
s = {frozenset({'a', 'b', 'c'})} # ok ✓

Similarly, the keys of a dict can be frozenset but not set.

d = {
    frozenset({'a', 'b', 'c'}): ...
}
d[frozenset({'a', 'b', 'c'})]

This may be useful in cases where we want a compound key that has unique components where order does not matter.

d = {
    'a,b,c': ...,
}
print(f"{d['a,b,c'] = }")
for k in d:
    k.split(',')

d = {
    ('a', 'b', 'c', 'd,e'): ...
}
print(f"{d['a', 'b', 'c', 'd,e'] = }")
for x, y, z, w in d:
    pass

d = {
    frozenset({'a', 'b', 'c'}): ...
}
print(f"{d[frozenset({'a', 'b', 'c'})] = }")
print(f"{d[frozenset({'c', 'b', 'a'})] = }")
for k in d:
    pass

Naïvely, we may assume that the difference between set and frozenset that leads to frozenset being hashable is immutability. We may naïvely (and incorrectly) assert that hashability implies immutability (and vice-versa.)

In fact, for many of the common built-in types, we will see that those that are immutable are hashable and those that are mutable are not hashable.

xs = [1, 2, 3]            # `list`        mutable; not hashable
s  = {1, 2, 3}            # `set`         mutable; not hashable
d  = {'a': 1}             # `dict`        mutable; not hashable
t  =  'a', 1              # `tuple`     immutable;  is hashable
s  = frozenset({1, 2, 3}) # `frozenset` immutable;  is hashable

x = 12, 3.4, 5+6j, False # `int`, `float`, `complex`, `bool` immutable; is hashable
x = 'abc', b'def'        # `str`, `bytes`                   immutable; is hashable

x = range(10) # `range` immutable; is hashable

When we discover that slice is immutable but not hashable, we may chalk this up to a corner-case driven by syntactical ambiguity. (In fact, in later versions slice becomes hashable.)

x = slice(None)

# x.start = 0 # AttributeError

hash(x) # TypeError

It may be ambiguous to __getitem__ with a slice, since we cannot distinguish between a single-item lookup where that item is a slice and a multi-item sliced lookup. In the case of the builtin dict (which does not support multi-item lookup,) this isn’t much of a problem; however, note that pandas.Series.loc supports both modalities.

d = {
    slice(None): ...
}

print(
    f'{d[slice(None)] = }',
    f'{d[:] = }',
    sep='\n',
)
from pandas import Series

s = Series({
    None:        ...,
    slice(None): ...,
})

print(
    f'{s.loc[None]}',
    f'{s.loc[slice(None)]}',
    f'{s.loc[:]}',
    sep='\n{}\n'.format('\N{box drawings light horizontal}' * 40),
)

Additionally, since we can implement __hash__, we can create mutable objects that are hashable. Again, we may assume that this does not materially affect the relationship between hashability and immutability.

from dataclasses import dataclass

@dataclass
class T:
    value : list[int]
    def __hash__(self):
        return hash(id(self))

obj = T([1, 2, 3])
print(f'{hash(obj) = :#_x}')
obj.value.append(4)
print(f'{hash(obj) = :#_x}')

However, if we consider more deeply the relationship between the two, we will discover the true nature of mutability and hashability.

Let’s consider two different ways to compute the hash of a piece of data:

class Base:
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return f'T({self.value!r})'

class T0(Base):
    def __hash__(self):
        return hash(id(self))

class T1(Base):
    def __hash__(self):
        return hash(self.value)

obj0, obj1 = T0(123), T1(123)

print(
    f'{hash(obj0) = }',
    f'{hash(obj1) = }',
    sep='\n',
)

Note that the hash when computed on identity changes across runs. In general, since the underlying mechanism of hash is an internal implementation detail, hash values may readily change across versions of Python.

from random import Random

rnd = Random(0)
x = (
    rnd.random(),
    rnd.random(),
)

print(
    f'x =       {     x }',
    f'hash(x) = {hash(x)}',
    sep='\n',
)

Assume that the value is immutable. If we were to compute the hash based on identity, then we might accidentally “lose” an object in a dict.

from dataclasses import dataclass

@dataclass(frozen=True)
class T:
    value : int
    def __hash__(self):
        return hash(id(self))

def f(d):
    d[obj := T(123)] = ...

d = {}
f(d)

# d[T(123)] # KeyError
for k in d:
    print(f'{k = }')

Therefore, we must hash immutable objects based on their value (on equality.) This is a matter of practicality.
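
For example, a frozen dataclasses.dataclass hashes on its field values, so an equivalent (but not identical) key can directly retrieve the entry.

from dataclasses import dataclass

@dataclass(frozen=True)
class T:
    value : int # frozen dataclass: __hash__ is derived from the field values

d = {}
d[T(123)] = ...

print(f'{d[T(123)] = }') # an equivalent, non-identical key finds the entry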

Assume that the value is mutable. If we were to compute the hash based on equality, then we might accidentally “lose” an object in a dict.

from dataclasses import dataclass

@dataclass
class T:
    value : int
    def __hash__(self):
        return hash(self.value)

d = {}
d[obj := T(123)] = ...
obj.value = 456

# d[obj] # KeyError
# d[T(123)] # KeyError
# d[T(456)] # KeyError
# for k in d: print(f'{k = }')

This is a serious problem, because the hash that was used to determine the location of the entry in the dict is no longer accurate. There will be no way to retrieve the value via __getitem__!

Therefore, we must hash mutable objects based on their identity. However, we still have the problem of “losing” a value in the dict if we hash on identity.

Except the value is still in the dict; we simply cannot access it via __getitem__. We can still iterate over dict in both cases!

from dataclasses import dataclass

@dataclass
class T:
    value : int
    def __hash__(self):
        return hash(self.value)

d = {}
d[obj := T(123)] = ...
obj.value = 456

for k in d:
    print(f'{k = }')

We may then extend our understanding of this topic as follows: immutable objects must be hashed on value to support direct retrieval with equivalent objects; mutable objects must be hashed on identity and cannot support direct retrieval. In other words, hashability implies immutability if-and-only-if we need direct (or “non-intermediated”) access.

In fact, it is relatively common to see hashed mutable objects. Consider the use of a networkx.DiGraph with a custom, rich node type. (Our Node class must be hashable, since the networkx.DiGraph is implemented as a “dict of dict of dicts.”)

from dataclasses import dataclass
from itertools import pairwise

from networkx import DiGraph

@dataclass
class Node:
    name  : str
    value : int = 0
    def __hash__(self):
        return hash(id(self))

nodes = [Node('a'), Node('b'), Node('c')]

g = DiGraph()
g.add_edges_from(pairwise(nodes))

for n in nodes:
    n.value += 1

for n in g.nodes:
    ...

Consider, however, that all access to the nodes of the networkx.DiGraph will likely be intermediated by calls such as .nodes that allow us to iterate over all of the nodes. We may also subclass networkx.DiGraph to allow direct access to nodes by name, further intermediating between the __getitem__ syntax and the hash-lookup mechanism.

from dataclasses import dataclass
from itertools import pairwise

from networkx import DiGraph

@dataclass
class Node:
    name  : str
    value : int = 0
    def __hash__(self):
        return hash(id(self))

nodes = [Node('a'), Node('b'), Node('c')]

class MyDiGraph(DiGraph):
    class by_name_meta(type):
        def __get__(self, instance, owner):
            return self(instance)

    @dataclass
    class by_name(metaclass=by_name_meta):
        instance : 'MyDiGraph'
        def __getitem__(self, key):
            nodes_by_name = {k.name: k for k in self.instance.nodes}
            return nodes_by_name[key]

g = MyDiGraph()
g.add_edges_from(pairwise(nodes))

for n in nodes:
    n.value += 1

print(f"{g.by_name['a'] = }")

Note that it is not a good idea to store object id(…)s in structures, since (in CPython) the memory addresses for these objects (and their corresponding id(…) values) may be reüsed. However, over the lifetime of an object, its id(…) will not change, so it is safe to store the id(…) if the lifetime of this storage is tied to the lifetime of the object. This will be the case when hashing an object on id(…) and putting it into a set or dict. While the __hash__(…) will be implicitly stored and is a value dependent on id(…), the lifetime of that storage will necessarily match the lifetime of the object itself. Furthermore, the hash is used only to find the approximate location of the entry in the set or dict. Since hash values are finite (in CPython, constrained to the valid range of Py_hash_t values, where Py_hash_t is typedef’d to Py_ssize_t, which is generally typedef’d to ssize_t,) then by the “pigeonhole principle,” multiple distinct objects must share the same hash. Therefore, after performing any necessary additional “probing,” the set or dict will perform an == comparison to confirm that it has found the right item. This further ensures that computing __hash__ on id(…) won’t lead to stale entries.
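
For example (a minimal sketch): even if every instance returns the same hash, distinct keys coexist in the dict, and lookup still succeeds, because == disambiguates after the hash has narrowed down the location.

from dataclasses import dataclass

@dataclass
class T:
    name : str
    def __hash__(self):
        return 0 # force every instance to collide

d = {T('a'): 1, T('b'): 2}

print(
    f"{d[T('a')] = }",
    f"{d[T('b')] = }",
    f'{len(d)    = }',
    sep='\n',
)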

It also means that objects which are not equivalent to themselves trivially get lost in dicts! For example, float('nan') can be the key of a dict, but you will not be able to later retrieve the value via direct __getitem__!

d = {
    float('nan'): ...,
}

d[float('nan')] # KeyError

How does this affect my design?

Session ①: How do I make it real?

What is the difference between an object and a closure or an object and a generator coroutine… and how does this affect usability?

print("Let's take a look!")

In Python, we have first-class functions: functions can be treated like any other data. For example, we can put functions into data structures.

def f(x, y):
    return x + y

def g(x, y):
    return x * y

for f in [f, g]:
    print(f'{f(123, 456) = :,}')
    break

for rv in [f(123, 456), g(123, 456)]:
    print(f'{rv = :,}')
    break

We can also dynamically define new functions at runtime.

def f():
    def g(x):
        return x ** 2
    return g

g = f()
print(f'{g(123) = :,}')

Often, we may use lambda syntax if those functions are short (consisting of a single expression with no use of the ‘statement grammar.’)

for f in [lambda x, y: x + y, lambda x, y: x * y]:
    print(f'{f(123, 456) = :,}')
def f():
    return lambda x: x ** 2

g = f()
print(f'{g(123) = :,}')

We know that these functions are being defined dynamically, because every definition creates a new, distinct version of that function.

def f():
    def g(x):
        return x ** 2
    return g

g0, g1 = f(), f()

print(
    f'{g0(123)      = :,}',
    f'{g1(123)      = :,}',
    f'{g0 is not g1 = }',
    sep='\n',
)

Note that, in Python, functions do not implement value-based equality; comparing two functions with == falls back to an identity comparison.

def f(x, y):
    return x + y
def g(x, y):
    return x + y

print(f'{f == g = }')
print(f'{f.__name__ == g.__name__ = }')
print(f'{f.__code__.co_code == g.__code__.co_code = }')
funcs = {*()}
for _ in range(3):
    def f(x, y):
        return x + y
    funcs.add(f)

# for _ in range(3):
#     def f(x, y):
#         return x + y

print(f'{funcs = }')

When we dynamically define functions in Python, a function object is created that consists of the function’s name (whether anonymous or not,) its docstring (if provided,) its default values, its code object, and any non-local, non-global data it needs to operate (its closure.)

def f(x, ys=[123, 456]):
    '''
        adds x to each value in ys
    '''
    return [x + y for y in ys]

from dis import dis
dis(f)
print(
    # f'{f.__name__         = }',
    # f'{f.__doc__          = }',
    # f'{f.__code__         = }',
    # f'{f.__code__.co_code = }',
    # f'{f.__defaults__     = }',
    # f'{f.__closure__      = }',
    sep='\n',
)

Note that the defaults are created when the function is defined; this is why when we have “mutable default arguments,” there is only one copy of these defaults that is reüsed across all invocations of the function.

def f(xs=[123, 456]):
    xs.append(len(xs) + 1)
    return xs

print(
    f'{f()            = }',
    f'{f()            = }',
    f'{f.__defaults__ = }',
    f'{f() is f()     = }',
    sep='\n',
)

When the bytecode for a function is created, the Python compiler performs scope-determination. In order to generate the bytecodes for local variable access (LOAD_FAST,) for global variable access (LOAD_GLOBAL,) or for closure variable access (LOAD_DEREF,) the Python parser statically determines the scope of any variables that are used.

from dis import dis

def f():
    return x
# dis(f)

def f(x):
    # import x
    return x
dis(f)

def f(x):
    def g():
        nonlocal x
        x += 1
        return x
    return g
dis(f(...))

For variables that are neither local nor global but instead in the “enclosing environment,” we generate a LOAD_DEREF bytecode for access and capture a reference to that variable.

def f(x):
    def g():
        return x
    return g

xs = [1, 2, 3]
g = f(xs)

print(
    f'{g.__closure__                  = }',
    f'{g.__closure__[0]               = }',
    f'{g.__closure__[0].cell_contents = }',
    f'{g.__closure__[0].cell_contents is xs = }',
    sep='\n',
)

It is not a coïncidence that this is reminiscent of object orientation in Python. Just as an object “encapsulates” some (hidden) state and some behaviour that operates on such state, a dynamically defined function “closes over” some state that it can operate on.

class T:
    def __init__(self, state):
        self.state = state
    def __call__(self):
        self.state += 1
        return self.state
    def __repr__(self):
        return f'T({self.state!r})'

obj = T(123)
print(
    # f'{obj   = }',
    f'{obj() = }',
    f'{obj() = }',
    f'{obj() = }',
    sep='\n',
)
def create_obj(state):
    def f():
        nonlocal state
        state += 1
        return state
    return f

obj = create_obj(123)
print(
    f'{obj   = }',
    f'{obj() = }',
    f'{obj() = }',
    f'{obj() = }',
    sep='\n',
)

In fact, we can see the correspondence quite clearly when we look at what sits underneath.

class T:
    def __init__(self, state):
        self.state = state
    def __call__(self):
        self.state += 1
        return self.state
    def __repr__(self):
        return f'T({self.state!r})'

def create_obj(state):
    def f():
        nonlocal state
        state += 1
        return state
    return f

obj0 = T(123)
obj1 = create_obj(123)

print(
    f'{obj0.__dict__                     = }',
    f'{obj1.__closure__                  = }',
    f"{obj0.__dict__['state']            = }",
    f'{obj1.__closure__[0].cell_contents = }',
    sep='\n',
)

This tells us that an object created with the class keyword and a dynamically defined function created with a closure are two ways to accomplish the same goal of encapsulation.

When we create an instance of a generator coroutine, it maintains its local state in-between iterations.

def coro(state):
    while True:
        state += 1
        yield state

ci = coro(123)
print(
    f'{next(ci) = }',
    f'{next(ci) = }',
    f'{next(ci) = }',
    f'{next(ci) = }',
    f'{ci.gi_frame.f_locals = }',
    sep='\n',
)

Indeed, this appears to be yet another way to accomplish the same goal!

class T:
    def __init__(self, state):
        self.state = state
    def __call__(self):
        self.state += 1
        return self.state
    def __repr__(self):
        return f'T({self.state!r})'

def f(state):
    def g():
        nonlocal state
        state += 1
        return state
    return g

def coro(state):
    while True:
        state += 1
        yield state

obj0 = T(123)
obj1 = f(123)
obj2 = coro(123).__next__

print(
    # f'{obj0() = } {obj0() = } {obj0() = }',
    # f'{obj1() = } {obj1() = } {obj1() = }',
    # f'{obj2() = } {obj2() = } {obj2() = }',

    # f'{obj0.__dict__                            = }',
    # f'{obj1.__closure__                         = }',
    # f'{obj2.__self__.gi_frame.f_locals          = }',
    f"{obj0.__dict__['state']                   = }",
    f'{obj1.__closure__[0].cell_contents        = }',
    f"{obj2.__self__.gi_frame.f_locals['state'] = }",
    sep='\n',
)

Facing three ways to accomplish the same goal, which do we choose?

choose class

If it makes sense for someone to be able to dig around into the internal details of the object, then maybe we should choose class.

class T:
    def __init__(self, state):
        self.state = state
    def __call__(self):
        self.state += 1
        return self.state
    def __repr__(self):
        return f'T({self.state!r})'
    def __dir__(self):
        return ['state']

obj = T(123)
print(
    f'{obj      = }',
    f'{dir(obj) = }',
    sep='\n',
)

def f(state):
    def g():
        nonlocal state
        state += 1
        return state
    return g

obj = f(123)
print(
    f'{obj      = }',
    f'{dir(obj) = }',
    f'{obj.__closure__ = }',
    sep='\n',
)

If it makes sense for the object to support multiple named methods, then class is probably less clumsy.

class T:
    def __init__(self, state):
        self.state = state
    def inc(self):
        self.state += 1
        return self.state
    def dec(self):
        self.state -= 1
        return self.state
    def __repr__(self):
        return f'T({self.state!r})'

obj = T(123)

print(
    f'{dir(obj) = }',
    # f'{obj.inc() = }',
    # f'{obj.dec() = }',
    sep='\n',
)
from collections import namedtuple

def f(state):
    def inc():
        nonlocal state
        state += 1
        return state
    def dec():
        nonlocal state
        state -= 1
        return state
    # return inc, dec
    return namedtuple('T', 'inc dec')(inc, dec)

# obj = f(123)
# print(
#     # f'{dir(obj) = }',
#     f'{obj[0]() = }',
#     f'{obj[1]() = }',
#     sep='\n',
# )

inc, dec = f(123)
print(
    f'{inc() = }',
    f'{dec() = }',
    sep='\n',
)

# obj = f(123)
# print(
#     f'{obj.inc() = }',
#     f'{obj.dec() = }',
#     sep='\n',
# )

If we need to implement any other parts of the Python vocabulary, then we must write class (or use some boilerplate elimination tool like contextlib.contextmanager.)

class T:
    def __init__(self, state):
        self.state = state
    def __call__(self, value):
        self.state.append(value)
    def __len__(self):
        return len(self.state)
    def __getitem__(self, idx):
        return self.state[idx]
    def __repr__(self):
        return f'T({self.state!r})'

obj = T([1, 2, 3])
obj(4)

print(
    f'{len(obj) = }',
    f'{obj[0]   = }',
    sep='\n',
)
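
As an example of such boilerplate elimination, contextlib.contextmanager lets a generator stand in for a class when the only vocabulary we need is the context-manager protocol (__enter__/__exit__); a minimal sketch:

from contextlib import contextmanager

@contextmanager
def managed(state):
    print(f'entering with {state = }')
    try:
        yield state
    finally:
        print(f'exiting with  {state = }')

with managed([1, 2, 3]) as st:
    st.append(4)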

choose closure

If we want to “hide” data from our users to limit them in some antagonistic or coërcive way, we should not expect the closure to add anything but a few easily circumventable steps.

def f(state):
    def g():
        nonlocal state
        state += 1
        return state
    return g

g = f(123)
g.__closure__[0].cell_contents = 456

print(
    f'{g() = }',
    f'{g.__closure__[0].cell_contents = }',
    sep='\n',
)

This is not too dissimilar from our guidance around @property.

class T:
    def __init__(self, x):
        self._x = x

    @property
    def x(self):
        return self._x

    def __repr__(self):
        return f'T({self._x})'

obj = T(123)
# obj.x = ...
obj._x = ...

No matter how deeply we try to hide some data, it’s only a few dirs away.

def f(x):
    class T:
        @property
        def x(self):
            return x

        def __repr__(self):
            return f'T({self.x!r})'
    return T()

obj = f(123)

print(
    f'{obj.x = }',
    f'{type(obj).x.fget.__closure__[0].cell_contents = }',
    sep='\n',
)

If we want to non-antagonistically reduce clutter or noise, we may choose to use a closure.

class T:
    def __init__(self, state):
        self.state = state
    def __call__(self):
        self.state += 1
        return self.state
    def __repr__(self):
        return f'T({self.state!r})'

def f(state):
    def g():
        nonlocal state
        state += 1
        return state
    return g

obj0 = T(123)
obj1 = f(123)

print(
    f'{obj0   = }',
    f'{obj1   = }',
    f'{obj0() = }',
    f'{obj1() = }',
    sep='\n',
)

choose generator coroutine

If we have a heterogeneous computation, we generally do not want a generator coroutine if the computation will be triggered manually.

from dataclasses import dataclass

@dataclass
class State:
    a : int = None
    b : int = None
    c : int = None

class T:
    def __init__(self, state : State = None):
        self.state = state if state is not None else State()
    def f(self, value):
        self.state.a = value
        return self.state
    def g(self, value):
        self.state.b = self.state.a + value
        return self.state
    def h(self, value):
        self.state.c = self.state.b + value
        return self.state

obj = T()
print(
    f'{obj.f(123) = }',
    f'{obj.g(456) = }',
    f'{obj.h(789) = }',
    sep='\n',
)
from dataclasses import dataclass

@dataclass
class State:
    a : int = None
    b : int = None
    c : int = None

def coro(state : State = None):
    state = state if state is not None else State()
    state.a = yield ...
    state.b = (yield state) + state.a
    state.c = (yield state) + state.b
    yield state

obj = coro(); next(obj)
print(
    f'{obj.send(123) = }',
    ...
    f'{obj.send(456) = }', # ???
    ...
    ...
    ...
    f'{obj.send(789) = }',
    sep='\n',
)

If we have a single, homogeneous decomposition of a computation, we may find a generator coroutine is less conceptual overhead than a class-style object.

def coro():
    while True:
        _ = yield

ci = coro()
print(
    # f'{dir(ci) = }',
    f'{next(ci)              = }',
    f'{ci.send(...)          = }',
    # f'{ci.throw(Exception()) = }',
    # f'{ci.close()            = }',
    sep='\n',
)

In fact, we may find that pumped generator coroutines with __call__-interface unification give us an extremely simple API we can present our users.

from functools import wraps

def f(x):
    pass

def g():
    def f(x):
        pass
    return f

class T:
    def __call__(self, x):
        pass

@lambda coro: wraps(coro)(lambda *a, **kw: [ci := coro(*a, **kw), next(ci), ci.send][-1])
def coro():
    while True:
        _ = yield
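
With that decorator applied, all four of the above can be driven through the same call syntax (a minimal usage sketch; each call here happens to return None):

for obj in [f, g(), T(), coro()]:
    print(f'{obj(123) = }')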

How does this affect usability?

What is the difference between inheritance and composition… and which should I pick?

print("Let's take a look!")

If we have a base class, we can inherit from the base class in a derived class. If the base class later adds methods, the derived class automatically sees those methods. If the derived class wants to customise the behaviour, it can do so and use super() to refer to the base class’s implementation.

class Base:
    def f(self):
        pass

class Derived(Base):
    # pass
    def f(self):
        return super().f()

Base.g = lambda self: None

obj = Derived()
obj.f()
obj.g()

Note that when we inherit, we inherit both methods and the metaclass.

class BaseMeta(type):
    def __new__(cls, name, bases, body):
        print(f'BaseMeta.__new__({cls!r}, {name!r}, {bases!r}, {body!r})')
        return super().__new__(cls, name, bases, body)

class Base(metaclass=BaseMeta):
    pass

class Derived(Base):
    pass

assert type(Base) is type(Derived)

If we have a class, we can pass an instance of that class to serve as a constituent of another class. This is called composition. With composition, if the constituent later adds methods, we must explicitly expose those. If the composed class wants to customise the behaviour, it can do so as it sees fit.

class Component:
    def f(self):
        pass

class Composed:
    def __init__(self, component=None):
        self.component = component if component is not None else Component()
    def f(self):
        return self.component.f()
    def g(self):
        return self.component.g()

Component.g = lambda self: None

obj = Composed()
obj.f()
# obj.g()

In the case of inheritance, we also get a default implementation of the isinstance/issubclass protocol (i.e., __instancecheck__ and __subclasscheck__).

class Base: pass
class Derived(Base): pass

obj = Derived()

assert isinstance(obj, Derived)
assert isinstance(obj, Base)
assert issubclass(Derived, Base)

However, we can implement this protocol as we see fit.

class TMeta(type):
    def __instancecheck__(self, instance):
        return True
class T(metaclass=TMeta):
    pass

print(
    f'{isinstance(123, T) = }',
)
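
Similarly, issubclass(…) is routed through __subclasscheck__ on the metaclass; a minimal sketch:

class TMeta(type):
    def __subclasscheck__(self, subclass):
        return True
class T(metaclass=TMeta):
    pass

print(
    f'{issubclass(int, T) = }',
)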

Note that there is an inherent directionality to this implementation.

class TMeta(type):
    def __instancecheck__(self, instance):
        return True
class T(metaclass=TMeta):
    pass

obj = T()

print(
    f'{isinstance(obj, int) = }',
)

In the case of isinstance(…, int), we must subclass from int.

class T(int):
    pass

obj = T()

print(
    f'{isinstance(obj, int) = }',
)

In some (very limited) cases, we can patch __class__, but this probably won’t work in general.

class T0:
    pass

class T1:
    pass

obj = T1()

assert not isinstance(obj, T0)

class T0Meta(T0.__class__):
    def __instancecheck__(self, instance):
        return True

T0.__class__ = T0Meta

assert isinstance(obj, T0)

Note that __class__ patching is much easier on regular class instances.

class A:
    def f(self):
        pass

class B:
    def g(self):
        pass

obj = A()
obj.f()
obj.__class__ = B
obj.g()

There are other options than just inheritance or composition, such as “object construction.”

def f(self): pass
def g(self): pass

methods = {
    'f': f,
    'g': g,
}

class A:
    # locals().update(methods)
    f = f
    g = g

class B:
    locals().update(methods)

obj0, obj1 = A(), B()

obj0.f(); obj0.g()
obj1.f(); obj1.g()

In fact, the “object construction” approach looks very similar to how we might use a class decorator.

def dec(cls):
    cls.f = lambda self: None
    cls.g = lambda self: None
    return cls

@dec
class T:
    pass

obj = T()
obj.f()
obj.g()

We may prefer the use of a class decorator over inheritance in cases where we want to have a very “light touch.”

In fact, given that inheritance is often about creating a categorisation, and categorisation schemes are intimately related to use-case, we may often prefer to avoid inheritance in a library if the library is not the centre of attention for a given system.

from networkx import DiGraph

class MyDiGraph(DiGraph):
    pass

class MiDiGraph:
    def __init__(self, g):
        self.g = g
    def nodes(self):
        ...
    def edges(self):
        ...
class T:
    def f(self):
        pass
    def g(self):
        pass

class Composed:
    def __init__(self, component):
        self.component = component
    def f(self):
        ...
# Cow, Pig, Chicken

# vet
class Mammal: pass
class Cow(Mammal): pass
class Pig(Mammal): pass

# sommelier
class WhiteMeat: pass
class Chicken(WhiteMeat): pass
class Pig(WhiteMeat): pass

Which should I pick?

What do I need to know about object orientation in Python? What is an example convention? What is an example rule? … and does this make things more understandable?

print("Let's take a look!")

The Python object model is very “mechanical,” and our understanding of many of the protocol methods may be little more than a reflection of these mechanics.

For example, when an instance is created, __new__ is called to create the instance and __init__ is called afterwards to initialise it. This immediately gives us an indication of when we may want to implement __new__ vs __init__.

class TMeta(type):
    def __call__(cls):
        obj = cls.__new__(cls) # ...
        cls.__init__(obj)      # ...
        return obj

class T(metaclass=TMeta):
    def __new__(cls):
        return super().__new__(cls)
    def __init__(self):
        pass

obj = T()
print(f'{obj = }')
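
A small sketch of when __new__ is the right hook: when subclassing an immutable type, the value must be fixed at creation time, and __init__ runs too late to change it.

class Positive(int):
    def __new__(cls, value):
        return super().__new__(cls, abs(value))  # the int's value is fixed here, at creation

print(f'{Positive(-123) = }')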

For other protocol methods, we may need to dig a bit deeper to discover the underlying meaning. For example, __repr__ is documented as the “official,” unambiguous (developer-facing) representation of an object, and __str__ as the “informal,” human-readable printable representation.

However, when we consider that __str__ is triggered by str(…), we can derive an alternate meaning for __str__: it is the data represented in the form of an str.

from dataclasses import dataclass
from enum import Enum

class Op(Enum):
    Eq = '='
    Lt = '<'
    Gt = '>'
    ...
    def __str__(self):
        return self.value

@dataclass
class Where:
    column : str
    op : Op
    value : ...
    def __str__(self):
        return f'{self.column} {self.op} {self.value}'

@dataclass
class Select:
    exprs : list[str]
    table : str
    where : Where | None

    def __str__(self):
        where = f' {self.where}' if self.where else ''
        return f'select {", ".join(self.exprs)} from {self.table}{where}'

stmt = Select(
    ['name', 'value'],
    'data',
    Where('value', Op.Gt, 0),
)

from pathlib import Path

d = Path('/tmp')

print(
    # f'{str(d)       = }',
    # f'{str(123.456) = }',
    # f'{repr(stmt) = }',
    # f'{str(stmt)  = }',
    sep='\n',
)

Some protocol methods are misleading. For example, it may appear that __hash__ means “a pigeonholed identifier,” but its meaning is far narrower.

from dataclasses import dataclass

@dataclass
class T:
    value : ...
    def __hash__(self):
        return hash(self.value)

obj = T((1, 2, 3))
print(
    f'hash(obj) = {hash(obj)}',
)
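
The narrower meaning is a contract with __eq__: objects that compare equal must hash equal, because __hash__ is only used to bucket objects in hash-based containers. A small sketch (with a hypothetical Broken class) of what goes wrong otherwise:

class Broken:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return self.value == other.value
    def __hash__(self):
        return id(self)  # ignores __eq__: “equal” objects hash differently

a, b = Broken(123), Broken(123)
s = {a, b}
print(
    f'{a == b  = }',
    f'{len(s)  = }',  # 2: the set cannot deduplicate them
    sep='\n',
)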

For some protocol methods, we need to pay close attention to the implementation rules. For example, __len__ means the “size” of an object, where that concept of size must be the “integer, non-negative” size.

class T:
    def __len__(self):
        # return -2
        # return 2.5
        return 2

obj = T()
print(f'{len(obj) = }')

Sometimes, there is disagreement about the implicit rules of implementation.

python -m pip install numpy
class T:
    def __bool__(self):
        raise ValueError(...)
        # return ...

bool(T())
from enum import Enum
from numpy import array

class Directions(Enum):
    North = array([+1,  0])
    South = array([-1,  0])
    East  = array([ 0, +1])
    West  = array([ 0, -1])

print(
    array([0, 0]) + Directions.North * 2
)

In fact, even PEP-8 makes this mistake:

xs = [1, 2, 3]

if len(xs) > 0: pass
if not len(xs): pass
if xs: pass # preferred
from numpy import array
xs = array([1, 2, 3])

if len(xs) > 0: pass
if not len(xs): pass
if xs.size > 0: pass
# if xs: pass # ValueError

As we can see, __bool__ should return True or False but, in the case of a numpy.ndarray or pandas.Series, it instead raises a ValueError.

python -m pip install numpy pandas
from numpy import array
from pandas import Series

xs = array([1, 2, 3])
s = Series([1, 2, 3])

print(
    # f'{bool(xs) = }',
    # f'{bool(s)  = }',
    sep='\n',
)

Of course, in the PEP-8 example, this isn’t altogether that meaningful of a problem.

from numpy import array

xs = [1, 2, 3]
xs.append(4)
xs.clear()

if not xs:
    pass
for x in xs:
    pass

xs = array([1, 2, 3])
xs = xs[xs > 10]

if not xs:
    pass

Note that the entire reason we are choosing to interact with the Python “vocabulary” is to be able to write code that is obvious to the reader.

...
...
...
...
# try:
v = obj[k]
# except LookupError:
#     pass
...
...
...
...

This means that when we implement data model methods, we should implement them only where their meaning is unambiguous. This suggests that the implementation of these methods should be to support a singular, unique, or privileged operation.

from pandas import Series, date_range

s = Series([10, 200, 3_000], index=date_range('2020-01-01', periods=3))

print(
    s[2],             # positional
    s[:'2020-01-01'], # label
    sep='\n',
)
from pandas import Series

s = Series([10, 200, 3_000], index=[0, 1, 2])

print(
    s.loc[0],
    s.loc[:1],
    s.iloc[0],
    s.iloc[:1],
    sep='\n',
)

Similarly, consider len on a pandas.DataFrame.

from pandas import DataFrame, date_range
from numpy.random import default_rng

rng = default_rng(0)

df = DataFrame(
    index=(idx := date_range('2020-01-01', periods=3)),
    data={
        'a': rng.normal(size=len(idx)),
        'b': rng.integers(-10, +10, size=len(idx)),
    },
)

for x in df.columns:
    print(f'{x = }')

print(
    df,
    # f'{len(df) = }',
    # f'{len(df.index) = }',
    # f'{len(df.columns) = }',
    # f'{df.size  = }',
    # f'{df.shape = }',
    sep='\n{}\n'.format('\N{box drawings light horizontal}' * 20),
)

Where we break this intuition, we can see how it can impede understandability.

For example, when reviewing code, what transformations are safe? If we rely on assumptions about how __getitem__ typically works, a transformation such as the one below should be fine:

from dataclasses import dataclass, field
from random import Random

@dataclass
class T(dict):
    random_state : Random = field(default_factory=Random)

    def __missing__(self, k):
        return self.random_state.random()

def f(x, y): pass
def g(x): pass

obj = T(random_state=Random(0))
k = ...
# f(obj[k], g(obj[k]))
v = obj[k]
f(v, g(v))

However, consider dict.__or__, which breaks a mathematical assumption of commutativity. Does this impede understandability?

d0 = {'a': 1,  'b': 2,  'c': 3, 'd': 4}
d1 = {                  'c': 30, 'd': 40}

print(
    f'{d0 | d1 = }',
    f'{d1 | d0 = }',
    sep='\n',
)

Of course…

s0 = {True}
s1 = {1}

print(
    f'{s0 | s1 = }',
    f'{s1 | s0 = }',
    f'{s0 == s1 = }',
    sep='\n',
)

… and also…

s0 = 'abc'
s1 = 'def'

print(
    f'{s0 + s1 = }',
    f'{s1 + s0 = }',
    sep='\n',
)

Does this make things more understandable?

What is boilerplate elimination, is boilerplate really that bad… and how can eliminating it help me work faster?

print("Let's take a look!")

In order to actually make a class-style object in Python useful, we need to write a lot of “boilerplate.”

class T:
    def __init__(self, value):
        self._value = value
    @property
    def value(self):
        return self._value
    def __hash__(self):
        return hash(self.value)
    def __eq__(self, other):
        return self.value == other.value
    def __repr__(self):
        return f'T({self.value!r})'

obj0, obj1 = T(123), T(123)
print(
    f'{obj0.value     = }',
    f'{obj1.value     = }',
    f'{obj0 == obj1   = }',
    f'{({obj0, obj1}) = }',
    sep='\n',
)

We can reduce this boilerplate in a couple of ways. One way is the use of a collections.namedtuple:

from collections import namedtuple

T = namedtuple('T', 'value')

obj0, obj1 = T(123), T(123)
print(
    f'{obj0.value     = }',
    f'{obj1.value     = }',
    f'{obj0 == obj1   = }',
    f'{({obj0, obj1}) = }',
    sep='\n',
)

Another option is a dataclasses.dataclass:

from dataclasses import dataclass

@dataclass(frozen=True)
class T:
    value : int

obj0, obj1 = T(123), T(123)
print(
    f'{obj0.value     = }',
    f'{obj1.value     = }',
    f'{obj0 == obj1   = }',
    f'{({obj0, obj1}) = }',
    sep='\n',
)

However, beyond just the reduction in lines of code, consider the “escalation pathway” these tools provide:

entities = [
    ('abc', 123),
    ('def', 456),
    ('xyz', 789),
]

...
...
...

for ent in entities:
    print(f'{ent[0].upper() = }', f'{ent[1] + 1 = }', sep='\N{middle dot}'.center(3))

for name, value in entities:
    print(f'{name.upper() = }', f'{value + 1 = }', sep='\N{middle dot}'.center(3))

Using a list[tuple] is a very simple and quick way to start our programme, but as our code grows, the poor ergonomics show themselves quickly.

It is at this point we may “graduate” the code to use a collections.namedtuple.

We may first create the new collections.namedtuple type:

from collections import namedtuple

Entity = namedtuple('Entity', 'name value')

Then we may apply it to our existing data:

from collections import namedtuple

Entity = namedtuple('Entity', 'name value')

entities = [
    Entity('abc', 123),
    Entity('def', 456),
    Entity('xyz', 789),
]

for ent in entities:
    print(f'{ent[0].upper() = }', f'{ent[1] + 1 = }', sep='\N{middle dot}'.center(3))

for name, value in entities:
    print(f'{name.upper() = }', f'{value + 1 = }', sep='\N{middle dot}'.center(3))

Then we may rewrite any code that uses unpacking or indexing syntax to use __getattr__ (named-lookup) syntax:

from collections import namedtuple

Entity = namedtuple('Entity', 'name value')

entities = [
    Entity('abc', 123),
    Entity('def', 456),
    Entity('xyz', 789),
]

# for ent in entities:
#     print(f'{ent[0].upper() = }', f'{ent[1] + 1 = }', sep='\N{middle dot}'.center(3))

# for name, value in entities:
#     print(f'{name.upper() = }', f'{value + 1 = }', sep='\N{middle dot}'.center(3))

for ent in entities:
    print(f'{ent.name.upper() = }', f'{ent.value + 1 = }', sep='\N{middle dot}'.center(3))

This allows us to add fields:

from collections import namedtuple

Entity = namedtuple('Entity', 'name value flag')

entities = [
    Entity('abc', 123, True),
    Entity('def', 456, False),
    Entity('xyz', 789, True),
]

for ent in entities:
    print(f'{ent.name.upper() = }', f'{ent.value + 1 = }', sep='\N{middle dot}'.center(3))

We may subclass the collections.namedtuple to support validation and defaults:

from collections import namedtuple

class Entity(namedtuple('EntityBase', 'name value flag')):
    def __new__(cls, name, value, flag=False):
        if value < 0:
            raise ValueError('value should not be negative')
        return super().__new__(cls, name, value, flag)

entities = [
    Entity('abc', 123),
    Entity('def', 456),
    Entity('xyz', 789, flag=True),
]

for ent in entities:
    print(
        f'{ent.name.upper() = }',
        f'{ent.value + 1 = }',
        f'{ent.flag = }',
        sep='\N{middle dot}'.center(3),
    )

We may further graduate this to a dataclasses.dataclass if we need to add instance methods, add additional protocols, customise protocol implementations, or support mutability.

from dataclasses import dataclass

@dataclass
class Entity:
    name  : str
    value : int
    flag  : bool = False
    def __post_init__(self):
        if self.value < 0:
            raise ValueError('value should not be negative')
    def __call__(self):
        self.value += 1
    def __eq__(self, other):
        return self.name == other.name and self.value == other.value

entities = [
    Entity('abc', 123),
    Entity('def', 456),
    Entity('xyz', 789, flag=True),
]

for ent in entities:
    ent()
    print(
        f'{ent.name.upper() = }',
        f'{ent.value + 1 = }',
        f'{ent.flag = }',
        sep='\N{middle dot}'.center(3),
    )

Finally, we may rewrite as a class-style object with all of the boilerplate.

class Entity:
    def __init__(self, name, value, flag=False):
        if value < 0:
            raise ValueError('value should not be negative')
        self.name, self.value, self.flag = name, value, flag
    def __call__(self):
        self.value += 1
    def __eq__(self, other):
        return self.name == other.name and self.value == other.value
    def __repr__(self):
        return f'Entity({self.name!r}, {self.value!r}, {self.flag!r})'

entities = [
    Entity('abc', 123),
    Entity('def', 456),
    Entity('xyz', 789, flag=True),
]

for ent in entities:
    ent()
    print(
        f'{ent.name.upper() = }',
        f'{ent.value + 1 = }',
        f'{ent.flag = }',
        sep='\N{middle dot}'.center(3),
    )

There are other boilerplate-elimination tools in the Python standard library.

For example, enum.Enum allows us to create enumerated types easily.

from enum import Enum

Choice = Enum('Choice', 'A B C')

print(
    f'{Choice.A = }',
    f'{Choice.B = }',
    f'{Choice.C = }',
    sep='\n',
)

functools.total_ordering allows us to implement comparison operators without having to write them all out (assuming the object supports mathematical properties associated with a total ordering.)

from dataclasses import dataclass
from functools import total_ordering

@total_ordering
@dataclass
class T:
    value : int
    def __eq__(self, other):
        return self.value == other.value
    def __lt__(self, other):
        return self.value < other.value
    # def __gt__(self, other):
    #     return self.value > other.value
    # def __ne__(self, other):
    #     return self.value != other.value
    # def __le__(self, other):
    #     return self.value <= other.value
    # def __ge__(self, other):
    #     return self.value >= other.value
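
A quick usage sketch confirming that the commented-out comparisons are supplied by total_ordering:

print(
    f'{T(1) <  T(2) = }',
    f'{T(1) <= T(2) = }',  # derived from __lt__ and __eq__
    f'{T(1) >  T(2) = }',
    f'{T(1) >= T(2) = }',
    sep='\n',
)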

A contextlib.contextmanager allows us to situate a generator into the contextmanager __enter__/__exit__ protocol.

class Context:
    def __enter__(self):
        print('Context.__enter__')
    def __exit__(self, exc_type, exc_value, traceback):
        print('Context.__exit__')

with Context():
    print('block')
from contextlib import contextmanager

@contextmanager
def context():
    print('__enter__')
    try:
        yield
    finally:
        print('__exit__')

with context():
    print('block')

How can eliminating it help me work faster?

Session ②: How do I do less work?

When would I actually write a decorator or a higher-order decorator… and why?

print("Let's take a look!")

Python function definitions are executed at runtime.

def f():
    pass

print(f'{f = }')

This is why we can conditionally define functions or define functions in other functions. In Python, we can treat functions like any other data.

from random import Random
from inspect import signature

rnd = Random(0)

if rnd.choice([True, False]):
    def f(x, y):
        return x + y
else:
    def f(x):
        return -x

print(
    f'{f            = }',
    f'{signature(f) = }',
    sep='\n'
)
from types import FunctionType

def f(): pass

f = FunctionType(
    f.__code__,
    f.__globals__,
    name=f.__name__,
    argdefs=f.__defaults__,
    closure=f.__closure__,
)

print(
    f'{f              = }',
    f'{f.__code__     = }',
    f'{f.__globals__  = }',
    f'{f.__defaults__ = }',
    f'{f.__closure__  = }',
    sep='\n'
)
def f(x): return x + 1
def g(x): return x * 2
def h(x): return x ** 3

for func in [f, g, h]:
    print(f'{func(123) = :,}')

for rv in [f(123), g(123), h(123)]:
    print(f'{rv = :,}')

FUNCS = {
    'eff':  f,
    'gee':  g,
    'aich': h,
}
for name in 'eff eff gee aich'.split():
    print(f'{FUNCS[name](123) = :,}')

When we define a function in Python, it “closes” over its defining environment. In other words, if the function accesses data that is neither in the global scope nor local scope (but in the enclosing function’s scope,) we create a means to access this data. Note that this does not mean that we capture a reference to the data; the closure is its own indirection.

from dis import dis

def f(y):
    def g(z):
        return x + y + z
    return g

x = 1
g = f(y=20)

print(
    f'{g(z=300) = }',
    sep='\n',
)
# dis(g)
def f(y):
    def g(z):
        return x + y + z
    return g

x = 1
g = f(y=20)

print(
    f'{g.__closure__ = }',
    f'{g.__closure__[0].cell_contents = }',
    sep='\n',
)
def f(x):
    def g0():
        return x
    def g1():
        return x
    return g0, g1

g0, g1 = f(123)

print(
    f'{g0.__closure__ = }',
    f'{g1.__closure__ = }',
    f'{g0.__closure__[0].cell_contents = }',
    f'{g1.__closure__[0].cell_contents = }',
    sep='\n',
)
from math import prod

def f(xs):
    def g0():
        xs.append(sum(xs))
        return xs
    def g1():
        xs.append(prod(xs))
        return xs
    return g0, g1

g0, g1 = f([1, 2, 3])

print(
    f'{g0() = }',
    f'{g1() = }',
    f'{g0() = }',
    f'{g1() = }',
    sep='\n',
)
from math import prod

def f(x):
    def g0():
        nonlocal x
        x += 2
        return x
    def g1():
        nonlocal x
        x *= 2
        return x
    return g0, g1

g0, g1 = f(123)

print(
    f'{g0() = }',
    # f'{g0.__closure__[0] = } · {g1.__closure__[0] = }',
    f'{g1() = }',
    # f'{g0.__closure__[0] = } · {g1.__closure__[0] = }',
    f'{g0() = }',
    # f'{g0.__closure__[0] = } · {g1.__closure__[0] = }',
    f'{g1() = }',
    # f'{g0.__closure__[0] = } · {g1.__closure__[0] = }',
    sep='\n',
)

This will be important later.

Recall that functions are a means by which we can eliminate “update anomalies.” They represent a “single source of truth” for how to perform an operation.

We want to distinguish between “coïncidental” and “intentional” repetition. In the case of “intentional” repetition, we want to write a function; in the case of “coïncidental” repetition, we may not want to write a function.

# library.py
from random import Random
from statistics import mean, pstdev
from string import ascii_lowercase
from itertools import groupby

def generate_data(*, random_state=None):
    rnd = Random() if random_state is None else random_state
    return {
        ''.join(rnd.choices(ascii_lowercase, k=2)): rnd.randint(-100, +100)
        for _ in range(100)
    }

def normalise_data(data):
    μ,σ = mean(data.values()), pstdev(data.values())
    return {k: (v - μ) / σ for k, v in data.items()}

def process_data(data):
    return groupby(sorted(data.items(), key=(key := lambda k_v: k_v[0][0])), key=key)

def report(results):
    for k, g in results:
        g = dict(g)
        print(f'{k:<3} {min(g.values()):>5.2f} ~ {max(g.values()):>5.2f}')

# script0.py
if __name__ == '__main__':
    rnd = Random(0)
    raw_data = generate_data(random_state=rnd)
    data = normalise_data(raw_data)
    results = process_data(data)
    report(results)

# script1.py
if __name__ == '__main__':
    rnd = Random(0)
    raw_data = generate_data(random_state=rnd)
    data = normalise_data(raw_data)
    results = process_data(data)
    report(results)
def do_report():
    rnd = Random(0)
    raw_data = generate_data(random_state=rnd)
    data = normalise_data(raw_data)
    results = process_data(data)
    report(results)

# script0.py
if __name__ == '__main__':
    do_report()

# script1.py
if __name__ == '__main__':
    do_report()
def do_report(normalise=True):
    rnd = Random(0)
    raw_data = generate_data(random_state=rnd)
    if normalise:
        data = normalise_data(raw_data)
    else:
        data = raw_data
    results = process_data(data)
    report(results)

# script0.py
if __name__ == '__main__':
    do_report(normalise=False)

# script1.py
if __name__ == '__main__':
    do_report()
def report(results, prec=2):
    for k, g in results:
        g = dict(g)
        print(f'{k:<3} {min(g.values()):>{2+1+prec}.{prec}f} ~ {max(g.values()):>{2+1+prec}.{prec}f}')

def do_report(normalise=True, digits_prec=None):
    rnd = Random(0)
    raw_data = generate_data(random_state=rnd)
    if normalise:
        data = normalise_data(raw_data)
    else:
        data = raw_data
    results = process_data(data)
    if digits_prec is not None:
        report(results, prec=digits_prec)
    else:
        report(results)

# script0.py
if __name__ == '__main__':
    do_report(normalise=False)

# script1.py
if __name__ == '__main__':
    do_report(digits_prec=5)

If the functions provided by our analytical libraries represent the base-most, atomic units of our work, we could describe the common progression of effort as starting with manual composition of these units. Where patterns arise and intentional repetition is found, our primary work may move to managing this composition: writing classes and functions. Our work may continue to grow more abstract and we may discover patterns and intentional repetition across the writing of functions.

f()
g()
f()
h()
def func0():
    f()
    g()
    f()

def func1():
    f(g())

func0()
func1()

Mechanically, the @ syntax in Python is simple shorthand.

@dec
def f():
    pass

# … means…

def f():
    pass
f = dec(f)
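
The same shorthand nests when decorators are stacked; a small sketch (with hypothetical identity decorators dec0 and dec1) showing that they apply bottom-up:

def dec0(f): return f
def dec1(f): return f

@dec0
@dec1
def f():
    pass

# … means…

def f():
    pass
f = dec0(dec1(f))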

This is key to understanding all of the mechanics behind decorators.

The simplest example of decorators is a system in which we need to instrument some code.

from random import Random
from time import sleep

def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y

def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2

if __name__ == '__main__':
    print(f'{fast(123, 456) = :,}')
    print(f'{slow(123)      = :,}')
    print(f'{slow(456)      = :,}')
    print(f'{fast(456, 789) = :,}')
from random import Random
from time import sleep, perf_counter

def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y

def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2

if __name__ == '__main__':
    before = perf_counter()
    print(f'{fast(123, 456) = :,}')
    after = perf_counter()
    print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
    before = perf_counter()
    print(f'{slow(123)      = :,}')
    after = perf_counter()
    print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
    before = perf_counter()
    print(f'{slow(456)      = :,}')
    print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
    before = perf_counter()
    print(f'{fast(456, 789) = :,}')
    after = perf_counter()
    print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
from random import Random
from time import sleep, perf_counter

def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y

def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2

if __name__ == '__main__':
    if __debug__: before = perf_counter()
    print(f'{fast(123, 456) = :,}')
    if __debug__:
        after = perf_counter()
        print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
    if __debug__: before = perf_counter()
    print(f'{slow(123)      = :,}')
    if __debug__:
        after = perf_counter()
        print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
    if __debug__: before = perf_counter()
    print(f'{slow(456)      = :,}')
    if __debug__:
        after = perf_counter()
        print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
    if __debug__: before = perf_counter()
    print(f'{fast(456, 789) = :,}')
    if __debug__:
        after = perf_counter()
        print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
from random import Random
from time import sleep, perf_counter

def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y

def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2

if __debug__:
    def bef():
        global before
        before = perf_counter()
    def aft():
        after = perf_counter()
        print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
else:
    def bef(): pass
    def aft(): pass

if __name__ == '__main__':
    bef()
    print(f'{fast(123, 456) = :,}')
    aft()
    bef()
    print(f'{slow(123)      = :,}')
    aft()
    bef()
    print(f'{slow(456)      = :,}')
    aft()
    bef()
    print(f'{fast(456, 789) = :,}')
    aft()
from random import Random
from time import sleep, perf_counter

def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y

def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2

if __debug__:
    def timed(func, *args, **kwargs):
        before = perf_counter()
        rv = func(*args, **kwargs)
        after = perf_counter()
        print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
        return rv
else:
    def timed(func, *args, **kwargs):
        return func(*args, **kwargs)

if __name__ == '__main__':
    print(f'{timed(fast, 123, 456) = :,}')
    print(f'{timed(slow, 123)      = :,}')
    print(f'{timed(slow, 456)      = :,}')
    print(f'{timed(fast, 456, 789) = :,}')
from random import Random
from time import sleep, perf_counter

def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y

def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2

if __debug__:
    def timed(func):
        def inner(*args, **kwargs):
            before = perf_counter()
            rv = func(*args, **kwargs)
            after = perf_counter()
            print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
            return rv
        return inner
else:
    def timed(func):
        return func

if __name__ == '__main__':
    print(f'{timed(fast)(123, 456) = :,}')
    print(f'{timed(slow)(123)      = :,}')
    print(f'{timed(slow)(456)      = :,}')
    print(f'{timed(fast)(456, 789) = :,}')
from random import Random
from time import sleep, perf_counter

def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y

def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2

if __debug__:
    def timed(func):
        def inner(*args, **kwargs):
            before = perf_counter()
            rv = func(*args, **kwargs)
            after = perf_counter()
            print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
            return rv
        return inner
    fast, slow = timed(fast), timed(slow)

if __name__ == '__main__':
    print(f'{fast(123, 456) = :,}')
    print(f'{slow(123)      = :,}')
    print(f'{slow(456)      = :,}')
    print(f'{fast(456, 789) = :,}')
from random import Random
from time import sleep, perf_counter

def timed(func):
    def inner(*args, **kwargs):
        before = perf_counter()
        rv = func(*args, **kwargs)
        after = perf_counter()
        print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
        return rv
    return inner

def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y
if __debug__: fast = timed(fast)

def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2
if __debug__: slow = timed(slow)

if __name__ == '__main__':
    print(f'{fast(123, 456) = :,}')
    print(f'{slow(123)      = :,}')
    print(f'{slow(456)      = :,}')
    print(f'{fast(456, 789) = :,}')
from random import Random
from time import sleep, perf_counter

def timed(func):
    def inner(*args, **kwargs):
        before = perf_counter()
        rv = func(*args, **kwargs)
        after = perf_counter()
        print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
        return rv
    return inner

@timed if __debug__ else lambda f: f
def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y

@timed if __debug__ else lambda f: f
def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2

if __name__ == '__main__':
    print(f'{fast(123, 456) = :,}')
    print(f'{slow(123)      = :,}')
    print(f'{slow(456)      = :,}')
    print(f'{fast(456, 789) = :,}')
from random import Random
from time import sleep, perf_counter

def timed(func):
    def inner(*args, **kwargs):
        before = perf_counter()
        rv = func(*args, **kwargs)
        after = perf_counter()
        print(f'\N{mathematical bold capital delta}t: {after - before:.2f}s')
        return rv
    inner.orig = func
    return inner

@timed if __debug__ else lambda f: f
def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y

@timed if __debug__ else lambda f: f
def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2

if __name__ == '__main__':
    print(f'{fast(123, 456) = :,}')
    print(f'{slow(123)      = :,}')
    print(f'{slow.orig(456) = :,}')
    print(f'{fast(456, 789) = :,}')
    # help(fast)
from random import Random
from time import sleep, perf_counter
from functools import wraps, cached_property
from collections import deque, namedtuple
from datetime import datetime

class Call(namedtuple('CallBase', 'timestamp before after func args kwargs')):
    @cached_property
    def elapsed(self):
        return self.after - self.before

def timed(telemetry):
    def dec(func):
        @wraps(func)
        def inner(*args, **kwargs):
            before = perf_counter()
            rv = func(*args, **kwargs)
            after = perf_counter()
            telemetry.append(
                Call(datetime.now(), before, after, func, args, kwargs)
            )
            return rv
        inner.orig = func
        return inner
    return dec

telemetry = []

@timed(telemetry) if __debug__ else lambda f: f
def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y

@timed(telemetry) if __debug__ else lambda f: f
def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2

if __name__ == '__main__':
    print(f'{fast(123, 456) = :,}')
    print(f'{slow(123)      = :,}')
    print(f'{slow.orig(456) = :,}')
    print(f'{fast(456, 789) = :,}')

    for x in telemetry:
        print(f'{x.func.__name__} \N{mathematical bold capital delta}t: {x.elapsed:.2f}s')
from random import Random
from time import sleep, perf_counter
from functools import wraps, cached_property
from collections import deque, namedtuple
from datetime import datetime
from contextvars import ContextVar
from contextlib import contextmanager, nullcontext
from inspect import currentframe, getouterframes

def instrumented(func):
    if not __debug__:
        return func
    @wraps(func)
    def inner(*args, **kwargs):
        ctx = inner.context.get(nullcontext)
        frame = getouterframes(currentframe())[1]
        with ctx(frame, func, args, kwargs) if ctx is not nullcontext else ctx():
            return func(*args, **kwargs)
    @contextmanager
    def with_measurer(measurer):
        token = inner.context.set(measurer)
        try: yield
        finally: pass
        inner.context.reset(token)
    inner.with_measurer = with_measurer
    inner.context = ContextVar('context')
    return inner

@instrumented
def fast(x, y, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.1, .2))
    return x + y

@instrumented
def slow(x, *, random_state=None):
    rnd = Random() if random_state is None else random_state
    sleep(rnd.uniform(.25, .5))
    return x**2

if __name__ == '__main__':
    class Call(namedtuple('CallBase', 'lineno timestamp before after func args kwargs')):
        @cached_property
        def elapsed(self):
            return self.after - self.before
        def __str__(self):
            if self.args and self.kwargs:
                params = (
                    f'{", ".join(f"{x!r}" for x in self.args)}, '
                    f'{", ".join(f"{k}={v!r}" for k, v in self.kwargs.items())}'
                )
            elif self.args:
                params = f'{", ".join(f"{x!r}" for x in self.args)}'
            elif self.kwargs:
                params = f'{", ".join(f"{k}={v!r}" for k, v in self.kwargs.items())}'
            else:
                params = ''
            return f'{self.func.__name__}({params})'

        telemetry = []

        @classmethod
        @contextmanager
        def timed(cls, frame, func, args, kwargs):
            before = perf_counter()
            try: yield
            finally: pass
            after = perf_counter()
            cls.telemetry.append(
                cls(frame.lineno, datetime.now(), before, after, func, args, kwargs)
            )

    with fast.with_measurer(Call.timed), slow.with_measurer(Call.timed):
        print(f'{fast(123, 456) = :,}')
        print(f'{slow(123)      = :,}')
    print(f'{slow(456)      = :,}')
    print(f'{fast(456, 789) = :,}')

    for x in Call.telemetry:
        print(f'@line {x.lineno}: {x!s:<20} \N{mathematical bold capital delta}t {x.elapsed:.2f}s')

When would I actually write a decorator or a higher-order decorator… and why?

When would I actually write a class decorator… and is this really better than other approaches?

print("Let's take a look!")

A def-decorator performs the following syntactical transformation:

def dec(f): pass

@dec
def f(): pass

def f(): pass
f = dec(f)

Note that the common description of a decorator as a “function that takes a function and returns a function” is imprecise.

class T:
    def __init__(self, g):
        self.g = g

@T
def g():
    yield

print(f'{g = }')

A class-decorator performs the following syntactical transformation:

def dec(f): pass

@dec
class cls: pass

class cls: pass
cls = dec(cls)

Just as Python functions are defined and created at runtime, Python classes are also defined and created at runtime.

from random import Random
from inspect import signature

rnd = Random(0)
if rnd.choice([True, False]):
    class T:
        def f(self, x, y):
            return x * y
else:
    class T:
        def f(self, x):
            return x ** 2

print(f'{signature(T.f) = }')

Unlike the body of a function, for which bytecode is generated but not executed at function definition time, the body of a class is executed at class definition time.

from random import Random
from inspect import signature

rnd = Random(0)
class T:
    if rnd.choice([True, False]):
        def f(self, x, y):
            return x * y
    else:
        def f(self, x):
            return x ** 2

print(f'{signature(T.f) = }')

A Python class can have attributes added at runtime.

Unlike in Python 2, Python 3 doesn’t distinguish between bound and unbound methods. Instead, all Python functions support the __get__ descriptor protocol. The __get__ method is invoked when an attribute is looked up via the __getattr__/getattr protocol and is found on a class. When a function’s __get__ is invoked, it returns a method which binds the instance argument. Therefore, all Python 3 functions are unbound methods, and, therefore, it is relatively easy to add new methods to Python classes.

class T:
    pass

T.f = lambda self: ...

obj = T()
print(f'{obj.f() = }')
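
A minimal sketch of that machinery: a function’s __get__ is what produces the bound method during attribute lookup.

class T:
    pass

def f(self):
    return self

T.f = f
obj = T()

bound = T.__dict__['f'].__get__(obj, T)  # what attribute lookup does for us
print(
    f'{obj.f          = }',
    f'{bound          = }',
    f'{obj.f() is obj = }',
    f'{bound() is obj = }',
    sep='\n',
)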

A class decorator receives the fully-constructed class and can therefore add, remove, or inspect attributes on that class. Note that a class decorator cannot distinguish the code that was statically written in the body of the class from code that was added to the class afterwards.

def dec(cls):
    print(f'{cls = }')
    return cls

@dec
class A:
    pass

@dec
class B(A):
    pass

Just as a def-decorator is used anytime we need to eliminate the risk of update anomaly associated with the definition of a function, a class decorator is about eliminating the risk of update anomaly associated with the definition of a class.

A class decorator could be used instead of inheritance to add functionality to a class without disrupting the inheritance hierarchy while potentially introducing modalities.

class A:
    def f(self):
        pass

class B(A):
    def g(self):
        pass

obj = B()
print(
    f'{obj.f() = }',
    f'{obj.g() = }',
    sep='\n',
)
def dec(cls):
    cls.f = lambda _: None
    return cls

@dec
class A:
    pass

@dec
class B(A):
    def g(self):
        pass

obj = B()
print(
    f'{obj.f() = }',
    f'{obj.g() = }',
    sep='\n',
)
def add_func(*funcs):
    def dec(cls):
        for name in funcs:
            setattr(cls, name, lambda _: None)
        return cls
    return dec

@add_func('f', 'g')
class A:
    pass

@add_func('f', 'h')
class B(A):
    def g(self):
        pass

obj = B()
print(
    f'{obj.f() = }',
    f'{obj.g() = }',
    f'{obj.h() = }',
    sep='\n',
)

A class-decorator can check that a class has certain contents (though it won’t be able to determine precisely how those contents were provided.)

def dec(cls):
    if not hasattr(cls, 'f'):
        raise TypeError('must define f')
    return cls

class A:
    def f(self):
        pass

@dec
class B(A):
    def f(self):
        pass

When would I actually write a class decorator… and is this really better than other approaches?

When would I actually write a metaclass… and is there a better way?

print("Let's take a look!")

A Python class allows us to implement the Python “vocabulary” by writing special __-methods.

class T:
    def __getitem__(self, key):
        pass
    def __len__(self):
        return 0

obj = T()
print(
    f'{obj[...] = }',
    f'{len(obj) = }',
    sep='\n',
)

These special __-methods are not looked up via the __getattr__ protocol. In CPython, they are looked up by direct C-struct access on type(…).
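
We can see this type-level lookup directly (a small sketch): a special method attached to the instance is ignored, while the same method attached to the type is found.

class T:
    pass

obj = T()
obj.__len__ = lambda: 0       # attached to the instance: ignored by len()

try:
    len(obj)
except TypeError as e:
    print(f'{e = }')

T.__len__ = lambda self: 0    # attached to the type: found
print(f'{len(obj) = }')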

If we wanted to implement the Python vocabulary on a class object, we would need to implement these methods on whatever type(cls) is. This entity is called the “metaclass.”

A Python class is responsible for constructing its instances. A Python metaclass is responsible for constructing its instances, which happen to be Python classes.

from logging import getLogger, basicConfig, INFO

logger = getLogger(__name__)
basicConfig(level=INFO)

class TMeta(type):
    def __getitem__(self, key):
        logger.info('TMeta.__getitem__(%r, %r)', self, key)
        pass
    def __len__(self):
        logger.info('TMeta.__len__(%r)', self)
        return 0

class T(metaclass=TMeta):
    def __getitem__(self, key):
        logger.info('T.__getitem__(%r, %r)', self, key)
        pass
    def __len__(self):
        logger.info('T.__len__(%r)', self)
        return 0

obj = T()

obj[...]
len(obj)

T[...]
len(T)

This is not altogether that useful, in practice.

from logging import getLogger, basicConfig, INFO

logger = getLogger(__name__)
basicConfig(level=INFO)

class TMeta(type):
    def __call__(self, *args, **kwargs):
        obj = self.__new__(self, *args, **kwargs)
        obj.__init__(*args, **kwargs)
        obj.__post_init__()
        return obj

class T(metaclass=TMeta):
    def __new__(cls, value):
        return super().__new__(cls)
    def __init__(self, value):
        self.value = value
    def __post_init__(self):
        self.value = abs(self.value)
    def __repr__(self):
        return f'T({self.value!r})'

obj = T(-123)
print(f'{obj = }')

Metaclasses are inherited down the class hierarchy. This is why, historically, they were used for enforcing constraints from base types to derived types.

Consider that Derived needs to constrain Base in order to operate correctly. However, this can be done trivially in app.py without touching any code in library.py.

from inspect import signature

# library.py
class Base:
    def helper(self):
        ...

# app.py
print(
    f'{signature(Base.helper) = }',
)
class Derived(Base):
    def func(self):
        return self.helper()

But if Base needs to constrain Derived, then this cannot be done so easily without putting code in app.py. Instead, we need to find some mechanism that operates at a higher level.

# library.py
class Base:
    def func(self):
        return self.implementation()

# app.py
class Derived(Base):
    def implementation(self):
        ...

The highest level mechanism we can employ to add a hook into the class construction process is builtins.__build_class__.

from functools import wraps
import builtins

@lambda f: setattr(builtins, f.__name__, f(getattr(builtins, f.__name__)))
def __build_class__(orig):
    @wraps(orig)
    def inner(func, name, *bases, **kwargs):
        print(f'{func, name, bases, kwargs = }')
        return orig(func, name, *bases)
        # return orig(func, name, *bases, **kwargs)
    return inner

class Base: pass
class Derived(Base): pass
class MoreDerived(Base, x=...): pass

What is the function that is passed to __build_class__?

from functools import wraps
import builtins

@lambda f: setattr(builtins, f.__name__, f(getattr(builtins, f.__name__)))
def __build_class__(orig):
    @wraps(orig)
    def inner(func, name, *bases, **kwargs):
        print(f'{func, name, bases, kwargs = }')
        print(f'{func() = }')
        # exec(func.__code__, globals(), ns := {})
        # print(f'{ns = }')
        return orig(func, name, *bases, **kwargs)
    return inner

class T:
    def f(self):
        pass

There’s not much we can do with __build_class__ other than debugging or instrumentation.

from functools import wraps
import builtins

@lambda f: setattr(builtins, f.__name__, f(getattr(builtins, f.__name__)))
def __build_class__(orig):
    @wraps(orig)
    def inner(func, name, *bases, **kwargs):
        print(f'{func, name, bases, kwargs = }')
        return orig(func, name, *bases, **kwargs)
    return inner

import json
# import pandas, matplotlib

Since a metaclass is inherited down the class hierarchy, it gives us a narrower hook-point. Additionally, the metaclass gets the partially constructed class, which is, in practice, more useful to work with.

class BaseMeta(type):
    def __new__(cls, name, bases, body, **kwargs):
        print(f'{cls, name, bases, body, kwargs = }')
        # return super().__new__(cls, name, bases, body, **kwargs)
        return super().__new__(cls, name, bases, body)

class Base(metaclass=BaseMeta):
    pass

class Derived(Base, x=...):
    pass

We can use this to enforce constraints.

# library.py
from inspect import signature

class BaseMeta(type):
    def __new__(cls, name, bases, body, **kwargs):
        rv = super().__new__(cls, name, bases, body, **kwargs)
        if rv.__mro__[-2::-1].index(rv):
            rv.check()
        return rv

class Base(metaclass=BaseMeta):
    @classmethod
    def check(cls):
        if not hasattr(cls, 'implementation'):
            raise TypeError('must implement method')
        if 'x' not in signature(cls.implementation).parameters:
            raise TypeError('method must take parameter named x')
    def func(self):
        return self.implementation()

# app.py
class Derived(Base):
    def implementation(self, x):
        ...
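
As a quick sketch of the enforcement (with a hypothetical Bad class), a subclass that forgets the required parameter fails at class-definition time:

try:
    class Bad(Base):
        def implementation(self):  # missing the required parameter `x`
            ...
except TypeError as e:
    print(f'{e = }')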

However, metaclasses tend to be tricky to write correctly, especially if you need to compose them.

# library.py
from inspect import signature

class BaseMeta(type):
    pass

class Base(metaclass=BaseMeta):
    pass

# app.py
class DerivedMeta(type):
    pass

class Derived(Base, metaclass=DerivedMeta):
    pass
# library.py
from inspect import signature

class Base0Meta(type):
    def __new__(cls, name, bases, body, **kwargs):
        print(f'Base0Meta.__new__({cls!r}, {name!r}, {bases!r}, {body!r}, **{kwargs!r})')
        return super().__new__(cls, name, bases, body, **kwargs)

class Base0(metaclass=Base0Meta):
    pass

class Base1Meta(type):
    def __new__(cls, name, bases, body, **kwargs):
        print(f'Base1Meta.__new__({cls!r}, {name!r}, {bases!r}, {body!r}, **{kwargs!r})')
        return super().__new__(cls, name, bases, body, **kwargs)

class Base1(metaclass=Base1Meta):
    pass

# app.py
class Derived(Base0, Base1):
    pass

class Derived(Base1, Base0):
    pass
# library.py
from inspect import signature

class Base0Meta(type):
    def __new__(cls, name, bases, body, **kwargs):
        print(f'Base0Meta.__new__({cls!r}, {name!r}, {bases!r}, {body!r}, **{kwargs!r})')
        return super().__new__(cls, name, bases, body, **kwargs)

class Base0(metaclass=Base0Meta):
    pass

class Base1Meta(type):
    def __new__(cls, name, bases, body, **kwargs):
        print(f'Base1Meta.__new__({cls!r}, {name!r}, {bases!r}, {body!r}, **{kwargs!r})')
        return super().__new__(cls, name, bases, body, **kwargs)

class Base1(metaclass=Base1Meta):
    pass

# app.py
class Derived(Base0):
    pass

class MoreDerived(Base1, Derived):
    pass

In Python 3.6, the __init_subclass__ mechanism was introduced. Like a metaclass, it is inherited down the class hierarchy. Unlike the metaclass, it gets the fully constructed class. __init_subclass__ doesn’t have the same compositional difficulties that metaclasses have.

class Base:
    def __init_subclass__(cls, **kwargs):
        print(f'{cls, kwargs = }')

class Derived(Base, x=...):
    pass
# library.py
from inspect import signature

class Base:
    def __init_subclass__(cls):
        if not hasattr(cls, 'implementation'):
            raise TypeError('must implement method')
        if 'x' not in signature(cls.implementation).parameters:
            raise TypeError('method must take parameter named x')
    def func(self):
        return self.implementation()

# app.py
class Derived(Base):
    def implementation(self, x):
        ...
class Base0:
    def __init_subclass__(cls):
        print(f'Base0.__init_subclass__({cls!r})')
        super().__init_subclass__()

class Base1:
    def __init_subclass__(cls):
        print(f'Base1.__init_subclass__({cls!r})')
        super().__init_subclass__()

class Derived0(Base0, Base1):
    pass

class Derived1(Base1, Base0):
    pass
class Base0:
    def __init_subclass__(cls):
        print(f'Base0.__init_subclass__({cls!r})')
        super().__init_subclass__()

class Base1:
    def __init_subclass__(cls):
        print(f'Base1.__init_subclass__({cls!r})')
        super().__init_subclass__()

class Derived(Base0):
    pass
# print(f'{Derived.__mro__ = }')

class MoreDerived0(Derived, Base1):
    pass
# print(f'{MoreDerived0.__mro__ = }')

class MoreDerived1(Base1, Derived):
    pass
# print(f'{MoreDerived1.__mro__ = }')

However, an __init_subclass__ requires that we interact with the inheritance hierarchy. But with a class-decorator, we do not. In the case of a class-decorator, we also get the fully-constructed class, but we don’t get any keyword arguments.

class Base:
    def __init_subclass__(cls, **kwargs):
        print(f'Base.__init_subclass__({cls!r}, **{kwargs!r})')

class Derived(Base, x=...):
    pass

def dec(cls):
    print(f'dec({cls!r})')
    return cls

@dec
class T:
    pass

However, we can write a higher-order class-decorator to introduce modalities.

class Base:
    def __init_subclass__(cls, **kwargs):
        print(f'Base.__init_subclass__({cls!r}, **{kwargs!r})')

class Derived(Base, x=...):
    pass

def d(**kwargs):
    def dec(cls):
        print(f'dec({cls!r}, **{kwargs!r})')
        return cls
    return dec

@d(x=...)
class T:
    pass

When would I actually write a metaclass… and is there a better way?

When would I actually use eval or exec… and should I feel as guilty when I do it?

print("Let's take a look!")

In Python, the builtin eval and exec functions allow us to execute code encoded as an str. eval allows us to evaluate a single expression and returns its result; exec allows us to execute a suite of statements but does not return anything.

from textwrap import dedent

code = '1 + 1'
print(f'{eval(code) = }')

code = dedent('''
    x = 1
    y = 1
    x + y
''').strip()
print(f'{exec(code) = }')

With both exec and eval, you can pass in a namespace; with exec, you can capture results by capturing name bindings in this namespace.

from textwrap import dedent

code = '1 + 1 + z'
print(f'{eval(code, globals(), ns := {"z": 123}) = }')
print(f'{ns = }')

code = dedent('''
    x = 1
    y = 1
    w = x + y + z
''').strip()
print(f'{exec(code, globals(), ns := {"z": 123}) = }')
print(f'{ns = }')

Obviously, eval('1 + 1') is inferior to evaluating 1 + 1. We don’t get syntax highlighting. We don’t get any static mechanisms provided by the interpreter (such as constant folding.)
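
For instance, a small dis-based sketch: the compiler folds the literal expression at compile time, while the eval call leaves everything to runtime.

from dis import dis

def direct():
    return 1 + 1          # the constant 2 is computed at compile time (constant folding)

def via_eval():
    return eval('1 + 1')  # a runtime call: nothing for the compiler to fold

dis(direct)
dis(via_eval)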

However, by encoding the executed or evaluated code as a string, that means we can use string manipulation to create code snippets. Obviously, in most cases, this is inferior to other programmatic or meta-programmatic techniques.

x, y, z = 123, 456, 789

var0, var1 = 'x', 'y'
code = f'{var0} + {var1}'
res = eval(code, globals(), locals())
print(f'{res = }')

if ...:
    res = x + y
    print(f'{res = }')

var0, var1 = 'x', 'y'
res = globals()[var0] + globals()[var1]
print(f'{res = }')

But there are also clearly metaprogramming situations where string manipulation may be superior.

from dataclasses import dataclass
from datetime import datetime
from typing import Any

@dataclass
class Propose:
    ident : int
    timestamp : datetime
    payload : Any

@dataclass
class Accept:
    ident : int
    timestamp : datetime

@dataclass
class Reject:
    ident : int
    timestamp : datetime

@dataclass
class Commit:
    ident : int
    timestamp : datetime
    payload : Any

print(
    f'{Propose(..., ..., ...) = }',
    f'{Accept(..., ...)       = }',
    f'{Reject(..., ...)       = }',
    f'{Commit(..., ..., ...)  = }',
    sep='\n',
)
from csv import reader
from textwrap import dedent
from dataclasses import dataclass

message_definitions = dedent('''
    name,*fields
    Propose,ident,timestamp,payload
    Acc ept,ident,timestmap
    Reject,ident,timestmap
    Commit,ident,timestamp,payload
''').strip()
messages = {}
for lineno, (name, *fields) in enumerate(reader(message_definitions.splitlines()), start=1):
    if lineno == 1: continue
    messages[name] = name, fields

class MessageBase: pass
for name, fields in messages.values():
    globals()[name] = dataclass(type(name, (MessageBase,), {
        '__annotations__': dict.fromkeys(fields)
    }))

print(
    globals(),
    f'{Propose(..., ..., ...) = }',
    # f'{Accept(..., ...)       = }',
    f'{Reject(..., ...)       = }',
    f'{Commit(..., ..., ...)  = }',
    sep='\n',
)
from csv import reader
from textwrap import dedent, indent
from dataclasses import dataclass

message_definitions = dedent('''
    name,*fields
    Propose,ident,timestamp,payload
    Accept,ident,timestmap
    Reject,ident,timestmap
    Commit,ident,timestamp,payload
''').strip()
messages = {}
for lineno, (name, *fields) in enumerate(reader(message_definitions.splitlines()), start=1):
    if lineno == 1: continue
    messages[name] = name, fields

class MessageBase: pass
for name, fields in messages.values():
    code = dedent(f'''
        @dataclass
        class {name}(MessageBase):
        {{fields}}
    ''').strip().format(fields=indent('\n'.join(f"{f} : ..." for f in fields), ' ' * 4))
    print(code)
    exec(code, globals(), locals())

print(
    f'{Propose(..., ..., ...) = }',
    f'{Accept(..., ...)       = }',
    f'{Reject(..., ...)       = }',
    f'{Commit(..., ..., ...)  = }',
    sep='\n',
)

There is nothing inherently wrong with eval or exec (in most execution environments.)

from tempfile import TemporaryDirectory
from sys import path
from pathlib import Path
from textwrap import dedent

with TemporaryDirectory() as d:
    d = Path(d)
    code = dedent('''
        class T: pass
    ''').strip()
    with open(d / 'module.py', mode='wt') as f:
        print(code, file=f)
    path.insert(0, f'{d!s}')
    import module
    del path[0]

print(f'{module.T = }')

We can think of all code creation mechanisms as lying on a spectrum:

from tempfile import TemporaryDirectory
from inspect import getsource
from collections import namedtuple
from textwrap import dedent
from sys import path
from pathlib import Path

class T0: pass

T1 = namedtuple('T1', '')
class T1(namedtuple('T1', '')): pass

...

T2 = type('T2', (tuple,), {'__call__': lambda _: ...})

...

exec(dedent('''
    class T3:
        pass
'''), globals(), locals())

with TemporaryDirectory() as d:
    d = Path(d)
    code = dedent('''
        class T4:
            pass
    ''').strip()
    with open(d / 'module.py', mode='wt') as f:
        print(code, file=f)
    path.insert(0, f'{d!s}')
    from module import *
    del path[0]

    print(
        f'{T0 = }', # getsource(T0),
        f'{T1 = }', # getsource(T1),
        f'{T2 = }', # getsource(T2),
        f'{T3 = }', # getsource(T3),
        f'{T4 = }', getsource(T4),
        sep='\n',
    )

When would I actually use eval or exec… and should I feel as guilty when I do it?