DUTC WEEKLY
Issue no. 83
May the 4th 🖖,  2022

Issue Preview


1. Front Page: New Seminar Series - Think Fast! 
2. News Article: We Want to Train You!
3. Cameron's Corner
4. Community Calendar

New Seminar Series: Think Fast!


DUTC is proud to announce our newest seminar series, coming at you during the month of May: "Coding at the Speed of Thought"! 
This May will be PACKED with seminars! You can expect multiple topics, pop-up sessions, and even an exclusive code-along!

Are you looking for a dense, compact problem that can help you explore how to better design your programs and iterate on those designs?

Do you like watching a complex program emerge as the natural consequence of a deliberate, methodical sequence of simple considerations and decisions? (I know I do!)

If this sounds like you, you’re going to want to join us for this three-part seminar series! Keep your eye out for an Event Alert later this week to sign up!

We Want to Train You!


As a quick reminder, if you love our seminars, then you'll love our industrial-strength corporate training! We offer customized training courses for teams, ranging in topics from Python to shell scripting to SQL. Whatever your team needs for individualized instruction, we can help!

To get more information, reach out to us at info@dutc.io, or send your boss or Training Manager our way! 

 

Cameron’s Corner

 

Decorators: Reinventing the Wheel

Hey everyone, welcome to another week of Cameron's Corner! This is going to be my last post on decorators for a little while, so I wanted to take some time to expand on which packages you might see decorators in and how I would implement them from scratch. In this post, I'm going to reinvent the wheel: you'll see code I've written to replicate popular decorators from many third-party packages. I am aiming to replicate only the core functionality of these decorator patterns in order to highlight that these mechanisms are not magical. There is real code underlying these patterns that enables their unique designs.

When writing these examples, I only looked at various documentation pages and examples that use these decorators. No source code was examined or copied.

Where might I see decorators used?

 
  • Registration: keep track of groups of similar functions without the need for a class or inheritance.
  • Logging/Warnings: dynamically warn the user that a given function will be deprecated in the future.
  • Wrapping: add preprocessing or postprocessing steps to data on its way into or out of the decorated function, e.g.:
    • validation
    • caching
In this (generalized, but not exhaustive) list, registration is a pattern that takes advantage of the function-definition entry point, whereas wrappers take advantage of the before- and after-execution entry points.
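To make those entry points concrete, here is a minimal, purely illustrative decorator (my_decorator and greet are made up for this sketch) that does something at each one:
def my_decorator(f):
    # definition-time entry point: runs once, when the decorator is applied
    print(f'registering {f.__name__}')
    def wrapper(*args, **kwargs):
        print('before the call')          # before-execution entry point
        result = f(*args, **kwargs)
        print('after the call')           # after-execution entry point
        return result
    return wrapper

@my_decorator                             # prints: registering greet
def greet():
    print('hello!')

greet()
# before the call
# hello!
# after the call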

In the following examples, I am not attempting to replicate the full complexity each library implements, merely the same pattern these libraries express via decorators. This is meant to help build mental models of how these features work; it is not an exact 1:1 implementation, since I am not reading through any source code to put these together.
app.route
  • Packages: Flask, pandas (register accessor)
The registration approach is used prominently in Flask, and while I do not consider myself a Flask expert, I do know that its API revolves heavily around higher-order decorators and registration to create applications. Let's take a look:
class App:
    def __init__(self):
        self.endpoints = []
    
    def route(self, path, method='GET'):
        def decorator(f):
            entry = (path, method, f)
            self.endpoints.append(entry)
            return f
        return decorator


app = App()

@app.route('/')
def home():
    pass

@app.route('/blog', method='GET')
def blog():
    pass

@app.route('/login', method='POST')
def login():
    pass

print(app.endpoints)
# [('/', 'GET', <function __main__.home()>),
#  ('/blog', 'GET', <function __main__.blog()>),
#  ('/login', 'POST', <function __main__.login()>)]
I also mentioned that pandas uses a registration pattern for its accessors. If you've ever used Series.str, Series.dt, Series.cat, {Series,DataFrame}.plot, or geopandas' {Series,DataFrame}.geo, then you've used an accessor in pandas. These accessors are dynamically added to pandas objects at runtime, which lets users and library authors extend pandas without replacing every instance of a DataFrame or Series with a custom subclass, while also providing a convenient namespace for the added functionality.
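For completeness, here is roughly what that registration looks like from the user's side via pandas' public extension API; the accessor name 'report' and its method are made up for this example:
import pandas as pd
from pandas.api.extensions import register_dataframe_accessor

@register_dataframe_accessor('report')    # registers df.report at runtime
class ReportAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj            # the DataFrame being accessed

    def shape_summary(self):
        rows, cols = self._obj.shape
        return f'{rows} rows x {cols} columns'

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})
print(df.report.shape_summary())
# 3 rows x 2 columns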
Input/Output validation

Another use case we encounter fairly often is input/output validation. This idea is useful for writing and parameterizing tests, since it separates your test code from the parameters you want to feed it. In addition to tests, we can use this same idea to perform checks at runtime (instead of in explicit tests).

In the following example, I've written a runtime type checker.
from inspect import signature
from collections import namedtuple

mismatch = namedtuple('mismatch', 'arg expected_type received_type value')

def type_enforce(f):
    sig = signature(f)  # capture the decorated function's signature once, at decoration time
    def wrapper(*args, **kwargs):
        ba = sig.bind(*args, **kwargs)
        ba.apply_defaults()
        
        mismatched_types = []
        for key, value in ba.arguments.items():
            annot = f.__annotations__.get(key, None)
            if annot is None:
                continue
            
            elif not isinstance(value, annot):
                mismatched_types.append(
                    mismatch(
                        arg=key,
                        expected_type=annot,
                        received_type=type(value),
                        value=value)
                )
        
        if mismatched_types:
            message = '{prefix} for {f}\n{mismatches}\n'.format(
                prefix='Incorrect input types detected',
                f=f,
                mismatches='\n'.join([f'\t{m}' for m in mismatched_types])
            )
            raise TypeError(message)
            
        return f(*args, **kwargs)

    return wrapper


@type_enforce
def f1(a: int, b: int, c: None = True):
    if c:
        return a + b
    else:
        return a - b

>>> f1(2, b=1) # works as expected
3

>>> f1(2, b='hi', c=False) # b should be an int
# ---------------------------------------------------------------------------
# TypeError                                 Traceback (most recent call last)
# /tmp/ipykernel_180463/3590008736.py in <cell line: 1>()
# ----> 1 f1(2, b='hi', c=False) # b should be an int

# /tmp/ipykernel_180463/1928943809.py in wrapper(*args, **kwargs)
#      30                 mismatches='\n'.join([f'\t{m}' for m in mismatched_types])
#      31             )
# ---> 32             raise TypeError(message)
#      33 
#      34         return f(*args, **kwargs)

# TypeError: Incorrect input types detected for <function f1 at 0x7fd125eec0d0>
# 	mismatch(arg='b', expected_type=<class 'int'>, received_type=<class 'str'>, value='hi')

>>> f1(2.1, 'bye', c=-1) # a and b should be integers
# ---------------------------------------------------------------------------
# TypeError                                 Traceback (most recent call last)
# /tmp/ipykernel_180463/730960059.py in <cell line: 1>()
# ----> 1 f1(2.1, 'bye', c=-1) # a and b should be integers

# /tmp/ipykernel_180463/1928943809.py in wrapper(*args, **kwargs)
#      30                 mismatches='\n'.join([f'\t{m}' for m in mismatched_types])
#      31             )
# ---> 32             raise TypeError(message)
#      33 
#      34         return f(*args, **kwargs)

# TypeError: Incorrect input types detected for <function f1 at 0x7fd125eec0d0>
# 	mismatch(arg='a', expected_type=<class 'int'>, received_type=<class 'float'>, value=2.1)
# 	mismatch(arg='b', expected_type=<class 'int'>, received_type=<class 'str'>, value='bye')
As you can see in the above example, I am using the function's type annotations to perform actual runtime checks! While this is not an alternative to using a real type checker, you can see how we can use a decorator to perform various kinds of input or output validation.

You can also extend this idea to higher-order decorators to perform parameterized testing like you encounter in pytest and hypothesis.
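Here is a rough sketch of that idea; parameterize below is a made-up helper, not pytest's API, just a from-scratch higher-order decorator that runs the decorated test once per parameter set:
def parameterize(argvalues):
    # hypothetical helper: the same shape of idea as pytest.mark.parametrize,
    # but not its actual interface or implementation
    def decorator(test_func):
        def runner():
            for values in argvalues:      # run the test body once per parameter set
                test_func(*values)
        return runner
    return decorator

@parameterize([(1, 2, 3), (2, 3, 5), (10, -4, 6)])
def test_add(a, b, expected):
    assert a + b == expected

test_add()   # passes silently; an AssertionError would surface any failing case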
numpy.vectorize
  • Packages: NumPy, many others
Many decorators will actually change or coerce the inputs to the functions they decorate. This is typically done to extend the behavior of that function. A great example of this is numpy.vectorize, a convenience function that helps users abstract away Python for loops. Note, however, that numpy.vectorize does NOT magically make your Python function run any faster than an ordinary Python for loop.
from numpy import broadcast_arrays, full_like, nan

def vectorize(func):
    def wrapper(*args, **kwargs):
        # would need reflection to determine number of args in func
        a, b = args
        a_arr, b_arr = broadcast_arrays(a, b)
        out = full_like(a_arr, nan)
        _out_raveled = out.ravel()
        
        for i, (_a, _b) in enumerate(zip(a_arr.ravel(), b_arr.ravel())):
            _out_raveled[i] = func(_a, _b)
        return out
    return wrapper


# example function copied from numpy.vectorize docs
@vectorize
def myfunc(a, b):
    "Return a-b if a>b, otherwise return a+b"
    if a > b:
        return a - b
    else:
        return a + b

print(
    myfunc(1, 3),
    myfunc([1, 2, 3, 4, 5, 6], 3),
    myfunc(3, [1, 2, 3, 4, 5, 6]),
    myfunc([[1,2,3],[4,5,6]], [[2], [4]])
)
# array(4)
# array([4, 5, 6, 1, 2, 3])
# array([2, 1, 6, 7, 8, 9])
# array([[3, 4, 1],
#        [8, 1, 2]])
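For comparison, the real numpy.vectorize can be applied the same way, since it simply takes a callable and returns a callable; it also handles any number of arguments and output dtypes, which our sketch does not. It is aliased here only to avoid clashing with the version we just wrote:
from numpy import vectorize as np_vectorize

@np_vectorize
def np_myfunc(a, b):
    "Return a-b if a>b, otherwise return a+b"
    return a - b if a > b else a + b

print(np_myfunc([1, 2, 3, 4, 5, 6], 3))
# [4 5 6 1 2 3]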
functools.lru_cache

Caching, or memoization in this specific case, is a technique used to skip a computationally intensive function call when we have already seen its inputs. Say we have a function that takes two inputs and a few seconds to complete. If we don't expect the output of this function to change when the inputs don't change AND we expect to call this function with the same inputs many times, then we have a great scenario for effective memoization.

Essentially, the first time we call a function with a given set of inputs, we store the output. Then, whenever we encounter those same inputs again, we simply return the stored output instead of repeating the computation. This type of functionality (with added complexity) can be seen in the built-in functools.lru_cache.
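Just to ground the idea, here is a minimal use of the built-in (slow_add is a made-up example function):
from functools import lru_cache
from time import sleep

@lru_cache(maxsize=None)       # cache every distinct set of arguments
def slow_add(a, b):
    sleep(1)                   # stand-in for an expensive computation
    return a + b

slow_add(4, 2)   # takes ~1 second the first time
slow_add(4, 2)   # returns immediately from the cache
Now, let's reinvent that wheel with a dictionary-backed class: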
from inspect import signature

class memoize(dict):
    def __init__(self, f):
        self.f, self.sig = f, signature(f)
    def __call__(self, *args, **kwargs):
        # normalize the call into a hashable key, then look it up in ourselves
        key = self.sig.bind(*args, **kwargs)
        return self[key.args, frozenset(key.kwargs.items())]
    def __missing__(self, key):
        # only runs on a cache miss: compute, store, and return the result
        args, kwargs = key
        self[key] = self.f(*args, **dict(kwargs))
        return self[key]

from time import sleep    

@memoize
def add_sleep(a, b, *, sleep_for=0):
    sleep(sleep_for)
    return a + b


%timeit -n 1 -r 1 add_sleep(4, 2, sleep_for=1)
%timeit -n 1 -r 1 add_sleep(4, 2, sleep_for=1)

# 1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
# 97.5 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
As you can see above, the first time we call add_sleep, it takes the function one second to complete, whereas the second time we call it, it returns almost instantly! This is because we have cached the result, so we can skip the actual execution of the function and return the stored value.
contextlib.contextmanager

I couldn't avoid this gem in the standard library. contextlib.contextmanager enables us to turn a two-step generator into a valid Python context manager. The idea is that a generator represents a step-based computation of unbounded length. If we apply this mindset to what a context manager is, then we can think of a context manager as a generator that has only two steps: __enter__ and __exit__!

To turn a two-step generator into an actual context manager, we need to create a class that implements __enter__ and __exit__. We can use this class to advance the generator by one step when we enter the context, and then advance that same generator one more time when we exit the context.
class my_contextmanager:
    def __init__(self, func):
        self.func = func
        
    def __call__(self, *args, **kwargs):
        self.gen = self.func(*args, **kwargs)
        return self
        
    def __enter__(self):
        return next(self.gen)
    
    def __exit__(self, typ, value, traceback):
        try:
            next(self.gen)
        except StopIteration:
            return False
    
@my_contextmanager
def generator():
    print('entering context')
    yield
    print('exiting context')

with generator():
    pass
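For comparison, swapping in the real standard-library decorator behaves the same way for this simple case (though contextlib.contextmanager also forwards exceptions into the generator via gen.throw and does extra bookkeeping that our sketch skips):
from contextlib import contextmanager

@contextmanager
def generator():
    print('entering context')
    yield
    print('exiting context')

with generator():
    pass
# entering context
# exiting context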
Logging/Warnings

Last up, we have logging, warnings, and deprecation markers. Decorators are used for this purpose in many packages to notify users that certain functions will be removed in a future version of a package (or on a specific date). Here, I've implemented a date-based deprecation system: we either warn the user that a specific function will be removed on a given date or, if that date has already passed, emit a message that the function should have been removed by now. Importantly, we will only warn users about these deprecations if they actually attempt to call the decorated function.
from datetime import datetime
from warnings import warn

def deprecate(*, remove_on):
    def decorator(f):
        def wrapper(*args, **kwargs):
            if datetime.now() < remove_on:
                warn(f'Please dont use {f} anymore, we will remove it on {remove_on:%Y-%m-%d}'.strip())
            else:
                warn('Wait a minute, this function should have been removed already!')

            return f(*args, **kwargs)
        return wrapper
    
    if remove_on is not None:
        remove_on = datetime.strptime(remove_on, '%Y-%m-%d')
    
    return decorator


@deprecate(remove_on='2035-02-04')
def f1():
    pass

@deprecate(remove_on='2000-02-04')
def f2():
    pass

>>> f1()
# /tmp/ipykernel_180463/1508361589.py:8: UserWarning: Please dont use <function f1 at 0x7fd11c0ad480> anymore, we will remove it on 2035-02-04
#   warn(f'Please dont use {f} anymore, we will remove it on {remove_on:%Y-%m-%d}'.strip())

>>> f2()
# /tmp/ipykernel_180463/1508361589.py:10: UserWarning: Wait a minute, this function should have been removed already!
#   warn('Wait a minute, this function should have been removed already!')
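One small note if you adapt this pattern for real use: libraries typically pass an explicit warning category and a stacklevel, so the warning is filterable and points at the caller's line rather than at the wrapper. A minimal sketch of that, using a hypothetical simple_deprecate variant:
from warnings import warn

def simple_deprecate(f):
    def wrapper(*args, **kwargs):
        # DeprecationWarning lets users filter these messages; stacklevel=2
        # attributes the warning to the caller's line instead of this wrapper
        warn(f'{f.__name__} is deprecated', DeprecationWarning, stacklevel=2)
        return f(*args, **kwargs)
    return wrapper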

Summary


That was a lot of decorators! We've implemented the core features of many popular Python packages and modules. You can see that these decorators have vastly different behaviors but share a single common syntax. When thinking about writing a decorator in your own code, always start with this simple question:
  • Do you want to run some common code against multiple functions/classes?
This will give you the hard answer about whether or not your code needs a decorator. Once you have answered that, you can begin thinking about the various implementations and how they would interface with your existing code. I hope you were able to learn something from these demonstrations and can apply these ideas in your own code! Most importantly, I hope that these widely used decorator patterns no longer seem mysterious or 'magical' when you see them.

Until next week!


- Cam

Community Calendar

Django Girls 2022
May 21, 2022

If you are a woman, know English, and have a laptop, you can apply for this event to learn how to build a website! You don't need to know any technical stuff – this workshop is for people who are new to programming.
PyCon LT 2022
May 26–27, 2022



DUTC's James Powell will be attending as a keynote speaker!

PyCon LT is a community event that brings together new and experienced Python users. Their goals are to grow and support a community of Python users, encourage learning and knowledge sharing, and popularize Python tools/libraries and open source in general. You can find more information on their website or Facebook page.
PyCon IT 2022
June 2–5, 2022

PyCon Italia is the Italian conference on Python. Organised by Python Italia, it has now become one of the most important Python conferences in Europe. With over 700 attendees, the next edition will be the 12th.
PyData London 2022
June 17–19, 2022



DUTC's James Powell will be attending, giving a talk, and hosting the PubQuiz!

PyData London 2022 is a 3-day event for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization.
GeoPython 2022
June 20–22, 2022

The conference is focused on Python and Geo, its toolkits and applications. GeoPython 2022 will be the continuation of the yearly conference series. The conference started in 2016 and this is now the 7th edition. Key subjects for the conference series are the combination of the Python programming language and Geo.
EuroPython 2022
July 11–17, 2022

   

DUTC's James Powell and Cameron Riddell will be participating in the EuroPython mentorship program!

Welcome to the 21st EuroPython. We're the oldest and longest-running, volunteer-led Python programming conference on the planet! Join us in July in the beautiful and vibrant city of Dublin. We'll be together, face to face and online, to celebrate our shared passion for Python and its community!
Kiwi PyCon 2022
August 19–21, 2022

The first-ever Kiwi PyCon was held in Ōtautahi Christchurch, in 2009, the same year that New Zealand Python User Group was founded. The Garden City holds a very special place in the history of Python in New Zealand and Kiwi PyCon XI should have been held there in September 2020.  Along came the pandemic...

Somewhat later, we are back!

For more information, check out their official Twitter account.
PyCon APAC 2022
September 3–4, 2022

PyCon Taiwan is an annual convention in Taiwan for the discussion and promotion of the Python programming language. It is held by enthusiasts and focuses on Python technology and its versatile applications.
PyCon UK 2022
September 16–18, 2022

PyCon UK will be returning to Cardiff City Hall from Friday 16th to Sunday 18th September 2022.
More details coming soon!
DjangoCon Europe 2022
September 21–25, 2022

This is the 14th edition of the Conference and it is organized by a team made up of Django practitioners from all levels. We welcome people from all over the world.

Our conference seeks to educate and develop new skills, best practices, and ideas for the benefit of attendees, developers, speakers, and everyone in our global Django Community, not least those watching the talks online.
For more community events, check out python.org's event page.

Have any community events you'd like to see in the Community Calendar? Submit your suggestions to newsletter@dutc.io!
Copyright © 2022 Don't Use This Code, All rights reserved.