Python Descriptors Demystified

By Chris Beaumont

Python includes many built-in language features to enable concise, easily-understood code. Some of these niceties include list/set/dictionary comprehensions, properties, and decorators. For the most part, these "intermediate-level" language features are well-documented, and easy to learn.

There is one notable exception to this: descriptors. For me at least, descriptors were the feature of the core Python language that remained mysterious for the longest time. There are a few reasons for this:

  1. The official documentation on descriptors is rather esoteric, and doesn't include good use cases for why you might write descriptors (My apologies to Raymond Hettinger, whose other Python articles and videos I have found very helpful).

  2. The syntax for writing descriptors is a little weird.

  3. Custom descriptors might be the least-utilized feature of the Python language, so it's hard to find good examples in open source projects.

Nevertheless, descriptors do have their use once you figure them out. This document tries to build the argument for what descriptors do, and why you should care.

The punchline: descriptors are reusable properties

Here's what we're building up to: fundamentally, descriptors are properties that you can reuse. That is, descriptors let you write code that looks like this

In [14]:
f = Foo()
b = f.bar
f.bar = c
del f.bar

and that, behind the scenes, calls custom methods when it accesses (b = f.bar), assigns to (f.bar = c), or deletes (del f.bar) an instance attribute.

Let's establish why being able to disguise function calls as attribute access is a good thing.

Properties disguise function calls as attributes

Imagine you are writing some code to organize information about movies (spoiler alert: these projects beat you to it). You might end up with a movie class that looks like this:

In [15]:
class Movie(object):
    def __init__(self, title, rating, runtime, budget, gross):
        self.title = title
        self.rating = rating
        self.runtime = runtime
        self.budget = budget
        self.gross = gross
        
    def profit(self):
        return self.gross - self.budget

You start using this class in other parts of your project, but then you realize something: by mistake, you sometimes assign negative budgets to movies. You decide this is bad, and want the Movie class to forbid this. The first thing you think to try is this:

In [17]:
class Movie(object):
    def __init__(self, title, rating, runtime, budget, gross):
        self.title = title
        self.rating = rating
        self.runtime = runtime
        self.gross = gross
        if budget < 0:
            raise ValueError("Negative value not allowed: %s" % budget)
        self.budget = budget
        
    def profit(self):
        return self.gross - self.budget

But that won't work, because other parts of your code assign values to Movie.budget directly -- this new class catches data entry errors within the __init__ method, but not the cases where somebody tries to run m.budget = -100 on a pre-existing instance. What's a cinephile pythonista to do?

Luckily, Python properties solve this problem. If you've never seen properties before, here's how they work:

In [18]:
class Movie(object):
    def __init__(self, title, rating, runtime, budget, gross):
        self._budget = None

        self.title = title
        self.rating = rating
        self.runtime = runtime
        self.gross = gross
        self.budget = budget
        
    @property
    def budget(self):
        return self._budget
    
    @budget.setter
    def budget(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._budget = value
        
    def profit(self):
        return self.gross - self.budget

    
m = Movie('Casablanca', 97, 102, 964000, 1300000)
print m.budget       # calls the budget getter, returns the result
try:
    m.budget = -100  # calls the budget setter with -100, which raises ValueError
except ValueError:
    print "Woops. Not allowed"
964000
Woops. Not allowed

We specify a getter method with the @property decorator, and a setter method with the @budget.setter decorator. When we do that, Python automatically calls the getter whenever anybody tries to access m.budget. Likewise, Python automatically calls the setter whenever it encounters code like m.budget = value.

Take a moment to appreciate how nice it is that Python does this: if properties didn't exist, we'd have to hide all of our instance attributes, and provide lots of explicit methods like get_budget and set_budget. Code that uses our classes would constantly be calling these getter/setter methods, and would start to look like crufty Java code. Even worse, if we ignored this coding style and just gave direct access to an instance attribute like budget, there would be no clean way to later add the non-negativity check -- we would have to retroactively create the set_budget method, and search our entire project to change lines like m.budget = value to m.set_budget(value). Gross.

So properties let you attach custom code to variable getting/setting, while maintaining a simple attribute-like interface for your classes. Nice.
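For contrast, here is a minimal sketch of the explicit getter/setter style described above. The class name MovieWithAccessors and the trimmed-down constructor are made up for illustration, and the code uses Python 3 syntax (print as a function), unlike the Python 2 cells in this article.

```python
class MovieWithAccessors:
    """Hypothetical accessor-style Movie, shown only for contrast."""

    def __init__(self, title, budget):
        self.title = title
        self._budget = None
        self.set_budget(budget)

    def get_budget(self):
        return self._budget

    def set_budget(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._budget = value


m = MovieWithAccessors('Casablanca', 964000)

# With a property this line would simply read: m.budget += 1000
m.set_budget(m.get_budget() + 1000)
print(m.get_budget())  # 965000
```

Every caller has to know about the two method names, which is exactly the noise that a property hides.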

Properties Get Tedious

The main downside to properties is that they aren't reusable. For example, let's assume you want to add the non-negativity check to the rating, runtime, and gross fields as well. Here's the new class

In [19]:
class Movie(object):
    def __init__(self, title, rating, runtime, budget, gross):
        self._rating = None
        self._runtime = None
        self._budget = None
        self._gross = None

        self.title = title
        self.rating = rating
        self.runtime = runtime
        self.gross = gross
        self.budget = budget
        
    #nice
    @property
    def budget(self):
        return self._budget
    
    @budget.setter
    def budget(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._budget = value
        
    #ok    
    @property
    def rating(self):
        return self._rating
    
    @rating.setter
    def rating(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._rating = value
       
    #uhh...
    @property
    def runtime(self):
        return self._runtime
    
    @runtime.setter
    def runtime(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._runtime = value        
    
    #is this forever?
    @property
    def gross(self):
        return self._gross
    
    @gross.setter
    def gross(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._gross = value        
        
    def profit(self):
        return self.gross - self.budget

That's a lot of code, and a lot of duplicated logic. While properties make the outsides of classes look nice, they don't make the insides of classes look nice.

Descriptors (Finally)

This is the problem that descriptors solve. Descriptors generalize properties, and let you write separate classes for reusable property logic. Here's an example of how they work (for the moment, don't worry about what's inside NonNegative):

In [2]:
from weakref import WeakKeyDictionary

class NonNegative(object):
    """A descriptor that forbids negative values"""
    def __init__(self, default):
        self.default = default
        self.data = WeakKeyDictionary()
        
    def __get__(self, instance, owner):
        # we get here when someone calls x.d, and d is a NonNegative instance
        # instance = x
        # owner = type(x)
        return self.data.get(instance, self.default)
    
    def __set__(self, instance, value):
        # we get here when someone calls x.d = val, and d is a NonNegative instance
        # instance = x
        # value = val
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self.data[instance] = value

        
class Movie(object):
    
    #always put descriptors at the class-level
    rating = NonNegative(0)
    runtime = NonNegative(0)
    budget = NonNegative(0)
    gross = NonNegative(0)
    
    def __init__(self, title, rating, runtime, budget, gross):
        self.title = title
        self.rating = rating
        self.runtime = runtime
        self.budget = budget
        self.gross = gross
    
    def profit(self):
        return self.gross - self.budget
    
    
m = Movie('Casablanca', 97, 102, 964000, 1300000)
print m.budget  # calls Movie.budget.__get__(m, Movie)
m.rating = 100  # calls Movie.rating.__set__(m, 100)
try:
    m.rating = -1   # calls Movie.rating.__set__(m, -1)
except ValueError:
    print "Woops, negative value"
964000
Woops, negative value

There's some new syntax in here, so let's look at things piece by piece:

NonNegative is a descriptor object. It's a descriptor because it defines at least one of the __get__, __set__, or __delete__ methods.

The Movie class looks very clean. We create 4 descriptors at the class level, and treat them like normal (instance-level) attributes everywhere else. And apparently, the descriptors are checking for non-negative values for us.
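One way to convince yourself that descriptors generalize properties is to check that a property object itself implements the descriptor methods. The sketch below uses a stripped-down Movie with only a budget (an assumption for brevity) and Python 3 syntax.

```python
class Movie:
    def __init__(self, budget):
        self._budget = budget

    @property
    def budget(self):
        return self._budget


# The property object lives in the class namespace, just like NonNegative does.
prop = Movie.__dict__['budget']
print(type(prop))                   # <class 'property'>
print(hasattr(prop, '__get__'))     # True
print(hasattr(prop, '__set__'))     # True
print(hasattr(prop, '__delete__'))  # True

m = Movie(964000)
# Calling __get__ by hand gives the same result as plain attribute access.
print(prop.__get__(m, Movie) == m.budget)  # True
```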

Accessing a descriptor

When Python sees the line print m.budget, it recognizes that budget is a descriptor with a __get__ method. Instead of passing m.budget to print directly, it calls Movie.budget.__get__, and feeds the result of that to print. This is similar to what happens when you access a property -- Python automatically calls a method, and returns the result.

__get__ receives two arguments: the instance object to the left of the period (that is, the m object in m.budget), and the type of that instance (Movie). In some Python documentation, Movie is called the owner of the descriptor. If we had asked for Movie.budget, Python would have called Movie.budget.__get__(None, Movie); that is, the first argument is either an instance of the owner, or None. These input arguments may seem weird to you, but they're there to give you information about what object the descriptor is part of. This will make sense once we look inside the NonNegative class.
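To see these two arguments directly, here is a small sketch with a throwaway descriptor (the name ShowArgs is made up for illustration) that returns whatever __get__ receives. It is written in Python 3 syntax.

```python
class ShowArgs:
    """Hypothetical descriptor that reports the arguments __get__ receives."""

    def __get__(self, instance, owner):
        return (instance, owner)


class Movie:
    budget = ShowArgs()


class Sequel(Movie):
    pass


m = Movie()

inst, owner = m.budget            # access through an instance
print(inst is m, owner is Movie)  # True True

inst, owner = Movie.budget           # access through the class
print(inst is None, owner is Movie)  # True True

s = Sequel()
inst, owner = s.budget             # owner is the type of the instance
print(inst is s, owner is Sequel)  # True True
```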

Assigning to a descriptor

When Python sees m.rating = 100, Python recognizes rating is a descriptor with a __set__ method, and it calls Movie.rating.__set__(m, 100). Like __get__, the first argument of __set__ is the instance to the left of the period (the m in m.rating = 100). The second argument is the value to the right of the equals sign (100).

Deleting a descriptor

For the sake of completeness, if you call del m.budget, Python will call Movie.budget.__delete__(m).
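NonNegative does not define __delete__, so here is a hedged sketch of what one could look like. Resettable is a hypothetical descriptor, not part of the article's code; it treats deletion as "go back to the default". The example uses Python 3 syntax.

```python
from weakref import WeakKeyDictionary


class Resettable:
    """Hypothetical descriptor: `del obj.attr` resets attr to its default."""

    def __init__(self, default):
        self.default = default
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        return self.data.get(instance, self.default)

    def __set__(self, instance, value):
        self.data[instance] = value

    def __delete__(self, instance):
        # we get here when someone runs `del x.d`; forget the stored value
        self.data.pop(instance, None)


class Movie:
    budget = Resettable(0)


m = Movie()
m.budget = 964000
del m.budget     # invokes the descriptor's __delete__ with m
print(m.budget)  # 0
```

What deletion should mean is a design choice; raising AttributeError on a later read would be equally reasonable.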

How NonNegative works

With this in mind, we can now look to see how the NonNegative class works. Each instance of NonNegative maintains a dictionary that maps owner instances to data values. When we call m.budget, the __get__ method looks up the data associated with m, and returns the result (or a default value, if no such value exists). __set__ uses the same approach, but includes the extra non-negativity check. We use a WeakKeyDictionary instead of a normal dict to prevent a memory leak -- we don't want an instance to stay alive simply because it's in the descriptor dictionary, and otherwise unused.
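The difference between a WeakKeyDictionary and a normal dict can be sketched in isolation. The Movie class below is an empty stand-in, and the explicit gc.collect() call is a precaution: CPython normally frees the object as soon as its last reference goes away. Python 3 syntax is assumed.

```python
import gc
from weakref import WeakKeyDictionary


class Movie:
    pass


weak = WeakKeyDictionary()
strong = {}

a, b = Movie(), Movie()
weak[a] = 964000
strong[b] = 964000

# Drop our own references to both movies.
del a, b
gc.collect()

print(len(weak))    # 0: the entry vanished along with its key
print(len(strong))  # 1: the plain dict keeps the other Movie alive
```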

Working with descriptors is slightly awkward. Because they live at the class level, every instance shares the same descriptor. This means that descriptors have to manually manage different states for different object instances, and need to explicitly be passed instances as the first argument of the __get__, __set__, and __delete__ methods.

Hopefully, however, this example gives you an idea of what descriptors can be useful for -- they provide a way to organize property logic into isolated classes. If you find yourself repeating the same logic across several properties, that should be a clue to consider whether refactoring that code into a descriptor is worthwhile.

Recipes and Gotchas

Put descriptors at the class level

For descriptors to work properly, they must be defined at the class level. If they aren't, Python doesn't automatically invoke the __get__ and __set__ methods for you:

In [4]:
class Broken(object):
    y = NonNegative(5)
    def __init__(self):
        self.x = NonNegative(0)  # NOT a good descriptor
        
b = Broken()
print "X is %s, Y is %s" % (b.x, b.y)
X is <__main__.NonNegative object at 0x10432c250>, Y is 5

As you can see, accessing the class-level descriptor y automatically calls __get__. However, accessing the instance-level descriptor x returns the descriptor itself, sans magic.

Make sure to keep instance-level data instance-specific

You might be tempted to write the NonNegative descriptor like this

In [15]:
class BrokenNonNegative(object):
    def __init__(self, default):
        self.value = default
        
    def __get__(self, instance, owner):
        return self.value
    
    def __set__(self, instance, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self.value = value
        
class Foo(object):
    bar = BrokenNonNegative(5) 
    
f = Foo()
try:
    f.bar = -1
except ValueError:
    print "Caught the invalid assignment"
Caught the invalid assignment

That seems to work fine. The problem here is that all instances of Foo share the same bar instance, leading to this flavor of sadness:

In [16]:
class Foo(object):
    bar = BrokenNonNegative(5) 
    
f = Foo()
g = Foo()

print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)
print "Setting f.bar to 10"
f.bar = 10
print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)  #ouch
f.bar is 5
g.bar is 5
Setting f.bar to 10
f.bar is 10
g.bar is 10

This is why we used the data dictionary in NonNegative. The first argument to __get__ and __set__ tells us which instance to consider. NonNegative uses this argument as a dictionary key, to keep data for each Foo instance separate.

In [9]:
class Foo(object):
    bar = NonNegative(5)
    
f = Foo()
g = Foo()
print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)
print "Setting f.bar to 10"
f.bar = 10
print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)  #better
f.bar is 5
g.bar is 5
Setting f.bar to 10
f.bar is 10
g.bar is 5

This is the most awkward aspect of descriptors (full disclosure: I don't actually understand why Python doesn't let you define descriptors at the instance level, and always dispatch to __get__ and __set__. There must be some reason why this doesn't work. UPDATE: Thanks to Louie Dinh who pointed me to the reason why: see this post if you're interested).

Beware unhashable descriptor owners

NonNegative uses a dictionary to keep instance-specific data separate. This normally works fine, unless you want to use descriptors with unhashable objects:

In [8]:
class MoProblems(list):  #you can't use lists as dictionary keys
    x = NonNegative(5)
        
m = MoProblems()
print m.x  # womp womp
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-dd73b177bd8d> in <module>()
      3 
      4 m = MoProblems()
----> 5 print m.x  # womp womp

<ipython-input-3-6671804ce5d5> in __get__(self, instance, owner)
      9         # instance = x
     10         # owner = type(x)
---> 11         return self.data.get(instance, self.default)
     12 
     13     def __set__(self, instance, value):

TypeError: unhashable type: 'MoProblems'

Because instances of MoProblems (which is a subclass of list) aren't hashable, they can't be used as keys in the data dictionary for MoProblems.x. There are a few ways around this, though none are perfect. The best approach is probably to "label" your descriptors:

In [2]:
class Descriptor(object):
    
    def __init__(self, label):
        self.label = label
        
    def __get__(self, instance, owner):
        print '__get__', instance, owner
        return instance.__dict__.get(self.label)
    
    def __set__(self, instance, value):
        print '__set__'
        instance.__dict__[self.label] = value
        

class Foo(list):
    x = Descriptor('x')
    y = Descriptor('y')
    
f = Foo()
f.x = 5
print f.x
__set__
__get__ [] <class '__main__.Foo'>
5

This relies on a highly non-obvious detail of Python's attribute lookup rules: a descriptor that defines __set__ takes precedence over the instance dictionary. We label each descriptor in Foo with the same name as the variable that we assign the descriptor to (for example, x = Descriptor('x')). The descriptor then stores instance-specific data in f.__dict__['x']. This dictionary entry would normally be what Python returns when we ask for f.x. However, because Foo.x is a descriptor with a __set__ method, Python never reads f.__dict__['x'] directly when we ask for f.x, and the descriptor can safely store stuff there. Just make sure you don't label the descriptor anything else:
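The precedence rule that makes labeling safe can be sketched with two throwaway descriptors (DataDesc and NonDataDesc are illustrative names). One defines both __get__ and __set__; the other defines only __get__. The example uses Python 3 syntax.

```python
class DataDesc:
    """Defines __get__ and __set__: wins over the instance dictionary."""

    def __get__(self, instance, owner):
        return 'from descriptor'

    def __set__(self, instance, value):
        raise AttributeError("read-only")


class NonDataDesc:
    """Defines only __get__: loses to the instance dictionary."""

    def __get__(self, instance, owner):
        return 'from descriptor'


class Foo:
    x = DataDesc()
    y = NonDataDesc()


f = Foo()
# Write to the instance dictionary directly, bypassing attribute assignment.
f.__dict__['x'] = 'from instance dict'
f.__dict__['y'] = 'from instance dict'

print(f.x)  # from descriptor
print(f.y)  # from instance dict
```

The labeled Descriptor above defines __set__, so it behaves like DataDesc: the value it stores in f.__dict__ never shadows the descriptor itself.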

In [10]:
class Foo(object):
    x = Descriptor('y')
    
f = Foo()
f.x = 5
print f.x

f.y = 4    #oh no!
print f.x
__set__
__get__ <__main__.Foo object at 0x10432c810> <class '__main__.Foo'>
5
__get__ <__main__.Foo object at 0x10432c810> <class '__main__.Foo'>
4

I don't love this pattern, since it's fragile and subtle, but it's fairly common. And it works for unhashable owner classes. David Beazley uses it in his books.

Labeled Descriptors with Metaclasses

Because descriptor labels match the variable name they are assigned to, some people use metaclasses to take care of this bookkeeping automatically:

In [15]:
class Descriptor(object):
    def __init__(self):
        #notice we aren't setting the label here
        self.label = None
        
    def __get__(self, instance, owner):
        print '__get__. Label = %s' % self.label
        return instance.__dict__.get(self.label, None)
    
    def __set__(self, instance, value):
        print '__set__'
        instance.__dict__[self.label] = value

        
class DescriptorOwner(type):
    def __new__(cls, name, bases, attrs):
        # find all descriptors, auto-set their labels
        for n, v in attrs.items():
            if isinstance(v, Descriptor):
                v.label = n
        return super(DescriptorOwner, cls).__new__(cls, name, bases, attrs)

        
class Foo(object):
    __metaclass__ = DescriptorOwner
    x = Descriptor()
    
f = Foo()
f.x = 10
print f.x
    
__set__
__get__. Label = x
10

I won't explain the details of metaclasses -- David Beazley's tutorial at the bottom of this article covers them. The main point is that the metaclass auto-assigns descriptor labels, to match the variable name that each descriptor is assigned to.

While this solves the problem of mismatched descriptor labels and variable names, it does so by adding all the complexity of metaclasses. You can decide if this is worth the hassle, but I have my doubts.
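If you are on Python 3.6 or later (an assumption; the cells in this article are Python 2), the language offers a lighter alternative to the metaclass: a descriptor may define a __set_name__ method, which Python calls with the owner class and the attribute name when the class is created. A minimal sketch:

```python
class Descriptor:
    def __set_name__(self, owner, name):
        # Called automatically when the owning class is created (Python 3.6+).
        self.label = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.label)

    def __set__(self, instance, value):
        instance.__dict__[self.label] = value


class Foo(list):  # an unhashable owner, as in the earlier example
    x = Descriptor()
    y = Descriptor()


f = Foo()
f.x = 10
print(f.x)                       # 10
print(Foo.x.label, Foo.y.label)  # x y
```

The labels can no longer drift out of sync with the variable names, and no metaclass is involved.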

Accessing Descriptor Methods

Descriptors are just classes, and you may want to add other methods to them. For example, descriptors are a great way to implement callback properties. Say we want a class to notify us whenever part of its state changes. Here's most of the code to do that

In [3]:
class CallbackProperty(object):
    """A property that will alert observers upon updates"""
    def __init__(self, default=None):
        self.data = WeakKeyDictionary()
        self.default = default
        self.callbacks = WeakKeyDictionary()
        
    def __get__(self, instance, owner):
        return self.data.get(instance, self.default)
    
    def __set__(self, instance, value):        
        for callback in self.callbacks.get(instance, []):
            # alert callback function of new value
            callback(value)
        self.data[instance] = value
        
    def add_callback(self, instance, callback):
        """Add a new function to call every time the descriptor updates"""
        #but how do we get here?!?!
        if instance not in self.callbacks:
            self.callbacks[instance] = []
        self.callbacks[instance].append(callback)
        
class BankAccount(object):
    balance = CallbackProperty(0)
    
def low_balance_warning(value):
    if value < 100:
        print "You are poor"
                
ba = BankAccount()

# will not work -- try it
#ba.balance.add_callback(ba, low_balance_warning)

This is a promising pattern -- we can attach custom callback functions to respond to state changes within a class, without having to modify the class code at all. That's a lovely separation of concerns. All we need to do now is call ba.balance.add_callback(ba, low_balance_warning), so that low_balance_warning is called whenever balance changes.

But how do we do that? Python calls __get__ whenever we access the descriptor through an instance, so ba.balance evaluates to the stored balance (a number), not to the CallbackProperty object. It would seem that the add_callback method is unreachable! The trick is to take advantage of the special case that, when accessed from the class level, the first argument to __get__ is None:

In [5]:
class CallbackProperty(object):
    """A property that will alert observers upon updates"""
    def __init__(self, default=None):
        self.data = WeakKeyDictionary()
        self.default = default
        self.callbacks = WeakKeyDictionary()
        
    def __get__(self, instance, owner):
        if instance is None:
            return self        
        return self.data.get(instance, self.default)
    
    def __set__(self, instance, value):
        for callback in self.callbacks.get(instance, []):
            # alert callback function of new value
            callback(value)
        self.data[instance] = value
        
    def add_callback(self, instance, callback):
        """Add a new function to call every time the descriptor within instance updates"""
        if instance not in self.callbacks:
            self.callbacks[instance] = []
        self.callbacks[instance].append(callback)
        
class BankAccount(object):
    balance = CallbackProperty(0)
    
def low_balance_warning(value):
    if value < 100:
        print "You are now poor"
                
ba = BankAccount()
BankAccount.balance.add_callback(ba, low_balance_warning)

ba.balance = 5000
print "Balance is %s" % ba.balance
ba.balance = 99
print "Balance is %s" % ba.balance
Balance is 5000
You are now poor
Balance is 99

Fin

Hopefully, you now have an understanding of what descriptors are, and when they are useful. Go forth and refactor.

Acknowledgements

The CSS on this page is adapted from Cam Davidson-Pilon's awesome and gorgeous book.

There were some relevant talks and tutorials about descriptors and properties at PyCon 2013:
