Trying to understand what is refactoring, follow the guide from Real Python. https://realpython.com/python-refactoring

1. Functions That Should Be Objects

Without reading the code, you will not know if it will modify the original image or create a new image.

def load_image(path):
    with open(path, "rb") as file:
        fb = file.load()
    image = img_lib.parse(fb)
    return image

def crop_image(image, width, height):
    ...
    return image

def get_image_thumbnail(image, resolution=100):
    ...
    return image
imagelib.py

To call the codes:

from imagelib import load_image, crop_image, get_image_thumbnail

image = load_image('~/face.jpg')
image = crop_image(image, 400, 500)
thumb = get_image_thumbnail(image)
usage

Symptoms of code using functions that could be refactored into classes:

  • Similar arguments across functions
  • Higher number of Halstead h2 unique operands (All the variables and constants are considered operands)
  • Mix of mutable and immutable functions
  • Functions spread across multiple Python files

Here is a refactored version of those 3 functions, where the following happens:

  • .__init__() replaces load_image().
  • crop() becomes a class method.
  • get_image_thumbnail() becomes a property.

The thumbnail resolution has become a class property, so it can be changed globally or on that particular instance:

class Image(object):
    thumbnail_resolution = 100
    def __init__(self, path):
        ...

    def crop(self, width, height):
        ...

    @property
    def thumbnail(self):
        ...
        return thumb
imagelib.py refactored

This is how the refactored example would look:

from imagelib import Image

image = Image('~/face.jpg')
image.crop(400, 500)
thumb = image.thumbnail
usage refactored

In the resulting code, we have solved the original problems:

  • It is clear that thumbnail returns a thumbnail since it is a property, and that it doesn’t modify the instance.
  • The code no longer requires creating new variables for the crop operation.

2. Objects That Should Be Functions

Here are some tell-tale signs of incorrect use of classes:

  • Classes with 1 method (other than .__init__())
  • Classes that contain only static methods
class Authenticator(object):
    def __init__(self, username, password):
        self.username = username
        self.password = password

    def authenticate(self):
        ...
        return result
authentication class

It would make more sense to just have a simple function named authenticate() that takes username and password as arguments:

def authenticate(username, password):
    ...
    return result
authenticate.py

3. Converting “Triangular” Code to Flat Code

These are the symptoms of highly nested code:

  • A high cyclomatic complexity because of the number of code branches
  • A low Maintainability Index because of the high cyclomatic complexity relative to the number of lines of code
def contains_errors(data):
    if isinstance(data, list):
        for item in data:
            if isinstance(item, str):
                if item == "error":
                    return True
    return False
function

Refactor this function by “returning early”

def contains_errors(data):
    if not isinstance(data, list):
    	return False
	return data.count("error") > 0
function refactored

Another technique to reduce nesting by list comprehension

Common practise to create list, loop through and check for criteria.

results = []
for item in iterable:
    if item == match:
        results.append(item)
loop

Replace with:

results = [item for item in iterable if item == match]
result

4. Handling Complex Dictionaries With Query Tools

It does have one major side-effect: when dictionaries are highly nested, the code that queries them becomes nested too.

data = {
 "network": {
  "lines": [
    {
     "name.en": "Ginza",
     "name.jp": "銀座線",
     "color": "orange",
     "number": 3,
     "sign": "G"
    },
    {
     "name.en": "Marunouchi",
     "name.jp": "丸ノ内線",
     "color": "red",
     "number": 4,
     "sign": "M"
    }
  ]
 }
}
dictionary

If you wanted to get the line that matched a certain number, this could be achieved in a small function:

def find_line_by_number(data, number):
    matches = [line for line in data if line['number'] == number]
    if len(matches) > 0:
        return matches[0]
    else:
        raise ValueError(f"Line {number} does not exist.")
find by line
find_line_by_number(data["network"]["lines"], 3)
result

There are third party tools for querying dictionaries in Python. Some of the most popular are JMESPath, glom, asq, and flupy.

import jmespath

jmespath.search("network.lines", data)
[{'name.en': 'Ginza', 'name.jp': '銀座線',
  'color': 'orange', 'number': 3, 'sign': 'G'}, 
 {'name.en': 'Marunouchi', 'name.jp': '丸ノ内線',
  'color': 'red', 'number': 4, 'sign': 'M'}]
jmespath

If you wanted to get the line number for every line, you could do this:

>>> jmespath.search("network.lines[*].number", data)
[3, 4]
result

You can provide more complex queries, like a == or <. The syntax is a little unusual for Python developers, so keep the documentation handy for reference.

Find the line with the number 3

>>> jmespath.search("network.lines[?number==`3`]", data)
[{'name.en': 'Ginza', 'name.jp': '銀座線', 'color': 'orange', 'number': 3, 'sign': 'G'}]
result