2.3. Organising code

In the introduction, while writing different versions of a “Hello world!” program (Hello world), we have already seen quite a number of ways to organise our code, namely:

  • scripts

  • functions

  • classes

  • modules

  • packages

Generally, two concepts need to be distinguished: organising code in terms of files in the file system, and logically organising code in terms of namespaces. The latter refers to the scope of variables and the like, i.e. where and how they are accessible.

A third way to organise code (and eventually much more, namely entire programs and applications) is creating data structures, be it in the form of dictionaries or classes.

2.3.1. Logical organisation: Namespaces and scope

Generally, every variable, function, and class is (only) available in the local context (e.g., file) it is created in. To avoid name clashes, it is therefore an excellent idea to provide local scope, e.g. by defining local variables within a function. In this case, the variable is only accessible from within the function, and the function works as a gatekeeper for information getting in and out.

2.3.1.1. Scripts and the Python interpreter

Usually one of the first things to do when starting to program in an interpreted language is to either fire up the interpreter aka command line or, in a next step, to create a file (script) with a list of commands to call one after the other.

When working directly with the Python interpreter, every variable, function, and class you define there will be accessible for future use, as long as the session lasts. Usually this is not much of an issue, as you will only use the Python interpreter for rapid prototyping, not for serious programming.

If you start writing simple Python scripts, i.e. lists of Python commands in a file, the scope of the variables (and functions and classes) you define will be global to that script. Why is this important? Suppose you’ve created a variable, assigned it some value, and afterwards use that variable in different lines of your code. Whenever a line of code changes the value of the variable, all following lines will be affected. This makes it hard to know what the variable actually contains. A simple example, going back to the “Hello world!” task:

greeting = "Hello world!"
print(greeting)

greeting = "Good bye world!"
print(greeting)

The output of the above code would be, unsurprisingly:

Hello world!
Good bye world!

As you can see, the scope of the variable “greeting” we defined is global in the given context (of this script or command line). Hence changing the text assigned to it changes what the following call to print outputs.

No surprise so far, and with only four lines of code pretty obvious. But any sensible program will contain more than a few lines of code, and everything longer than about 25 lines gets tricky, as it does not fit on the screen. Actually, the screen size is less of a problem here; the actual limit is our capability to deal with information. We can keep clearly fewer than ten pieces of information in mind at once, and with much more than 25 lines of code, we cannot take in what happens at a glance. Hence, we will be rather surprised by what happens and have to go through the code step by step to figure it out.

This is why functions are the first line of organising our code, as they provide a local context and behave as independent units.

2.3.1.2. Functions

Functions are the first line of organising our code. What happens inside a function is, by default, hidden from the outside world, i.e. everything outside the function. The intended way to get information into and out of a function is to provide arguments (for the input) and a return value (for the output).

In its simplest form a function has neither input nor output. We’ve seen such an example already:

def hello():
    print("Hello world!")

Technically speaking, it is not absolutely correct to say that this function has no output, as it has a clear side effect (printing “Hello world!”). But it definitely has no return value. We can generalise and extend the function, introducing an argument as input:

def greet(greeting):
    print(greeting)

To call this function, we may type:

greet("Hello world!")

No surprise so far. We can define a variable my_greeting outside the function, assign some text to it, and call the function with that variable:

my_greeting = "Hello world!"
greet(my_greeting)

Again, the same output as above. To make things more interesting, let’s define a function that takes an argument as input and returns something:

def create_message(text):
    message = "Your message was: " + text
    return message

You can call this function with an input, and assign its return value to a variable for later use:

output = create_message("Hello world!")
print(output)

This will result in:

Your message was: Hello world!

Still no surprise. But what if we had defined the variable message outside our function? Would that affect how our function behaves? Let’s try and find out:

message = "Goodbye!"

print("Message before function declaration: " + message)

def create_message(text):
    message = "Your message was: " + text
    return message

output = create_message("Hello world!")

print("Message after function declaration: " + message)
print("Output: " + output)

Running the above code will result in:

Message before function declaration: Goodbye!
Message after function declaration: Goodbye!
Output: Your message was: Hello world!

As you can see, the variable message defined outside the function does not affect the variable with the same name defined within the function, nor vice versa.
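One nuance is worth a small sketch: in Python, a function can read a name defined at module level, but assigning to that name inside the function creates a new local variable and leaves the outer one untouched. The names prefix, read_outer, and shadow_outer below are made up for illustration.

```python
prefix = "Outer: "


def read_outer(text):
    # No assignment to `prefix` here, so Python looks the name up
    # in the enclosing (module) namespace.
    return prefix + text


def shadow_outer(text):
    # Assigning creates a new local variable named `prefix`;
    # the module-level `prefix` is not modified.
    prefix = "Inner: "
    return prefix + text


print(read_outer("Hello world!"))    # Outer: Hello world!
print(shadow_outer("Hello world!"))  # Inner: Hello world!
print(prefix.strip())                # Outer:
```

Reading outer names this way works, but it bypasses the gatekeeper role of the function, so passing values in as arguments is usually the clearer choice.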

To wrap up:

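  • Functions provide a local context (“scope”) for variables: variables defined within a function are neither accessible nor visible from the outside, and assigning to a name inside a function creates a local variable that leaves any variable of the same name outside the function untouched. (Python does allow a function to read names from the surrounding module, but relying on this undermines the function's role as a gatekeeper.) In other words: the (local) namespace of the function is separate from the namespace the function is called from.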

  • Names of variables defined as arguments in a function declaration (input) or returned via a return statement (output) are part of and restricted to the function namespace.

The second point is important, as newcomers sometimes get confused by it. It does not matter what you name the variable you pass as an input argument: within the function, its value is always accessed by the name given in the function declaration. In our example above, the input argument is named text in the function declaration, hence its value is accessed via the variable text within the function, regardless of whether you pass a variable with some other name or call the function directly with a string (as shown above).
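A minimal sketch of this point, reusing create_message from above; the variable names from_variable and from_literal are only illustrative.

```python
def create_message(text):
    # Inside the function, the input is always called `text`.
    message = "Your message was: " + text
    return message


my_greeting = "Hello world!"

# Called with a variable of a different name ...
from_variable = create_message(my_greeting)
# ... or directly with a string: the result is the same.
from_literal = create_message("Hello world!")

print(from_variable == from_literal)  # True
```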

2.3.1.3. Classes

Classes are another nice way of organising code, as they combine properties (variables) and behaviour (functions, here called methods) into one entity (a class). This is often seen as a more natural approach to programming, as we are used to (real-world) objects having both properties and corresponding behaviour.

Remember the class from our “Hello world!” example?

class Hello:

    def __init__(self):
        self.greeting = "Hello world!"

    def hello(self):
        print(self.greeting)

Here, the method hello() is local to the class Hello. To call it, we need to first create an instance of the class, i.e. an object, and afterwards call the method of the newly created object, like so:

say = Hello()
say.hello()

This will result in the familiar output of “Hello world!” on the command line. As said before, the property greeting belongs to the class Hello and is thus local to this class. The difference between this property and a variable within a method of a class is that every (non-static) method will have access to this class property. For variables defined within a method and not prefixed by self, the same rules apply that have been described above for functions. Note that it is a convention to declare all the class properties in the constructor, although technically speaking nobody prevents you from creating them in any other method. The reason for declaring all class variables in the constructor is simple: there is only one place to look for them, thus making your code much more readable. For a more thorough introduction, see the primer on object-oriented programming.
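The difference between a property prefixed by self and a plain variable inside a method can be sketched as follows. This is a variation of the Hello class, not the original: the method shout and the variable punctuation are hypothetical additions, and hello returns its text instead of printing it.

```python
class Hello:

    def __init__(self):
        # Prefixed by self: available in every method of the object.
        self.greeting = "Hello world!"

    def hello(self):
        # Local variable: exists only while this method runs.
        punctuation = " :-)"
        return self.greeting + punctuation

    def shout(self):
        # self.greeting is accessible here, `punctuation` is not.
        return self.greeting.upper()


say = Hello()
print(say.hello())  # Hello world! :-)
print(say.shout())  # HELLO WORLD!
```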

2.3.1.4. Modules and packages

Python (as well as other languages) provides an additional way of organising the scope of variables, functions, and classes: namespaces. In its simplest form, you can think of a single file containing code as a namespace, with the name of the file as the name of the namespace. In Python, code is organised in modules that can be grouped into packages. Packages are actually the way to distribute and (re)use code. The Python standard library consists of a long list of modules for different purposes, and whenever you do something slightly more special than just printing “Hello world!”, you will import some module or at least something (a “symbol”, typically a function or a class) from a module. Those import statements are usually located at the very top of your files. A simple example may look as follows:

import os


print(os.curdir)

This will typically print a single dot “.” and thus is not particularly helpful. To get a bit more information, as our original goal might have been to get the full path we are currently in, you may use:

import os


print(os.path.abspath(os.curdir))

This will provide you with the absolute path of the directory you are (or, to be more precise, the code you are calling is) currently working in.

Sometimes, there are reasons not to import an entire module, but just some selected symbols from it, and this shortens your code lines. However, be aware that you thereby hide the information about which namespace the symbol you are using came from. Let’s see an example:

from os import curdir
from os.path import abspath


print(abspath(curdir))

Generally, I would advise using this sparingly, but there are good reasons to selectively import symbols from modules, particularly if these modules are quite large. However, never ever use a statement of the form from <module> import * as you may sometimes find online. This is considered bad practice, and code containing such statements will be regarded as of low quality. The reason is pretty fair: such a statement clutters your namespace, and you will not know what symbols you have actually imported and whether there are name clashes. All this can lead to hard-to-find bugs.

It is worth mentioning, however, that for a number of packages there are conventions on how to import them, using a special import syntax that includes renaming. Prominent examples in the scientific programming and computing world are numpy and matplotlib, which would usually be imported like this:
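To illustrate how a name clash can go unnoticed, here is a small sketch using explicit imports from the standard-library modules math and cmath, which both provide a function named sqrt. A wildcard import rebinds names in the same silent way, only for every public symbol of the module at once.

```python
from math import sqrt

print(sqrt(4))   # 2.0 (a float, from math)

# A later import of the same name silently replaces the earlier one.
from cmath import sqrt

print(sqrt(4))   # (2+0j) (a complex number, from cmath)

# Importing the modules themselves keeps the origin of each symbol visible.
import cmath
import math

print(math.sqrt(4), cmath.sqrt(4))  # 2.0 (2+0j)
```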

import numpy as np
import matplotlib.pyplot as plt

Afterwards, you can (and need to) call individual symbols from these packages using the abbreviation defined upon import, such as:

np.array([1, 2, 3])
plt.plot([1, 2, 3])

Details will be given in later parts of this course dealing explicitly with the Python scientific software stack.

If you have followed through the introduction stretching the “Hello world!” scenario quite a bit, you may even have created a Python package called “helloworld” that you can import, given that you have installed it locally. To be precise, you’re not importing the package but a module from the package:

import helloworld


say = helloworld.Hello()
say.hello()

Here, you can see that the notation for accessing symbols of a module and of a class/object is actually identical in Python (and for good reasons, as internally, everything in Python is an object): To access the class Hello from the imported module helloworld, you write helloworld.Hello(), just as you write say.hello() to access the method hello of the object say, which is in turn an object of the class Hello from the helloworld module you have imported.
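That a module is itself an object can be checked with a few lines. The sketch below uses only the standard-library module os and the built-in getattr; the variable name text is arbitrary.

```python
import os

# A module is an object like any other ...
print(type(os).__name__)  # module

# ... so its symbols are attributes, reachable with the dot
# or, equivalently, with getattr().
print(os.curdir == getattr(os, "curdir"))  # True

# The same holds for methods of ordinary objects.
text = "hello"
print(text.upper() == getattr(text, "upper")())  # True
```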

2.3.2. File organisation

Here again, we have different issues to cope with:

  • Organising naming and location of files (modules, packages)

  • Organising content within a file in a sensible way (e.g., import statements at the top, from simple to complex, from abstract to concrete)

2.3.2.1. Modules

  • group code dealing with similar things together in a module

  • name modules sensibly (PEP 8 conventions)

  • if the project grows really large: think of submodules

  • Beware of cyclic imports

2.3.2.2. Packages

  • for distributing and reusing code

  • name packages sensibly (PEP 8 conventions): same as modules, but underscores discouraged

    • Underscores in package names are regularly used, though, in “subpackages” extending and depending on base packages. Prominent examples are the different extensions to sphinx or flask.

  • briefly mentioned in the Hello world section

  • details in own dedicated packaging section

2.3.2.3. Organisation within a file

e.g.:

  • module docstring at the very top

  • import statements at the top

    • sort them in the correct order (standard library first, third-party packages second, own modules third; alphabetically within each block)

    • use empty lines between the three blocks of import statements (if applicable)

  • from simple to complex, from abstract to concrete

    • true both for classes and functions and for the methods within classes

    • sometimes referred to as “newspaper style”

  • group similar classes/methods/functions together

    • typical examples: getter/setter; input/output
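The points above can be sketched as a skeleton of a (hypothetical) module. The functions greet_all and format_greeting are invented for illustration, and the third-party import is only shown as a comment so that the sketch runs with the standard library alone.

```python
"""Tools for greeting people (hypothetical example module)."""

# Block 1: standard library imports, sorted alphabetically.
import os
import sys

# Block 2: third-party packages would go here, e.g.:
# import numpy as np

# Block 3: imports of your own modules would go here.


def greet_all(names):
    """High-level function first ("newspaper style")."""
    return [format_greeting(name) for name in names]


def format_greeting(name):
    """Lower-level detail that greet_all() relies on."""
    return "Hello " + name + "!"


if __name__ == "__main__":
    # Greet the names given on the command line, one per line.
    print(os.linesep.join(greet_all(sys.argv[1:])))
```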

2.3.3. Data structures: grouping information together

One key aspect of both programming and much more so of software development is to create data structures that are useful representations of the real-world objects and scenarios we work with. Essentially, data structures are a way to organise the relevant information and to make it intellectually manageable – as any relevant problem is usually too complex to have all details in mind at once. As such, data structures are abstractions, and choosing sensible abstractions almost always depends on the local context. Nevertheless, the techniques used to come up with these abstractions are pretty generic.

Todo

Reasons for using data structures – is there a nice example to show here?

A function with very many (positional) arguments – that nobody can remember and whose sequence is not obvious from the function name: use a dict instead (only reason not to use a dict but keyword arguments: optional arguments)
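One possible illustration of this idea, as a sketch only: the functions describe_positional and describe as well as all contact details are invented for this example.

```python
def describe_positional(name, street, city, postcode, country, phone):
    # Six positional arguments: the caller has to remember their order.
    return f"{name}, {city} ({country})"


def describe(contact):
    # One dictionary: values are accessed by meaningful keys.
    return f"{contact['name']}, {contact['city']} ({contact['country']})"


contact = {
    "name": "John",
    "street": "Main Street 1",
    "city": "Springfield",
    "postcode": "12345",
    "country": "Neverland",
    "phone": 4221,
}

print(describe(contact))  # John, Springfield (Neverland)
```

With the positional variant, swapping city and country in a call goes unnoticed until the output looks wrong; with the dictionary, each value is tied to its key.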

2.3.3.1. Dictionaries: simple data structures

different names in different programming languages: struct (MATLAB), hash (Perl, Ruby), dict (Python); generic name: associative array

key aspects of dictionaries:

  • key-value pairs

  • keys provide sensible means (for humans) to access information (hence: choose the names of your keys carefully)

    • allows adding a semantic level to code

  • can be (deeply) nested

  • key lookup is a cheap operation
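A short sketch of how well-chosen keys and nesting work together; the structure and all values of contacts are invented for illustration.

```python
contacts = {
    "John": {"phone": 4221, "office": {"building": "A", "room": 12}},
    "Jack": {"phone": 2142, "office": {"building": "B", "room": 3}},
}

# Chained keys read almost like a sentence.
print(contacts["John"]["office"]["room"])  # 12
```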

Creating a dictionary in Python:

phone_numbers = {'John': 4221, 'Jack': 2142}

Creating an empty dictionary in Python:

empty_dict = dict()
empty_dict = {}

Things to show:

  • assigning key–value pairs

  • accessing the value using a key

  • getting all keys and all values

  • deleting a key from a dict

  • checking whether a key is (not) present in a dict
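The operations listed above might look as follows, continuing the phone_numbers example. The additional names Jane and Jim are made up; the order of keys shown in the comments relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
phone_numbers = {"John": 4221, "Jack": 2142}

# Assigning a key-value pair (adds a new key or overwrites an existing one)
phone_numbers["Jane"] = 1234

# Accessing the value using a key
print(phone_numbers["John"])         # 4221
# get() returns None (or a given default) instead of raising KeyError
print(phone_numbers.get("Jim"))      # None

# Getting all keys and all values
print(list(phone_numbers.keys()))    # ['John', 'Jack', 'Jane']
print(list(phone_numbers.values()))  # [4221, 2142, 1234]

# Deleting a key from the dict
del phone_numbers["Jack"]

# Checking whether a key is (not) present
print("Jack" in phone_numbers)       # False
print("Jim" not in phone_numbers)    # True
```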

Looping through dictionaries:

phone_numbers = {'John': 4221, 'Jack': 2142}
for name, number in phone_numbers.items():
    print(name, number)

2.3.3.2. Classes: properties and behaviour

  • difference to dictionaries: properties and behaviour

  • reasons (and situations where) to prefer classes over dicts and vice versa

    • classes: when you need to have properties and behaviour as a unit, as you operate on the properties and need/want to ensure some consistency

    • dictionaries: when it suffices to have only (structured) properties

    • if in doubt: start with a dictionary
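A minimal sketch of the trade-off, assuming a hypothetical Contact class whose set_phone method enforces a simple consistency rule; neither the class nor the rule comes from the course material.

```python
# A dictionary suffices as long as we only store structured properties.
contact = {"name": "John", "phone": 4221}


class Contact:
    """Bundles the same properties with behaviour that keeps them consistent."""

    def __init__(self, name, phone):
        self.name = name
        self.phone = None
        self.set_phone(phone)

    def set_phone(self, phone):
        # Behaviour operating on a property: reject implausible values.
        if not isinstance(phone, int) or phone < 0:
            raise ValueError("phone must be a non-negative integer")
        self.phone = phone


john = Contact("John", 4221)
john.set_phone(2142)
print(john.phone)  # 2142

# Nothing stops a plain dictionary from holding an inconsistent value.
contact["phone"] = "not a number"
```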

For a more thorough introduction to the topic, see the chapter Object-oriented programming (OOP): a primer.