2.2. Language fundamentals¶
This section tries to present general concepts of programming languages, while using Python as a concrete example. The reader is not expected to have any prior programming experience nor detailed knowledge of the underlying concepts. Of course, only some very basic aspects of Python are presented here. For a different approach much more directed to the average programmer, the reader is referred to the official Python tutorial.
Programming languages are, as the name implies, languages much like human languages, with grammar and syntax, words, and something similar to sentences. However, there are a few important differences. Perhaps the most important: programming languages are not ambiguous, but formal languages. They follow strict rules, and not obeying these rules when programming leads at best to immediate errors, at worst to surprising results that go unnoticed for a long time.
Important
Computers cannot read our minds; they are mindless machines that simply do what we tell them to do. Therefore, what matters is not what we thought we told them to do, but what we actually told them to do.
What does that mean for programming? Well, every single character in a program counts, unless it is inside a comment (and even there, it can be crucial). It is a bit like maths: a single missing minus sign can make all the difference, and it can be pretty hard to hunt down.
While, as an absolute minimum, programs need to be understandable by a computer to actually do something, they should be readable by human beings as well. Therefore, a working program is not merely a program that successfully performs the task it was written for, but a program that does so and communicates how it does it to the reader of its source code.
Important
Programming is about communicating – and communicating with the reader of your source code, not the computer. Therefore, invest time and effort to make your code as obvious as possible. Code that doesn’t tell its reader what it does is worse than non-working code – it is actually dangerous, as it is a black box. We will discuss this topic later in more detail.
That said, let’s start with the language fundamentals – with a particular focus on Python. Of course, there is much more to Python, and those interested are referred to the official Python tutorial. However, we will go through the most important things here as well.
2.2.1. Syntax¶
Remember what we have said above: programming languages are formal languages, hence every character counts. One of the beauties of Python is its rather simple syntax, leading to readable code and making it easy to learn.
Let’s go back to our first piece of Python code, the line printing “Hello world!” on the command line:
print("Hello world!")
Here, you already see lots of syntax. We have the name of the function, “print”, brackets surrounding its argument, and the quotation marks denoting the argument as text (i.e., a string). Furthermore, Python does not require any sign marking the end of a statement, such as the semicolon in many other programming languages; the end of the line is sufficient.
2.2.1.1. Case sensitivity, spaces, line endings¶
Note that Python is case-sensitive. Hence, “Print” is a different name from “print”. If you have a standard Python installation (or a virtual environment without any additional packages installed), no function called “Print” with capital “P” exists, and trying to use it results in an error message:
>>> Print("Hello world!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'Print' is not defined
But what about spaces between the function name and the brackets, or between the brackets and the quotation marks? And how about (accidentally) adding a semicolon at the end? Best to try it out by yourself.
Exercise
Try adding spaces between the function name and the brackets or the brackets and the quotation marks in the line print("Hello world!") and see what happens. Furthermore, try adding a semicolon at the end of the line.
Note, however, that code the Python interpreter doesn’t complain about is not equal to good or readable code. The Python community is quite specific regarding the formatting of Python code and what counts as “pythonic”. Interested readers are referred to PEP 8 for a first impression. We will deal with those things later in more detail.
2.2.1.3. Line breaks and empty lines¶
The Python community imposes rather strict rules on source code formatting, as laid out in PEP 8. Basically, lines should be no longer than 79 characters (and docstrings and comments no longer than 72 characters). On the other hand, we would like our code to be as readable as possible and the names of variables and functions as expressive as possible. Taken together, this sometimes makes it necessary to break lines.
Usually, lines can be broken anywhere inside round, square, or curly brackets, with the following line(s) indented. If that is not possible, a backslash \ at the end of the line needs to be used. For long expressions joined with operators such as + or -, it is preferable to have line breaks before the operator, with each operator at the start of the new line. The interested reader will find a more detailed discussion in PEP 8.
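As a minimal sketch of both ways of breaking a line, consider the following. The variable names and values are hypothetical and chosen only to have something to add up.

```python
# Hypothetical values, for illustration only.
gross_wages = 3000
taxable_interest = 120
dividends = 80
ira_deduction = 200

# Implicit continuation inside round brackets;
# each operator starts a new line.
income = (gross_wages
          + taxable_interest
          + dividends
          - ira_deduction)

# Explicit continuation with a backslash, where no brackets are present.
total = gross_wages + taxable_interest \
    + dividends

print(income)  # 3000
print(total)   # 3200
```

The bracketed form is usually the more robust choice, as a backslash only works if nothing (not even a space) follows it on the line.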
2.2.1.4. Indentation¶
Indentation is not optional in Python, as the language relies on the indentation level rather than brackets for defining blocks. The usual indentation level is four spaces, according to PEP 8.
Indenting a single line on the Python command line shows how Python reacts to this unexpected indent. Note the two spaces between the prompt >>> and the print() function:
>>>   print("Hello world!");
  File "<stdin>", line 1
    print("Hello world!");
    ^
IndentationError: unexpected indent
This is a nice example of the way the Python interpreter tells the user about errors. Such an error message typically consists of three parts:
The line where the error occurred
The statement that led to the error
The actual error message
In our case, as we’ve used the Python command line, the line is always “line 1”. The error messages are usually quite telling. Here, “IndentationError: unexpected indent” should give the user a clear hint what went wrong.
Important
Always read the error messages of the Python interpreter carefully and try to understand what they mean and want to tell you. In case you have no clue, simply use the web search engine of your choice and see what others have to say.
Indentation will become much more important when we deal with loops, conditionals, functions, and the like later on.
2.2.2. Variables¶
As we have seen in our first example, we can make Python print “Hello world!” on the command line. Similarly, we can use Python as a pocket calculator:
>>> 2 + 4 - 3
3
The Python interpreter even knows about general operator precedence, i.e., the mathematically correct treatment of the following expression:
>>> 2 + 4 * 3
14
Of course, we can always use brackets, as we would do in Mathematics:
>>> (2 + 4) * 3
18
And generally, it is a good idea to use brackets explicitly to group expressions and not to rely on operator precedence, as precedence rules differ between programming languages and the reader of the code would otherwise need to rely on assumptions that may not hold. More on operators and operator precedence later.
So far, we have only used explicit strings (“Hello world!”) and numbers. However, programming usually aims at generalising a given problem and providing a solution for this general problem. This is similar to mathematics, where symbolic mathematics is key to providing solutions for a set of similar problems. To do so, in mathematics we introduce variables, usually something like x, and create functions depending on these variables, such as f(x).
We will do the same in programming, with one notable exception: While in mathematics, variables are usually symbols consisting of single characters, sometimes with added subscripts and superscripts for clarity, variables in the source code of programs should have concise and telling names.
Important
Source code should be as readable and explicit as possible. Therefore, concise and telling names of variables, functions, and the like are of outstanding importance. Finding good names requires a thorough understanding of the problem at hand and experience. Hence, the earlier you start caring, the faster you learn to find good names.
As a rule of thumb, source code should never contain explicit strings or numbers, but only variables that you operate on. Of course, at one point, the actual (initial) content needs to be assigned to the variables, but that should usually happen at one central point for your programs, or via direct user input.
As we have seen already, we can use strings or numbers in Python. And of course, we can assign either strings or numbers to a Python variable, like this:
>>> message = "Hello world!"
>>> print(message)
Hello world!
or like that:
>>> h = 6.62607004e-34
>>> pi = 3.14159265
>>> h_bar = h / (2 * pi)
>>> print(h_bar)
1.0545718013441366e-34
For the latter piece of code, we can again discuss a few things. First of all, we realise that the assignments themselves produce no output; only by using print() do we get the value assigned to the variable h_bar displayed. Furthermore, the naming of the variables works well in a physics context, where people are well aware of “h” being the usual symbol for Planck’s constant, and “h_bar” being the reduced Planck constant used quite often in quantum mechanics. While, depending on the context, “h” may or may not be a good name for a variable, “pi” is probably so well known that naming it differently would be more disturbing than helpful.
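Outside such a well-established context, short names quickly become cryptic. The following sketch contrasts terse and telling names for one and the same calculation; the room dimensions and all names are hypothetical and serve only as illustration.

```python
# Terse names force the reader to guess what is meant:
w = 4.0
l = 2.5
a = w * l

# Telling names make the same calculation self-explanatory:
room_width_in_metres = 4.0
room_length_in_metres = 2.5
room_area_in_square_metres = room_width_in_metres * room_length_in_metres

print(a)                           # 10.0
print(room_area_in_square_metres)  # 10.0
```

Both versions compute the same value; only the second one tells the reader what that value means.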
Actually, taking all the examples above together, we have already seen three different types of variables: strings, integer numbers, and numbers with a fractional part, usually called “floating point numbers” or “floats” for short in programming languages. This naturally leads to the concept of “types” for variables in programming languages.
2.2.3. Types¶
As we have seen already, two fundamental data types in most programming languages are strings and numbers, with numbers being already a superset of different types. Further data types present in many programming languages are lists and more complex data types such as dictionaries. Eventually, extending the programming language by creating own types is usually the task of object-oriented programming, where each class defines a type of its own. We will come back to that later.
Note
You could think of types as analogous to Platonic ideas. Types are prototypes of things defining certain characteristics.
But why are types such an important concept of programming languages? Variables of different types behave differently, but in a predefined way. A simple example: numbers are accessible to arithmetic manipulations, whereas the division of two strings is usually not defined. Hence, the type of a variable conveys a (sometimes even complicated) concept of what operations are possible and how to process the content of a given variable.
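A minimal sketch of this idea, using the built-in type() function to ask Python for the type of a value. The try/except construct used to catch the error has not been introduced yet and is only used here to keep the example runnable; the variable name division_is_defined is hypothetical.

```python
print(type("Hello world!"))  # <class 'str'>
print(type(42))              # <class 'int'>
print(type(42.0))            # <class 'float'>

# Division is defined for numbers ...
print(84 / 2)  # 42.0

# ... but not for strings: Python raises a TypeError.
try:
    "Hello" / "world"
    division_is_defined = True
except TypeError as error:
    division_is_defined = False
    print("TypeError:", error)

print(division_is_defined)  # False
```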
For now, we will only deal with strings, numeric data types, and lists. All other data types will be introduced later on when we need them.
2.2.3.1. Strings¶
Strings can be thought of as a list of individual characters, much like words and sentences. A notable difference in Python (and in Python 3 compared to Python 2 as well) with respect to many other programming languages: Strings in Python can contain arbitrary characters of all alphabets and languages, not only the very restricted set of 128 ASCII characters traditionally used. Why is this noteworthy? Because it makes handling of strings containing words in non-English languages (and hence characters not covered by the traditional ASCII character table) in Python very easy.
Note
For those more interested in the technical details of Python strings: Python strings are Unicode strings. Hence, every character contained in the Unicode character set can be used in Python strings. The user of Python strings need not care about the actual encoding (and hence the length of each individual character representation in bytes).
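The difference between characters and their representation in bytes can be made visible with a short sketch. The example word is arbitrary, and UTF-8 is assumed here merely as one common encoding.

```python
greeting = "Grüße"

# Python counts characters, not bytes.
print(len(greeting))  # 5

# Only when explicitly encoding the string do bytes come into play:
# in UTF-8, "ü" and "ß" take two bytes each.
print(len(greeting.encode("utf-8")))  # 7
```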
As we have seen already with our very first example in Python, strings are characterised by surrounding single or double quotes:
>>> "Hello world!" # This is a string
'Hello world!'
>>> 'Hello world!' # Python does not distinguish between single or double quotes
'Hello world!'
Sometimes you would like to join two or more strings together. This can be done using the + operator:
>>> "Hello " + "world!"
'Hello world!'
Or even more explicitly, to highlight that there is a space between the two words:
>>> "Hello" + " " + "world!"
'Hello world!'
While subtracting two strings is not an allowed operation, multiplication does work:
>>> 2 * "bye"
'byebye'
You may not need this feature too often, but it can be quite useful sometimes.
2.2.3.2. Numeric types: integer, float¶
As mentioned already, there are two quite different types of numbers in most programming languages: integers and what are called “floating point numbers” or “floats” for short. Why this distinction? Integers are represented exactly, while floats are (nearly) always limited in their precision. This has a mathematical background. A real number generally has infinitely many digits after the decimal separator; this is true for all irrational numbers and, in the binary representation computers use, even for many simple fractions such as 1/10. Hence, with the generally limited resources of a computer (both in terms of memory and computing power), those numbers can never be stored with infinite precision. Therefore, we need a way to represent approximations of those numbers in the memory of computers: floats.
Note
It was not until 1985 that an international standard – IEEE 754 (ANSI/IEEE Std 754-1985) – was agreed upon for representing floating point numbers in computers. Scientists should definitely read the article by Goldberg [Goldberg, 1991]. More details on the IEEE standard can be found on Wikipedia.
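The limited precision of floats is easy to observe in practice. The following sketch contrasts exact integer arithmetic with a classic floating point surprise; math.isclose() from the standard library is shown as one possible way to compare floats within a tolerance.

```python
import math

# Integers are exact, however large they get.
large_integer = 2 ** 100
print(large_integer)  # 1267650600228229401496703205376

# Floats are approximations: neither 0.1 nor 0.2 has an exact
# binary representation, and the tiny errors show up in the sum.
sum_of_floats = 0.1 + 0.2
print(sum_of_floats)         # 0.30000000000000004
print(sum_of_floats == 0.3)  # False

# Comparing within a tolerance instead of testing for exact equality.
print(math.isclose(sum_of_floats, 0.3))  # True
```

As a consequence, testing two floats for exact equality is rarely a good idea.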
But how to distinguish between integers and floats in Python? Integers are given as pure numbers, while each number containing a . (dot) as decimal separator is treated as a float. Look at the different answers from the Python interpreter:
>>> 42 # an integer
42
>>> 42. # a float
42.0
One special case that should be mentioned here: Dividing two integer numbers using / always results in a float in Python:
>>> 4 / 2
2.0
While the result would be perfectly representable by an integer, the actual result is a float. There are some good reasons for this behaviour of Python, but we need to be aware that here, an implicit type conversion from two integers to a float takes place.
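If an integer result is what you need, Python offers the floor division operator // as well as explicit type conversion using int() and float(). A short sketch:

```python
print(4 / 2)     # 2.0  (true division, always a float)
print(4 // 2)    # 2    (floor division of two integers stays an integer)
print(7 // 2)    # 3    (the fractional part is discarded)
print(int(2.0))  # 2    (explicit conversion from float to integer)
print(float(2))  # 2.0  (explicit conversion from integer to float)
```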
Of course, there are ways to represent complex numbers in Python as well. However, this is best done using NumPy and covered later.
2.2.3.3. Lists¶
The next fundamental data type is a list. A list in Python is very versatile, as its elements can be of different data types. Usually, however, lists will contain elements of the same data type. A list is defined by square brackets, with the individual elements separated by commas. The first six elements of the Fibonacci sequence in Python:
>>> [0, 1, 1, 2, 3, 5]
[0, 1, 1, 2, 3, 5]
Similarly, we can define lists of strings:
>>> ['Lorem', 'ipsum', 'dolor', 'sit', 'amet']
['Lorem', 'ipsum', 'dolor', 'sit', 'amet']
In both cases, the Python interpreter simply returns the list on the command line. To actually work with lists, it is therefore much more useful to assign the list as a value to a variable:
>>> fibonacci_numbers = [0, 1, 1, 2, 3, 5]
>>> lorem_ipsum = ['Lorem', 'ipsum', 'dolor', 'sit', 'amet']
This allows all sorts of manipulations of these lists. First of all, we can access elements of the list:
>>> lorem_ipsum[1]
'ipsum'
This may look odd at first, as one would expect the statement to return the first element of the list, not the second. However, Python, like most other programming languages, starts counting at zero. Similarly, we could ask Python to return a range of values (as a new list):
>>> lorem_ipsum[0:2]
['Lorem', 'ipsum']
Again, this seems odd. Didn’t we ask for more? In fact, we didn’t. The range includes the start and excludes the end. Another way to look at this is to place the indices not at the positions of the actual elements of a list but at the separators between them. This is best shown for a list of integers, such as the variable fibonacci_numbers we have defined above:
 +---+---+---+---+---+---+
 | 0 | 1 | 1 | 2 | 3 | 5 |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1
This way, we can immediately understand the following results of the Python interpreter:
>>> fibonacci_numbers[0:1]
[0]
>>> fibonacci_numbers[2:5]
[1, 2, 3]
We can even do more advanced things, such as getting all elements starting with a certain index:
>>> fibonacci_numbers[3:]
[2, 3, 5]
or get the second-to-last element of the list:
>>> fibonacci_numbers[-2]
3
or even get a range of the last three elements from the end:
>>> fibonacci_numbers[-3:]
[2, 3, 5]
To a certain extent, strings behave like lists. Hence, all the indexing shown above does work for strings as well.
Exercise
Assign the string “Python” to the variable word and try out indexing this variable. Use all the variants shown above, such as getting individual characters, ranges of characters, and indexing from both ends of the string.
Now that we have dealt with lists, the next natural step is to ask ourselves: How to perform the same operation on each individual element of a list? There must be a more elegant way than calling the operation separately for each list element. And by the way, isn’t programming all about automating (tedious) tasks?
2.2.4. Loops¶
Loops are one of the two major control structures, besides conditionals, which we will deal with later. As mentioned above, loops are an elegant way to perform one and the same action on every element of a list. This is the typical field of application of “for” loops. The other type of loop in Python is the “while” loop. We will look into both now.
2.2.4.1. For loops¶
Suppose you have a list and want to perform a certain action for each of the elements in this list. This is what for loops are excellent for:
>>> lorem_ipsum = ['Lorem', 'ipsum', 'dolor', 'sit', 'amet']
>>> for word in lorem_ipsum:
...     print(word)
...
Lorem
ipsum
dolor
sit
amet
Actually, these lines deserve a few comments. Generally, it is quite obvious that we created a list of strings and asked Python to print each of the elements. Note that the general phrasing for element in list is a very natural way to express this and part of the beauty of Python. The colon : tells Python that we are done with defining the head of our loop. Hence, if we hit “return”, the command line tells us that we need to continue with our input (the ... at the beginning of the line). Now it is very important to properly indent what is called the “body” of the loop. Indentation should always be done using spaces and should be four spaces wide.
In our simple case, the body of the loop consists only of one statement, i.e. the call to the print() function to print out the element.
To tell Python that we’re done with the loop, we need to add an empty line, at least in the interactive mode on the command line. In scripts, functions, and the like, we could simply continue with a statement that is indented the same way as the for starting the loop.
The result of this loop is again quite obvious. We get each element printed on its individual line, as would have happened had we called print() for each element of the list manually.
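In a script, where no empty line is needed, the end of the loop is marked solely by returning to the previous indentation level. A minimal sketch follows; the variable word_lengths as well as the use of len() and append(), neither of which has been introduced so far, are chosen only for illustration.

```python
lorem_ipsum = ['Lorem', 'ipsum', 'dolor', 'sit', 'amet']

word_lengths = []
for word in lorem_ipsum:
    # Body of the loop: indented by four spaces.
    word_lengths.append(len(word))

# Back at the indentation level of "for": the loop has ended,
# and this statement is executed only once.
print(word_lengths)  # [5, 5, 5, 3, 4]
```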
2.2.4.2. While loops¶
A slightly different situation where loops come in quite handy: Suppose you want to perform a task as long as a certain condition is met. This is the realm of while loops.
To give you an example, suppose we would want to compute Fibonacci numbers and print them on the command line. For those who don’t remember what Fibonacci numbers are: The first two numbers are 0 and 1, and each following element is just the sum of the two preceding ones. This leads naturally to a way to describe the rule for creating these numbers. For all Fibonacci numbers < 10 a possible solution would be:
>>> a, b = 0, 1
>>> while a < 10:
...     print(a)
...     a, b = b, a+b
...
0
1
1
2
3
5
8
What have we done here? First, we have defined the first two elements of the Fibonacci sequence. Next, we have entered a while loop that runs as long as the first number a fulfils the condition a < 10. The body of the while loop consists of two statements. First the variable a gets printed, and second the next Fibonacci number is calculated. This is actually an example of multiple assignment, which will be covered in more detail hereafter.
Of course there is more to loops, such as nesting and exiting given certain conditions. However, these topics will be postponed to later, when we will encounter situations where we need those things.
2.2.5. Assignments¶
Something we have used already in the examples above is assigning values to variables. Nevertheless, novices sometimes struggle with the meaning of the equal sign = as an assignment operator in most programming languages, in contrast to its meaning in mathematics. As we will discuss in more detail later on, testing for equality is done using two equal signs, a == b, while assigning a value to a variable is done using a single equal sign, a = 42.
2.2.5.1. Simple assignments¶
There is nothing special about assignments, as we have used them already. To assign the value “42” to the variable answer we would simply write:
answer = 42
In this simplest form, an assignment assigns the value to the right of the equal sign to the variable to the left of the equal sign.
Note that you can use the variable itself on both sides of the equal sign. Usually, to increment the value of a variable, you find code lines like this one:
k = k + 1
What feels odd at first and would be mathematically wrong is just a simple way of adding “1” to the original value of the variable k, implicitly assuming in this case that k contains a numeric value.
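As incrementing a variable is so common, Python provides a shorthand, the augmented assignment +=. The following sketch shows both forms; the starting value is arbitrary.

```python
k = 41

k = k + 1  # take the current value of k, add 1, store the result in k
print(k)   # 42

k += 1     # shorthand with the same effect
print(k)   # 43
```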
2.2.5.2. Multiple assignments¶
In addition to the simple assignments we have seen so far, Python (along with other programming languages) allows for more complex ways of assigning values to variables. To assign the same value to two variables, you may write:
a = b = 42
In this case, both variables, a and b, would contain the numeric value 42, as we can simply check by printing both variables:
>>> print(a)
42
>>> print(b)
42
Similarly, you can assign a list of values on the right side of the equal sign to a list of variables on the left side of the equal sign, provided both lists have the same length:
a, b = 42, 35
In this case, a has the numeric value 42, and b the numeric value 35. Again, we can convince ourselves by printing the value of both variables on the command line:
>>> print(a)
42
>>> print(b)
35
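One detail worth knowing: the right side of the equal sign is evaluated completely before anything gets assigned. A short sketch of what this makes possible, including the step used to compute Fibonacci numbers:

```python
a, b = 42, 35

# Two variables can be swapped without a temporary third one.
a, b = b, a
print(a, b)  # 35 42

# The same rule makes the Fibonacci step work:
# "a + b" is computed using the old value of a.
a, b = 0, 1
a, b = b, a + b
print(a, b)  # 1 1
```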
2.2.6. Operators¶
Another fundamental aspect of programming languages are operators. We have come across operators already, in terms of arithmetic operators such as “+” and “-”, as these are the most common ones. Basically, operators are shorthand notations for common operations, usually consisting of one or two characters.
2.2.6.1. Types of operators¶
Different types of operators can be distinguished, and the most prominent types are arithmetic, relational, and Boolean or logical operators.
Arithmetic operators are used mainly to do maths. A list of available arithmetic operators is given in Table 2.1. The only surprise or difference with respect to other programming languages may be the “power” operator. While some other languages use the caret ^, in Python it is the double star, **.
Operator | Meaning |
---|---|
+ | plus |
- | minus |
* | times |
/ | divide |
** | power |
Generally, arithmetic operators behave the same way in Python as we are used to from maths, i.e., multiplication and division first, then addition and subtraction. Nevertheless, see the comments below on Operator precedence.
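The arithmetic operators in action, as a brief sketch with arbitrarily chosen numbers. Note the last line: the caret is a valid operator in Python as well, but it has an entirely different meaning (bitwise exclusive or), hence using it by mistake results in a wrong number rather than an error.

```python
print(7 + 2)   # 9
print(7 - 2)   # 5
print(7 * 2)   # 14
print(7 / 2)   # 3.5
print(7 ** 2)  # 49

# Not the power operator in Python, but bitwise exclusive or:
print(7 ^ 2)   # 5
```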
Next are relational operators, mainly used to compare expressions. A list of available relational operators is given in Table 2.2. Again, not much of a surprise here, as we know most of them from mathematics. Only the notation of “not equal” deserves a comment, as this is again different for different programming languages. In Python, it is written using an exclamation mark followed by an equal sign, !=. To distinguish the relational operator asking for equality from the assignment =, two equal signs, ==, are used for the former.
Operator | Meaning |
---|---|
< | less than |
> | greater than |
<= | less than or equal |
>= | greater than or equal |
== | equal |
!= | not equal |
is | object identity |
is not | negated object identity |
The last two operators, is and is not, deserve a special comment. Generally, equality and identity are not the same. Equality means that two things have the same value, while identity asks for the things compared to be identical – i.e., to be one and the same object in memory. We will not go into detail here, but bear in mind that equality and identity are two fundamentally different concepts.
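The difference between equality and identity can be seen with two lists that have the same content. The variable names in this sketch are hypothetical.

```python
first_list = [0, 1, 1, 2]
second_list = [0, 1, 1, 2]  # same values, but a separate object
same_list = first_list      # a second name for the very same object

print(first_list == second_list)  # True: equal values
print(first_list is second_list)  # False: two distinct objects
print(first_list is same_list)    # True: one and the same object
print(first_list != second_list)  # False
```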
The last type of operators we want to deal with here are logical or Boolean operators. They are mainly used to combine conditions, something we will discuss in the next section. They take their name from the British mathematician George Boole, who developed the algebra named after him. The three operators important for us are given in Table 2.3.
Operator | Meaning |
---|---|
and | true if both are true |
or | true if at least one is true |
not | negation |
What are they used for? Conditionals are one of the very fundamental control structures of programs besides loops, and testing for rather complex conditions consisting of several parts joined by either and or or is quite frequent.
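A brief sketch of how Boolean operators combine relational expressions into more complex conditions. The variables and their values are hypothetical.

```python
temperature = 21    # hypothetical value, in degrees Celsius
is_raining = False  # hypothetical value

print(temperature > 15 and not is_raining)  # True
print(temperature > 25 or is_raining)       # False
print(not is_raining)                       # True
```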
2.2.6.2. Operator precedence¶
If we have several operators, in what order do they apply? Hence: What is their operator precedence? While in mathematics, there are strict rules (and additionally some conventions for typesetting in different disciplines), even there it is always a good idea to use brackets to group statements to make the operator precedence clear.
As stated already, Python follows basic operator precedence for arithmetic operators, as known from mathematics, in particular “multiplication and division first, then addition and subtraction”. However, as general advice, always make operator precedence explicit by grouping complex expressions with brackets. Different programming languages behave differently, and readability of your source code is king.
Important
Never rely on the (implicit) operator precedence of a programming language, but make operator precedence explicit by grouping expressions with brackets. This greatly facilitates reading your code, as precedence rules differ between programming languages. Furthermore, the less the reader has to think about those details, the more they can focus on the more important aspects of the code.
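A small example of how implicit precedence can surprise: in Python, the power operator binds more tightly than a leading minus sign. Brackets make the intent unambiguous in either case.

```python
print(-2 ** 2)    # -4: the power is evaluated before the minus sign
print((-2) ** 2)  # 4
print(-(2 ** 2))  # -4, with the intent stated explicitly

print(2 + 4 * 3)    # 14
print(2 + (4 * 3))  # 14, with the grouping spelled out
```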
2.2.7. Conditionals¶
The second fundamental control structure of most programming languages besides loops are conditionals. In the simplest case, they take the form “if A, do B”. Of course, they can become more complex, with an “else” clause, and the statement “A” can be a complex condition consisting of several statements joined with Boolean operators. Furthermore, a statement “A” often involves relational operators. As with loops, Python uses indentation to mark the body of a conditional, rather than brackets or the like. Therefore, the simplest form of a conditional in Python may look like this:
if some_condition:
    # do something
Often, we will want to explicitly deal with two situations, therefore including an “else” clause:
if some_condition:
    # do something
else:
    # do something else
Of course, conditionals can be nested. As every nesting leads to further indentation and therefore shortens the lines, you can use elif as shorthand notation for else if, like so:
if some_condition:
    # do something
elif some_other_condition:
    # do something else
else:
    # do something third
If you compare this to the nested version not using elif, you will immediately see why using elif is usually preferable:
if some_condition:
    # do something
else:
    if some_other_condition:
        # do something else
    else:
        # do something third
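The schematic examples above only outline the structure. A minimal runnable sketch of a conditional with all three branches could look as follows; the variable names and the value of number are hypothetical.

```python
number = 7  # hypothetical input value

if number < 0:
    description = "negative"
elif number == 0:
    description = "zero"
else:
    description = "positive"

print(description)  # positive
```

Only the body of the first branch whose condition is fulfilled gets executed; all remaining branches are skipped.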
With this we are at the end of our introduction of language fundamentals, and by now you should have seen the most important basic building blocks of programs. While focussing on Python, the principles laid out here are much more generally applicable.
Next we will focus on how to organise your code, as readability counts – both for you and those who will eventually inherit your code and would want to understand and potentially extend it.
2.2.1.2. Comments¶
What else is there in terms of syntax that is important for now? Comments. Often you will want to comment on a line of code, or simply prevent a line of code from being executed. Later on, you will be writing proper documentation for your functions, classes, and the like. All this should go straight into your source code.
There are two types of comments, inline comments and block comments. Both have different syntax. A simple inline comment starts with a # sign. Everything that follows this sign is (usually) ignored by the Python interpreter. A somewhat silly example of inline comments is shown in the following code excerpt:
What is wrong with these comments? Basically everything. The first comment doesn’t add anything useful to the code and should therefore be omitted, and the second one simply states the obvious, actually repeating the code in human words. While shown here as examples for inline comments, never ever do this in production code!
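A hypothetical excerpt showing these two kinds of unhelpful comments might look like the following. It is made up for illustration and not taken from actual code.

```python
# This is a comment.
message = "Hello world!"
print(message)  # print the message
```

Removing both comments would leave the code just as understandable as before.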
Note
Inline comments in production code are usually a warning sign. Either the programmer was lazy and didn’t remove lines no longer in use. Or the programmer was not able to express themselves within the code in sufficient clarity and felt the need to add an explaining comment. In any case, inline comments should be kept to an absolute minimum.
Inline comments can be quite useful for systematically debugging code. Furthermore, they can come in quite handy for structuring code into blocks before splitting it into different functions or else. However, inline comments should not appear in production code. Of course, this imposes a certain level of responsibility on the programmer.
Block comments usually start and end with """. While the ending """ should always be on a separate line, the block comment may or may not start straight after the opening """. For details of how to structure block comments used as docstrings, by far their most common use case, the interested reader is referred to PEP 257. To give you an example of what such a block comment could look like, the following is an excerpt of a real-world piece of code:
The actual task being performed by this effective one-line function is pretty simple: checking whether all elements of a list are equal. This is basically what the function name says. However, the way it does it is pretty sophisticated. Nevertheless, all that is important to a user is what the function does (that’s what its name says) and how to call it (that’s what the docstring provides the user with).
For those interested in more details of this particular docstring shown here: It is formatted according to the “NumPy style” and used in conjunction with Sphinx to automatically create a human-readable API documentation.
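To illustrate the general shape of such a docstring, here is a hypothetical sketch of a function of the kind described above, with a docstring following the NumPy style. Both the function name all_equal and its implementation are assumptions made for illustration, not the real-world code referred to in the text.

```python
def all_equal(elements):
    """
    Check whether all elements of a list are equal.

    Parameters
    ----------
    elements : list
        List whose elements are compared with each other.

    Returns
    -------
    bool
        True if all elements are equal (or the list is empty),
        False otherwise.

    """
    # Compare every element with the first one.
    return all(element == elements[0] for element in elements)


print(all_equal([1, 1, 1]))  # True
print(all_equal([1, 2, 1]))  # False
```

Calling help(all_equal) on the Python command line displays this docstring, which is what makes it useful to the user of the function.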