As a preview of what's to come, here is how you would check whether a number is in a range in most other languages (written here in Python syntax):
```python
if 3 <= num and num < 10:
    pass
```
And here's how you do it in Python:
```python
if 3 <= num < 10:
    pass
```
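Chained comparisons work because Python treats `a <= b < c` as `a <= b and b < c`, evaluating the middle operand only once. A small sketch; the function name `in_range` and its default bounds are our own:

```python
def in_range(num, low=3, high=10):
    # Equivalent to: low <= num and num < high
    return low <= num < high

print(in_range(3))   # True: the lower bound is inclusive
print(in_range(10))  # False: the upper bound is exclusive
print(in_range(7))   # True
```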
Tuples are immutable lists. Once they are created, they can't be changed. They are used everywhere in Python.
```pycon
>>> exam = ('Final Exam', 91)
>>> exam[0]
'Final Exam'
>>> exam[1]
91
# Try to change our grade
>>> exam[1] = 100
TypeError: 'tuple' object does not support item assignment
# A tuple with one element
>>> singleton = ('one element',)
```
Tuples are a convenient way to return multiple values from a function.
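```python
from statistics import mean, stdev

def compute_statistics(nums):
    # Returns a 3-tuple: (average, maximum, sample standard deviation)
    return mean(nums), max(nums), stdev(nums)
```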
Tuples can also be unpacked into separate variables.
```pycon
>>> name, grade = exam
>>> name
'Final Exam'
>>> grade
91
```
You can also unpack multiple layers of tuples (or any iterable).
```pycon
>>> player = ('Shaquille', 'O\'Neal', (7, 1))
>>> first_name, last_name, (feet, inches) = player
>>> first_name
'Shaquille'
>>> last_name
"O'Neal"
>>> feet
7
>>> inches
1
```
Tuples are a convenient way to quickly package related values together.
However, indexing them by position (e.g. exam[0]) can be confusing. Instead, we
can name a tuple's elements by using namedtuple.
```python
from collections import namedtuple

Height = namedtuple('Height', 'feet inches')
Player = namedtuple('Player', 'first_name last_name height')
```
```pycon
>>> player = Player('Shaquille', 'O\'Neal', Height(7, 1))
>>> player.first_name
'Shaquille'
>>> player.height.feet
7
# namedtuples behave like regular tuples (so they can be unpacked)
>>> first_name, last_name, height = player
>>> last_name
"O'Neal"
>>> height.inches
1
```
Python strings come with many commonly used manipulation methods.
```pycon
>>> names = 'Foo, Bar, Baz'
>>> names.split(', ')
['Foo', 'Bar', 'Baz']
>>> ' and '.join(names.split(', '))
'Foo and Bar and Baz'
>>> time_line = 'Time: 23.4'
>>> time_line.startswith('Time: ')
True
# Combining with unpacking provides a powerful way to parse strings
>>> _, seconds = time_line.split(': ')
>>> seconds = float(seconds)
>>> seconds
23.4
```
Python 3.6 introduced f-strings, a convenient way to interpolate variables into strings.
```pycon
>>> name = 'Forrest Gump'
>>> birth_year = 1944
>>> skills = ['ping pong', 'running']
>>> f'{name} was born in {birth_year} and is good at: {skills}'
"Forrest Gump was born in 1944 and is good at: ['ping pong', 'running']"
```
Arguments can be given default values.
```python
def discount_price(price, discount=0.5):
    return price * discount
```
```pycon
>>> discount_price(10)
5.0
>>> discount_price(10, 0.75)
7.5
```
Just be careful! Default arguments are evaluated only once (when the function is defined), not on every call, so a mutable default such as a list is shared between calls.
```python
def record_discount(discount, previous_discounts=[]):
    previous_discounts.append(discount)
    return previous_discounts
```
```pycon
>>> record_discount(0.5)
[0.5]
>>> record_discount(0.75)
[0.5, 0.75]
```
The better approach uses None instead:
```python
def record_discount(discount, previous_discounts=None):
    if previous_discounts is None:
        previous_discounts = []
    previous_discounts.append(discount)
    return previous_discounts
```
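To see both behaviors side by side, here is a sketch that keeps the original version under a new name (`record_discount_buggy`, our own label). Functions expose their default values through the `__defaults__` attribute, which makes the shared list visible:

```python
def record_discount_buggy(discount, previous_discounts=[]):
    # The [] default is created once, when the def statement runs
    previous_discounts.append(discount)
    return previous_discounts

def record_discount(discount, previous_discounts=None):
    # A fresh list is created on every call that doesn't supply one
    if previous_discounts is None:
        previous_discounts = []
    previous_discounts.append(discount)
    return previous_discounts

record_discount_buggy(0.5)
print(record_discount_buggy(0.75))          # [0.5, 0.75]: state leaked between calls
print(record_discount_buggy.__defaults__)   # ([0.5, 0.75],)

record_discount(0.5)
print(record_discount(0.75))                # [0.75]
```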
Arguments can and should be passed by name to reduce ambiguity. Compare the following:
```python
# What do these parameters mean?
search_tweets('@SwiftOnSecurity', 20, True, False)

# Much clearer...
search_tweets('@SwiftOnSecurity', retweets=False, popular=True, limit=20)
```
```python
def search_tweets(query, limit, popular, retweets=True):
    pass
```
Using names makes your code clearer and also means you don't have to remember parameter order!
You can even require that arguments be passed by name by using *.
```python
def search_tweets(query, *, limit, popular, retweets=True):
    pass

search_tweets('@SwiftOnSecurity', 20, True, False)
```

```text
TypeError: search_tweets() takes 1 positional argument but 4 were given
```
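A runnable sketch of both calls. The body of `search_tweets` is a hypothetical stand-in that just echoes its arguments, since the real search isn't shown:

```python
def search_tweets(query, *, limit, popular, retweets=True):
    # Hypothetical stand-in: just echo the arguments back
    return query, limit, popular, retweets

# Everything after * must be passed by name, but in any order
print(search_tweets('@SwiftOnSecurity', popular=True, limit=20))
# ('@SwiftOnSecurity', 20, True, True)

# Passing them positionally raises TypeError
try:
    search_tweets('@SwiftOnSecurity', 20, True, False)
except TypeError as error:
    print(error)
```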
sum Function

Let's use our newfound knowledge of arguments to make a function that sums its arguments.
```python
sum(1, 2, 3)  # 6
sum(0)        # 0
```
```python
def sum(start, *nums):
    total = start
    for num in nums:
        total += num
    return total
```
This new * syntax allows us to collect any extra arguments passed to the
function. We can also use it to expand iterables as if they were passed as
individual arguments.
```python
nums = [1, 2, 3]
sum(*nums)
# is the same as...
sum(1, 2, 3)
```
This syntax even works for tuple unpacking!
```pycon
>>> first, *rest = [1, 2, 3]
>>> first
1
>>> rest
[2, 3]
```
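Starred unpacking is a little more flexible than the `first, *rest` form suggests. A short sketch (the sample lists are arbitrary):

```python
# A starred target can appear anywhere in the pattern (at most once)
first, *middle, last = [1, 2, 3, 4, 5]
print(first, middle, last)  # 1 [2, 3, 4] 5

# The starred name always receives a list, possibly an empty one
head, *tail = ['only']
print(head, tail)  # only []

# * also expands an iterable into separate positional arguments
print(*middle)  # 2 3 4
```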
Our sum also shadows Python's built-in sum function, which is more
powerful: it takes a single iterable plus an optional starting value.
```pycon
>>> sum([1, 2, 3])
6
# The second parameter is the starting value to add values to
>>> sum([[1, 2, 3], [4, 5, 6]], [])
[1, 2, 3, 4, 5, 6]
```
Python provides many more functional tools:
```pycon
>>> min([10, 2, 22, 15])
2
>>> max([10, 2, 22, 15])
22
>>> sorted([10, 2, 22, 15])
[2, 10, 15, 22]
>>> sorted([10, 2, 22, 15], reverse=True)
[22, 15, 10, 2]
```
All of these functional tools allow you to specify a key which tells the
function what value to compare.
```pycon
>>> from collections import namedtuple
>>> Graded = namedtuple('Graded', 'name grade')
>>> grades = [Graded('Midterm', 80), Graded('Final', 90), Graded('Project', 0)]
>>> min(grades, key=lambda x: x.grade)
Graded(name='Project', grade=0)
>>> max(grades, key=lambda x: x.grade)
Graded(name='Final', grade=90)
>>> sorted(grades, key=lambda x: x.grade, reverse=True)
[Graded(name='Final', grade=90), Graded(name='Midterm', grade=80), Graded(name='Project', grade=0)]
```
Lambdas are a convenient way of expressing one-line functions. The lambda above
is equivalent to:
```python
def get_grade(x):
    return x.grade
```
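A quick check that a named function can be passed as `key=` exactly like the lambda. The data mirrors the REPL session above:

```python
from collections import namedtuple

Graded = namedtuple('Graded', 'name grade')
grades = [Graded('Midterm', 80), Graded('Final', 90), Graded('Project', 0)]

def get_grade(x):
    return x.grade

# Both key functions extract the same value, so the results match
by_lambda = sorted(grades, key=lambda x: x.grade, reverse=True)
by_function = sorted(grades, key=get_grade, reverse=True)
print(by_lambda == by_function)           # True
print(min(grades, key=get_grade).name)    # Project
```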
If you were to port iteration over a list from another language to Python, you may arrive at:
```python
for i in range(len(nums)):
    print(nums[i])
```
But we don't like explicit indices, because they are error-prone. Instead, we can just iterate over the list itself:
```python
for num in nums:
    print(num)
```
If we need the index, we use enumerate:
```python
for i, num in enumerate(nums):
    print(i, num)
```
What if we want to consider values from two lists at once? You may be inclined to write:
```python
for i, name in enumerate(students):
    print(f'{name} got a {grades[i]} in the class')
```
Instead, we use zip to pair values from iterables together:
```python
for name, grade in zip(students, grades):
    print(f'{name} got a {grade} in the class')
```
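Two details worth knowing, shown with hypothetical data: zip stops at the shortest input, and enumerate accepts a `start` argument.

```python
students = ['Alice', 'Bob', 'Carol']  # hypothetical data
grades = ['A', 'B']                   # one grade is missing

# zip stops at the shortest iterable, so Carol is silently skipped
for name, grade in zip(students, grades):
    print(f'{name} got a {grade} in the class')

# enumerate can start counting from 1 instead of 0
for rank, name in enumerate(students, start=1):
    print(rank, name)
```

Since Python 3.10, `zip(students, grades, strict=True)` raises a ValueError instead of silently truncating when the lengths differ.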
Python is built around the idea of iterables and provides many standard library and language features for interacting with and creating them.
We've already seen that lists and dicts are iterables; dicts also provide
other ways of iterating (.items() and .values()). Most collections in
Python are iterables. Even files are iterables!
We know how to iterate through collections, but let's see how we can build our own iterables.
Generators are a powerful tool for building efficient iterables. Consider
replicating Python's range():
```python
def our_range(stop):
    num = 0
    nums = []
    while num < stop:
        print(f'adding {num}')
        nums.append(num)
        num += 1
    return nums
```
What's the problem with this approach?
```python
for num in our_range(3):
    print(f'received {num}')
```
```text
adding 0
adding 1
adding 2
received 0
received 1
received 2
```
Generators use the yield keyword to pause their execution and hand a value
to the iterator's consumer; execution resumes where it left off when the next value is requested.
```python
def our_range(stop):
    num = 0
    while num < stop:
        print(f'yielding {num}')
        yield num
        num += 1
```
Now, the numbers are generated on the fly:
```python
for num in our_range(3):
    print(f'received {num}')
```
```text
yielding 0
received 0
yielding 1
received 1
yielding 2
received 2
```
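To see the pausing directly, you can drive a generator by hand with the built-in `next()`. A sketch (print statements removed from `our_range` for brevity):

```python
def our_range(stop):
    num = 0
    while num < stop:
        yield num  # pause here until the next value is requested
        num += 1

numbers = our_range(2)  # nothing runs yet; this only creates a generator
print(next(numbers))    # 0
print(next(numbers))    # 1
try:
    next(numbers)
except StopIteration:
    print('exhausted')  # a for loop handles StopIteration for you
```

Note that a generator can only be consumed once; after it is exhausted, iterating it again yields nothing.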
The same for ... in syntax can be embedded in expressions to tersely build
lists, dicts, and even generators (comprehensions and generator expressions).
```pycon
>>> [x * 2 for x in range(4)]
[0, 2, 4, 6]
>>> sum(x * 2 for x in range(4))
12
>>> {x: x * 2 for x in range(4)}
{0: 0, 1: 2, 2: 4, 3: 6}
```
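Comprehensions can also filter with a trailing `if` clause, and generator expressions compute values lazily. A short sketch with arbitrary ranges:

```python
# An optional if clause filters which items are included
evens_squared = [x * x for x in range(10) if x % 2 == 0]
print(evens_squared)  # [0, 4, 16, 36, 64]

# Dict comprehensions can filter too
squares_of_odds = {x: x * x for x in range(6) if x % 2}
print(squares_of_odds)  # {1: 1, 3: 9, 5: 25}

# A generator expression computes each value only when asked for it
lazy = (x * 2 for x in range(4))
print(next(lazy), next(lazy))  # 0 2
```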
In many ways, object orientation in Python looks much like it does in other languages you may be familiar with, such as Java. There are a few subtle differences, though.
```python
class Course:
    def __init__(self, number, name, instructor):
        self.number = number
        self.name = name
        self.instructor = instructor
        self._student_grades = {}

    def add_student_grade(self, student, grade):
        self._student_grades[student] = grade

    def curve(self, amount):
        for student, grade in self._student_grades.items():
            self.add_student_grade(student, grade + amount)
```
Notably, Python uses a lot of __x__ methods (we call them dunder methods, for
"double underscore"). Above, we see that the constructor is called __init__.
Additionally, all instance methods take an explicit self parameter, which is
similar to this in C++ or Java.
Python has no enforced visibility controls and instead prefers the convention of
prefixing private/protected attributes with an _.
```pycon
>>> csf = Course('601.229', 'CSF', 'Peter')
>>> csf.number
'601.229'
>>> csf.add_student_grade('Johnny Hopkins', 85)
```
Python supports traditional Java-like inheritance (as well as multiple inheritance, which we won't delve into).
```python
class PassFailCourse(Course):
    def add_student_grade(self, student, *, passed):
        super().add_student_grade(student, 100 if passed else 0)
```
We use super() to access methods from ancestor classes. Other methods (in
this example, __init__ and curve) are inherited as we would expect.
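Here is a self-contained run of the subclass (the course and student names are hypothetical). It also surfaces one consequence of dynamic dispatch: the inherited curve calls `self.add_student_grade(student, grade + amount)`, which now resolves to the override, so it raises a TypeError because `passed` must be given by keyword:

```python
class Course:
    def __init__(self, number, name, instructor):
        self.number = number
        self.name = name
        self.instructor = instructor
        self._student_grades = {}

    def add_student_grade(self, student, grade):
        self._student_grades[student] = grade

    def curve(self, amount):
        for student, grade in self._student_grades.items():
            self.add_student_grade(student, grade + amount)


class PassFailCourse(Course):
    def add_student_grade(self, student, *, passed):
        super().add_student_grade(student, 100 if passed else 0)


seminar = PassFailCourse('601.000', 'Seminar', 'Peter')  # hypothetical course
seminar.add_student_grade('Johnny Hopkins', passed=True)
seminar.add_student_grade('Jane Doe', passed=False)
print(seminar._student_grades)  # {'Johnny Hopkins': 100, 'Jane Doe': 0}

# The inherited curve() dispatches to PassFailCourse.add_student_grade
try:
    seminar.curve(5)
except TypeError as error:
    print('curve failed:', error)
```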
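Methods that operate on the class itself, rather than on an instance, are called class methods. We denote them with the @classmethod decorator, and they receive the class as their first argument (conventionally named cls).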
```python
class Course:
    # ...

    @classmethod
    def retrieve_from_db(cls, number):
        # Lookup in your database ...
        return cls(number, name, instructor)

csf = Course.retrieve_from_db('601.229')
```
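A runnable sketch, assuming a hypothetical in-memory dict (`_fake_db`) in place of a real database. Because the method builds the object with `cls(...)`, calling it on a subclass (the hypothetical `HonorsCourse`) returns an instance of that subclass:

```python
class Course:
    # Hypothetical in-memory stand-in for a real database
    _fake_db = {'601.229': ('CSF', 'Peter')}

    def __init__(self, number, name, instructor):
        self.number = number
        self.name = name
        self.instructor = instructor

    @classmethod
    def retrieve_from_db(cls, number):
        # cls is whichever class the method was called on
        name, instructor = cls._fake_db[number]
        return cls(number, name, instructor)


class HonorsCourse(Course):  # hypothetical subclass
    pass


csf = Course.retrieve_from_db('601.229')
print(csf.name, csf.instructor)  # CSF Peter
honors = HonorsCourse.retrieve_from_db('601.229')
print(type(honors).__name__)     # HonorsCourse
```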
Decorators are a powerful Python feature that lets you augment a function's behavior. They are functions that take another function and return a new function with modified behavior.
```python
def cache(func):
    func._cache = {}

    def wrapped(url):
        if url not in func._cache:
            func._cache[url] = func(url)
        return func._cache[url]
    return wrapped
```
We then can use cache as a decorator:
```python
@cache
def slow_web_request(url):
    print('requesting page')
    # ...
    return f'page: {url}'
```
This is equivalent to:
```python
slow_web_request = cache(slow_web_request)
```
```pycon
>>> slow_web_request('http://www.google.com')
requesting page
'page: http://www.google.com'
>>> slow_web_request('http://www.apple.com')
requesting page
'page: http://www.apple.com'
>>> slow_web_request('http://www.apple.com')
'page: http://www.apple.com'
```
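A more general version caches calls with any positional and keyword arguments, and uses functools.wraps so the wrapper keeps the original function's name and docstring:

```python
from functools import wraps

def cache(func):
    func._cache = {}

    @wraps(func)
    def wrapped(*args, **kwargs):
        # Build a hashable key from the positional and (sorted) keyword arguments
        frozen_args = (args, tuple(sorted(kwargs.items())))
        if frozen_args not in func._cache:
            func._cache[frozen_args] = func(*args, **kwargs)
        return func._cache[frozen_args]
    return wrapped
```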
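A small usage sketch of the general version. The `power` function and the `calls` list are our own, used to count how often the real function runs. Because the key records how arguments were passed, `power(3)` and `power(base=3)` are cached separately:

```python
from functools import wraps

def cache(func):
    func._cache = {}

    @wraps(func)
    def wrapped(*args, **kwargs):
        frozen_args = (args, tuple(sorted(kwargs.items())))
        if frozen_args not in func._cache:
            func._cache[frozen_args] = func(*args, **kwargs)
        return func._cache[frozen_args]
    return wrapped

calls = []  # records each real (uncached) call

@cache
def power(base, exponent=2):
    """Raise base to exponent (toy example)."""
    calls.append((base, exponent))
    return base ** exponent

print(power(3), power(3), power(base=3))  # 9 9 9
print(len(calls))                         # 2: the second power(3) hit the cache
print(power.__name__, power.__doc__)      # name and docstring kept by @wraps
```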
There are a ton of dunder methods you can define on your classes.
```python
class Person:
    def __init__(self, first_name, last_name, age):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age

    def __hash__(self):
        # __hash__ must return an int, so hash a tuple of the name fields
        return hash((self.first_name, self.last_name))

    def __eq__(self, other):
        if not isinstance(other, Person):
            return False
        return self.first_name == other.first_name \
            and self.last_name == other.last_name \
            and self.age == other.age

    def __lt__(self, other):
        return self.age < other.age

    def __str__(self):
        return f'{self.last_name}, {self.first_name} is {self.age} years old'
```
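Defining all of these dunder methods can be tedious, so Python 3.7 introduced dataclasses, which generate many of them (such as __init__, __repr__, and __eq__, and optionally __hash__ and ordering methods) from a much more terse class definition.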
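```python
from dataclasses import dataclass, field

# order=True generates __lt__ and friends, which compare the fields as a tuple
# in declaration order (so, unlike Person above, this sorts by name before age).
# unsafe_hash=True generates __hash__ from every field not marked hash=False.
@dataclass(order=True, unsafe_hash=True)
class Person:
    first_name: str
    last_name: str
    age: int = field(hash=False)  # still compared by __eq__, but not hashed
```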
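A sketch of what the generated methods do, using our own choice of decorator options (`order=True`, `unsafe_hash=True`) and hypothetical people:

```python
from dataclasses import dataclass, field

@dataclass(order=True, unsafe_hash=True)
class Person:
    first_name: str
    last_name: str
    age: int = field(hash=False)  # excluded from __hash__, still used by __eq__

alice = Person('Alice', 'Smith', 30)
print(alice)                                   # generated __repr__
print(alice == Person('Alice', 'Smith', 30))   # True: all fields match
print(alice == Person('Alice', 'Smith', 31))   # False: age differs
print(hash(alice) == hash(Person('Alice', 'Smith', 31)))  # True: age isn't hashed
print(sorted([Person('Bob', 'Jones', 20), alice])[0].first_name)  # Alice
```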
Enums

Enums allow you to replace magic constants with efficient, named placeholders.
```python
from enum import Enum, auto

class Color(Enum):
    RED = auto()
    BLUE = auto()
    PURPLE = auto()
```
```pycon
>>> Color.RED
<Color.RED: 1>
>>> Color.RED.name
'RED'
>>> Color.RED == Color.BLUE
False
>>> list(Color)
[<Color.RED: 1>, <Color.BLUE: 2>, <Color.PURPLE: 3>]
```
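Members can also be looked up by name (with square brackets) or by value (by calling the class), which is handy when converting from strings or stored integers. A sketch reusing the Color enum:

```python
from enum import Enum, auto

class Color(Enum):
    RED = auto()
    BLUE = auto()
    PURPLE = auto()

by_name = Color['BLUE']  # lookup by name
by_value = Color(3)      # lookup by value; auto() numbers from 1
print(by_name, by_value)

# Members are singletons, so identity comparison is safe
print(Color.RED is Color['RED'])  # True
```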
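Python prefers an "easier to ask for forgiveness than permission" (EAFP) approach to exceptional cases at runtime: attempt the operation, then handle the exception if it fails. This has the benefit of avoiding race conditions between checking a condition and acting on it.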
IndexError and KeyError are common when indexing into a sequence or
accessing a key in a dict-like data structure.
```pycon
>>> cities = ['Boston', 'New York City', 'Los Angeles']
>>> cities[4]
IndexError: list index out of range
>>> capitals = {'Maryland': 'Annapolis', 'Connecticut': 'Hartford'}
>>> capitals['Hawaii']
KeyError: 'Hawaii'
```
We handle exceptions in Python using a try/except (and maybe an
else/finally).
ValueError is commonly used when an argument is in an unexpected format.
```python
def get_birth_year():
    while True:
        try:
            birth_year = int(input('Enter birth year: '))
        except ValueError:
            # Not a valid integer; ask again
            pass
        else:
            if 1900 <= birth_year <= 2018:
                return birth_year
```
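To contrast the two styles, here is a sketch of a "look before you leap" (LBYL) lookup next to an EAFP one, reusing the capitals dict from above; the function names are our own. In concurrent code, the LBYL version can fail if another thread removes the key between the check and the lookup, while the EAFP version performs a single lookup:

```python
capitals = {'Maryland': 'Annapolis', 'Connecticut': 'Hartford'}

# Look before you leap (LBYL): check first, then act
def capital_lbyl(state):
    if state in capitals:
        return capitals[state]
    return None

# Easier to ask forgiveness than permission (EAFP): act, then handle failure
def capital_eafp(state):
    try:
        return capitals[state]
    except KeyError:
        return None

print(capital_eafp('Maryland'), capital_eafp('Hawaii'))  # Annapolis None
```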
The Python standard library has a plethora of useful data structures.
Many common tasks in Python can be written as one-liners by combining these data structures in clever ways.
sets

```pycon
>>> unique_nums = set([5, 8, 4, 5, 8])
>>> unique_nums
{8, 4, 5}
>>> unique_nums.add(6)
>>> 6 in unique_nums
True
>>> len(unique_nums)
4
>>> unique_nums - set([5, 4])
{8, 6}
>>> unique_nums | set([8, 4])
{4, 5, 6, 8}
```
Sets (like most other data structures) can even be created by providing an iterable (like a generator):
```pycon
>>> set(x * 2 for x in range(8))
{0, 2, 4, 6, 8, 10, 12, 14}
```
defaultdict

Behaves like a dict, but when a key is not found, it creates a default value (by calling the factory you pass in), stores it under that key, and returns it.
```python
from collections import defaultdict

names = ['Michael', 'Dwight', 'Jim', 'Pam', 'Meredith']
names_by_first_letter = defaultdict(list)
for name in names:
    names_by_first_letter[name[0]].append(name)
print(names_by_first_letter)
```
```text
defaultdict(<class 'list'>, {'M': ['Michael', 'Meredith'], 'D': ['Dwight'], 'J': ['Jim'], 'P': ['Pam']})
```
Without it, we'd have to write something like:
```python
names_by_first_letter = {}
for name in names:
    try:
        letter_names = names_by_first_letter[name[0]]
    except KeyError:
        letter_names = names_by_first_letter[name[0]] = []
    finally:
        letter_names.append(name)
```
defaultdict: Another defaultdict example

When implementing Dijkstra's algorithm, a defaultdict is a convenient way to
store the distances from the source node.
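```python
import sys
from collections import defaultdict

# Nodes we haven't reached yet are treated as (practically) infinitely far away
distance_from_source = defaultdict(lambda: sys.maxsize)
distance_from_source[source] = 0  # source is your starting node
```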
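A minimal sketch of the whole algorithm built around that defaultdict, assuming an adjacency-list graph of the form `{node: [(neighbor, weight), ...]}` with non-negative weights. The function name `dijkstra`, the graph format, and the sample graph are our own; the standard library's `heapq` serves as the priority queue:

```python
import heapq
import sys
from collections import defaultdict

def dijkstra(graph, source):
    """Shortest distances from source in a graph given as
    {node: [(neighbor, weight), ...]} with non-negative weights."""
    # Unreached nodes default to "infinity" (sys.maxsize) on first lookup
    distance_from_source = defaultdict(lambda: sys.maxsize)
    distance_from_source[source] = 0
    queue = [(0, source)]
    while queue:
        distance, node = heapq.heappop(queue)
        if distance > distance_from_source[node]:
            continue  # stale entry: a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = distance + weight
            if candidate < distance_from_source[neighbor]:
                distance_from_source[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dict(distance_from_source)

graph = {  # hypothetical example graph
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 5)],
    'C': [('D', 1)],
}
print(dijkstra(graph, 'A'))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```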
defaultdict: When not to use defaultdict

If we just want to return a default value (and not also store that default
value), we can use a regular dict's get method:
```pycon
>>> nicknames = {'Robert': 'Bob', 'Kate': 'Katherine'}
>>> nicknames.get('Brenda', 'Brenda')
'Brenda'
>>> nicknames
{'Robert': 'Bob', 'Kate': 'Katherine'}
```
Counter

To keep track of counts of just about anything (for example, the items of any
iterable), use Counter.
```pycon
>>> from collections import Counter
>>> spellings = Counter(['color', 'colour', 'color', 'color', 'colour'])
>>> spellings.most_common()
[('color', 3), ('colour', 2)]
>>> spellings.update(['color', 'color'])
>>> spellings.most_common(1)
[('color', 5)]
```
deque

list has (amortized) constant-time append and pop from the end. If you need constant-time
operations at both ends (a queue, instead of a stack), use a deque.
```pycon
>>> from collections import deque
>>> checkout = deque(['oranges', 'blueberries'])
>>> checkout.append('strawberries')
>>> checkout
deque(['oranges', 'blueberries', 'strawberries'])
>>> checkout.appendleft('eggs')
>>> checkout
deque(['eggs', 'oranges', 'blueberries', 'strawberries'])
>>> checkout.popleft()
'eggs'
```
File handling (and resource management) in Python is a breeze thanks to
context managers. These provide automatic cleanup of resources by using a
with statement.
```python
# By default, files are opened for reading
with open('essay.txt') as f:
    essay = f.read()
```
This is almost equivalent to the following, but with one important distinction: exception handling!
```python
f = open('essay.txt')
essay = f.read()
# What if an exception happens here?
f.close()
```
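Roughly speaking, the with statement gives the same guarantee as wrapping the work in try/finally. A self-contained sketch that first writes a throwaway file (the file contents are our own):

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'essay.txt')
with open(path, 'w') as f:
    f.write('An essay.')

# Roughly what `with open(path) as f: essay = f.read()` guarantees
f = open(path)
try:
    essay = f.read()
finally:
    f.close()  # runs even if f.read() raises

print(essay, f.closed)  # An essay. True
```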
Writing works the same way; just pass a mode to open():

```python
# By default, files are opened for reading
with open('essay.txt') as f:
    essay = f.read()

# translate() is a placeholder for your own function
translated = translate(essay, from_language='english', to_language='french')

# 'w' truncates the file, then opens it for writing
with open('essay_translated.txt', 'w') as f:
    print(translated, file=f)
```
Reading a file line-by-line is also straightforward.
```python
# Each line in urls.txt contains a url
with open('urls.txt') as f:
    for url in f:
        # Each line keeps its trailing newline, so strip it off
        download_file(url.strip())
```
Python makes reading CSVs super simple.
```python
import csv

# The csv module expects files opened with newline=''
with open('games.csv', newline='') as f:
    for home_team, away_team, score in csv.reader(f):
        print(f'{away_team} @ {home_team}: {score}')
```
And handling compressed data just requires a slight modification.
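```python
import csv
import gzip

# 'rt' opens the compressed file in text mode; the default 'rb' would
# yield bytes, which csv.reader can't parse
with gzip.open('games.csv.gz', 'rt', newline='') as f:
    for home_team, away_team, score in csv.reader(f):
        print(f'{away_team} @ {home_team}: {score}')
```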
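A self-contained round trip, assuming a temporary file and hypothetical game rows: write a gzipped CSV in text mode (`'wt'`), then read it back with `'rt'`:

```python
import csv
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'games.csv.gz')

# 'wt' compresses text on the way out, so csv.writer can write to it
with gzip.open(path, 'wt', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Boston', 'New York', '3-2'])  # hypothetical rows
    writer.writerow(['Chicago', 'Detroit', '1-4'])

# 'rt' decompresses and decodes on the fly
with gzip.open(path, 'rt', newline='') as f:
    games = list(csv.reader(f))

for home_team, away_team, score in games:
    print(f'{away_team} @ {home_team}: {score}')
```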
Python's pickle module provides serialization that works out of the box for most standard library classes and almost every class you'll create:
```python
import pickle

with open('dump.pkl', 'wb') as f:
    pickle.dump(my_object, f)  # the object comes first, then the file

# Then later...
with open('dump.pkl', 'rb') as f:
    my_object = pickle.load(f)
```
JSON serialization can be done in an almost identical way.
```python
import json

with open('dump.json', 'w') as f:
    json.dump(my_dict, f)  # the object comes first, then the file

# Then later...
with open('dump.json', 'r') as f:
    my_dict = json.load(f)
```
JSON can be dumped/loaded to a string with json.dumps() and json.loads().
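A sketch of the string-based round trip (the sample data is our own). One thing to watch: JSON has no tuple type, so tuples come back as lists:

```python
import json

my_dict = {'name': 'Final Exam', 'grade': 91, 'tags': ['final', 'cs']}

text = json.dumps(my_dict)   # serialize to a str
print(text)  # {"name": "Final Exam", "grade": 91, "tags": ["final", "cs"]}

restored = json.loads(text)  # parse the str back into Python objects
print(restored == my_dict)   # True

# Tuples have no JSON equivalent, so they come back as lists
print(json.loads(json.dumps((7, 1))))  # [7, 1]
```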
Python 3.4 added pathlib, a new API for handling file system paths in a cross-platform manner.
```python
from pathlib import Path

# Finds all files of the form logs/*.log
log_files = (Path.cwd() / 'logs').glob('*.log')
# If files have form 20181112.log, builds a list of the timestamps
log_timestamps = [p.stem for p in log_files]

data_dir = Path.cwd() / 'dir'
# We can even open files given a path
train_file = (data_dir / 'train').open()
dev_file = (data_dir / 'dev').open()
test_file = (data_dir / 'test').open()
```
If you have some computationally expensive process run on an iterable, Python's standard library makes it easy to distribute it across your computer's cores.
```python
for x in inputs:
    print(long_computation(x))
```
For an effortless speedup:
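```python
from multiprocessing import Pool

# The guard stops worker processes from re-running this code on platforms
# that start workers by importing the main module (e.g. Windows, macOS)
if __name__ == '__main__':
    with Pool() as pool:
        for x in pool.map(long_computation, inputs):
            print(x)
```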
Python is often used for building command line utilities, so parsing arguments is a common task. Fortunately, the standard library provides an incredibly comprehensive library for doing most of the heavy lifting.
```python
from argparse import ArgumentParser, FileType

parser = ArgumentParser(description='processes some input files')
parser.add_argument('files', type=FileType('r'), nargs='+', help='input files')
# Assumes Mode is an Enum with string values (e.g. CONCAT = 'concat'),
# so type=Mode converts the command-line string into a Mode member
parser.add_argument('--mode', type=Mode, choices=list(Mode), default=Mode.CONCAT)
parser.add_argument('--limit', type=int, default=10, help='limit the number of lines')
parser.add_argument('-v', dest='verbose', action='store_true')

# Parses arguments like:
#   foo.txt bar.txt -v --limit 100
args = parser.parse_args()

# A list of already opened files passed as CLI args
args.files
# The mode enum
args.mode
# The limit passed (or the default)
args.limit
```
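`parse_args()` also accepts an explicit list of strings, which makes it easy to try a parser out. A sketch under stated assumptions: `Mode` here is a hypothetical string-valued Enum (the original isn't shown), and plain file names replace `FileType('r')` so no files need to exist:

```python
from argparse import ArgumentParser
from enum import Enum

class Mode(Enum):  # hypothetical; the original Mode isn't shown
    CONCAT = 'concat'
    INTERLEAVE = 'interleave'

parser = ArgumentParser(description='processes some input files')
parser.add_argument('files', nargs='+', help='input file names')
parser.add_argument('--mode', type=Mode, choices=list(Mode), default=Mode.CONCAT)
parser.add_argument('--limit', type=int, default=10, help='limit the number of lines')
parser.add_argument('-v', dest='verbose', action='store_true')

args = parser.parse_args(
    ['foo.txt', 'bar.txt', '-v', '--limit', '100', '--mode', 'interleave'])
print(args.files, args.mode, args.limit, args.verbose)
```

Because `Mode('interleave')` raises ValueError for unknown values, argparse reports a bad `--mode` as a normal usage error rather than a traceback.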
Learning how to use argparse is a key Python skill.
Python's standard library also makes it really simple to test your code. You have no excuse not to!
```python
from unittest import main, TestCase

from my_package import Calculator


class TestCalculator(TestCase):
    def setUp(self):
        self.calculator = Calculator()

    def test_add(self):
        self.assertEqual(4, self.calculator.add(2, 2))

    def test_divide_by_zero(self):
        # Assumes Calculator.divide lets Python's built-in ZeroDivisionError propagate
        with self.assertRaises(ZeroDivisionError):
            self.calculator.divide(2, 0)


if __name__ == '__main__':
    main()
```
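A self-contained variant you can run directly. `Calculator` here is a hypothetical stand-in for `my_package.Calculator`, and the suite is run programmatically instead of through `main()`:

```python
import unittest

class Calculator:  # hypothetical stand-in for my_package.Calculator
    def add(self, a, b):
        return a + b

    def divide(self, a, b):
        return a / b  # lets ZeroDivisionError propagate


class TestCalculator(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, giving each one a fresh Calculator
        self.calculator = Calculator()

    def test_add(self):
        self.assertEqual(4, self.calculator.add(2, 2))

    def test_divide_by_zero(self):
        with self.assertRaises(ZeroDivisionError):
            self.calculator.divide(2, 0)


# Load and run the tests programmatically
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCalculator)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```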
We'll combine everything that we've learned to build a tool that uses
requests and bs4 to check if a person is still alive (according to
Wikipedia).
```shell
$ python3 -m pip install pipenv
$ pipenv install requests bs4
```