For Next Time
- Prepare for Architectural Review 2, submit framing document
Getting to “Good Code”
For today’s purposes, we’ll define “good code” to be correct and fast. There are other equally important considerations, such as good documentation and organization, but these are harder to measure quantitatively.
Is the code correct? If your code doesn’t do what it is supposed to, then nothing else matters. We’ve discussed and practiced strategies for ensuring this, such as unit testing. In situations when getting to “correct” proves difficult, it may be helpful to employ more advance debugging strategies.
Is the code fast enough? If the code is doing what it should, the next question is whether it runs fast enough. There are several tools we can use to probe execution performance. The answer to this question depends on context, but if the answer is “no” then we need to dig deeper.
Why not? For code with performance issues, it is important to understand precisely what makes it slow. For this, we often turn to profiling to pinpoint bottlenecks in execution so that they can be fixed.
“Python is slow”, “my CPU is slow”, “I need more RAM”, and similar answers are never acceptable without specific evidence. Faster hardware can sometimes help, but buying a shiny new machine is not the solution when a poor algorithm is the core of the problem (and much cheaper to fix).
For all of these questions, your best indispensable resources are 1) your brain, 2) a peer’s eyes on the problem, and 3) pencil and paper. Assuming you’ve exhausted those resources, the following sections cover some advanced Python tools that you can employ.
Is it correct? Interactive Debugging in Python
One of the methods of debugging we’ve used so far in this class is print statement debugging. Print statement debugging is fantastic, however, there are some shortcomings. Take a minute and at your table discuss some less-than-ideal aspects of print statement debugging.
Another form of debugging is to use a tool that allows interactive engagement with either a crashed or running program. The Python DeBugger (pdb) can be used in roughly these two modes: postmortem analysis and live execution. Today we will only be covering the usage in live execution mode (for a discussion of using pdb for postmortem analysis check out the pdb documentation under “analyzing a crashed program”).
There are a variety of approaches to using PDB in this capacity. One method is to add the
pdb.set_trace command to your program to tell the debugger to pause execution at a particular location and enter interactive mode.
import pdb def factorial(n): """ Computes the factorial of the non-negative integer n """ return_val = 1 pdb.set_trace() for i in range(n): return_val *= i return return_val if __name__ == '__main__': print(factorial(5))
Let’s debug this one together.
Some important commands for pdb:
- c - continue until next set_trace or breakpoint
- n - next line in current function
- s - step until first opportunity to stop (either in current function or a called function)
- l - source code listing
- a - arguments of function
- d - down a level in the stack diagram
- u - up a level in the stack diagram
- p - print the value of an expression
- w - where are you in the stack
There are some good PDB tutorials available online that are worth looking at. Consider checking out Real Python’s tutorial Python Debugging With PDB.
Practice Problem 1
def cumulative_sum(L): """ returns a list where each element in the returned list is the cumulative sum of the elements up to the corresponding element in the original list. L: the original list returns: a new list where element i is equal to the sum of element 0 through i in the original list """ for i in range(len(L)): L[i] = L[i-1] + L[i] return L if __name__ == '__main__': print(cumulative_sum([1, 2, 3]))
Practice Problem 2
def get_primes(n): """ Returns a list of all prime integers in the range [1,n] """ return_val =  isPrime = True for i in range(1, n+1): for j in range(1, i): if i % j == 0: isPrime = False if isPrime: return_val.append(i) return return_val if __name__ == '__main__': print(get_primes(7))
Practice Problem 3
def get_doubles_then_triples(L): """ Returns a new list containing the original list with each element multiplied by 2 concatenated with the original list with each element multiplied by 3 """ doubles = get_multiple_of_list(L, 2) triples = get_multiple_of_list(L, 3) return doubles + triples def get_multiple_of_list(L, n): for i in range(len(L)): L[i] *= n return L if __name__ == '__main__': print(get_doubles_then_triples([1, 4, 8]))
Practice Problem 4
class Kangaroo(object): """a Kangaroo is a marsupial""" def __init__(self, contents=): """initialize the pouch contents; the default value is an empty list""" self.pouch_contents = contents def __str__(self): """return a string representaion of this Kangaroo and the contents of the pouch, with one item per line""" t = [ object.__str__(self) + ' with pouch contents:' ] for obj in self.pouch_contents: s = ' ' + object.__str__(obj) t.append(s) return '\n'.join(t) def put_in_pouch(self, item): """add a new item to the pouch contents""" self.pouch_contents.append(item) kanga = Kangaroo() roo = Kangaroo() kanga.put_in_pouch('wallet') kanga.put_in_pouch('car keys') kanga.put_in_pouch(roo) print(kanga) print(roo)
Fast enough? Tools for benchmarking execution time
Performance is always relative - 30 seconds is great for analyzing a huge research data set and terrible for processing a credit card sale. It is also machine-specific, though relative performance trends are often constant across machines.
There are several tools available for benchmarking program execution to try to scientifically answer the question of how fast your program runs.
1) You can time how long a Linux command takes using the time utility:
$ time python gene_finder.py real 0m13.334s user 0m12.493s sys 0m0.012s
2) To measure at a finer granularity, you can use Python time module to measure the time before and after a function call as we did in the Day 19 algorithmic growth exercises.
3) You may have noticed some variability in your results on the Day 19 exercises. To run many experimental trials of a small snippet of code, you can use the Python timeit module.
timeit takes a string representing the code you want to time and runs it repeatedly to get an average result.
timeit can be run from the command line (like the Linux time utility):
$ python -m timeit '"-".join([str(n) for n in range(100)])' 10000 loops, best of 3: 33.4 usec per loop
It can also be run from within a Python program, and there is an example at https://github.com/sd18spring/TimingProfiling in
Finally, timeit can be run within a Jupyter notebook cell using the
%timeit “magic” command.
timeit is best used for testing very small sections of code (e.g. a single line). For understanding larger programs, you should consider code profiling.
timeit to compare
reverse_complement_2 implementations from GeneFinder. Do the results match your analytical understanding?
Why is this slow? Tools for profiling
Once you’ve determined that your program is too slow for your requirements, it’s time to figure out precisely why.
For this, we can use the Python profile and cProfile modules. For our purposes these are equivalent, so we will use the faster cProfile module.
$ python -m cProfile gene_finder.py 35206053 function calls (35205065 primitive calls) in 18.516 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.000 0.000 <string>:1(<module>) 1 0.000 0.000 0.000 0.000 <string>:1(ArgInfo) 1 0.000 0.000 0.000 0.000 <string>:1(ArgSpec) ...
The profile listing that results shows lots of useful information about your program execution. For each function (and builtin function) in the program, cProfile lists:
- filename:lineno(function): the function being profiled
- ncalls: number of times it was called
- tottime: the total time spent in that function
- percall: tottime/ncalls
- cumtime: time spent in the function, including subfunction calls
You can sort by any of these columns by providing a -s [sort_order] flag (default is function name). You can also dump the profiling results to a data file, so that you can run the program once and study the results at your leisure. Full details are at the profile documentation.
I’ve started writing a palindromic phrase generator, but I’m struggling with performance issues. I’ve started by searching for “mirror pairs” - words that are valid both forward and backward, like “tuba” and “abut”. Unfortunately, I can’t search my entire word list in a reasonable amount of time - checking just the first 100 words takes about 15 seconds:
$ time python mirror_pairs.py ['aa', 'aba'] real 0m14.620s user 0m10.866s sys 0m0.042s
Download the starter code from https://github.com/sd18spring/TimingProfiling. Use cProfile to determine the bottleneck(s) in execution for
mirror_pairs.py and fix them.
You do not need to keep the architecture/functional organization of the original code if it is slowing things down.
Exercise Go back and profile your code for GeneFinder (and the other mini projects). Where does your program spend most of its time? Are there bottlenecks you can improve?
Preparing for Architectural Review 2
AR2 will have a similar format to AR1 with the major changes that it is longer (you will have 20 minutes to present) and you will be presenting to half of the teams in your studio. With this extra time comes extra opportunities for interactive engagement with your audience, so please keep that in mind when planning your review. It also means that the expectations for the audience are ratcheted up, since there are fewer people to offer feedback. As in AR1, if you’d like, you can solicit feedback electronically. If you decide to do this, e-mail a link to your form to your studio instructor at least 2 hours before class.
AR2 Groupings for AC304
Ben visits both rooms Group 1 (these teams will take turns presenting to each other in AC304):
- Eric, Flynn
- Walker, Jamie, Shyheim
- Luis, Anna
- Design coach: Matt
Group 2 (these teams will take turns presenting to each other in AC306):
- Mark, Kawin, Sid
- Aiden, Grace, Ricky
- Nathan, Sherrie, Amy
- Daniel, Raquel, Lydia
- Design coach: Mackenzie
AR2 Groupings for AC326
Amon visits both rooms Group 1 (these teams will take turns presenting to each other in AC313):
- Camile, Seba
- Andrew, Abby, Tommy, Emily
- Jordan, Hadleigh, Cusai
- Quinn, Julian, Alex
- Design coach: Sophia
Group 2 (these teams will take turns presenting to each other in AC312):
- Bryce, Maya
- Alli, Junwon, Corey
- Jonathan, Hyegi
- Design coach: Allen
AR2 Groupings for AC328
Paul visits both rooms Group 1 (these teams will take turns presenting to each other in AC328):
- Visualization of Sentiment of News
- CV Puppets
- Neural Networks for Image Creation
- ASL Recognizer
- Design coach: Nina
Group 2 (these teams will take turns presenting to each other in AC326):
- Accent Detection
- Smart Calendar
- Tron AI
- Design coach: Vicky