Transforming Code into Beautiful, Idiomatic Python

January 26, 2020 · 44 minutes
by Raymond Hettinger

You can be sure you’re getting the straight story or they will correct me right in the middle.

So, this was labeled a novice talk, an intermediate talk and an advanced talk all together. I’ve put some of each in. It’s got a lot of code in it. You don’t have to memorize it as it goes by. I’ll actually give you all my slides. They’re already uploaded. They’re meant to be something to take back to work and use immediately. Go through your whole code base, find everywhere I said, “Don’t do this, do this instead,” and swap it, and your code will be more beautiful, faster, more idiomatic, more Pythonic and less like C, Java, C++ and other languages. It’s not that those languages are bad; they just have different idioms than we have. I’m curious how many people in the audience have taken a class from me at some point. Wow. I’ve taught over 1000 engineers in the last year, and I’ve gained an appreciation for what things are really beautiful in Python and what works great for people. So, it looks like I’ve got a number of former students here. How many prospective students? None. Oh, okay. Ah, lots of them. Good enough! And as all of my students know, practically every example starts out with Raymond equals red, Rachel equals blue, Matthew equals yellow. In case you’re wondering who they are, they’re right at the back door: the lovely lady and the little boy are Rachel and Matthew. All right.

I was supposed to introduce myself: Raymond Hettinger, Python core developer. If you would like to have some fun during this, we can play a game: whenever I put up a construct and say it is awesome, clap if you think I wrote it. I get to talk about some of the things I created, but there are also some awesome things in here that I didn’t create. Your game is to decide which ones. So, clap if you think you saw one that is neat and is something I wrote.

The other interesting thing on the slide is the @raymondh. I use my Twitter account differently than other people. I try to teach Python through Twitter. I don’t tweet when I arrive at an airport or when I leave or anything like that. It is technical tweets, so I don’t waste your time. I teach Python 140 characters at a time, which is a very interesting challenge, because you can fit in one little example and, sure enough, someone will tweet back, “But in Python 3 it does...” or “In version 2.5...” It’s 140 characters. You don’t get to put in all the footnotes. Just saying.

So, without further ado, let’s start at the beginning with the novice part and work our way up. In pretty much every other language I’ve used, you use indices quite a bit to do array lookups. In Python, whenever you use an index, unless it’s a fairly exotic circumstance, you’re almost always doing it wrong. We’ve worked really hard to provide better ways so you don’t have to manipulate indices. I’m also going to show you some more advanced things. Very few of you probably know about the else clause on for loops; I’ll show you what it’s for. Very few of you probably know that the iter built-in function can take two arguments; I’ll show you what that’s for too. But overall, our goal is fast, clean, idiomatic Python.

How do you loop over a range of numbers? Simple enough: you make a list and loop over the list. What is important about that example? Python’s for is not the same as the for in other languages. If someone comes to my class and says, “I know what a for loop does because I know C,” that is not what this does. We probably should have named ours foreach. What it does is loop over collections, using the iterator protocol. It is in no way like the for that you grew up with. Is there a better way? Well, we can use the range function. The output of range is the list up above; in other words, these two do the same thing in exactly the same way. Many people come to Python, see the second one and say to themselves, “It’s the same as the for loop that I learned in C or Basic” or some other language. It’s not. What happens here is that range produces that list, and then the for loop loops over the list.
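A minimal sketch of the two loops described above, written for Python 3 (assumption: a modern interpreter; in Python 3, range returns a lazy range object rather than the list the talk describes for Python 2, but the for loop behaves the same way):

```python
# Loop over an explicit list: for iterates over the collection itself
squares_from_list = []
for i in [0, 1, 2, 3, 4, 5]:
    squares_from_list.append(i ** 2)

# Same loop driven by range(6); for still just iterates over what range yields
squares_from_range = []
for i in range(6):
    squares_from_range.append(i ** 2)

print(squares_from_list)   # [0, 1, 4, 9, 16, 25]
print(squares_from_range)  # [0, 1, 4, 9, 16, 25]
```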

I know what you’re thinking. If I do range(1000000), that list is going to be kind of big. In fact, on a 64-bit build it’ll consume 32 MB of memory. Is that awesome? No, not awesome. So, there must be a better way. It’s called xrange. xrange creates an iterator over the range, producing the values one at a time. Which is better, range or xrange? xrange. Which is uglier? xrange! xrange is a horrifically bad name. We didn’t know how bad until Python 3 came along; in Python 3, we got rid of the old range and renamed xrange to range. So, I started going through my programs where I used xrange everywhere, took off the x, and they were profoundly more beautiful. I didn’t know how ugly the x was until I got rid of it. You’ll like Python 3 better. It looks better. Remember, beautiful is better than ugly.
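To see the lazy behavior of Python 3’s range (the renamed xrange), a quick check with sys.getsizeof is enough. This is only a rough illustration: getsizeof reports the container itself, not the integer objects a list also holds, and exact byte counts vary by build.

```python
import sys

lazy = range(1_000_000)         # Python 3 range: computes values on demand
eager = list(range(1_000_000))  # materializes a million-element list

lazy_size = sys.getsizeof(lazy)    # small and constant, independent of length
eager_size = sys.getsizeof(eager)  # grows with the number of elements

print(lazy_size, eager_size)
# Both still yield the same values when iterated
print(sum(lazy) == sum(eager))  # True
```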

Looping over a collection. How would a C programmer do it? Well, they would say, “for i equals zero, i less than n, i plus plus, look up the ith color.” Then they get to Python, ask, “How do you do that in Python?” and do it this way. Some of you already know the better way. How are you supposed to use this information? After this talk, you’re supposed to go back, look through your entire code base and grep for that pattern. Whenever you see it, don’t do that. Do this instead. Even in professional code bases with really good programmers, as I go from company to company looking at their code, I see the first one all over the place. Just fix it. The second one is simpler, easier to read and, in Python, faster.
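The two versions side by side, assuming a small colors list like the one used throughout the talk (the list contents are illustrative):

```python
colors = ['red', 'green', 'blue', 'yellow']

# C-style: manipulate an index and look up each element
by_index = []
for i in range(len(colors)):
    by_index.append(colors[i])

# Pythonic: iterate over the collection directly
direct = []
for color in colors:
    direct.append(color)

print(direct)  # ['red', 'green', 'blue', 'yellow']
```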

How to loop backwards. By the way, somebody clapped for xrange. That predates me. I go back almost 12 years as a core developer, and xrange came in just before me. I’m not responsible for the x. Okay, how to loop backwards. A C programmer knows how to do this: “for i equals n minus one, while i greater than minus one, count down by one.” That translates directly into Python. The idiom you learned on the first day of class in C works perfectly well in Python, and it’s grotesque. It’s horrific, and it’s what we had to do until we introduced a better way. The better way is to use reversed. Which is faster, the first one or the second one? The second one. Which one’s more beautiful? The second one. Which do I see in a lot of code bases? The first. Why would people do that? The answer is they’re gravitating back toward the mothership, that is, where they came from prior to Python. That was the way to loop backwards in almost every other language, and because you’ve learned it, you gravitate toward it. You use indices all over the place, and you try to use for loops as if Python’s for meant the same thing. I bet if we renamed it foreach, it wouldn’t be as pretty, but everybody would use it correctly. That’s how you loop backwards. I heard no claps. What’s special about reversed? I did that. Okay.
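A sketch of both backward loops; the colors list is illustrative:

```python
colors = ['red', 'green', 'blue', 'yellow']

# Index countdown translated from C: start at n - 1, stop before -1, step -1
countdown = []
for i in range(len(colors) - 1, -1, -1):
    countdown.append(colors[i])

# reversed() walks the sequence from the end without any index arithmetic
backwards = []
for color in reversed(colors):
    backwards.append(color)

print(backwards)  # ['yellow', 'blue', 'green', 'red']
```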

Looping over a collection and its indices at the same time. A C programmer would have no problem, because they’re already looping over i; they can print the ith position and the ith color. So, the output of this would be 0 red, 1 green, 2 blue, 3 yellow. How do you do it in Python without indices? The answer is you use enumerate. Good call, Larry. He wins. I did enumerate. So enumerate is a simple, clean way to do it. It’s fast, it’s beautiful, and it saves you from tracking the individual indices and incrementing them. What’s the cue here? Whenever you’re manipulating indices directly, you’re probably doing it wrong. Just saying. Go scan your code base. You’ll find this somewhere. Take it out and replace it with enumerate. It’s fast, beautiful and readable.
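Producing the 0 red, 1 green, 2 blue, 3 yellow output from the paragraph above both ways (the list is illustrative):

```python
colors = ['red', 'green', 'blue', 'yellow']

# Index-driven: track i yourself and look up colors[i]
manual = []
for i in range(len(colors)):
    manual.append(f'{i} {colors[i]}')

# enumerate yields (index, item) pairs, so no manual bookkeeping is needed
with_enumerate = []
for i, color in enumerate(colors):
    with_enumerate.append(f'{i} {color}')

print(with_enumerate)  # ['0 red', '1 green', '2 blue', '3 yellow']
```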

How to loop over two collections at once. Every C programmer knows what to do: take the shorter of the two lists, the minimum of the lengths, loop over the indices and look up the ith name and the ith color. Why would they do such a thing? Because it works in every other language they’ve ever learned. What’s the Python way? zip. Is it really the Python way? Actually, zip goes back 50 years. It was in the very first version of Lisp; if you read the original paper that came out on Lisp, zip was in there. zip has a deep history. It is a proven winning performer. It’s what you want. Do you love it? Yeah, I think it’s clean and beautiful. Is there anything wrong with it? What’s wrong with it? To loop over this, it manifests a third list in memory. That third list consists of tuples, each of which is its own separate object with pointers back to the originals. In other words, it takes far more memory than the original two lists combined. This is no fun. It doesn’t scale. What’s this whole scaling and speed thing? It used to be that if you asked me how to make any program go fast, I’d teach you about loop unrolling and remembering previous calculations and whatnot. But on modern processors, only one thing matters: is the code running in L1 cache? Because if there’s a cache miss, the Intel optimization guide has this horrifying line in it that says the cost of a cache miss is that a simple mov becomes as expensive as a floating-point divide. It can go from half a clock cycle to 400 to 600 clock cycles. You can lose two and a half orders of magnitude by not being in cache. If these lists are really big, do you think that zip is going to fit into cache? I don’t think so. There must be a better way, and it is izip. So izip is better. Yep, that was me. I did that one. I got in at just the right time. Iterators had just been created, and I thought, “I think I’ll make a generator out of everything.” People say, “Wow, that was really smart.” No, Guido made the iterators. I just put them everywhere. So, it was his brilliant idea, and it’s gone very far. It’s one of the things that makes Python beautiful and fast.
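A sketch of both approaches under Python 3, where zip already returns a lazy iterator (itertools.izip from Python 2 no longer exists; Python 3’s zip is its replacement). The names and colors are illustrative:

```python
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

# C-style: loop up to the shorter length and index into both lists
n = min(len(names), len(colors))
indexed = []
for i in range(n):
    indexed.append((names[i], colors[i]))

# zip pairs items lazily and stops at the shorter input
zipped = []
for name, color in zip(names, colors):
    zipped.append((name, color))

print(zipped)  # [('raymond', 'red'), ('rachel', 'green'), ('matthew', 'blue')]
```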

Looping in sorted order. We can loop over a collection in sorted order with sorted(colors). It’s pretty easy to take any for loop and sort it: just drop sorted in, and now you loop over it in sorted order. Mertz! Okay. Yes, I did sorted too. Okay. So, how do you loop backwards in sorted order? reverse=True. Simple enough.
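For example, with an illustrative colors list:

```python
colors = ['red', 'green', 'blue', 'yellow']

ascending = []
for color in sorted(colors):
    ascending.append(color)

descending = []
for color in sorted(colors, reverse=True):
    descending.append(color)

print(ascending)   # ['blue', 'green', 'red', 'yellow']
print(descending)  # ['yellow', 'red', 'green', 'blue']
```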

How do you do a custom sort? This was the traditional way. You made a custom comparison function that compared two keys and returned -1, 0 or 1, depending on whether the first is less than, equal to, or greater than the second. I see some of you grimacing. But there are others who are not grimacing. There are others who say, “That’s the way I learned it in C; that’s the way qsort works.” And those of you who are older, who learned with qsort and comparison functions, are going to have a hard time letting go of this. In fact, you’ll fight with me. You’ll come over to me and try to invent examples where you have to have a custom comparison function. And you might not even listen when I tell you it’s horrifyingly slow and no fun to write a function like that. You can write a shorter one that gets the job done. And how many times will this function be called? Well, if you have 1,000,000 items in the list and you’re doing a sort, the number of comparisons is n log n. Log base two of 1,000,000 is about 20, so that’s 20 million comparisons, which means 20 million calls to that compare function, which is long and slow. Is there a better way? Here’s a better way: sorted(colors, key=len). The key function gets called exactly once per item. Which is better, 20 million or one million? One million. Is the key function shorter? It’s almost always shorter. So, those of you who grew up on comparison functions will probably argue with me and say you can invent a comparison function for which there is no key function. And if you get really creative and work really, really hard at it, after 100 tries you’re going to find one that I can’t do, or can’t do easily. Although, I do have a function that will convert a comparison function into a key function if necessary. How do we know that key functions are sufficient? Who sorts all the time? SQL people. They sort all the time: order by this and that and the other. Do they pass in custom compare functions? No, they have key functions: order by sum of relative frequency, order by this field plus that field. If they can get by with key functions, you can too. This code is shorter, more beautiful and faster. And should you abandon your comparison functions? Absolutely. In fact, we have abandoned them for you. I ripped them out, and they’re no longer in Python 3. Goodbye, comparison functions.
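A sketch contrasting the two styles in Python 3, which no longer accepts a cmp argument to sorted; functools.cmp_to_key (a real standard-library adapter) wraps an old-style comparison function so it can still be used. The recording list is only there to show that a key function runs once per item:

```python
from functools import cmp_to_key

colors = ['red', 'green', 'blue', 'yellow']

def compare_length(c1, c2):
    # Old style: negative, zero, or positive, like C's qsort comparators
    if len(c1) < len(c2):
        return -1
    if len(c1) > len(c2):
        return 1
    return 0

# Comparison-function style, adapted for Python 3
by_cmp = sorted(colors, key=cmp_to_key(compare_length))

# Key-function style: record each call to show it happens once per element
key_calls = []

def counted_len(s):
    key_calls.append(s)
    return len(s)

by_key = sorted(colors, key=counted_len)

print(by_key)          # ['red', 'blue', 'green', 'yellow']
print(len(key_calls))  # 4
```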

All right, how many of you knew all of that stuff already? Okay, let’s see if I can take you someplace you haven’t been before. The traditional way to loop over a function call that has a sentinel is to write a while True loop and read a block of 32 bytes at a time. Eventually, we run out of data, and when we do, read returns a sentinel value, an empty string. Whenever it’s an empty string, you break out of the loop; otherwise, we append the blocks one at a time. By the way, the output of that is a big list of strings. How should you connect them together? join. How should you not connect them together? +. Hey, you guys are all on top of it. So, this is the traditional way. You will see this code all over the place. Did you know that the iter function can take two arguments, where the first argument is a function that gets called over and over again and the second argument is a sentinel value? This says: call read(32) over and over again, looping one block at a time. We get to use for loops, which should have been called foreach and which are fast and beautiful, instead of the while. Now, some people would argue that because I had to use a partial in here, it is slightly less readable than the original. But remember what I made here. I didn’t have to hand it to a for loop. Yes, these two are equivalent in terms of what they do, but the moment you’ve made something iterable, you’ve done something magic with your code. What have you done? As soon as something is iterable, you can feed it to set. You can feed it to sorted, min, max, heapq, sum. Many of the tools in Python consume iterators. As soon as you’ve made something iterable, it works with all of the rest of the Python toolkit. So, the part to concentrate on is not the for loop. It is the two-argument form of iter. Add that to your toolkit. If you’ve not seen it before, this is a good time to learn it. In order to make it work, the first argument has to be a function of no arguments. How many arguments does read take? One. How do you go from one to zero? partial. partial takes a function of many arguments and makes a function of fewer arguments. If you haven’t tried this before, go home, play with it and learn something new. Welcome to the world of functional programming. And the magic of this is that there are lots and lots of functions, especially in older APIs, that are intended to be called over and over again until they give you a sentinel value. It’s called a control break style of programming. It used to be very widely practiced until there was a certain little hiccup. There was an insurance company processing big decks of punch cards for insurance claims. They would put in a deck, then stick in another deck, more claims, another deck, and at some point they needed to tell it to stop. So they put in a control break field, a sentinel value, that told it when to stop. One day they ran through a deck of cards and it stopped right in the middle. They re-ran the deck; it stopped right in the middle, at the same card every time. This is a true story; I got it from Programming Pearls. The cause was that the claim came from Ecuador. The capital is Quito, QUIT was the control break symbol, and when it hit that city name, it quit. There’s a reason we don’t do this anymore. It’s the same reason we don’t terminate our strings with nulls anymore: sometimes we want to put nulls inside the string. Fair enough. But if you encounter an API like that, the two-argument form of iter takes it out of the old world and into the new world of iterators. Who learned something? All right.
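A runnable sketch of both loops, using io.StringIO as a hypothetical stand-in for a real file so the example is self-contained (the 100-character payload is arbitrary). With a text stream, read returns '' at end of data, which serves as the sentinel:

```python
import io
from functools import partial

f = io.StringIO('x' * 100)  # hypothetical stand-in for an opened file

# Traditional: while True loop with an explicit sentinel check
blocks = []
while True:
    block = f.read(32)
    if block == '':
        break
    blocks.append(block)

f.seek(0)  # rewind so the second approach sees the same data

# Two-argument iter: call partial(f.read, 32)() until it returns ''
blocks_iter = []
for block in iter(partial(f.read, 32), ''):
    blocks_iter.append(block)

# Because it is an iterable, it also feeds straight into other tools
f.seek(0)
total = sum(len(b) for b in iter(partial(f.read, 32), ''))

print([len(b) for b in blocks_iter])  # [32, 32, 32, 4]
print(''.join(blocks) == 'x' * 100)   # True
```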

Distinguishing multiple exit points in loops. I didn’t come up with this. Guido didn’t come up with this. This goes back to the goto wars. Lots of people hung on to goto and wouldn’t let it go until every known use case could be replaced. So, Donald Knuth sat down, itemized the most common use cases of goto and came up with structured equivalents that would do the same. One problem was that when you do something like a for loop, you need a flag variable to say whether something has been found or not. Now, two caveats about this example. One, we already have a built-in find, so you don’t need this code to begin with. Two, I could have exited early with a return. So I know it’s a little simplistic, although it is Knuth’s example. His point was that code like this typically occurs intermeshed with other, more complex code and operations, so that there is no shortcut out. The usual solution to the problem is to put in flag variables, which slows down your code and makes it less readable. We start with found equals False; if we find the target value, we change the flag, and then we act on the flag at the end. There’s a better way that is shocking and jarring to most people coming from other programming languages, because they’ve never seen it before. And it was Knuth’s idea, not Guido’s: we actually have else clauses on for loops. Remember the for loop? It’s essentially got an if inside, and it’s saying: if I haven’t finished the loop, keep doing the body; if I haven’t finished the loop, keep doing the body. What construct is normally associated with if? else. So, the else means: when that if finds there is nothing left to do and the loop finishes, run this. You can think of it that way: inside every for, internally, there’s an if and a goto, and this is the else associated with that if. Some people have a hard time remembering it that way. If I could go back in time and talk to Guido, if you gave me the keys to the time machine, I would say: back when you first made this language, else was exactly the right term, because it’s what Knuth used, and people knew at that time that every for had an embedded if and goto underneath. They expected the else. But in the future, no one will know that, because we’re all using structured programming already. So, why don’t you call it nobreak? If it were called nobreak, everybody would know what it did. There are two ways to exit this loop: you can finish it normally, or you can break out. Search your house for the keys. There are two outcomes: you find the keys and come out, or you search all the rooms and there are no more. Two possible outcomes, distinguished with the else. If it were called nobreak, you would know what it did: if we break out of the loop, return i; if we finish the loop without encountering a break, return minus one. Who learned something new? Now you know where it came from: Donald Knuth. Guess what Guido was reading when he came up with this? Donald Knuth. Was he thinking about the future? No, or he would have called it nobreak, at which point everyone would know what it did. Just like if we had called lambda makefunction, no one would ask, “What does lambda do?”
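Knuth’s flag-variable version and the for/else version, as a sketch (the function names and the sample list are illustrative):

```python
def find_with_flag(seq, target):
    found = False
    for i, value in enumerate(seq):
        if value == target:
            found = True
            break
    if not found:
        return -1
    return i

def find_with_else(seq, target):
    for i, value in enumerate(seq):
        if value == target:
            break
    else:  # runs only if the loop finished without a break ("nobreak")
        return -1
    return i

colors = ['red', 'green', 'blue']
print(find_with_else(colors, 'blue'))    # 2
print(find_with_else(colors, 'purple'))  # -1
```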

All right, dictionary skills. Those of you who have been in my classes before know that I start out with dictionaries at the beginning and cover them again the second day, the third day and the last day, because there are two kinds of people in the world: people who’ve mastered dictionaries and total goobers. All right. Dictionaries are the fundamental tool for expressing relationships, linking, counting and grouping. Here are your core dictionary skills.

How to loop over the keys: for k in d. Nobody clapped. I didn’t do that one; Guido did. But he was on the fence about what the for loop should do with a dictionary. Half of the people wanted the for loop to return the key and the value at the same time. The other half just wanted the key. I researched what other programming languages did, going back to Smalltalk, and grepped through a lot of existing code, doing counts to see what people needed most of the time when they looped over a dictionary. I also looked at what would be consistent: if you treat a list as a dictionary, the indices of the list are parallel to the keys of the dictionary. I laid out an argument, I believe it tipped the scales, and that’s why it’s for k in d. Is there another way to loop over the keys? Yes, you can ask for the keys and loop over them. When should you do the second and not the first? When you’re mutating the dictionary. With the first way, you can’t mutate the dictionary while you’re iterating over it. In fact, in any programming language, for the most part, if you mutate something while you’re iterating over it, you’re living in a state of sin and you deserve whatever happens to you. In this case, though, d.keys() calls the keys method, which makes a copy of all the keys and stores them in a list, at which point you’re free to mutate the dictionary and delete all the keys that start with “r”, leaving just Matthew. That’s kind of the way it is around the house. All the keys that started with “r” are gone, and now it’s just Matthew. I brought him to PyCon. People come up to me, say “Oh!” and look right at the baby. I deleted all the keys starting with “r”.
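A sketch in Python 3, where d.keys() returns a live view rather than a list copy; list(d) makes the snapshot that the Python 2 keys() call provided. The dictionary contents are illustrative:

```python
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

# Plain iteration yields keys; don't mutate d inside this loop
keys_seen = []
for k in d:
    keys_seen.append(k)

# Iterate over a snapshot of the keys so deleting entries is safe
for k in list(d):
    if k.startswith('r'):
        del d[k]

print(d)  # {'matthew': 'blue'}
```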

Looping over the keys and values at the same time. One way is to loop over the keys and look up each value. Is it very fast? No, because it has to rehash every key and do a lookup on it. If you actually need the values, there’s a better way: items(). We’re using tuple unpacking here. If you need both, loop over them directly; no lookups are involved. Is there a better way? Yep. Because items makes a big, huge list, the better way is iteritems, which returns an iterator. You might clap on that one, but I didn’t do it. I think that one was Barry Warsaw. Okay, yeah, for Barry.
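Both loops in Python 3 terms; there, items() already returns a lightweight view and iteritems() no longer exists (assumption: Python 3; the dictionary contents are illustrative):

```python
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

# Slower: a second hash lookup for every key
with_lookup = []
for k in d:
    with_lookup.append(f'{k} --> {d[k]}')

# Better: items() hands over key and value together, unpacked into k, v
with_items = []
for k, v in d.items():
    with_items.append(f'{k} --> {v}')

print(with_items)  # ['matthew --> blue', 'rachel --> green', 'raymond --> red']
```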

Okay. Another one for Barry. Oh, constructing a dictionary from pairs. Yes. zip is fantastic here because, as it assembles the pairs, the dictionary constructor, as you might not have known, will accept a list of pairs or any iterable of pairs. So, the easiest way to assemble these two into a dictionary is to izip them together. If you marvel at this one, and I think you should, the thing you ought to take away is that the parts in Python fit beautifully together. How do you take two lists, join them seamlessly and construct a dictionary? It is four words of Python. It doesn’t get much more beautiful than that. Look back through how you’re building up your dictionaries. If you already have the inputs available, izip is a fantastic way to do it. Is izip a better way to do this than zip? Yes. Now, it still has to make a tuple on each iteration, right? No, I put an optimization inside: it checks the reference count after the dictionary has consumed the tuple, and when we loop back around to make the next tuple, we reuse the previous one, so it can build this without any intervening calls to the allocator. It just takes one tuple and reuses it over and over again. In other words, is this fast? Yeah, absolutely.
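In Python 3 the four words become dict(zip(names, colors)), since zip plays the role of izip. The names and colors below are illustrative:

```python
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue']

# dict() accepts any iterable of (key, value) pairs, and zip produces them lazily
d = dict(zip(names, colors))

print(d)  # {'raymond': 'red', 'rachel': 'green', 'matthew': 'blue'}
```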

I’ve still got 15 minutes left. That’s great. How to count with a dictionary? You probably know a number of ways to count. When you teach people Python dictionaries, show them the most basic methods first. So this is the most basic way to count with a dictionary. Loop over the colors and check whether the color is there; if it’s not there, add it with a count of zero. A square-bracket lookup can fail if the key is not there, raising a KeyError, but in this case we’ve just put the key in, so we know it’s there. The last line will always succeed. This is a simple, basic way to count, and everyone should know how to do it. Don’t immediately start people with the most advanced thing, because if a person can’t do this, they will be helpless on any more complex problem with dictionaries. Do you start them right away with defaultdicts and whatnot? No. Start them this way. What’s the next level of improvement over this, if I don’t want to use anything exotic and I want to stick to the core dictionary API? Those of you in my classes know I threaten you all the time that Matthew will remain fatherless unless you know this particular method, because I’ll have to keep you an extra day. The method is get. In the case of counting, all we need is get, and the code above simplifies to this: we get the color; if the color is missing, get returns zero; then we add one to it. We don’t need setdefault. All we need is the zero, so that the lookup doesn’t fail. Is there a more modern way? Yeah. What is it? defaultdict. When I answer a question on StackOverflow with one of these first two, please don’t immediately go change it to defaultdict. All you’re doing is taking somebody who couldn’t count in the first place and making them import the collections module and learn the distinction between a regular dictionary and a defaultdict. They have to know about factory functions. They have to know that int can be called with no arguments, producing the value zero, and when they get something back, it’s not actually a plain dictionary. It’s a defaultdict, and it needs to be converted back for some use cases. In other words, if you hand this to a beginner, you’ve usually made them worse off. Make sure they know the first idioms before they move on. But that said, I use this, or I use collections.Counter.
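The four counting idioms from this section, as a sketch over an illustrative colors list (defaultdict and Counter are both in the standard collections module):

```python
from collections import Counter, defaultdict

colors = ['red', 'green', 'red', 'blue', 'green', 'red']

# 1. Most basic: insert the key if it's missing, then increment
basic = {}
for color in colors:
    if color not in basic:
        basic[color] = 0
    basic[color] += 1

# 2. Core dictionary API: get() returns 0 for a missing key
with_get = {}
for color in colors:
    with_get[color] = with_get.get(color, 0) + 1

# 3. defaultdict calls int() (which returns 0) for missing keys
with_defaultdict = defaultdict(int)
for color in colors:
    with_defaultdict[color] += 1

# 4. Counter does the whole job in one call
with_counter = Counter(colors)

print(basic)  # {'red': 3, 'green': 2, 'blue': 1}
```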

Okay. How to group with dictionaries? This is an example I’ve used over and over again. How do I group these names together? Here I’ve grouped them by their length, but you could group them by their first letter. The traditional way, the one a person should learn first, is to start with an empty dictionary. The key of the dictionary is what you’re grouping by. So, Raymond is of length seven; seven will be the key, and the value will be a list of all of the names of length seven. If you’d like to accumulate a lot of points on StackOverflow, know this, because this question gets asked about once a week. What should you immediately take people to when someone wants to group? The collections module? No. How about setdefault? The output of this one, by the way, is: Roger has five letters; Raymond, Matthew, Melissa and Charlie all have seven; Rachel and Judith have six. If you’re grouping by anything else, you only need to change the key line. Maybe the key is name[0], which groups people by their first letter, or name[-1], which groups them by their last letter. The key could be the number of ’e’s in the name. You can group by almost anything using this idiom. There’s a better way, though: setdefault. We need setdefault because we want it to return the list so we can append to it, but we also need the missing key to be inserted. setdefault is just like get, but it has the side effect of inserting a missing key. For a long time, this was the idiom for grouping in Python. I think it is not particularly beautiful Python, though. The word setdefault is really bad, and everybody thinks it’s awful, but no one can think of a better name. Every other name we’ve ever experimented with had about 50 letters: “this goes into a dictionary, looks to see if a key is there, and if it’s not, takes the default value, stores it and returns it so you can group with it.” That would actually be the best name, but it’s a little long. And now the modern way: transform your code into this. defaultdict(list) creates a new list whenever a key is missing, and it is far more beautiful than the original. Six lines and slow. Four lines and fast. That is the new idiom for grouping things in Python. You must know how to do it, but not only that: my presentation is intended as a checklist for you. When you go back, check out your code base, find everywhere you’re doing this or this, and replace it with this. If you do all the replacements in my slide deck, it’ll speed up your code quite a bit and make it more maintainable and more beautiful.
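The three grouping idioms, sketched with the names mentioned above (the exact list is an assumption, chosen to match the lengths described):

```python
from collections import defaultdict

names = ['raymond', 'rachel', 'matthew', 'roger', 'melissa', 'judith', 'charlie']

# 1. Basic: create the list on first sight of a key, then append
basic = {}
for name in names:
    key = len(name)
    if key not in basic:
        basic[key] = []
    basic[key].append(name)

# 2. setdefault: like get(), but also inserts the default list if missing
with_setdefault = {}
for name in names:
    key = len(name)
    with_setdefault.setdefault(key, []).append(name)

# 3. defaultdict(list): missing keys get a fresh empty list automatically
with_defaultdict = defaultdict(list)
for name in names:
    key = len(name)
    with_defaultdict[key].append(name)

print(dict(with_defaultdict))
# {7: ['raymond', 'matthew', 'melissa', 'charlie'], 6: ['rachel', 'judith'], 5: ['roger']}
```

To group by first letter instead, change only the key line to key = name[0].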

An interesting one is popitem. Now, you might clap on this one or might not. I can’t remember whether I made pop or popitem. It was a long time ago. It was my first or second contribution to Python. It must have been the second, because my first contribution was volunteering to put docstrings in a bunch of modules. You guys use docstrings? Yeah, I put about half of them in there. Originally, they were mostly empty. Is that a good way to join an open source project? Yes. If you go through putting docstrings everywhere, one, people love you for it; two, you make the code more usable; and as a side effect, you actually learn what every module does. Or you can start another way: how about you take the most popular, important data structure, the dictionary, mangle it, transform it in some radical way and change its performance characteristics? Is this a good way to start? Someone recently did. I growled at them earlier. They were the only person I growled at at PyCon. Okay, so, popitem I might or might not have put in. It removes an arbitrary item. The interesting thing about it is that it’s atomic, so you don’t have to put locks around it, and it can be used between threads to atomically pull out a task. Who learned something new?
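A sketch of draining a dictionary with popitem. Assumptions: Python 3.7+, where popitem removes the most recently inserted item (earlier versions documented it only as arbitrary), and CPython with the GIL, where the call runs as a single operation; the task names are hypothetical:

```python
tasks = {'task1': 'parse', 'task2': 'compile', 'task3': 'link'}

done = []
while tasks:
    # Each call removes and returns one (key, value) pair in a single step
    key, value = tasks.popitem()
    done.append((key, value))

print(done)   # [('task3', 'link'), ('task2', 'compile'), ('task1', 'parse')]
print(tasks)  # {}
```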

All right, linking dictionaries together. This kind of code is reasonably common. We have one dictionary with default values for some parameters. In addition, we call argparse and provide some optional command-line arguments, so a user can specify the user or the color on the command line, or not specify them at all. And lastly, I have a third dictionary, which is not showing here: os.environ, which is not really a dictionary, but it looks like one, and it holds the environment variables. It is common to want to chain these together, and the traditional way to do it, one that I actually found in the standard library, was to copy the dictionary full of defaults and then do an update from the environment. That way you have some standard defaults, and if someone specifies an environment variable, it overrides the default: environment variables take precedence over the internal defaults. Then a command-line argument should take precedence over the environment variables, so you do another update with those. This kind of code is very common. How many of you have ever written code like this? Is it the right way to do it? Well, it copies data like crazy, and if you want your code to be fast, don’t copy like crazy. So ChainMap has been introduced into Python 3, and it links them all together. It leaves the three dictionaries intact and looks in the first one; if it doesn’t find the key there, it looks in the environment; if it doesn’t find it there, it falls back to the defaults. This way is fast and beautiful, and it’s why configparser is no longer slow. Thanks for the applause on that one. That was me.
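A sketch of both chaining approaches. To keep it deterministic, a plain dict named env stands in for os.environ, and parse_args receives a hypothetical argument list; in real code you would pass os.environ and let argparse read sys.argv:

```python
import argparse
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
env = {'user': 'admin'}  # stand-in for os.environ

parser = argparse.ArgumentParser()
parser.add_argument('-u', '--user')
parser.add_argument('-c', '--color')
namespace = parser.parse_args(['--color', 'blue'])  # simulated command line
# Keep only the options the user actually supplied
command_line_args = {k: v for k, v in vars(namespace).items() if v is not None}

# Traditional: copy the defaults, then overwrite in order of precedence
copied = defaults.copy()
copied.update(env)
copied.update(command_line_args)

# ChainMap: no copying; lookups search each mapping left to right
chained = ChainMap(command_line_args, env, defaults)

print(chained['color'], chained['user'])  # blue admin
```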

Improving clarity. I have so few minutes left, and I was going to leave five minutes for Q and A. Do you guys want to blow off your Q and A and get more of these slides? Cool. No questions. You don’t even have time for a yes. All right. Wherever you have positional arguments and indices, that’s nice; you could do that in any language. Keywords and names are better. The first way, where you’re using indices, is convenient for the computer and fast in a language like C. But naming things is how humans think. So a way to improve your... oh, did we start with the answer on that one? There we go. Oh, they were out of order, I think. The top one is the kind of code that I see all over the place in clients’ code bases. It calls twitter search with Obama, False, 20 and True, raising the question: what are the False, the 20 and the True? You would have to have memorized the argument signature to check. A simple way to improve the readability of your code is to find everywhere you’re making an obscure call like that and replace it with keyword arguments. It’s an easy thing to do. It slows down your code just a little bit. But really, what are you trying to save? Microseconds? Or hours of programmer time? Hours of programmer time are the ones that cost you. So this is a simple transformation. In fact, if you’re a junior programmer just starting out and you would like to improve your company’s entire code base, go through and do this everywhere. Make sure you don’t do it in the middle of a tight loop, but do it mostly everywhere, and it will make the code profoundly better. And who will be the first beneficiary? You. You’re new to the company and don’t know the code base, but after you’ve done this, you’ll know it really well. It’s an easy way to improve code quality.
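A sketch of the transformation. twitter_search here is a hypothetical stub, and its parameter names (retweets, numtweets, popular) are assumptions made for illustration, not a real API:

```python
def twitter_search(query, retweets=True, numtweets=10, popular=False):
    # Hypothetical stub: echoes its arguments so the two calls can be compared
    return {'query': query, 'retweets': retweets,
            'numtweets': numtweets, 'popular': popular}

# Positional: the reader must remember what False, 20 and True mean
opaque = twitter_search('@obama', False, 20, True)

# Keyword arguments: the call documents itself
clear = twitter_search('@obama', retweets=False, numtweets=20, popular=True)

print(opaque == clear)  # True
```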

Named tuples. It used to be that if you called doctest.testmod(), it returned (0, 4). But is that a good thing or a bad thing? Are you happy or sad when you get (0, 4)? You don't know. That is what it returned for most of its existence. Now it returns TestResults(failed=0, attempted=4). Are you happy or sad now? Which is the better output? Is the second output substitutable for the first? Sure. Named tuples are a subclass of tuple, so they still work like a regular tuple; they just tell you what they are. And the way you make a named tuple is simple: you just define TestResults as having two fields, failed and attempted. Easy enough. This is a very easy way to improve your code base, basically all over the place. Go put in named tuples, and all of your output messages and error messages will be much more readable. The person who benefits from this will be you. There we go. All right. I was the named tuple guy.
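A sketch of defining a two-field TestResults with `collections.namedtuple` and checking that it still behaves like the plain tuple it replaces. This is our own definition for illustration, not a copy of doctest's internals.

```python
from collections import namedtuple

# Two named fields; instances are still ordinary tuples underneath.
TestResults = namedtuple('TestResults', ['failed', 'attempted'])

result = TestResults(failed=0, attempted=4)
print(result)                  # TestResults(failed=0, attempted=4)

# Still substitutable for (0, 4):
print(result == (0, 4))        # True
failed, attempted = result     # unpacks like a tuple
print(result[1], result.attempted)   # 4 4
```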

Unpacking sequences. Here's a record for Raymond Hettinger, who's a young man in hex. I can pull out the fields this way, by index. Why would you do it this first way? The answer is that's what you do in almost every other programming language you know. So people do this mainly out of habit, because it works in all those other languages. The better way is listed here: tuple unpacking. It pulls out the four fields for you, and it's both more readable and faster. This is an easy change to make. It's an easy thing to grep for: everywhere you see [0], [1], [2], [3], you know what's going on. Replace it with unpacking. Your code is better and faster. Easy change.
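A sketch of the two styles side by side. The record values are illustrative (0x30 is 48 in hex), and the email address is a placeholder.

```python
p = 'Raymond', 'Hettinger', 0x30, 'python@example.com'

# Index-by-position style carried over from other languages.
fname = p[0]
lname = p[1]
age = p[2]
email = p[3]

# Tuple unpacking: one statement, and every field gets a name.
fname, lname, age, email = p

print(fname, lname, age)   # Raymond Hettinger 48
```

Unpacking also checks the shape for you: if `p` had three or five fields, the unpacking line would raise `ValueError` instead of silently picking the wrong element.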

So, how do you do simultaneous state updates? The traditional way to write Fibonacci is to store y in a temporary variable, add up your new y, and then assign the temporary variable to x. I hate this code. I've written code like it a lot of times because I started with a 1967 version of Dartmouth BASIC and it was all I had. But there's a better way: you can use tuple packing and unpacking. Don't overlook how important it is. It's profoundly important. The problem with the first version... well, first I'll show you the correct solution. The way you often see it written is with a simultaneous update: the y and the x + y on the right use the old values of x and y to build a tuple, and then that tuple gets unpacked and stored in the variables. The x and y are state, and state should be updated all at once. If you don't update the state all at once and instead spread it over multiple lines, then in between those lines the state is mismatched. At one point y is the new y and x is the old x. This is a very common source of problems, and on top of that, the ordering matters here. If I muck up the order of these three lines, it breaks the code, and it's a hard error to see. The last thing I don't like about it, besides the ordering risk, is that it's too low-level. On the next slide, I'll talk about taking an atom and breaking it into subatomic particles; this has been broken into subatomic particles. What does it say? It says take y and store it in t, store x + y in y, and store t in x. The second one says update these variables according to these equations. So you transform one into the other. The second way is a higher-level way of thinking: it doesn't risk getting the order wrong, and it's fast and Pythonic. Please transform code like that into this.
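A sketch of both versions, written to return a list instead of printing so the two can be compared directly. The function names are our own.

```python
def fibonacci_with_temp(n):
    """First n Fibonacci numbers, updating x and y one line at a time."""
    result = []
    x = 0
    y = 1
    for _ in range(n):
        result.append(x)
        t = y        # save the old y
        y = x + y    # y is new but x is still old: the state is mismatched here
        x = t        # now both are new; swapping these lines breaks the code
    return result


def fibonacci(n):
    """First n Fibonacci numbers, updating the state all at once."""
    result = []
    x, y = 0, 1
    for _ in range(n):
        result.append(x)
        x, y = y, x + y   # the right side is built entirely from the old x and y
    return result


print(fibonacci(10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```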

Lest I let that go, I've got a whole additional slide and a half on this. This is such an important problem; don't underestimate the advantages here. If you break this out into pieces, you risk ordering problems. Also, you are making it too atomic. You are losing the ability to chunk your thoughts and to think higher-level thoughts. For example, a problem I give when I teach scientists is this: I give them a function, influence(), the influence of one planet over another. All they have to do is plot the orbit of the planet. Some people get nice elliptical orbits, and other people get one that just kind of zigzags away. The ones who get it wrong are the ones who wrote exactly this code except they didn't use the temporary variables. If the first thing they write is x = x + dx * t, they're toast at that point. Why? They've updated x, and now the next quantity gets computed with the new x rather than the old x. It is a very common problem. The other half of the people write these temporary variables. How did they know to do that? The answer is, they've all been burned by this problem before. The correct answer is this: do the calculations on the right with the old values of the variables, the old x, the old y, the old dx, the old dy, compute the partials, and only then update all of the variables. This is of profound importance, and not just for scientific computing. I can give other examples where people are doing a simple mortgage calculation with the principal and interest and whatnot, and if the very first thing they do is update the principal with principal -= payment, they're toast, because their interest payment is going to be wrong. The interesting thing to me is not that they get it wrong when they program in any language; the interesting thing is that I can give them the same problem in Excel and they always get it right. Why is it that people get it right in Excel and wrong in programming languages?
The answer is that in Excel, you keep all the state on each row: in month one, here's the current principal, interest, and whatnot; in month two it's this; and people naturally write their formulas so they refer up to the previous row. Essentially, all Excel users are doing exactly this operation. You could view it as: what's on the right refers to the previous row, what's on the left is the new row, and that gets iterated. In other words, it's a very natural style of thinking. Please don't write code like the first version; write code like the second. It's a big deal that we've got this in the language. It will save you from a lot of trouble.
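To make the mortgage example concrete, here is a sketch of a hypothetical `amortize` function that tracks the balance and the total interest paid, month by month, like rows in a spreadsheet. It assumes a fixed monthly rate expressed as a fraction and ignores rounding to cents. The buggy variant updates the balance first, so the interest line reads the new balance instead of the old one.

```python
def amortize(balance, monthly_rate, payment, months):
    """Return (balance, total_interest) after `months` payments.

    Each month is computed only from the previous month, like a spreadsheet
    row that refers up to the row above it.
    """
    total_interest = 0.0
    for _ in range(months):
        # The whole right side uses the OLD balance; then both update together.
        balance, total_interest = (
            balance + balance * monthly_rate - payment,
            total_interest + balance * monthly_rate,
        )
    return balance, total_interest


def amortize_buggy(balance, monthly_rate, payment, months):
    """Same calculation with the state updated one variable at a time."""
    total_interest = 0.0
    for _ in range(months):
        balance = balance + balance * monthly_rate - payment   # updated too early
        total_interest += balance * monthly_rate               # reads the NEW balance
    return balance, total_interest


print(amortize(1000.0, 0.01, 100.0, 1))         # balance 910, interest 10 (approximately)
print(amortize_buggy(1000.0, 0.01, 100.0, 1))   # interest comes out as about 9.1: wrong
```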

One minute? Efficiency. I’ll do efficiency fast. Basically, just don’t move data around unnecessarily.

Concatenating strings. In my classes, I tell a naggy joke all around this in order to hammer home that this is quadratic behavior. Don't add your strings this way. Instead, join them. It's likely most of you knew that already. Go check your code base and see if your code base knows that. Just saying.
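A sketch of both styles with an illustrative list of names. Repeated `+=` can rebuild the growing string on each pass, which is what makes it quadratic in the worst case (CPython sometimes optimizes this in place, but you shouldn't rely on it); `str.join` computes the final size once and copies each piece once.

```python
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

# Repeated concatenation: each += may copy everything built so far.
s = names[0]
for name in names[1:]:
    s += ', ' + name

# join: one pass, one final allocation.
t = ', '.join(names)

print(t)   # raymond, rachel, matthew, roger, betty, melissa, judith, charlie
```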

Updating sequences. If you see a del names[0], a pop(0), or an insert(0, ...) on a list, you're doing it wrong. I go into a customer site and they say, "Here's a million lines of code and it runs really slow. Can you make it go fast?" Am I going to be able to read a million lines of code? No. But 15 minutes later, I come back and say I've sped up your code. What did I do? I grepped for these three things. Pretty much everywhere they did a del names[0], a pop(0), or an insert(0, ...), they were using the wrong data structure. What's the correct data structure? deque. Yeah, I did that. Anyway, a deque lets you do del names[0], popleft(), and appendleft() efficiently.
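A sketch of the left-end operations on a `collections.deque`. Each of these is constant-time on a deque, whereas the list equivalents have to shift every remaining element.

```python
from collections import deque

names = deque(['raymond', 'rachel', 'matthew', 'roger',
               'betty', 'melissa', 'judith', 'charlie'])

del names[0]               # instead of del some_list[0]
first = names.popleft()    # instead of some_list.pop(0)
names.appendleft('mark')   # instead of some_list.insert(0, 'mark')

print(first)         # rachel
print(list(names))   # ['mark', 'matthew', 'roger', 'betty', 'melissa', 'judith', 'charlie']
```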

Decorators and context managers. I have no seconds left, but we're going to run five minutes into the break. We have time for decorators and context managers, which completely rock. We're out of novice territory and into the really good stuff. These are fantastic tools for refactoring your code. But good naming is essential, because they provide macro-like capability, meaning you can hide all kinds of awful actions behind the macro or you can be very clear. So remember the Spider-Man rule: “With great power comes great responsibility.”

All right, so I want to factor out some administrative logic. The business logic here is opening a URL and returning a web page. The administrative logic is that I'm caching it in a dictionary. That way, if I look up the same web page over and over again, I simply remember it. You'll see code like this all over the place in Python, where someone was trying to cache lookups. What I don't like about it is that it mixes the admin logic with the business logic, and it's not reusable. Simple fix: add @cache. In Python 3, it's actually functools.lru_cache. I have backported it for people who want to search for the backport. You can start using it today. That said, these things are pretty easy to write on your own, so I really want to demonstrate the decorator technique here more than what I've written. This is reusable. I can put @cache in front of any pure function, a pure function being one that returns the same value every time you call it with the same arguments. random.random(): is that a pure function? No, because it gives a different value every time you call it. pow(): is that a pure function? Yep. Same answer every time. The business logic has been separated from the admin logic, and I've gotten reusability. The way I write it is with a simple caching decorator. It only takes a few lines. I would like for your utilities directories to be full of little tools like this, so that elsewhere in your code you put @cache and the problem is solved.
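Here is one way to write such a caching decorator in a few lines, followed by the standard-library `functools.lru_cache`. `slow_square` is a hypothetical stand-in for an expensive pure function such as a web lookup; the `calls` counter exists only so you can see the cache working. This home-grown version handles positional, hashable arguments only.

```python
from functools import lru_cache, wraps


def cache(func):
    """Memoize a pure function: remember results keyed by positional args."""
    saved = {}

    @wraps(func)
    def newfunc(*args):
        if args in saved:          # seen these arguments before: reuse the result
            return saved[args]
        result = func(*args)       # business logic runs only on a cache miss
        saved[args] = result
        return result

    return newfunc


calls = 0


@cache
def slow_square(n):
    """Hypothetical expensive pure function."""
    global calls
    calls += 1
    return n * n


slow_square(12)
slow_square(12)   # served from the cache; the body does not run again


@lru_cache(maxsize=None)   # the standard-library version
def slow_cube(n):
    return n ** 3


slow_cube(3)
slow_cube(3)
print(calls, slow_cube.cache_info().hits)   # 1 1
```

Since Python 3.9, `functools.cache` is a shorter spelling of `lru_cache(maxsize=None)`.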

Factoring out temporary context for decimal. So we copy the context, change the decimal precision to 50, do a calculation, and restore the old context. This pattern of saving the old context and restoring it happens over and over again. There's a better way with localcontext. The context manager here makes a copy of the context, puts it in place, does the calculation, and restores the old one. Which is easier to get right, the first or the second? The second. It has reusable logic. Pretty much anytime your setup logic and teardown logic get repeated in your code, you want a context manager to improve it. It can profoundly clean up your code base, especially if you're doing this sort of thing all over the place.
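A sketch of both versions using the `decimal` module. The manual version isn't exception-safe: if the division raised, the precision would stay at 50. `localcontext()` restores the original context even when the block raises.

```python
from decimal import Decimal, getcontext, localcontext, setcontext

default_precision = getcontext().prec

# Old way: save, modify, compute, restore by hand.
old_context = getcontext().copy()
getcontext().prec = 50
old_way = Decimal(355) / Decimal(113)
setcontext(old_context)

# New way: localcontext() copies the current context, installs the copy for
# the with-block, and puts the original back on exit.
with localcontext() as ctx:
    ctx.prec = 50
    new_way = Decimal(355) / Decimal(113)

print(new_way)   # 355/113 to 50 significant digits
```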

Okay, the traditional way to open and close files. Everybody knew they were supposed to close them, but they often didn't. You had to do a try and finally, and besides, CPython closes the file for you anyway. But the simple way, the new way, is beautiful. What did the with statement factor out for us? It factored out the setup logic and teardown logic; it sets up the try/finally for us. Most of you knew that one already. Does your code base know? Go fix it.
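A sketch of the two ways to read a file. The file path is a throwaway file created in a temporary directory just for this example.

```python
import os
import tempfile

# Hypothetical file for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'w') as f:
    f.write('hello\n')

# Old way: the try/finally guarantees the close, but it's easy to skip.
f = open(path)
try:
    data = f.read()
finally:
    f.close()

# New way: the with statement sets up the try/finally for you.
with open(path) as f:
    data = f.read()

print(repr(data), f.closed)   # 'hello\n' True
```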

How to use locks. The old way to use a lock is to acquire the lock and then do a try/finally. Do you have to use a try and finally? Absolutely. If you don't, you won't release the lock in some situation where an error happens in here. What happens if you don't release the lock? Do puppies die? Every time. Dead puppies. So that's what you're supposed to do. But you had to indent twice, spell out finally, put in colons. People knew they were supposed to do it and often wouldn't. The new way is with lock. I've actually separated the administrative logic of getting the lock from the principal logic. I don't know who came up with context managers. I need to go find them and thank them, because they are wonderful.
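A sketch of both styles with `threading.Lock`, protecting a shared counter. The counter and the thread count are arbitrary illustration.

```python
import threading

lock = threading.Lock()
counter = 0


def increment_old():
    global counter
    lock.acquire()
    try:
        counter += 1
    finally:
        lock.release()   # runs even if the body raises


def increment_new():
    global counter
    with lock:           # acquire on entry, release on exit, even on error
        counter += 1


threads = [threading.Thread(target=func)
           for func in (increment_old, increment_new) * 50]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # 100
```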

Factoring out temporary context. By the way, most of you guys knew with lock already, right? And you already knew the way to open and close files. Here's one you didn't know. I would do an os.remove() on a file and then catch the error. There's another way to do this: of course, you can check and see if the file exists before you remove it. Is that the right way? No, because it has a race condition in it, so catching the error is the correct way. It's also irritating. Here's a better way, with ignored. Oh, you've never seen ignored? It says: run this code and ignore these exceptions. How come you haven't seen ignored? Aren't you watching my check-ins? I made this check-in a few days ago. You guys aren't working off the head of Python 3.4? Yeah. Anyway, if you want your own, I put it on the next slide. You put those handful of lines in your code. Hey, it's like 10 words of Python. Stick that in your utils directory and you too can ignore exceptions. It gets rid of the try/except/pass idiom. Who learned something? Cool.
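The next slide isn't reproduced in this transcript, so here is a minimal sketch of an `ignored` context manager built with `contextlib.contextmanager`; it may differ from the slide's exact code. In the final Python 3.4 release this context manager shipped under the name `contextlib.suppress`, shown at the end for comparison. The missing file path is made up for the example.

```python
import os
import tempfile
from contextlib import contextmanager, suppress


@contextmanager
def ignored(*exceptions):
    """Run the with-block, silently swallowing any of the given exceptions."""
    try:
        yield
    except exceptions:
        pass


missing = os.path.join(tempfile.mkdtemp(), 'somefile.tmp')   # never created

# Old way: try/except/pass.
try:
    os.remove(missing)
except OSError:
    pass

# Home-grown context manager.
with ignored(OSError):
    os.remove(missing)

# Standard-library equivalent (Python 3.4+).
with suppress(OSError):
    os.remove(missing)

# Exceptions that weren't listed still propagate.
try:
    with ignored(OSError):
        raise KeyError('not ignored')
except KeyError:
    propagated = True
```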

All right. More new things. Factoring out temporary context. Did you know that help() sends its help to standard output? So you're going to have to cut and paste it to store it in a file. That's irritating. Can't we just redirect it? Sure. What you can do is open a file, temporarily reassign standard output to it, do a try/finally around the help(), and then, after help() has written its output to the file, restore standard output. Is this any fun? No. A better way is with redirect_stdout to the file. Pretty much, you can get back in the business of writing your functions with print statements and then wrap them with with statements to send the output to files, to standard error, or somewhere else. This restores the beauty of using print everywhere. And this is the context manager for it. You don't have this, and of course you don't, because I haven't checked that one in yet. That'll probably go in on Monday. Nick and I have to have a couple of words about it. He's very happy with this part. What we're unsure of, and in fact you guys could just vote, is this: the proposal is, if the file object is None, how about we automatically create a StringIO object for you to capture the string, so that you can do with redirect_stdout() as s: and then call s.getvalue() to get the captured string? You want that? All right. Nick, it's decided. Nick is the man for contextlib, so we negotiate everything. If somebody wants something in collections, they have to come to me. If they want something in itertools, they come to me.
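A sketch of both approaches. In the version of `contextlib.redirect_stdout` that shipped in Python 3.4, the target argument is required, so this passes an `io.StringIO` explicitly rather than relying on the no-argument form proposed in the talk. An open file object works the same way as a target. `report()` is a hypothetical function that just prints.

```python
import io
import sys
from contextlib import redirect_stdout

# Old way: swap sys.stdout by hand and restore it in a finally clause.
buffer = io.StringIO()
saved_stdout = sys.stdout
sys.stdout = buffer
try:
    help(pow)
finally:
    sys.stdout = saved_stdout
old_text = buffer.getvalue()

# New way: redirect_stdout swaps sys.stdout in and restores it on exit,
# and returns the target from __enter__.
with redirect_stdout(io.StringIO()) as f:
    help(pow)
help_text = f.getvalue()


def report():
    """Hypothetical function written with plain print calls."""
    print('all tests passed')


with redirect_stdout(io.StringIO()) as out:
    report()
captured = out.getvalue()   # 'all tests passed\n'
```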

All right. Poor Jackie came to me and I said no. He tried to alter one of my combinatoric functions. I'm now wishing I had said yes, though. All right. So, concise, expressive one-liners. This is the very last one, but it's an interesting thought. When people first come to Python, we teach: “Don't put too much stuff on one line.” There's an infinite amount of vertical space available to you in your code; take advantage of that. On the other hand, don't take single units of thought and break them into subatomic particles; that actually makes your code harder to understand. You understand every single line, but do you understand the gestalt of it? My rule for what goes on one line is one logical line. Do you remember earlier, when we did the tuple unpacking with the planet positions? That was one logical line, even though I actually typed it on four. So one logical line means one statement. My rule is that what goes in one line is what you can express in a single English sentence. Give me the sum of the squares of the numbers up to 10. This is one way to do it: you start with an empty list. This is the way we used to do it in the olden times when I first came to Python. I'd have you clap on sum, but Alex put sum in there. I'm the one who made sum go fast, though: I put in all the optimizations so it doesn't create a new object at every iteration; it actually just keeps a running total in C. There is a better way, which is to use the square brackets and put this on one line. Why is that better? Well, the first one tells you exactly what to do, step by step. The second one says what you want. It's more declarative. It just says: “I want the sum of this.” I read it left to right: I want the sum of the squares of i, taken from 1 to 10, the same way you would write it in mathematics. I contend the second way is better because it's a single unit of thought, and the first one is too busy telling you how to do it rather than what it's doing. Fair enough. Oh, is there a better way?
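A sketch of the step-by-step loop next to the one-line list comprehension. It assumes "from 1 to 10" means the numbers 1 through 10 inclusive, hence `range(1, 11)`.

```python
# Step by step: spells out *how* to build the result.
result = []
for i in range(1, 11):
    s = i ** 2
    result.append(s)
total_loop = sum(result)

# One logical line: says *what* you want.
total_comprehension = sum([i ** 2 for i in range(1, 11)])

print(total_loop, total_comprehension)   # 385 385
```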
Take out the brackets: generator expressions. I did those. Yeah. So, my contribution to Python: I came along, saw these square brackets, took out an eraser and erased them, and it made everything go a lot faster. That creates a generator version of this instead of filling up memory, which is what makes it fast. Did you guys have a good time? Cool. Thank you all for coming to the presentation, and do me an honor: take these slides, go back to work, and have somebody, if not yourself, look at every one of these, find them in your code base, and put them in. It'll make your code faster, better, and more beautiful. Thank you very much.