Naming things
2017-06-01
Naming things – choosing a string of characters to represent a given expression – can be frustrating and difficult. This might be obvious, but the abstraction of referring to a block of code with something memorable and human readable greatly increases the ease of programming. Whoever first realized that we could use memorable words like MOV and GOTO instead of numeric opcodes was responsible for a huge advance in programming language design. In a sense, every time we abstract away a function, script, module, or program using a name, we’re building on that simple idea. Being able to reference other parts of a program is important in its own right, but it isn’t nearly as effective if you have to refer to your function by say, its location in memory, rather than a name that describes what it does. However, when we choose names to take advantage of this abstraction superpower, there’s an inherent conflict between brevity and being descriptive. Deciding how to name things has been referred to as one of the hardest problems in programming1. I’ve been thinking about how to do a better job at this lately, and here are some notes I’ve made on the topic. Some of them might seem sort of contradictory – I think that’s just an indication of how naming things is complex and nuanced.
- Good design makes it easier to name things. Dividing your code up into modular parts is an important part of writing clean, maintainable software. If the name of a component isn’t at least somewhat intuitive based on what it does, it’s possible that you’ve chosen to break up your problem in a less than optimal way. Also, once you have descriptively named and well documented components that have a clear interface and that only do one thing, the problem of naming things used inside the scope of that component becomes easier.
- Consider potential sources of ambiguity, especially from the perspective of somebody who isn’t familiar with the overall structure of the project. In PyCIRCLean, there are times when we have a single method containing separate references to each of: a string representing a path to a file, a Python file object, and a class containing information about a file. If we had static types or type hints, the reader would receive extra information to help understand what each variable refers to. Given that we don’t have type information, we decided on names like “file_path”, “opened_file”, and “file_object”. Always considering your reader’s perspective aligns with good programming practices in general (and is a principle of writing good prose, too).
- Sometimes the right name won’t occur to you right away. It’s ok and worthwhile to wait for the perfect name if one isn’t immediately obvious, rather than forcing a name that isn’t great. But, you also don’t want to get stuck spending an hour optimizing one name. An adequate solution is to just use whatever name you come up with and then come back when you’ve had time to think of something better. However, take care – suboptimal names can proliferate and become more difficult to change with time.
- Generally speaking, the greater the infrequency or “distance” between usages of a component, the longer and more descriptive its name should be. Names that are used infrequently and far apart won’t easily stick in the reader’s memory, and they’ll have to frequently consult the documentation or the location in the program where the name is defined. A more descriptive name will help smooth the process of understanding what the code does. On the other end of the spectrum are examples like the convention of using
i
to count the current loop number or index into an array/list. Somebody reading the code can see the entire context for the variable’s usage inside one screen height of their text editor, so they don’t have to keep any extra information in their head. Thus, a short and un-descriptive name is sufficient. - Be especially careful with global variables, public attributes, or names that are referenced often throughout a program. If something is referenced many times, future programmers (including yourself) will be more reluctant to change it. Also, if you’re going to be exposing an API, your users will not be happy if you make breaking changes. Do the best that you can the first time, and don’t be afraid to take a little extra time to make the decision. As a side note, a variable appearing in many different locations can be an indication that you should consider reorganizing your code. Sometimes, you do need global variables, but having naming issues with a global variable is a code smell that you should at least think carefully about.
- One thing that sometimes comes up in discussions of variable naming is Hungarian notation: the practice of including meta-information about a component in its name. This can include information about a broader category that the component belongs to, or its actual data type. Although many discussions of variable name argue against Hungarian notation, I’d encourage you to read Making Wrong Code Look Wrong by Joel Spolsky. Essentially, his argument is that Hungarian notation as it was originally conceived of provides a useful way of, well, making wrong code look wrong.
- Follow conventions set elsewhere in the codebase. Let’s say you start working on a large Python project for the first time. You’ve previously read that, in Python, variable names are normally lowercase_with_underscores, and every other project you’ve seen has followed that convention. In your new project, however, you see that variable names are in camelCase. What should you do? First, whatever you do, you’ll want to maintain uniformity throughout the codebase – the worst option is to start using lowercase variables in your contributions and not tell anyone. This is because projects that don’t have uniform style are generally harder to read. Depending on how large the codebase is, you could spend some time making everything Pythonic. In the worst case scenario, this could take a few weeks, potentially break things, and annoy your coworkers with the massive diff you will generate. In that case, the best course of action is to imitate the style of the rest of the project, regardless of what it is.