From September 25 reading

The readings were:

Point #1

“I somewhat disagree with the author of ‘Treating Code as an Essay’ when it comes to brevity of code being a good thing. I agree with the overall goal of creating code that is explanatory and easy to understand. While this would discourage clutter, I think extra lines of code can sometimes be useful, especially in scientific computing. For example, using numpy to reshape data and then solve a least squares problem can be done in one line. But splitting up the reshaping steps and the least squares function call is likely better for readability.

Ultimately, I think code should try to balance readability with sufficient context. Documentation can’t make up for cryptic code - and I think programmers should always err on the side of over-explanation, since one programmer’s “obvious” is often another’s “convoluted” or even “impossible.”
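
The numpy example in Point #1 can be made concrete with a minimal sketch. The data, shapes, and variable names below are hypothetical and chosen only for illustration; the commenter did not supply code. Both versions compute the same coefficients, but the split version names each intermediate step.

```python
import numpy as np

# Hypothetical data: 12 flat readings that represent 6 samples of 2 features.
rng = np.random.default_rng(0)
flat_readings = rng.normal(size=12)
true_coef = np.array([2.0, -1.0])  # assumed coefficients used to build the targets
targets = flat_readings.reshape(6, 2) @ true_coef

# Compact version: reshape and solve in a single expression.
coef_one_line = np.linalg.lstsq(flat_readings.reshape(6, 2), targets, rcond=None)[0]

# Split version: each step gets a name that documents its purpose.
n_samples, n_features = 6, 2
design_matrix = flat_readings.reshape(n_samples, n_features)
coef_split, residuals, rank, singular_values = np.linalg.lstsq(
    design_matrix, targets, rcond=None
)

# Both versions should agree up to floating-point error.
print(np.allclose(coef_one_line, coef_split))
```

The split version is longer, but a reader can inspect `design_matrix.shape` or `rank` without first unpacking a nested expression, which is the readability gain the comment describes.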

Point #2, on simplicity for users and developers:

“The main themes of both of those chapters seem to be tradeoffs between simplicity for users and simplicity for developers [emphasis mine]. Both authors mention special cases where the constructions were not optimized, even though they knew optimization would bring (non-zero, even if small) performance gains to users, because it would substantially complicate the development of those objects. It is hard for me to tell for certain, but I imagine that Matsumoto might strongly disagree with any decision to favor simplicity for language developers over simplicity or performance for end users. To be fair, it is not only Python developers who often seem to have this attitude. I have read several posts and comments by Hadley Wickham in which he explicitly states that he doesn’t feel the need to optimize the dplyr utilities to the extent data.table does, because he imagines that in most use cases the improved performance would not be necessary. Maybe that’s true, but it also seems to sacrifice simplicity for the user (i.e. ‘why is my code running so slow? and how can I fix it?’) in favor of simplicity for the developer. The developers of data.table seem to have the same attitude: they are willing to choose a syntax that is unfamiliar, non-intuitive, or difficult to learn because it simplifies developing the package for them, even though it makes the package much more difficult to use for most end users. In other words, this type of thinking is clearly present among both R and Python developers.
Indeed, if Matsumoto did disagree with this thinking, which is common among developers of high-level language backends, he certainly wouldn’t be the only one (and I’m not sure that he would disagree, since the developers do compromise back-end simplicity a little for front-end simplicity, just not completely). The whole existence of the new Julia language seems to be predicated on researchers’ frustration with high-level language developers who weren’t willing to optimize their code completely, in spite of the benefits it would give to end users. (E.g. at one point the NumPy developer literally says ‘People wishing to get the small speed gain available for contiguous cases are assumed to be knowledgeable enough to write the simple loop themselves (bypassing the iterator entirely).’ referring to making a change in the Python/NumPy C source code. This seems, frankly, more like laziness on the part of the designer/engineer than good design.)”

Make: automating tasks

We’ll work through the Make tutorial created by Software Carpentry.