Simone Conte

Improve your code and not only - Academia and Industry opinions

There is nothing wrong with goto -- don't use it unnessasarily of course. See for example http://koblents.com/Ches/Links/Month-Mar-2013/20-Using-Goto-in-Linux-Kernel-Code/ , for uses of goto in the linux kernel. I teach CS2002 (the first course, I think, where students come across goto), and in C got is often unavoidable without extreme pain.

I think for early students, the most important thing is to try to make their program compile and run (so they can test it) as early as possible, then keep iterating by adding small features and retesting.

Automating these tests will make things easier, but the important thing is to try not to write a few hundred lines of code, and only then try to make their code compile / run.

Chris

Testable definitely - a tested method is not a class test and a tested class is not necessarily one that functions after integration.
Every predicate should be tested at least 2 ways but the combination effect usually means that that is hard to do. Adhoc testing as well as testing to scenarios as well as strict metrics - statements, predicates, code sequences etc and file calls should all be done

Regards,
Ishbel

I have a few suggestions about good programming practices that might help for your workshop. Quite a few of the ideas are taken straight from Clean Code, by Martin Fowler, which I definitely recommend.

1. Think hard about how to name things. Names for objects and variables should describe what it is so that you don't need to refer back to its definition. It should describe what it is now, not what you want it to be eventually. I've worked in projects where one key class was badly named so that it could be interpreted as the superclass of an unrelated class. This makes code hard to understand and can be very hard to undo (particularly if its the name of a production database table).

2. Single Responsibility Principle - objects and functions should do one thing. This means that you should be able to describe them without using 'and' / 'or'. Typically, it means functions shouldn't be more than 5 or 6 lines of code and shouldn't be indented more than once - so only one level of loops or conditional branching. This makes code much easier to read and reuse.

3. You shouldn't need many comments. You should be able to understand the intention of the code by reading it, due to its structure and the variable/function names used. Unit tests should serve as most of the documentation - a good testing framework will encourage you to write your tests in such a way that they document the behavior of the code.

To summarise, write code assuming that it is going to be read. In long term projects, much more time will be spent reading a file than writing it. Code should not be surprising - it should do exactly what it says it's doing.

Hope this helps,
Daniel

I thought a quick way to respond to this would be with excerpts from feedback I've given on coursework over the years. This is all long ago that no one who received it should still be here:

At CS1002 level:
Week1:
We're supposed to have anonymous marking, so please use your matric number rather than your username in folder names and so on. The practical instructions didn't get updated I'm afraid.

Please try to include as few unnecessary files in your submission as possible. For instance this week all I really need is the PDF of your report and MyFirstApplication.java.

Week 2:
"Your report doesn't actually report on what you did. You should include a brief narrative section explaining how you developed the software, what decisions you had to make, any problems you encountered, and whether you are claiming to have attempted the extension material."

"Your comments are mainly not helpful. They either repeat what the code says, or are actually wrong. For instance


        fred.penColor(Color.blue);
        // the colour of the pen has been set to blue

adds little to the code, while

 tom.bodyColor(Color.red);
        // the turtle's body colour has been changed to blue

is just wrong."

"Your level of commenting is somewhat excessive for my taste, and largely simply repeats the code, sometimes incorrectly, for instance:


    DrawingWindow.add(Martin);//imports Turtle Donald into Drawing    Window Object

You should use fewer comments, and try to ensure that they add something to the code, even for a Java literate reader."

"You don't need to include your code in the report for me. I'm happy to read the .java." - different tutors tastes vary in this though

Another common theme is poor code is repeated work. I have seen programs that read a 1000 line file 1000 times, once to process each line.
On a smaller scale the same batch of CS1002 feedback includes:

"There is no need to create the DecimalFormat each time you use it. You could create just one at the start."

Jumping ahead a few weeks through those comments we see more sophisticated issues:
Week 12:
"It would be interesting to know why you chose to use Graham's UML rather than your own. What problems did you foresee in trying to implement your own design and what did you learn from this?
Also, while you say it was hard to work within Graham's design, why? It would be really nice to see examples of the sort of problems you encountered."

I could summarise this one as "report the process, not the finished product"
"Code isn't bad, but some things could be more concise:


        public void addChange(int change) {
        

          this.takings = takings + change;
        

        }

could be


        public void addChange(int change) {
        

          takings += change;
        

        }

Where you use an attribute as an absolute constant like MAX_ROWS it is usual to declare it static (so that only one copy is kept) and final (so that the compile knows it can't be overridden or changed. This allows the compiler to optimize better."

Another common theme is separating the UI from the body of the code. I wrote:
"Extension code is good. It could be improved by separating the calculations from the I/O more. Instead of methods to compute and print the change, define classes to represent bunches of coins and let the UI decide what and where to print.
Generally excellent work. As you move to more complex things over the rest of your degree, try to organize your programs to separate functionality from UI throughout. Write flexible libraries of code to model things and do computations and then when they are working, build your application (with its I/O and user interface) as a thin layer on top of them."

A personal dislike is textual output included as screen shots - which can be hard to read - just cut-and-paste the text into the report.

Some general feedback on an honours-level pen-and-paper theory coursework:
"General Feedback

1. In this kind of work, it is important to use language (and formulae) precisely. Don't use a word because it sounds impressive unless you're sure it means what you want to say.

2. When you write formulae in a proof, make clear their significance: for instance is this something you know, or something you have to prove. Just writing a list of formulae with no words at all is usually wrong.

3. Try to make your descriptions of algorithms distinct from your pseudocode. For instance for insertion sort: The array is maintained in a sorted and an unsorted part. At each iteration the first element of the unsorted part is put in its correct position in the sorted part, other elements being moved along as needed. Avoid just restating the code. Concentrate on things like invariant properties, and a top-level overview.

4. Pick one version of the algorithm and use it consistently in pseudocode, description and example.

5. In writing pseudo-code (and code) use different groups of letters for different types of thing. For instance if i is a loop index, don't use j for a key, use x or a.

6. Quicksort for any serious purpose is done in place, not using additional arrays.

7. You will need to learn to use some form of mathematical word-processor competently. LATEXis free.

8. Do not use swaps in insertion sort. It's a waste.

9. If you're using LATEX. log is \log.

10. To prove something like f + g ∈ O(r + s) you need to FIND a c and an N (and show that they work). To disprove something like f − g ∈ O(r − s) you need to find and f,g,r and s such that NO c and N can work."

I found some comments on an essay on testing that might contain general lessons on essay and report writing:
"Very few people mention any disadvantages of, or alternatives to, testing. Testing costs, and can lead to false assurances of quality (if the tests don't match the specification, or the specification doesn't meet the requirements). It can lead to laziness on the part of the programmers, writing to pass the test instead of meet the specification. Formal specification and correctness proofs are the most obvious alternative approach. There are also radically different testing approaches, such as clean-room.

Many people made arguments about the importance of correct software, but did not connect that to testing. There are other ways to produce correct software, and testing does not always do so."

Which falls into the general topic of making sure an essay makes a coherent argument or narrative, not just a collection of descriptions.

"Please submit essays and similar coursework simply as PDF files, rather than ZIP files. It makes them easier to process."

"You need, to cite sources for far more of the facts you claim. You should, for example, cite a source for the definition of user testing, and perhaps for the fact that modern OS contains millions of lines of code. "

"You have found some good sources, but too much of the essay is made up of quotes from them. Much better to write your own arguments, based on facts obtained from sources, rather than just use quotes from sources as your authority."

"Try and be precise. JUnit as a method of testing does not inherit from junit.framework.TestCase. Classes that use Junit do."

"A Bibliography entry needs to contain much more than just a URL -- author, title, date downloaded at a minimum. You should also strongly prefer high-quality sources such as textbooks or journal articles to unrefereed web pages."

'Don't start "The document aims" Something like "We introduce.." would be better.'

'Your essay presents some concepts and then you reach conclusions expressed by phrases like "in my opinion" and "I feel". In an essay you should prove your conclusions based on evidence, after exploring the arguments for and against, not just give your personal views.'

Finally, some second-level programming feedback:

"Your design doesn't match your code.

Your design implies that the dictionary is loaded for every query (which would be stupid). Your code does the sensible thing and loads it in the constructor.

your design also implies that you don't trust your binary search, (part 4.1.1). This is silly. Get the search right and then you can make your code simpler."

"Your tests aren't bad, but you should explain why you think they are the right set of tests. "

"Generally, you should aim for the simplest, clearest, possible code, with a minimum of special cases to harbour mistakes. Much of the time, if you think it through, you can set things up so that the main case handles everything."

"You should avoid calling compareToIgnoreCase twice in the while loop -- call it once and remember the result. If the strings were very long this could cost you a lot of performance and anyway it's waste -- good programmers hate waste."

"This is horrible kludge, and more importantly an abuse of the interface. I understand what you're trying to do, but the interface documentation says XYZ Your code produces ABC.

a good effect with the actual UI that happens to be on the other side of the interface, but that's not the point. The whole thing about an interface is that I should be able to put different code on the other side of the interface and rely on your class to behave as documented.

Overall, you're good at getting the program to do what you want it to, but that isn't the point (at least in the upper grade ranges where you should be playing). You should produce simple, clean code, relevant comments, matching design and you should, always, always, stick to the documented behaviour of interfaces. "

"The Design/Implementation sections again could be more precise.

You should set out the decisions to be made: how to do the search, when to load the database, etc. and then your choices and reasons.

Similarly, in testing you should actually say what words you tested and why, not just general statements. You should also include the output you got from those tests. Nothing in your report actually shows me that your program works."

"Your design section should report in more detail on your algorithms for the varous problems, especially reverse and so on, which is non-trivial. Design is not just about classes and their relationships. Similarly, you should say more about what you test."

"Your tests run OK, but print lots of messages. This is bad practice. Unit test should normally print nothing, certainly if they pass."

To add to Steve's comments.

0. The marker is not a mind reader.

One of the first pieces of advise I give is that: 'I am not a mind reader'. If it is not mentioned in the report, nor mentioned explicitly then what the student presents is open for interpretation.

1. Report Writing Tool Use, Styling, and Templates

I always advise students in the first year to learn how to use their report writing tool of choice properly, I do not care if it is Word/Writer/Pages/LaTeX/Org Mode/Markdown/AsciiDoc, please learn how to use the tool effectively. This includes using cross-referencing, listings, styles, maths input, figures, bibliography /et cetera/. The default style in LibreOffice doesn't have a code/listing style IIRC.

With that advice, I also advise students to create a report template with the styles required for document elements to be predefined [1]. WIth important elements such as: titles, headings, dates, figures, listings, header/footer /et cetera/. I also advise the creation of a set of default headings for said template. This saves effort in the long run with report generation.

2. Coding Project/Documentation Templates.

I also advise the creation of project templates for coding practicals. These templates contain project boiler plate that is common to all coding projects for example, directory layout, build files, readme files, build instructions, user guides /et cetera/. Such that for each project only a little effort was required to set up the project, and the rest was project development related.

The aim is that with a little effort up front more comprehensive and detailed coding projects can be developed and submitted.

3. Technical Writing

Grammar and Spell checking is something I expect them to do. I do not care which dictionary is used for spell checking (i.e. GB/US/CA/AUS/SA) but at least be consistent in spelling. I sometimes give my students links to the Oatmeal's instructions for common grammar mistakes [2].

Avoid colloquialisms.

Reports can have enclosures (attachments) as well as appendices. Avoid providing complete source files as an appendix in the report, provide them as enclosures. Use code snippets to highlight interesting code samples instead.

Watch out for weasel words, and verbose writing style. I generally point students to the Day and Gastel book for this [3,4]. This book also has advice on the contents of a technical report, and when to add tables and graphs and when not too.

Use fonts appropriately: code in a monospace font (10pt), regular text (10/12pt) in a sans-serif font, headings can be sans--10pt. Latin phrases should be emphasised /et cetera/.

Code listings and terminal output should be included as text in a monospace font and maybe wrapped in a Listing.

4. Status/Results Paragraph

A brief paragraph about project status is good to have at the start of the report, including if the extensions have been attempted and what you have not done. This helps me identify the scope of my marking, and the claims of what you have and have not done.

5. Testing Methodology

When it comes to encouraging students to describe their testing methodology. I advise my students to tell me not 'how they tested their programs', but 'what they tested their programs for'. This is to encourage them to think more about what testing their should involve.

I also advise that salient details should be in the main body of the report, and any detailed descriptions (tables and data) within an appendix.

6. Think about the Markers

When submitting practicals, think of the poor marker who has to assess your work. If you can make their life easier in presentation of your work it can go a long way. For instance, easy means to run and test the program, and status paragraphs. An example would be the Fox and Hounds practical from CS1002 last year. This was tedious to mark, especially, if you had to play the game to determine if the game logic was actually sound. Question to the students would be: 'How could they make it easier for me to mark this?'

7. Avoid Abuse of EasyIn, and write Pure Programs.

This follows on from one of Steve's comments regarding IO/Logic portions of their practical. Obtaining data and using data should be kept separate. I see many students not treating their data structures purely, especially after the introduction of `EasyIn.java`, and thus do not think about providing an API (or programmatic interaction) for their data structures. Some students think early on that `EasyIn.java` is the only way to 'give' information to their programs. Which leads to some students incorporating the UI code directly inside their data structures rather than outside. For students this makes programmatic testing harder, and some students will spend a decent amount of time interacting through a console instead of an API.

Jan

[1] https://help.libreoffice.org/Writer/Templates_and_Styles
[2] http://theoatmeal.com/tag/grammar
[3] http://www.amazon.ca/Write-Publish-Scientific-Paper-Edition/dp/0313391971
[4] https://www.cs.umd.edu/~nspring/software/style-check-readme.html

Some great advice, let me just add a couple of thoughts.
Shipping code wins. Always. Spending time finessing/purifying/re-factoring/cleansing a solution when it could be out adding value to customer's lives is wasteful. Even shipping the wrong thing earlier can help you get to the right thing sooner.
The user doesn't care what the code looks like, whether it has neat indentations or is factored in a way that is easily maintainable or whether there are goto statements, whether you had tied goat at your desk, what development methodology you practiced. They care about their world and how the stuff we write makes their world better.
Good enough is *always* good enough. Pick your battles, don't over-optimise and know what needs optimised.
Get it right the first time. Write the code with an audience of peers in mind - are you be proud of it?
Don't judge people based on code. You don't know under what conditions the code was written, there was likely an idiot manager forcing the developer to compromise.
On existing code, if it is working, don't fix it. Even if it looks wrong, assume the developer had reasons to write it that way. Don't touch it unless you really have to.
If you really have to change an existing implementation, be careful - that intractable code that you are refactoring likely includes several bug fixes that cover edge cases that were discovered a couple of days before ship. Assuming you can do a better job is a sure way to repeat history.
Methodology is not religion - it will not solve any problem you have. If you follow it pragmatically, the best it will do is provide a framework in which you can deliver a project within a team (but there is no guarantee of success).
Loosely coupled systems over tight coupling, I cannot think of one case where this wouldn't be preferred.

Oh, and finally, shipping code wins. Always.
Adrian

I think choosing technology/platforms wisely is quite a big one.
Also timing is important.
For a brand new open source library or new language maybe only use it in your spare time until it is a bit more established. Early adopters are necessary but you have to choose carefully when to jump in.
For a brand new OS or device you want to be using it from day one so that you are seeing what customers will soon see.
But for production code it's often best to let others take the time to find bugs and let the platform stabilise before you start using it. Using more established technologies means there is more chance of commercial support, you get better documentation, stack exchange has FAQs in it, etc. etc.
You can also take that too far and keep using stagnant platforms for far too long.
Like everything single thing in engineering it's a trade off.
Other people have covered programming best practices in detail. I just wanted to go a bit more meta-level.

Neil

If you're talking web services then I'd suggest you look at The Twelve Factor App

Nick

Avoiding globals is a good one. Perhaps one way to think about this is in terms of what happens when someone needs to use your code with two different configurations in the same application at the same time. Each instance's configuration needs to be independent.

Encapsulation is important. A fair bit of this is about avoiding hard-coding tight coupling between implementations. Ideally data is exchanged between classes through well-defined interfaces so that the implementations can be changed over time. When one class can reach into another, that other class now has a dependency on the class that uses it–you can't change how it works without also re-designing the caller.

Unit test, unit test, unit test. Every time I write a body of unit tests, I catch some mistake or oversight. Most of these are rainy day scenarios, but sometimes they're unexpected behavior. One other benefit of unit testing is that it often reveals opportunities to improve factoring. If you're having trouble writing the unit test, you may have a class or method that's doing too much. By breaking the problem down a little more, often you improve the readability, the simplicity, the reusability of the sub-parts, and the testability.

Review each other's code. The discussions that come up will help everyone improve. It's not that there's necessarily a right way, but the back and forth with colleagues about the approach and alternatives can be a great way for both parties to learn.

Always appropriately manage resources. In C++, this means using shared pointers, auto pointers, and stack based resource release techniques. In Java, this means aggressively closing file handles and other system resources when they are no longer needed.

Be clear about the ownership and lifespan of objects. This is most important when reference counting is not used, but it can apply to any language. One shouldn't use an object after it has been cleaned up. In C++ this can happen when an object has been deleted and a pointer to it is still in use. In Java, this would more likely occur with JNI or when an object has undergone a state change to make its use no longer appropriate, such as removing an element from a DOM.

Have a good workshop, Andrew

One of the bibles on the Design Patterns is: http://www.amazon.com/Design-Patterns-Elements-Reusable-Object-Oriented-ebook/dp/B000SEIBB8
-Lee

I've resisted replying up to now because I could rant on for a very long time, so just a few things.

1) Keep functions small and accurately named. The better you are at this the more your code will read like a story and the easier it will be for others (and you) to understand.
2) Think about your class hierarchy, keep it shallow - composition works better than inheritance in a lot of cases.
3) Make class data private if possible, and don't break encapsulation by providing public getters and setters for data without a good reason to do so.
4) Make member functions as private as you can get away with and final unless they need to be virtual.
5) Don't put instance data in interfaces (interfaces are for function declarations).
6) For cross platform code avoid writing #ifdef mac/windows/whatever as much as possible. Your code will be an unreadable mess and it doesn't scale as the number of platforms supported increases. Make interfaces cross platform and provide platform specific implementations in separate files instead.
7) Write code once if possible - if you find yourself writing the same code twice (or cutting and pasting some code) use a function instead.
8) Format (indent, layout) your code in a standard way for the language you are using. If it doesn't look like c++/java/whatever from a distance you're just making it harder for everyone else to read.
9) Never ever write 'goto' in code. (if you ever even contemplate using longjmp and setjmp then software engineering is not the career for you).

Cheers,
Peter

1. Make your dependencies explicit. If you need something, make the caller pass it in (either in a constructor, or in a method being called). Without this you have to get your dependencies from somewhere, and before you know it you have singletons, global functions and statics everywhere, and your code is extremely hard to reason about and completely untestable. Explicit dependencies also make it easy to spot questionable design decisions - if you have to pass 12 dependencies into a class then it's very clear that you have a design issue. If you aren't explicit about dependencies you can have code that reaches out and gets 12 dependencies, the design is still poor, but it's much less obvious (which is a bad thing).

2. Follow the principle of least astonishment - try to write code that wouldn't surprise the caller. E.g. name things to describe what they do, don't do things that a caller wouldn't expect. For example, if you have a method called "GetLoggingFolder()" then the caller would expect it to return a path to a folder that they can log to, they wouldn't expect it to create the folder on disk, possibly fix permissions on it, set up some global state critical to the application, and then return the path to the folder. (Guess where that example came from...)

3. Assume that no one will read your comments/API documentation. If you find yourself writing a very long comment explaining all the different things a method might do, all the things that you have to do before calling it, what you have to do after calling it, etc. then you're probably doing something wrong. Ask yourself if there is a way that you can redesign the code that doesn't need as much explanation.

4. Make it easy to use your API correctly, and hard to use it incorrectly. Every time you add an assertion to check for API misuse, ask yourself if you can make it harder/impossible to misuse your API instead. E.g. if you write a method that takes a pointer to an argument and then asserts that it isn't null, maybe you could take a reference instead? Or maybe you can use a compiler annotation to ensure that the pointer isn't null. If it doesn't make sense to call Foo::doX() until Foo::initialise() has been called, then maybe you can change your API to only give out initialised Foo objects? If you have an API that expects a URL encoded string, can you make it a compiler error if someone passes in a string that isn't URL encoded? Use the type system where possible, failing that, use augmentations like compiler specific annotations.

5. Don't overuse strings. A URL is not a string, a filesystem path is not a string, a Base64 encoded string is not a string, etc. They can all be represented by strings, but try to use different types if possible (and not just typedefs, which boil away to nothing in C++). Otherwise before you know it your passing URLs into things that expect paths, and at that point you might as well not be using a statically typed language.

6. Lean heavily on your tools. If you are writing code in a statically typed language then you're gaining type safety, possibly at the cost of productivity. If you're turning your warning level right down and not running static analyzers etc. on your code then you're missing out on a lot of that safety, and just losing productivity instead. Get your code compiling without warnings, turn on warnings as errors, and turn the warning level right up. Fix warnings as you hit them, suppress any that are bogus (though don't suppress warnings lightly). Do the same for any static analyzers/linters that you can find.

7. Write tests as you develop the code, or ideally before. Your tests are the first client of your interface, if you can't write easy to understand tests for your new code then your interface is poor.

8. Design systems in layers, so that they can be composed, layers can be swapped out, etc. Try to keep your layers simple - the less they do the less likely you are to want/need to replace them at a later date.

1. A method has only 1 return statement (except for find-type methods which can have a quick exit in the loop). This makes use of the debugger easier and keeps methods small.

2. I call the variable I return from the return statement 'result' no matter what. Again, I think this encourages small single-purpose methods where the subject is implicit.

3. I do *everything* I can to eliminate conditional code. Every time you branch the logic you double the surface area to be tested. Polymorphism and small methods are your friend.

4. Lots of indentation (> 2 levels) is a bad smell - I would tend to extract the indented code to another private method.

5. If you find methods that don't rely on the enclosing objects state, they probably belong elsewhere (as instance methods on another class).

6. Avoid null like the plague - it causes more of 3). If a method returns a collection of things, don't return null, return an empty collection (preferably immutable).

7. Coupled with 6) - use preconditions to alert integrators of your expectations. This prevents the proliferation of nulls.

8. Use interfaces at subsystem boundaries, and keep them simple.

9. Prefer meaningful method names over comments (works well with the many, short, methods approach)

10. Log things that should never happen as warnings.

11. Avoid mono-static singletons, prefer a registry of instances that can be gotten by name. These simplify testing.

12. One of my pet peeves; If I see a class called Constants, I'm on high-alert :-) Constants should be an enumerated type, or exposed on the domain object to which they relate IMHO.

13. Use the right tool for the job. Know a dynamic language (preferably ruby) well!!