Living documentation

Introduction

Over the course of the last few years I have seen many papers documenting various systems. Needless to say, every printed one was obsolete in one way or another. The first way a document becomes obsolete is functional: during the system’s lifetime so many changes are made to the production environment that, if the system failed and had to be reinstalled from scratch by following the original documentation, the result would be considered non-functional because so many components would no longer work as they did before the crash. The other way a document becomes obsolete comes from its static (printed, remember?) nature: what good is documentation that instructs you to install obsolete versions of software with known and already successfully exploited security holes?

Let me try to explain how I believe we should mitigate the shortcomings of static documentation mentioned above.

Possible solution

Currently the only possible solution that seems viable is living documentation. What is that?

Living documentation is a system of various documents, source code documentation, configuration documentation etc. with the ability to track changes over time. The ultimate goal of living documentation is to describe the system as it CURRENTLY is. Not as it was at the release date a couple of years ago. No, what is important is the state of the system NOW.

How to achieve

I shall throw in here some of the buzzwords of late: test-driven development, continuous integration, VCS, configuration management systems, in-line source code documentation, API documentation generators… We will discuss each of these in an appropriate context.

But first, we can divide the documentation that describes a certain system into several categories:

  • main documentation
  • source code documentation
  • source code changes
  • application configuration
  • application configuration changes
  • inter-system access credentials
  • systems configuration
  • systems configuration changes
  • descriptive VCS commit messages

a) Main documentation

This should be the primary source of documentation for people who are unfamiliar with the system. It should give them a basic overview of the whole system, then point them to where they can find additional information or in-depth explanations. These pointers can be anything: other documents, VCSs, remote URLs. Online collaborative solutions (wikis and such) seem best for this particular purpose. Static documents in binary or proprietary formats should be avoided.

b) and c) Source code documentation and changes

All source code documentation should be as close to the source code as possible. Therefore, if you use (for example) git as the VCS for your source code, put the documentation right next to the source code itself. If you do not, the gap between source code and documentation will only widen, because developers and sysadmins are lazy by nature :). In reality, maintaining documentation right next to the source code takes far less additional time than updating documentation that is located elsewhere. Source code documentation can be split into the following subcategories:

  1. Main documentation or overview
  2. Examples
  3. API documentation
  4. Descriptive method and variable names
  5. Inline comments

Main documentation is a set of documents (preferably plain text, or at least not a binary format, for easier change tracking). It should provide a brief overview of the source code structure and functionality. It should also contain detailed installation instructions. This documentation is primarily intended for people who use the code.

Examples are one of the best code documentation strategies, both for people who want to use the software and for those who are just familiarizing themselves with it in order to start improving it. These examples can and should be combined with automated code tests (discussed below).
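One way to combine examples with automated tests can be sketched as follows, assuming Python and its standard doctest module. The function slugify is hypothetical and exists only for illustration; the point is that the examples in the comment are executed as tests, so they cannot silently drift away from the code.

```python
import doctest
import re


def slugify(title):
    """Turn a document title into a URL-friendly slug.

    The examples below are documentation and tests at the same time:

    >>> slugify("Living documentation")
    'living-documentation'
    >>> slugify("  API   docs, v2!  ")
    'api-docs-v2'
    """
    # Keep only runs of lowercase letters and digits, joined by dashes.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


if __name__ == "__main__":
    # Runs every example found in the docstrings and reports mismatches.
    print(doctest.testmod())
```

If someone changes slugify without updating the examples, the doctest run fails, which is exactly the kind of early warning static documentation cannot give.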

Specific parts of API documentation should be placed right next to the source they describe, and the whole API documentation should later be generated from them (see phpDocumentor for example). The thing is, if you have a method definition and just above it there is a comment which describes what the method does and what its parameters and return value are, there is a higher probability that the comment will match the method than if the documentation lives in another file, which you have to explicitly open and check for inconsistencies. These comments can be effectively used to generate the whole API documentation, which can then be used by system integrators who want to interact with the system.
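To illustrate the idea of generating API documentation from comments that sit next to the code, here is a minimal sketch in Python using the standard inspect module. The Registry class and the render_api_docs helper are hypothetical names made up for this example; real generators (phpDocumentor, or pydoc in the Python world) do far more.

```python
import inspect


class Registry:
    """Hypothetical client, used only to illustrate API comments."""

    def register(self, domain, years=1):
        """Register a domain name.

        domain: fully qualified name to register.
        years: registration period in years.
        Returns True when the registration succeeds.
        """
        return bool(domain) and years > 0


def render_api_docs(cls):
    """Collect signatures and docstrings of public methods as plain text."""
    lines = [cls.__name__, inspect.getdoc(cls) or "", ""]
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        lines.append(f"{name}{inspect.signature(member)}")
        lines.append(inspect.getdoc(member) or "(undocumented)")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_api_docs(Registry))
```

Because the description lives directly above the method body, whoever changes the method sees the comment at the same moment, and the generated output follows automatically.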

Furthermore, by my estimate 60% or more of inline documentation needs can be avoided by using descriptive names for methods and variables. It takes some more typing, but this is a non-issue now that IDEs with auto-complete functionality are available. Inline comments should be there to describe those parts of the code which are not intuitive, which exist for some external reason that is not evident from the software design itself, or which you think the person modifying the code in the future should be aware of. Inline comments should contain the most intimate documentation about the code.

Everything above was said with the objective of keeping source code documentation as current as possible. What is missing is a package of the whole documentation (without descriptive names and inline comments). There should be a method of gathering the whole documentation and packing it into a single archive, with examples included, so that it can be delivered as a standalone package, included in a distribution or made accessible online. This package should also contain a change log, which is already kindly provided by the VCS.
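A packaging step like this can be quite small. The following is a minimal sketch, assuming the documentation lives in one directory of a git repository; the function names and the CHANGELOG.txt file name are my own choices for illustration, not an established convention.

```python
import io
import subprocess
import tarfile
from pathlib import Path


def read_change_log(repo_dir):
    """Return the VCS history as text; assumes repo_dir is a git repository."""
    result = subprocess.run(
        ["git", "log", "--date=short", "--pretty=format:%ad %h %s"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_docs_package(docs_dir, change_log, archive_path):
    """Pack docs_dir plus a generated CHANGELOG.txt into one .tar.gz file."""
    docs_dir = Path(docs_dir)
    data = change_log.encode("utf-8")
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps the paths inside the archive relative.
        archive.add(docs_dir, arcname=docs_dir.name)
        # The change log is added from memory; docs_dir is left untouched.
        info = tarfile.TarInfo(name=f"{docs_dir.name}/CHANGELOG.txt")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return Path(archive_path)
```

A typical call would be build_docs_package("docs", read_change_log("."), "docs.tar.gz"), run as part of the release procedure so the package never has to be assembled by hand.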

d) and e) Application configuration and changes

You should never commit into your VCS repository a configuration file that is specific to a certain production or development installation of your software. You should provide example configuration files with the distribution (and they should be under VCS control and always in sync with the software). If the application configuration does not change much (or at all), daily backups of the whole system will provide adequate configuration storage and change control, provided that you describe why something is configured the way it is in the configuration files themselves. If more thorough configuration management is required, a separate repository or a separate branch in the original software repository should be used. But keep in mind what follows in the next section.
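The split between a versioned example file and an unversioned installation-specific file could look like the sketch below. It assumes INI-style files and Python's standard configparser module; treating the example file as the source of default values is my own design assumption, and the file names in the usage note are hypothetical.

```python
import configparser


def load_config(example_path, local_path):
    """Load defaults from the versioned example, then local overrides.

    example_path: example configuration shipped with the software (in VCS).
    local_path:   installation-specific file that is NOT committed.
    """
    parser = configparser.ConfigParser()
    # Files are read in order, so values from local_path win.
    # ConfigParser.read() silently skips files that do not exist.
    parser.read([example_path, local_path], encoding="utf-8")
    return parser
```

For example, load_config("app.example.ini", "app.local.ini") returns the example values on a fresh checkout and the overridden values on a configured installation, while only app.example.ini ever appears in the repository history.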

f) Access credentials in configuration files

There may be various sensitive access credentials in production configuration files, so you should choose your system of configuration file change control carefully. Access to source code and access to inter-system credentials are not, and should not be considered, the same thing. Access to production credentials (for example the IP address, username and password for accessing the .com registry) should be strictly controlled.
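One possible safeguard, sketched here under my own assumptions (a POSIX system, credentials kept in a separate JSON file outside the source repository), is to refuse to load a credentials file that anyone other than its owner can access. The function name and file layout are hypothetical.

```python
import json
import os
import stat


def load_credentials(path):
    """Load credentials from a JSON file accessible by its owner only."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any permission bit for group or others counts as too permissive.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(
            f"{path} is accessible by group or others (mode {oct(mode)})"
        )
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

This does not replace proper access control on the systems themselves; it only makes an accidentally world-readable credentials file fail loudly instead of going unnoticed.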

g) and h) Systems configuration and changes

Just like your source code should be kept in a VCS, so should your systems configuration. There are various tools for this task (http://en.wikipedia.org/wiki/Comparison_of_open_source_configuration_management_software). The VCS should include “solid” documentation which describes how to bootstrap the whole system, and then how changes are and should be automatically applied and systems brought to the latest production state. This also eliminates the manual work that is needed to keep individual systems in sync. Of course, manual changes to individual systems should also be logged; there are various solutions for this (https://github.com/bostjan/alog etc).

Configuration of networking equipment should also be kept in a VCS and backed up, if possible. Any decent piece of networking equipment offers CLI or API access for exporting its configuration and relevant data, so this should not be a problem.

The whole “syntactic environment description” should be contained in VCS, as said before. This gives us a nice change log to consult in the event of failure and also facilitates automated testing (described below).

i) Descriptive VCS commit messages

I cannot stress this one enough. The more descriptive your commit messages are, the easier it will be for you and the other people involved in a project to track changes. Unless everybody keeps this in mind, you have very slim chances of succeeding. Also, a commit message should describe exactly what has been done by that particular commit. This implies that commits should be fairly small, or should touch only one relevant part of the system. You should avoid single commits that include solutions for multiple completely unrelated problems.
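Part of this discipline can be checked automatically. The sketch below shows one possible check; the list of vague subjects and the three-word minimum are arbitrary assumptions of mine, not an established standard. With git, a function like this could be called from a commit-msg hook, which receives the path of the file containing the message.

```python
# Subjects that say nothing about what the commit actually does.
VAGUE_SUBJECTS = {"fix", "fixes", "update", "updates", "changes", "wip", "misc"}


def check_commit_message(message):
    """Return a list of problems; an empty list means the message passes."""
    lines = message.strip().splitlines()
    subject = lines[0].strip() if lines else ""
    if not subject:
        return ["empty commit message"]
    problems = []
    if subject.lower().rstrip(".!") in VAGUE_SUBJECTS:
        problems.append("subject is too vague")
    if len(subject.split()) < 3:
        problems.append("subject should have at least three words")
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between subject and body")
    return problems
```

A check like this cannot judge whether a message is truthful, so it complements code reviews rather than replacing them.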

Exportability

Some parts of the documentation described above should be easily exportable into a chosen portable document format (e.g. PDF or HTML; the wording is unintentional), especially if they have to be provided to third parties who do not have access to all the VCSs used as described above.

Test-driven development

The system of documentation described above implies a certain fluidity in the systems. Not that this was not the case before, but here we actually try to deal with it in an effective manner. The consequence is that test-driven development is a necessity.

There should be tests for just about every feature. These tests should be executable in an automated fashion, started by predefined triggers (each VCS commit) or on a daily schedule, with bisection if something fails. Of course, we are talking about continuous integration. The changes should be “immediately” deployed to a testing environment and the tests run against them. If something fails, notifications should be sent out to the appropriate people.
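The bisection step mentioned above is a plain binary search over the commit history. Here is a minimal sketch, assuming the history is linear, the oldest commit in the list is known to pass and the newest is known to fail; the function and parameter names are hypothetical. In practice git offers this out of the box as git bisect run.

```python
def find_first_bad_commit(commits, passes):
    """Binary-search an ordered commit list for the first failing commit.

    commits: list ordered oldest to newest; commits[0] is known good
             and commits[-1] is known bad.
    passes:  callable returning True when the tests pass on a commit.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if passes(commits[middle]):
            good = middle  # the breakage happened after this commit
        else:
            bad = middle   # the breakage happened here or earlier
    return commits[bad]
```

With eight commits only three test runs are needed to locate the offending one, which is why small, single-purpose commits pay off: the commit that bisection points at is then small enough to understand at a glance.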

Automated deployments

Deployments to staging and production environments should be fully automated. I can say no more here.

Periodic documentation reviews

This becomes necessary for those parts of living documentation that are more static than others (overviews, etc). They tend to get out of sync with the rest of the documentation, and there is nothing you can do about it other than periodically review those parts and update them if necessary.

Obstacles

There are various obstacles that will stand in your way if you decide to use living documentation in your environment:

  • it is not a new format of documentation, it is a new way of thinking, a different concept of documentation: some people will cling tightly to the notion of documentation being an MS Word document with everything in it. These people have probably never been responsible for maintaining slightly different versions of a system for various clients. Maybe they should taste this for a while to learn why static documentation and empty commit messages will not do.
  • time required: keeping tests, in-line documentation and overviews in sync with the real thing does take some additional time. If people are time constrained, it is very hard to convince them that documentation is important even though it does not produce any functional result.
  • dislike of trackability and reviews: this is probably the most important one. The people involved must have a positive attitude towards recording who did what and why. Code reviews belong in this category. This does not mean code reviews only when something goes wrong, but also, and especially, before code goes to the staging environment. This should have a positive impact on responsibility because it will encourage the people involved to write error-free code. If you succeed in raising the general level of awareness about these three things (trackability, responsibility, reviews), you are probably on the right track to uninterrupted sleep during the night.

Conclusion

I hope I have done a decent job of outlining what I think living documentation is all about. It should be a combination of tools and policies in which information about the systems it describes is as accurate as possible, changes are traceable, and all of this is achievable with as little effort as possible.

Please feel free to respond with your questions and/or comments below.
