Code Commenting And PHP Documentation Generation

Why do we need comments in code? How should we write them? Where are they necessary and where are they not? How do you comment code correctly? How do you establish a single documentation style for all members of a team? What tools exist for generating documentation? I will try to answer these questions and share my thoughts on the subject.

There are two types of program documentation. The first is comments written in the code itself. The second relies on a third-party tool or repository, for example a wiki engine, which describes how the application works, gives usage examples and explains how modules interact; it can also include flowcharts and diagrams, in short everything that you can’t write in code.

Documentation Location Variants

Let’s start with documentation kept outside the program code, although it is not the main subject of this article. In open source projects we often see that documentation articles are kept in the same repository as the code. For example, in the PHP fake fixtures library the documentation is in the README file, and if you want to read it to the end, you’ll have to scroll quite a bit. In Guzzle, a popular HTTP client for PHP, usage instructions are kept in a separate docs folder. Keeping documentation close to code is handy: once you download the vendor package, you have both code and documentation. If your library is small and stable, and you don’t expect frequent API changes that would force constant documentation rewrites, you can safely place the documentation in the repository of your project.

But everything has its reasonable limits. Suppose you are planning to create your own framework, written by a team of developers with regular releases; the framework has to be fully documented and, what is more, the documentation must be translated into several languages. In this case putting documentation into the project repository is not a good idea. Constant corrections, updates and translation fixes will be routine, and they will produce a large number of commits that clutter the history of the project. Navigating a commit history in which code changes are lost among documentation changes is difficult and uncomfortable.

In this case it is better to create a standalone repository for documentation, as was done for Symfony. GitHub, GitLab and Bitbucket also provide a built-in wiki tool that is attached to the project rather than being a standalone repository. You can still access it via Git, which means you can clone all the documentation, edit it in the editor you prefer, group modifications into commits, push them to the server and pull new changes.

Here’s an example of a well-organized wiki: the one for the D3.js visualization library. Of course, it is possible to create a web site for a product and to put its documentation there. But if you use one of the methods described above, you will be able to generate documentation web pages from your Git or wiki repository; there are tools for that. If you prefer all-in-one solutions, take a look at Confluence by Atlassian. It has many more features than a wiki engine.

Commenting Code Inside The Code

Now let’s get back to documenting code in the code itself. I am writing this article based on my own experience, but I have recently read “Clean Code” by Robert Martin, so I’m going to cite this book where it’s relevant. Martin’s first message is that a comment is a sign of failure: comments are written to compensate for a programmer’s failure to express an idea clearly in the programming language itself. Writing expressive code is a very broad topic that goes far beyond this article. But let me share a trick for writing really good code: write it so that it reads like sentences. In my view this is much easier in object-oriented programming than in functional programming, because the widespread practice of naming classes with nouns and methods with verbs makes code look more natural. For example, we have a rabbit, so let’s describe some of its basic functions as an interface:

interface RabbitInterface
{
    public function run();
    public function jump();
    public function stop();
    public function hide();
}

We simply create an object of the Rabbit class:

$rabbit = new Rabbit();
$rabbit->run();
$rabbit->stop();

The code is easy to read. The run method makes the rabbit run; the stop method is an intuitive command that ends the previous action, so the rabbit stops. Now let’s teach our rabbit some tricks and make him run a fixed distance, which we pass as a parameter to the run method.

$rabbit->run(100);

And he ran... But we can’t tell what 100 means. Does this number mean minutes, metres or feet? It could be fixed with a comment:

// The rabbit has to run 100 metres
$rabbit->run(100);

If the rabbit “runs” in several places in your code, each of them needs an additional comment. The comments will be duplicated and will have to be maintained in several places at once. The first thing you can do to avoid comments is to replace the number with a variable.

$metres = 100;
$rabbit->run($metres);

In this case you don’t need a comment: the code becomes a little more readable and you can see in the code itself that the rabbit runs 100 metres. But the best option is to add context to the method name.

$rabbit->runInMetres(100);

Rabbit is a noun, run is a verb, and in metres is the context that we add to the method name so that it captures the essence. Other methods can be named following the same scheme.

$rabbit->runInSeconds(25);
$rabbit->runTillTime(new \DateTime('tomorrow'));
$rabbit->runTillTheEndOfForest($sherwood);

They capture the essence of the method without additional comments. Just give your variables and methods correct names and you will reduce the number of unnecessary comments in your code. Robert Martin gives good advice on this point:

Don’t spend your time writing comments that explain the mess you have made; spend it cleaning up that mess.

What if the comment is too long? How do you turn it into a method name? You shouldn’t be afraid of long method names. A name should be long enough to capture the essence, but not so long that it turns into unreadable text. These methods are OK in this regard:

$rabbit->runUntilFindVegetables();
$rabbit->runForwardAndTurnBackIfMeet([$wolf, $hunter]);

This is too much:

$rabbit->runForwardUntilFindCarrotOrCabbageAndTurnBackIfMeetWolfOrHunter();

This method name is hard to read, and it points to a flaw in the design. It can be refactored, for example, like this:

$conditions = new Condition();
 
$untilCondition    = (new Condition\Until())->findVegetables('carrot', 'cabbage');
$turnBackCondition = (new Condition\TurnBack())->ifMeet('wolf', 'hunter');
 
$conditions->add($untilCondition)->add($turnBackCondition);
$rabbit->run(Direction::FORWARD, $conditions);

There are also exceptions regarding the length of method names. For example, when you write specifications for phpspec, the length of a method name is practically unlimited; the main thing is that the name captures the essence. Here’s a code example from the phpspec documentation:

class MovieSpec extends ObjectBehavior
{
    function it_should_have_john_smith_in_the_cast_with_a_lead_role()
    {
        $this->getCast()->shouldHaveKeyWithValue('leadRole', 'John Smith');
    }
}

In specifications underscores are used in method names, which makes long sentences easier to read. This doesn’t comply with the PSR standard, which requires camelCase, but it is acceptable for the sake of test readability.

A sense of the appropriate length for method names comes with time and experience. You can also look at examples from popular frameworks and libraries.

Comments Characteristics

Out-of-dateness

Programmers very often change code but forget to change the comments, especially when several programmers work on the same section. The comments exist, but they were written by one programmer, and the others don’t dare to change them, are too lazy to do so, or just don’t pay attention. As a result, these out-of-date comments only confuse a newcomer. The solution is quite simple: either keep comments up to date, which requires attention and effort, or simply delete the old ones. No comments are better than outdated comments.

Redundancy

A redundant comment is written where everything is clear without it. Here’s an example of code with superfluous comments:

// Cut the carrot into 4 pieces
$piecesOfCarrot = $carrot / 4;
// Let the rabbit eat all pieces of carrot one by one
foreach ($piecesOfCarrot as $pieceOfCarrot) {
    $rabbit->eat($pieceOfCarrot); // Rabbit eats the piece of carrot
}

If we remove the comments, the code remains just as clear:

$piecesOfCarrot = $carrot / 4;
foreach ($piecesOfCarrot as $pieceOfCarrot) {
    $rabbit->eat($pieceOfCarrot);
}

Incompleteness

While writing a program, you can quickly jot down an idea by placing a comment in the code. When you later get back to that piece, the comment will remind you of your thought and you will be able to continue. Once the idea has been turned into code, you should either remove the incomplete comment or complete it. In other words, don’t make readers wonder what you meant. As an example, let’s start describing how the rabbit eats:

public function eat($food)
{
    switch ($food) {
        case 'carrot':
            $this->getCalories(50);
            break;
        case 'cabbage':
            $this->getCalories(100);
            break;
        default:
            // If the rabbit eats unknown food - it dies :(
            break;
    }
}

What does the comment about the rabbit dying mean? It is clear what happens in real life, but what about the program? What did the author want to do next? To release the memory taken by the rabbit? To throw an exception and handle it in another piece of code? In this code nothing happens to the rabbit: it simply doesn’t get new calories when it eats anything other than carrot or cabbage. For a newcomer who has to finish the code, the author’s idea will remain unclear. Most likely the newcomer will delete the comment and implement it in his own way.
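One way to remove the ambiguity is to turn the comment into explicit behaviour. The sketch below assumes that “the rabbit dies” means the object should reject unknown food by throwing an exception; this is only one possible reading of the author’s intent. The UnknownFoodException class, the $calories property and the totalCalories() method are hypothetical names introduced for the illustration.

```php
<?php

// Hypothetical exception type, introduced only for this illustration.
class UnknownFoodException extends InvalidArgumentException
{
}

class Rabbit
{
    /** @var int Calories gained so far (hypothetical property) */
    private $calories = 0;

    public function eat($food)
    {
        switch ($food) {
            case 'carrot':
                $this->calories += 50;
                break;
            case 'cabbage':
                $this->calories += 100;
                break;
            default:
                // The intent is now expressed in code rather than in a vague comment.
                throw new UnknownFoodException(sprintf('The rabbit cannot eat "%s".', $food));
        }
    }

    public function totalCalories()
    {
        return $this->calories;
    }
}
```

With this version the next developer no longer has to guess: unknown food is an error that the calling code must handle.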

Unreliability

Everyone makes mistakes, and programmers make them not only in code but also in comments. Because of inattention, tiredness or poor foreign language skills, comments end up confusing and misleading. Unfortunately, it happens. The only advice I can give is to take responsibility for your comments. If you decide to write something, write it correctly. Be a perfectionist when writing comments.

Non-obviousness

This is the case when a comment uses unknown or non-obvious terms.

// Uses coefficient of rabbit growing per day, which depends on several factors
$rabbit->growInSize();

Here we can see that the growth of the rabbit is determined by some coefficient, which depends on some factors. From this piece of code it is unclear what the growth coefficient is and how it is calculated. It’s better to remove this comment and place a more detailed one at the function itself.
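A more detailed comment at the function might look like the sketch below. The growth model is entirely made up for the illustration: the base rate, the “well fed” factor, the $size property and the $wellFed parameter are assumptions, not part of the original example.

```php
<?php

class Rabbit
{
    /** @var float Size in centimetres (hypothetical unit) */
    private $size = 10.0;

    /**
     * Grow the rabbit by one day.
     *
     * The daily growth coefficient is a multiplier applied to the current
     * size. In this made-up model it depends on two factors:
     *  - a base rate of 1% per day;
     *  - an extra 0.5% per day when the rabbit is well fed.
     *
     * @param bool $wellFed Whether the rabbit was fed today
     *
     * @return float New size
     */
    public function growInSize($wellFed = false)
    {
        $coefficient = 1.01 + ($wellFed ? 0.005 : 0.0);
        $this->size *= $coefficient;

        return $this->size;
    }
}
```

The call site stays a plain $rabbit->growInSize(), and anyone who wonders about the coefficient finds the explanation next to the calculation.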

So, We Shouldn’t Write The Comments At All, Right?

We should write them, but we have to take responsibility for them. There are crucial moments when they are necessary.

Informational Value

In some places comments are required: when it is necessary to explain an algorithm, or when the team had to resort to “hacks” in the code and needs to leave a note about it, describing why it was done, what it affects and when it has to be fixed. Even so, try first to choose correct names for your variables and methods.

Regular expressions make me go numb, and I have to spend a lot of time to understand them. In this case an informative comment won’t be superfluous.

// Find all rabbits in locations whose names
// end with: shire, field, wood
// or start with: yellow, green
// (case-insensitive)
// e.g. Blackshire, Greenfield, Sherwood, SHERWOOD, wood, Yellowstone
$locationsRegExp = '/\b(yellow|green)\w*|\w*(shire|field|wood)\b/i';
$rabbits = $search->findRabbitsInLocations($locationsRegExp);
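A comment like this is only useful while it stays true, so it is worth checking the expression against the examples it lists. Here is a minimal sketch using preg_grep(); the sample list repeats the names from the comment, and “Blackpool” is added as an assumed negative case.

```php
<?php

$locationsRegExp = '/\b(yellow|green)\w*|\w*(shire|field|wood)\b/i';

// The first six names come from the comment; "Blackpool" should not match.
$locations = ['Blackshire', 'Greenfield', 'Sherwood', 'SHERWOOD', 'wood', 'Yellowstone', 'Blackpool'];

// preg_grep() keeps only the elements that match the pattern;
// array_values() reindexes the result from zero.
$matching = array_values(preg_grep($locationsRegExp, $locations));
```

After this runs, $matching holds every name except “Blackpool”, which agrees with the description in the comment.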

Intentions

There are many ways to solve the same task in programming. Each programmer has his own style, which is why it can be difficult to read code written by somebody else. If you prefer a certain programming style, or if you know from practice that the algorithms you use are difficult to read, put some explanatory text before the complex piece of code.

Notifications and warnings

There are some cases when you can’t use a certain function, for example:

  • the necessary extension wasn’t installed in production or the vendor package wasn’t updated;
  • it takes too long to execute one of the functions and it is better not to launch it;
  • because of the high resource requirements you can’t run the loop more than a certain number of times.

If you face such a situation, comments will be very useful.
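A warning comment of this kind might look like the sketch below. The retry() helper, the MAX_ATTEMPTS constant and the reason given in the comment are all hypothetical; the point is only that the limit and its justification sit right next to the loop they protect.

```php
<?php

// WARNING: in the imagined production setup each attempt may block for
// several seconds, so the number of attempts is capped. Do not raise this
// limit without measuring the effect first.
const MAX_ATTEMPTS = 3;

/**
 * Call the operation until it succeeds or the attempt limit is reached.
 *
 * @param callable $operation Receives the attempt number, returns true on success
 *
 * @return int|false Number of the successful attempt, or false if all attempts failed
 */
function retry(callable $operation)
{
    for ($attempt = 1; $attempt <= MAX_ATTEMPTS; $attempt++) {
        if ($operation($attempt)) {
            return $attempt;
        }
    }

    return false;
}
```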

Emphasis

This applies when a certain line of code is so important that you have to draw extra attention to it. I once faced a problem where the encoding for multibyte functions wasn’t set on staging, and I spent a lot of time finding and solving the issue. When the problem was solved, I set the parameter manually in my code and added a comment explaining why:

// Set default encoding for MB functions manually to prevent cases when it is missed in config
mb_internal_encoding('UTF-8');

One more piece of advice, which Robert Martin quotes from Brian Kernighan and P. J. Plauger:

Don’t comment bad code, rewrite it.

When you come across unclear code, spend a lot of time trying to understand it and then leave several comments for the next developer, you should realize that the code itself hasn’t become any better. In this situation, once you understand the code, try refactoring it to make it more readable. As the Boy Scout rule says: “Leave the campground (code) cleaner than you found it”.
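As a small illustration of rewriting instead of commenting, the sketch below gives an unclear condition a name. The canRace() function and the array keys age and injured are hypothetical and exist only for this example.

```php
<?php

// Before: the condition needs a comment to be understood.
//
//     // Check that the rabbit is old enough and healthy enough to race
//     if ($rabbit['age'] >= 1 && !$rabbit['injured']) { ... }

// After: the condition has a name, so the comment is no longer needed.
function canRace(array $rabbit)
{
    return $rabbit['age'] >= 1 && !$rabbit['injured'];
}

$roger = ['age' => 2, 'injured' => false];

if (canRace($roger)) {
    $status = 'racing';
} else {
    $status = 'resting';
}
```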

Making Documentation With The Help Of DocBlocks

There is a separate kind of PHP comment that has its own standard: the DocBlock. The tool for processing docblocks is phpDocumentor (also known as phpDoc); it reads docblocks from code and generates documentation based on them. A DocBlock is a DocComment together with the descriptions placed in it according to the PHPDoc standard. PHP supports C-style multi-line comments:

/*
 * It is
 * a C-style comment in PHP
 */

A DocBlock is distinguished by an additional asterisk: it begins with /**.

/**
 * It is
 * a PHP docblock
 */

It is possible for DocBlock to have only one line but it must begin with /** nonetheless.

/** It is also a docblock */

The PHPDoc standard for documenting PHP code is based on Javadoc for Java. Important components of a DocBlock are tags and annotations, which make comments semantic. A tag or annotation begins with @, for example:

/**
 * Login via email and password
 *
 * @param Request $request Request
 *
 * @return Response
 *
 * @throws BadRequestHttpException
 * @throws UnauthorizedHttpException
 *
 * @Rest\Post("/login")
 */
public function loginAction(Request $request)
{
}

In the example above @param, @return and @throws are PHPDoc tags, and they will be parsed by phpDocumentor. @Rest\Post("/login") is an annotation for FOSRestBundle. The difference between annotations and tags is that tags only document PHP code, while annotations add to or change its behaviour. Unlike Java annotations, which are part of the Java language, PHP annotations are comments, and to read them you have to use reflection. Maybe in the future annotations will become part of PHP, but currently you need a special parser to read them. It’s also worth noting that if we change the beginning of a docblock from /** to /*, it is no longer a docblock, even if it contains tags or annotations, and the parser will ignore it.
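The difference between /** and /* can be observed with PHP’s built-in reflection API. In the sketch below the Example class and its two methods are hypothetical; ReflectionMethod::getDocComment() is the standard call that returns the docblock text, or false when there is none.

```php
<?php

class Example
{
    /**
     * A real docblock.
     *
     * @return string
     */
    public function documented()
    {
        return 'ok';
    }

    /*
     * An ordinary comment, even though it contains a tag.
     *
     * @return string
     */
    public function notDocumented()
    {
        return 'ok';
    }
}

// getDocComment() sees only comments that start with "/**".
$documented    = (new ReflectionMethod('Example', 'documented'))->getDocComment();
$notDocumented = (new ReflectionMethod('Example', 'notDocumented'))->getDocComment();
```

Here $documented contains the full docblock text, while $notDocumented is false; tools that rely on reflection have nothing to parse in the second case.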

Docblocks are so widely used in the PHP community that PSR-5 (a PHP Standard Recommendation) is being prepared on their basis. At the time of writing it was still a draft.

With docblocks you can document the following elements in PHP:

  • functions;
  • constants;
  • classes;
  • interfaces;
  • traits;
  • class constants;
  • properties;
  • methods.

It is also important that each docblock applies to exactly one structural element. This means that each function, variable and class has its own docblock.

/**
 * Rabbit Class
 *
 * @version 0.1.0
 */
class Rabbit implements RabbitInterface
{
    const STATUS_RUNNING = 'running';
 
    /**
     * @var string $status Status
     */
    private $status;
 
    /**
     * Set `running` status for the rabbit
     *
     * @return $this
     */
    public function run()
    {
        $this->status = self::STATUS_RUNNING;
 
        return $this;
    }
}

There are many tags in PHPDoc, but not every tag can be applied to every structural element. Below is a list of the existing tags with their scope and an explanation.

  • @api (method) defines the stable public methods, which won’t change their semantics up to the next major release.
  • @author (in any place) defines the name or an email of the author who wrote the following code.
  • @copyright (in any place) is used to put your copyright in the code.
  • @deprecated (in any place) is a useful tag which means that this element will disappear in future versions. Usually it comes with a comment saying what you should use instead. Also, most IDEs highlight places where deprecated methods are used. When it is time to clean out-of-date code for a new release, it is easy to search by this tag.
  • @example (in any place) is used for inserting a link to a file or a web page that shows an example of code usage. Currently phpDocumentor states that this tag is not fully supported.
  • @filesource (file) is a tag which you can place only at the very beginning of a PHP file, because it applies only to a file; it includes the whole source code of the file in the generated documentation.
  • @global (variable) is not supported at the moment; maybe it will be implemented in future versions once it has been updated and reworked.
  • @ignore (any place): a docblock with this tag won’t be processed when generating documentation, even if it contains other tags.
  • @internal (any place) is often used together with @api to show that the code is used by the internal logic of this part of the program. An element with this tag won’t be included in the documentation.
  • @license (file, class) shows the type of license of the written code.
  • @link (any place) is used for adding links but according to the documentation this tag is not fully supported.
  • @method (class) is applied to the class and describes methods processed with function __call().
  • @package (file, class) divides code into logical subgroups. When you place classes in the same namespace, you indicate their functional similarity. If classes belong to different namespaces but have the same logical characteristic, they can be grouped using this tag (for example this is the case with classes that all work with customer’s cart but belong to different namespaces). But it is better to avoid such situation. For example, Symfony code style doesn’t use this tag.
  • @param (method, function) describes the incoming function parameters. It’s worth noting that if you describe the incoming parameters of a function in a docblock, you have to describe all of them, not only one or two.
  • @property (class): like @method, this tag is placed in the docblock of the class, but it describes the properties accessed through the magic methods __get() and __set().
  • @property-read, @property-write (class) are similar to the previous tag, but each covers only one magic method, __get() or __set() respectively.
  • @return (method, function) describes the value returned by the function. You can specify its type, and PhpStorm will pick it up and give you hints, but we’ll talk about this later.
  • @see (any place): with this tag you can insert links to external resources (just like with @link), but it also allows relative links to classes and methods.
  • @since (any place): you can indicate the version in which the piece of code appeared.
  • @source (any place, except the beginning): with this tag you can include pieces of the source code in the documentation (you set the first and the last line).
  • @throws (method, function) specifies the exceptions which can be thrown by this function.
  • @todo (any place) is the most optimistic tag; programmers use it as a reminder of what needs to be done in a certain piece of code. IDEs can detect this tag and list all such places in a separate window, which is very convenient for further search. It is a working standard and is used very often.
  • @uses (any place) is used for displaying the connection between different sections of code. It is similar to @see. The difference is that @see creates a one-way link, so after you go to the new documentation page there is no link back, while @uses also gives you a backward navigation link.
  • @var (variable) specifies and describes variables, both those used inside functions and class properties. Don’t confuse this tag with @param: @param is used only in docblocks of functions and describes the incoming parameters, while @var describes variables.
  • @version (any place) denotes the current version of the program in which this class, method, etc. appears.
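To show how a few of these tags look in practice, here is a small sketch that combines @property-read, @method, @deprecated and @see. The MagicRabbit class, its stored name and its methods are hypothetical and serve only to give the tags something to describe.

```php
<?php

/**
 * Rabbit with "magic" members, documented at class level.
 *
 * @property-read string $name      Resolved by __get()
 * @method        string sayHello() Resolved by __call()
 */
class MagicRabbit
{
    private $data = ['name' => 'Roger'];

    public function __get($property)
    {
        return isset($this->data[$property]) ? $this->data[$property] : null;
    }

    public function __call($method, array $arguments)
    {
        if ($method === 'sayHello') {
            return 'Hello, I am ' . $this->data['name'];
        }

        throw new BadMethodCallException(sprintf('Unknown method "%s".', $method));
    }

    /**
     * @deprecated Use sayHello() instead
     * @see MagicRabbit::sayHello()
     *
     * @return string
     */
    public function greet()
    {
        // sayHello() is not declared, so this call goes through __call().
        return $this->sayHello();
    }
}
```

Without the class-level tags neither a documentation generator nor an IDE could know that $name and sayHello() exist, because they are resolved only at run time.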

Outdated tags, which most likely won’t be supported in the future:

  • @category (file, class) was used to group packages together.
  • @subpackage (file, class) was used to mark the specific groups in the package.

Not all tags are equally popular. @var, @param, @return, @todo and @throws are the most widely used; the others are less popular. And I have never come across @property and @method in practice, because working with magic is dangerous!

Ease Of Using Docblocks In An IDE

If you are developing an open source project, it is of course necessary to document the public API using docblocks. This not only lets you generate the final documentation but also allows other programmers to use your code comfortably in their IDE. For private code in an outsourced project docblocks might seem a bit irrelevant, but I advise you to use them anyway, because they will speed up your development.

Let’s take the most popular IDE for PHP, PhpStorm, and look at the previous example of rabbit search:

$rabbits = $search->findRabbitsInLocations('/Sherwood/');
foreach ($rabbits as $rabbit) {
    $rabbit->doSomething();
}

What are the types of the variables $rabbits and $rabbit? PhpStorm knows nothing about this. PHP is weakly typed, and the return type of a function is not declared in its signature (hello, PHP 7, where this will be implemented). That’s why you should tell your IDE how to deal with certain parts of the code using docblocks. There are several options. You can do it like this:

/** @var Rabbit $rabbit */
foreach ($rabbits as $rabbit) {
    $rabbit->doSomething();
}

Or add the @return tag to the findRabbitsInLocations method:

/**
 * @return Rabbit[]
 */
public function findRabbitsInLocations($locations)
{
    // some operations here...
    return [];
}

Please note that we specify Rabbit[] and not Rabbit. The brackets make it clear that an array of objects of the class is returned. Without the brackets it would mean that the method returns a single instance of the Rabbit class. You can also write @return null|Rabbit[]; the vertical bar means OR, so in this case we indicate that the method returns either an array of rabbits or null.

No matter which way of specifying the type you choose, PhpStorm will now show you hints after you type $rabbit-> and wait for a moment.
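For readers already on PHP 7, here is a sketch of how a native return type and the docblock complement each other. A return type declaration can only say array, so the element type still has to come from @return. The Search class and the stubbed result are assumptions made for the illustration.

```php
<?php

class Rabbit
{
}

class Search
{
    /**
     * @param string $locations Regular expression for location names
     *
     * @return Rabbit[]
     */
    public function findRabbitsInLocations(string $locations): array
    {
        // Stub: a real implementation would query some storage.
        return [new Rabbit(), new Rabbit()];
    }
}

$rabbits = (new Search())->findRabbitsInLocations('/Sherwood/');
```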

This happens because PhpStorm knows that the $rabbits variable holds an array of Rabbit objects. In the foreach loop the $rabbit variable then receives one element of the array, which is an instance of the Rabbit class, and PhpStorm shows you all the public methods available in this class. This way you can use classes with public methods written by your colleagues without taking your hands off the keyboard.

PhpStorm will provide you with hints, and if the method has a clear name, you will be able to use it even without reading the source code and documentation.

One more useful feature you get when using docblocks with PhpStorm is warnings about arguments of the wrong type. Let’s finish writing the docblock for one of the methods of the Rabbit class:

/**
 * Run in metres
 *
 * @param int $metres Metres
 */
public function runInMetres($metres)
{
    // some operations here...
}

Here we indicate that the argument must be an integer (in PHP 7 it will be possible to declare this in the syntax itself). What will happen if we pass an array to this method?

PhpStorm highlights it and hints that an int is expected here while you are passing an array. Very convenient, isn’t it? Hints are also shown for mismatched classes and interfaces. If your method supports several types of incoming arguments, separate them with |. In this example, if the runInMetres() method can work with arrays, you may write @param int|array $metres Metres and PhpStorm will stop showing the warning.

PhpStorm can also generate docblocks. Place the cursor on the line above the function, class or variable declaration, type /** and press Enter. The IDE will generate a docblock template that you can adjust if you want. You can also trigger docblock generation with Alt + Insert.
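The docblock only produces a hint in the IDE; nothing stops the wrong call at run time. As a sketch of the PHP 7 syntax mentioned above, a scalar type declaration turns the same mistake into a TypeError. The $distance property and the distance() method are hypothetical additions that make the example observable.

```php
<?php

class Rabbit
{
    /** @var int Total distance covered, in metres */
    private $distance = 0;

    /**
     * Run in metres
     *
     * @param int $metres Metres
     */
    public function runInMetres(int $metres)
    {
        $this->distance += $metres;
    }

    public function distance(): int
    {
        return $this->distance;
    }
}

$rabbit = new Rabbit();
$rabbit->runInMetres(100);

try {
    $rabbit->runInMetres([100]); // wrong type on purpose
    $typeErrorRaised = false;
} catch (TypeError $error) {
    $typeErrorRaised = true;
}
```

The docblock and the type declaration say the same thing here, so the IDE warning and the run-time check agree with each other.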

How To Follow Commenting Styles

It’s good if all members of the team follow the PHPDoc commenting rules, but in practice that is rarely the case. Only perfectionists adhere to the standard, along with those who have been using docblocks for so long that it has become a habit. Some beginner programmers want to use docblocks but sometimes forget to, or don’t fully understand the tags. And of course there are tough nuts who don’t use docblocks even if the rest of the team does.

To reduce the discomfort, each member of the team should turn on docblock inspections in PhpStorm: go to Settings > Editor > Inspections > PHPDoc and tick all the checkboxes.

Of course, you can’t force everyone to follow the rules. For the laziest people I’d like to offer one more piece of advice from “Clean Code” (it’s about code formatting, but the implication is the same):

The rules must be respected by all members of the team. This means that each member should be mature enough to realize that it does not matter where the braces go, as long as everyone agrees to place them the same way.

Generating Documentation With PhpDocumentor

Now that everybody follows the rules and your code is full of docblocks, you can generate the documentation. I won’t write much about phpDocumentor itself and will only show you the most necessary commands; other best practices for documenting PHP code can be found on the official website.

So you have to install phpDocumentor. It can be installed globally this way:

$ wget http://www.phpdoc.org/phpDocumentor.phar
$ chmod +x phpDocumentor.phar
$ sudo mv phpDocumentor.phar /usr/local/bin/phpdoc
$ phpdoc --version

Or add it as a development requirement to the composer.json of your project:

$ composer require --dev "phpdocumentor/phpdocumentor:2.*"

And now, when you are in the directory of a project that is full of docblocks, simply run it from the console:

$ phpdoc -d src/

As I already mentioned, this is the minimal set of options for documentation generation: the option -d src/ specifies the path to the files you want to process. The generated documentation will be placed in a folder called output. Of course, this utility has many other options and features. Here’s a phpDocumentor example, and you can choose an existing template or create your own.

Generating Documentation With Sami

One more PHP documentation tool based on docblocks is a utility called Sami. It may not be as popular, but I decided to mention it because the Symfony documentation is generated with Sami. Fabien Potencier, who created this utility, writes about it in his blog.

Sami differs from phpDocumentor in that its configuration is stored in PHP files. In general it offers more room for customization than phpDocumentor, but you have to configure everything manually, in other words write some code. You can even override different pieces of the code responsible for documentation generation. The templates are written in Twig and are easy to override. For more detailed information about Sami, go to its GitHub page.
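To give an idea of what “configuration stored in PHP files” means, here is a minimal sketch of a Sami configuration file. It assumes that Sami is installed and autoloaded; the paths, the title and the file name config/sami.php are placeholders chosen for this example, not values taken from a real project.

```php
<?php

// config/sami.php (hypothetical location)

use Sami\Sami;

// The first argument is the directory with the source files to document.
return new Sami(__DIR__ . '/../src', [
    'title'     => 'Rabbit API',                 // placeholder title
    'build_dir' => __DIR__ . '/../build/api',    // where the HTML is written
    'cache_dir' => __DIR__ . '/../build/cache',  // where parsed data is cached
]);
```

Sami is then pointed at this file from the console, for example with php sami.phar update config/sami.php; check the project’s README for the exact commands and the full list of options.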

Conclusion

This is just a small overview of commenting and documenting code in PHP. It covers a little bit of everything, to encourage you to dig deeper into the topic. If you are a PHP beginner, I advise you to read the detailed phpDocumentor documentation. If you are an experienced developer, you can share some of your personal experience by writing a comment below. ^_^