The Web Security Topics Series

Callbacks and Events

[...]

Asynchronous Callbacks

Closures make many elegant programming techniques possible, but their characteristic use in Node.js programs is more prosaic. Passing functions to other functions provides a way of coping with the asynchronous operations that are the essence of Node.js.

Consider what might be called the classical way of interacting with the file system, using synchronous operations. A program calls a function from the standard library, control passes to the function, which performs some work and then returns a value. Node.js does support this style for a few operations, including the well-known stat system call for obtaining information about a file or directory. The fs module includes a method statSync, which obtains the information synchronously. Listing 33 shows how it is typically called.

Listing 33

var fs = require('fs')

// path is assumed to hold a string naming a file or directory
var stats = fs.statSync(path)

if (stats.isFile()) {
  // do something suitable to a file
} else if (stats.isDirectory()) {
  // do something suitable to a directory
} else {
  // do something else
}

The object returned by fs.statSync has various methods that allow the calling program to determine whether the path refers to a file or a directory, as illustrated in Listing 33, and some other properties that contain useful statistics, such as the number of bytes in the file and the time it was last modified.
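As a minimal sketch of those other properties, the following reads the size and modification time from the object returned by fs.statSync. The helper name summarize and the choice of '.' (the current directory) as the path are assumptions made purely for illustration.

```javascript
var fs = require('fs')

// summarize is a hypothetical helper, not part of the fs module
function summarize(path) {
  var stats = fs.statSync(path)
  return {
    kind: stats.isFile() ? 'file' : stats.isDirectory() ? 'directory' : 'other',
    bytes: stats.size,       // size in bytes
    modified: stats.mtime    // a Date object: the time of last modification
  }
}

console.log(summarize('.'))
```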

While the file statistics are being gathered, nothing else can be done. For this operation, that is unlikely to be a problem, because it should not take long. However, if the same style of call is used to retrieve records from a database, the main program will be stuck until the retrieval is completed, which could be a lengthy process. A worse case arises if a program sends a request to a remote host and must wait for the response. The response may never arrive if there is a network failure, in which case the program will wait forever.

In contrast, when an asynchronous call is made, the operation is performed in the background, using some system-level mechanism. Often, a pool of system threads is maintained and the request for the operation is queued up and executed by a thread when one becomes available. Having passed on the request, the original call returns immediately. The asynchronous equivalent of fs.statSync is fs.stat. If stat is simply substituted for statSync in Listing 33, stats will not be assigned a value, because the call will return before anything has happened. However, once the background operation has completed, Node.js will be notified. Asynchronous methods take a function, known as a callback, as an argument. When it receives the notification, Node.js calls the callback, passing any results from the background process as arguments to the callback. (It is common to refer to the callback “firing” when it is called in response to the notification.) Hence, a program using the asynchronous method fs.stat will have the form shown in Listing 34.

Listing 34

fs.stat(path, function(err, stats) {
  if (err) {
    // handle or throw the error
  } else if (stats.isFile()) {
    // do something suitable to a file
  } else if (stats.isDirectory()) {
    // do something suitable to a directory
  } else {
    // do something else
  }
})

The object containing the retrieved information is passed as the second argument to the callback passed to fs.stat. By convention, callbacks passed to asynchronous methods in Node.js and frameworks built on it take an Error object as their first argument; this argument is null (or undefined) if the operation succeeded. The callback can examine this argument to determine whether anything went wrong during the operation. Callbacks are almost always written as anonymous functions, appearing inline in the call to the asynchronous method.
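The following sketch illustrates both points: the call to fs.stat returns before its callback fires, and the first argument tells the callback whether the operation failed. The callback is given a name (report, a hypothetical helper) only so that it can be passed twice; the path './no-such-entry-here' is assumed not to exist. The two callbacks may fire in either order.

```javascript
var fs = require('fs')

var log = []   // records the order in which things happen

// report is a hypothetical callback following the error-first convention
function report(err, stats) {
  if (err)
    log.push('failed: ' + err.code)     // e.g. ENOENT if the path does not exist
  else
    log.push(stats.isDirectory() ? 'directory' : 'not a directory')
  console.log(log)
}

fs.stat('.', report)                      // the current directory should exist
fs.stat('./no-such-entry-here', report)   // assumed not to exist

// Both calls have already returned, but neither callback has fired yet
log.push('calls returned')
```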

It is often the case that callbacks call asynchronous methods and these require further callbacks, which in turn might call asynchronous methods, and so on, leading to deeply nested and hard-to-read anonymous functions. Consider the relatively simple example in Listing 35.

Listing 35

var getPaths = function(path) {
  fs.stat(path, function(err, stats) {
    if (err)
      console.log('!! ' + err.message)
    else {
      if (stats.isFile())
        fs.realpath(path, function(err, resolvedPath) {
          if (err)
            console.log('!! ' + err.message)
          else
            console.log(resolvedPath)
        })
      else if (stats.isDirectory())
        fs.readdir(path, function(err, files) {
          if (err)
            console.log('!! ' + err.message)
          else
            files.forEach(function(f, i, fa) {
              fs.realpath(path + '/' + f, function(err, resolvedPath) {
                if (err)
                  console.log('!! ' + err.message)
                else
                  console.log(resolvedPath)
              })
            })
        })
      else
        console.log('something else')
    }
  })
}

All this function does is take a string that represents a path, determine whether the path identifies a file and if so print the file’s full path name, or if it represents a directory, walk through it printing the full paths of all the files in the directory. All the methods of fs that are called are asynchronous and require a callback. (You will notice that the methods in this module do not follow the usual convention for naming methods, but are named after the corresponding Unix system call.) The forEach method of Array also takes a callback argument, although it is not asynchronous.

Once you are used to it, this style of programming is not much more difficult to follow than a simple sequential version based on synchronous methods would be. However, the concurrency permitted by asynchronous operations can be confusing. For example, it might look as if the paths of the files in a directory will be printed in the order they are retrieved by fs.readdir, which is usually alphabetical, but there is no guarantee that this will be the case. The fs.realpath operations will be called in that order, but there is no reason to suppose they will all take the same time, so they might finish in any order. The order may well differ if the program is run twice in succession.
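The effect can be reproduced deterministically with timers. In this sketch, slowEcho is a hypothetical stand-in for an asynchronous operation such as fs.realpath, and the file names and delays are invented for illustration. The operations are started in array order but finish in order of their delays.

```javascript
// slowEcho is a hypothetical stand-in for an asynchronous operation:
// it passes name to the callback after delay milliseconds
function slowEcho(name, delay, callback) {
  setTimeout(function() {
    callback(null, name)
  }, delay)
}

var finished = []
var jobs = [['a.png', 30], ['b.png', 10], ['c.png', 20]]

jobs.forEach(function(job) {
  // The operations are started in array order: a, b, c
  slowEcho(job[0], job[1], function(err, name) {
    finished.push(name)
    if (finished.length === jobs.length)
      console.log(finished.join(' '))   // prints: b.png c.png a.png
  })
})
```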

If you wanted to traverse two directories, you might be tempted to write code like the following:

console.log('==== Images ====\n')
getPaths('./images')
console.log('==== Scripts ====\n')
getPaths('./scripts')

The output would be

==== Images ====

==== Scripts ====

followed by a list of all the files’ paths in an arbitrary order. All the asynchronous calls inside getPaths would return immediately, allowing the headings to be printed before the callbacks fired. Once you start working asynchronously you must do so consistently, so in order to achieve the required result, getPaths would need to be rewritten to take a callback. This callback needs to fire when the method has processed its argument. In the case of a directory, this will not be until the final call to fs.realpath terminates and, as just noted, there is no a priori way of knowing which call will terminate last.
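One common way of dealing with this, sketched here under our own assumptions rather than as the definitive solution, is to count the outstanding operations and fire the callback when the count reaches zero. The parameter name done, the variable pending, and the decision to collect the paths in an array instead of printing them immediately are all choices made for this illustration.

```javascript
var fs = require('fs')

// getPaths passes the resolved paths to done(err, paths) once every
// outstanding fs.realpath call has reported back
var getPaths = function(path, done) {
  fs.stat(path, function(err, stats) {
    if (err)
      return done(err)
    if (stats.isFile())
      return fs.realpath(path, function(err, resolvedPath) {
        done(err, err ? undefined : [resolvedPath])
      })
    if (!stats.isDirectory())
      return done(null, [])
    fs.readdir(path, function(err, files) {
      if (err)
        return done(err)
      var pending = files.length   // calls that have not yet finished
      var paths = []
      if (pending === 0)
        return done(null, paths)
      files.forEach(function(f) {
        fs.realpath(path + '/' + f, function(err, resolvedPath) {
          if (err)
            console.log('!! ' + err.message)
          else
            paths.push(resolvedPath)
          // Whichever call finishes last brings the count down to zero
          if (--pending === 0)
            done(null, paths)
        })
      })
    })
  })
}

// The second traversal is not started until the first has finished
getPaths('./images', function(err, imagePaths) {
  console.log('==== Images ====\n')
  console.log(err ? '!! ' + err.message : imagePaths.join('\n'))
  getPaths('./scripts', function(err, scriptPaths) {
    console.log('==== Scripts ====\n')
    console.log(err ? '!! ' + err.message : scriptPaths.join('\n'))
  })
})
```

Nesting the second call inside the first callback guarantees that each heading is followed by its own list, although the paths within a list may still appear in any order.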

[...]


End of Extract


[Extracted from Javascript on the Server Using Node.js and Express by Nigel Chapman and Jenny Chapman]