Website creation guidelines

These guidelines for website creation aim to:

  • Ease the deployment from test server to production server
  • Integrate better with the operating system (FHS compliant)
  • Simplify data backup
  • Lessen security risk
  • Improve scalability
  • Leave room for performance improvements

Introduction

When creating a website, we usually put all our files inside the server document root, whether a file is generated by a script for a short time, uploaded to the server by a user, or is simply a static HTML file to be served as is.

This can be dangerous, as files might be wrongly interpreted (an uploaded file going through the PHP interpreter), and it can waste space (backups of cache or temporary files). These guidelines give advice on how to keep data separated while minimizing the burden of doing so.

Identifying problems

We can spot several issues, each of which leads to a rule:

  • Uploaded files must be stored outside the server document root and be served through the application or by another light web server without any code interpreter.
  • Cache and/or spool files must be stored in a part of the server file system designed to handle those files.
  • Libraries must be stored outside the server document root.
  • Configuration files must be stored outside the server document root.
  • Template files must be stored outside the server document root.
  • Except for cache, spool and uploaded files, the server process must have read-only access.
  • The server process must have execution rights only on scripts and on CGI programs or other binaries.
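
The first rule above, serving uploaded files through the application, can be sketched as follows. This is only an illustration under assumptions: resolve_upload() and send_upload() are hypothetical helper names, and the data directory is whatever location the application has been given outside the document root.

```php
<?php
// Sketch only: resolve_upload() and send_upload() are hypothetical helpers.

/**
 * Map a requested file name to a real path inside $dataDir.
 * Returns null when the file is missing or lies outside $dataDir.
 */
function resolve_upload(string $dataDir, string $name): ?string
{
    $root = realpath($dataDir);
    $path = realpath($dataDir . '/' . $name);
    if ($root === false || $path === false || !is_file($path)) {
        return null;
    }
    // Reject anything that realpath() resolved outside the data directory.
    if (strncmp($path, $root . DIRECTORY_SEPARATOR, strlen($root) + 1) !== 0) {
        return null;
    }
    return $path;
}

/** Stream a stored upload as an opaque download; it is never interpreted as code. */
function send_upload(string $path): void
{
    header('Content-Type: application/octet-stream');
    header('Content-Length: ' . filesize($path));
    header('X-Content-Type-Options: nosniff');
    readfile($path);
}
```

The application would call resolve_upload() with the user-supplied name and answer with a 404 when it returns null. Because the file is only read and streamed, it never passes through a code interpreter.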

Communication with the server

The major issue with this design is how the web developer knows where the sysadmin has allocated space for cache files, uploaded files, …

One solution is for both to agree on something like “[…] put your cache files in /var/cache/website […]”, but that won't work well. If the sysadmin wants to change his server structure, he is bound by what he said (and the developer won't change anything now that he has hard-coded the path everywhere in his code).

A more elegant solution is to use environment variables. They can be set on a per-webapp basis, they are easy to read with simple functions, and they are reliable. Sysadmin and developers just need to agree on the names of those variables:

  • WEB_APP_CONFIG_DIR: gives the path to a configuration directory [RO].
  • WEB_APP_TEMPLATE_DIR: gives the path to a template directory [RO].
  • WEB_APP_CACHE_DIR: gives the path to a cache directory [RW].
  • WEB_APP_SPOOL_DIR: gives the path to a spool directory [RW].
  • WEB_APP_DATA_DIR: gives the path to a generic data directory (for files uploaded or generated, to be served by a separate server or by the application) [RW].

That's fine, but the sysadmin is not going to be happy with it. For each application that runs on the server, he needs to define five variables and their paths. This is cumbersome and error-prone (he doesn't script it, as he doesn't trust the machine to do the job better than he does).

Instead, let the sysadmin define the directory variables once and give each application a codename:

  • WEB_APP_CODENAME: codename given by the sysadmin to the application (never print it, just use it; some things are better left unknown).

The codename is the sub-directory the developer will use (for example, $WEB_APP_CACHE_DIR/$WEB_APP_CODENAME).
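
As a minimal sketch of how an application could combine these variables, assuming they are visible through getenv(): web_app_dir() is a hypothetical helper, and the 'project' default and the rw/cache fallback are illustrative values only.

```php
<?php
// Sketch only: web_app_dir() is a hypothetical helper, not part of PHP or Smarty.

/**
 * Resolve a per-application directory from an environment variable.
 *
 * $variable is the variable name (for example 'WEB_APP_CACHE_DIR'),
 * $codename the value of WEB_APP_CODENAME, and $fallback the directory
 * used when the variable is unset, empty or not a directory.
 */
function web_app_dir(string $variable, string $codename, string $fallback): string
{
    $base = getenv($variable);
    if ($base === false || $base === '' || !is_dir($base)) {
        return $fallback;
    }
    return rtrim($base, '/') . '/' . $codename;
}

// The codename falls back to an arbitrary default when the sysadmin set none.
$codename = getenv('WEB_APP_CODENAME');
if ($codename === false || $codename === '') {
    $codename = 'project';
}

$cacheDir = web_app_dir('WEB_APP_CACHE_DIR', $codename, __DIR__ . '/../rw/cache');
```

With this approach no path is hard-coded outside the fallback, so the sysadmin can move directories around by changing the environment only.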

Try to create the sub-directory

The sysadmin is smart, but he may not have created the WEB_APP_CODENAME sub-directory. In a read-only directory the application cannot create it, but it is unlikely to be missing there, since the developer provides all the files it contains. In a read-write directory, however, chances are high that the sub-directory doesn't exist yet, so the application should try to create it.
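
A short sketch of that attempt, assuming the application may create directories below the read-write base; ensure_writable_dir() is a hypothetical helper name and the 0750 mode is only an example.

```php
<?php
// Sketch only: ensure_writable_dir() is a hypothetical helper name.

/**
 * Create $dir (and missing parents) if needed, then report whether the
 * server process can write to it.
 */
function ensure_writable_dir(string $dir, int $mode = 0750): bool
{
    // mkdir() can lose a race against a concurrent request, hence the
    // second is_dir() check after a failed mkdir().
    if (!is_dir($dir) && !@mkdir($dir, $mode, true) && !is_dir($dir)) {
        return false;
    }
    return is_writable($dir);
}
```

When the function returns false, the application can fall back to its default directory instead of failing.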

The code

We have talked about data, but what about the code?

The code naturally resides in the server document root. The sysadmin will take care of the deployment, but the developer takes care of separating dynamic code (like PHP), static files (like HTML) and cgi-bin (like compiled binaries).

For the sake of simplicity, the document root is the dynamic directory. Static files are stored in a direct sub-directory (so that they have an absolute URL like /static/css/design.css). cgi-bin is a direct sub-directory too, and its URL also starts at the root (/cgi-bin/query.cgi). This is the developer's side.

The sysadmin knows that static and cgi-bin can be moved around the file system hierarchy as long as the URL is the same. He knows that the index document is inside the dynamic directory.

Even if the index file is a static file (index.html), it must be put in the dynamic directory (but only that file).

Tools

If the hierarchy needs to be cleaned up, a tools directory should be created and a README and/or INSTALL file should be present.

Example

projet-RELEASE1
 |
 +- content  # Dynamic content
 |   |
 |   +- cgi-bin
 |   +- static
 |   +- .htaccess
 |   +- index.php
 |
 +- tools
 |   |
 |   +- clean.sh
 |   +- Makefile
 |
 +- ro
 |   |
 |   +- config
 |   +- lib
 |   +- templates
 |
 +- rw
 |   |
 |   +- cache
 |   +- data
 |   +- spool
 |
 +- .htaccess
 +- README
 +- INSTALL

When the sysadmin checks out the project, he knows that:

  • He must read the README file.
  • content is the document root.
  • tools contains some code to be run after checkout, all of which is explained in the README file. A Makefile is the best option, as the sysadmin already has a Makefile and can run the developer's one from his own.
  • ro contains read-only data to be copied.
  • rw contains nothing (or testing data that have made their way to the repository) and will be deleted.

About libraries

Libraries are designed to be used by more than one application, and thus should be separate projects. If the code is merely “library-like” and used only by this project, it should be in content. Logically, lib should be empty and the README should list which other projects are to be checked out and put in the include path.

But if your project is going to be distributed to non-technically skilled people, fill this directory with libraries in the archive.

.htaccess

Two .htaccess files can be seen.

The one in the project root is designed for people with no other skill than unpacking the project archive in the server document root and who want it to work (it sets at least a few access restrictions). It would contain something like:

<Files README>
Deny from all
</Files>
<Files INSTALL>
Deny from all
</Files>

RewriteEngine On
RewriteCond %{REQUEST_URI} !^/content/
RewriteRule ^(.*)$ /content/$1 [L]

The second one contains whatever the developer wants. It's up to the sysadmin to either allow .htaccess processing or include this file in the server configuration.

... in tools

Obviously, tools should have this .htaccess:

Deny from all

Example (code)

Here is an example of how to deal with that structure. This example uses Smarty as a template engine:

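<?php
/* index.php */

/* Assumes Smarty 2.x with its libs directory in the include path */
require_once 'Smarty.class.php';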
 
if(($codename=getenv('WEB_APP_CODENAME'))!==false) {
   if(!empty($codename)) {
      define('WEB_APP_CODENAME', $codename);
   }
}
if(!defined('WEB_APP_CODENAME')) {
   define('WEB_APP_CODENAME', 'project');
}
 
$S = new Smarty();
 
$S->compile_dir = dirname(__FILE__) . '/../rw/cache/compile';
$S->cache_dir = dirname(__FILE__) . '/../rw/cache/cache';
$S->config_dir = dirname(__FILE__) . '/../ro/config';
$S->template_dir = dirname(__FILE__) . '/../ro/templates/';
 
$S->plugins_dir[] = dirname(__FILE__) . '/smarty/plugins';
 
if(($cache_dir=getenv('WEB_APP_CACHE_DIR'))!==false) {
   if(file_exists($cache_dir) && 
         is_dir($cache_dir) && 
         is_writable($cache_dir)) {
      $cache_dir .= '/' . WEB_APP_CODENAME;
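      if(!file_exists($cache_dir)) {
         @mkdir($cache_dir); /* may fail; the result is checked just below */
      }
      if(file_exists($cache_dir) && 
            is_dir($cache_dir) && 
            is_writable($cache_dir)) {
         /* Create the Smarty sub-directories only when they are missing */
         if(!file_exists($cache_dir . '/cache')) {
            @mkdir($cache_dir . '/cache');
         }
         if(!file_exists($cache_dir . '/compile')) {
            @mkdir($cache_dir . '/compile');
         }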
         if(file_exists($cache_dir . '/cache') &&
               is_dir($cache_dir . '/cache') &&
               is_writable($cache_dir . '/cache')) {
           $S->cache_dir = $cache_dir . '/cache';
         }
         if(file_exists($cache_dir . '/compile') &&
               is_dir($cache_dir . '/compile') &&
               is_writable($cache_dir . '/compile')) {
            $S->compile_dir = $cache_dir . '/compile';
         }
      }
   }
}
 
if(($config_dir=getenv('WEB_APP_CONFIG_DIR'))!==false) {
   if(file_exists($config_dir) &&
         is_dir($config_dir) &&
         is_readable($config_dir) &&
         !is_writable($config_dir)) {
      $config_dir .= '/' . WEB_APP_CODENAME;
      if(file_exists($config_dir) &&
            is_dir($config_dir) &&
            is_readable($config_dir) &&
            !is_writable($config_dir)) {
         $S->config_dir = $config_dir;
      }
   }
}
 
if(($template_dir=getenv('WEB_APP_TEMPLATE_DIR'))!==false) {
   if(file_exists($template_dir) &&
         is_dir($template_dir) &&
         is_readable($template_dir) &&
         !is_writable($template_dir)) {
      $template_dir .= '/' . WEB_APP_CODENAME;
      if(file_exists($template_dir) &&
            is_dir($template_dir) &&
            is_readable($template_dir) &&
            !is_writable($template_dir)) {
         $S->template_dir = $template_dir;
      }
   }
}

Because the test server doesn't have the same structure as the production server, and because the project may be deployed on a server with no sysadmin (only someone with some sysadmin skills or enough access to the server), we set default values that work with almost no configuration and allow the project to be deployed directly in the document root.

Library inclusion

As the library may or may not be reachable through the include path, a mechanism like the following could be used:

/* index.php */
 
@include('myLib/myLib.php');
if(!function_exists('my_lib_function')) {
   require(dirname(__FILE__) . '/../ro/lib/myLib/myLib.php');
}

Structure on the production server

The files we serve should be served from /srv/. As this is web content, we would serve it from /srv/www or /srv/web. A hierarchy that simplifies upgrades would look like:

/srv/web
  |
  +- projects
  |   |
  |   +- httpd-includes
  |   |   |
  |   |   +- project1.conf -> /srv/web/projects/project1/httpd-project1.conf
  |   |   +- project2.conf -> /srv/web/projects/project2/httpd-project2.conf
  |   |
  |   +- project1
  |   |   |
  |   |   +- tools # Admin tools
  |   |   +- RELEASE1 # Structure defined
  |   |   +- httpd-project1.conf
  |   |
  |   +- project2
  |       |
  |       +- tools
  |       +- RELEASE1
  |       +- RELEASE2
  |       +- RELEASE3
  |       +- httpd-project2.conf
  |
  +- webroot # "root" alone looks like /root/, which is confusing
      |
      +- project1 -> /srv/web/projects/project1/RELEASE1/content/
      +- project2 -> /srv/web/projects/project2/RELEASE3/content/

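We put a release into production by pointing a symbolic link at it (hard links are generally not an option here, since most file systems do not allow hard links to directories). We can still provide a pre-production server on the production server.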
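
Switching the link from one release to another could be sketched like this, assuming a POSIX system where renaming a link over an existing one replaces it in a single step. switch_release() is a hypothetical helper; a sysadmin would more likely do the same from a shell script or a Makefile.

```php
<?php
// Sketch only: switch_release() is a hypothetical helper.

/**
 * Point $link at $target by creating a temporary link and renaming it over
 * the old one, so the webroot entry never disappears during the switch.
 */
function switch_release(string $target, string $link): bool
{
    $tmp = $link . '.new';
    // Remove a stale temporary link left behind by an interrupted run.
    if (is_link($tmp)) {
        unlink($tmp);
    }
    return symlink($target, $tmp) && rename($tmp, $link);
}
```

For example, switch_release('/srv/web/projects/project2/RELEASE3/content', '/srv/web/webroot/project2') would put RELEASE3 into production, and calling it again with the RELEASE2 path would roll back.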

Spool, cache, ...

Spool

Spool should be stored in /var/local/spool:

/var/local/spool/
  |
  +- webapps
      |
      +- project1
      +- project2

Cache

Cache should be in /var/local/cache:

/var/local/cache/
  |
  +- webapps
      |
      +- project1
      +- project2

Configurations

Configurations should be in /usr/local/etc:

/usr/local/etc
  |
  +- webapps
      |
      +- project1
      +- project2

Templates

Templates should be in /usr/local/share/misc:

/usr/local/share/misc
  |
  +- webapps
      |
      +- project1
      +- project2

Data

Data should be in /srv/web/data:

/srv/web/data
  |
  +- project1
  +- project2

Library

The programming language used should already have a directory for that; use the same hierarchy, but under /usr/local.

Conclusion

This structure aims to simplify everyone's work. If it doesn't, don't use it.

If you have only static data to serve, the structure can be kept; there are just fewer directories in use.

 
website/guideline/en.txt · Last modified: 2010/08/05 11:04 by tchetch