By John Maddock.
The following proposal consists of three sections: a list of requirements and objectives that the chosen structure must meet, a set of tools to facilitate working with boost, and an actual proposal for a structure that meets those requirements. In the past I have argued vociferously for a "do as little as possible" approach; however, I have somewhat surprised myself by coming out in favour of a radical reorganisation here. In many ways, though, the proposed directory structure is less important than its ability to meet the requirements listed below, nor is it the only structure that could arguably meet these requirements (especially as some requirements are contradictory). Finally, a couple of caveats: all opinions expressed herein are my own; all ideas expressed herein belong to other people (especially the good ones!). Where possible credits are given, but my memory is far from infallible, so speak up if you've been missed out.
Comment: this should speak for itself.
That is, a casual user browsing the directory structure should be able to immediately tell what belongs where.
Rationale: some users read the documentation, others wander around aimlessly saying: "I wonder what's in here?", speak up if you recognise anyone!
Rationale: automated tools should be able to glean most of the information they need direct from the directory structure.
Comment: This is probably the most important requirement and guides the choice of many others.
From an end user's perspective boost should appear to be a single library, with a single integrated build process etc.
Rationale: This makes life much more comfortable for end users.
Rationale: some libraries have an existence of their own outside of boost, this should be able to continue.
Rationale: different developers maintain individual boost libraries.
Rationale: as boost grows it may be necessary to split the library into multiple zip file downloads, each download should encapsulate one domain, and provide all the files necessary for that domain (that may mean that some files appear in more than one zip file).
Rationale: some users will want to split off (and maybe freeze) those parts of boost that are being used by a particular project. These sub-libraries can then be checked into the user's own version control system (for example into a local cvs repository as a vendor branch), and maintained alongside the user's own source for that project.
Implication: that there exists some mechanism for locating and separating off all the files associated with a particular boost library, this should also take into account dependencies (both for headers and for binary dependencies).
For example:
cvs checkout regex
would check out the regex library alone.
Rationale: This makes maintenance much easier, especially when working with cvs branches.
Implication: we could isolate libraries into separate directories, however that's only a partial solution which takes no account of library dependencies (something that's likely to become increasingly important). A better solution is to use cvs module-aliases: as a test case I've defined the regex library as a module-alias (this seems to work very well). In this case I had to specify dependencies by hand (an error prone process), much better would be a tool that produced a list of library aliases to insert directly into the cvs modules file.
There are three kinds of dependency possible:
- Libraries may depend upon the headers from other boost libraries; these dependencies can be worked out automatically.
- Libraries may depend upon binaries from other boost libraries; these dependencies can be worked out automatically (hint: if library X depends upon header H, and header H is from a library Y which has mandatory source code associated with it, then there is a binary dependency from X to Y).
- Some domain specific libraries may depend upon third party libraries (the python library for example). These dependencies can not be deduced, and will require meta-data to describe.
Rationale: these dependencies already exist in the boost library.
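The hint given above for binary dependencies can be sketched in a few lines. This is only an illustration of the rule, not an existing boost tool: the function name, the three input tables and the sample data are all hypothetical.

```python
def binary_dependencies(library, header_deps, header_owner, has_source):
    """Return the libraries that `library` must link against.

    header_deps:  library name -> set of headers it includes
    header_owner: header path  -> name of the library that owns it
    has_source:   set of libraries with mandatory source code
    """
    deps = set()
    for header in header_deps[library]:
        owner = header_owner.get(header)
        # Rule from the text: a header owned by a library with mandatory
        # source code implies a binary dependency on that library.
        if owner is not None and owner != library and owner in has_source:
            deps.add(owner)
    return deps


# Hypothetical data, for illustration only.
header_owner = {
    "boost/regex.hpp": "regex",
    "boost/timer.hpp": "timer",
    "boost/smart_ptr.hpp": "smart_ptr",
}
has_source = {"regex"}
header_deps = {"my_lib": {"boost/regex.hpp", "boost/smart_ptr.hpp"}}

print(sorted(binary_dependencies("my_lib", header_deps, header_owner, has_source)))
```

Third party dependencies are deliberately absent here: as noted above, they cannot be deduced and would have to come from meta-data.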
That is, the library should be usable directly from the checked out cvs tree, or the extracted zip file, without a mandatory install process.
Rationale: For single user installations it is sufficient and often easier to work directly from the zip/cvs structure.
Rationale: For "occasional developers" this simplifies their ability to port/debug parts of the library, and then submit patches based on changes made, without having to get involved with "wrapper compilers" and other tools that have been suggested, which may or may not function on their platform with their toolset.
Implication: that all header files are located together, and not split between multiple library paths.
Comments: during the recent discussion it was suggested that the header files be split into separate directories under "boost-root/src/libname/boost". However, this involves specifying a large number of -I options on the command line in order to use boost direct from the cvs tree. One suggested workaround was to use a wrapper-compiler to pass the long list of includes to the compiler semi-automatically. This has two problems: some compilers are integrated with their respective IDEs (which would make boost almost impossible to use from that IDE), and other platforms/compilers have a restricted command line length (mingw32 is a particular culprit), so the command line could easily become longer than the maximum permitted.
We currently use:
#include <boost/something.hpp>
which immediately informs a casual browser of the code that something.hpp is a part of the boost library and separates it from:
#include <rw/thread.h> // this is Rogue Wave library
Rationale: This has worked well up to now and should be continued.
Implication: The boost-root/boost/ directory must continue to exist (although there are possible arguments in favour of making it boost-root/include/boost).
There are several kinds of header that come into this category:
Power user headers: headers that should only be used by experts.
Headers for library reuse: these headers can be used by other boost libraries, but should not be used by end users.
Domain specific headers: large domain specific libraries may have a large number of headers that should not make it into the main boost-root/boost/ header directory (graph for example).
Implementation headers: libraries may have headers that contain implementation code, these headers should never be included by anything except other headers in this library.
Implication: the main header directory may contain sub-directories as follows:
boost-root/boost/library-name/ for all non-end user headers, including domain specific headers.
boost-root/boost/library-name/detail/ for all implementation detail headers.
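As a rough illustration of how a tool might read the header categories straight off this layout, the sketch below classifies a header by its position under boost/. The function name, the category labels and the example paths are assumptions made for the example; paths are taken relative to boost-root.

```python
from pathlib import PurePosixPath


def classify_header(path):
    """Classify a header by its position under boost-root/boost/."""
    parts = PurePosixPath(path).parts
    if len(parts) < 2 or parts[0] != "boost":
        raise ValueError(f"not under boost/: {path}")
    if len(parts) == 2:
        # Directly in boost/: an entry point for end users.
        return "entry point"
    if "detail" in parts[2:-1]:
        # Anywhere below a library's detail/ directory.
        return "implementation detail"
    return "library-specific (non-end-user)"


# Illustrative paths only.
for header in ["boost/regex.hpp",
               "boost/graph/adjacency_list.hpp",
               "boost/regex/detail/match.hpp"]:
    print(header, "->", classify_header(header))
```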
For example we may want to combine multiple math-related libraries into a single "numeric" domain. In this case each library in the domain would have its own directory under the domain name directory - for example headers for the rational library may end up in boost-root/boost/numeric/rational/.
Rationale: the aim here is to prevent the number of top level libraries growing to an unmanageable number, and to allow a logical group of libraries to be accessed with a single name (for cvs checkouts or for building part of boost).
That is, the name of the root directory in the zip file reflects the boost version number ("boost_1_1_9/" etc.); subsequent directories - like the boost header file directory - then split off from this.
Rationale: Allows developers to have multiple versions coexisting on their machine within a single directory structure; developers can switch between versions by changing their compiler's include and library search paths only.
If there exists development or non-reviewed code in the cvs tree then it should not interfere with release code or exist in the same directory tree as the release code. Nor should development code appear in zip files.
Rationale: developers will typically work with either the latest release code, or the latest development code, they should be able to switch between them fairly easily.
Rationale: end users don't generally need to see development code, it unnecessarily duplicates what's already in the library and may lead to confusion as to what's release code and what's still in development.
Implication: There are a couple of ways of dealing with this.
Method 1: provide a subdirectory "boost-root/development/library-name/" that internally mirrors the directory structure of boost-root/, to contain development code for library "library-name". This has the advantage of being easy to work with, but requires setting multiple include and library search paths; it also complicates multiple development versions of the same library (for example multiple ports to new platforms may proceed in parallel).
Method 2: provide a separate top-level CVS directory for development code; development code could then be checked out with "cvs checkout development" instead of "cvs checkout boost". Otherwise this method is the same as Method 1 above, and has the same pros and cons.
Method 3: use a cvs branch for development work. This allows multiple development efforts to proceed in parallel, but may be harder to work with and keep in synch with the main branch.
Ideally I see no reason why either method 1 or 2 can't coexist with method 3, depending which method is easier for the task in hand. Personally I prefer (2) to (1), but that's just personal preference.
That is, there is some central directory (let's call it boost-root/src/) that contains all mandatory source files for a particular library in its sub-directories: boost-root/src/library1/, boost-root/src/library2/ etc.
Rationale: This ensures that the source is easily discoverable by the user; for example if a user suspects that there may be a bug in library X, and decides to try and debug the problem, they may want to add all the source code for library X directly to their project to facilitate debugging. (I appreciate that the build process may provide debugging versions of the library, but it is still often easier to add the source direct to the IDE's project, depending upon how well the IDE handles debugging of external libraries).
Rationale: some IDE's have search paths for source files as well as headers etc, this structure shortens the paths to mandatory source files (this is more of a feature request than a requirement).
Rationale: Some file browsers (KFM for example) will automatically display documentation when they see either index.htm or index.html in the current directory. Any other files located in that directory effectively become "hidden" from the user. Whether this is an annoyance or a great feature depends upon your point of view. Separating documentation into its own sub-directory solves this problem (it happens to make installation of the documentation easier as well).
Footnote: actually KFM is usually quite intelligent about displaying documentation, however it does sometimes get it wrong.
Rationale: Currently most boost libraries are "headers only", those that are not have their own build processes or none at all. This is confusing for the end user, especially as boost is likely to get much larger.
Rationale: Building boost as a single monolithic library is likely to put end users off - especially as boost grows in size - few users will use all of boost in a single project (even if they use all of it at some time or another).
Implication: Build each boost library separately using a consistent naming scheme incorporating the library name and the compiler name: libboost_timer_gcc.so, libboost_regex_gcc.so, libboost_thread_gcc.so etc. Provide a monolithic version of the library as an option for those that want a simple life (this is mainly appropriate for static libraries, where unused library code doesn't make it into the executable).
Rationale: some compilers ship with multiple run-time libraries. For example the Borland C++ compiler comes with 6 different runtimes, any third party libraries must be built with the same runtime options as the executable to which it will be linked, failure to observe this rule leads to hard to track down runtime crashes.
Implication: boost libraries must each be built multiple times with the same runtime variants that the compiler ships with. As before name mangling separates the variants:
boost_regex_bc55_cw.lib
boost_regex_bc55_cwi.lib
boost_regex_bc55_cwi.dll
boost_regex_bc55_cwm.lib
boost_regex_bc55_cwmi.lib
boost_regex_bc55_cwmi.dll
boost_regex_bc55_cp.lib
boost_regex_bc55_cpi.lib
boost_regex_bc55_cpi.dll
(for non-Borland users the suffixes chosen here reflect the names of Borland's own runtime libraries).
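The naming scheme lends itself to mechanical generation. The sketch below is one possible reading of it, not a specification: the function names are hypothetical, and the rule that only runtimes whose suffix ends in "i" also produce a dll is an assumption inferred from the list of names above.

```python
# Runtime suffixes taken from the list of Borland variants in the text.
BORLAND_RUNTIMES = ["cw", "cwi", "cwm", "cwmi", "cp", "cpi"]


def library_file_name(library, toolset, runtime="", extension="lib", prefix=""):
    """Mangle library, toolset and runtime variant into one file name."""
    parts = ["boost", library, toolset]
    if runtime:
        parts.append(runtime)
    return prefix + "_".join(parts) + "." + extension


def borland_variants(library, toolset="bc55"):
    """List every binary built for one library with the Borland toolset."""
    names = []
    for runtime in BORLAND_RUNTIMES:
        names.append(library_file_name(library, toolset, runtime, "lib"))
        # Assumption: an 'i' suffix marks a runtime that also has a dll.
        if runtime.endswith("i"):
            names.append(library_file_name(library, toolset, runtime, "dll"))
    return names


print(library_file_name("timer", "gcc", extension="so", prefix="lib"))
for name in borland_variants("regex"):
    print(name)
```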
Rationale: some meta-data is likely to be required, but to reduce maintenance requirements this should be as small as possible. Generally speaking the smaller the meta-data requirement the more likely it is that the build system is in synch with the library. The worst case would be hand-crafted makefiles (hard to maintain), the best case no meta-data at all; for example the directory structure describes the library well enough that makefiles (or their equivalent) can be automatically generated.
Rationale: most unix variants more or less require an install step before using third party libraries, this also allows network installs (for multiple compilers and/or platforms if required), from a single source tree.
Implication: Keep the boost directory structure as close as possible to the install structure to simplify the installation process (strictly speaking this is not an absolute requirement, but cross-platform installation is hard enough without making it any harder than it needs to be). The easiest way is to keep the documentation/header/build trees separate.
This is a nebulous requirement that is based as much on personal preference as anything else.
Rationale: the directory structure is more "discoverable" if it branches consistently - that is with no directories with a massive number of entries.
Implication: where appropriate combine related libraries into domains.
Implication: avoid directories with a single sub-directory entry (redundancy).
While writing the requirements above one theme kept recurring: that of interdependency of boost libraries, and the need for an automated tool to deal with this problem. In fact, from a code-reuse point of view, we need a library that describes the boost library and determines library dependencies, and that can then be reused in multiple tools. In my view the gains in ease of management, and automatic generation of makefiles etc., mean that these tools should be developed regardless of the actual directory structure chosen (although the code will probably be dependent upon the directory structure chosen).
This library would define two types:
Library: defines the files that belong to a particular library, plus header file dependencies and a list of binary dependencies to other boost libraries.
Libraries: a collection of Library objects, also maintains a database of which header belongs to which library (used to calculate binary dependencies).
As far as is possible, these types should be able to load themselves directly from the boost directory structure, with only a minimal amount of meta-data used to describe the unusual cases.
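To make the two types a little more concrete, here is a minimal sketch of how they might load themselves from the directory structure alone. Everything in it is an assumption for illustration: the field names, the load method, and the layout it expects (boost/name.hpp, boost/name/ and src/name/ relative to the root). The dependency fields are declared but left empty, and meta-data for the unusual cases is not handled.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class Library:
    name: str
    headers: set = field(default_factory=set)
    sources: set = field(default_factory=set)
    header_deps: set = field(default_factory=set)   # not filled in by this sketch
    binary_deps: set = field(default_factory=set)   # not filled in by this sketch


@dataclass
class Libraries:
    libraries: dict = field(default_factory=dict)
    header_owner: dict = field(default_factory=dict)  # header path -> library name

    @classmethod
    def load(cls, root):
        """Build the collection by walking boost/ and src/ under `root`."""
        root = Path(root)
        result = cls()
        for header in sorted((root / "boost").rglob("*.hpp")):
            rel = header.relative_to(root)
            # First component below boost/ names the library
            # (either "name.hpp" or a "name/" directory).
            name = Path(rel.parts[1]).stem
            lib = result.libraries.setdefault(name, Library(name))
            lib.headers.add(rel.as_posix())
            result.header_owner[rel.as_posix()] = name
        src = root / "src"
        if src.is_dir():
            for source in sorted(p for p in src.rglob("*") if p.is_file()):
                rel = source.relative_to(root)
                if len(rel.parts) < 3:
                    continue  # ignore files directly in src/
                lib = result.libraries.setdefault(rel.parts[1], Library(rel.parts[1]))
                lib.sources.add(rel.as_posix())
        return result


# Build a tiny throwaway tree to exercise the loader.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["boost/regex.hpp", "boost/regex/detail/match.hpp",
                "boost/timer.hpp", "src/regex/regex.cpp"]:
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")
    libs = Libraries.load(tmp)

print(sorted(libs.libraries))
print(libs.header_owner["boost/regex/detail/match.hpp"])
```

A library with a non-empty sources set is one with mandatory source code, which is exactly the information needed to derive binary dependencies from header dependencies.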
In order for the dependency library to do its job it is necessary to iterate over a directory structure, join and split path names, and convert path names to/from a platform specific format, for example to insert relative paths into makefiles which may be used on platforms other than the one on which the makefile is generated. Some, but by no means all, of this functionality is already covered by Dietmar Kühl's dir_it library.
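The path operations in question are small but easy to get wrong. As a sketch of what is meant (using Python's standard pathlib and posixpath modules rather than dir_it, with hypothetical helper names and example paths), joining, splitting, converting to a Windows-style path and computing a relative path for a makefile look like this:

```python
import posixpath
from pathlib import PurePosixPath, PureWindowsPath


def to_windows(generic_path):
    """Convert a '/'-separated path to the Windows form."""
    return str(PureWindowsPath(*PurePosixPath(generic_path).parts))


def relative_to_directory(generic_path, start):
    """Express generic_path relative to the directory `start`."""
    return posixpath.relpath(generic_path, start)


# Join and split.
source = PurePosixPath("src", "regex", "regex.cpp")
print(source.as_posix())
print(source.parts)

# Convert for a platform other than the one generating the makefile.
print(to_windows(source.as_posix()))

# Relative path as it would appear in a makefile living in build/.
print(relative_to_directory(source.as_posix(), "build"))
```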
This is a short program that just iterates through a Libraries collection and prints out the dependencies, so that the result can be cut and pasted into the cvs modules file.
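A sketch of such a generator follows. It emits lines in the "name -a aliases..." form that the cvs modules file uses for alias modules; the repository paths, the dependency table and the function name are hypothetical, and dependencies are referenced by module name rather than expanded into paths, which is a design choice of this example.

```python
def modules_file_lines(paths_by_library, deps_by_library):
    """Produce one cvs alias-module line per library."""
    lines = []
    for name in sorted(paths_by_library):
        # A library's own paths first, then the modules it depends upon.
        aliases = list(paths_by_library[name])
        aliases += sorted(deps_by_library.get(name, ()))
        lines.append(f"{name} -a " + " ".join(aliases))
    return lines


# Hypothetical repository layout and dependencies.
paths_by_library = {
    "regex": ["boost/boost/regex.hpp", "boost/boost/regex", "boost/src/regex"],
    "config": ["boost/boost/config.hpp"],
}
deps_by_library = {"regex": {"config"}}

for line in modules_file_lines(paths_by_library, deps_by_library):
    print(line)
```

With an entry like this in place, "cvs checkout regex" would also bring in whatever the config module names.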
This is almost the same program as the alias generator, but copies files to a new location instead of printing them out. Used to "distil" out a subset of the boost library (including dependencies). This can be used to: split boost into multiple (domain specific) zip files for easier download, or split out that subset of boost that is being used by a particular project (for integration with the project without getting the whole of boost).
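The core of the distiller is a transitive closure over the dependency graph followed by a file copy. The sketch below shows that shape only; the function names, the tables describing files and dependencies, and the sample tree are all made up for the example.

```python
import shutil
import tempfile
from pathlib import Path


def closure(names, deps):
    """Return `names` plus everything they depend on, directly or indirectly."""
    seen = set()
    stack = list(names)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(deps.get(name, ()))
    return seen


def distil(names, deps, files, source_root, target_root):
    """Copy the files of the selected libraries and their dependencies."""
    copied = []
    for name in sorted(closure(names, deps)):
        for rel in sorted(files.get(name, ())):
            target = Path(target_root) / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(Path(source_root) / rel, target)
            copied.append(rel)
    return copied


# Hypothetical description of a tiny boost tree.
deps = {"regex": {"config"}, "config": set(), "timer": set()}
files = {
    "regex": {"boost/regex.hpp", "src/regex/regex.cpp"},
    "config": {"boost/config.hpp"},
    "timer": {"boost/timer.hpp"},
}

with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for paths in files.values():
        for rel in paths:
            path = Path(src) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("")
    copied = distil(["regex"], deps, files, src, dst)
    timer_was_copied = (Path(dst) / "boost/timer.hpp").exists()

print(copied)
print(timer_was_copied)
```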
By combining the description of the boost library contained in a Libraries object with a description of the compiler/platform in use, it is possible to do one of two things: directly build the library, or output compiler/platform specific makefiles for distribution with boost. For brevity I'm going to skip over a description of this here - my pencil and paper sketch has a list of around 14 points of variation between compilers, and another list of 7 options for each compiler configuration (release, debug, static, dynamic etc). Probably even this fairly long list is not complete.
I'm assuming that the build system will probably output makefiles in the first instance; apart from anything else, most compilers come with some kind of make, using this avoids the need for the end user to have to build/install any tools that do not ship with their compiler. Here I'm assuming that the boost library maintainers periodically generate the makefiles, and then ship them with the library.
Directory | Description |
---|---|
Boost-root/boost/ | All entry point boost headers, mainly these should be called "library-name.hpp" |
Boost-root/boost/library-name/ | All domain specific headers, all "expert-user" non-entry point headers. |
Boost-root/boost/library-name/detail/ | All implementation private headers. |
Boost-root/src/library-name/ | All mandatory source files. |
Boost-root/src/library-name/config/ | Any private configuration code (for example autoconf scripts), if these grow then we could move to an integrated configure system in Boost-root/config/ but that isn't currently necessary. |
Boost-root/src/library-name/build/ | Temporary location for private build systems, until the boost-wide integrated build comes on line. |
Boost-root/docs/ | All common documentation. |
Boost-root/docs/library-name/ | All documentation for "library-name"; must include an index.htm file. |
Boost-root/licence | A "generic" boost licence that describes the minimal guarantees made by all boost libraries (free for commercial use etc), with sub-directories for those boost libraries that have their own licences (currently just regex and graph, but this number is likely to grow). |
Boost-root/tests/library-name/ | All test programs for "library-name". These may be either: a single (multi-file) test program, multiple single file test programs, or multiple sub-directories (one for each test program). |
Boost-root/examples/library-name/ | All example programs for "library-name". These may be either: a single (multi-file) example program, multiple single file example programs, or multiple sub-directories (one for each example program). |
Boost-root/tools/tool-name/ | Contains all files required to build and use the specified tool (makefile generators etc). |
Boost-root/build/ | The boost build system. Consists of a collection of makefiles (one for each supported compiler), plus subdirectories: libs/ for built libraries, bin/ for built dll's (win32 only) and obj/ for object files. |
There are a couple of myths surrounding this structure that need exploding:
Not true: if the submission arrives as a zip file containing the directory structure described above, then the command:
cvs import boost library-name library-name-sub
will import the whole of the current directory tree and "intermingle" it with the existing boost tree in the repository.
There is one caveat to this however: if the imported source contains some files that were already in the boost directory tree (probably not a common situation), then an additional merge and resolve conflicts step arises:
On the main branch working copy:
cvs checkout -jlibrary-name-sub boost
Resolve any conflicts, and then:
cvs commit
The latter two steps should not be necessary in most cases, and occur whatever directory structure is used (it is probably easier in most cases to resolve such conflicts manually before importing the new sources).
By using cvs aliases (defined in the modules file) this situation does not arise, just specify the module/alias name when performing a checkout/commit.
This is probably the hardest and most painful part of the whole process. I'm going to suggest a migration method as follows:
The whole process described above is quite likely to take 1-2 weeks, during which no changes can be committed; this is going to require a fair amount of co-ordination between developers (actually this applies to any major change to the directory structure, irrespective of what the change is).
You will note that I haven't mentioned a time scale for the associated tools that I have suggested, probably these will need to be developed after the directory structure changes - although I believe it is possible to develop a minimal subset (the library description and alias generator) before making the changes if that is required.
There were a couple of other directory structures that were evaluated while preparing this document:
The "half way house structure":
This is the same as the current structure, but moves mandatory source files to boost-root/src/libname. This is easier to migrate to from the current structure, but was felt to be neither one thing nor the other.
The "skinny root structure":
This was proposed by John David and Lois Goldthwaite, and moves the contents of the current boost-root/libs/ directory into boost-root/boost/. My main objection to this proposal is that it is less "discoverable" than the one presented here - my immediate reaction was "where has everything gone" - and I also dislike mixing headers and non-headers in the same tree. However, I'm prepared to accept that this could just be due to personal bias.
The following people have had their ideas reused, reconstituted and reformulated :-)
Beman Dawes, Ed Brey, Walter E. Brown, John (EBo) David, Jeff Garland, Lois Goldthwaite, Jens Maurer, Jeff Squyres, Gary Powell and Daryle Walker.
By Jens Maurer
I favor the following structure, which puts different emphasis on some of the requirements.
Directory | Description |
---|---|
Boost-root/include/boost/ | All entry-point boost headers, mainly these should be called "library-name.hpp". |
Boost-root/include/boost/.../ | Domain-specific subdirectory; the "..." can be empty or arbitrarily nested while observing the "optimally branched" requirement. |
Boost-root/include/boost/.../library-name/ | All domain-specific headers, all "expert-user" non-entry point headers. |
Boost-root/include/boost/.../library-name/detail/ | All implementation private headers. |
Boost-root/libs/.../ | Main directory for a given subdomain; the "..." can be empty or arbitrarily nested while observing the "optimally branched" requirement. The "..." must correspond to some "..." in the header tree. The directory should contain a "index.html" which links to all libraries and subdomains contained. |
Boost-root/libs/.../library-name/ | Main directory for a given library. |
Boost-root/libs/.../library-name/src/ | All mandatory source files for the library. |
Boost-root/libs/.../library-name/build/ | Temporary location for private build system, until the boost-wide integrated build becomes available. |
Boost-root/libs/.../library-name/config/ | Any private configuration code (for example, autoconf scripts). |
Boost-root/libs/.../library-name/doc/ | All documentation for the library. |
Boost-root/libs/.../library-name/test/ | All regression tests for the library, suitable for the regression test suite. Due to test execution time constraints, not all of the tests may actually be added to "regression.cfg". |
Boost-root/libs/.../library-name/example/ | All example programs for "library-name". These may be either: a single (multi-file) example program, multiple single file example programs, or multiple sub-directories (one for each example program). |
Boost-root/tools/tool-name/ | Contains all files required to build and use the specified tool (makefile generators etc). |
Boost-root/build | The boost build system (user front-end; tools go in the "tools" hierarchy). Details still hazy. |
Boost-root/more/license.html | A "generic" boost license that describes the minimal guarantee provided by all boost libraries. This should get a prominent link on the main boost page. |
Note that the "include" path component contains only one subdirectory "boost" and thus violates the "optimally branched" requirement. It helps with discoverability, though, because people know what to expect under any directory named "include", i.e. header files.