dir_it
iterator to get all files in a directory
Abstract
The Standard C++ Library does not provide any way to access the directory structure of a
computer, because some C++ target platforms have no notion of directories at all. However,
many important platforms do have directories, although the system interface differs widely
between them. This class provides a standard interface for iterating over directory
entries which can be extended to suit platform specific needs, in particular access to
file attributes.
Synopsis
|
#include <boost/directory.h>
std::string dirname(...);
boost::filesystem::dir_it begin(dirname);
boost::filesystem::dir_it end;
boost::filesystem::dir_it it(begin);
it = begin
*it
++it
*it++
it == end
it != end
prop::value_type v = boost::filesystem::get<prop>(it)
boost::filesystem::set<prop>(it, value)
|
Description
The class boost::filesystem::dir_it (dir_it for short) is an input
iterator which iterates over the entries in a directory. A begin iterator is constructed
from a valid directory name using the platform specific notation; an end iterator is
constructed using the default constructor of the class. The two functions boost::filesystem::get()
and boost::filesystem::set() are used to access specific properties of a file.
The exact list of available properties depends on the system. Below is a list of common
properties and lists of properties supported on specific systems.
Since the file properties differ between systems, an extensible interface was chosen
to allow different sets of properties to be accessed. It is even possible for the user to
add special properties. To define a new file property, a struct is defined which
gives the property its name and its type. It is also necessary to define the corresponding
get() and/or set() functions. Details for this are given below.
Basic Functionality
The main functionality of the class dir_it is to iterate over the entries in a
directory. Here is an example of how the class can be used to print the names of the
entries in a directory:
|
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <boost/directory.h>

int main(int ac, char *av[])
{
    if (ac == 2)
    {
        typedef boost::filesystem::dir_it InIt;
        typedef std::ostream_iterator<std::string> OutIt;
        // copy every entry name of the directory av[1] to std::cout, one per line
        std::copy(InIt(av[1]), InIt(), OutIt(std::cout, "\n"));
    }
    return 0;
}
|
Of course, it is also possible to write this loop manually: the class dir_it is
just an input iterator. Note that the post increment operator only returns a proxy object
which can be used for dereferencing (using operator*()) as required by the input
iterator specification. The proxy object cannot be used to access any file attribute
other than the name.
dir_it Members
Lifecycle
- Default Constructor
- The default constructor is used to create the "past the end" iterator. This
construction never fails and the resulting iterator cannot be dereferenced.
- Constructor taking a std::string
- A std::string naming a directory can be used to construct a "begin"
iterator. If the argument does not name an accessible directory, the resulting iterator
compares equal to the past the end iterator constructed with the default constructor. On
most systems this way of indicating failure is not ambiguous, because even an empty
directory has entries, e.g. on POSIX systems the directories "." (the directory itself)
and ".." (the parent directory).
- Copy Constructor
- The copy constructor creates a new instance which is always positioned on the same
current entry as the original dir_it instance. This means that advancing either
the original or the newly created iterator advances both iterators. It is not possible
to copy a dir_it in order to iterate over the same directory entries twice. To do this,
two objects of type dir_it have to be constructed from the directory name.
- Destructor
- The destructor releases the resources associated with the dir_it. However, if
the dir_it was copied, associated system resources are released when the last
copy is destroyed. This is because the various copies share the same system resources.
- Assignment
- The assigned dir_it is always positioned on the same entry as the original
iterator. Thus, the same restrictions apply to the assigned iterator as to iterators
created with the copy constructor.
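The sharing between copies described above can be illustrated with a small sketch. The class shared_cursor below is a hypothetical stand-in, not the library's implementation: it walks an in-memory list of names rather than a real directory and uses std::shared_ptr for the shared state, which is only an assumption about how such sharing might be done.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for dir_it: it walks an in-memory list of names
// instead of a real directory, but shares its position between copies.
class shared_cursor
{
public:
    // Default constructor: the "past the end" cursor, owning no state.
    shared_cursor() {}

    // "Begin" cursor over the given names (stands in for a directory name).
    explicit shared_cursor(std::vector<std::string> names)
        : rep_(std::make_shared<state>(std::move(names)))
    {
    }

    // Copy construction, assignment and destruction are compiler generated:
    // copies share rep_, and the state is released with the last copy.

    bool at_end() const
    {
        return !rep_ || rep_->pos >= rep_->names.size();
    }

    std::string const& operator*() const
    {
        assert(!at_end()); // a past the end cursor must not be dereferenced
        return rep_->names[rep_->pos];
    }

    shared_cursor& operator++()
    {
        if (!at_end())
            ++rep_->pos; // visible through every copy sharing rep_
        return *this;
    }

    // Number of cursors currently sharing the state (0 for past the end).
    long copies() const { return rep_.use_count(); }

private:
    struct state
    {
        explicit state(std::vector<std::string> n)
            : names(std::move(n)), pos(0) {}
        std::vector<std::string> names;
        std::size_t pos;
    };
    std::shared_ptr<state> rep_;
};
```

With this sketch, copying a cursor and incrementing the copy also moves the original, and both reach the end together. Walking the same names twice requires two cursors constructed independently, just as two dir_it objects have to be constructed from the directory name.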
Operations
- Dereference (operator*())
- Dereferencing a dir_it returns the name of the current directory entry as std::string.
It is only possible to dereference a dir_it if it does not compare equal to the
past the end iterator.
- Pre Increment (operator++())
- The major means to advance a dir_it is the pre increment operator. This
operation moves the object to the next directory entry, if there is another entry.
Otherwise, the dir_it object compares equal to the past the end iterator after
the pre increment. The pre increment operator returns the object itself.
- Post Increment (operator++(int))
- The post increment advances the dir_it to the next entry and returns a proxy
object which can be dereferenced as if it were an object of type dir_it. However,
nothing else can be done with this object. This method of advancing the iterator is
normally less efficient, so the pre increment operator should be used where possible.
- Equals Operator (operator==())
- The equals operator determines whether two objects of type dir_it are either
both indicating a current directory entry, or both are past the end iterators.
Because every dir_it compares equal to the past the end iterator once all entries in the
directory have been seen, this can be used to test whether there are any more entries.
However, the operator cannot be used to determine whether a dir_it is positioned on a
specific directory entry (this can be done by comparing the results of the dereference
operator instead).
- Not Equal Operator (operator!=())
- The not equal operator returns the exact negation of the equals operator. Thus, this
operator returns true if one of the two iterators indicates a current directory
entry while the other iterator is a past the end iterator.
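The operations listed above can be sketched in a few lines. The class name_it and the function collect() are hypothetical names used for illustration only; the sketch iterates over an in-memory list of names and is not the actual dir_it code. It shows a post increment returning a proxy which can only be dereferenced, and an equality test which only distinguishes "on a current entry" from "past the end".

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for dir_it (not the library's code).
class name_it
{
    typedef std::vector<std::string> names_t;
    struct state { names_t names; std::size_t pos; };
    std::shared_ptr<state> rep_;

    bool at_end() const { return !rep_ || rep_->pos >= rep_->names.size(); }

public:
    // Result of the post increment: it remembers the previous name and can
    // only be dereferenced.
    class proxy
    {
        std::string name_;
    public:
        explicit proxy(std::string const& name) : name_(name) {}
        std::string const& operator*() const { return name_; }
    };

    name_it() {} // past the end
    explicit name_it(names_t const& names) : rep_(new state{names, 0}) {}

    std::string const& operator*() const
    {
        assert(!at_end());
        return rep_->names[rep_->pos];
    }

    name_it& operator++()
    {
        if (!at_end())
            ++rep_->pos;
        return *this;
    }

    proxy operator++(int)
    {
        proxy old(**this); // copy the name before moving on
        ++*this;
        return old;
    }

    // Equal when both are on a current entry or both are past the end.
    friend bool operator==(name_it const& a, name_it const& b)
    {
        return a.at_end() == b.at_end();
    }
    friend bool operator!=(name_it const& a, name_it const& b)
    {
        return !(a == b);
    }
};

// Manual loop using post increment and comparison with the end iterator.
// The by-value parameter shares its position with the caller's iterator.
inline std::vector<std::string> collect(name_it it)
{
    std::vector<std::string> out;
    while (it != name_it())
        out.push_back(*it++);
    return out;
}
```

Note that two iterators over different lists compare equal in this sketch as long as both still have a current entry. To find a specific entry, the dereferenced names have to be compared.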
File Properties
Using the functions get() and set() it is possible to access file
properties. Here is an example which prints the file sizes in addition to the name:
|
#include <iomanip>
#include <iostream>
#include <boost/directory.h>

int main(int ac, char *av[])
{
    if (ac == 2)
    {
        using namespace boost::filesystem;
        // print the size, right aligned in 10 columns, followed by the name
        for (dir_it it(av[1]); it != dir_it(); ++it)
            std::cout << std::setw(10) << get<size>(it)
                      << " " << *it << "\n";
    }
    return 0;
}
|
Each property consists of two major components:
- A struct which gives the name to the property and which defines the type
accessed using the property. The type of the property is defined using a typedef
defining the type value_type in the corresponding struct. For the
standard properties, the corresponding structs are defined in the namespace boost::filesystem.
- Access functions which are just specializations of the functions boost::filesystem::get()
and boost::filesystem::set(). Of course, if the property can only be read or only
be written, only the corresponding access function is defined.
Example Property
The size property used in the above example might be defined as follows:
|
namespace boost {
  namespace filesystem {
    struct size
    {
      typedef size_t value_type;
    };
    template <>
    size::value_type get<size>(dir_it const &it)
    {
      return ... /* environment specific code */
    }
  }
}
|
The properties which are already provided by the implementation normally access some
data structure internal to the dir_it objects to avoid multiple system calls.
Details
- Property Selection
- The file property to be accessed is selected using a template argument to the get()
or set() function. The template argument is a type which defines the type value_type
as a nested type. The get() and set() functions are specialized for the
properties provided by the system. By specializing additional versions of these functions,
the user may extend the set of accessible properties.
- Property Type
- The type of a file property is determined from a typedef called value_type
in the type selecting the property.
- Reading a Property
- To read a file property, a dir_it is passed as argument to the template
function boost::filesystem::get(). The template argument prop selecting
the file property to be accessed is explicitly specified. The return type of
the get() function is prop::value_type.
- Setting a Property
- To set a file property, a dir_it and the new value of the property are passed
to the template function boost::filesystem::set(). The template argument prop
selecting the file property to be accessed is explicitly specified. The type of the second
argument to the set() function is prop::value_type const &.
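The mechanism described in this list can be sketched without the library. In the sketch below, the namespace sketch and the struct entry are hypothetical stand-ins for boost::filesystem and for the data a dir_it holds about its current entry. Declaring the primary templates without defining them is one possible design, assumed here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

namespace sketch {

// Hypothetical stand-in for the per-entry data held by an iterator.
struct entry
{
    std::string name;
    std::size_t bytes;
    std::time_t modified;
};

// Property tags: each names a property and fixes its type via value_type.
struct size  { typedef std::size_t value_type; };
struct mtime { typedef std::time_t value_type; };

// The primary templates are only declared. Each supported property gets an
// explicit specialization, so using an unsupported one fails at link time.
template <class Prop>
typename Prop::value_type get(entry const& e);
template <class Prop>
void set(entry& e, typename Prop::value_type const& v);

// Readable properties.
template <>
size::value_type get<size>(entry const& e) { return e.bytes; }
template <>
mtime::value_type get<mtime>(entry const& e) { return e.modified; }

// Only mtime is writable in this sketch; there is no set<size>.
template <>
void set<mtime>(entry& e, mtime::value_type const& v) { e.modified = v; }

} // namespace sketch
```

A caller selects the property explicitly, e.g. sketch::get<sketch::size>(e) or sketch::set<sketch::mtime>(e, t). A user defined property would add another tag struct and the matching specializations in the same way.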
Standard Properties
The organization of files differs heavily between systems. As a result, the
sets of file properties defined on different systems vary. The property interface is
chosen such that it is obvious how specific properties are accessed, although the names
and the exact types are still open. To enhance portability, some common file properties
are always defined:
- is_directory
- A boolean read only property which can be used to determine whether a directory entry is
itself a directory.
- is_hidden
- A boolean property indicating whether the file is "hidden". By default, hidden
files are not shown to the user. However, with appropriate options, these files may be
shown anyway. On some systems, there is a special flag for the files which indicates that
the file is hidden. On such systems this flag is a read/write property. On other systems,
e.g. on POSIX systems, files starting with a dot (".") are considered to be
hidden. On such systems this flag is a read only property.
- size
- A read only property of type size_t returning the size in bytes of a file. Note
that the size returned is not necessarily identical to the number of characters retrieved
from an ifstream created for this file: In text mode, some character sequences
are replaced by single characters during reading. However, the number of characters in
binary mode should normally match the size of the file.
- mtime
- A property of type time_t returning the last modification time of the
file. It can always be read. On some systems, e.g. POSIX, it is also possible to write
this property to set the modification time to an arbitrary value.
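As a rough illustration, the four common properties could be obtained on a POSIX system along the following lines. The struct standard_props and the function read_standard_props() are hypothetical names, and the sketch calls stat(2) directly instead of going through dir_it. It is meant to show where the values come from, not how the library is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>
#include <sys/stat.h>

// Hypothetical bundle of the common properties for one directory entry.
struct standard_props
{
    bool        valid;        // false if stat() failed for the entry
    bool        is_directory;
    bool        is_hidden;
    std::size_t size;
    std::time_t mtime;
};

// POSIX only sketch: reads the properties of the entry 'name' in 'dir'.
inline standard_props read_standard_props(std::string const& dir,
                                          std::string const& name)
{
    standard_props p = { false, false, false, 0, 0 };
    std::string const path = dir + "/" + name;
    struct stat st;
    if (::stat(path.c_str(), &st) == 0)
    {
        p.valid = true;
        p.is_directory = S_ISDIR(st.st_mode);
        // POSIX convention: names starting with a dot are hidden
        p.is_hidden = !name.empty() && name[0] == '.';
        p.size = static_cast<std::size_t>(st.st_size);
        p.mtime = st.st_mtime;
    }
    return p;
}
```

A single stat() call fills all four values here, which matches the idea of keeping the data in one internal structure to avoid repeated system calls.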
POSIX Properties
WinNT Properties
Future Directions
In computer systems there are structures other than the system's directory which can
also be viewed as directories. An obvious example is archive files which store copies of
directory hierarchies, like ZIP or tar files. It might be useful to extend the class dir_it
to consider such structures to be directories as well and to add support for iterating
over them.
A potential approach might be the definition of a CORBA interface which is used
internally by the class dir_it to determine directory entries and to figure out
whether an entry is itself a directory. This way it would be possible to extend what is
considered to be a directory and have the same class iterate over very different
structures.
Whether this approach is reasonable will have to be evaluated in the future.
Personally, I think this is an interesting direction and I hope that I will find time to
test this in the near future.
See Also
POSIX: opendir(3), readdir(3), closedir(3), stat(2)
Standard Template Library: Input Iterator Requirements
Dietmar Kühl
<dietmar.kuehl@claas-solutions.de>