This is a high-level documentation not only of some of the ideas and guidelines that I have tried to follow when programming the pao method, but also of technical aids that I have used. From discussions with Matthias I gathered that they could be useful to others as well. I hope this will be so.
I am new to Fortran, and the thing that I have missed most are templates (parametric types and functions, which should come in Fortran 200x). There were moments where I had the impression of hitting a wall trying to implement things with a nice high-level design in Fortran. On the whole I think that the design that came out is not too bad, and as a language Fortran 90 isn't so bad either. There are some things in it that I quite like (but where are the templates?? ;). Well, anyway, I am quite happy with my solutions. Probably some of my ideas do not reflect the standard practices of Fortran programmers. If you have suggestions, I will be glad to hear them (and maybe even apply them! ;).
For the moment I will not give a real description of the pao method; it is basically the one developed by Gerd Berghold, only that I use another parametrization of the unitary transformation and I made the projection/injection explicit. The rest is basically the same. A more in-depth description should come at some point in the future.
At the moment the pao component has several files. These are the more global ones:

cp_prep_globals.h: the preprocessor macros for error handling
cp_log_handling.F: the routines to log errors, notes, ... (contains the logger object and routines to convert numbers to strings)
cp_error_handling.F: the routines to perform error handling
cp_output_handling.F: the routines to write some output data
cp_lbfgs.F: the lbfgs quasi-Newton optimizer used by the pao method

and these are more pao related:
cp_pao_types.F: all the (global) types that are related to the pao method
cp_pao_types_tests.F: some tests for the pao types
cp_pao_utils.F: various utility routines, some are quite general and should be moved to a less pao-specific place
cp_plain_rot.F: the plain rotation routines, and the one that calculates the unitary transformation with them
cp_plain_rot_tests.F: some tests for the plain rotations
cp_pao_obj_function.F: the functional (objective function) that pao tries to optimize
cp_pao_optimizer.F: the code that optimizes the objective function (calling the real optimizer)

These are the ideas behind the development of the pao method. They have been thought out during the development, and I try to guarantee that they are respected in all the new things and modifications that I do, but there is always some code that was written before I decided on a new guideline. In that case I do not always go back and fix the code. In fact I think that functioning code is the most important thing, and changing it introduces new bugs and needs time, so the speed of my fixes depends on how much the problem disturbs me. If something disturbs you, that is a BIG reason to do the changes, so tell me... Anyway I like to have clean code and refactor often (that is one of the reasons for all my error checking...).
I like object orientation, and I think that it is very useful for writing big programs. Not every aspect of it can be done efficiently in Fortran, and this has already been discussed elsewhere, but some aspects can, and in particular you can:
I think that for a library the last point is especially important: it makes the library more consistent, and after a while you know "intuitively" how things are supposed to work, you don't have to look it up every time.
With this idea in mind I chose a set of generic procedure names that a type can implement. The type implements them with private names and exposes a public interface with the public generic name.
This uses generic functions, and does not impose any slowdown because the specific function is chosen at compile time. But there is a drawback: normally the error messages of the compiler are less helpful, and the search for the function definition can be a little more complex. On the plus side, generic names are easy to remember, and if you change the type of some variable (or the name of the type), you don't have to change the code that uses it.
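As a minimal sketch of what I mean (the module, type, and routine names here are hypothetical examples, not actual pao code): the specific routine gets a private, type-specific name, and the public generic name is attached to it with an interface block.

! hypothetical example: a public generic qs_init bound to a private,
! type-specific routine
module hypothetical_vec_module
  implicit none
  private
  public :: hypothetical_vec, qs_init

  type hypothetical_vec
     real, dimension(3) :: coords
  end type hypothetical_vec

  ! the public generic name, bound to the private specific routine
  interface qs_init
     module procedure hypothetical_vec_init
  end interface

contains

  subroutine hypothetical_vec_init(obj, coords)
    type(hypothetical_vec), intent(out) :: obj
    real, dimension(3), intent(in), optional :: coords
    obj%coords = 0.0
    if (present(coords)) obj%coords = coords
  end subroutine hypothetical_vec_init

end module hypothetical_vec_module

A caller then just writes call qs_init(v, coords=(/1.0,0.0,0.0/)); the compiler resolves the generic name to the specific routine from the argument types, so the generic costs nothing at run time.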
These are the names that I have chosen. I have put a two-letter prefix on them to avoid name clashes (I like to import a module without renaming). The prefix was cp, but then I was told to use qs, and then cp again; I don't find it so funny to change it, so for now it stays qs (but I am ready to change it again upon request).
These are methods that every type is expected to implement, without exception (even if it doesn't need them, to keep the consistency).
call qs_init(obj,...): initializes obj (the first argument) using the following (normally keyword) arguments. It must be called prior to every other operation on the object. Some initialization arguments may be required.
call qs_dealloc_ref(obj,...): releases all the memory that was allocated by obj. Must be called when an object is no longer needed (otherwise there could be leaks).

There are types that are better always seen as pointers (linked lists for example). For them the previous methods are not so well suited, and two other methods can be used instead:

qs_create(obj,...): allocates and initializes a type; obj will point to the new object.
qs_dealloc(obj,...): deallocates the memory allocated by obj, and obj itself.
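As a hedged sketch of how such a pair could look (a hypothetical singly-linked list node, not actual pao code):

! hypothetical example of a qs_create/qs_dealloc pair for a pointer-based type
module hypothetical_list_module
  implicit none
  private
  public :: hypothetical_list_node, qs_create, qs_dealloc

  type hypothetical_list_node
     integer :: value
     type(hypothetical_list_node), pointer :: next
  end type hypothetical_list_node

  interface qs_create
     module procedure hypothetical_list_node_create
  end interface
  interface qs_dealloc
     module procedure hypothetical_list_node_dealloc
  end interface

contains

  subroutine hypothetical_list_node_create(obj, value)
    type(hypothetical_list_node), pointer :: obj
    integer, intent(in) :: value
    allocate(obj)            ! obj points to the newly allocated node
    obj%value = value
    nullify(obj%next)
  end subroutine hypothetical_list_node_create

  subroutine hypothetical_list_node_dealloc(obj)
    type(hypothetical_list_node), pointer :: obj
    ! a real implementation would also release everything obj owns
    if (associated(obj)) deallocate(obj)
    nullify(obj)
  end subroutine hypothetical_list_node_dealloc

end module hypothetical_list_module

A caller would do call qs_create(head, value=1) and later call qs_dealloc(head), after which head comes back disassociated.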
These methods are not required, but are often present.
call qs_get(obj,...): returns various attributes of obj. All the other arguments are optional. Some attributes might not be elements of obj (they might be calculated or returned from sub-elements of the structure).
call qs_set(obj,...): sets the value of various attributes of obj (not all the attributes must be settable; in fact some of my types are a little too complex because I made too many attributes changeable).
qs_valid(obj,...): returns true if the object is valid. Only minimal testing should be performed; this method is expected to be called quite often.
qs_validate(obj,...): returns true if the object is valid; an extensive series of tests should be performed. It is expected that this method will be called only once.
next(iterator,...): moves the iterator to the next element, and returns true if the iterator is still valid (not past the end). Useful when you need more than one thing from the iterator.
get_next(iterator,...): moves the iterator to the next element, and returns a pointer to the actual element. Returns a disassociated pointer when at the end. Useful when you need just one thing from the iterator. Depending on the implementation, the iterator after a call to get_next could be on the next or the previous element.
qs_did_change(obj,...): tells the structure obj that some of its internal parameters have changed and (if obj caches some values) that these could now be invalid.

Obviously every type has its own specific operations. One that can be standardized is a function to directly get an attribute (without needing a variable). This should also be a generic function so that another object with the same attribute can use the same function name, i.e.

qs_get_something(obj,...): returns something from obj
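As a small hedged sketch of the get/direct-getter pair (a hypothetical atom type, not the real pao one; all component names are made up):

! hypothetical example of qs_get with optional keyword arguments and of a
! direct attribute getter under a generic name
module hypothetical_atom_module
  implicit none
  private
  public :: hypothetical_atom, qs_get, qs_get_charge

  type hypothetical_atom
     real :: charge
     real, dimension(:,:), pointer :: orbital_coeffs
  end type hypothetical_atom

  interface qs_get
     module procedure hypothetical_atom_get
  end interface
  interface qs_get_charge
     module procedure hypothetical_atom_get_charge
  end interface

contains

  subroutine hypothetical_atom_get(obj, charge, n_orbitals)
    type(hypothetical_atom), intent(in) :: obj
    real, intent(out), optional :: charge
    integer, intent(out), optional :: n_orbitals
    if (present(charge)) charge = obj%charge
    ! an attribute that is not stored directly but derived from a sub-element
    ! (assuming orbital_coeffs is associated)
    if (present(n_orbitals)) n_orbitals = size(obj%orbital_coeffs, 2)
  end subroutine hypothetical_atom_get

  ! direct getter, usable inside expressions without a temporary variable
  function hypothetical_atom_get_charge(obj) result(charge)
    type(hypothetical_atom), intent(in) :: obj
    real :: charge
    charge = obj%charge
  end function hypothetical_atom_get_charge

end module hypothetical_atom_module

The nice part of the direct getter is that it can be used inside an expression, e.g. x = 2.0*qs_get_charge(atom), without first declaring a variable and calling qs_get.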
If you use pointers, or allocate memory dynamically, then you should have a memory policy to avoid leaks. I am accustomed to having at least reference counting, but it is quite some work to implement it well in Fortran, so for the moment I chose a very simple policy. I still don't know if it will always be applicable, but here it is: every object gets a qs_init before using it, and a qs_dealloc_ref after being done with it. There are no exceptions to this rule, so you can see a leak even without fully understanding the code.

I found a rather subtle and unpleasant drawback to these rules: if a type has subobjects in it (not pointers to them) and gives back pointers to them, it should be a target in almost all the function calls, otherwise it could give back pointers to temporary objects (if copy-in happened), and that might not be what you want. Note that putting a target everywhere is not a bad idea if your structure is big, because you avoid copy-in/copy-out. So maybe I should add the following rule: such structures should be declared with the target attribute in (almost) all the routines that take them as arguments.
As I use pointers I should say a couple of things about them:
I find testing, error handling, and output handling very important in software development, so I have put in a couple of facilities to make them easier. I have defined three modules: one that does logging (qs_log_handling), one that does error handling (qs_error_handling), and one that deals with the output of data.
The idea with them is to centralize or delegate the decisions about what to log and where to log, so that you don't have to make them in the code you write. This way your computing code does not become entangled with code that has to decide what to log, where to log, whether it should stop, ...
For the moment these modules are very simple in their implementation, but they provide the necessary hooks to be extended in the future.
Important: to use them you must put a call to qs_init_default_loggers at the beginning of your program.
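A minimal sketch of what that looks like; I assume here that qs_init_default_loggers lives in qs_log_handling and takes no required arguments, which you should check against the actual source:

program hypothetical_main
  use qs_log_handling, only: qs_init_default_loggers
  implicit none
  ! set up the default loggers before any logging or error handling happens
  call qs_init_default_loggers()
  ! ... the real work of the program ...
end program hypothetical_main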
I will discuss the most mature (and the one most used by me at the moment) of these modules: qs_error_handling. The idea behind it is largely drawn from the way error handling is done in the Fortran 90 version of the NAG numerical library; see for example the error handling part of the introduction to the NAG library. The idea is that almost every function has an optional parameter named error to control its behaviour in the case of an error.
To keep the handling of errors simple I have written a couple of procedures that can help, especially if your routines have a standard form:
call qs_assert(condition, level, error_nr, fromWhere, message, error, failure): checks condition; if it is false then the optional argument failure is set to true. This can be used to do many assertions one after the other and check at the end whether one failed.
call qs_error_message(level, fromWhere, message, error): writes an error message (if the print level of the error is not too high).
call qs_propagate_error(iError, fromWhere, message, error, failure): if the internal error iError of a subroutine is set (i.e. if there was an error in the subroutine), it propagates the error to the actual error (i.e. sets the error level and number to the ones of iError) and sets the optional parameter failure to true.

To make the error logging more efficient I have defined some macros in "qs_prep_globals.h". One of the most useful is QSSourceFileRef(): this macro inserts the actual filename and line number as a string. It can be used to make error messages much more useful: you can write "error in "//QSSourceFileRef() and it is expanded like error in /actual/file/path.F line 183. Be careful with this macro: it expands inline with the full path of the actual file; if the path is long and the line where you expand it grows longer than the maximum line length of the compiler, you could have problems (i.e. use it in short lines).
There are other macros that can save a little typing and guarantee that the test is performed inline (without a function call). They are intended for short checks and must fit on one line. These are QSPrecondition, QSPostcondition, QSInvariant, and QSAssert. They all simply call qs_assert, but guarantee inlining; for example
QSPrecondition(n>0,qs_warning_level,"some_module:nsqrt",error,failure)
is equivalent to
call qs_assert(n>0,level=qs_warning_level,&
error_nr=qs_precondition_failed, fromWhere="some_module:nsqrt",&
message="PRECONDITION(n>0) failed in someFile.F line 675",&
error=error,failure=failure)
For a more in-depth discussion of these routines see qs_error_handling.F.
Now some examples of how to use the error handling routines. First, a very simple function with error handling:
function nsqrt(n,error)
  ! integer square root of n; emits a warning if n is not a perfect square
  integer :: nsqrt
  integer, intent(in) :: n
  type(qs_error), optional, intent(inout) :: error
  logical :: failure
  character(len=*), parameter :: routineN='some_module:nsqrt'

  failure=.false.
  QSPrecondition(n>0,qs_failure_level,routineN,error,failure)
  if (.not.failure) then
     nsqrt=floor(sqrt(real(n)))
     QSPostcondition(nsqrt*nsqrt==n,qs_warning_level,routineN,error,failure)
  else
     nsqrt=-1    ! precondition failed: return an invalid value
  end if
end function nsqrt
Then a routine that calls it...
subroutine s1(n,error)
  integer, intent(in) :: n
  type(qs_error), optional, intent(inout) :: error

  ! just hands its own optional error argument down to nsqrt
  print *, nsqrt(n,error=error)
end subroutine s1
Then a routine that calls nsqrt and checks if it fails
subroutine s2(n,error)
  integer, intent(in) :: n
  type(qs_error), optional, intent(inout) :: error
  logical :: failure
  character(len=*), parameter :: routineN='some_module:s2'
  integer :: mysqrt
  type(qs_error) :: iError

  failure=.false.
  call qs_init(iError,template_error=error)
  mysqrt=nsqrt(n,error=iError)
  ! propagate a possible error in nsqrt to error and set failure
  call qs_propagate_error(iError,fromWhere=routineN,error=error,failure=failure)
  if (failure) then
     print *, "there was an error"
  else
     print *, mysqrt
  end if
  call qs_dealloc_ref(iError,error=error)
end subroutine s2
Finally a routine that calls s2 but suppresses the printing of warnings.
subroutine s3(n)
  integer, intent(in) :: n
  type(qs_error) :: error

  ! only failures (not warnings) of the called routines will be printed
  call qs_init(error,print_level=qs_failure_level)
  call s2(n,error=error)
  call qs_dealloc_ref(error)
end subroutine s3
I have tried to develop the code in a way that makes it easy to parallelize. It is not parallel, but basically all the operations are local to one atom. So I have tried to write the functions in such a way that they work on one atom at a time, and only with local data. That is, for example, the reason for the existence of qs_local_angles. This means that if the atoms are distributed between the processors, then it should be easy to make my code parallel.
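Purely as an illustration of the one-atom-at-a-time pattern (the module, the type, and the quantity computed are hypothetical, not actual pao code):

! hypothetical sketch: each iteration touches only the data of one atom
module hypothetical_per_atom_module
  implicit none
  private
  public :: hypothetical_atom_data, hypothetical_calc_all_atoms

  ! hypothetical per-atom data: everything a single atom needs
  type hypothetical_atom_data
     real, dimension(3) :: position
     real :: charge
  end type hypothetical_atom_data

contains

  ! the per-atom kernel: uses only the data of one atom
  subroutine hypothetical_calc_one_atom(atom, energy)
    type(hypothetical_atom_data), intent(in) :: atom
    real, intent(out) :: energy
    energy = 0.5*atom%charge*sum(atom%position**2)   ! some purely local quantity
  end subroutine hypothetical_calc_one_atom

  ! the driver: a plain loop, trivially distributable if the atoms are distributed
  subroutine hypothetical_calc_all_atoms(atoms, energies)
    type(hypothetical_atom_data), dimension(:), intent(in) :: atoms
    real, dimension(:), intent(out) :: energies
    integer :: iatom
    do iatom = 1, size(atoms)
       call hypothetical_calc_one_atom(atoms(iatom), energies(iatom))
    end do
  end subroutine hypothetical_calc_all_atoms

end module hypothetical_per_atom_module

Since each iteration uses only the data of its own atom, distributing the atoms over processors would only change the loop bounds, not the kernel.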
I have used no saved or global variables, and I have tried to make all my functions multithread compliant (at least I think they are).
Documentation is really important for code, and especially so for a library. I know that I do not have the discipline to do much of it after I have finished coding something, so it is either written right from the beginning or (almost) not at all. As cp2k uses robodoc, I have used it too. I have some gripes with it: it does not extract the function name, ... from the source (as f90doc does), and it is quite verbose and heavyweight. On the plus side it makes for more professional looking code than my usual standard, and for a higher number of lines written ;).
Anyway, I have used robodoc and on the whole it works quite well. There is some kind of high-level description of the code (like this document) that is better done separately, but otherwise all my documentation is in the source (smaller probability of getting out of sync).
Normally the documentation of the type declaration should contain all the attributes (also the ones that are not stored) and a little description of the purpose of the type. The get/set/init/... methods often describe only differences or special arguments, and refer to the type declaration documentation for the rest of the arguments.
Debugging is one place where, if you are not careful, you can lose a lot of time. As I like to refactor my code, and unfortunately my changes are rarely fully bug free, I have a couple of things that can help to find the errors more easily:
qs_debug, defined in qs_error_handling: it should be true in debug code and false in optimized code. I also have a private variable (debug_this_module) in every module to make the error checking more selective. So I enclose the more expensive checks in if (qs_debug .and. debug_this_module) then ... end if or something similar (see the sketch below).

In the debugger I set breakpoints on the central error handling routines:
break qs_error_handling@qs_common_error_stop
break qs_error_handling@qs_handle_error
and then I print a stack trace of the place where the error happened.
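Here is a hedged sketch of the first point (the module, the routine, and the expensive check are all hypothetical; I assume qs_debug can be use-associated from qs_error_handling):

! hypothetical module showing the qs_debug / debug_this_module pattern
module hypothetical_checked_module
  use qs_error_handling, only: qs_debug
  implicit none
  private
  public :: hypothetical_do_work
  ! per-module switch for the more expensive checks
  logical, parameter :: debug_this_module = .true.

contains

  subroutine hypothetical_do_work(x)
    real, dimension(:), intent(inout) :: x
    if (qs_debug .and. debug_this_module) then
       ! the expensive consistency check runs only when both flags are on
       call hypothetical_expensive_check(x)
    end if
    x = 2.0*x
  end subroutine hypothetical_do_work

  subroutine hypothetical_expensive_check(x)
    real, dimension(:), intent(in) :: x
    ! placeholder for a real, expensive validation
    if (any(x /= x)) print *, "hypothetical_expensive_check: NaN found in x"
  end subroutine hypothetical_expensive_check

end module hypothetical_checked_module

If both flags happen to be compile-time parameters, a decent compiler can drop the whole branch in the optimized build.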
If you want to follow the guidelines then you are in for quite some typing, and either you are much more disciplined than me or you won't do it. I know I am lazy, and I am going to forget something anyway. So I have used extensively some facilities of xemacs (my editor of choice).
The emacs f90-mode does syntax highlighting, indenting,
expansion of the end
statement with the
corresponding first token (just hit tab). Unfortunately the
standard setting interprets .F
files as fortran 77
files. See my .emacs file about how to
change the default.
You should use long and descriptive names, but then it is a pain to type them. Dynamic expansion can be a boon on these occasions:
M-/ (meta key and "/") searches for a completion of the current word backwards in your file, then forward, and finally in other open files (hitting M-/ again cycles between the different completions).
So you have the robodoc header, and you also need the name of the function in a variable, the optional variable error, and maybe a logical variable to keep track of failures that should be initialized to false... that is quite a lot to type... copy and paste is the right way to drag the inconsistencies that you had not fixed in the old code into the new... so maybe I don't need it for this small function?...
Well, it is easy to get sloppy. To force myself onto THE right way ;) I use abbreviations: type "`mstruct" and you have the module structure, "`sstruct" gives a subroutine structure, "`fstruct" the function structure, "`tstruct" the structure of the types, ...
Abbreviations need to be activated (see my .emacs file), and new ones can be defined with "C-x a l" (I have many more). To change them, "M-x edit-abbrevs" is useful.
I have built a TAGS table (with etags) of the files in cp2k/src and then with "M-." and "C-u M-." I jump to the function definitions.
Search and replace can be very useful. With emacs you have "C-s", "C-r", "M-%", "M-C-%", the mighty "M-x tags-query-replace", and dired ("C-x d"), where after marking your files with "m" you can do a query regexp substitution on the selected files by typing "Q". In query replace it can be useful to stop for a moment to do some editing (with "C-r") and then continue the query replace with "C-M-c".
If you use vi (not my case), there is a vi emulation in emacs. You can activate it with a menu, but to activate it by default you should put a (viper-mode) in your ~/.emacs.
I am in no way a lisp expert (although I find lisp a very interesting language), and I know that my comments don't conform to the standard lisp way of commenting. Anyway, here is my .emacs file, in the hope that others find it useful. If you want to use it, copy the following into your ~/.emacs.
;;;; Fawzi Mohamed .emacs
;; makes abbreviations persistent between sessions
(if (not (file-exists-p "~/.abbrev_defs"))
    (write-abbrev-file "~/.abbrev_defs"))
(read-abbrev-file "~/.abbrev_defs")
;; uses Fortran f90-mode for .F files
(setq auto-mode-alist (cons `("\\.F\\'" . f90-mode) auto-mode-alist))
;; Fortran f90 mode: activate abbreviations and syntax highlighting
(setq f90-mode-hook
      '(lambda () (abbrev-mode t) (turn-on-font-lock)))
;; quiet the beep
(setq-default bell-volume 0)
;; modified mode line
(setq mode-line-system-identification
      (substring (system-name) 0 (string-match "\\..+" (system-name))))
(setq default-mode-line-format
      (list "" 'mode-line-modified "Line %l-" '(-3 . "%P") "--" "%14b" " "
            'default-directory " " "%[(" 'mode-name 'minor-mode-alist "%n"
            'mode-line-process ")%]--" "<" 'mode-line-system-identification
            "> %-"))
; sets the new mode line as default.
(setq mode-line-format default-mode-line-format)
;; mouse wheel scrolling (a little too coarse) and in the active frame
; (not where the mouse is), but better than nothing
; is activated only with xemacs
(cond ((string-match "XEmacs\\|Lucid" emacs-version)
       (global-set-key 'button5 'scroll-up)
       (global-set-key 'button4 'scroll-down)))
;; use the TAGS table to search for function defs in cp2k
(setq tag-table-alist '(("~/cp2k/src/" . "~/cp2k/src/")))
;; default compile command
; (global, but I don't need a local mode hook for the moment)
(setq compile-command "cd ~/cp2k/makefiles/;make sdbg")
I use CVS a lot, to be able to look at the old versions of my files. I check in the files (in my local repository) even when they don't compile (sometimes, trying to fix a bug, you make many superfluous or wrong changes that you will want to undo). I strongly encourage anyone to do the same.