GDE Interface and Editor
DESCRIPTION
Starts the GDE Editor designed by Steven Smith.
See next chapter of this text for the original help text.
As GDE originally used its own built-in database, it had to be
slightly modified to run under ARB. So
**** READ THE WARNINGS/BUGS CAREFULLY ****
WARNINGS
As soon as you start GDE, it creates a copy of the selected
sequences. That means that you may change the sequences
with either GDE or ARB, but not both. Therefore, if you have started
GDE, do nothing but sequence editing in GDE till you quit GDE.
To really save sequences to disc, you have to send the sequence
changes to ARB and then use ARB to save the ARB database.
BUGS
Many functions, especially those for
- deleting,
- moving,
- duplicating,
- creating, and
- importing
species, do not work correctly.
********* Part of the Original GDE HELPTEXT ******************
Introduction
The Genetic Data Environment is part of a growing
set of programs for manipulating and analyzing
"genetic" data. It differs in design from other
analysis programs in that it is intended to be an
expandable and customizable system, while still
being easy to use.
There are a tremendous number of publicly available
programs for sequence analysis. Many of these
programs have found their way into commercial
packages which incorporate them into integrated,
easy to use systems. The goal of the GDE is to
minimize the amount of effort required to integrate
sequence analysis functions into a common
environment. The GDE takes care of the user
interface issues, and allows the programmer to
concentrate on the analysis itself. Existing programs
can be tied into the GDE in a matter of hours (or
minutes) as opposed to days or weeks. Programs
may be written in any language, and still seamlessly
be incorporated into the GDE.
These programs are, and will continue to be,
available at no charge. It is the hope that this
system will grow in functionality as more and more
people see the benefits of a modular analysis
environment. Users are encouraged to make
modifications to the system, and forward all changes
and additions to Steven Smith at
smith@bioimage.millipore.com.
What's New for this Release
GDE 2.2 represents a maintenance release. Several
small bugs have been fixed, and new editing
features and user interface elements have been added. Also, I have
tried to update all of the contributed external
programs to their latest release. Updated programs
include:
- Phylip
- Treetool
- LoopTool
- Readseq
- Blast
- Fasta
Improved versions of printing and translate are
included as well. As for new editing features, a
useful "yanking" feature has been added by Scott
Ferguson from Exxon Research, along with the capability
to export the colormap for a sequence (see
appendices A/C). Among the fixes in this
release are:
- Selection mask problems when exporting to
  Genbank (fixed in 2.1)
- Memory leaks (fixed in 2.1)
- Correct handling of circular sequences
- More liberal interpretation of Genbank formatted
  files (not column dependent)
System Requirements
GDE 2.2 currently runs on the Sun family of
workstations. This includes the Sun3 and Sun4
(Sparcstation) systems. It was written in XView,
and runs on Suns using OpenWindows 3.0 or MIT's
X Windows. It runs in both monochrome and color,
and can be run remotely on any system capable of
running X Windows Release 4. You should have at
least 15 megabytes of free disk space available. The binary
release for SparcStations was compiled under
SunOS 4.1.2 and OpenWindows 3.0.
We are also supporting a DECStation version of
GDE. This is running under XView 3.0/X11R5. We
encourage interested people to port the programs to
their favorite Unix platform. There are informal
ports to the SGI line of Unix machines.
Note to Motif users
GDE 2.2 can be run using different window
managers. The most common alternative to olwm is
the Motif window manager (mwm). The only
problem in using another window manager is that
the status line is not displayed. We have added a
"Message panel" as an option under
"File->Properties" which displays all of the information
contained on the status line.
People using other window managers may also
prefer using xterm and xedit as default terminals and
file editors. This can be accomplished by replacing
all occurrences of 'shelltool' and 'textedit' with
'xterm -e' and 'xedit' in the
$GDE_HELP_DIR/.GDEmenus file.
FastA and Blast need to have the properly formatted
databases installed in the $GDE_HELP_DIR under
the directories FASTA/PIR, FASTA/GENBANK,
BLAST/pir, and BLAST/genbank. For FASTA, simply
copy a version of PIR and Genbank into the proper
directory. Alternatively, the PIR and GENBANK
files can be symbolic links to copies of Genbank
held elsewhere on your system. You may need to
look at the .GDEmenus file in $GDE_HELP_DIR to
verify that you are using the same divisions for
these databases.
Blast installation involves converting PIR and
GENBANK to a temporary FASTA format (using
pir2fasta and gb2fasta) and then using pressdb for
nucleic acid, and setdb for amino acid to reformat the
databases again into blast format. The .GDEmenus
file is currently set up to search with blast using the
following databases: pir, genpept, genupdate, and
genbank. If you wish to divide these into
subdivisions, then the .GDEmenus file will have to
be edited.
The most up-to-date release of blast can be obtained
via anonymous ftp to ncbi.nlm.nih.gov. The most
recent release of FASTA can be obtained via
anonymous ftp to uvaarpa.virginia.edu. It is
strongly recommended that you retrieve these copies,
and become familiar with their setup.
Using the GDE
It is assumed that the user is familiar with the Unix
and OpenWindows/X Windows environments. It is
also assumed that people running standard MIT
X Windows will be using the OpenLook window
manager (olwm). Other window managers work
with varied success. If you are not certain as to how
your system is set up, please contact your systems
administrator.
The GDE uses a menu description language to
define what external programs it can call, and what
parameters and data to pass to each function. This
language allows users to customize their own
environment to suit individual needs.
The following is how the GDE handles external
programs when selected from a menu:
Each step in this process is described in a file
.GDEmenus in the user's current or home directory.
The language used in this file describes three phases
to an external function call. The first phase
describes the menu item as it will appear, and the
Unix command line that is actually run when it is
selected. The second phase describes how to prompt
for the parameters needed by the function. The third
phase describes what data needs to be passed as
input to the external function, and what data (if any)
needs to be read back from its output.
The form of the language is a simple keyword/value
list delimited by the colon (:) character. The
language retains old values until new ones are set.
For example, setting the menu name is done once for
all items in that menu, and is only reset when the
next menu is reached.
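The retention rule can be illustrated with a short Python sketch. This is not GDE's own parser: the function name parse_menus, the dictionary layout, and the sample menu and item names are assumptions made for illustration. The sketch splits each line at the first colon and carries the current menu name forward to every item that follows.

```python
# Hypothetical helper for illustration; not part of GDE.
def parse_menus(text):
    """Collect menu items from keyword:value lines."""
    items = []
    current_menu = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split at the first colon only; values may contain colons themselves.
        keyword, value = line.split(":", 1)
        keyword, value = keyword.strip(), value.strip()
        if keyword == "menu":
            current_menu = value  # retained until the next menu: line
        elif keyword == "item":
            items.append({"menu": current_menu, "item": value})
        elif items:
            # Any other keyword is attached to the most recent item.
            items[-1].setdefault(keyword, []).append(value)
    return items


# Sample input; the menu and item names are made up for this example.
sample = """menu:File
item:Open
item:Save
menu:Edit
item:All caps
itemmethod:tr '[a-z]' '[A-Z]' <INPUT >OUTPUT
"""
for entry in parse_menus(sample):
    print(entry["menu"], "/", entry["item"])
```

Both "Open" and "Save" are reported under "File" although the menu: line appears only once, and "All caps" is reported under "Edit".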
The keywords for phase one are:
menu:menu name           Name of current menu
item:item name           Name of current menu item
itemmeta:meta_key        Meta key equivalence (quick keys)
itemhelp:help_file       Help file (either full path, or in GDE_HELP_DIR)
itemmethod:Unix command
The item method command is a bit more involved. It
is the Unix command that will actually run the
external program intended. It is one line long, and
can be up to 256 characters in length. It can have
embedded variable names (starting with a '$') that
will be replaced with appropriate values later on. It
can consist of multiple Unix commands separated by
semi-colons (;), and may contain shell scripts and
background processes as well as simple command
names. Examples will be given later.
The keywords for phase two are:
- arg:argument_variable_name
    Name of this variable. It will appear
    in the itemmethod: line with a dollar
    sign ($) in front of it.
- argtype:slider,chooser,choice_menu or text
    The type of graphic object
    representing this argument.
- arglabel:descriptive label
    A short description of what this
    argument represents.
- argmin:minimum_value (integer)
    Used for sliders.
- argmax:maximum_value (integer)
    Used for sliders.
- argvalue:default_value (integer)
    The numeric value associated with
    sliders, or the default choice in
    choosers, choice_menus, and choice_lists
    (the first choice is 0, the second is 1, etc.).
- argtext:default value
    Used for text fields.
- argchoice:displayed value:passed value
    Used for choosers and
    choice_menus. The first value is
    displayed on screen, and the second
    value is passed to the itemmethod
    line.
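To make the relationship between the phase two keywords and the itemmethod line concrete, the following Python sketch substitutes argument values into a command template. It is an illustration under assumptions, not GDE's implementation: the function name, the dictionary layout, and the program name align_tool with its flags are all hypothetical. For a chooser, argvalue selects a choice by index (starting at 0), and the passed value of that choice replaces the variable.

```python
import re


# Hypothetical helper for illustration; not part of GDE.
def expand_itemmethod(template, args):
    """Replace each $name in template with the value of that argument."""
    values = {}
    for arg in args:
        if arg["argtype"] in ("chooser", "choice_menu"):
            # argvalue is an index into the argchoice list (first choice is 0);
            # each choice is a (displayed value, passed value) pair.
            _displayed, passed = arg["argchoice"][arg["argvalue"]]
            values[arg["arg"]] = passed
        elif arg["argtype"] == "slider":
            values[arg["arg"]] = str(arg["argvalue"])
        else:  # text field
            values[arg["arg"]] = arg["argtext"]

    # Substitute whole variable names in a single pass, so that a short
    # name is never mistaken for the start of a longer one.
    def lookup(match):
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\$(\w+)", lookup, template)


# Made-up argument definitions mirroring arg:, argtype:, argvalue:, etc.
args = [
    {"arg": "ktuple", "argtype": "slider", "argmin": 1, "argmax": 4, "argvalue": 2},
    {"arg": "type", "argtype": "chooser", "argvalue": 1,
     "argchoice": [("DNA", "-dna"), ("Protein", "-protein")]},
    {"arg": "title", "argtype": "text", "argtext": "run1"},
]
print(expand_itemmethod("align_tool -k $ktuple $type -t $title", args))
```

Matching whole variable names is a choice made in this sketch; the original text does not say how GDE resolves names that share a prefix.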
The keywords for phase three are as follows:
- in:input_file
    GDE will replace this name with a
    randomly generated temporary file
    name. It will then write the selected
    data out to this file.
- informat:file_format
    Write data to this file for input to
    this function. Currently supported
    values are Genbank and flat.
- inmask:
    This data can be controlled by a
    selection mask.
- insave:
    Do not remove this file after running
    the external function. This is useful
    for functions put in the background.
- out:output_file
    GDE will replace this name with a
    randomly generated temporary file
    name. It is up to the external function
    to fill this file with any results that
    might be read back into the GDE.
- outformat:file_format
    The data in the output file will be in
    this format. Currently supported
    values are colormask, Genbank, and
    flat.
- outsave:
    Do not remove this file after reading.
    This is useful for background tasks.
- outoverwrite:
    Overwrite existing sequences in the current
    GDE window. Currently supported with
    "gde" format only.
Here is a sample dialog box, and its entry in the
.GDEmenus file:
Using the default parameters given in the dialog
box, the executed Unix command line would be:
(tr '[a-z]' '[A-Z]' < .gde_001 >.gde_001.tmp ; mv .gde_001.tmp CAPS ; gde CAPS -Wx medium ; rm .gde_001 ) &
where .gde_001 is the name of the temporary file
generated by the GDE which contains the selected
sequences in flat file format. Since the GDE runs
this command in the background ('&' at the end) it
is necessary to specify the insave: line, and to
remove all temporary files manually. There is no
output file specified because the data is not loaded
back into the current GDE window, but rather a new
GDE window is opened on the file. A simpler
command that reloads the data after conversion
might be:
item: All caps
itemmethod: tr '[a-z]' '[A-Z]' <INPUT > OUTPUT
in: INPUT
informat: flat
out: OUTPUT
outformat: flat
In this example, no arguments are specified, and so
no dialog box will appear. The command is not run
in the background, so the GDE can clean up after
itself automatically. The converted sequence is
automatically loaded back into the current GDE
window.
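The in: and out: handling for a foreground command like this one can be sketched in Python as follows. This is an illustrative approximation, not GDE's code: the function name run_item is hypothetical, the sketch assumes a Unix shell with tr available, and it treats the flat format as plain text.

```python
import os
import re
import shlex
import subprocess
import tempfile


# Hypothetical helper for illustration; not part of GDE.
def run_item(itemmethod, in_name, out_name, data):
    """Run itemmethod with the in:/out: names replaced by temporary files."""
    in_fd, in_path = tempfile.mkstemp(prefix=".gde_")
    out_fd, out_path = tempfile.mkstemp(prefix=".gde_")
    os.close(out_fd)
    try:
        # Write the selected data to the temporary input file.
        with os.fdopen(in_fd, "w") as handle:
            handle.write(data)
        # Replace both placeholder names in one pass.
        paths = {in_name: shlex.quote(in_path), out_name: shlex.quote(out_path)}
        pattern = "|".join(re.escape(name) for name in paths)
        command = re.sub(pattern, lambda m: paths[m.group(0)], itemmethod)
        subprocess.run(command, shell=True, check=True)
        # Read back whatever the external command wrote.
        with open(out_path) as handle:
            return handle.read()
    finally:
        # The command ran in the foreground, so both files can go right away.
        os.remove(in_path)
        os.remove(out_path)


print(run_item("tr '[a-z]' '[A-Z]' <INPUT >OUTPUT", "INPUT", "OUTPUT", "acgu\n"))
```

Because the command is not put in the background, the temporary files can be removed as soon as it returns, which corresponds to leaving out the insave: and outsave: lines.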
In general, the easiest type of program to integrate
into the GDE is a program completely driven from a
Unix command line. Interactive programs can be
tied in (MFOLD for example); however, shell scripts
must be used to drive the parameter entry for these
programs. Programs of the form:
program_name -a1 argument1 -a2 argument2 -f inputfile -er errorfile > outputfile
can be specified in the .GDEmenus file directly. As
this is the general form of most Unix commands,
these tend to be simpler to implement under the
GDE.
As functions grow in complexity, they may begin to
need a user interface of their own. In these cases, the
command line calling arguments are still necessary
in order to allow the GDE to hand them the
appropriate data, and possibly retrieve results after
some external manipulation.
Appendix C, External functions
ClustalV - Cluster multiple sequence alignment
- Author: Des Higgins.
- Reference: Higgins,D.G. Bleasby,A.J. and Fuchs,R. (1991)
  CLUSTAL V: improved software for multiple sequence alignment. ms. submitted to CABIOS
- Parameters:
  k-tuple pairwise search   Word size for pairwise comparisons
  Window size               Smaller values give faster alignments,
                            larger values are more sensitive.
  Transitions weighted      Can weight transitions twice as high as
                            transversions (DNA only).
  Fixed gap penalty         Gap insertion penalty; a lower value gives more gaps
  Floating gap penalty      Gap extension penalty; a lower value gives longer gaps
- Comments:
  ClustalV is a directed multiple sequence alignment algorithm that
  aligns a set of sequences based on their level of similarity. It first
  uses Lipman-Pearson pairwise similarity scoring to find "clusters"
  of similar sequences, and pre-aligns those sequences. It then adds
  other sequences to the alignment in the order of their similarity so as
  to produce the cleanest alignment.
- Warning:
  ClustalV only uses unambiguous character codes. It will also
  convert all sequences to upper case in the process of aligning. Clustal
  does not pass back comments, author etc. Be sure to keep copies of your
  sequences if you do not wish to lose this information.
MFOLD - RNA secondary structure prediction
- Author: Michael Zuker
- Reference:
  M. Zuker
  On Finding All Suboptimal Foldings of an RNA Molecule.
  Science, 244, 48-52, (1989)

  J. A. Jaeger, D. H. Turner and M. Zuker
  Improved Predictions of Secondary Structures for RNA.
  Proc. Natl. Acad. Sci. USA, BIOCHEMISTRY, 86, 7706-7710, (1989)

  J. A. Jaeger, D. H. Turner and M. Zuker
  Predicting Optimal and Suboptimal Secondary Structure for RNA.
  in "Molecular Evolution: Computer Analysis of Protein and
  Nucleic Acid Sequences", R. F. Doolittle ed.
  Methods in Enzymology, 183, 281-306 (1989)
- Parameters:
  Linear/circular RNA fold
  ct File to save results
- Comments:
  MFOLD passes its output to a program Zuk_to_gen that translates the secondary
  structure prediction to a nested bracket ([]) notation.
  This notation can then be used in the Highlight Helix and Draw
  Secondary structure (LoopTool) functions.

  MFOLD currently does not support much in the way of additional parameters.
  We hope to have all additional parameters available soon.
Blast - Basic Local Alignment Search Tool
FastA - Similarity search
- Reference:
  W. R. Pearson and D. J. Lipman (1988),
  "Improved Tools for Biological Sequence Analysis", PNAS 85:2444-2448

  W. R. Pearson (1990) "Rapid and Sensitive Sequence
  Comparison with FASTP and FASTA" Methods in Enzymology 183:63-98
- Parameters:
  Database                         Which database to search
  Number of alignments to report
  SMATRIX                          Which similarity matrix to use
- Comments:
  The FastA package includes several additional programs for pairwise alignment.
  We have only included a bare bones link to FastA. We hope to include a more
  complete setup for the actual 2.2 release.
Assemble Contigs - CAP Contig Assembly Program
- Author:
  Xiaoqiu Huang
  Department of Computer Science
  Michigan Technological University
  Houghton, MI 49931
  E-mail: huang@cs.mtu.edu

  Minor modifications for I/O by S. Smith
- Reference:
  "A Contig Assembly Program Based on Sensitive Detection of
  Fragment Overlaps" (submitted to Genomics, 1991)
- Parameters:
  Minimum overlap                Number of bases required for overlap
  Percent match within overlap   Percentage match required in the overlap
                                 region before merge is allowed.
- Comments:
  CAP returns the aligned sequences to the current editor window. The sequences are
  placed into contigs by setting the groupid. CAP does not change the order of the
  sequences, and so the results should be sorted by group and offset (see sort under
  the Edit menu).
Lsadt - Least squares additive tree analysis
- Author:
  Geert De Soete,
  'C' implementation by Mike Maciukenas,
  University of Illinois
- Reference:
  LSADT, 1983 Psychometrika, 1984,
  Quality and Quantity
- Parameters:
  Distance correction to use in distance matrix calculations (see Count below).
  What should be used for initial parameter estimates.
  Random number seed.
  Display method (see TreeTool below).
- Comments:
  The program has been rewritten in 'C' and will be included with the rRNA Database
  phylogenetic package being written at the University of Illinois Department of
  Microbiology.

  Count is a short program to calculate a distance matrix from a sequence
  alignment (see below).
Count - Distance matrix calculator
- Author: Steven Smith
- Parameters:
  Correction method (currently Jukes-Cantor or none)
  Include dashed columns
  Match upper case to lower
- Comments:
  Passes back a distance matrix in a format readable by LSADT.
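For reference, the Jukes-Cantor correction converts the observed fraction p of differing sites into the distance d = -(3/4) ln(1 - 4p/3). The Python sketch below computes it for one pair of aligned sequences. It is not the Count source code: the function and parameter names are hypothetical, and the way dashed columns and letter case are handled here is only one plausible reading of the options listed above.

```python
import math


# Hypothetical helper for illustration; not taken from the Count program.
def pairwise_distance(seq_a, seq_b, jukes_cantor=True,
                      include_dashes=False, match_case=True):
    """Distance between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if match_case:
        # Assumed meaning of "match upper case to lower": ignore case.
        seq_a, seq_b = seq_a.upper(), seq_b.upper()
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if not include_dashes and "-" in (a, b):
            continue  # assumed meaning: skip columns that contain a dash
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    p = differing / compared  # uncorrected fraction of differing sites
    if not jukes_cantor:
        return p
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(pairwise_distance("ACGTACGT", "ACGTACGA"), 4))
```

Applying the function to every pair of sequences in an alignment yields the entries of a distance matrix; the file format that LSADT expects is not described here and is not reproduced by this sketch.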
Treetool - Tree drawing/manipulation
- Author: Michael Maciukenas, University of Illinois
- Comments: See included documentation for TreeTool usage.
Readseq - format conversion program
Copyright Notice
The Genetic Data Environment (GDE) software and
documentation are not in the public domain.
Portions of this code are owned and copyrighted by
the Board of Trustees of the University of
Illinois and by Steven Smith. External functions
used by GDE are the property of their respective
authors. This release of the GDE program and
documentation may not be sold, or incorporated into
a commercial product, in whole or in part, without
the express written consent of the University of
Illinois and of its author, Steven Smith.
All interested parties may redistribute the GDE as
long as all copies are accompanied by this
documentation, and all copyright notices remain
intact. Parties interested in redistribution must do
so on a non-profit basis, charging only for cost of
media. Modifications to the GDE core editor should
be forwarded to the author Steven Smith. External
programs used by the GDE are copyright by, and are
the property of their respective authors unless
otherwise stated.
While all attempts have been made to ensure the
integrity of these programs:
Disclaimer
THE UNIVERSITY OF ILLINOIS, HARVARD
UNIVERSITY AND THE AUTHOR, STEVEN
SMITH GIVE NO WARRANTIES, EXPRESSED
OR IMPLIED FOR THE SOFTWARE AND
DOCUMENTATION PROVIDED, INCLUDING,
BUT NOT LIMITED TO WARRANTY OF
MERCHANTABILITY AND WARRANTY OF
FITNESS FOR A PARTICULAR PURPOSE.
User understands the software is a research tool for
which no warranties as to capabilities or accuracy are
made, and user accepts the software "as is." User
assumes the entire risk as to the results and
performance of the software and documentation. The
above parties cannot be held liable for any direct,
indirect, consequential or incidental damages with
respect to any claim by user or any third party on
account of, or arising from the use of software and
associated materials. This disclaimer covers both the
GDE core editor and all external programs used by
the GDE.