What does PALM do?

PALM (Phylogenetic reconstruction by Automatic Likelihood Model selector) lets biologists automatically compute, compare, and select the model of molecular evolution that best fits a given set of biological sequences. The candidates are DNA models (JC69, K80, F81, HKY, TrN, TrNef, K81, K81uf, TIM, TIMef, TVM, TVMef, SYM, GTR) and protein models (LG, DCMut, JTT, MtREV, MtMam, MtArt, Dayhoff, WAG, RtREV, CpREV, Blosum62, VT, HIVb, HIVw). PALM fits the models by maximum likelihood estimation (MLE) and provides bootstrapping for statistical support. Like Modeltest and ProtTest, PALM accepts DNA or protein sequences, in either FASTA or PHYLIP format.

PALM was developed on the basis of ReadSeq, ClustalW, PhyML, ProtTest, and Modeltest. We strongly recommend that users also cite these programs when using PALM.

Citation:

D.G. Gilbert. Convert Sequence Formats using ReadSeq. (http://iubio.bio.indiana.edu/soft/molbio/readseq/java/)

Thompson, J.D., Higgins, D.G. and Gibson, T.J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignments through sequence weighting, position specific gap penalties and weight matrix choice.
Nucl. Acids Res. 22:4673-4680 (1994).

Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.
Systematic Biology. 2003 52(5):696-704.

Abascal F, Zardoya R, Posada, D. ProtTest: Selection of best-fit models of protein evolution.
Bioinformatics 2005 21(9):2104-2105.

Posada D and Crandall KA. Modeltest: testing the model of DNA substitution.
Bioinformatics 1998 14 (9): 817-818.

 

Input Format

Uploaded DNA or protein sequences must be in aligned PHYLIP format. If you are not familiar with converting sequences to that format, you can submit a FASTA file instead; PALM will run ClustalW to perform the multiple sequence alignment and produce a PHYLIP file for the subsequent computation.

Examples of the two formats are shown below.

FASTA format

>Triticum
cgcgactagagtttcctcgctagggtttcgtctcccggtccggatccagcagcagcggct
>Humulus
cccccgatatcgtttagatcattcgggcaaagttcgttcagacttctctccgagttctcttccc
>Zea
aagctcccatctttttcctcttcctccgccgccgccgcgcttcctcatctcgctagggtttcct
>Picea
gacaccatcacaacaatcaatcgcagcgccgattctccttcaaaaacttttcctaagtcgtc

 

PHYLIP format

4 60
Triticum  ------cgcgactagagtttcctcgctagggtttcgtctcccggtccggatccagcagcagcggct
Humulus   cccccgatatcgtttagatcatt---cgggcaaagttcgttcag---acttctctccgagttctcttccc
Zea       ---aagctcccatctttttcctcttcc--tccgccgccgccgcgcttcctcatctcgctagggtttcct--
Picea     --gacaccatcacaacaatcaatcgc-----agcgccgattctccttcaaaaacttttcctaagtcgtc
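
Before uploading, it can help to check that a PHYLIP file is internally consistent. The following is a minimal sketch in Python, not part of PALM: it assumes a sequential layout in which each line holds a name, whitespace, and the full aligned sequence, and it only verifies that the header counts match the data. The function name parse_phylip is hypothetical, and the exact PHYLIP variant that PALM accepts is not stated here.

```python
def parse_phylip(text):
    """Parse a sequential PHYLIP alignment and check it against its header.

    Assumes one record per line: name, whitespace, aligned sequence.
    Returns a dict mapping sequence name to aligned sequence.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")

    header = lines[0].split()
    if len(header) != 2:
        raise ValueError("header must hold two integers: sequences and sites")
    n_seqs, n_sites = int(header[0]), int(header[1])

    records = {}
    for ln in lines[1:]:
        parts = ln.split()
        if len(parts) < 2:
            raise ValueError("cannot split name from sequence: %r" % ln)
        name, seq = parts[0], "".join(parts[1:])
        if name in records:
            raise ValueError("duplicate sequence name: %s" % name)
        records[name] = seq

    if len(records) != n_seqs:
        raise ValueError("header says %d sequences, found %d"
                         % (n_seqs, len(records)))
    for name, seq in records.items():
        # In an alignment every row has the same number of columns.
        if len(seq) != n_sites:
            raise ValueError("%s has %d sites, header says %d"
                             % (name, len(seq), n_sites))
    return records


example = "3 8\nTriticum  acgt--ac\nHumulus   acgtttac\nZea       ac--ttac\n"
alignment = parse_phylip(example)
```

A ValueError from this check usually means the sequences are not aligned to a common length or the header no longer matches the data.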

 

If you have a specific or better-supported phylogenetic tree for inferring the evolution of your sequences, you can upload it with the Starting Tree option. Make sure the tree is in standard NEWICK format.

Here is a brief example of the NEWICK format.

( ( Triticum , ( Humulus , Zea ) ) , Picea ) ;
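
A starting tree is only useful if it is well formed and names the same taxa as the sequence file. The sketch below, in Python, is an illustration rather than PALM's own validation: it checks that parentheses balance and that the tree ends with a semicolon, then lists the leaf names. The function name newick_leaves is hypothetical, and quoted labels and single-leaf trees are deliberately not handled.

```python
import re


def newick_leaves(newick):
    """Return the leaf names of a Newick string after basic syntax checks."""
    # Drop whitespace; this simplification assumes unquoted labels.
    text = "".join(newick.split())
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end with ';'")

    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
    if depth != 0:
        raise ValueError("unbalanced parentheses")

    # A leaf label directly follows '(' or ','; labels after ')' are internal.
    return re.findall(r"[(,]([^(),:;]+)", text)


leaves = newick_leaves("( ( Triticum , ( Humulus , Zea ) ) , Picea ) ;")
# leaves == ['Triticum', 'Humulus', 'Zea', 'Picea']
```

Comparing the returned names with the names in the sequence file is a quick way to catch a tree that does not match the data.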

 

Options Description

Input Type

Select whether the sequences are in FASTA format or aligned in PHYLIP format. For more information about the FASTA and PHYLIP formats, please refer to Input Format.

Sequence Type

Select whether the sequences are DNA or protein sequences.

Sequence

If you have no sequence file at hand, you can use the demonstration sequences. You can either paste sequences into the text area or upload a sequence file.

Number of Bootstrap Data Sets

PALM generates bootstrapped pseudo data sets from the original data; you choose the number of bootstrap data sets (100, 500, or 1000). It returns the bootstrap tree with branch lengths and bootstrap values. Because a large number of bootstrap data sets is time-consuming, PALM estimates the overall computing time and, if necessary, automatically reduces the number of bootstrap data sets. Users who wish to analyze sequences with a larger number of bootstrap data sets can send their files to the corresponding authors.
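
To illustrate what a bootstrapped pseudo data set is, here is a minimal Python sketch of the usual nonparametric bootstrap for alignments: columns (sites) are drawn with replacement until the pseudo alignment has as many sites as the original. This is an illustration under that assumption, not PALM's internal code; the function names and the seed parameter are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo data set by resampling columns with replacement."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    # The same column indices are used for every sequence, so each
    # resampled site is still a complete column of the original alignment.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Return n_replicates pseudo data sets drawn from one alignment."""
    rng = random.Random(seed)  # fixed seed makes the replicates reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


original = {"Triticum": "acgt--ac", "Humulus": "acgtttac", "Zea": "ac--ttac"}
replicates = bootstrap_replicates(original, 100)
```

A tree is then estimated from each pseudo data set, and the bootstrap value of a branch is the share of those trees that contain it, which is why the running time grows with the number of data sets.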

Job Note

Use this field to add a comment about your job and sequences.

Number of Substitution Rate Categories

The rate of substitution equals the rate of mutation. The simplest models assume that all sites evolve at the same rate. A discrete gamma distribution can be used to account for variable substitution rates among sites, and users can define the number of categories; four categories are used by default. A larger number of categories gives a better fit, but more than eight is not recommended.

Starting Tree

We recommend the BIONJ tree, which is built with an improved version of the NJ distance-based tree reconstruction algorithm. You can also supply your own tree, provided it is written in standard NEWICK format. If the user tree is invalid, PALM falls back to BIONJ so that the job can still run.

Model Selection Criterion

Once the maximum likelihoods have been calculated, several statistics can indicate which model best fits the data. You can select among the Akaike Information Criterion (AIC), the corrected AIC (AICc), and the Bayesian Information Criterion (BIC). You can also select LnL, if it is adequate for your model selection. Brief introductions to these criteria are given below.

Akaike Information Criterion (AIC) (Akaike, 1974)

AIC = -2 LnL + 2K

where K is the number of parameters in the model and LnL is the maximized log-likelihood. The Akaike weight (AICw) is the normalized relative likelihood of a model given its AIC and can be interpreted, from a Bayesian perspective, as the probability that the model is the best approximation to the data. The cumWeight value shown in the Modeltest output file is the cumulative weight.

The model with the lowest AIC value is expected to be the best model. The AIC difference allows all models to be compared on a relative scale: for each model, it is that model's AIC value minus the smallest AIC value among all candidate models.

delta_i = AIC_i - min(AIC)

Corrected AIC (AICc) (Sugiura, 1978) is AIC with a second-order correction for cases where the sample size n is small. When the sample size is large enough, AICc is close to AIC.

AICc = AIC + 2K(K + 1) / (n - K - 1)

Bayesian Information Criterion (BIC) (Schwarz, 1978) was proposed to overcome the overfitting problem that occurs when the number of parameters is too large. BIC has been widely used and is considered a good method in many other fields.

BIC = -2 LnL + K ln(n)
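
As a worked illustration of these criteria, the Python sketch below uses the standard textbook definitions: AIC = -2 LnL + 2K, AICc = AIC + 2K(K + 1)/(n - K - 1), and BIC = -2 LnL + K ln(n), with Akaike weights computed from the AIC differences. It is not PALM's own code; the function names, the model names, and all numbers are hypothetical and chosen only for the example.

```python
import math


def information_criteria(lnl, k, n):
    """Return AIC, AICc and BIC for log-likelihood lnl, k parameters, n samples."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs a sample size n greater than k + 1")
    aic = -2.0 * lnl + 2.0 * k
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    bic = -2.0 * lnl + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def akaike_weights(scores):
    """Map each model to (delta, weight) given its criterion score."""
    best = min(scores.values())
    deltas = {model: score - best for model, score in scores.items()}
    relative = {model: math.exp(-0.5 * d) for model, d in deltas.items()}
    total = sum(relative.values())
    return {model: (deltas[model], relative[model] / total) for model in scores}


# Hypothetical (LnL, K) pairs for three candidate models, n = 500 sites.
candidates = {"model_A": (-2100.0, 1), "model_B": (-2050.0, 5), "model_C": (-2048.0, 9)}
aic_scores = {name: information_criteria(lnl, k, 500)["AIC"]
              for name, (lnl, k) in candidates.items()}
weights = akaike_weights(aic_scores)
best_model = min(aic_scores, key=aic_scores.get)
```

Sorting the models by weight and summing the weights in that order gives a cumulative weight of the kind reported as cumWeight.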

Optimize Tree Topology and Branch Lengths

You can optimize the tree topology, the branch lengths, and the rate parameters. Note that this option takes considerably more time.

Enter Your Email

You must enter your email address. Several jobs may be in the queue, and the running time of each job varies and can reach several hours, so you will receive a notification email when your job is completed. This lets you plan your schedule without having to wait for the results.

 

Frequently Asked Questions

Q: Why does PALM occasionally take a long time on my jobs?

A: This is most likely caused by a poor alignment of the user's data. If one sequence is clearly different from the others, the alignment will be poor, and data that fall into different clusters take longer to process. When this happens, we recommend removing the divergent sequences first to shorten the computation time.

Any input can be used to construct phylogenetic trees, but when using sequences with less than 60% similarity for nucleotide sequences, or less than 25% similarity for amino acid sequences, such trees are of little value.
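
One way to spot divergent sequences before submitting a job is to compute pairwise percent identity on the alignment. The Python sketch below is an illustration only: it takes identity over columns where neither sequence has a gap as a stand-in for similarity, which is an assumption, since the exact similarity measure behind the thresholds above is not defined here. The function names are hypothetical.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.lower(), seq_b.lower()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def low_identity_pairs(alignment, threshold):
    """List (name_1, name_2, identity) for pairs below the threshold."""
    flagged = []
    for name_1, name_2 in combinations(alignment, 2):
        identity = percent_identity(alignment[name_1], alignment[name_2])
        if identity < threshold:
            flagged.append((name_1, name_2, identity))
    return flagged


toy = {"seq1": "acgtacgt", "seq2": "acgtacga", "seq3": "ttttac--"}
flagged = low_identity_pairs(toy, 60.0)
```

A sequence that appears in most of the flagged pairs is a natural candidate to remove before resubmitting.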