
1 Introduction

Bioinformatics is mainly based on the analysis of diverse biological data, such as physiological, biochemical, and genetic information. It uses modern software together with the data already accumulated from numerous observations made by many contributors. This large body of data helps generate information about the living world [1, 2]. A phylogenetic tree is a pictorial representation of the evolutionary relationships among organisms. Its branching pattern shows how species have diverged from common ancestors. The distance of one group from the other groups indicates the degree of relationship; i.e., closely related groups are located on branches close to one another, and distantly related groups on branches far apart [3]. Clustering methods have emerged as a robust way to identify these closely related groups.

Cluster analysis, commonly called clustering, is a method for grouping a set of objects or characters so that those with similar features are placed close to each other. Each resulting cluster is viewed as a class of objects. Objects are clustered so as to maximize similarity within a class and minimize similarity between classes. In phylogenetics, organisms are grouped according to similarities and dissimilarities in their physical and genetic characteristics, and the resulting graphical phylogenetic tree reveals both the similarities and the dissimilarities among them [4]. Various clustering methods are applied to form clusters, or groups of similar organisms, in the tree. Such trees can also show when organisms appeared relative to one another in time. A phylogenetic tree consists of branches and nodes. Broadly, phylogenetic trees are categorized as follows [5].

  1. Rooted tree: A rooted tree has a single root node, representing the common ancestor, from which a unique path leads through evolutionary time to every other node.

  2. Unrooted tree: An unrooted tree specifies the relationships among nodes but depicts nothing about the direction in which evolution occurred.
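To make the distinction between the two kinds of tree concrete, the Python sketch below writes rooted trees as nested tuples, a representation assumed here purely for illustration. A rooted tree is summarized by its clades (the leaf sets below each internal node), whereas an unrooted tree is summarized only by its splits (the bipartitions of the leaves created by each internal branch). The helper names `leaves`, `clades`, and `splits` are hypothetical.

```python
from itertools import chain

# Hypothetical representation: a rooted binary tree as nested tuples of leaf names.
# (("A", "B"), ("C", "D")) means the root joins the clades {A, B} and {C, D}.

def leaves(tree):
    """Return the set of leaf names below a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))

def clades(tree):
    """Rooted view: the leaf set below every internal node."""
    if isinstance(tree, str):
        return set()
    result = {leaves(tree)}
    for child in tree:
        result |= clades(child)
    return result

def splits(tree):
    """Unrooted view: non-trivial bipartitions of the leaves.

    Each clade and its complement form one split. A split is stored as an
    unordered pair of leaf sets, so the direction given by the root is lost."""
    taxa = leaves(tree)
    result = set()
    for clade in clades(tree):
        rest = taxa - clade
        if len(clade) >= 2 and len(rest) >= 2:  # skip trivial splits
            result.add(frozenset([clade, rest]))
    return result

t1 = (("A", "B"), ("C", "D"))  # root placed between {A, B} and {C, D}
t2 = ("A", ("B", ("C", "D")))  # same tree, root placed on the branch to A

print(clades(t1) == clades(t2))  # False: different rooted trees
print(splits(t1) == splits(t2))  # True: same unrooted tree
```

Moving the root changes the clades, and therefore the implied direction of evolution, but both versions unroot to the same topology separating {A, B} from {C, D}.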

Phylogenetic trees based on sequence data give a more accurate depiction of patterns of relatedness. They also help place new species within the Linnaean classification. Phylogenetic techniques are now commonly used to assess DNA evidence presented in legal cases. Molecular sequencing combined with phylogenetic approaches is now used to learn more about new pathogen outbreaks, including which species a pathogen is related to and, consequently, its likely source of transmission. Such findings have guided new recommendations for public health policy. Phylogenetic analyses also inform conservation policy for various uniquely identified species that are at risk of extinction [6].

2 Methodology

2.1 Distance-Based Method

2.1.1 UPGMA

UPGMA (unweighted pair group method with arithmetic mean) produces phenograms by agglomerative, hierarchical clustering and is widely applied in bioinformatics. Its main input is a pairwise distance matrix (similarity index); applying the algorithm to this matrix yields the structure of a rooted tree, i.e., a phenogram [7] (Fig. 1).

Fig. 1 Flowchart showing the steps of the UPGMA method
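The UPGMA steps can be sketched as follows. This is a minimal Python sketch, not the chapter's implementation; it assumes a symmetric distance matrix given as a list of lists, and the function name `upgma` and the Newick output format are illustrative choices. At each step the closest pair of clusters is merged, the new node is placed at half their distance (the molecular-clock assumption behind UPGMA), and the distance to the merged cluster is the size-weighted average of the distances to its members.

```python
def upgma(dist, names):
    """Cluster taxa with UPGMA and return (Newick string, height of the root).

    dist:  symmetric pairwise distance matrix (list of lists)
    names: taxon labels in the same order as the rows of dist
    """
    # Active clusters: id -> (Newick subtree, number of taxa, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Step 1: pick the closest pair of clusters
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        tree_i, n_i, h_i = clusters.pop(i)
        tree_j, n_j, h_j = clusters.pop(j)
        # Step 2: place the new node at half the pair distance (ultrametric tree)
        h = dij / 2
        merged = (f"({tree_i}:{h - h_i:g},{tree_j}:{h - h_j:g})", n_i + n_j, h)
        # Step 3: distance to the merged cluster = size-weighted average over members
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del d[(i, j)]
        clusters[next_id] = merged
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree + ";", height


names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
print(upgma(dist, names))  # ('(D:3,(C:2,(A:1,B:1):1):1);', 3.0)
```

Because every leaf ends up at the same distance from the root, the output is a rooted, ultrametric tree; this is why UPGMA is usually described as assuming a constant rate of evolution.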

2.1.2 Neighbor Joining Method

DNA or protein sequence data obtained from wet-laboratory analysis are used to build trees with an algorithm that first computes the pairwise distances between taxa. Neighbor joining operates on such a distance matrix, which gives the distance between each pair of taxa. Neighbor joining, a bottom-up clustering method, is currently a frequently used technique for generating phenograms [8, 9] (Fig. 2).
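The pairwise distances that distance-based methods start from can be computed from aligned sequences. The Python sketch below assumes equal-length alignments with gaps treated as ordinary characters, and uses the Jukes-Cantor (JC69) correction d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. The chapter does not name a distance model, so JC69 is an assumption here, and the function names are hypothetical.

```python
import math

def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def jc69_distance(s1, s2):
    """Jukes-Cantor corrected distance d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(s1, s2)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # sequences saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)

def distance_matrix(seqs, metric=jc69_distance):
    """Return (names, matrix) for a dict of aligned sequences."""
    names = list(seqs)
    return names, [[metric(seqs[a], seqs[b]) for b in names] for a in names]

# Toy aligned sequences (hypothetical data)
seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACCTTC"}
names, D = distance_matrix(seqs)
for name, row in zip(names, D):
    print(name, [round(x, 4) for x in row])
```

The correction makes each distance larger than the raw proportion p, compensating for multiple substitutions at the same site that a simple count of differences cannot see.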

Fig. 2 Flowchart showing the steps of the neighbor joining method
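A minimal Python sketch of neighbor joining follows, assuming a symmetric distance matrix given as a list of lists; the function and internal-node names (`neighbor_joining`, `U0`, `U1`, ...) are hypothetical. At each step it computes the Q-criterion Q(a, b) = (n - 2) d(a, b) - r(a) - r(b), where r(a) is the sum of distances from a to all current nodes, joins the pair with the smallest Q, and replaces that pair with a new internal node. Unlike UPGMA, the result is an unrooted tree, returned here as a list of edges with branch lengths.

```python
def neighbor_joining(dist, names):
    """Build an unrooted tree by neighbor joining.

    Returns a list of (node, node, branch_length) edges; internal nodes are U0, U1, ...
    """
    # Distances stored as nested dicts so new internal nodes can be added
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Q-criterion: join the pair minimizing (n - 2) d(a, b) - r(a) - r(b)
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:  # ties: keep the first pair found
                    best = (q, a, b)
        _, a, b = best
        u = f"U{counter}"
        counter += 1
        # Branch lengths from a and b to the new internal node u
        length_a = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        length_b = d[a][b] - length_a
        edges += [(a, u, length_a), (b, u, length_b)]
        # Distances from u to every remaining node
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Connect the last two nodes
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Small toy distance matrix (hypothetical data)
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for edge in neighbor_joining(dist, names):
    print(edge)
```

For this toy matrix the leaf branch lengths come out as a: 2, b: 3, c: 4, d: 2, e: 1, with internal branches of 3 and 2. The matrix is additive, so the tree reproduces every input distance exactly. A root can be added afterwards (for example with an outgroup) if a rooted tree is needed.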

2.2 Character-Based Method

2.2.1 Maximum Parsimony

This widely accepted technique is based on the assumption that the preferred tree is the one requiring the minimum number of changes to explain the aligned data [10, 11]. The tree with the minimum total branch length (i.e., the fewest changes) is the most parsimonious tree and is considered the best representation of the evolutionary pattern. Minimizing the number of changes within the tree has always been the basic principle of maximum parsimony (MP). The minimum number of changes for a given tree is computed by a post-order traversal that starts at the leaves and proceeds toward the root [12, 13] (Fig. 3).

Fig. 3 Flowchart showing the steps of the maximum parsimony method
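The post-order computation described above can be illustrated with Fitch's small-parsimony algorithm, used here as an assumed concrete instance of the approach. The Python sketch assumes rooted binary trees written as nested tuples and aligned sequences of equal length; the names `fitch` and `parsimony_score` are hypothetical. Each internal node takes the intersection of its children's state sets when it is non-empty, and otherwise their union at the cost of one change. Scoring each candidate topology and keeping the lowest score gives the most parsimonious tree. Here only the three possible unrooted topologies for four taxa are compared, by brute force.

```python
def fitch(tree, states):
    """Fitch small parsimony for one site on a rooted binary tree.

    tree:   nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states: dict mapping each leaf name to its character state at this site
    Returns (candidate state set for this node, minimum changes in its subtree).
    """
    if isinstance(tree, str):  # leaf: observed state, no changes yet
        return {states[tree]}, 0
    left, right = tree
    set_l, cost_l = fitch(left, states)  # post-order: visit children first
    set_r, cost_r = fitch(right, states)
    common = set_l & set_r
    if common:  # children can agree: no extra change
        return common, cost_l + cost_r
    return set_l | set_r, cost_l + cost_r + 1  # disagreement costs one change

def parsimony_score(tree, alignment):
    """Total Fitch changes over all columns of an alignment (dict: name -> sequence)."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

# Toy alignment (hypothetical data)
alignment = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTT"}
# The three possible unrooted topologies for four taxa, each written with a root
candidates = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
for tree in candidates:
    print(tree, parsimony_score(tree, alignment))  # scores 4, 6 and 5
best = min(candidates, key=lambda t: parsimony_score(t, alignment))
```

The tree grouping A with B and C with D needs the fewest changes (4) and is therefore the most parsimonious of the three. For larger numbers of taxa the number of topologies grows too quickly for brute force, which is why heuristic tree searches are used in practice.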

2.2.2 Maximum Likelihood (Felsenstein)

It is one of the most computationally intensive approaches. The method requires a model of nucleotide evolution and a tree topology. ML uses the available data to evaluate a specified model of evolution and seeks the tree under which the observed data have the greatest probability. Because ML relies purely on probability, or likelihood, it differs from the other methods described here. The likelihood is L = P(data | tree), the probability of observing the data given the tree. ML offers the advantage of statistical comparison between topologies, and it is one of the most robust methods used by scientists. On the other hand, the likelihood for a given phylogenetic tree can have multiple maxima, which demands more extensive computation [14, 15].
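To illustrate L = P(data | tree), the sketch below computes the likelihood of a fixed tree with Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) model of nucleotide evolution. The model choice, the tree representation (tuples of `(subtree, branch_length)` pairs), the branch lengths, and the function names are all assumptions for illustration. Sites are treated as independent, so the log-likelihood of an alignment is the sum of per-site log-likelihoods. A full ML analysis would also optimize the branch lengths and search over topologies, which is where the computational cost noted above arises.

```python
import math

BASES = "ACGT"

def jc69_prob(x, y, t):
    """JC69 probability that base x becomes base y along a branch of length t
    (t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def partials(node, site):
    """Pruning step: P(leaf data below node | state of node), one value per base."""
    if isinstance(node, str):  # leaf: 1 for the observed base, 0 otherwise
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    result = [1.0] * len(BASES)
    for child, t in node:  # post-order: combine each child's partials
        child_p = partials(child, site)
        for i, x in enumerate(BASES):
            result[i] *= sum(jc69_prob(x, y, t) * child_p[j] for j, y in enumerate(BASES))
    return result

def site_likelihood(tree, site):
    """P(site | tree): root partials weighted by the JC69 base frequencies (1/4 each)."""
    return sum(0.25 * p for p in partials(tree, site))

def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods, assuming independent sites."""
    n = len(next(iter(alignment.values())))
    return sum(
        math.log(site_likelihood(tree, {name: seq[i] for name, seq in alignment.items()}))
        for i in range(n)
    )

# Two topologies with identical, hypothetical branch lengths
tree_ab_cd = (((("A", 0.1), ("B", 0.1)), 0.05), ((("C", 0.1), ("D", 0.1)), 0.05))
tree_ac_bd = (((("A", 0.1), ("C", 0.1)), 0.05), ((("B", 0.1), ("D", 0.1)), 0.05))

alignment = {"A": "AAAA", "B": "AAAA", "C": "CCCC", "D": "CCCC"}
print(log_likelihood(tree_ab_cd, alignment))
print(log_likelihood(tree_ac_bd, alignment))
```

The topology grouping A with B has the higher log-likelihood for this alignment, because it explains each column with a single change on the internal branch instead of two changes on terminal branches. Comparing such values across topologies is the basis of the statistical comparisons mentioned above.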

3 Conclusion

In all cases, constructing a tree from given sequences or data requires a high degree of accuracy, which depends on achieving an optimal alignment while reducing the complexity of the sequences. Such results are obtained with distance-based and character-based techniques. Distance-based methods can construct a large number of models, whereas character-based methods work directly with phenotypic and genotypic attributes (Table 1).

Table 1 Comparison of methods