Data Analytics in Bioinformatics. Group of authors

ISBN 9781119785606




to high emission of harmful carbon dioxide gas, which is a major cause of global climate change.

       The study of different microorganisms that use carbon dioxide as their main carbon source can help to reduce atmospheric carbon dioxide levels.

      Biotechnology

       In bioinformatics, biotechnology is used to identify organisms and micro-organisms that can be useful in the dairy industry and in food manufacturing. For example, micro-organisms such as Lactococcus lactis are used in the dairy industry to manufacture buttermilk, cheese, yogurt, etc.

       The study of bioinformatics involves the study of DNA and RNA sequences and the prediction of the function and structure of the proteins encoded in plant genomes.

       Genetic knowledge of plants has revealed how plant genes are organised. This knowledge is used to produce improved insect-resistant crops and to make plants more productive, and protein models help to improve plant genes.

      Insect Resistance

       Soil-borne bacteria such as Bacillus thuringiensis make proteins that are toxic to some insects.

       Genes of these soil-borne bacteria have been studied and successfully transferred to cotton, potatoes and maize to control many serious pests [4, 5].

       These bacteria help plants repel insect attack, so studying the proteins they produce can reduce the use of insecticides on plants and hence improve the plants' nutritional content.

      Development of Drought-Resistant Varieties

       Genetic knowledge of plants helps to develop crop varieties with greater tolerance of soil alkalinity and iron toxicity and with the ability to grow under reduced water conditions. This also allows crops to be grown in regions with substandard soil, creating more agricultural land and increasing crop production [7].

      Comparative Studies

       To understand the functions of genes, the mechanisms of inherited diseases and the evolution of species, we need to analyze and compare the genetic material of different species.

       Bioinformatics tools are also applied to make comparisons between the numbers, locations and biochemical functions of genes in different organisms [5, 6].

       3.1.3 Issues with Bioinformatics

      Section 3.1.1 discusses different applications of bioinformatics. These applications face many challenges that arise from the data itself or from the devices used to collect or analyze it. Addressing and analyzing these issues is required for proper execution and effective results. The subsections below discuss the issues that are faced when a biological study is conducted.

       3.1.3.1 Issues Related to Structure

      The study of DNA and proteins includes problems such as protein structure prediction. Because structures are represented as 3D data, structure prediction, alignment and analysis become difficult tasks. The prediction of a protein's three-dimensional structure from its sequence can be approached with artificial neural networks (ANN).

      Most biological networks, such as protein–protein interaction networks and gene regulatory networks, are difficult to build and interpret because of the complexity of biological systems. These large networks are therefore represented as graphs using graph-theoretic methods, and classifying such graph data is very difficult with traditional methods.
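
      As a minimal sketch of the graph representation mentioned above, an interaction network can be stored as an undirected adjacency map and its proteins ranked by number of partners. The protein names and interaction pairs here are hypothetical placeholders, not data from the chapter, and the function names are illustrative.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_ranking(graph):
    """Return proteins sorted by number of partners, highest first.

    Ties are broken alphabetically so the order is deterministic.
    """
    return sorted(graph, key=lambda p: (-len(graph[p]), p))


# Hypothetical interaction pairs, for illustration only.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
network = build_network(interactions)
hubs = degree_ranking(network)
print(hubs[0], sorted(network[hubs[0]]))
```

      Node degree is only one simple graph-theoretic measure; real networks are far larger and usually need dedicated graph libraries.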

       3.1.3.2 Sequence Analysis

      Classification of DNA, RNA and protein sequences is a challenge because of the differences and similarities among many organisms.

      Issue with Genome Sequence

      A genome is the complete set of chromosomes of an organism, consisting of DNA. Genome sequencing is a way of mapping out or ordering DNA so that the sequences can be organized, processed and interpreted, which in turn requires improvements in sequencing strategies. Every DNA sequencing effort faces challenges in searching for sequence patterns and in designing, analyzing and interpreting the data.

       In gene finding and genome annotation: Gene finding refers to predicting sequence features such as introns and exons in DNA sequence segments, whereas genome annotation is the process of locating the gene-coding regions in a sequenced genome so that the protein sequences can be analyzed [8]. It also involves the study of repetitive DNA within the genome, copied from the same or nearly the same sequence.

       In sequence comparison: Sequence comparison is the process of comparing two or more sequences. The large number of sequences available in genomic databases requires proper categorization of DNA and protein sequences. Sequence comparison helps to assign a hypothetical structure and function to a sequence for identification, design and interpretation [8].
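
       To make the idea of gene finding concrete, the sketch below scans the forward strand of a DNA string for open reading frames (ORFs): stretches that start at ATG and run to the first in-frame stop codon. This is a deliberately simplified assumption. It ignores introns, the reverse strand and statistical gene models, and the function name `find_orfs`, its `min_codons` parameter and the example sequence are illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) tuples for forward-strand ORFs.

    An ORF starts at ATG and ends at the first in-frame stop codon.
    min_codons counts the codons before the stop, including ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop is found.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


# Hypothetical example sequence.
orfs = find_orfs("CCATGAAATTTGGGTAACC")
print(orfs)
```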
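
       One common way to compare two sequences is global alignment. The sketch below computes a Needleman-Wunsch global alignment score with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, not values taken from the chapter.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score of sequences a and b.

    Only two rows of the dynamic-programming table are kept,
    since the score alone does not need a traceback.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "ACT"))
```

       A higher score suggests greater similarity. Practical tools use substitution matrices and affine gap penalties rather than these fixed values.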

      Analysis of DNA sequences is an important task because it helps to detect individual genes that are associated with a disease. When a disease affects an individual, that individual's proteins or genes are altered, which changes the gene sequence. Detecting these genes is therefore very important for finding a cure for the disease. Traditional methods of gene detection were based on trial and error. Advances in data mining and machine learning, such as neural networks (NN), now allow a more precise study of genes and their sequences and simplify the task [9]. Many machine learning algorithms are used to classify normal and abnormal genes with high accuracy.

      Solving the above problems involves the following steps:

       Collect the biological data

       Build a computational model

       Analyze and solve the problems of the computational model

       Test the computational algorithm

       Evaluate the performance of the model.
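
      The steps above can be sketched end to end on toy data. This is a minimal illustration under stated assumptions, not the method used in the chapter: sequences are turned into 2-mer frequency vectors, the model is a nearest-centroid classifier (a much simpler stand-in for the neural networks mentioned earlier), and the sequences and class labels are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_vector(seq, k=2):
    """Represent a DNA sequence as normalized k-mer frequencies."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts[m] for m in kmers), 1)
    return [counts[m] / total for m in kmers]


def train_centroids(samples):
    """Build the model: average the feature vectors of each class."""
    grouped = {}
    for seq, label in samples:
        grouped.setdefault(label, []).append(kmer_vector(seq))
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def predict(centroids, seq):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    vec = kmer_vector(seq)
    return min(centroids,
               key=lambda lab: sum((x - y) ** 2
                                   for x, y in zip(vec, centroids[lab])))


def accuracy(centroids, samples):
    """Evaluate the model: fraction of held-out samples labeled correctly."""
    hits = sum(predict(centroids, seq) == label for seq, label in samples)
    return hits / len(samples)


# Collect data (hypothetical toy sequences and labels).
train = [("ATATATATAT", "AT-rich"), ("TTATAATATA", "AT-rich"),
         ("GCGCGGCCGC", "GC-rich"), ("CCGGCGCGCC", "GC-rich")]
held_out = [("ATTATATAAT", "AT-rich"), ("GGCGCCGCGG", "GC-rich")]

model = train_centroids(train)   # build the computational model
acc = accuracy(model, held_out)  # test and evaluate on unseen data
print(acc)
```

      Keeping the held-out sequences separate from the training set is what makes the final accuracy a test of the model rather than of its memory.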

      Bioinformatics deals with various biological datasets collected at different levels of omics data, such as:

       Genomic Sequence data

       Protein Sequence data

       Microarray data

       Structure data (Structure of RNA and protein)

       Chemical data

       Disease data.

      Based on the type of data, biological databases can be divided into two categories:

       a. Primary Database. These databases are archival in nature because they are built from experimental results submitted directly by researchers. They are populated with protein sequences, nucleotide sequences, macromolecular structures, etc. [10]. Examples: Protein Data Bank (PDB), GenBank, DNA Data Bank of Japan (DDBJ), Gene Expression Omnibus (GEO).

       b. Secondary Database. These databases are either manually created or derived from analysis of the results in primary databases, producing more structured records for easy retrieval of data [10]. Examples: Swiss-Prot (a protein sequence database maintained by the Swiss Institute of Bioinformatics, Switzerland, and the European Bioinformatics Institute), UniProt Knowledgebase.