Volume 21 Supplement 9

Selected Articles from the 20th International Conference on Bioinformatics & Computational Biology (BIOCOMP 2019)

  • Introduction
  • Open access
  • Published: 03 December 2020

Current trend and development in bioinformatics research

  • Yuanyuan Fu 1 ,
  • Zhougui Ling 1 , 2 ,
  • Hamid Arabnia 3 &
  • Youping Deng 1  

BMC Bioinformatics volume 21, Article number: 538 (2020)

This editorial introduces the supplement to BMC Bioinformatics comprising six papers selected from BIOCOMP’19, the 2019 International Conference on Bioinformatics and Computational Biology. These articles reflect current trends and developments in bioinformatics research.

This supplement to BMC Bioinformatics was proposed at BIOCOMP’19, the 2019 International Conference on Bioinformatics and Computational Biology, held from July 29 to August 1, 2019 in Las Vegas, Nevada. The congress covered a variety of research areas. Bioinformatics was one of the major focuses because of its rapid development and the growing need for bioinformatics approaches in biological data analysis, especially for large omics datasets. Six manuscripts were selected after strict peer review; together they provide an overview of trends in bioinformatics research and its application in interdisciplinary collaboration.

Cancer is one of the leading causes of morbidity and mortality worldwide, and there is an urgent need to identify new biomarkers or signatures for early detection and prognosis. Mona et al. identified biomarker genes from a functional network built on the 407 genes differentially expressed between lung cancer and healthy populations in a public Gene Expression Omnibus dataset. Lower expression of a sixteen-gene signature was associated with favorable lung cancer survival, DNA repair, and cell regulation [1]. New classes of biomarkers, such as alternative splicing variants (ASV), have been studied in recent years. Various platforms and methods, for example the Affymetrix Exon-Exon Junction Array, RNA-seq, and liquid chromatography tandem mass spectrometry (LC–MS/MS), have been developed to explore the role of ASV in human disease. Zhang et al. developed a bioinformatics workflow that combines LC–MS/MS with RNA-seq and provides new opportunities for biomarker discovery. They identified twenty-six alternative splicing biomarker peptides with one single intron event and one exon skipping event; further pathway analysis indicated that the 26 peptides may be involved in cancer, signaling, metabolism, regulation, immune system, and hemostasis pathways, which was validated by the RNA-seq analysis [2].

Proteins serve crucial functions in essentially all biological processes, and their function depends directly on their three-dimensional structures. Traditional approaches to elucidating protein structures by NMR spectroscopy are time consuming and expensive, whereas faster and more cost-effective methods are critical to the development of personalized medicine. Cole et al. improved the usability, accessibility, and core methodology of the REDCRAFT software package, which resulted in the ability to fold proteins [3].

The human microbiome is the aggregate of microorganisms that reside on or within the human body. Rebecca et al. discussed tissue-associated microbial detection in cancer using next generation sequencing (NGS). Various computational frameworks could shed light on the role of microbiota in cancer pathogenesis [4]. Analyzing human microbiome data efficiently remains a major challenge. Zhang et al. developed a nonparametric test based on inter-point distance to evaluate statistical significance from a Bayesian point of view. The proposed test is more efficient and more sensitive to compositional differences than the traditional mean-based method [5].

Human disease is also considered the result of interaction between genetic and environmental factors. In recent decades, there has been growing interest in the effect of metal toxicity on human health. Evaluating the toxicity of chemical mixtures and their possible mechanisms of action remains a challenge for humans and other organisms: traditional methods are time consuming, inefficient, and expensive, so only a limited number of chemicals can be tested. To develop efficient and accurate predictive models, Yu et al. compared results across classification algorithms and, using a microarray classifier analysis, identified 15 gene biomarkers that distinguished metal toxicants with 100% accuracy [6].

There is a growing need to convert biological data into knowledge through bioinformatics approaches. We hope these articles provide up-to-date information on research developments and trends in the field of bioinformatics.

Availability of data and materials

Not applicable.

Abbreviations

BIOCOMP’19: The 2019 International Conference on Bioinformatics and Computational Biology

LC–MS/MS: Liquid chromatography tandem mass spectrometry

ASV: Alternative splicing variants

NMR: Nuclear magnetic resonance

REDCRAFT: Residual Dipolar Coupling based Residue Assembly and Filter Tool

NGS: Next generation sequencing

Mona Maharjan RBT, Chowdhury K, Duan W, Mondal AM. Computational identification of biomarker genes for lung cancer considering treatment and non-treatment studies. 2020. https://doi.org/10.1186/s12859-020-3524-8 .

Zhang F, Deng CK, Wang M, Deng B, Barber R, Huang G. Identification of novel alternative splicing biomarkers for breast cancer with LC/MS/MS and RNA-Seq. Mol Cell Proteomics. 2020;16:1850–63. https://doi.org/10.1186/s12859-020-03824-8 .

Casey Cole CP, Rachele J, Valafar H. Increased usability, algorithmic improvements and incorporation of data mining for structure calculation of proteins with REDCRAFT software package. 2020. https://doi.org/10.1186/s12859-020-3522-x .

Rebecca M, Rodriguez VSK, Menor M, Hernandez BY, Deng Y. Tissue-associated microbial detection in cancer using human sequencing data. 2020. https://doi.org/10.1186/s12859-020-03831-9 .

Qingyang Zhang TD. A distance based multisample test for high-dimensional compositional data with applications to the human microbiome. 2020. https://doi.org/10.1186/s12859-020-3530-x .

Yu Z, Fu Y, Ai J, Zhang J, Huang G, Deng Y. Development of predictive models to distinguish metals from non-metal toxicants, and individual metal from one another. 2020. https://doi.org/10.1186/s12859-020-3525-7 .

Acknowledgements

This supplement would not have been possible without the support of the International Society of Intelligent Biological Medicine (ISIBM).

About this supplement

This article has been published as part of BMC Bioinformatics Volume 21 Supplement 9, 2020: Selected Articles from the 20th International Conference on Bioinformatics & Computational Biology (BIOCOMP 2019). The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-21-supplement-9 .

Publication of this supplement has been supported by NIH grants R01CA223490 and R01 CA230514 to Youping Deng and 5P30GM114737, P20GM103466, 5U54MD007601 and 5P30CA071789.

Author information

Authors and affiliations.

Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, 96813, USA

Yuanyuan Fu, Zhougui Ling & Youping Deng

Department of Pulmonary and Critical Care Medicine, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, 545005, China

Zhougui Ling

Department of Computer Science, University of Georgia, Athens, GA, 30602, USA

Hamid Arabnia

Contributions

YF drafted the manuscript; ZL, HA, and YD revised it. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Youping Deng.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Fu, Y., Ling, Z., Arabnia, H. et al. Current trend and development in bioinformatics research. BMC Bioinformatics 21 (Suppl 9), 538 (2020). https://doi.org/10.1186/s12859-020-03874-y

  • Bioinformatics
  • Human disease

BMC Bioinformatics

ISSN: 1471-2105

Editors-in-Chief

Janet Kelso

Alfonso Valencia

The leading journal in its field,  Bioinformatics  publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology.

Learn more about publishing with Bioinformatics here.

Why publish with Bioinformatics?

Are you looking for a home for your research? Publish with  Bioinformatics  and enjoy a variety of benefits, including: 

  • Strong reputation and Impact Factor
  • Distinguished and supportive editors
  • Fully open access

Author Guidelines

Read our author guidelines to find out more about:

  • Manuscript types
  • Review processes
  • Open access options
  • Manuscript preparation

Format free submissions

At first submission, Bioinformatics authors are no longer required to format their manuscript according to journal guidelines.

Bioinformatics  is now fully OA

Bioinformatics is now fully open access. Visit the journal's open access page to learn more about the change.

High-Impact Research Collection

Explore the most read, most cited, and most discussed articles published in  Bioinformatics  in recent years and discover what has caught the interest of your peers.

International Society for Computational Biology

Bioinformatics  is an official journal of the International Society for Computational Biology, the leading professional society for computational biology and bioinformatics. Members of the society receive a 15% discount on article processing charges when publishing Open Access in the journal.

Browse by subject

  • Genome analysis
  • Sequence analysis
  • Phylogenetics
  • Structural bioinformatics
  • Gene expression
  • Genetics and population analysis
  • Systems biology
  • Data and text mining
  • Databases and ontologies
  • Bioimage informatics

Bioinformatics  and Publons

Bioinformatics  is part of a trial with Publons to recognise our expert peer reviewers and raise the status of peer review.

Email alerts

Register to receive table of contents email alerts as soon as new issues of Bioinformatics are published online.

Discover a more complete picture of how readers engage with research in Bioinformatics  through Altmetric data. Now available on article pages.

Committee on Publication Ethics (COPE)

This journal is a member of and subscribes to the principles of the Committee on Publication Ethics (COPE)

publicationethics.org

Recommend to your library

Fill out our simple online form to recommend Bioinformatics to your library.

  • Online ISSN 1367-4811

Research Topics

Bioinformatics approaches to investigate antimicrobial resistance (AMR) in human, animal and environment

  • Mohamed Samir
  • Hazem Ramadan
  • Yasser Mahmmod

Omics Technologies and Bioinformatic Tools in Probiotic Research

  • Alex Galanis
  • Konstantinos Papadimitriou
  • Gerard M Moloney

Innovative Tools for Multi-Omics Data Analysis

  • Runzhi Zhang
  • Lingsong Meng

Critical Assessment of Massive Data Analysis (CAMDA) Annual Conference 2023

  • Paweł P Łabaj
  • Wenzhong Xiao
  • Joaquin Dopazo

Computational protein function prediction based on sequence and/or structural data

  • Yaan J. Jang

Good Practice in Data Analysis and Integration

  • Edoardo Saccenti
  • Raffaele Vitale
  • Chris Maliepaard
  • 3,447 views

Bioinformatics tools and approaches for prediction and assessment of protein allergenicity and toxicity potential

  • Minh Nguyen
  • Andreas L. Lopata
  • Anand Kumar Andiappan

Evolution of Short Genomic Regions: Discoveries, Methods, and Challenges

  • Nicole Hansmeier
  • Fabia Ursula Battistuzzi
  • Tzu-Chiao Chao
  • Helen Piontkivska

Multi-omics approaches in the study of human disease mechanisms

  • Dapeng Wang
  • Giuseppe Agapito
  • 2,360 views

Completing the Timetree of Life

  • Beatriz Mello
  • Jack M Craig
  • S. Blair Hedges
  • 1,550 views

Original strategies for training and educational initiatives in bioinformatics, Volume II

  • Raquel Cardoso de Melo Minardi
  • Renato Augusto Corrêa Dos Santos
  • Allegra Via
  • 2,809 views

Big data and artificial intelligence for genomics and therapeutics – Proceedings of the 19th Annual Meeting of the MidSouth Computational Biology and Bioinformatics Society (MCBIOS)

  • Huixiao Hong
  • Robert John Doerksen
  • Inimary Toby
  • Zhaohui Steve Qin
  • 4,283 views

Bioinformatics and computational biology for investigating the impact of food metabolites on human health

  • Deborah Giordano
  • Maria I Klapa
  • Dominik Heider
  • 1,038 views

Women in Bioinformatics

  • Tallulah Andrews
  • Constanza Cárdenas Carvajal
  • Irma Martínez-Flores
  • 3,289 views

Computational Methods for Analysis of DNA Methylation Data, Volume II

  • Pietro Di Lena
  • Christine Nardini
  • Matteo Pellegrini
  • 2,304 views

From one genome to many genomes: the evolution of computational approaches for pangenomics and metagenomics analysis

  • Leena Salmela
  • Paola Bonizzoni
  • Cinzia Pizzi

Bank Runs, Fragility, and Regulation

We examine banking regulation in a macroeconomic model of bank runs. We construct a general equilibrium model where banks may default because of fundamental or self-fulfilling runs. With only fundamental defaults, we show that the competitive equilibrium is constrained efficient. However, when banks are vulnerable to runs, banks’ leverage decisions are not ex-ante optimal: individual banks do not internalize that higher leverage makes other banks more vulnerable. The theory calls for introducing minimum capital requirements, even in the absence of bailouts.

The views expressed herein are those of the authors and not necessarily those of the Federal Reserve Bank of Minneapolis, the Federal Reserve System, or the National Bureau of Economic Research.

  • Published: 01 November 2002

Bioinformatics and genomic medicine

  • Ju Han Kim 1  

Genetics in Medicine volume 4, pages 62–65 (2002)

Bioinformatics is a rapidly emerging field of biomedical research. A flood of large-scale genomic and postgenomic data means that many of the challenges in biomedical research are now challenges in computational science. Clinical informatics has long developed methodologies to improve biomedical research and clinical care by integrating experimental and clinical information systems. The informatics revolution in both bioinformatics and clinical informatics will eventually change the current practice of medicine, including diagnostics, therapeutics, and prognostics. Postgenome informatics, powered by high-throughput technologies and genomic-scale databases, is likely to transform our biomedical understanding forever, in much the same way that biochemistry did a generation ago. This paper describes how these technologies will impact biomedical research and clinical care, emphasizing recent advances in biochip-based functional genomics and proteomics. Basic data preprocessing with normalization and filtering, primary pattern analysis, and machine-learning algorithms are discussed. Use of integrative biochip informatics technologies, including multivariate data projection, gene-metabolic pathway mapping, automated biomolecular annotation, text mining of factual and literature databases, and the integrated management of biomolecular databases, are also discussed.

Clinical informatics and bioinformatics

The decade of the 1940s brought the first electronic digital computers, as well as the first antibiotic, penicillin. Motivated by these revolutionary innovations, by the late 1950s a few biomedical researchers had started to explore the possible utility of digital computers. By the 1960s, there was extensive use of computers in the medical sciences, which are fundamentally information-intensive. The English term medical informatics (a translation from the Russian informatika) first appeared in 1974 because of the need for a name for this area of new biomedical knowledge and because of the lack of a single English term that includes both information (what is processed) and computers (how it is processed). The name also needed to encompass the fields of science, engineering, and technology. 1

Bioinformatics, a newly named and rapidly emerging field of biomedical research, has been recognized for about a decade. The emergence of modern bioinformatics obtained enormous insight from carefully constructed clinical genetics databases, such as disease-specific mutation databases and genotype-phenotype analyses. A flood of large-scale genomic and postgenomic data, powered by high-throughput technologies and large-scale databases, means that many of the challenges in biomedical research are now challenges in computational science. Not only are many of the fundamental problems in genomics/proteomics, such as string sequence homology, pattern recognition, structure prediction, and network analysis, the problems of computational science, but so also are the structural, behavioral, and developmental features of living organisms fundamentally informatical phenomena.

Biomedical informatics, the convergence of bioinformatics and clinical informatics, is radically transforming our biomedical understanding much the same way that biochemistry did a generation ago. Some academic institutions have already integrated bioinformatics and clinical informatics programs that have shared areas of research, 2 , 3 core methodologies, challenges, goals, and impact. 4 – 6 As bioinformatics moves from constructing raw biomolecular data into their biological functions and clinical importance, quality clinical information will become the critical part of further progress. A patient's biomolecular information, such as personal and familial genetic code, will soon be included in his/her electronic medical record as the most predictive clinical information for diagnostics, therapeutics, and prognostics; and this could threaten the right of privacy and confidentiality. Comprehensive integration of bioinformatics and clinical informatics systems, then, will be one of the primary challenges in the next decades.

Accomplishments of bioinformatics and the clinical relevance of biochip informatics

The critical dependence of the success of the Human Genome Project on bioinformatics is just one example of the remarkable accomplishments of bioinformatics. Other areas where bioinformatics has been crucial include sequence alignment of DNA and protein, natural genetic variation, prediction of the structure and function of biological macromolecules, analysis of biomolecular interaction networks, integration of heterogeneous biological databases, biomolecular knowledge representation, simulation of biological processes, analysis of the data created by large-scale biological experiments, and rational drug design.

Most researchers agree that the challenge now is to understand all the data. The speed of data generation now exceeds that of interpretation (i.e., more sequences than related publications in GenBank). This has become even more serious with the introduction of biochips that measure the functional activities of genes and proteins. DNA microarrays are microscopic slides containing a large number of cDNA (or oligonucleotide) samples as fluorescently labeled probes to quantitatively monitor the abundance of transcripts (or mRNAs). An image scanner translates fluorescent intensities into a numerical matrix of expression profiles.

Now that we have comprehensive maps of the human genome and transcriptome and since biochip technology can be applied to cells or tissue samples without pulling genes or proteins from them, we have an astounding technique to address the comprehensive spatial and temporal genomic complexity in living organisms under different experimental conditions. Biochip informatics with comprehensive expression profiling is clearly one of the most direct bridges from biomolecular informatics to clinical medicine and the improvement of diagnostics, therapeutics, and prognostics.

Integrative biochip informatics in functional genomics and proteomics

Biochip informatics: Basic data analysis

Because there are many sources of noise and systematic variability in microarray experiments, 7 , 8 data normalization and preprocessing are crucial in analysis. Normalization includes those transformations that control systematic variabilities within a chip or across multiple chips. The simplest way data normalization can be done is by dividing or subtracting all expression values by a representative value for the system or by a linear transformation to a fixed mean (i.e., 0.0) and unit variance (i.e., 1.0) (sometimes called “median polishing”). However, the linear response between the true expression level and measured fluorescent intensity may not be guaranteed, 9 , 10 especially when dye biases depend on array spot intensity or when multiple print tips are used in the microarray spotter. 11
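The two simple normalizations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the paper: the function names and the example intensities are hypothetical, the values are assumed to be log-transformed already, and the intensity-dependent or print-tip-dependent biases mentioned above are not addressed.

```python
from statistics import mean, median, pstdev

def median_center(chip):
    """Subtract a representative value (here the chip median) from every value."""
    m = median(chip)
    return [x - m for x in chip]

def standardize(chip):
    """Linearly rescale one chip to mean 0.0 and unit variance."""
    mu = mean(chip)
    sigma = pstdev(chip)  # population standard deviation of the chip
    if sigma == 0:
        raise ValueError("chip has no variation; cannot scale to unit variance")
    return [(x - mu) / sigma for x in chip]

# Hypothetical log2 intensities for five spots on one chip.
chip = [8.0, 10.0, 12.0, 9.0, 11.0]
centered = median_center(chip)
z = standardize(chip)
```

Both transformations are linear, so neither corrects the nonlinear response between true expression level and measured intensity that the text warns about.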

Data preprocessing includes those transformations that prepare the data for the subsequent analysis. Scaling and filtering are the major steps of data preprocessing. A low variation filter to exclude genes that did not change significantly across experiments has been successfully applied in many studies. 12 Statistical significance testing, such as the analysis of variance and multiple comparisons, can also be used to filter data that show no significant change across conditions when a sufficient number of repeated observations are available.
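A low variation filter of the kind mentioned above might look like the following sketch. The gene names, the expression values, and the cutoff `min_sd` are all hypothetical; in practice the threshold would be chosen from the data and the experimental design.

```python
from statistics import pstdev

def low_variation_filter(profiles, min_sd):
    """Keep genes whose standard deviation across experiments is at least min_sd."""
    return {gene: values for gene, values in profiles.items()
            if pstdev(values) >= min_sd}

# Hypothetical expression profiles: one list of values per gene, one value per experiment.
profiles = {
    "geneA": [1.0, 1.1, 0.9, 1.0],   # nearly flat across experiments
    "geneB": [0.2, 2.5, -1.8, 3.0],  # changes strongly
    "geneC": [5.0, 5.0, 5.0, 5.0],   # constant
}
kept = low_variation_filter(profiles, min_sd=0.5)
```

Here only `geneB` passes the filter; the flat and constant profiles are excluded before any downstream analysis.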

The importance of data visualization cannot be overemphasized. It is highly recommended to scatter-plot the data whenever possible. The most straightforward approach to microarray data analysis is to find differentially expressed genes across different experimental conditions. 13 , 14 Standardized expression profiling, consistent database design, and streamlining the experimental process management are all crucial, 15 , 16 as are the supervised and unsupervised machine-learning algorithms that make sense of the mountains of genomic data. Here now is a brief description of the various machine-learning approaches to deciphering genomic data.
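As one possible illustration of finding differentially expressed genes, the sketch below ranks genes by Welch's t statistic between two conditions. The choice of Welch's t is an assumption made for this example rather than the approach of the cited studies, and the gene names and replicate values are hypothetical. It requires at least two replicates per condition and nonzero variance in at least one group.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two groups of replicate measurements."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def rank_genes(control, treated):
    """Return (gene, mean difference, t) tuples sorted by |t|, largest first."""
    rows = []
    for gene in control:
        diff = mean(treated[gene]) - mean(control[gene])
        rows.append((gene, diff, welch_t(treated[gene], control[gene])))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)

# Hypothetical log2 expression values, three replicates per condition.
control = {"geneX": [5.0, 5.2, 4.8], "geneY": [3.0, 3.5, 2.5]}
treated = {"geneX": [7.0, 7.1, 6.9], "geneY": [3.1, 2.9, 3.3]}
ranked = rank_genes(control, treated)
```

With thousands of genes tested at once, a correction for multiple comparisons would be needed before calling any gene significant.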

Biochip informatics: Functional clustering and machine-learning approaches

A general question in many research areas is how to organize observed data into meaningful structures. One common difficulty in biochip data analysis is the very high dimensionality of the data. The data projection method reduces high dimensionality and projects complex data structure onto a lower dimensional space. Cluster analysis, by reducing dimensionality, creates hypothesized clusters and helps researchers infer unknown functions of genes based on the assumption that a group of genes with similar expression profiles may be functionally associated.

Principal component analysis, a statistical approach to reduce dimensionality without losing significant information by paying attention only to those dimensions that account for large variance in the data, has been applied to microarray data analysis. 17 , 18 Multidimensional scaling, a data projection method originally developed in mathematical psychology, 19 has also been shown to be a powerful tool in functional genomics research. 20
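To make the idea of keeping only high-variance dimensions concrete, here is a small sketch that estimates the first principal component by power iteration on the sample covariance matrix. Power iteration is a choice made for this example so that it runs with the standard library alone; the function name and the data are hypothetical, and the sketch assumes the data have nonzero variance and that the starting vector is not orthogonal to the leading axis.

```python
from math import sqrt

def first_principal_component(rows, iterations=200):
    """Estimate the leading principal axis of rows (samples x features)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v (Rayleigh quotient) as a fraction of total variance.
    along = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    explained = along / sum(cov[i][i] for i in range(d))
    # Project every centered sample onto the leading axis.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores, explained

# Hypothetical data: three features, the first two strongly correlated.
rows = [[1.0, 1.0, 0.1], [2.0, 2.0, -0.1], [3.0, 3.0, 0.1], [4.0, 4.0, -0.1]]
axis, scores, explained = first_principal_component(rows)
```

In this toy data set nearly all of the variance lies along a single axis, so one score per sample summarizes three features with little loss.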

Cluster analysis is currently the most frequently used multivariate technique to analyze microarray data. Clusters can be developed using a variety of similarity or distance metrics: Euclidean distance, correlation coefficients, or mutual information. Hierarchical tree clustering joins similar objects together into successively larger clusters in a bottom-up manner (i.e., from the leaves to the root of the tree), by successively relaxing the threshold of joining objects or sets ( Fig. 1 ). 21 , 22 The relevance-networks approach takes the opposite strategy. 23 It starts with a completely connected graph with the vertices representing each object and the edges representing a measure of association, and then links are increasingly deleted to reveal “naturally emerging” clusters at a certain threshold.
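The bottom-up joining procedure can be sketched as follows. This example assumes average linkage and a distance of one minus the Pearson correlation; both are common choices but are assumptions here, as are the gene names, the profiles, and the stopping threshold. Constant profiles are not handled, because their correlation is undefined.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def hierarchical_clusters(profiles, threshold):
    """Merge the closest clusters until the closest pair exceeds threshold."""
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]  # start with one leaf per gene

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)
        if d > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Hypothetical profiles: g1 and g2 rise together, g3 and g4 fall together.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.5, 2.0],
}
clusters = hierarchical_clusters(profiles, threshold=0.5)
```

Raising the threshold corresponds to relaxing the joining criterion, which moves further toward the root of the tree and yields fewer, larger clusters.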

Fig. 1. Cluster analysis and graphical display of genome-wide expression patterns (Jurkat T cells under gamma irradiation). (a) Hierarchical clustering creates functional clusters with color-coded expression patterns. (b) Partitional clusters with geometric grid structure are created by self-organizing maps.

Partitional clustering algorithms, such as K-means analysis and self-organizing maps, 24 which minimize within-cluster scatter or maximize between-cluster scatter, were shown to be capable of finding meaningful clusters from functional genomic data ( Fig. 1 ). 25 , 26 Creation of a hierarchical-tree structure in a top-down fashion (i.e., from the root to the leaves of the tree) by successive “optimal” binary partitioning based on graph theory 27 and geometric space-partitioning principle 28 has also been introduced.
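A minimal K-means sketch shows how a partitional algorithm reduces within-cluster scatter by alternating between assigning points to the nearest centroid and recomputing centroids. The function name, the seed, and the two-dimensional example points are hypothetical; real expression profiles would have many more dimensions, and results generally depend on the random initialization.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain K-means with squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centroids will not move
        assignment = new_assignment
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical two-dimensional profiles forming two well-separated groups.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
assignment, centroids = kmeans(points, k=2)
```

The number of clusters `k` has to be supplied in advance, which is one practical difference from the hierarchical approach.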

The “optimal” partitioning problem (i.e., the best clustering) is fundamentally NP-hard and can be viewed as an optimization problem. Meta-heuristic algorithms, such as simulated annealing and genetic algorithms 29 and model-based search, 30 can be applied to gain a better understanding of the complex data structure of genomic-scale expression profiles. The reliability and quality measures of clusters, as well as multilevel visualization for the evaluation of clustering solutions, should also be addressed. 31 , 32

Integrative biochip informatics

Exploratory data analysis, such as clustering, is appropriate when there is no a priori knowledge about the area of research. Such a technique is known as unsupervised machine learning in the artificial intelligence community. With increasing knowledge of complex biological systems, supervised machine-learning techniques (or classification algorithms) are also being increasingly introduced into functional genomics with significant success. 33 , 34
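The difference from clustering is that the class labels are known in advance and are used for training. A minimal sketch follows, assuming a nearest-centroid rule chosen for brevity. It is not the support vector machine approach of the cited work, and the class names and values are hypothetical.

```python
def train_centroids(samples, labels):
    """Average the training profiles of each class into one centroid."""
    sums, counts = {}, {}
    for profile, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for j, value in enumerate(profile):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def predict(centroids, profile):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((x - c) ** 2
                                     for x, c in zip(profile, centroids[label])))

# Hypothetical labeled expression profiles (two genes, two classes).
training = [[2.0, 0.1], [1.8, 0.3], [0.2, 1.9], [0.1, 2.2]]
classes = ["A", "A", "B", "B"]
model = train_centroids(training, classes)
first_call = predict(model, [1.5, 0.5])
second_call = predict(model, [0.3, 1.5])
```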

In addition to clustering and classifying expression profiles (or unsupervised and supervised machine learning), systematic integration and streamlining of appropriate informatics technologies can greatly enhance the productivity of functional genomics research. For example, PubGene 35 links gene expression profiles to biomedical literature by combining gene ontology and text mining techniques applied to the PubMed database ( http://www.pubgene.org ). A variety of meta-databases 36 and natural language processing techniques 37 are being applied to extract biomolecular interaction networks from biomedical literature and factual databases. Linking this information to genetic regulatory network and metabolic pathway information, such as KEGG, is an area of vigorous research. Structural sequence information can be used to greatly enhance functional understanding. 38 , 39

At the Harvard Medical School–affiliated Children's Hospital in Boston, we have also developed automatic annotation machines for each microarray probe by integrating many of the publicly available bioinformatics databases. An automated inference engine to predict the functional annotation of genes works together with all the streamlined biochip informatics technologies, including basic data analysis, functional clustering, and supervised classification algorithms. The management of integrated databases, as well as intelligent modules, is becoming more important and challenging. We are currently integrating these biochip informatics technologies into the advanced clinical information systems at Children's Hospital.

Biomedical informatics: The emergence of new medicine

Large areas of medical research and biotechnological development will be permanently transformed by the evolution of high-throughput techniques and informatics. Biochip technology is one of the most readily applicable bioinformatics innovations to biomedical research and clinical medicine. It has been demonstrated that certain types of cancer can be classified by large-scale gene expression profiling. 40 The capability of new disease class discovery, as well as prognostic prediction, has also been demonstrated. 41 Drug discovery is being transformed by developments in molecular cell biology and bioinformatics. 42

The spectacular capability of biochip technology to aid clinical medicine is no wonder considering that, essentially, the technology simultaneously performs tens of thousands of molecular marker studies with comprehensive sets of the biologically most informative molecules, genes, and proteins, in a very systematic and quantitative fashion. By doing so, biochip technology uncovers the molecular basis of histopathological processes, the fundamentals of modern diagnostics.

Bioinformatics will not replace experiments, but miniaturization and automation of laboratory processes can streamline and enable the discovery process to an extraordinary degree. Integrating quality clinical information is crucial to achieve real improvements in clinical diagnostics, therapeutics, and prognostics. Thus bioinformatics is not merely a tool to assist the discovery process; it becomes an integral part of discovery and in this way will permanently transform the structure of our biomedical knowledge bases.

The weaving of the horizontally integrated “omic” revolution of all biological building blocks (genome, transcriptome, proteome, metabolome, and biome) with the vertical integration of biomedical informatics [molecular bioinformatics, computational cell biology, 43 computational physiology 44 (neuroinformatics), 45 digital anatomy 46 (structural informatics), chemoinformatics, 47 , 48 clinical informatics, 49 and public health informatics 50 ] has come of age. The new medicine will be both molecularly informed and informatically empowered.

Collen MF . A history of medical informatics in the United States 1950 to 1990. American Medical Informatics Association , 1995.

Altman RB . The interactions between clinical informatics and bioinformatics: a case study. J Am Med Inform Assoc 2000; 7 : 439–443.

Miller PL . Opportunities at the intersection of bioinformatics and health informatics: a case study. J Am Med Inform Assoc 2000; 7 : 431–438.

Rindfleisch TC, Brutlag DL . Directions for clinical research and genomic research into the next decade: implications for informatics. J Am Med Inform Assoc 1998; 5 : 404–411.

Altman RB . Bioinformatics in support of molecular medicine. Proc AMIA Symp 1998; 53–61.

Kohane IS . Bioinformatics and clinical informatics: the imperative to collaborate [editorial comment]. J Am Med Inform Assoc 2000; 7 : 512–515.

Schuchhardt J, Beule D, Malik A, Wolski E, Eickhoff H, Lehrach H, Herzel H . Normalization strategies for cDNA microarrays. Nucleic Acids Res 2000; 28 : E47.

Wildsmith SE, Archer GE, Winkley AJ, Lane PW, Bugelski PJ . Maximization of signal derived from cDNA microarrays. Biotechniques 2001; 30 : 202–206, 208.

Kepler TB, Crosby L, Morgan KT . Normalization and analysis of DNA microarray data by self-consistency and local regression. Genome Biol 2002; 3 : RESEARCH0037.

Tseng GC, Oh M, Rohlin L, Liao JC, Wong WH . Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variation and assessment of gene effects. Nucleic Acids Res 2001; 29 : 2549–2557.

Yang YH, Dudoit S, Luu P, Speed TP . Normalization for cDNA microarray data. SPIE BiOS 2001 , San Jose, CA, January 2001.

Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR . Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci U S A 1999; 96 : 2907–2912.

DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, Chen Y, Su YA, Trent JM . Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 1996; 14 : 457–460.

Heller RA, Schena M, Chai A, Shalon D, Bedilion T, Gilmore J, Woolley DE, Davis RW . Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc Natl Acad Sci U S A 1997; 94 : 2150–2155.

Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M . Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nat Genet 2001; 29 : 365–371.

Perou CM . Show me the data!. Nat Genet 2001; 29 : 373–459.

Hilsenbeck S, Friedrichs W, Schiff R, O'Connell P, Hansen R, Osborne C, Fuqua SW . Statistical analysis of array expression data as applied to the problem of Tamoxifen resistance. J Natl Cancer Inst 1999; 91 : 453–459.

Yeung KY, Ruzzo WL . Principal component analysis for clustering gene expression data. Bioinformatics 2001; 17 : 763–774.

Shepard RN . Multidimensional scaling, tree-fitting, and clustering. Science 1980; 210 : 390–397.

Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben-Dor A, Sampas N, Dougherty E, Wang E, Marincola F, Gooden C, Lueders J, Glatfelter A, Pollock P, Carpten J, Gillanders E, Leja D, Dietrich K, Beaudry C, Berens M, Alberts D, Sondak V . Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 2000; 406 : 536–540.

Eisen MB, Spellman PT, Brown PO, Botstein D . Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998; 95 : 14863–14868.

Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, Trent JM, Staudt LM, Hudson J, Boguski MS, Lashkari D, Shalon D, Botstein D, Brown PO . The transcriptional program in the response of human fibroblasts to serum. Science 1999; 283 : 83–87.

Butte AJ, Kohane IS . Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac Symp Biocomput 2000; 418–429.

Kohonen T . Self-organized formation of topologically correct feature maps. Biol Cybern 1982; 43 : 59–69.

Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM . Systematic determination of genetic network architecture. Nat Genet 1999; 22 : 281–285.

Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR . Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci U S A 1999; 96 : 2907–2912.

Sharan R, Shamir R . CLICK: a clustering algorithm with applications to gene expression analysis. Proc Int Conf Intell Syst Mol Biol 2000; 8 : 307–316.

Kim JH, Ohno-Machado L, Kohane IS . Unsupervised learning from complex data: the matrix incision tree algorithm. Pac Symp Biocomput 2001; 30–41.

Lee K, Kim JH, Chung TS, Moon BS, Lee H, Kohane IS . Evolution strategy applied to global optimization of clusters in gene expression data of DNA microarrays. Proceedings of IEEE Congress on Evolutionary Computation , Seoul, Korea, May 27–30, 2001; 845–850.

Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL . Model-based clustering and data transformations for gene expression data. Bioinformatics 2001; 17 : 977–987.

Yeung KY, Haynor DR, Ruzzo WL . Validating clustering for gene expression data. Bioinformatics 2001; 17 : 309–318.

Kim JH, Kohane IS, Ohno-Machado L . Visualization and evaluation of clusters for exploratory analysis of gene expression data. J Biomed Inform . In press.

Brown MPS, Grundy WB, Lin D, Cristianini N, Sugnet CW, Furey TS, Haussler D . Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci U S A 2000; 97 : 262–267.

Heyer LJ, Kruglyak S, Yooseph S . Exploring expression data: identification and analysis of coexpressed genes. Genome Res 1999; 9 : 1106–1115.

Jenssen TK, Laegreid A, Komorowski J, Hovig E . A literature network of human genes for high-throughput analysis of gene expression. Nat Genet 2001; 28 : 21–28.

Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D . GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics 1998; 14 : 656–664.

Park JC, Kim HS, Kim JJ . Bidirectional incremental parsing for automatic pathway identification with combinatory categorial grammar. Pac Symp Biocomput 2001; 396–407.

Zhu Z, Pilpel Y, Church GM . Computational identification of transcription factor binding sites via a transcription-factor-centric clustering (TFCC) algorithm. J Mol Biol 2002; 318 : 71–81.

Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES . Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999; 286 : 531–537.

Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Staudt LM, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000; 403 : 503–511.

Weinstein JN, Myers TG, O'Connor PM, Friend SH, Fornace AJ, Kohn KW, Fojo T, Bates SE, Rubinstein LV, Anderson NL, Buolamwini JK, van Osdol WW, Monks AP, Scudiero DA, Sausville EA, Zaharevitz DW, Bunow B, Viswanadhan VN, Johnson GS, Wittes RE, Paull KD . An information-intensive approach to the molecular pharmacology of cancer. Science 1997; 275 : 343–349.

Tomita M . Whole-cell simulation: a grand challenge of the 21st century. Trends Biotechnol 2001; 19 : 205–210.

Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L . Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 2001; 292 : 929–934.

Chicurel M . Databasing the brain. Nature 2000; 406 : 822–825.

Brinkley JF . Structural informatics and its applications in medicine and biology. Acad Med 1991; 66 : 589–591.

Brown FK . Chemoinformatics: what is it and how does it impact drug discovery?. Annu Rep Med Chem 1998; 33 : 375–384.

Hann M, Green R . Chemoinformatics: a new name for an old problem. Curr Opin Chem Biol 1999; 379–383.

Degoulet P, Fieschi M . Introduction to clinical informatics. New York: Springer, 1997.

Friede A, Blum HL, McDonald M . Public health informatics: how information-age technology can strengthen public health. Annu Rev Public Health 1995; 16 : 239–252.

Acknowledgements

This study was supported by a grant from the Korea Health 21 R&D Project, Ministry of Health & Welfare, Republic of Korea (01-PJ10-PG6-01GM01-0004).

Author information

Authors and affiliations.

Children's Hospital Informatics Program, Harvard Medical School, Boston, Massachusetts

About this article

Cite this article.

Kim, J. Bioinformatics and genomic medicine. Genet Med 4 (Suppl 6), 62–65 (2002). https://doi.org/10.1097/00125817-200211001-00013

Received : 11 July 2002

Accepted : 30 September 2002

Issue Date : 01 November 2002

DOI : https://doi.org/10.1097/00125817-200211001-00013

  • bioinformatics
  • genomic medicine
  • functional genomics
  • DNA microarray

This article is cited by

New approaches to physiological informatics in neurocritical care.

  • Marco D. Sorani
  • J. Claude Hemphill
  • Geoffrey T. Manley

Neurocritical Care (2007)

About 1 in 4 U.S. teachers say their school went into a gun-related lockdown in the last school year

Twenty-five years after the mass shooting at Columbine High School in Colorado , a majority of public K-12 teachers (59%) say they are at least somewhat worried about the possibility of a shooting ever happening at their school. This includes 18% who say they’re extremely or very worried, according to a new Pew Research Center survey.

Pew Research Center conducted this analysis to better understand public K-12 teachers’ views on school shootings, how prepared they feel for a potential active shooter, and how they feel about policies that could help prevent future shootings.

To do this, we surveyed 2,531 U.S. public K-12 teachers from Oct. 17 to Nov. 14, 2023. The teachers are members of RAND’s American Teacher Panel, a nationally representative panel of public school K-12 teachers recruited through MDR Education. Survey data is weighted to state and national teacher characteristics to account for differences in sampling and response to ensure they are representative of the target population.

We also used data from our 2022 survey of U.S. parents. For that project, we surveyed 3,757 U.S. parents with at least one child younger than 18 from Sept. 20 to Oct. 2, 2022. Find more details about the survey of parents here .

Here are the questions used for this analysis , along with responses, and the survey methodology .

Another 31% of teachers say they are not too worried about a shooting occurring at their school. Only 7% of teachers say they are not at all worried.

This survey comes at a time when school shootings are at a record high (82 in 2023) and gun safety continues to be a topic in 2024 election campaigns .

A pie chart showing that a majority of teachers are at least somewhat worried about a shooting occurring at their school.

Teachers’ experiences with lockdowns

A horizontal stacked bar chart showing that about 1 in 4 teachers say their school had a gun-related lockdown last year.

About a quarter of teachers (23%) say they experienced a lockdown in the 2022-23 school year because of a gun or suspicion of a gun at their school. Some 15% say this happened once during the year, and 8% say this happened more than once.

High school teachers are most likely to report experiencing these lockdowns: 34% say their school went on at least one gun-related lockdown in the last school year. This compares with 22% of middle school teachers and 16% of elementary school teachers.

Teachers in urban schools are also more likely to say that their school had a gun-related lockdown. About a third of these teachers (31%) say this, compared with 19% of teachers in suburban schools and 20% in rural schools.

Do teachers feel their school has prepared them for an active shooter?

About four-in-ten teachers (39%) say their school has done a fair or poor job providing them with the training and resources they need to deal with a potential active shooter.

A bar chart showing that 3 in 10 teachers say their school has done an excellent or very good job preparing them for an active shooter.

A smaller share (30%) give their school an excellent or very good rating, and another 30% say their school has done a good job preparing them.

Teachers in urban schools are the least likely to say their school has done an excellent or very good job preparing them for a potential active shooter. About one-in-five (21%) say this, compared with 32% of teachers in suburban schools and 35% in rural schools.

Teachers who have police officers or armed security stationed in their school are more likely than those who don’t to say their school has done an excellent or very good job preparing them for a potential active shooter (36% vs. 22%).

Overall, 56% of teachers say they have police officers or armed security stationed at their school. Majorities in rural schools (64%) and suburban schools (56%) say this, compared with 48% in urban schools.

Only 3% of teachers say teachers and administrators at their school are allowed to carry guns in school. This is slightly more common in school districts where a majority of voters cast ballots for Donald Trump in 2020 than in school districts where a majority of voters cast ballots for Joe Biden (5% vs. 1%).

What strategies do teachers think could help prevent school shootings?

A bar chart showing that 69% of teachers say better mental health treatment would be highly effective in preventing school shootings.

The survey also asked teachers how effective some measures would be at preventing school shootings.

Most teachers (69%) say improving mental health screening and treatment for children and adults would be extremely or very effective.

About half (49%) say having police officers or armed security in schools would be highly effective, while 33% say the same about metal detectors in schools.

Just 13% say allowing teachers and school administrators to carry guns in schools would be extremely or very effective at preventing school shootings. Seven-in-ten teachers say this would be not too or not at all effective.

How teachers’ views differ by party

A dot plot showing that teachers’ views of strategies to prevent school shootings differ by political party.

Republican and Republican-leaning teachers are more likely than Democratic and Democratic-leaning teachers to say each of the following would be highly effective:

  • Having police officers or armed security in schools (69% vs. 37%)
  • Having metal detectors in schools (43% vs. 27%)
  • Allowing teachers and school administrators to carry guns in schools (28% vs. 3%)

And while majorities in both parties say improving mental health screening and treatment would be highly effective at preventing school shootings, Democratic teachers are more likely than Republican teachers to say this (73% vs. 66%).

Parents’ views on school shootings and prevention strategies

In fall 2022, we asked parents a similar set of questions about school shootings.

Roughly a third of parents with K-12 students (32%) said they were extremely or very worried about a shooting ever happening at their child’s school. An additional 37% said they were somewhat worried.

As is the case among teachers, improving mental health screening and treatment was the only strategy most parents (63%) said would be extremely or very effective at preventing school shootings. And allowing teachers and school administrators to carry guns in schools was seen as the least effective – in fact, half of parents said this would be not too or not at all effective. This question was asked of all parents with a child younger than 18, regardless of whether they have a child in K-12 schools.

Like teachers, parents’ views on strategies for preventing school shootings differed by party. 

Note: Here are the questions used for this analysis , along with responses, and the survey methodology .

Curr Genomics, v.17(4); 2016 Aug

Bioinformatics Approach in Plant Genomic Research

a Plant Abiotic Stress Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam;

b Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City, Vietnam;

Phuc Nguyen

c School of Biotechnology, International University, Vietnam National University, Ho Chi Minh City, Vietnam

Nguyen Phuong Thao

Advances in genomics technology have dramatically changed plant biology research. Plant biologists can now readily access enormous genomic datasets to study high-density genetic variation in plants at the molecular level. A thorough understanding of bioinformatics tools, and skill in using them to manage and analyze these data, is therefore essential in current plant genome research. Many plant genome databases have been established and continue to expand. Meanwhile, bioinformatics-based analytical methods are well developed for many aspects of plant genomic research, including comparative genomic analysis, phylogenomics and evolutionary analysis, and genome-wide association studies. However, the need to constantly upgrade computational infrastructure, such as high-capacity data storage and high-performance analysis software, remains a real challenge for plant genome research. This review focuses on the challenges and opportunities that bioinformatics knowledge and skills bring to plant scientists in the present plant genomics era, and on the critical future need for effective tools that translate knowledge from new sequencing data into enhanced plant productivity.

1. INTRODUCTION

The plant kingdom is important not only for humans but also for other living organisms. One of the crucial roles of plants is to provide a huge amount of food [ 1 ]. Plants are also used in making many human medicines [ 2 ] and have been selected as model organisms to study transposable elements in heterochromatin and epigenetic control [ 3 ]. Because of this vital role, plant biology has been studied broadly since the earliest stages of human history.

Modern technologies have pushed the study of plant biology to a higher level than before [ 4 ]. High-throughput sequencing methods give scientists the ability to investigate the structure of genetic material at the molecular level, a field known as “genomics”. Plant genomics has expanded rapidly in recent years and has become the main theme in plant research, owing to the rapid increase in the number of sequenced plant genomes [ 5 ]. Plant genome research has had a huge impact on the improvement of economically important plants and on the knowledge of plant biology [ 6 ]. Open access to, and constant updating of, this plant genomic information create a fertile environment for plant research to grow. This requires strong connection and cooperation across the global biological community [ 7 ].

In this paper, we first review the development of genomic sequencing technologies and their applications in plant genomic research. We then introduce recent bioinformatics approaches to managing and analyzing plant genomic databases and, in particular, summarize the most popular plant genomic resources. We also provide fundamental knowledge of key methods for integrating and analyzing these genomic data, such as comparative genomic analysis, phylogenomics, evolutionary analysis and genome-wide association study (GWAS) in plants.

2. NEXT GENERATION SEQUENCING TECHNOLOGY IN PLANT GENOMIC RESEARCH

The development of DNA sequencing technology has been a long and memorable journey marked by many historic events. Before the arrival of next-generation platforms, nearly all DNA sequence production relied on capillary-based, semi-automated implementations of the Sanger biochemistry and its variations [ 8 - 10 ]. Over the years, the field of DNA sequencing has been revived and has prospered thanks to various scientific breakthroughs. These technological advances eventually encouraged the development of novel experimental designs for the field [ 11 ]. Ultimately, next-generation sequencing (NGS) technologies were released in 2005 [ 12 ]. They are known as “high throughput sequencing technologies that parallelize the sequencing process, producing millions of sequences at once at a much lower per-base cost than conventional Sanger sequencing” [ 13 ].

Building on NGS technologies, large companies such as Roche, Illumina and Applied Biosystems have developed many automated, ultrahigh-throughput platforms. All of them are well suited to current and even future large-scale sequencing needs. In general, these NGS platforms no longer use Sanger’s dideoxy chain termination sequencing technology. Instead, they apply more advanced methods such as pyrosequencing, sequencing-by-synthesis, sequencing-by-ligation, ion semiconductor-based non-optical sequencing, single molecule sequencing and nanopore sequencing [ 14 ].

The sequencing-by-synthesis platform uses DNA polymerase to extend many DNA strands in parallel [ 15 ]. This method uses modified deoxynucleoside triphosphates (dNTPs) carrying a terminator that prevents further polymerization, so DNA polymerase can add only a single base at a time to each growing DNA copy strand. The newly incorporated nucleotide or oligonucleotide can therefore be identified as extension proceeds. The pyrosequencing platform is also based on the principle of sequencing-by-synthesis (SBS) [ 16 ]. It relies on detecting the pyrophosphate released when DNA polymerase incorporates a nucleotide; the pyrophosphate drives a series of enzymatic reactions that ends with luciferase converting luciferin to oxyluciferin and emitting a light signal. The sequencing-by-ligation platform uses DNA ligase to perform sequential ligation of dye-labeled oligonucleotides. This process enables massively parallel sequencing of clonally amplified DNA fragments [ 17 ]. The mismatch sensitivity of DNA ligase is then used to determine the underlying sequence of the target DNA molecule. The ion semiconductor-based non-optical sequencing platform detects the hydrogen ions released during DNA polymerization. Single molecule sequencing is based on “the successive enzymatic degradation of fluorescently labeled single DNA molecules, and the detection and identification of the released monomer molecules according to their sequential order in a micro-structured channel” [ 18 ]. A single molecule sequencer does not require any amplification of DNA fragments prior to sequencing [ 19 ]. Nanopore sequencing identifies individual nucleotides as the DNA strand passes through a membrane-inserted protein nanopore, one base at a time, by measuring alterations in the ion current [ 20 ].
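To make the pyrosequencing principle concrete, the toy model below assumes that nucleotides are flowed over the reaction in a fixed cyclic order and that each flow produces a signal proportional to the number of bases incorporated. It works directly on the strand being synthesized and ignores noise and enzyme kinetics. The function names and the flow order are illustrative assumptions, not a vendor's actual protocol.

```python
def flowgram(synthesized, flow_order="TACG"):
    """Simulate idealized signals: one (base, count) pair per nucleotide flow."""
    if any(base not in flow_order for base in synthesized):
        raise ValueError("sequence contains a base that is never flowed")
    signals, pos = [], 0
    while pos < len(synthesized):
        for base in flow_order:
            run = 0
            # All consecutive matching bases are incorporated in a single flow,
            # so a homopolymer run yields one proportionally larger signal.
            while pos < len(synthesized) and synthesized[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
    return signals

def decode(signals):
    """Rebuild the sequence by repeating each flowed base 'count' times."""
    return "".join(base * run for base, run in signals)

# Hypothetical strand with homopolymer runs of T and G.
signals = flowgram("TTAGGGC")
recovered = decode(signals)
```

In this idealized setting decoding is exact. With real signals, deciding whether a peak reflects, say, five or six identical bases is harder, which is one way a platform acquires a characteristic error profile.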

Well-known, commercially available NGS platforms include the Genome Sequencer from Roche/454 (pyrosequencing); the Genome Analyzer from Illumina/Solexa (sequencing-by-synthesis); SOLiD from Applied Biosystems and the Polonator from Dover Systems (both sequencing-by-ligation); Ion Torrent from Life Science, Inc. (ion semiconductor-based non-optical sequencing); Heliscope sequencers from Helicos Bioscience Corporation (true single molecule sequencing); PacBio RS sequencers from Pacific Biosciences (single molecule, real-time sequencing); and the GridION and miniaturized MinION sequencers from Oxford Nanopore Technologies (nanopore sequencing) [ 4 , 14 ].

The main differences among these systems are read length, characteristic error profile and operating cost [ 21 - 24 ]. Depending on the application, these differences may affect how the reads are used in bioinformatics analyses [ 19 ]. However, most comparisons have found that the data produced by these methods are similar [ 21 - 24 ]. The choice of sequencing method therefore depends mainly on the ultimate goal of a particular study.

With its rapid innovation, NGS has been applied to many aspects of plant genomic research, such as exome sequencing and the study of the genetic transmission of alleles/quantitative trait loci (QTLs) through whole genome sequencing [ 14 ]. Exome sequencing can help in exploring biodiversity, studying host–pathogen interactions, investigating the natural evolution of crops, testing for the inheritance of genetic markers, providing large-scale genetic resources for crop improvement, identifying genes, and establishing the presence of functional gene sets involved in symbiotic or other co-existential systems [ 14 ]. In addition, NGS methods with single-base resolution can provide epigenomic information. For instance, a study of the A. thaliana epigenome revealed that the location and abundance of small RNA targets were significantly related to cytosine methylation [ 25 ]. Another application of plant genome sequencing is genotyping by sequencing (GBS), which is emerging as a high-throughput and inexpensive method for genotyping populations. GBS offers many approaches for enhancing genomic map construction, especially the identification of single nucleotide polymorphisms (SNPs) [ 26 ]. For example, GBS of 2,815 maize inbred accessions yielded 681,257 SNP markers, which were found to be positively associated with trait-related genes [ 27 ].
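SNP identification can be illustrated with a deliberately simplified sketch. It assumes that every sample has already been reduced to a sequence aligned base-for-base to a reference, and it reports positions where an alternate allele recurs in several samples. Real GBS pipelines involve read mapping, quality filtering and handling of missing data, none of which is modeled here. The sample names, sequences and threshold are hypothetical.

```python
from collections import Counter

def call_snps(reference, samples, min_alt_count=2):
    """List (position, ref_base, allele_counts) where an alternate allele recurs.

    Positions are 0-based offsets into the reference.
    """
    snps = []
    for pos, ref_base in enumerate(reference):
        counts = Counter(seq[pos] for seq in samples.values())
        alternates = {b: n for b, n in counts.items() if b != ref_base}
        # Require the alternate allele in several samples so that an isolated
        # sequencing error is not reported as a polymorphism.
        if alternates and max(alternates.values()) >= min_alt_count:
            snps.append((pos, ref_base, dict(counts)))
    return snps

# Hypothetical aligned sequences from four inbred lines.
reference = "ACGTAC"
samples = {
    "line1": "ACGTAC",
    "line2": "ACTTAC",
    "line3": "ACTTAC",
    "line4": "ACGTAT",
}
snps = call_snps(reference, samples)
```

With these inputs, the G/T difference at offset 2 is shared by two lines and is reported, while the single T at offset 5 is treated as a possible error and skipped.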

The success of NGS in plant genomic research is beyond doubt. However, developing computational tools for analyzing genome sequences remains challenging. Galaxy ( http://galaxyproject.org ) is one software system that lets researchers run analysis tools through web-based interfaces backed by a large body of freely accessible biological data [ 28 ]. Another is Artemis, a genome browser and annotation tool that is freely available from the Sanger Institute ( http://www.sanger.ac.uk/ ) [ 29 ]. Several other genome sequence analysis tools are provided by the Broad Institute's Genome Sequencing and Analysis Program (GSAP) ( http://www.broadinstitute.org/ ). Additionally, the rapid decrease in the cost of genome sequencing creates an urgent need for large-scale database storage and management. Indeed, more and more plant genomic databases have been created to meet that demand.

3. PLANT GENOMIC RESOURCES

The history of plant genomics was changed dramatically by the introduction of expressed sequence tag (EST) sequencing, a high-throughput gene discovery method [ 30 ], and by the release of the complete Arabidopsis thaliana genomic sequence in 2000 [ 31 ]. Following that success, the complete genomic sequence of rice became available only 2 years later [ 32 ]. These events had a powerful influence on both plant biotechnology and crop bioinformatics. Since then, more sequencing projects on important plant species have been carried out, combining novel in silico technologies from genomic research with traditional breeding schemes to further enhance crop quality.

With the advent of NGS technology in 2005 [ 33 ], the number of sequenced plant genomes increased dramatically, to more than 100 species by 2014 according to CoGepedia, a platform that aims to record all plant genomes with published or in-progress sequences [ 12 ]. Over the years, these genomes have contributed valuable material for plant research in the modern molecular genomics era. On that foundation, the genetic and biological activities of many critical genes and pathways have been revealed [ 34 ]. For instance, plant species such as Arabidopsis [ 31 ], Brachypodium distachyon (grass) [ 35 ], Physcomitrella patens (moss) [ 36 ] and Setaria italica (millet) [ 37 , 38 ] can be used as scientific models for genomic studies of drought tolerance [ 39 ]. Others, such as Oryza sativa (rice) [ 40 , 41 ], Populus trichocarpa (poplar) [ 42 ], Zea mays (maize) [ 43 ], Glycine max (soybean) [ 44 ], Solanum lycopersicum (tomato) [ 45 ], and Pinus taeda (loblolly pine) [ 46 ], can serve as both crops and functional models [ 34 ].

Non-model and non-crop plant genomes can also tell a story about genome construction and flowering plant evolution [ 34 ]. For example, the Utricularia gibba (bladderwort) and Genlisea aurea (corkscrew) genomes provide significant insight into genome size variation [ 47 , 48 ]. Furthermore, the Spirodela polyrhiza (greater duckweed) genome is similar in size to that of Arabidopsis but functions normally with 28% fewer genes [ 49 ]. In another case, the genomes of Selaginella moellendorffii (spikemoss) and Amborella trichopoda bridge the evolution of vascular plants and angiosperms, respectively, revealing the trajectory of plant-specific gene families and the radiation of flowering plants, and thus shedding more light on flowering plant evolution [ 34 ].

The gene knowledge drawn from genomics can be used to recognize, classify, exploit and tag individual alleles, as well as to develop and manipulate molecular markers that track desired alleles in breeding programs [ 50 ]. For these reasons, many genome sequencing projects have been carried out in horticultural crops, such as the Tomato genome sequencing project ( www.sgn.cornell.edu/about/tomato ) [ 45 ], the Potato genome sequencing consortium ( www.potatogenome.net ) [ 51 ], the Papaya genome sequencing project ( www.asgpb.mhpcc.hawaii.edu/papaya/ ) [ 52 ], the Grape genome sequencing project ( www.vitaceae.org ) [ 53 ] and the Floral genome sequencing project ( www.fgp.bio.psu.edu/ ) [ 54 ]; hopefully many more will become publicly available for scientific use in the near future. These projects combined traditional methods with advanced sequencing technologies to ensure high-quality sequences at an efficient cost [ 55 ]. These whole-genome sequencing projects may therefore have a significant impact on global food security and bio-energy advancement by providing invaluable resources for comparative and functional genomic studies [ 55 ]. If current research keeps moving forward, applying genomic resources to horticultural plant species may have a noticeable impact on global human well-being.

The availability of complete genome sequences, together with the explosion of sequence data, has created an urgent need for well-catalogued and annotated DNA sequence databases. The largest and best known of these are GenBank, EMBL and the DNA Data Bank of Japan [ 32 ]. These databases are recognized worldwide as the standard public collections of annotated DNA sequences and contain millions of plant DNA sequences. Taking NCBI as an example, by 2015 the NCBI Genome database had grown to a total of 5,132,285 plant accession entries according to RefSeq Growth Statistics ( http://www.ncbi.nlm.nih.gov/refseq/statistics/ ). In 2004 there were only 88,972 entries; the average growth rate is therefore approximately 458,483 entries per year over the eleven years from 2004 to 2015, or more than 38,000 new sequences per month.
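The growth figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two entry counts and the two years given in the text; the variable names are illustrative. Note that 2004 to 2015 spans eleven yearly intervals, which is the divisor that reproduces the quoted yearly rate.

```python
# Entry counts quoted in the text (RefSeq Growth Statistics).
entries_2004 = 88_972
entries_2015 = 5_132_285

years = 2015 - 2004                   # eleven yearly intervals
added = entries_2015 - entries_2004   # entries added over the whole period
per_year = added / years              # average yearly growth
per_month = per_year / 12             # average monthly growth

print(f"entries added:     {added:,}")
print(f"average per year:  {per_year:,.0f}")
print(f"average per month: {per_month:,.0f}")
```

This is a simple average; it says nothing about how the growth was distributed across the period.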

Other public databases that provide additional information on plant genomes include Phytozome [ 56 ], PlantGDB [ 57 ], EnsemblPlants, ChloroplastDB [ 58 ], KEGG [ 59 ], the Genomes On-Line Database (GOLD) [ 12 ] and the CoGepedia wiki (Table 1 ). In addition to these general sequence data banks, databases that focus on specific plant species have recently become available. Examples of species-specific sequence databases are The Arabidopsis Information Resource (TAIR) [ 60 ], The Salk Institute Genomics Analysis Laboratory (SIGnAL), The RIKEN Arabidopsis Genome Encyclopedia (RARGE) [ 61 ], The Rice Genome Annotation Project (RGAP) [ 62 ], The Rice Annotation Project (RAP-DB) [ 63 ], The Solanaceae (SOL) Genomics Network (SGN) [ 64 ], Gramene [ 65 ], GrainGenes [ 66 ], SoyBase [ 67 ], MaizeGDB [ 68 ], CyanoBase [ 69 ], the Genome Database for Rosaceae (GDR) [ 70 ], Brassica Genome Gateway and Cucurbit Genomics Database (Table 1 ) [ 71 ]. These databases and their associated web portals commonly incorporate a set of analytical, visualization and query tools for studying the genomic sequences they hold, such as BLAST for identifying sequence similarity in large datasets.

Table 1. List of plant genomic databases.

4. PLANT COMPARATIVE GENOMIC ANALYSIS

Once whole genomes have been sequenced, defining and describing the gene and non-coding content of these sequences is an important step [ 72 ]. For that reason, plant comparative genomic analysis has arisen as a new field of modern biotechnology: its main purpose is to predict the functions of many unknown genes by studying significant differences and similarities among species. These genes, however, must appear in the available datasets of orthologs that evolved from the same ancestor [ 73 ]. New tools and strategies to manage and analyze these massive datasets are therefore urgently needed. Recent approaches in bioinformatics and systematic biology have begun to meet those demands but still face further challenges.

4.1. Tools and Databases for Plant Comparative Genomic Analysis

Using the comparative genomic approach, more and more genes in plant species have been annotated. For instance, several known stress-responsive transcription factors (TFs) in Arabidopsis and rice were used to correctly predict stress-responsive TFs in many other plant species, such as soybean, maize, sorghum, barley, and wheat [ 74 - 76 ]. Comparisons need not be limited to plants: comparative genomics between plants and distantly related prokaryotes can also suggest functionally associated genes. For example, the function of the NiaP protein family in plants was determined from the known role of those proteins in bacteria [ 77 ]. Similar strategies for identifying functional genes across different plants also help researchers annotate genes in newly sequenced plant species [ 78 ].
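To make the idea of transferring annotation between species concrete, the sketch below uses reciprocal best hits, a widely used heuristic for proposing ortholog pairs. This is an assumption for illustration: the text does not state which orthology method the cited studies used, and the gene identifiers, similarity scores and function labels are all made up.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Keep pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


def transfer_annotations(pairs, known_functions):
    """Copy a known function from the model gene to its putative ortholog."""
    return {b: known_functions[a] for a, b in pairs if a in known_functions}


# Hypothetical similarity scores between a model species and a crop.
model_vs_crop = {
    ("modelTF1", "cropG1"): 310.0,
    ("modelTF1", "cropG2"): 120.0,
    ("modelTF2", "cropG2"): 95.0,
}
crop_vs_model = {
    ("cropG1", "modelTF1"): 305.0,
    ("cropG2", "modelTF1"): 118.0,
    ("cropG2", "modelTF2"): 90.0,
}

pairs = reciprocal_best_hits(model_vs_crop, crop_vs_model)
predicted = transfer_annotations(pairs, {"modelTF1": "stress-responsive TF"})
print(pairs, predicted)
```

In this toy data, cropG2 is not paired with modelTF2 because its own best hit is modelTF1, so the relationship is not reciprocal. A prediction made this way is a hypothesis that still needs experimental confirmation.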

In addition, comparative genomics can discover missing biosynthetic genes by co-expression analysis [ 79 ]. This method assumes that an unknown gene that is co-expressed with several genes from a metabolic pathway is likely to function in that pathway [ 80 , 81 ]. GolmTranscriptome DB [ 82 ] and ATTED-II [ 83 ] are two popular tools for this type of analysis in plants. One example is the discovery of the trans-prenyldiphosphate synthase responsible for making the solanesyl moiety of ubiquinone-9: the Arabidopsis gene At2g34630 was identified as an alternative candidate by co-expression and under-expression analysis in Arabidopsis and by functional complementation in yeast [ 84 ].
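The reasoning behind co-expression analysis can be sketched as follows: score each candidate gene by its average Pearson correlation with known pathway genes across a set of samples, then rank the candidates. This is a minimal illustration, not the method implemented in GolmTranscriptome DB or ATTED-II; the gene names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def rank_candidates(expression, pathway_genes, candidates):
    """Rank candidates by mean correlation with the pathway genes."""
    scores = {}
    for cand in candidates:
        rs = [pearson(expression[cand], expression[g]) for g in pathway_genes]
        scores[cand] = sum(rs) / len(rs)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy expression matrix: one profile per gene over five samples.
expression = {
    "pathway_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pathway_B": [2.1, 3.9, 6.2, 8.0, 9.8],
    "cand_1": [0.9, 2.2, 2.9, 4.1, 5.2],
    "cand_2": [5.0, 1.0, 4.0, 2.0, 3.0],
}
ranking = rank_candidates(expression, ["pathway_A", "pathway_B"], ["cand_1", "cand_2"])
print(ranking)
```

A high rank only suggests a candidate; as in the At2g34630 example, functional confirmation is still required.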

Besides analysis tools and strategies, powerful computational resources are essential for storing and managing massive genomic data. Many online platforms have been developed and published for comparative genomic studies among different plant species. The platforms described below are among the most representative and widely used.

Phytozome. One of the largest comparative databases for plant species ( http://phytozome.jgi.doe.gov/pz/portal.html ). It contains plant genomes, gene family data, and evolutionary history information. Initially it covered only 25 sequenced and annotated plant genomes; this number has since grown to more than 50 species. Phytozome also provides tools for comparative analysis at the level of sequence, gene structure, gene family, and genome organization. With these tools and a comprehensive web portal, Phytozome makes intensive plant research accessible to scientists worldwide [ 56 ].

PLAZA . Known as the most comprehensive online platform for plant comparative genomics, PLAZA integrates functional and structural annotation of all currently published crop plant genomes ( http://plaza.psb.ugent.be/ ). Together with this large dataset, PLAZA provides many interactive tools for studying gene function and gene and genome evolution. Its pre-computed datasets cover intraspecies dot plots, whole-genome multiple sequence alignments, homologous gene families, phylogenetic trees, and genomic colinearity between species [ 85 ].

GreenPhylDB . A publicly accessible web resource that belongs to the South Green Bioinformatics Platform ( http://southgreen.cirad.fr/ ). GreenPhylDB is designed for comparative and functional genomics in plants. At its current release, version 4, the database contains 37 full genomes of members of the plant kingdom. Its catalogue of gene families is derived from the gene predictions of these genomes and covers a broad taxonomy of green plants. Its web interfaces have been continually developed to improve navigation through the information related to each gene or gene family, such as gene composition, protein domains, publications, orthologous gene predictions, and external links. In the latest version it is possible to browse the full Gene Ontology, which supports gene discovery [ 86 ].

PlantsDB. This is one of the most commonly used plant database resources for integrative and comparative plant genome research ( http://mips.helmholtz-muenchen.de/plant/genomes.jsp ). PlantsDB comprises database instances for tomato, Medicago , Arabidopsis , Brachypodium , Sorghum , maize, rice, barley and wheat. The platform stores and provides individual plant genomes. It is also equipped with up-to-date bioinformatics tools to visualize synteny, transfer data from model systems to crops and explore similarities and peculiarities of different plant species. Further important resources developed in PlantsDB are repeat catalogs and classification systems for all plant species [ 87 ].

4.2. Remaining Challenges

The amount of plant genomic data is increasing rapidly. Thousands of Gb of plant sequences are deposited in NCBI and other public databases every month. However, a reference genome sequence with the basic annotation provided by current comparative genomic databases is only a foundation. It still needs to be integrated with specific biological data, such as plant epigenetic marks and gene expression under varying environmental conditions, developmental stages and tissue types, in order to obtain more detailed genome maps [ 34 ].

Moreover, since plant genomes are constantly being sequenced and re-sequenced, keeping databases up to date is a growing problem. Updates should propagate to all comparative genomic databases, not just to the individual genome database concerned. This technical problem requires an effort to synchronize data updates among different plant genomic platforms. Developing a strong community network of plant researchers might be one solution [ 88 ].

Several databases have been developed and published for comparing plant genomes and tentatively identifying orthologs (Table 1 ). With its powerful application in gene prediction, comparative genomics has recently played an important role in building the functional annotation infrastructure on which future plant biotechnology researchers will rely.

5. PHYLOGENOMICS AND EVOLUTIONARY ANALYSIS IN PLANT

Phylogenomics is a form of molecular phylogenetic analysis that uses genomic datasets to predict gene function and to explore the evolutionary relationships among species. This definition dates from the late 1990s, when a hypothesis for predicting protein function through evolutionary analysis of a gene and its homologs was published [ 89 ]. Phylogenomics has also been described as the new era of phylogenetic analysis made possible as more complete genomes are sequenced [ 90 ]. Plant phylogenomics has an advantage over that of other taxa: hundreds of low-copy-number nuclear genes can be identified, which facilitates the study of molecular systematics and evolutionary biology [ 91 ]. Current NGS approaches also give plant phylogenomics useful information about plant genome diversity, such as the nature and frequency of genome duplication across a diversity of plant lineages [ 92 - 94 ].

Phylogenomic research has two important goals. The first is to discover evolutionary patterns among plant species using nuclear genomic information. The second is to derive new hypotheses about the unknown functions of plant genes associated with major divergence events in plant evolution [ 95 ]. Genomic data offer advantages for evolutionary studies over morphological data, which can easily mislead, and fossil data, which are usually fragmentary. Phylogenomics also uses sets of orthologs from genomic sequences, placed in a phylogenetic context, to generate hypotheses about genes and biological processes [ 96 ]. The main difference between functional phylogenomics and both classical phylogenetic analysis and current functional genomic methods is that phylogenomics incorporates a phylogenetic context when mining genomic information for orthologs or candidate genes of functional importance [ 97 ]. However, constructing the tree of life (the phylogeny of all organisms), with evolutionary relationships inferred by phylogenomics as the most advanced method, remains a matter of debate. Some studies have repeatedly revalidated the positions of certain plant species in biological taxonomy [ 98 - 100 ] to obtain the most accurate tree possible. Drawing a scientifically sound topology is therefore still problematic because of limitations such as conflicts among methodologies and character sets [ 101 ] and systematic errors that arise from merely adding more sequences [ 102 ].

As shown above, the main problem of phylogenomics is how to handle large-scale genomic data properly so as to avoid systematically misleading (biased) conclusions. Statistical confidence ( P value ), which is normally used for such phylogenetic questions, has been reported to be unreliable. The authors suggested focusing instead on the magnitude of differences (effect sizes) and on biological relevance to obtain trustworthy results [ 103 ]. Another solution is to improve existing phylogenetic algorithms so that phylogenomic relationships can be inferred with minimal technical bias and greater computational efficiency [ 104 ].

New methods and tools have been developed to gradually overcome these limitations of plant phylogenomics. For instance, de novo assembly of short-read RNA-seq data dramatically improves gene coverage by producing non-redundant and non-chimeric transcripts that are optimized for downstream phylogenomic analysis [ 105 ]. Another protocol, Hyb-Seq, combines target enrichment of low-copy nuclear exons and flanking regions with genome skimming of high-copy repeats and organelle genomes to efficiently produce genome-scale datasets for plant phylogenomics [ 106 ]. More recently, ExaML (Exascale Maximum Likelihood), known as a new code for large-scale phylogenetic analyses on the Intel MIC (Many Integrated Core) hardware platform, has been updated to version 3. This program represents progress in developing better phylogenetic analysis algorithms: it is now possible to analyze datasets with 10-20 genes and up to 55,000 taxa [ 107 ]. However, even though it was released only a few months ago, ExaML still has limits, since it can only run on supercomputers with a Linux/Mac system. New plant phylogenomic tools similar to ExaML, with high performance and easy operation on any computational system, are clearly needed.
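A dataset of several genes across many taxa is commonly arranged as a concatenated alignment (a supermatrix) before tree inference. The sketch below illustrates that preparation step only, under the assumption that each gene has already been aligned; it is not taken from ExaML or any cited tool, and the taxon names and sequences are hypothetical. Taxa missing from a gene are padded with gap characters.

```python
def concatenate_alignments(gene_alignments, missing="-"):
    """Join per-gene alignments (taxon -> aligned sequence) into a supermatrix.

    Returns the concatenated sequences and the 1-based, inclusive column
    range occupied by each gene.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for index, aln in enumerate(gene_alignments):
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {index}: sequences must share one length")
        length = lengths.pop()
        for taxon in taxa:
            # Pad with gaps when a taxon has no sequence for this gene.
            parts[taxon].append(aln.get(taxon, missing * length))
        partitions.append((start, start + length - 1))
        start += length
    return {taxon: "".join(seqs) for taxon, seqs in parts.items()}, partitions


# Two toy gene alignments with partly overlapping taxon sets.
gene1 = {"taxonA": "ACGT", "taxonB": "ACGA"}
gene2 = {"taxonA": "GG", "taxonC": "GC"}
supermatrix, partitions = concatenate_alignments([gene1, gene2])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
print(partitions)
```

Keeping the partition ranges allows a separate substitution model to be fitted per gene later, if the inference program supports it.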

6. GENOME-WIDE ASSOCIATION STUDIES IN PLANT

Understanding phenotypic variation, particularly in the agronomically important traits used as plant breeding resources, has been the main focus of plant genetic studies. In classical crop breeding, biparental cross-mapping is still a major method for the genetic dissection of traits, although it maps QTLs only at low resolution (typically several megabases) [ 108 ]. To overcome that disadvantage, and thanks to the advent of NGS, GWAS is currently a favored tool for exploring allelic variation across a broader range of phenotypic diversity and for mapping QTLs at higher resolution. Many research projects have used GWAS to investigate the association between genetic variation and valuable plant traits. GWAS has been successfully applied to Arabidopsis thaliana , a typical model plant, in which more than 1,300 distinct accessions have been genotyped for 250,000 SNPs [ 109 ] and 107 phenotypes have been studied [ 110 ]. Building on this foundation, GWAS has been conducted on numerous other traits of interest in Arabidopsis , such as glucosinolate levels [ 111 ], shade avoidance [ 112 ], heavy metal [ 113 ], salt tolerance [ 114 ] and flowering time [ 115 ]. Besides Arabidopsis , rice, one of the most important crop species in the world, has also been the focus of intense efforts to map the ancestral genetic variation that underlies agronomic traits such as heading date, grain size, and starch quality [ 116 ]. A few rice genes with large effects on yield, morphology, stress tolerance, and nutritional quality have also been identified [ 117 ]. GWAS has been widely used to dissect complex traits in other major crops as well, e.g., maize and soybean [ 118 - 122 ].

GWAS is undeniably a powerful approach for identifying trait-associated loci underlying phenotypic diversity in plant species, as well as allelic variation in candidate genes that control quantitative and complex traits [ 123 , 124 ]. However, to accelerate genetic mapping and gene discovery in plants, GWAS requires not only massive DNA variation data from NGS but also a high-throughput phenotyping facility capable of capturing specific traits in detail, which improves GWAS results and yields more informative gene identification [ 125 ]. This is a challenging but promising road for future plant genomic mapping research, and efforts are under way to produce high-quality phenotyping data [ 126 - 129 ]. Computational tools to support GWAS are a further concern. A GWAS tool must perform well on three main factors: computing speed, memory requirements, and statistical power [ 130 ]. Several bioinformatics approaches have so far been introduced as GWAS acceleration tools. Some examples follow:

Heap. Heap is a SNP detection tool for NGS data designed with GWAS and genomic prediction (GP) in mind. Heap detects a larger number of variants by taking advantage of information on whether the samples are inbred (homozygosity assumption) or not. For data portability to GWAS/GP, Heap outputs variant information in VCF, Beagle and PED/MAP format files that are compatible with existing GWAS/GP tools [ 131 ].

GnpIS-Asso . GnpIS-Asso is a generic database for managing and exploiting plant genetic association studies. It provides tools that allow plant scientists and breeders to retrieve association values between traits and markers obtained in several association studies. Results can be viewed graphically with dedicated, dynamically generated plots (QQ plot, Manhattan plot) and exported to files for further analysis with external tools. After the best markers associated with a trait of interest have been selected, a dedicated tool jumps to the genome to show where those markers are located on the chromosomes and which genes, markers or other features of interest are nearby. This database is currently used to manage GWAS for two species, tomato and maize [ 132 ].

BioGPU . As a high-performance computing tool for GWAS, BioGPU effectively controls false positives caused by population structure and unequal relatedness among individuals and improves statistical power compared to mixed linear model methods. The BioGPU method also requires much less computing time. BioGPU was developed with parallel computational capacity to increase computing speed, so that computing time decreases linearly with the number of central processing units. To address the memory bottleneck, BioGPU lets users directly control memory usage when big data are analyzed on computers with limited memory; that is, users can trade computing time for lower memory usage. With these features, BioGPU makes analyses of large and complex datasets feasible without supercomputers [ 130 ].

BHIT . The Bayesian high-order interaction toolkit (BHIT) builds a Bayesian model on both continuous and discrete data that is capable of detecting high-order interactions among SNPs related to case-control or quantitative phenotypes. On both simulated data and soybean seed composition studies of oil and protein content, BHIT effectively detected the high-order interactions associated with phenotypes and outperformed a number of other currently available tools. BHIT has also been applied to Soybean 50K SNP array analysis with diverse computational strategies, detecting a series of multiple-order SNP interactions associated with oil and protein phenotypes. BHIT is freely available at http://digbio.missouri.edu/BHIT/ for academic users [ 133 ].
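The QQ plot mentioned for GnpIS-Asso compares the observed association p-values with those expected if no marker were associated with the trait. The sketch below computes the plotted coordinates; it is an illustrative calculation, not code from GnpIS-Asso, and the p-values are hypothetical. The expected value for the p-value of rank i among n tests is taken as i / (n + 1), one common convention.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, most significant first."""
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    n = len(p_values)
    ranked = sorted(p_values)  # the smallest p-value gets rank 1
    return [
        (-math.log10((rank + 1) / (n + 1)), -math.log10(p))
        for rank, p in enumerate(ranked)
    ]


# Hypothetical p-values from an association scan.
points = qq_points([0.5, 0.01, 0.2])
for expected, observed in points:
    print(f"expected {expected:.3f}  observed {observed:.3f}")
```

Points that rise well above the diagonal across the whole range usually indicate confounding such as population structure, whereas a departure only in the upper tail is what a few true associations look like.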

While QTL analysis of even a small dataset used to be time-consuming, recent bioinformatics approaches make it possible to run a GWAS with a simple marker scan of a few hundred thousand SNPs on a PC or web-based software within a few minutes [ 123 ]. However, future GWAS tools still need to improve in speed and memory capacity to keep pace with rapidly growing plant genomic data. Moreover, to ensure the accuracy of GWAS results, rigorous statistical testing is essential; mixed models are used to account for genetic background as a confounding factor [ 134 , 135 ]. One example is an online GWAS tool for Arabidopsis , which was developed using the R and Python programming languages [ 136 ]. This web-based server comprises common accessions with their genotyping information and several statistical options, and integrates correlation analysis among published traits [ 136 ].
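A simple marker scan of the kind described above can be sketched as one linear regression of the phenotype on each SNP in turn. The example below is a deliberately naive illustration with hypothetical data: it codes genotypes as allele counts (0, 1, 2) and does not correct for population structure or relatedness, which is exactly what the mixed models mentioned in the text are meant to handle.

```python
import math


def single_marker_scan(genotypes, phenotype):
    """Regress the phenotype on each SNP; return (snp, slope, t) by |t| descending."""
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    results = []
    for snp, calls in genotypes.items():
        mean_x = sum(calls) / n
        sxx = sum((x - mean_x) ** 2 for x in calls)
        if sxx == 0:
            continue  # monomorphic marker: no variation to test
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(calls, phenotype))
        slope = sxy / sxx
        rss = max(syy - slope * sxy, 0.0)  # residual sum of squares
        std_err = math.sqrt(rss / (n - 2) / sxx)
        t_stat = slope / std_err if std_err > 0 else math.copysign(math.inf, slope)
        results.append((snp, slope, t_stat))
    return sorted(results, key=lambda row: abs(row[2]), reverse=True)


# Toy data: six individuals, three markers (allele counts), one trait.
genotypes = {
    "snp_a": [0, 0, 1, 1, 2, 2],
    "snp_b": [0, 1, 2, 0, 1, 2],
    "snp_mono": [0, 0, 0, 0, 0, 0],
}
phenotype = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
scan = single_marker_scan(genotypes, phenotype)
for snp, slope, t_stat in scan:
    print(f"{snp}: slope={slope:.2f} t={t_stat:.2f}")
```

Converting each t statistic to a p-value requires the t distribution with n - 2 degrees of freedom, which is omitted here to keep the sketch dependency-free.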

In combination with high-resolution phenotyping technologies, GWAS is a novel strategy for research on plant genetics, genomics, gene characterization and breeding [ 137 ]. Nevertheless, GWAS has another limitation: most studies fail to detect epistatic and gene-environment interactions [ 138 ]. Because living organisms express their phenotypes as the result of not one but several factors, including epistatic effects and interactions with the environment, it is important to estimate gene-gene and gene-environment interactions for better breeding systems [ 139 , 140 ]. Focusing on a single main SNP that correlates with a specific phenotype, as in a standard GWAS output, may miss key genetic variants with particular environmental responses in the context of complex traits [ 141 ]. Here again, bioinformatics offers a solution. One currently available method runs the generalized multifactor dimensionality reduction (GMDR) algorithm on a computing system with graphics processing units (GPUs) to screen potential candidate variants and then uses a mixed linear model to detect epistatic and gene-environment interactions [ 142 ]. This GWAS strategy successfully identified four significant SNPs associated with additive, epistatic, and gene-environment interaction effects in rice [ 138 ]. A similar GWAS method using epistatic association mapping (EAM) also detected three epistatic QTLs in soybean [ 143 ]. These methods are only the groundwork; future bioinformatics tools will need more powerful statistical methodology and must overcome the heavy computational burden of current approaches [ 144 - 146 ].
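The screening idea behind multifactor dimensionality reduction can be sketched for SNP pairs in a case-control setting. The example below implements plain MDR, the method that GMDR generalizes, without cross-validation, covariates or GPU acceleration, so it is not the procedure of [ 142 ]; the SNP names and data are hypothetical. Each two-locus genotype cell is labelled high-risk when its case:control ratio exceeds the overall ratio, and the pair is scored by the balanced accuracy of that labelling.

```python
from collections import Counter
from itertools import combinations


def mdr_pair(snp_a, snp_b, status):
    """Score one SNP pair; status is 1 for cases and 0 for controls."""
    cases, controls = Counter(), Counter()
    for a, b, s in zip(snp_a, snp_b, status):
        (cases if s == 1 else controls)[(a, b)] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    ratio = n_cases / n_controls
    # A cell is high-risk when its case:control ratio exceeds the overall ratio.
    high_risk = {cell for cell in set(cases) | set(controls)
                 if cases[cell] > ratio * controls[cell]}
    sensitivity = sum(cases[cell] for cell in high_risk) / n_cases
    specificity = sum(count for cell, count in controls.items()
                      if cell not in high_risk) / n_controls
    return 0.5 * (sensitivity + specificity), high_risk


def best_pair(genotypes, status):
    """Return (balanced accuracy, snp, snp) for the best-scoring pair."""
    return max((mdr_pair(genotypes[a], genotypes[b], status)[0], a, b)
               for a, b in combinations(sorted(genotypes), 2))


# Toy data: the phenotype depends on snp1 and snp2 jointly (XOR pattern),
# so neither SNP shows a marginal effect on its own.
genotypes = {
    "snp1": [0, 0, 1, 1, 0, 0, 1, 1],
    "snp2": [0, 1, 0, 1, 0, 1, 0, 1],
    "snp3": [0, 0, 0, 0, 1, 1, 1, 1],
}
status = [0, 1, 1, 0, 0, 1, 1, 0]
top = best_pair(genotypes, status)
print(top)
```

Because every pair is evaluated, the cost grows quadratically with the number of SNPs, which is the computational burden that motivates hardware acceleration for this kind of search.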

7. BIOINFORMATIC ADVANCES BEYOND PLANT GENOMIC RESEARCH

The world is now in the post-genomic era, as DNA sequencing technology continues to reach unprecedented scale and throughput. In particular, “genomics” by itself is only a small part of a larger picture named “omics”. With the development of modern technology, several new omics layers have emerged to deepen our knowledge of plant molecular systems [ 147 ]. The most recently added layers include interactomics, epigenomics, hormonomics, and metabolomics. While NGS enables whole-genome sequencing and re-sequencing for various genomic analyses, such as those discussed in this paper, NGS-based applications are also established for other layers: RNA sequencing (RNA-seq) for transcriptome and non-coding RNAome analysis, quantitative detection of epigenomic dynamics, and ChIP-seq analysis of DNA–protein interactions [ 148 ]. In addition, omics-based approaches to studying regulatory networks have been published, such as interactome analysis of networks formed by protein–protein interactions [ 149 ], hormonome analysis of phytohormone-mediated cellular signaling [ 150 ], and metabolome analysis of metabolic systems [ 151 ]. These rapidly growing omics databases broaden the scale of genomic resources. Bioinformatics has therefore become more essential than ever for managing and analyzing every aspect of omics-based research.

8. CONCLUSIONS

Recent advances in bioinformatics applications for plant genomes bring not only huge potential for large-scale genomic research among plant species but also many technical challenges. NGS technologies and platforms will make plant genetic data abundant in the next few years. As these genomic data become accessible, developing effective tools for their management and analysis becomes increasingly important. Indeed, more and more genome databases for plant species are being established and combined with different analysis methods. Comparative genomic analysis gives specific insight into functional genes within and among plant species. Phylogenomic results provide more accurate evidence for evolutionary studies and for hypothesized gene functions in plants. GWAS, now widely used in plant research, successfully pinpoints loci and allelic variation related to valuable traits. On the other hand, one of the main challenges facing plant genomic researchers is the high level of knowledge and skill in bioinformatics and computer science required to manage and analyze the growing volume of large-scale plant genomic data. Moreover, as high-density genotype information is rapidly being exploited, high-throughput phenotyping is urgently needed so that plant genomic analyses can deliver high-resolution results.

In brief, the recent wealth of plant genomic resources, along with advances in bioinformatics, has enabled plant researchers to achieve a fundamental and systematic understanding of economically important plants and plant processes, which is critical for advancing crop improvement. Despite these exciting achievements, there remains a critical need for effective tools and methodologies to advance plant biotechnology, to tackle questions that are difficult to answer with current approaches, and to facilitate the translation of this newly discovered knowledge into improved plant productivity.

ACKNOWLEDGEMENTS

This work was supported by Vietnam National University, HCM under grant number C2014-28-07 to N.P.T. This work was also funded by the Asian Office of Aerospace Research & Development of The United States under grant number FA2386-15-1-419 to L.L.

CONFLICT OF INTEREST

The author(s) confirm that this article content has no conflict of interest.

IMAGES

  1. (PDF) Role of Bioinformatics in Clinical Trials: An overview

    bioinformatics related research papers

  2. (PDF) Journal of Proteomics & Bioinformatics -Open Access Performance

    bioinformatics related research papers

  3. (PDF) Data mining in bioinformatics: Selected papers from BIOKDD

    bioinformatics related research papers

  4. Bioinformatics Final Research Paper

    bioinformatics related research papers

  5. (PDF) BIOINFORMATICS ADVANCES IN GENOMICS

    bioinformatics related research papers

  6. (PDF) Emerging role of bioinformatics tools and software in evolution

    bioinformatics related research papers
