--- title: "Bioinformatics" chunk: 5/6 source: "https://en.wikipedia.org/wiki/Bioinformatics" category: "reference" tags: "science, encyclopedia" date_saved: "2026-05-05T14:00:39.573858+00:00" instance: "kb-cron" ---

Tens of thousands of three-dimensional protein structures have been determined by X-ray crystallography and protein nuclear magnetic resonance spectroscopy (protein NMR), and a central question in structural bioinformatics is whether it is practical to predict possible protein–protein interactions based only on these 3D shapes, without performing protein–protein interaction experiments. A variety of methods have been developed to tackle the protein–protein docking problem, though much work remains to be done in this field. Other interactions encountered in the field include protein–ligand (including drug) and protein–peptide interactions. Molecular dynamics simulation of the movement of atoms about rotatable bonds is the fundamental principle behind computational algorithms, termed docking algorithms, for studying molecular interactions.

== Biodiversity informatics ==

Biodiversity informatics deals with the collection and analysis of biodiversity data, such as taxonomic databases or microbiome data. Examples of such analyses include phylogenetics, niche modelling, species richness mapping, DNA barcoding, and species identification tools. Macro-ecology, the study of how biodiversity is connected to ecology and to human impacts such as climate change, is also a growing area.

== Others ==

=== Literature analysis ===

The enormous volume of published literature makes it virtually impossible for individuals to read every paper, resulting in disjointed sub-fields of research. Literature analysis aims to employ computational and statistical linguistics to mine this growing library of text resources.
For example:

* Abbreviation recognition: identifying the long form and abbreviation of biological terms
* Named-entity recognition: recognizing biological terms such as gene names
* Protein–protein interaction: identifying from text which proteins interact with which other proteins

The area of research draws from statistics and computational linguistics.

=== High-throughput image analysis ===

Computational technologies are used to automate the processing, quantification and analysis of large amounts of high-information-content biomedical imagery. Modern image analysis systems can improve an observer's accuracy, objectivity, or speed. Image analysis is important for both diagnostics and research. Some examples are:

* high-throughput and high-fidelity quantification and sub-cellular localization (high-content screening, cytohistopathology, bioimage informatics)
* morphometrics
* clinical image analysis and visualization
* determining the real-time air-flow patterns in breathing lungs of living animals
* quantifying occlusion size in real-time imagery from the development of and recovery during arterial injury
* making behavioral observations from extended video recordings of laboratory animals
* infrared measurements for metabolic activity determination
* inferring clone overlaps in DNA mapping, e.g. the Sulston score

=== High-throughput single cell data analysis ===

Computational techniques are used to analyse high-throughput, low-measurement single cell data, such as that obtained from flow cytometry. These methods typically involve finding populations of cells that are relevant to a particular disease state or experimental condition.

=== Ontologies and data integration ===

Biological ontologies are directed acyclic graphs of controlled vocabularies. They create categories for biological concepts and descriptions so that these can be easily analyzed with computers. Once concepts are categorised in this way, holistic and integrated analyses can extract added value from the data.
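Because an ontology is a directed acyclic graph, a common operation is to propagate an annotation from a specific term to all of its more general ancestors, so that a gene annotated to a specific term is also counted under every broader term (the Gene Ontology calls this the "true path rule"). The following is a minimal sketch on a small hypothetical ontology; the term names, edges, gene names and the helper functions `topological_order`, `ancestors` and `propagate` are illustrative assumptions, not an excerpt of the real Gene Ontology or the API of any ontology library. Only `graphlib.TopologicalSorter` from the Python standard library is used for the acyclicity check.

```python
from collections import deque
from graphlib import TopologicalSorter

# Hypothetical toy ontology: each term maps to its direct parents ("is_a" edges).
# Illustrative only; this is NOT an excerpt of the real Gene Ontology.
PARENTS: dict[str, list[str]] = {
    "process": [],
    "metabolic process": ["process"],
    "catabolic process": ["metabolic process"],
    "lipid metabolic process": ["metabolic process"],
    "lipid catabolic process": ["lipid metabolic process", "catabolic process"],
}


def topological_order(parents: dict[str, list[str]]) -> list[str]:
    """Order terms so every parent precedes its children.

    graphlib raises CycleError if the graph contains a cycle, so this also
    checks the acyclic property that ontologies rely on.
    """
    return list(TopologicalSorter(parents).static_order())


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every term reachable from `term` by following parent edges (breadth-first)."""
    seen: set[str] = set()
    queue = deque(parents[term])
    while queue:
        current = queue.popleft()
        if current not in seen:  # shared ancestors are visited only once
            seen.add(current)
            queue.extend(parents[current])
    return seen


def propagate(
    annotations: dict[str, set[str]], parents: dict[str, list[str]]
) -> dict[str, set[str]]:
    """Map each term to the genes annotated to it or to any of its descendants."""
    genes_by_term: dict[str, set[str]] = {term: set() for term in parents}
    for gene, terms in annotations.items():
        for term in terms:
            # A gene annotated to a term is implicitly annotated to all its ancestors.
            for t in {term} | ancestors(term, parents):
                genes_by_term[t].add(gene)
    return genes_by_term


if __name__ == "__main__":
    # Hypothetical gene-to-term annotations.
    annotations = {"geneA": {"lipid catabolic process"}, "geneB": {"catabolic process"}}
    print(topological_order(PARENTS)[0])  # process
    print(sorted(ancestors("lipid catabolic process", PARENTS)))
    # ['catabolic process', 'lipid metabolic process', 'metabolic process', 'process']
    genes = propagate(annotations, PARENTS)
    print(sorted(genes["metabolic process"]))  # ['geneA', 'geneB']
    print(sorted(genes["lipid metabolic process"]))  # ['geneA']
```

The "seen" set matters because, unlike a tree, a DAG lets a term have several parents: in this toy graph "lipid catabolic process" reaches "metabolic process" by two paths, yet each gene is counted there once.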
The OBO Foundry was an effort to standardise certain ontologies. One of the most widespread is the Gene Ontology, which describes gene function. There are also ontologies that describe phenotypes.

== Databases ==

Databases are essential for bioinformatics research and applications. Databases exist for many different information types, including DNA and protein sequences, molecular structures, phenotypes and biodiversity. Databases can contain both empirical data (obtained directly from experiments) and predicted data (obtained from analysis of existing data). They may be specific to a particular organism, pathway or molecule of interest. Alternatively, they can incorporate data compiled from multiple other databases. Databases differ in format and access mechanism, and may be public or private. Some of the most commonly used databases are listed below:

* Used in biological sequence analysis: GenBank, UniProt
* Used in structure analysis: Protein Data Bank (PDB)
* Used in finding protein families and motifs: InterPro, Pfam
* Used for next-generation sequencing: Sequence Read Archive
* Used in network analysis: metabolic pathway databases (KEGG, BioCyc), interaction analysis databases, functional networks
* Used in design of synthetic genetic circuits: GenoCAD

== Software and tools ==

Software tools for bioinformatics include simple command-line tools, more complex graphical programs, and standalone web services. They are made by bioinformatics companies or by public institutions.

=== Open-source bioinformatics software ===

Many free and open-source software tools have existed since the 1980s, and their number has continued to grow. The combination of a continued need for new algorithms for the analysis of emerging types of biological readouts, the potential for innovative in silico experiments, and freely available open code bases has created opportunities for research groups to contribute to both bioinformatics and the range of open-source software available, regardless of their funding.
Open-source tools often act as incubators of ideas or as community-supported plug-ins in commercial applications. They may also provide de facto standards and shared object models that help with the challenge of bioinformation integration. Open-source bioinformatics software includes Bioconductor, BioPerl, Biopython, BioJava, BioJS, BioRuby, Bioclipse, EMBOSS, .NET Bio, Orange with its bioinformatics add-on, Apache Taverna, UGENE and GenoCAD. The non-profit Open Bioinformatics Foundation and the annual Bioinformatics Open Source Conference promote open-source bioinformatics software.

=== Web services in bioinformatics ===

SOAP- and REST-based interfaces have been developed to allow client computers to use algorithms, data and computing resources from servers in other parts of the world. The main advantage is that end users do not have to deal with software and database maintenance overheads. Basic bioinformatics services are classified by the EBI into three categories: SSS (Sequence Search Services), MSA (Multiple Sequence Alignment), and BSA (Biological Sequence Analysis). The availability of these service-oriented bioinformatics resources demonstrates the applicability of web-based bioinformatics solutions. Such resources range from a collection of standalone tools with a common data format under a single web-based interface to integrative, distributed and extensible bioinformatics workflow management systems.

==== Bioinformatics workflow management systems ====

A bioinformatics workflow management system is a specialized form of workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, in a bioinformatics application. Such systems are designed to