| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Bioinformatics | 4/6 | https://en.wikipedia.org/wiki/Bioinformatics | reference | science, encyclopedia | 2026-05-05T14:00:39.573858+00:00 | kb-cron |

=== Analysis of regulation ===

Gene regulation is a complex process in which a signal (for example, an extracellular signal such as a hormone) eventually leads to an increase or decrease in the activity of one or more proteins. Bioinformatics techniques have been applied to explore various steps in this process.

For example, gene expression can be regulated by nearby elements in the genome. Promoter analysis involves the identification and study of sequence motifs in the DNA surrounding the protein-coding region of a gene. These motifs influence the extent to which that region is transcribed into mRNA. Enhancer elements far away from the promoter can also regulate gene expression through three-dimensional looping interactions, which can be determined by bioinformatic analysis of chromosome conformation capture experiments.

Expression data can also be used to infer gene regulation: one might compare microarray data from a wide variety of states of an organism to form hypotheses about the genes involved in each state. In a single-celled organism, one might compare stages of the cell cycle along with various stress conditions (heat shock, starvation, etc.). Clustering algorithms can then be applied to the expression data to determine which genes are co-expressed, and the upstream regions (promoters) of co-expressed genes can be searched for over-represented regulatory elements. Examples of clustering algorithms applied to gene clustering include k-means clustering, self-organizing maps (SOMs), hierarchical clustering, and consensus clustering methods.
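As an illustration of clustering co-expressed genes, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python. It assumes each gene is represented by a list of expression values, one per condition, and uses squared Euclidean distance; the function name, parameters, and the four toy profiles are hypothetical, and real analyses often normalise profiles or use correlation-based distances instead.

```python
import random


def kmeans(profiles, k, n_iter=100, seed=0):
    """Group expression profiles (one list of floats per gene) into k clusters.

    Lloyd's algorithm with squared Euclidean distance. Returns one cluster
    label (0..k-1) per profile.
    """
    rng = random.Random(seed)
    # Start from the profiles of k randomly chosen genes as centroids.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member genes.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical log-expression values for four genes across three conditions.
profiles = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [5.0, 0.5, 0.1], [4.9, 0.4, 0.2]]
print(kmeans(profiles, k=2))
```

With these toy profiles, genes 0 and 1 end up in one cluster and genes 2 and 3 in the other; the specific label numbers depend on the random starting centroids.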

== Analysis of cellular organization ==

Several approaches have been developed to analyze the location of organelles, genes, proteins, and other components within cells. A Gene Ontology category, cellular component, has been devised to capture subcellular localization in many biological databases.

=== Microscopy and image analysis ===

Microscopy images make it possible to locate organelles and molecules within cells, including those that may be the source of abnormalities in disease.

=== Protein localization ===

Finding the location of a protein helps to predict what it does, a task known as protein function prediction. For instance, a protein found in the nucleus may be involved in gene regulation or splicing, whereas a protein found in mitochondria may be involved in respiration or other metabolic processes. Well-developed resources for protein subcellular localization prediction are available, including protein subcellular location databases and prediction tools.

=== Nuclear organization of chromatin ===

Data from high-throughput chromosome conformation capture experiments, such as Hi-C and ChIA-PET, can provide information on the three-dimensional structure and nuclear organization of chromatin. Bioinformatic challenges in this field include partitioning the genome into domains, such as topologically associating domains (TADs), that are organised together in three-dimensional space.
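One common way to approach TAD partitioning is an insulation-style score: for each position along the genome, sum the Hi-C contacts that cross it within a fixed window, and treat local minima as candidate domain boundaries. The sketch below is a simplified, unnormalised variant of that idea (published insulation-score methods typically also normalise and smooth the signal). It assumes a dense, symmetric contact matrix given as a list of lists; the function names, window size, and toy matrix are hypothetical.

```python
def insulation_scores(matrix, window):
    """Contacts crossing the gap between bin i and bin i + 1, for each bin i.

    Sums matrix[a][b] for the `window` bins ending at i (a) against the
    `window` bins starting at i + 1 (b). Bins whose window would run off
    the edge of the matrix get None.
    """
    n = len(matrix)
    scores = []
    for i in range(n):
        lo, hi = i - window + 1, i + window
        if lo < 0 or hi >= n:
            scores.append(None)
            continue
        scores.append(sum(matrix[a][b]
                          for a in range(lo, i + 1)
                          for b in range(i + 1, hi + 1)))
    return scores


def call_boundaries(scores):
    """Bins whose score is a strict local minimum (candidate boundaries)."""
    return [
        i for i in range(1, len(scores) - 1)
        if None not in (scores[i - 1], scores[i], scores[i + 1])
        and scores[i] < scores[i - 1] and scores[i] < scores[i + 1]
    ]


# Hypothetical 8-bin matrix with two domains (bins 0-3 and 4-7):
# 10 contacts within a domain, 1 contact between domains.
m = [[10.0 if (a < 4) == (b < 4) else 1.0 for b in range(8)] for a in range(8)]
print(call_boundaries(insulation_scores(m, window=2)))  # [3]
```

The reported boundary, 3, means the gap between bin 3 and bin 4, where the two toy domains meet.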

== Structural bioinformatics ==

Finding the structure of proteins is an important application of bioinformatics. The Critical Assessment of Protein Structure Prediction (CASP) is an open competition in which research groups worldwide submit predicted models for proteins whose experimentally determined structures have not yet been made public; the models are then evaluated against those structures.
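CASP assesses submitted models with metrics such as GDT_TS, which averages the percentage of residues that lie within 1, 2, 4 and 8 Å of their position in the experimental structure. The sketch below is a simplified version under a stated assumption: the model and reference C-alpha coordinates are already superimposed, whereas the full GDT procedure searches over many superpositions. The function name and coordinates are hypothetical.

```python
import math


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS score (0-100) for one fixed superposition.

    `model` and `reference` are equal-length lists of (x, y, z) C-alpha
    coordinates, in angstroms, for the same residues, assumed already
    superimposed. The full GDT procedure searches over superpositions to
    maximise each fraction; this sketch does not.
    """
    if not model or len(model) != len(reference):
        raise ValueError("need equal-length, non-empty coordinate lists")
    dists = [math.dist(p, q) for p, q in zip(model, reference)]
    # Fraction of residues within each distance cutoff of the reference.
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue reference; the model displaces the last residue by 3 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = reference[:3] + [(11.4, 3.0, 0.0)]
print(gdt_ts(model, reference))  # 87.5
```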

=== Amino acid sequence ===

The linear amino acid sequence of a protein is called its primary structure. The primary structure can easily be determined from the sequence of codons in the gene that codes for it. In most proteins, the primary structure uniquely determines the three-dimensional structure of the protein in its native environment. An exception is the misfolded prion protein involved in bovine spongiform encephalopathy. The three-dimensional structure is linked to the function of the protein. Additional structural information includes the secondary, tertiary and quaternary structure. A viable general solution to the prediction of the function of a protein remains an open problem; most efforts so far have been directed towards heuristics that work most of the time.
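Reading a primary structure off a gene amounts to a codon-table lookup. This minimal sketch assumes a spliced coding sequence read in frame from its first base under the standard genetic code; real genes may contain introns, and some organisms and organelles use alternative genetic codes. The names `CODON_TABLE` and `translate` are illustrative.

```python
import itertools

# Standard genetic code. Codons are enumerated in the order TTT, TTC, TTA,
# TTG, TCT, ..., GGG (third base varying fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame DNA coding sequence into one-letter amino acids.

    Translation stops at the first stop codon; a trailing partial codon is
    ignored. Characters other than A, C, G, T raise KeyError.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA
```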

=== Homology ===

In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function. In structural bioinformatics, homology is used to determine which parts of a protein are important in structure formation and interaction with other proteins. Homology modeling predicts the structure of an unknown protein from the structures of existing homologous proteins. One example is human hemoglobin and the hemoglobin of legumes (leghemoglobin), which are distant relatives in the same protein superfamily. Both serve the same purpose of transporting oxygen in the organism. Although the two proteins have very different amino acid sequences, their structures are very similar, reflecting their shared function and common ancestor.

Other techniques for predicting protein structure include protein threading and de novo (from scratch) physics-based modeling. Another aspect of structural bioinformatics is the use of protein structures in virtual screening models such as quantitative structure-activity relationship (QSAR) models and proteochemometric (PCM) models. A protein's crystal structure can also be used in simulations, for example in ligand-binding studies and in silico mutagenesis studies. AlphaFold, deep-learning software developed by Google DeepMind and published in 2021, greatly outperformed other structure prediction methods in the CASP14 assessment, and predicted structures for hundreds of millions of proteins have been released in the AlphaFold Protein Structure Database.
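Homology between sequences is usually assessed from an alignment. Below is a minimal sketch of Needleman-Wunsch global alignment scoring with a linear gap penalty. The match, mismatch, and gap values are hypothetical simplifications; protein alignment tools typically use substitution matrices such as BLOSUM62 and affine gap penalties. A score well above that of unrelated sequences suggests homology, but distant relatives such as hemoglobin and leghemoglobin can align poorly despite having similar structures.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align the two residues, or open a gap in either sequence.
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))  # 2 (three matches, one gap)
```

Only the score is computed here; recovering the alignment itself would require a traceback through the `score` table.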

== Network and systems biology ==

Network analysis seeks to understand the relationships within biological networks such as metabolic or protein–protein interaction networks. Although biological networks can be constructed from a single type of molecule or entity (such as genes), network biology often attempts to integrate many different data types, such as proteins, small molecules, gene expression data, and others, which are all connected physically, functionally, or both. Systems biology involves the use of computer simulations of cellular subsystems (such as the networks of metabolites and enzymes that comprise metabolism, signal transduction pathways and gene regulatory networks) to both analyze and visualize the complex connections of these cellular processes. Artificial life or virtual evolution attempts to understand evolutionary processes via the computer simulation of simple (artificial) life forms.
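Basic analysis of a protein–protein interaction network can be sketched with an adjacency-set graph: a protein's degree (its number of interaction partners) highlights highly connected "hub" proteins, and breadth-first search groups proteins into connected components. The protein names below are hypothetical and the sketch uses only the Python standard library; dedicated graph libraries such as NetworkX provide these operations and many more.

```python
from collections import defaultdict, deque


def build_network(edges):
    """Undirected adjacency sets from (protein, protein) interaction pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def hubs(graph, top=1):
    """The `top` nodes with the highest degree (ties broken by name)."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:top]


def connected_components(graph):
    """Sets of nodes linked by any path, found by breadth-first search."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical interactions: P1 binds three partners; P5 and P6 form a separate pair.
g = build_network([("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P5", "P6")])
print(hubs(g))                      # ['P1']
print(len(connected_components(g)))  # 2
```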

=== Molecular interaction networks ===