| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Computational genomics | 2/2 | https://en.wikipedia.org/wiki/Computational_genomics | reference | science, encyclopedia | 2026-05-05T14:02:18.727709+00:00 | kb-cron |
== Biosynthetic gene clusters ==
Bioinformatic tools have been developed to predict biosynthetic gene clusters (BGCs) in microbiome samples from metagenomic data, and to determine their abundance and expression. Because metagenomic datasets are large, filtering and clustering are important parts of these tools. These steps can use dimensionality-reduction techniques such as MinHash, and clustering algorithms such as k-medoids and affinity propagation. Several metrics and similarity measures have also been developed to compare BGCs.

Genome mining for BGCs has become an integral part of natural product discovery. The more than 200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). BiG-SLiCE (Biosynthetic Genes Super-Linear Clustering Engine) is a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. Kautsar et al. (2021) demonstrated the utility of such analyses by using BiG-SLiCE to reconstruct a global map of secondary metabolic diversity across taxonomy and identify uncharted biosynthetic potential. This opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global, searchable, interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry.
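As an illustration of the MinHash reduction named in this section, the following is a minimal sketch in Python, not the implementation used by any particular tool. It assumes each BGC has already been reduced to a set of string features (for example protein domain labels); the function names, the feature labels, and the choice of BLAKE2b as the hash family are all assumptions made for the example.

```python
import hashlib


def minhash_signature(features, num_hashes=128):
    """Compress a set of string features into a fixed-length MinHash signature.

    For each of `num_hashes` salted hash functions, keep the smallest hash
    value observed over the feature set.
    """
    features = set(features)
    if not features:
        raise ValueError("cannot build a signature from an empty feature set")
    signature = []
    for seed in range(num_hashes):
        # A different salt per seed gives an (assumed) independent hash function.
        salt = seed.to_bytes(8, "little")
        smallest = min(
            int.from_bytes(
                hashlib.blake2b(f.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for f in features
        )
        signature.append(smallest)
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    set_a, set_b = set(set_a), set(set_b)
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # Hypothetical feature sets standing in for two BGCs (labels are placeholders).
    bgc_1 = {"domain_A", "domain_B", "domain_C", "domain_D"}
    bgc_2 = {"domain_B", "domain_C", "domain_D", "domain_E"}
    sig_1 = minhash_signature(bgc_1)
    sig_2 = minhash_signature(bgc_2)
    print("exact:", exact_jaccard(bgc_1, bgc_2))
    print("estimated:", estimate_jaccard(sig_1, sig_2))
```

Because every signature has the same fixed length regardless of how many features the original set contains, signatures are cheap to store and compare. A clustering algorithm such as k-medoids could then operate on the estimated similarities rather than on the full feature sets.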
== Compression algorithms ==
== See also ==
* Bioinformatics
* Computational biology
* Earth BioGenome Project
* Genomics
* Microarray
* BLAST
* Computational epigenetics
* Nvidia Parabricks, a suite of free software for genome analysis developed by Nvidia
* List of Metagenomics software
* List of genomic re-sequencing data compression tools
== External links ==
* Harvard Extension School Biophysics 101, Genomics and Computational Biology: http://www.courses.fas.harvard.edu/~bphys101/info/syllabus.html
* University of Bristol course in Computational Genomics: http://www.computational-genomics.net/