kb/Biostatistics-1.md at d2951df04285b6d61e1b3d24a6877d45e7bd8aaf

turtle89431 2e50ba1868 Scrape wikipedia-science: 15617 new, 4054 updated, 20200 total (kb-cron)

2026-05-05 07:02:36 -07:00

6.8 KiB


title	chunk	source	category	tags	date_saved	instance
Biostatistics	2/6	https://en.wikipedia.org/wiki/Biostatistics	reference	science, encyclopedia	2026-05-05T14:01:53.013815+00:00	kb-cron

=== Hypothesis definition ===

Once the aim of the study is defined, the possible answers to the research question can be proposed, transforming this question into a hypothesis. The main proposal is called the null hypothesis (H0) and is usually based on established knowledge about the topic or an obvious occurrence of the phenomenon, supported by a thorough literature review. It can be regarded as the standard expected answer for the data under the situation being tested. In general, H0 assumes no association between treatments. The alternative hypothesis, on the other hand, is the negation of H0: it assumes some degree of association between the treatment and the outcome. Both hypotheses follow from the research question and its expected and unexpected answers.

As an example, consider groups of similar animals (mice, for example) under two different diet systems. The research question would be: what is the best diet? In this case, H0 would be that there is no difference between the two diets in mouse metabolism (H0: μ1 = μ2), and the alternative hypothesis would be that the diets have different effects on the animals' metabolism (H1: μ1 ≠ μ2).

The hypothesis is defined by the researcher according to their interests in answering the main question. Moreover, there can be more than one alternative hypothesis. It can assume not only differences across observed parameters, but also the direction of those differences (i.e. higher or lower).
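The two-diet comparison (H0: μ1 = μ2 against H1: μ1 ≠ μ2) can be illustrated with a small sketch. The source does not prescribe a particular test; Welch's two-sample t statistic is used here as one common choice, and the weight-gain values and the `welch_t` helper are hypothetical, invented only for illustration.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Squared standard errors: sample variance (n - 1 denominator) over n.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical weight gains (g) for mice under two diets; not real data.
diet_1 = [4.1, 3.8, 4.4, 4.0, 3.9, 4.3]
diet_2 = [4.9, 5.4, 4.6, 5.0, 5.5, 4.8]

t_stat, df = welch_t(diet_1, diet_2)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A large absolute value of t relative to the t distribution with `df` degrees of freedom would count as evidence against H0. For a one-sided alternative (for example H1: μ1 < μ2), only the corresponding tail of the distribution would be considered.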

=== Sampling ===

Usually, a study aims to understand the effect of a phenomenon on a population. In biology, a population is defined as all the individuals of a given species in a specific area at a given time. In biostatistics, this concept is extended to a variety of collections that can be studied: a population is not only a set of individuals, but can also be the total of one specific component of their organisms, such as the whole genome or all the sperm cells for animals, or the total leaf area for a plant.

It is usually not possible to take measurements from all the elements of a population. Because of that, the sampling process is very important for statistical inference. Sampling is defined as randomly drawing a representative part of the entire population in order to make subsequent inferences about the population. The sample should therefore capture as much of the variability across the population as possible. The sample size is determined by several factors, ranging from the scope of the research to the resources available. In clinical research, the trial type (non-inferiority, equivalence, or superiority) is key in determining sample size.
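Random selection of a part of the population can be sketched as a simple random sample without replacement. This is only one sampling scheme among several; the population of mouse identifiers, the sample size, and the `simple_random_sample` helper are assumptions made for illustration.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct elements, each subset being equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, sample_size)


# Hypothetical population of 200 animals, identified by label.
population = [f"mouse_{i:03d}" for i in range(1, 201)]

sample = simple_random_sample(population, 20, seed=42)
print(sample[:5])
```

Recording the seed alongside the data lets the same sample be redrawn later, which helps when the selection step has to be audited.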

=== Experimental design ===

Experimental designs uphold the basic principles of experimental statistics. There are three basic experimental designs for randomly allocating treatments to all plots of the experiment: the completely randomized design, the randomized block design, and factorial designs. Treatments can be arranged in many ways within the experiment. In agriculture, the correct experimental design is the root of a good study, and the arrangement of treatments within the study is essential because the environment largely affects the plots (plants, livestock, microorganisms). These main arrangements can be found in the literature under the names "lattices", "incomplete blocks", "split plot", "augmented blocks", and many others. All of the designs may include control plots, determined by the researcher, to provide an error estimate during inference.

In clinical studies, the samples are usually smaller than in other biological studies, and in most cases the environmental effect can be controlled or measured. It is common to use randomized controlled clinical trials, whose results are usually compared with those of observational study designs such as case–control or cohort studies.
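The difference between the first two designs can be sketched in a few lines. In a completely randomized design, treatments are shuffled over all plots at once; in a randomized block design, each block receives every treatment once, in an independently shuffled order. The treatment labels, replicate counts, and function names below are hypothetical.

```python
import random


def completely_randomized(treatments, replicates, seed=None):
    """Assign each treatment to `replicates` plots in one overall shuffle."""
    rng = random.Random(seed)
    plots = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(plots)
    return plots


def randomized_block(treatments, n_blocks, seed=None):
    """Give every block all treatments once, shuffled separately per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)
        layout.append(block)
    return layout


# Hypothetical treatments; a control could be one of the labels.
treatments = ["A", "B", "C", "D"]

crd = completely_randomized(treatments, replicates=3, seed=1)
rcbd = randomized_block(treatments, n_blocks=3, seed=1)

print("Completely randomized:", crd)
for number, block in enumerate(rcbd, start=1):
    print(f"Block {number}:", block)
```

Blocking is typically chosen when plots within a block are more alike than plots in different blocks, so that environmental variation between blocks does not get confounded with treatment effects.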

=== Data collection ===

Data collection methods must be considered in research planning, because they strongly influence the sample size and experimental design. Data collection varies according to the type of data. For qualitative data, collection can be done with structured questionnaires or by observation, considering the presence or intensity of disease and using score criteria to categorize levels of occurrence. For quantitative data, collection is done by measuring numerical information using instruments.

In agriculture and biology studies, yield data and its components can be obtained by metric measurements. However, pest and disease injuries in plants are assessed by observation, using score scales for levels of damage. In genetic studies especially, modern methods for data collection in the field and laboratory should be considered, such as high-throughput platforms for phenotyping and genotyping. These tools allow larger experiments and make it possible to evaluate many plots in less time than a purely human-based method of data collection. Finally, all collected data of interest must be stored in an organized data frame for further analysis.

== Analysis and data interpretation ==

=== Descriptive tools ===

Data can be represented through tables or graphical representations, such as line charts, bar charts, histograms, and scatter plots. Also, measures of central tendency and variability can be very useful to give an overview of the data. Some examples follow:

==== Frequency tables ====

One type of table is the frequency table, which consists of data arranged in rows and columns, where the frequency is the number of occurrences or repetitions of data. Frequency can be:

Absolute: represents the number of times that a given value appears;

$$N = f_{1} + f_{2} + f_{3} + \dots + f_{n}$$

Relative: obtained by the division of the absolute frequency by the total number;

$$n_{i} = \frac{f_{i}}{N}$$

In the next example, we have the number of genes in ten operons of the same organism.

Genes = {2,3,3,4,5,3,3,3,3,4}
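The absolute and relative frequencies for the operon data above can be tabulated with a short sketch. The values come from the text; the variable names and the printed layout are choices made here for illustration.

```python
from collections import Counter

# Number of genes in ten operons of the same organism (from the text).
genes = [2, 3, 3, 4, 5, 3, 3, 3, 3, 4]

# Absolute frequency f_i: how many times each value appears.
absolute = Counter(genes)

# Total number of observations, N = f_1 + f_2 + ... + f_n.
total = sum(absolute.values())

# Relative frequency n_i = f_i / N.
relative = {value: count / total for value, count in absolute.items()}

print("Genes\tAbsolute\tRelative")
for value in sorted(absolute):
    print(f"{value}\t{absolute[value]}\t{relative[value]:.1f}")
```

For this data set the absolute frequencies are 1, 6, 2 and 1 for operons with 2, 3, 4 and 5 genes, so N = 10 and the relative frequencies are 0.1, 0.6, 0.2 and 0.1.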

==== Line graph ====

Line graphs represent the variation of a value over another metric, such as time. In general, values are represented on the vertical axis, while the time variation is represented on the horizontal axis.
