| title | chunk | source | category | tags | date_saved | instance |
| --- | --- | --- | --- | --- | --- | --- |
| Data science | 2/2 | https://en.wikipedia.org/wiki/Data_science | reference | science, encyclopedia | 2026-05-05T09:53:53.540719+00:00 | kb-cron |

data collection and integration; data cleaning and preparation (handling missing values, outliers, encoding, normalisation); feature engineering and selection; visualisation and descriptive statistics; fitting and evaluating statistical or machine-learning models; communicating results and ensuring reproducibility (e.g., reports, notebooks, and dashboards). Lifecycle frameworks such as CRISP-DM describe these steps from business understanding through deployment and monitoring.

Data science involves working with large datasets that often require advanced computational and statistical methods to analyze. Data scientists often work with unstructured data such as text or images and use machine learning algorithms to build predictive models. Common techniques include statistical analysis, data preprocessing, and supervised learning.

Recent studies indicate that AI is moving towards data-centric approaches, which focus on the quality of datasets rather than only on improving AI models. These approaches aim to improve system performance by cleaning, refining, and labeling data (Bhatt et al., 2024). As AI systems grow larger, the data-centric view has become increasingly important.
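The data cleaning and preparation step listed above (missing values, outliers, encoding, normalisation) can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed method: the function names, the sample values, the choice of median imputation, and the 1.5 × IQR clipping rule are all assumptions made for the example.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is a common convention."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max_normalise(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 vectors, one column per sorted category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical raw columns: one missing age and one implausible age.
ages = [23, None, 31, 27, 250, 29]
colours = ["red", "blue", "red", "green", "blue", "red"]

clean_ages = min_max_normalise(clip_outliers(impute_median(ages)))
encoded_colours = one_hot(colours)
```

In practice these steps are usually carried out with a dataframe or machine-learning library; the order shown here (impute, clip, then rescale) is one reasonable choice rather than a fixed rule.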

== Cloud computing for data science ==

Cloud computing can offer access to large amounts of computational power and storage. In big data settings, where large volumes of information are continually generated and processed, cloud platforms can be used to handle complex and resource-intensive analytical tasks. Some distributed computing frameworks are designed specifically for big data workloads. These frameworks enable data scientists to process and analyze large datasets in parallel, which can reduce processing times.
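The idea of processing a dataset in parallel can be illustrated on a single machine: split the data into partitions, compute a partial result for each partition concurrently, then combine the partial results. The sketch below is an assumption-laden stand-in for what a distributed framework does across many machines. It uses a thread pool from the Python standard library, and the helper names and sample data are hypothetical. The function assumes non-empty input.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split data into at most n_chunks contiguous partitions."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(chunk):
    """Per-partition work: return (sum, count) so results can be combined."""
    return sum(chunk), len(chunk)


def parallel_mean(data, n_workers=4):
    """Compute the mean by combining per-partition sums and counts."""
    chunks = chunked(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical dataset: the integers 1 to 1000.
readings = list(range(1, 1001))
overall_mean = parallel_mean(readings)
```

Threads are used here only to keep the example self-contained. For CPU-bound work in Python, a process pool or a cluster framework would normally be needed to see an actual speed-up.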

== Ethical consideration in data science ==

Data science involves collecting, processing, and analyzing data, which often includes personal and sensitive information. Ethical concerns include potential privacy violations, bias perpetuation, and negative societal impacts.

Ethics education in data science has grown to encompass both technical principles and broader philosophical questions. Research indicates that data science ethics courses increasingly integrate human-centric topics, including fairness, accountability, and responsible decision-making, connecting them to enduring discussions in moral and political philosophy (Colando & Hardin, 2024). The goal of this approach is to help students understand how data-driven technologies affect society.

Machine learning models can amplify existing biases present in training data, leading to discriminatory or unfair outcomes.

Another growing area of data science is the push for better ways to cite data. Citing datasets makes it easier for other researchers to understand what data was used and to repeat studies (Lafia et al., 2023). These practices also give credit to the people who collect and manage data, which is becoming more important in modern research.
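The point that models can produce unfair outcomes for some groups can be made concrete with a simple check on model outputs. The sketch below computes the gap in positive-prediction rates between groups, a quantity often called the demographic parity difference. It is one of several possible fairness measures and is not drawn from the cited studies. The function names, predictions, and group labels are hypothetical.

```python
def selection_rate(predictions, groups, group):
    """Fraction of positive (1) predictions among members of one group."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_difference(predictions, groups):
    """Largest gap in selection rate between any two groups (0 means equal)."""
    rates = [selection_rate(predictions, groups, g) for g in set(groups)]
    return max(rates) - min(rates)


# Hypothetical binary predictions for eight people in two groups.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = demographic_parity_difference(predictions, groups)
```

A large gap does not by itself establish discrimination, but it can flag a model for closer review.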

== See also ==

* Python (programming language)
* R (programming language)
* Data engineering
* Big data
* Machine learning
* Artificial intelligence
* Bioinformatics
* Astroinformatics
* Topological data analysis
* List of data science journals
* List of data science software
* List of open-source data science software
* Data science notebook software

== References ==