---
title: "Data mining"
chunk: 4/4
source: "https://en.wikipedia.org/wiki/Data_mining"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T03:56:28.867669+00:00"
instance: "kb-cron"
---

=== Situation in the United States ===

Under US copyright law, and in particular its fair use provision, content mining is legal in the United States. The same holds in other fair use countries such as Israel, Taiwan and South Korea. Because content mining is transformative, that is, it does not supplant the original work, it is viewed as lawful under fair use. For example, as part of the Google Book settlement, the presiding judge ruled that Google's digitization of in-copyright books was lawful, in part because of the transformative uses the project displayed, one of which was text and data mining.

== Software ==

=== Free open-source data mining software and applications ===

The following applications are available under free/open-source licenses. Public access to application source code is also available.

Carrot2: Text and search results clustering framework.
Chemicalize.org: A chemical structure miner and web search engine.
ELKI: A university research project with advanced cluster analysis and outlier detection methods, written in the Java language.
GATE: A natural language processing and language engineering tool.
KNIME: The Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.
Massive Online Analysis (MOA): A tool for real-time big data stream mining with concept drift, written in the Java programming language.
MEPX: A cross-platform tool for regression and classification problems, based on a Genetic Programming variant.
mlpack: A collection of ready-to-use machine learning algorithms written in the C++ language.
NLTK (Natural Language Toolkit): A suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python language.
OpenNN: Open neural networks library.
Orange: A component-based data mining and machine learning software suite written in the Python language.
PSPP: Data mining and statistics software under the GNU Project, similar to SPSS.
R: A programming language and software environment for statistical computing, data mining, and graphics. It is part of the GNU Project.
scikit-learn: An open-source machine learning library for the Python programming language.
Torch: An open-source deep learning library and scientific computing framework for the Lua programming language, with wide support for machine learning algorithms (development has mostly moved to the much more widely used Python-based PyTorch).
UIMA: The Unstructured Information Management Architecture, a component framework for analyzing unstructured content such as text, audio and video, originally developed by IBM.
Weka: A suite of machine learning software applications written in the Java programming language.

=== Proprietary data-mining software and applications ===

The following applications are available under proprietary licenses.

Angoss KnowledgeSTUDIO: Data mining tool.
LIONsolver: An integrated software application for data mining, business intelligence, and modeling that implements the Learning and Intelligent OptimizatioN (LION) approach.
PolyAnalyst: Data and text mining software by Megaputer Intelligence.
Microsoft Analysis Services: Data mining software provided by Microsoft.
NetOwl: A suite of multilingual text and entity analytics products that enable data mining.
Oracle Data Mining: Data mining software by Oracle Corporation.
PSeven: A platform for automation of engineering simulation and analysis, multidisciplinary optimization and data mining, provided by DATADVANCE.
Qlucore Omics Explorer: Data mining software.
RapidMiner: An environment for machine learning and data mining experiments.
SAS Enterprise Miner: Data mining software provided by the SAS Institute.
SPSS Modeler: Data mining software provided by IBM.
STATISTICA Data Miner: Data mining software provided by StatSoft.
Tanagra: Visualization-oriented data mining software, also used for teaching.
Vertica: Data mining software provided by Hewlett-Packard.
Google Cloud Platform: Automated custom ML models managed by Google.
Amazon SageMaker: A managed service provided by Amazon for creating and productionising custom ML models.

== See also ==

Other resources

International Journal of Data Warehousing and Mining