Available Online at www.ijpret.com 1185
INTERNATIONAL JOURNAL OF PURE AND
APPLIED RESEARCH IN ENGINEERING AND
TECHNOLOGY
A PATH FOR HORIZING YOUR INNOVATIVE WORK
THE STUDY ON DATA WAREHOUSE AND MINING
ASHVINI S. GORTE, PROF. NIKHIL S. BAND, DR. HEMANT R. DESHMUKH
1. Student of Master of Engineering (CSE), IBSS College of Engineering and Technology, Amravati, India.
2. Assistant Professor, Department of CSE, IBSS College of Engineering and Technology, Amravati, India.
3. Head of the Department of CSE, IBSS College of Engineering and Technology, Amravati, India.
Accepted Date: 05/03/2015; Published Date: 01/05/2015
Abstract: Data warehousing is a booming industry with many interesting research problems. Research on the data warehouse has so far concentrated on only a few aspects. This paper discusses data warehouse design and usage: the various approaches to the design and usage process and the steps involved. A data warehouse can be built using a top-down approach, a bottom-up approach, or a combination of both. The paper also provides an overview of data warehousing, data mining, and the architecture of data warehousing. The data warehouse supports on-line analytical processing (OLAP), the functional and performance requirements of which are quite different from those of the on-line transaction processing (OLTP) applications traditionally supported by operational databases. Data warehouses provide OLAP tools for the interactive analysis of multidimensional data of varied granularities, which facilitates effective data mining.
Keywords: Data Warehousing, Data Warehousing Design, Data Mining, Process.
Corresponding Author: MS. ASHVINI S. GORTE
I. INTRODUCTION
A data warehouse is a “subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making” (Inmon, W.H., 1992). Typically, the data warehouse is maintained separately from the organization’s operational databases. Data warehousing is a collection of decision support technologies aimed at enabling the knowledge worker (executive, manager, or analyst) to make better and faster decisions. It serves as a physical implementation of a decision support data model and stores the information an enterprise needs to make strategic decisions. The best subset contains the minimum number of dimensions that contribute most to accuracy. Data mining, popularly known as Knowledge Discovery in Databases (KDD), is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data in databases.
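The four properties in the definition above (subject-oriented, integrated, time-varying, non-volatile) can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the `Warehouse` class, the `crm` and `billing` sources, and the field names are assumptions, not a design taken from this paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded row is never modified (non-volatile)
class CustomerSnapshot:
    customer_id: str  # subject-oriented: rows are organized around the customer
    city: str
    source: str       # which operational system supplied the row
    as_of: date       # time-varying: every row carries its validity date


class Warehouse:
    """Hypothetical append-only store that integrates rows from several sources."""

    def __init__(self):
        self._rows = []

    def load(self, source, records, as_of):
        # Integrated: source-specific formatting is mapped to one common format.
        for rec in records:
            self._rows.append(
                CustomerSnapshot(
                    customer_id=str(rec["id"]).strip().upper(),
                    city=rec["city"].strip().title(),
                    source=source,
                    as_of=as_of,
                )
            )

    def history(self, customer_id):
        # Older snapshots stay available, so changes over time can be analyzed.
        rows = [r for r in self._rows if r.customer_id == customer_id]
        return sorted(rows, key=lambda r: r.as_of)


wh = Warehouse()
wh.load("crm", [{"id": "c1", "city": "amravati"}], date(2015, 1, 1))
wh.load("billing", [{"id": "C1 ", "city": "PUNE"}], date(2015, 3, 1))
cities = [r.city for r in wh.history("C1")]
```

Both loads end up under the same customer key, and the earlier snapshot is kept rather than overwritten, which is the behaviour that separates a warehouse from an operational database in this sketch.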
II. DATA WAREHOUSING:
Definition of data warehousing
Data warehousing technologies have been successfully deployed in many industries: manufacturing (for order shipment and customer support), retail (for user profiling and inventory management), financial services (for claims analysis, risk analysis, credit card analysis, and fraud detection), transportation (for fleet management), telecommunications (for call analysis and fraud detection), utilities (for power usage analysis), and healthcare (for outcomes analysis). This paper presents a roadmap of data warehousing technologies, focusing on the special requirements that data warehouses place on database management systems (DBMSs).
III. ARCHITECTURE AND END-TO-END PROCESS
Figure shows a typical data warehousing architecture.
storage servers, database and OLAP servers, and tools. Integrate the servers, storage, and client tools. Design the warehouse schema and views. Roll out the warehouse and applications.
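The step "design the warehouse schema and views" can be sketched with Python's built-in `sqlite3` module. The star-shaped layout (one fact table referencing dimension tables) is one common choice and is assumed here; the paper does not prescribe a particular schema, and all table, column, and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    -- A view that pre-joins facts with dimensions for analysts.
    CREATE VIEW sales_by_category_year AS
        SELECT p.category, d.year, SUM(f.amount) AS total
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d ON d.date_id = f.date_id
        GROUP BY p.category, d.year;
    """
)

# Sample rows, invented for the example.
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "books"), (2, "toys")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2014, 12), (2, 2015, 1)])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 1, 10.0), (1, 2, 5.0), (2, 2, 7.5), (1, 2, 2.5)],
)

rows = conn.execute(
    "SELECT category, year, total FROM sales_by_category_year ORDER BY category, year"
).fetchall()
```

Analysts query the view rather than the base tables, so the join logic is defined once in the warehouse instead of being repeated in every client tool.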
IV. DATA WAREHOUSE DESIGN PROCESS:
Fig: data warehouse design process
This section discusses various approaches to the data warehouse design process and the steps involved. A data warehouse can be built using a top-down approach, a bottom-up approach, or a combination of both. The top-down approach starts with overall design and planning. It is useful in cases where the technology is mature and well known, and where the business problems that must be solved are clear and well understood. The bottom-up approach starts with experiments and prototypes. This is useful in the early stages of business modeling and technology development. It also allows an organization to move forward at considerably less expense and to evaluate the technological benefits before making significant commitments. In the combined approach, an organization can exploit the planned and strategic nature of the top-down approach while retaining the rapid implementation and opportunistic application of the bottom-up approach.
V. DATA MINING
Data mining is the extraction or “mining” of knowledge from a large amount of data or from a data warehouse. To perform this extraction, data mining combines artificial intelligence, statistical analysis, and database management systems in an attempt to pull knowledge from stored data. Data mining is the process of applying intelligent methods to extract data patterns. This is done using front-end tools. The spreadsheet is still the most compelling front-end application for Online Analytical Processing (OLAP). The challenge in supporting a query environment for OLAP can be crudely summarized as that of supporting spreadsheet operations effectively over large multi-gigabyte databases. To distinguish information extraction through data mining from traditional database querying, the following main observations can be made.
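The spreadsheet-style aggregation that OLAP front ends perform, mentioned earlier in this section, can be illustrated with a rough sketch that rolls a small set of sales facts up to different granularities. The `roll_up` helper, the dimension names, and the sample figures are all hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Each fact has three dimensions (region, product, quarter) and one measure.
facts = [
    {"region": "east", "product": "books", "quarter": "Q1", "sales": 100},
    {"region": "east", "product": "toys", "quarter": "Q1", "sales": 40},
    {"region": "west", "product": "books", "quarter": "Q1", "sales": 70},
    {"region": "west", "product": "books", "quarter": "Q2", "sales": 30},
]


def roll_up(rows, dimensions, measure="sales"):
    """Sum the measure over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


by_region_product = roll_up(facts, ["region", "product"])  # finer granularity
by_region = roll_up(facts, ["region"])                     # coarser granularity
grand_total = roll_up(facts, [])                           # a single cell
```

Dropping a dimension from the list moves the result to a coarser granularity, which is the same operation a spreadsheet pivot performs; the difficulty noted above lies in doing this over multi-gigabyte data rather than four rows.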
VI. DATA MINING LIFE CYCLE:
The life cycle of a data mining project consists of six phases. The sequence of the phases is not rigid; moving back and forth between different phases is always required, depending on the outcome of each phase. The main phases are:
1. Business Understanding:
This phase focuses on understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.
2. Data Understanding:
This phase starts with an initial data collection and proceeds with activities to get familiar with the data, to identify data quality problems, to discover first insights into the data, or to detect interesting subsets to form hypotheses for hidden information.
3. Data Preparation:
In this phase, all the different data sets are collected and the various activities needed to construct the final data set are carried out on the initial raw data.
4. Modeling:
In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values.
5. Evaluation:
In this phase, the model is thoroughly evaluated, and the steps executed to construct it are reviewed, to be certain it properly achieves the business objectives. At the end of this phase, a decision on the use of the data mining results should be reached.
6. Deployment:
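The phases listed above, and the point that their sequence is not rigid, can be sketched as a small loop. This is a minimal illustration under stated assumptions, not a method from this paper: the churn data, the single-threshold "model", and the `run_project` helper are all hypothetical.

```python
# Hypothetical labeled records: (monthly_spend, churned)
raw = [(5, 1), (8, 1), (12, 1), (None, 0), (20, 0),
       (25, 0), (9, 1), (30, 0), (18, 0), (7, 1)]


def prepare(records):
    """Data preparation: drop records with missing values."""
    return [(x, y) for x, y in records if x is not None]


def accuracy(data, threshold):
    """Share of records where 'spend below threshold' matches the churn label."""
    hits = sum(1 for x, y in data if (1 if x < threshold else 0) == y)
    return hits / len(data)


def calibrate(train, candidates):
    """Modeling: pick the candidate threshold with the best training accuracy."""
    return max(candidates, key=lambda t: accuracy(train, t))


def run_project(records, target_accuracy, max_rounds=3):
    data = prepare(records)
    train, test = data[::2], data[1::2]  # simple alternating split
    candidates = [6]                     # deliberately coarse first attempt
    for round_no in range(1, max_rounds + 1):
        threshold = calibrate(train, candidates)
        score = accuracy(test, threshold)  # evaluation on held-out records
        if score >= target_accuracy:       # business objective met: deploy
            return {"threshold": threshold, "accuracy": score, "rounds": round_no}
        # Objective not met: move back to modeling with a wider search.
        candidates = list(range(5, 31, 5))
    return None


result = run_project(raw, target_accuracy=0.9)
```

In this run the first model misses the target on the held-out records, so the loop returns to the modeling phase before the evaluation succeeds in the second round. The business objective is represented by `target_accuracy`, which is an assumption of the sketch.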
VII. CONCLUSION:
project needs to make planned decisions. Its architecture is therefore said to be constructed by integrating data from multiple sources to support logical reporting and decision making. In this paper we briefly reviewed the various data mining applications. This review should help researchers focus on the various issues of data mining. In future work, we will review the various classification algorithms and the significance of the evolutionary computing (genetic programming) approach in designing efficient classification algorithms for data mining.