IT Community - Software Programming, Web Development and Technical Support

Info on Data mining

This is a discussion on Info on Data mining within the Database Support forums, part of the Web Development category; Data mining Data mining is sorting through data to identify patterns and establish relationships.It is an analytic process designed ...


Go Back   IT Community - Software Programming, Web Development and Technical Support > Web Development > Database Support

Register FAQ Members List Calendar Mark Forums Read
  #1 (permalink)  
Old 05-22-2007, 04:00 AM
vadivelanvaidyanathan vadivelanvaidyanathan is offline
D-Web Genius
 
Join Date: Feb 2007
Posts: 803
vadivelanvaidyanathan is on a distinguished road
Default Info on Data mining

Data mining

Data mining is sorting through data to identify patterns and establish relationships.It is an analytic process designed to explore data (usually large amounts of data - typically business or market related) in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. The ultimate goal of data mining is prediction - and predictive data mining is the most common type of data mining and one that has the most direct business applications.


Data mining parameters include:

Association - looking for patterns where one event is connected to another event
Sequence or path analysis - looking for patterns where one event leads to another later event
Classification - looking for new patterns (May result in a change in the way the data is organized but that's ok)
Clustering - finding and visually documenting groups of facts not previously known
Forecasting - discovering patterns in data that can lead to reasonable predictions about the future (This area of data mining is known as predictive analytics.)

Use of the Data mining:
Data mining techniques are used in a many research areas, including mathematics, cybernetics, and genetics. Web mining, a type of data mining used in customer relationship management (CRM), takes advantage of the huge amount of information gathered by a Web site to look for patterns in user behavior. A data miner is a program that collects such information, often without the user's knowledge, as spyware.

-V.Vadivelan
Attached Images
File Type: jpg stadatm1.JPG (20.7 KB, 0 views)
Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Sponsored Links
  #2 (permalink)  
Old 05-22-2007, 04:02 AM
vadivelanvaidyanathan vadivelanvaidyanathan is offline
D-Web Genius
 
Join Date: Feb 2007
Posts: 803
vadivelanvaidyanathan is on a distinguished road
Default Re: Info on Data mining

The process of data mining consists of three stages: (1) the initial exploration, (2) model building or pattern identification with validation/verification, and (3) deployment (i.e., the application of the model to new data in order to generate predictions).

Stage 1: Exploration.
This stage usually starts with data preparation which may involve cleaning data, data transformations, selecting subsets of records and - in case of data sets with large numbers of variables ("fields") - performing some preliminary feature selection operations to bring the number of variables to a manageable range (depending on the statistical methods which are being considered). Then, depending on the nature of the analytic problem, this first stage of the process of data mining may involve anywhere between a simple choice of straightforward predictors for a regression model, to elaborate exploratory analyses using a wide variety of graphical and statistical methods (see Exploratory Data Analysis (EDA)) in order to identify the most relevant variables and determine the complexity and/or the general nature of models that can be taken into account in the next stage.

Stage 2: Model building and validation.
This stage involves considering various models and choosing the best one based on their predictive performance (i.e., explaining the variability in question and producing stable results across samples). This may sound like a simple operation, but in fact, it sometimes involves a very elaborate process. There are a variety of techniques developed to achieve that goal - many of which are based on so-called "competitive evaluation of models," that is, applying different models to the same data set and then comparing their performance to choose the best. These techniques - which are often considered the core of predictive data mining - include: Bagging (Voting, Averaging), Boosting, Stacking (Stacked Generalizations), and Meta-Learning.

Stage 3: Deployment.
That final stage involves using the model selected as best in the previous stage and applying it to new data in order to generate predictions or estimates of the expected outcome

-V.Vadivelan
Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 05-22-2007, 04:15 AM
vadivelanvaidyanathan vadivelanvaidyanathan is offline
D-Web Genius
 
Join Date: Feb 2007
Posts: 803
vadivelanvaidyanathan is on a distinguished road
Default Re: Info on Data mining

Models for Data Mining

In the business environment, complex data mining projects may require the coordinate efforts of various experts, stakeholders, or departments throughout an entire organization. In the data mining literature, various "general frameworks" have been proposed to serve as blueprints for how to organize the process of gathering data, analyzing data, disseminating results, implementing results, and monitoring improvements.

One such model, CRISP (Cross-Industry Standard Process for data mining) was proposed in the mid-1990s by a European consortium of companies to serve as a non-proprietary standard process model for data mining. This general approach postulates the following (perhaps not particularly controversial) general sequence of steps for data mining projects:



Another approach - the Six Sigma methodology - is a well-structured, data-driven methodology for eliminating defects, waste, or quality control problems of all kinds in manufacturing, service delivery, management, and other business activities. This model has recently become very popular (due to its successful implementations) in various American industries, and it appears to gain favor worldwide. It postulated a sequence of, so-called, DMAIC steps -



- that grew up from the manufacturing, quality improvement, and process control traditions and is particularly well suited to production environments (including "production of services," i.e., service industries).

Another framework of this kind (actually somewhat similar to Six Sigma) is the approach proposed by SAS Institute called SEMMA -



- which is focusing more on the technical activities typically involved in a data mining project.

All of these models are concerned with the process of how to integrate data mining methodology into an organization, how to "convert data into information," how to involve important stake-holders, and how to disseminate the information in a form that can easily be converted by stake-holders into resources for strategic decision making.

-V.Vadivelan
Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
what is data mining? saravanan XML and SOAP 11 09-26-2008 10:05 PM
Data integrity and Data validity simplesabita Software Testing 1 10-30-2007 04:22 AM
What are the different problems that “Data mining” can solve? Data warehousing oxygen Database Support 1 07-26-2007 04:37 AM
How to copy data from one data table to another datable kingmaker C# Programming 1 07-24-2007 12:05 AM
Info on PHP accelerator sivaramakrishnan PHP Programming 0 07-11-2007 08:18 AM


All times are GMT -7. The time now is 02:17 PM.


Copyright ©2004 - 2007, DiscussWeb. All Rights Reserved.

SEO by vBSEO 3.0.0