9 Laws Everyone in Data Mining Should Use

  • Post last modified:February 28, 2018
  • Post category:Data Mining
  • Reading time:6 mins read

Data mining is a powerful technology with great potential to help companies focus on their most important information by extracting hidden predictive information from the large databases in their data warehouses. There are nine data mining laws that a data miner should follow when mining a particular data set.


Data mining provides two types of results:

  • Business insights
  • Predictive models, which make predictions automatically

It includes various methods, such as clustering, classification and market basket analysis.


Laws of Data Mining

Data mining is not a technique but a process of creating knowledge, in either natural or artificial form. The popular CRISP-DM methodology (the Cross-Industry Standard Process for Data Mining) dates from 1999; it has proved successful, and many data miners across the globe follow it. The methodology describes how to perform data mining, but it does not describe what data mining is.

The 9 laws of data mining are as follows:

Law 1: Goals for Business

“Data mining solutions achieve business objectives”

Data mining is not a technology but a process concerned with solving business problems and achieving business goals, with one or more business objectives at its heart. Define a business objective first; without one, there is no data mining.

Law 2: Knowledge Repository for Business

“Knowledge is central to every step of the data mining process”

This law defines a major characteristic of the data mining process. It can be illustrated with the CRISP-DM reference model:

  • Business understanding requires strong knowledge of the business, which helps in mapping business objectives to data mining goals.
  • Business knowledge and data understanding help to determine the business problem and how to solve it.
  • Data preparation uses business knowledge to shape the data.
  • Modeling uses algorithms to build predictive models and to understand what they mean for the business, i.e. to understand their business importance.
  • Evaluation means understanding the business impact of using the model.
  • Deployment means putting the data mining results to work in the business process.

In short, no step of the data mining process can be effective without business knowledge.

Law 3: Preparing the Data

“Half of every data mining process is data preparation”

The purpose of data preparation is to put the data into a form in which data mining questions can be asked, and to make the data easy to analyze so that those questions can be answered. Prepare the data so that the preparation takes minimal time and the result is easy to update. Data preparation has two aspects. First, the data should be in a proper format so that it is easy to analyze from every angle; for example, all the data should be represented in a single table. Second, the data should be made more relevant to the business problem; for example, certain aggregates may be relevant to mining queries.

Neither of these aspects of data preparation can be automated in any simple way.
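As a rough illustration of the two sides of data preparation described above, the Python sketch below flattens two sources into a single table with one row per customer and adds aggregates that a mining query might need. The customer list, the transaction log and every field name are hypothetical, invented only for this example.

```python
from collections import defaultdict

# Hypothetical raw sources (illustrative values only).
customers = [
    {"customer_id": 1, "region": "north"},
    {"customer_id": 2, "region": "south"},
    {"customer_id": 3, "region": "north"},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 10.0},
]


def build_mining_table(customers, transactions):
    """Return one flat row per customer, enriched with aggregates."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tx in transactions:
        totals[tx["customer_id"]] += tx["amount"]
        counts[tx["customer_id"]] += 1

    table = []
    for customer in customers:
        cid = customer["customer_id"]
        table.append({
            "customer_id": cid,                    # single table: one row per customer
            "region": customer["region"],
            "n_transactions": counts.get(cid, 0),  # aggregate relevant to the problem
            "total_spend": totals.get(cid, 0.0),   # aggregate relevant to the problem
        })
    return table


mining_table = build_mining_table(customers, transactions)
for row in mining_table:
    print(row)
```

Which aggregates are worth adding depends on the business problem, which is why this step resists simple automation.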

[Figure: CRISP-DM reference model]

Law 4: R and D-DM

“The right model for a given application can only be discovered through R and D”

Five circumstances make R and D necessary in finding data mining solutions:

  • Data mining (DM) is a process of extracting hidden patterns and solutions and of searching for connections that have not yet been discovered.
  • A given application does not pose a single problem; different models may be used to solve different parts of it. The way the problem is decomposed is itself an outcome of data mining and is not known before the process begins.
  • The data miner changes the problem by preparing the data, so the grounds for choosing a model are constantly shifting.
  • There is no technical measure of value for a predictive model.
  • During the data mining process, the business objectives themselves are refined and evolve. Therefore, the data mining goals keep changing.
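One simple way to picture this kind of R and D is to fit several candidate models and compare them on data held back from training. The sketch below does that with two deliberately tiny models; the churn data, the `fit_majority` and `fit_threshold` helpers and the choice of holdout accuracy as the yardstick are all assumptions made for illustration, not a prescribed method.

```python
# Hypothetical labelled records: (monthly_spend, churned) pairs.
train = [(5, 1), (8, 1), (12, 1), (30, 0), (42, 0),
         (55, 0), (18, 1), (25, 0), (48, 0)]
holdout = [(7, 1), (15, 1), (22, 0), (60, 0), (10, 1), (35, 0)]


def fit_majority(data):
    """Baseline: always predict the most common label in the training data."""
    labels = [label for _, label in data]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def fit_threshold(data):
    """Pick the spend threshold that best separates the training labels."""
    best_t, best_correct = None, -1
    for t in sorted({x for x, _ in data}):
        correct = sum((1 if x <= t else 0) == label for x, label in data)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return lambda x: 1 if x <= best_t else 0


def accuracy(model, data):
    """Fraction of records the model labels correctly."""
    return sum(model(x) == label for x, label in data) / len(data)


# The experiment: fit every candidate, score it on unseen data, keep the best.
candidates = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: accuracy(fit(train), holdout) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

The winner here is only the winner for this data and this yardstick; a different preparation of the data or a revised business objective would mean running the experiment again.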

Law 5: Watkins' Law

“There are always patterns”

Watkins' general law can be described as follows:

  • Limit the field of interest to the business objectives and the data mining goal.
  • Bring in the data that is relevant to both the business objective and the mining goal.
  • Apply rules to govern the process.
  • The data mining process then generates patterns within the field of interest that are relevant to the business objectives and their rules.

In short, there are always patterns, because they arise necessarily from the processes that produced the data. To find the patterns, start from a simple method, a process, or anything else you already know about the business that contributes to business knowledge.

Law 6: Vision Law

“Data mining amplifies perception in the business field”

Data mining algorithms help to find interesting patterns that lie beyond normal human capability. Business experts and data miners can use this ability to solve their problems and make decisions.

Law 7: Foresight Law

“Foresight increases knowledge narrowly by observation”

Prediction can also be done with methods such as clustering and association models. Use clustering to predict the group into which an individual falls. Similarly, use an association model to predict a number of features from known information.

Clustering helps to find anomalies; that is, an unusual transaction is identified by observing the given data set. It’s important to note that this new information doesn’t belong to the given “data”.

Market basket analysis helps to predict association patterns at physical stores, which in turn helps to predict cross-selling opportunities and business improvements.
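To make the clustering idea concrete, here is a minimal sketch that groups one-dimensional transaction amounts with a plain k-means loop, then predicts the group for a new amount and flags it as unusual when it lies far from every group. The amounts, the starting centroids and the `max_distance` cut-off are assumptions chosen for illustration; a real project would tune all three.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D values, starting from the given centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def predict_group(value, centroids, max_distance):
    """Return (index of nearest cluster, is_anomaly) for a new value."""
    nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
    return nearest, abs(value - centroids[nearest]) > max_distance


# Hypothetical transaction amounts forming two natural groups.
amounts = [9, 10, 11, 12, 48, 50, 52, 54]
centroids = kmeans_1d(amounts, centroids=[0.0, 100.0])

print(predict_group(13, centroids, max_distance=10))   # near the low-value group
print(predict_group(500, centroids, max_distance=10))  # far from both groups
```

The group label and the anomaly flag are new information about the incoming transaction; neither was stored anywhere in the original amounts.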
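A small sketch of how market basket analysis can surface cross-selling candidates: count how often pairs of items are bought together (support) and how often one item is bought given the other (confidence). The baskets and the `min_support` value are hypothetical, and the sketch looks only at pairs rather than running a full association rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical store baskets (illustrative only).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Each frequent pair gives two candidate rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first.
    return sorted(rules, key=lambda r: r[3], reverse=True)


for antecedent, consequent, support, confidence in pair_rules(baskets):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence, such as "customers who buy milk also buy butter" in this toy data, is the kind of pattern a store might use when deciding which products to place or promote together.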

Law 8: Value Law

“The value of data mining results is not determined by the accuracy or stability of predictive models”

Accuracy and stability are important measures of how well a predictive model makes predictions.

Accuracy tells how often the predictions are correct, whereas stability tells how much the predictions would change if the model were tested on another sample from the same population.

High accuracy will not improve the value of a predictive model if the model doesn’t fit the business problem. Similarly, stability cannot substitute for the model’s ability to provide business insight or for its fit to the business problem.
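The two measures can be sketched in a few lines of Python. Accuracy is computed exactly as described; for stability, the sketch assumes one possible yardstick, the change in accuracy between two samples from the same population. The rule-based model, both samples and the `stability_gap` helper are hypothetical.

```python
def accuracy(model, sample):
    """Fraction of records in the sample that the model predicts correctly."""
    return sum(model(x) == label for x, label in sample) / len(sample)


def stability_gap(model, sample_a, sample_b):
    """Absolute change in accuracy between two samples (smaller means more stable)."""
    return abs(accuracy(model, sample_a) - accuracy(model, sample_b))


def model(spend):
    """Hypothetical rule: low spenders (20 or less) are predicted to churn (label 1)."""
    return 1 if spend <= 20 else 0


# Two hypothetical samples of (monthly_spend, churned) from the same population.
sample_a = [(5, 1), (10, 1), (30, 0), (40, 0)]
sample_b = [(8, 1), (25, 1), (35, 0), (50, 0)]

print("accuracy on A:", accuracy(model, sample_a))
print("accuracy on B:", accuracy(model, sample_b))
print("stability gap:", stability_gap(model, sample_a, sample_b))
```

Neither number says anything about whether predicting churn was the right business question in the first place.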