An Overview of Data Mining Technology

Ravi Kumar Paluri
Data mining is the process of extracting knowledge from data. It enables complex business processes to be understood and re-engineered. This is achieved by relating the past behavior of business processes to patterns found in the data.
Data mining tools are used to predict future trends and behavior, allowing businesses to make knowledge-driven, proactive decisions. They provide automated, prospective analysis that goes beyond the analysis of past events offered by the retrospective tools of decision support systems.
Data mining can answer business questions that were once considered too time-consuming to resolve. The tools search databases for hidden patterns and predictive information. They can be implemented rapidly on existing hardware and software platforms to increase the value of existing information resources, and they can be integrated with new products and systems.

How Data Mining Works

Preparing the Information

Data has to be properly organized for effective information processing. In data mining, 70% to 80% of the time is spent on sorting and summarizing the information, before mining efforts actually start. Data is prepared on the basis of desired information objectives.
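As a rough illustration of the sorting and summarizing step, the sketch below groups raw transactions by customer and computes a few summary figures. The records, field names, and the `summarize` helper are all hypothetical, invented here for illustration only.

```python
from collections import defaultdict

# Hypothetical raw transactions: (customer_id, amount). Values are made up.
transactions = [
    ("c2", 40.0),
    ("c1", 25.0),
    ("c2", 10.0),
    ("c1", 75.0),
    ("c3", 5.0),
]

def summarize(rows):
    """Group transactions by customer and compute count, total, and mean."""
    grouped = defaultdict(list)
    for customer_id, amount in rows:
        grouped[customer_id].append(amount)
    summary = {}
    for customer_id in sorted(grouped):  # sorted so the output order is stable
        amounts = grouped[customer_id]
        summary[customer_id] = {
            "count": len(amounts),
            "total": sum(amounts),
            "mean": sum(amounts) / len(amounts),
        }
    return summary

customer_summary = summarize(transactions)
```

Which summaries to compute would depend on the information objectives chosen for the project; count, total, and mean are only placeholders.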

Modeling

A model is created depending on many factors, such as the size of the database, the number of known variables, and the kind of algorithms to be employed. Successful models help companies study and effectively identify their target market.
Models are built against massive data warehouses of transaction detail, 10TB to 20TB in size. These models often serve as the front end for intelligent segmentation capabilities that derive actionable customer segments.

Scoring Customers

The best way to assess a model's viability is to test it against existing data for which the answer to a particular question is already known. The score is the output of a model: a number between 0 and 1 that represents the probability of a given answer to a specific question.
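A minimal sketch of this idea follows. It assumes a simple logistic model whose weights are picked by hand purely for illustration (they are not fitted to any real data), and a small set of hypothetical records for which the answer is already known.

```python
import math

# Hypothetical model: weights chosen for illustration, not fitted to real data.
WEIGHTS = {"recent_purchases": 0.8, "months_inactive": -0.5}
BIAS = -1.0

def score(record):
    """Return a number between 0 and 1: the modeled probability of a 'yes'."""
    z = BIAS + sum(WEIGHTS[name] * record[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

# Existing records where the answer (did the customer respond?) is known.
holdout = [
    ({"recent_purchases": 5, "months_inactive": 0}, True),
    ({"recent_purchases": 0, "months_inactive": 6}, False),
    ({"recent_purchases": 3, "months_inactive": 1}, True),
    ({"recent_purchases": 1, "months_inactive": 4}, False),
]

def accuracy(records, threshold=0.5):
    """Fraction of records where the thresholded score matches the known answer."""
    hits = sum((score(r) >= threshold) == known for r, known in records)
    return hits / len(records)

holdout_accuracy = accuracy(holdout)
```

The 0.5 threshold is also an assumption; in practice it would be chosen to suit the business question.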

Dynamic Scoring

In dynamic scoring, the scoring process is triggered by another software application that uses the score for its own purposes. To keep results up to date and avoid scoring the entire database, only the required subsets of records are scored.
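The sketch below shows one way on-demand scoring of a record subset might look. The customer table, the placeholder `model_score` function, and the `score_on_demand` helper are hypothetical names introduced only for this example.

```python
# Hypothetical customer table keyed by ID; field names are illustrative.
customers = {
    101: {"visits": 12, "returns": 1},
    102: {"visits": 2, "returns": 0},
    103: {"visits": 7, "returns": 3},
}

def model_score(record):
    """Placeholder model: maps a record to a value in [0, 1]."""
    raw = record["visits"] - 2 * record["returns"]
    return max(0.0, min(1.0, raw / 10))

def score_on_demand(ids, table, model=model_score):
    """Score only the requested records, using their current field values."""
    return {cid: model(table[cid]) for cid in ids if cid in table}

# Another application asks for just two customers, not the whole table.
scores = score_on_demand([101, 103], customers)
```

Because scores are computed at request time from the current record, they reflect the latest data without a full rescoring pass.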

Scope of Data Mining

Automated Prediction of Trends and Behavior

Data mining automatically finds predictive information in large databases. Questions that once required extensive analysis can be answered directly from the data. Targeted marketing is a typical example of a predictive problem: data on past mailings is used to identify the targets most likely to maximize the return on investment in future mailings.
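To make the targeted-marketing example concrete, here is a simplified sketch that estimates response rates per customer segment from a past-mailing log and keeps only the segments expected to pay for themselves. The segments, cost, and revenue figures are invented, and a real predictive model would use far more than a single segment attribute.

```python
from collections import Counter

# Hypothetical past-mailing log: (segment, responded). Values are made up.
past_mailings = [
    ("young_urban", True), ("young_urban", True), ("young_urban", False),
    ("retired", False), ("retired", False), ("retired", True), ("retired", False),
    ("suburban_family", True), ("suburban_family", False),
]

def response_rates(log):
    """Estimate the response rate of each segment from past mailings."""
    mailed, responded = Counter(), Counter()
    for segment, did_respond in log:
        mailed[segment] += 1
        responded[segment] += did_respond  # True counts as 1, False as 0
    return {s: responded[s] / mailed[s] for s in mailed}

def pick_targets(rates, cost_per_piece, revenue_per_response):
    """Keep segments whose expected revenue per piece exceeds the mailing cost."""
    return sorted(
        s for s, r in rates.items() if r * revenue_per_response > cost_per_piece
    )

rates = response_rates(past_mailings)
targets = pick_targets(rates, cost_per_piece=1.0, revenue_per_response=3.0)
```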

Automated Discovery of Previously Unknown Patterns

Data mining tools search databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together.
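The retail example can be sketched as a simple co-occurrence count over shopping baskets. The baskets and the `min_support` cutoff below are made up for illustration, and the brute-force pair count is a stand-in for the more scalable algorithms a real tool would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; the items are invented for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"diapers", "beer", "bread"},
    {"diapers", "beer"},
    {"milk", "butter"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort items so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(baskets, min_support=2)
```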
The tools can analyze massive databases in minutes when implemented on high-performance parallel processing systems. Faster processing lets users automatically experiment with more models to understand complex data. Databases can be larger in both depth and width.

a. More Columns

Because of time constraints, analysts often limit the number of variables they examine during analysis. Data mining allows users to explore the full depth of a database without preselecting a subset of variables.

b. More Rows

Larger samples yield lower estimation errors and variance, and allow users to make inferences about small but important segments of a population.
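One standard way to see the effect of sample size is the standard error of a sample mean, which shrinks with the square root of the number of rows. The sketch below assumes independent observations and a known standard deviation; the figures are arbitrary.

```python
import math

def standard_error(std_dev, n):
    """Standard error of a sample mean for n independent observations."""
    return std_dev / math.sqrt(n)

# With a standard deviation of 10, going from 100 rows to 10,000 rows
# cuts the standard error of the mean by a factor of 10.
small_sample_error = standard_error(10.0, 100)
large_sample_error = standard_error(10.0, 10_000)
```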
The most commonly used techniques are:
i. Artificial neural networks are non-linear predictive models that learn through training and resemble biological neural networks in structure.
ii. Decision trees represent sets of decisions, which generate rules for the classification of a dataset. Methods include Classification and Regression Trees (CART).
iii. Genetic algorithms are optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
iv. Rule induction extracts useful if-then rules from data based on statistical significance.
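As a small illustration of how a decision tree turns data into a classification rule, the sketch below fits a one-level tree (a "stump") by trying every feature and threshold and keeping the split with the fewest misclassifications. This is a deliberately simplified stand-in, not CART itself, and the training data is hypothetical.

```python
def best_stump(rows, labels):
    """Find the (feature, threshold) split that misclassifies the fewest rows.

    The stump predicts True when row[feature] > threshold, else False.
    """
    best = None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            errors = sum(
                (row[feature] > threshold) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[2]:
                best = (feature, threshold, errors)
    return best

# Hypothetical training data: did the customer buy the product?
rows = [
    {"age": 25, "income": 30},
    {"age": 40, "income": 80},
    {"age": 35, "income": 60},
    {"age": 50, "income": 20},
]
labels = [False, True, True, False]

feature, threshold, errors = best_stump(rows, labels)
rule = f"IF {feature} > {threshold} THEN buys ELSE does not buy"
```

A full decision tree would apply the same search recursively to each side of the split and would normally use an impurity measure rather than a raw error count.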

Architecture for Data Mining

Data mining works best when it is fully integrated with a data warehouse and with flexible, interactive business analysis tools. Many tools work outside the data warehouse and require extra steps for extracting, importing, and analyzing the data. When new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining.
The resulting analytic data warehouse can be used to improve business processes throughout the organization, in areas such as promotional campaign management, new product rollout, and fraud detection.
The ideal starting point is a data warehouse containing internal data that tracks all customer contact, coupled with external market data about competitor activity.
Background information on potential customers also provides an excellent basis for prospecting. The warehouse can be implemented in Sybase, Oracle, Redbrick, and so on, and should be optimized for flexible and fast data access.
An OLAP (On-Line Analytical Processing) server enables a more sophisticated end-user business model to be applied when navigating the data warehouse. Its multidimensional structures allow users to analyze the data from whichever perspective suits their view of the business.
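The idea of viewing the same data along different dimensions can be sketched with a tiny in-memory roll-up. The fact rows, the dimension names, and the `roll_up` helper are hypothetical, and a real OLAP server would do this over precomputed multidimensional structures rather than a Python list.

```python
from collections import defaultdict

# Hypothetical fact rows: (product_line, region, quarter, sales).
facts = [
    ("shoes", "east", "Q1", 100),
    ("shoes", "west", "Q1", 150),
    ("hats", "east", "Q1", 40),
    ("shoes", "east", "Q2", 120),
    ("hats", "west", "Q2", 60),
]
DIMENSIONS = ("product_line", "region", "quarter")

def roll_up(rows, by):
    """Sum sales over every dimension not named in `by`."""
    idx = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # last column holds the sales measure
    return dict(totals)

# The same facts, summarized from two different perspectives.
by_product = roll_up(facts, by=["product_line"])
by_region_quarter = roll_up(facts, by=["region", "quarter"])
```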
The data mining server is fully integrated with the data warehouse and the OLAP server to embed ROI-focused business analysis. A process-centric metadata template defines the data mining objectives for specific business issues, such as campaign management, prospecting, and promotion optimization.
Integration with the data warehouse allows operational decisions to be directly implemented and tracked. As the warehouse grows with new decisions and results, the organization can continually mine the best practices and apply them to future decisions.

Applications

1. A pharmaceutical company can examine its recent sales force activity and the results to identify which marketing activities have the greatest impact. The data needs to include information about the local health care systems as well as competitor market activity.
The results can be distributed to the sales force over a wide-area network, enabling the representatives to review the recommendations from the perspective of the key attributes in the decision process. This dynamic analysis allows best practices from throughout the organization to be applied in specific sales situations.
2. A credit card company can leverage its vast warehouse of customer transaction data to identify customers likely to be interested in a new credit product. A small test mailing can identify the attributes of customers with an affinity for the product.
3. A large consumer packaged goods company can apply data mining to improve its sales process to its retailers. Data collected from consumer panels, shipments, and competitor activity can be used to determine the reasons for brand and store switching. Using this analysis, the manufacturer can select the strategies that best reach its target customer segments.