Interpreting Big Data Analytics often leads us to a quandary: is it signal, or just noise? When the statistics in an analytics study are based on historical data, the study is retrospective in nature. The analysis may surface significant signals that are potentially meaningful, but also potentially misleading. How does one tell the difference between signal and noise?
A true signal is caused by something that can be turned on or off. Noise comes from the environment, where there is no control that allows anything to be turned on or off. A quantifiable signal is like the faucet in your kitchen sink: turn it on and the water flows; turn the handle to one side or the other and you get hot, cold, or a mixture of tempered water. Even in this simple example we must assume there is water, adequate pressure in the pipe to the faucet, and a hot water source connected to it, with all valves in working order. For a signal to be meaningful in big data analytics, have you considered the relevant assumptions?
Noise is a different phenomenon altogether. As Dr. Genichi Taguchi proposed, noise factors are uncontrollable variables that cause measurable changes in the system under study. We must strive to understand those sources and find the variables we can control that minimize the influence of the ones we cannot. Weather is a classic example: it changes hourly, daily, weekly, monthly, and annually; we have no control over it, yet it can produce measurable changes in the system under study. Taguchi pointed out that we must consider the potential noise factors, the variables that can influence outcomes. It is easy to point to a shift in the measured variables and proclaim that something has happened, but what was the cause? Was it signal, or just noise?
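As a toy illustration (hypothetical numbers, standard library only, not drawn from any real study), consider a process whose controlled inputs never change while an uncontrolled noise factor drifts upward partway through the observation window. A naive before/after comparison will still report a convincing "shift":

```python
import random

random.seed(42)

# Hypothetical process: the controlled input never changes, but an
# uncontrolled noise factor (say, ambient temperature) drifts upward
# halfway through the observation window.
baseline = 100.0
noise_drift = 3.0  # uncontrolled shift affecting the second half only

first_half = [baseline + random.gauss(0, 1) for _ in range(50)]
second_half = [baseline + noise_drift + random.gauss(0, 1) for _ in range(50)]

mean_1 = sum(first_half) / len(first_half)
mean_2 = sum(second_half) / len(second_half)

# A naive before/after comparison sees a clear "signal"...
print(f"shift observed: {mean_2 - mean_1:.2f}")
# ...yet nothing we control was turned on or off; the shift is pure noise.
```

Without data on the noise factor itself, there is no way to tell this apart from a genuine effect, which is exactly the trap the next section addresses.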
In retrospective analysis the data we have is historical. The challenge is to also collect data on the potential noise factors that can influence the key outcomes and lead to misinterpretation. Guarding against this requires designing the data collection plan around input variables, output variables, and noise factors. Historical noise factor data is often difficult or impossible to obtain because it was never captured during the period under study. The following is a proactive case example aimed at minimizing the adverse influence of noise.
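A minimal sketch of what such a data collection record might look like. The variable names here are purely illustrative assumptions, not taken from any particular study; the point is that noise factors are captured alongside inputs and outputs at collection time rather than reconstructed later:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Observation:
    """One row of a data collection plan (illustrative field names)."""
    inputs: Dict[str, float]       # controlled settings: what we turn on or off
    outputs: Dict[str, float]      # key outcomes we want to interpret
    noise_factors: Dict[str, str]  # uncontrolled context, e.g. weather, season

# Hypothetical record: the promotional input, the observed outcome,
# and the uncontrolled context recorded at the same time.
obs = Observation(
    inputs={"promo_mailings": 1.0},
    outputs={"weekly_sales": 1250.0},
    noise_factors={"weather": "rainy", "season": "Q4"},
)
print(obs.noise_factors["weather"])
```

Recording the noise factors up front is cheap; trying to recover them retrospectively, as the text notes, is often impossible.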
Complete details are available in a course on promotional marketing at EducateVirtually.com.
The case study is based on a simple designed experiment with highly complex noise factors, whose sole purpose was to allow measurement of the signal above the noise. The three factors in the experiment were descriptive mailings about the product, samples of the product, and telemarketing about the product. Each factor had only two settings for the target customers: either you got it or you didn't. Three factors at two settings each gives 2^3 = 8 conditions, a full factorial covering all possible combinations.

The experiment was conducted in a small European country. The country is divided into numerous "bricks," roughly analogous to ZIP codes in the United States. Each brick has a unique demographic classification, and these classifications are the noise variables. The bricks were grouped by like demographics, and each experimental condition was then applied to the same number of randomly selected bricks from each demographic group, with no brick used in more than one condition. In essence, the noise associated with the demographic differences was distributed equally across all experimental conditions, leveling the playing field and allowing the signal to rise above the noise. The experiment was a success, both reducing promotional spend and significantly driving sales.
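The blocking scheme described above can be sketched as follows. The brick counts and demographic group names here are hypothetical assumptions for illustration; the mechanics are what matter: generate the 2^3 full factorial, group bricks by demographic classification, shuffle within each group, and deal bricks round-robin so every condition receives the same demographic mix:

```python
import itertools
import random
from collections import defaultdict

random.seed(7)

# The three factors, each either on (1) or off (0): 2**3 = 8 conditions.
factors = ("mailing", "sample", "telemarketing")
conditions = list(itertools.product((0, 1), repeat=len(factors)))

# Hypothetical bricks: 24 per demographic group, so each of the 8
# conditions receives exactly 3 bricks from every group.
bricks = [(f"{demo}-{i}", demo)
          for demo in ("urban", "suburban", "rural")
          for i in range(24)]

# Block by demographic group (the noise variable).
blocks = defaultdict(list)
for brick_id, demo in bricks:
    blocks[demo].append(brick_id)

# Shuffle within each block, then deal bricks round-robin across the
# conditions so the demographic noise is spread evenly; no brick is
# assigned to more than one condition.
assignment = defaultdict(list)  # condition -> list of brick ids
for demo, ids in blocks.items():
    random.shuffle(ids)
    for k, brick_id in enumerate(ids):
        assignment[conditions[k % len(conditions)]].append(brick_id)

# Every condition ends up with the same demographic mix.
for cond, ids in sorted(assignment.items()):
    print(dict(zip(factors, cond)), len(ids), "bricks")
```

Because every condition draws the same number of bricks from each demographic block, differences between conditions cannot be explained by demographics, so an observed shift in sales is attributable to the factors themselves.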
To learn more, take this course and many others at EducateVirtually.com.