Whether you’re building a model or trying to simulate a scenario within your analysis, at some point in your Excel career, you’re going to need to use a randomized set of data. If you need to create a purely random set of numbers, with no specific constraints or parameters, you can just use the RAND function in Excel to generate those numbers for you.
While Excel’s random number generating formula will help you some situations, there are many analysis and simulation cases where it simply won’t be realistic. For example, let’s say that I wanted to simulate the test scores for a group of students on an exam and I know from past history that the average score is a 80. The lowest possible score is 0 and the highest possible score is 100.
While I want a randomized result, I know that the test scores are not going to be uniformly distributed between 0 and 100. However, if we used Excel’s basic RAND formula without any adjustments, that is the output that Excel would create for us.
The Random Normal Distribution
Before we go further, please note that this is an Excel blog and not a statistics blog. Therefore, I’m not going to spend much time on statistical proofs or mentioning famous mathematicians. I’ll try to keep it as simple as possible and focus on Excel execution.
, also commonly referred to as a bell curve, is based on the assumption that a distribution of values generally cluster around an average. Within the distribution, very high and very low values are still possible, but are less frequent than the ones closer to the average.
In nature, we know that this type of clustering occurs, as on the aforementioned test example, as generally a lot of people will score near the average, and generally fewer people will have very high and very low scores.
A “random” normal distribution is just a random set of data that collectively matches the characteristics of a normal distribution. The random normal distribution is one the most common data sets that you’ll want to use to make your data look realistic for real life situations.
The two key parameters you need to know about the normal distribution / bell curve are:
- The mean – what the average of the normal distribution will be, and simplifying greatly, where roughly the middle of your chart top of that bell curve is
- The standard deviation – again, simplifying greatly, how wide the bell curve will be; the higher this number, the wider the bell curve will be
How it Works
To create a normally distributed set of random numbers in Excel, we’ll use the NORMINV formula. The NORMINV formula is what is capable of providing us a random set of numbers in a normally distributed fashion. The syntax for the formula is below:
= NORMINV ( Probability, Mean, Standard Deviation )
The key to creating a random normal distribution is nesting the RAND formula inside of the NORMINV formula for the probability input
Let’s go through the inputs to explain how it works:
- Probability – for the probability input, you just want to input the RAND function. If you look at the uniform graph to the left this is what the RAND function would produce by itself – an output where all values have the same probability of occurring. What the NORMINV function does is convert this uniform distribution into a normal one, by making values closer to the mean more likely and values further from the mean less likely. It does so using the next set of inputs as context.
- Mean – as mentioned before, this is the average that your random values will cluster around. Because you’re using a random set of numbers to generate your distribution, this most likely won’t end up being your actual average or the actual apex of your bell curve.
- Standard Deviation – the standard deviation will determine you wide your distribution is. The higher the number, the wider your distribution of values.
Based on the syntax, what Excel creates a normally distributed set of data based on the mean and standard deviation you provided. The probability input of the syntax is what determines the actual data value that is returned. For the probability input, Excel is expecting a number between 0 and 1 which is exactly what the RAND provides.
To summarize, what Excel does is take the value from our RAND function, which by itself provides a random set of numbers uniformly distributed between 0 and 1, and forces it to instead to create a normally distributed set of numbers based on a mean and standard deviation we provide.
We’ve gone through the theory, so let’s go through the test example we’ve referenced to put the NORMINV formula into practice:
Step 1: Determine how many numbers you’ll need in your sample
For our example, we’ll assume we only need 15 values.
Step 2: Input the mean and standard deviation you want
You can put these directly in the formula, but generally it’s better to have them in a separate cell, as it makes it easier to do mass updates to large sets of random numbers.
Using our previous example, we’ll assume that the average of our sample should be 80 and the standard deviation should be 10.
Step 3: Write the Normal Inverse Formula using the RAND formula and referencing the mean and standard deviation you documented
= NORMINV ( Probability, Mean, Standard Deviation )
For the NORMINV function, your parameter inputs should be:
- Probability = RAND()
- Mean = reference the cell with 80
- Standard Deviation = reference the cell with 10
Step 4: Reference Lock your mean and standard deviation references
This is a very important step, as generally, you’ll want to use this formula for multiple cells. You can do this by clicking into the cell reference within the formula and hitting the F4 key.
Step 5: Copy your formula down
You can do this by double clicking the lower right hand corner of the cell.
Step 6: Hardcode your values
Now that you’ve created your random number set, you probably don’t want it to change every time you update a cell. Therefore, go ahead and copy your data set values, and perform a Paste Special ( ALT → E → S → V ) to hard code the values.
You now have a normally distributed set of random numbers, based on a defined mean and standard deviation.
Normally Distributed Random Number Template
We’ve gone through the process of creating a random normal distribution of numbers manually. But I’ve also built a simple Excel template that will help make this process a lot easier.
All you need to do is download the file and input the following parameters:
- Standard Deviation
- Sample Count (up to 1000)
- Bin Size
The template uses the same formula described above, but also has a separate formula that delimits minimums and maximums. (Unfortunately, NORMINV does not have any parameters within its syntax to let you set a minimum or maximum return value)
Once you’ve set your values, one of the key features of the template is providing you a quick of the distribution of the random data you’ve created. This is done with the “Count” cells to the right, which are conditionally formatted with data bars. This basically provides you a histogram on its side. I didn’t want to go through the trouble of creating a graph because this is supposed to be a flexible, quick and dirty view of your data. It would defeat the purpose if you had to mess around with chart formatting as you were playing with your inputs.
Excel Template Settings & Notes
- To adjust the “histogram”‘ view, you should change the minimum and bin size to get the right view of your data. I’ve included 20 bins for you to work with, but this can be increased by appending bin numbers and copying the two other formulas down.
- I’ve nested the NORMINV formula inside the ROUND function so that the random values returned are always integers. You can remove this or change the number of decimal places returned by adjusting the formula.
- For the Minimum and Maximum limitations, I’ve made the assumption that if the NORMINV formula comes up with a value either less than the minimum or greater than the maximum, it’ll simply return the minimum or maximum respectfully. If this is not desired, you can simply change the logic in the “Accounting for Min and Max Limits” field.
- If you need to add more numbers to your random sample (more than 1,000 values), you can simply insert rows into the data portion of the template and copy down the formulas.
- Note that any time you recalculate you save or add new values, your data set will change because the RAND function will recalculate.
- To save your values, all you need to do is copy from the green cells and do a Paste Special → Values to hardcode the numbers. I chose not to build in a macro to do this since this is intended to be a relatively simple template.