If you work in consulting or some other profession that requires you to analyze data, sooner or later, you’ll need to build a histogram. This is because the histograms is one of the most effective methods to visualize and understand a data set. Averages, sums, and counts are calculations that only tell you part of the story about your data – the histogram quickly shows you how your data is distributed, which is a great transition into further analysis and interpretation.
Conceptually, a histogram is fairly simple. The key factors that make up a histogram are:
- It’s a column chart or (vertical bar chart) with a set of bins as the horizontal axis
- The bins represent a range of values
- The bars within each bin represent a count of how many values in your data set fall within each respective range of values
In the example below, we have 17 values falling in the second bin, meaning 17 values in our data set fall between 60 and less than 65.
Please note: I’ve summarized the notation of the x axis to simplify it. The bin “<=65” also implies “>60”, with 60 being the value of the prior bin. Therefore, values in that bin are both less than or equal to 65 AND ALSO greater than 60. This was done purely for visual design purposes so the x-axis wouldn’t be so cluttered.
The Problem With Building Histograms
Building a histogram is often a clunky process. Within Excel, you can create a histogram in one of three ways:
- Using Nested IF Statements
- Using the MATCH Formula’s Approximate Match Option
- Using VLOOKUP’s Range Lookup Feature
But there is one key problem with the process:
When you first build a histogram, you often need to take a guess at what your bin sizes should be and continue re-guessing until you get it right
Without actually seeing a distribution of your data first (which is the whole point of creating a histogram), there’s a very good chance that your initial guess at the ideal bin size will be off. Therefore, the process that entails is usually one where you setup your formulas based on one bin size, look at your output, decide that you need a different bin size, and redo your formulas with different bins until you get it right.
Visual Considerations for Histograms
It’s important to note that changing the bin sizes of a histogram can have a significant impact on the visual output of your histogram. Below is an example of two histograms using different bin sizes, both referencing the exact same data set. In most situations, the impact is purely visual – meaning one chart simply looks better than the other. However, in some cases, it’s possible that using the wrong bin size can lead you towards making an error or incorrect decision. Below is an example of the original histogram we created compared to a histogram of the same data set using a bin size that is twice as big.
Additionally, the maximum number of bins, or the bin cutoff, can also greatly impact your visual output. Below is another example of two histograms referencing the same data, but with different bin cutoffs. Below is an example of a histogram of our data set being cutoff at 85 rather than 105.
The rusinfomos.ru Histogram Builder with Adjustable Bin Sizes Template
The attached template is designed for two specific purposes:
- Speed up the process of building histogram charts
- Solve the bin size issue by making them easier to adjust
Using the Template
Working with this template is a fairly simple process. The scope is fairly small in that, all it does is takes your data, puts it into a histogram, and lets you adjust parameters to make your chart look better. The specific steps are below:
Step 1: Load your Data
I put this input at the bottom because there are 1,000 initial inputs you can load. If you need to load more than that, simply insert rows between the yellow input cells to accommodate the additional data points you need. As long as you use insert, and don’t just append the values to the bottom, the formulas in this template will pick up your inputs.
Step 2: Set Your Initial Histogram Parameters
The template requires you to provide three inputs:
- Lowest Bin Value – what your very first bin value will be
- Bin Size – the range / size of each bin
- # of Bins (Max 20) – the total number of bins you’ll have in the x-axis; this also determines your bin cutoff
I’ve also added an input for Chart Name which allows you to, not surprisingly, change the name of the chart.
Step 3: Adjust your parameters
Once you’ve input your parameters, your histogram chart will automatically appear on the right. You can immediately adjust those same parameters to make the chart look better.
Below is an example with a smaller bin size and more bins:
And below is an example with the upper cutoff at 95 rather than 105:
Template Settings & Features
- Bin Notation – The notation for the x-axis labels of a histogram can vary a lot between different users. If you want to adjust these, simply change formulas in the “Label” field within the histogram calculation table.
- Bin Assignment Calculation – the method I used for the bin assignment calculation was the MATCH formula with the approximate match option selected. You can find a tutorial of it here.
- Dynamic Chart Reference – you’ll notice that the chart on the right has a dynamic reference, meaning it’s reference automatically increases and decreases based on the number of data values you have in your table. The dynamic reference is based off of named ranges that use the OFFSET function. They should be immune from causing errors if you insert or delete columns. Please note that if you copy over the histogram builder to a new sheet, the chart references will still be referencing the original sheet.
- Chart Formatting – I’ve done some light formatting on the graph for the histogram, but at the end of the day, it’s impossible to make formatting ideal for any type of data set. Therefore, you’ll still have to finalize this piece before you finish your histogram. (But I did solve bin sizes for you)
- 20 Bin Maximum – I’ve set this somewhat arbitrarily, but generally your histogram starts to look too cluttered if you use more than 20 bins. Therefore, I don’t recommend going above that threshold. If necessary, you can adjust this simply by inserting rows into the calculation table, re-numbering your bins appropriately, and copying all of the formulas down.