Visualizing Data (Bertelsmann Data Science Challenge)

Jun 15, 2018

There are various ways of visualizing data. A frequency table is one of them: it counts how often each value (or object) occurs in a dataset and presents those counts in tabular form. A frequency table usually also has a column for relative frequency, which is the number of times a value occurs divided by the total number of observations. Relative frequencies can be expressed as fractions or as percentages, so they add up to 1 (or 100%), apart from rounding.
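A minimal Python sketch of such a frequency table, assuming the data is a plain list of category labels. The `frequency_table` helper and the fruit data are hypothetical, made up for illustration. It uses the standard library's `collections.Counter` to count occurrences and divides each count by the total number of observations to get the relative frequency.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency, relative frequency) rows, most common first."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, count / total) for value, count in counts.most_common()]

# Hypothetical data: each person's favourite fruit.
fruits = ["apple", "banana", "apple", "cherry", "banana", "apple"]

print(f"{'value':<8} {'freq':>4} {'rel':>6}")
for value, count, rel in frequency_table(fruits):
    # Relative frequency printed as a percentage; as a fraction it is `rel` itself.
    print(f"{value:<8} {count:>4} {rel:>6.1%}")
```

Because the relative frequencies are computed in floating point, their sum may land a tiny distance from exactly 1 rather than on it.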

Sample frequency table. Source: http://www.softschools.com/math/probability_and_statistics/frequency_table_categorical_data/

When the data are numbers, the values are usually grouped together before counting. A common choice is to group them into intervals of a chosen width, also known as bins or buckets, and to count how many values fall into each one.
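A minimal sketch of grouping numbers into bins of equal width, assuming integer data such as ages. The `bin_counts` helper, its `start` parameter and the ages are hypothetical. Each value is assigned to the interval [lo, lo + width) by integer division, and empty bins between the smallest and largest value are kept with a count of 0 so that no interval silently disappears.

```python
from collections import Counter

def bin_counts(values, bin_width, start=0):
    """Count integer values per interval [lo, lo + bin_width), labelled 'lo-hi'."""
    # Bin index of each value, e.g. 23 // 10 -> 2, i.e. the 20-29 bin.
    indices = [(v - start) // bin_width for v in values]
    counts = Counter(indices)
    table = []
    # Walk every bin from the lowest to the highest, including empty ones.
    for index in range(min(indices), max(indices) + 1):
        lo = start + index * bin_width
        table.append((f"{lo}-{lo + bin_width - 1}", counts[index]))
    return table

# Hypothetical ages of ten people.
ages = [12, 15, 21, 23, 25, 34, 38, 41, 47, 49]
for label, count in bin_counts(ages, 10):
    print(f"{label:<6} {count}")
```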

Sample frequency table showing a bin size of 10 (category column). Source: http://www.softschools.com/math/probability_and_statistics/histogram/

Another way of visualizing data is the histogram, which draws each bin of such a table as a bar whose height is the bin's frequency. The bin size depends on how much detail you want: the bigger the bin size, the less detail you get, and the smaller the bin size, the more detail you get. Therefore, in choosing a histogram as a means of visualizing data, we sacrifice some detail, because individual values are hidden inside their bins.
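To see the trade-off between bin size and detail, here is a small sketch that draws the same hypothetical ages as a sideways text histogram with two different bin widths. It uses only the standard library; the `histogram` and `draw` helpers are made up for illustration and are not a plotting library's API.

```python
def histogram(values, bin_width):
    """Return (bin_start, count) pairs for consecutive bins of equal width."""
    start = min(values) // bin_width * bin_width  # align the first bin to the width
    bins = []
    while start <= max(values):
        count = sum(1 for v in values if start <= v < start + bin_width)
        bins.append((start, count))
        start += bin_width
    return bins

def draw(values, bin_width):
    """Print one row of '#' characters per bin, one '#' per value."""
    for start, count in histogram(values, bin_width):
        print(f"{start:>3}-{start + bin_width - 1:<3} | {'#' * count}")

ages = [12, 15, 21, 23, 25, 34, 38, 41, 47, 49]
print("bin width 20 (less detail):")
draw(ages, 20)
print("bin width 5 (more detail):")
draw(ages, 5)
```

With a bin width of 20, the five people between 20 and 39 land in one bar, so the chart no longer shows that three of them are in their twenties and two in their thirties; the width-5 version keeps that detail at the cost of more, shorter bars.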

What are the differences between a histogram and a bar graph?
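1. The bars of a histogram touch, with no gaps between them, because the bins are adjacent intervals on a number line; the width of those intervals can vary with the level of detail one wants. The bars of a bar graph are separate and distinct.

2. In a histogram, the x-axis holds numbers (e.g. age), so the bins must appear in numerical order, whereas the bars of a bar graph can be arranged in any order.

3. The shape of a histogram is meaningful because it shows how the data are distributed, whereas the shape of a bar graph is arbitrary, since reordering the bars changes it.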

4. Values on the x-axis for a histogram are numerical or quantitative and the values on the x-axis of a bar graph are categorical or qualitative.

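Histogram: A distribution can be symmetrical or skewed. A distribution is said to be symmetrical if its left and right halves are mirror images of each other around the center; a uniform distribution is one example, but a bell-shaped distribution is symmetrical too. On the other hand, if the majority of the data is at the lower end (the left side) of the histogram, with a long tail stretching to the right, the distribution is described as positively skewed, whereas if the majority of the data is at the right side, with a tail stretching to the left, it is described as negatively skewed.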
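The moment coefficient of skewness, m3 / m2^1.5 (the average cubed deviation from the mean divided by the variance raised to the power 1.5), is one standard way to put a number on this: it is about 0 for a symmetrical distribution, positive for a positively skewed one and negative for a negatively skewed one. It is not part of the original post; the sketch below is an illustration on three small made-up datasets, using Python's `statistics` module for the mean and median. It also prints the mean and median, because a long tail usually pulls the mean toward it (a rule of thumb, not a guarantee).

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical datasets chosen to have each shape.
samples = {
    "symmetrical": [1, 2, 2, 3, 3, 3, 4, 4, 5],
    "positively skewed": [1, 1, 1, 2, 2, 3, 9],  # bulk on the left, tail to the right
    "negatively skewed": [9, 9, 9, 8, 8, 7, 1],  # bulk on the right, tail to the left
}

for name, data in samples.items():
    print(f"{name:<18} mean={statistics.fmean(data):.2f} "
          f"median={statistics.median(data)} skew={skewness(data):+.2f}")
```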


Written by Jida Asare
