For my 1st bog post I will cover data visualization in Excel and Google Fusion Tables (Heat maps). As a Data Analyst, the person must be acquainted with many software tools. Each project should start with these steps:
- What question needs to be answered with the project
- What data is available, what data will be needed
- What software tools are best for the project
- What is the best way to present the project
For the Irish Census project the objective the project is to produce the report showing population by county and product the Irish population heat map.
As a data source I had table with the 2011 Census data:
At first glance this table is fine, but only at 1st glance. As a report it is fine, but as data source it needs to be transformed. Just to name few of the data issues;
- First column contains multiple data types, City, County, Province,
- County names are not consistent
- Males and Females are in the same Row
- Total persons is static field
I have chosen Excel to transom the data. The final data table looks like this:
The data is broken down, by County, Province, Gender and QTY of persons.
Second data source is Google maps coordinates for the Irish Counties:
To achieve the goal of the project these two data sources need to be merged. To achieve this Google Fusion tables is one of the best tools to use. To create Heat Map few simple steps need to be taken:
- Upload County boarder data to Google fusion tables
- Upload Excel file with the County population data
- Merge these two (common fields are County names)
- Tweak the report presentation. Add the “buckets” to colour code counties by the population quantity in each county.
Heat Map is one the best way for the customer to consume the map data report. The same report can be presented in the table summary. Excel is in the league on its own when it comes to reporting.
|County||Population||% Share||Province||Population||% of Population|
Other uses of this data
With the current data we could calculate the population density in each County. This data can also be used in conjunction with other data sources, for instance graduates in each county, to get comparison of each county comparing the % of third level graduates. Getting number of schools in county calculate student distribution per school and % of recent graduates attending 3rd level.
Get coordinates of international Airports, to find out how far on average people need to travel to the nearest airport.