Google Data Analytics Case Study

Hi, Happy holidays. Its been months since my last post, because work.. and life.. still, over one hundred new visitors monthly on this sparse blog.. :)Thank you all..

This post is different. It’s on a case study for the Google Data Analytics Certificate I just completed. It’ll be a long write up detailing the analysis process for the case study.
The scenario is: I’m a junior data analyst for Cyclistic, a fictional Bike share company in Chicago. I’ll share the slides presentation, then walk through the data analysis process.

The tools used in this study are:
– Microsoft Excel
– Rstudio
– GitHub
– Tableau
– Adobe Reader
– Google Slides

The analysis process is categorized into 6 phases.
1. Ask
2. Prepare
3. Process
4. Analyze
5. Share
6. Act

  1. Ask
    In the Ask phase, we do two things.
    a). We define the problem to be solved (or the business case)
    The problem to be solved is usually given by your team leader or manager. For this project, it is provided by the case study assignment.
    Show how Annual Members and Casual Riders use Cylistic bikes differently.
    b). We make sure we fully understand the stakeholders expectations.
    Our stakeholders here are:
    – Lily Moreno (Team Leader/Manager)
    – The marketing team
    – Cyclistic Executives.
    They want to create marketing strategies to convert casual riders to annual members for the company to earn more money. To do this, the stakeholders need to understand how casual riders use the company product differently from annual members. To help them understand, our task is to show the differences.

  2. Prepare
    In Prepare phase, we’ll verifiy our data source integrity, credibility and reliability. Then collect and store the data for use in our analysis.
    Our data source here is provided by the case study assignment. Link with access to 12 csv files hosted on AWS for each month from Nov 2020 to Oct 2021. I downloaded the files for study in Excel. I think the best way to get to know your data is in Excel. All 12 files containing data on all rides for each month have the same structure of 13 columns.
    1. ride i_d
    2. bike_type
    3. date/time_of _ride_start
    4. date/time_of_ride_end
    5. start_station_name
    6. start_station_id
    7. end_station_name
    8. end_station_id
    9. start_station_latitude
    10. start_station_logitude
    11. end_station_latittude
    12. end_station_logitude
    13. customer_type

    After getting to know the data, I did some quick data cleaning on all 12 files, by
    1. Editing a few column names.
    2. Adding one more column on all 12 files. “day_of_the_week” to show the day of the week the customer rented a bike. The Excel formula for this is : TEXT((reference cell), “dddd”)
    3. Save editied files as an excel workbook (.xlsx).

  3. Process
    Here, we do more cleaning. Combining datasets and transforming it to make our information more useful for analysis. This includes removing inconsistencies, outliers, typos etc that could skew our analysis.
    For this phase, I chose R. SQL may probably be more practical here, but I like R a lot more, I think its very elegant. Its my preferred programming language.
    The reason we’re moving to R from Excel mainly is because we need to combine all 12 files into one and they won’t fit in Excel. Excel has a maximum capacity of about 1 million rows. These 12 files combined is around 5.4 million rows.

    In R-studio, we
    1. Combined all 12 files into one.
    2. Calculated the average length of each ride. “avg_ride_length” column added.
    3. Further cleaning by deleting inconsistencies such as negative ride lengths and “TEST” rides. Around 1,500 rows was removed, approximately 0.03% of our data.
    4. Export the clean data as a .csv file for further analysis and visualization in Tableau.
    5. Save the R-script for future reference.
    For libraries and functions I used in R, you can view my R-script on GitHub.

  4. Analyze
    In the Analyze phase, basically we try to make sense of our processed data. We transform and organize it into useful information. So that we can draw conclusions, make predictions and drive informed decision making.
    Here, I chose Tableau. Although we can do analysis and create charts in R with ggplot2, it’s amazing what you can do in R.
    Still, I don’t think writing lines of code is the best way to explore data for insights. In R, you have to already know what your output will look like, then you write the code to generate it.
    With Tableau, you don’t really know. You explore, play with, filter, mix, match and generate visuals from the data to make complex concepts and numbers easier to see and understand.

    Using Tableau Desktop, I connected to the csv file downloaded from Rstudio, then started exploring to see what we can come up with. This is the most fun part really. From our data, we were able to extract nine unique insights shown in charts. Once satisfied with our exploration and insights gotten, we then edit our visuals to make it clear, simple and to best show what the data is saying.

  5. Share
    In the share phase, we’ll interpret the results of our analysis and share our conclusion and recommendations to help stakeholders make data-driven decisions.

    We’re done editing our charts in Tableau Desktop.
    Next, we save our work to Tableau Public online. Tableau doesn’t let you save your work locally without first uploading it to your account online.
    This done, we then download it in PDF. PNG files from Tableau public is not good. Best option is PDF. Save the pdf files and open it up in Adobe reader and then export as a jpeg file. Then upload the jpegs in creating our Google Slides in Chrome.

  6. Act
    This is the final phase in Data Analysis. Here, we take all of the insights we’ve got from our data and put them to work to solve the original business case: To show how Annual Members and Casual Riders use Cyclistic bikes differently so that our stakeholders can create marketing strategies to convert casual riders to annual members for the company to earn more money. Here’s a good time to go through the Google slides again and you can see that we have successfully accomplished our task.

    Act is what we’re doing right now. Me writing this and you reading.. if it was for real, there will be a presentation of the slides and stakeholders nodding heads in comprehension like in the movies saying stuff like “excellent” then going back to the “drawing board” and creating marketing strategies/allocating budget based on the insights and recommendations from the data analysis. harharhar..

    Alright! This brings us to the end of this post. Not really sure why I’m smiling r/n but this again was fun. Thank you for being here. Your curiosity and our shared interest brought you. Its the end of the year. Happy Holidays!!

Mapping Covid-19 by Cities and Provinces

Creating this viz was fun. Had to look up all kinds of places like Sultan Kudarat, Puerto Princesa city, Sulu, Eastern Samar..

The Data
Covid-19 stats: from DOH
Population by Cities and Provinces: PSA
Coordinates: Google maps

*Key: Blue dots on the map are places with less than 1500 covid cases. Red dots are 1500 cases and beyond.

The map isn’t live. Will update every month until this is all over.

Stay safe. Happy Easter!!

When will Covid-19 end?

Girl meeting her friends at a city bar for drinks at night

It’ll be nice to have a future date or timeframe set when everything will be back to normal. But, there isn’t.
To try and answer the question, lets look at some numbers.

Philippines Total Population 2021: 109,856,591 (April 04, 2021. CPD)
Total Number of people infected: 795,051 (April 03, 2021. DOH)
Percentage of Total Population infected: 0.72% (less than 1%)

Total number confirmed of Vaccines acquired: 2,525,600
Total number of people vaccinated: 737,569 (Source)
Percentage of Total population vaccinated: 0.67% (less than 1%)

Based on the premise that the vaccines are 100% effective against current Covid strain and new variants, for covid-19 to end, we need herd immunity.

Herd immunity: is when a large part of the population (around 67%) is immune or resistant to the virus, so the virus has no where left to go or be transmitted and eventually the virus dies out.
This is the primary target of all vaccination programs.

To reach Herd immunity in Philippines, 67% of the population: around 74.4 million people need to be vaccinated.
This is when Covid-19 will end.
At the moment, we are at less than 1% vaccinated.

Going by the report in the Manila Bulletin, the government is on track to acquire 170 million vaccine doses, with target to vaccinate 70 million people this year (Source). How the government will handle deployment and coordination to vaccinate 70 million people remains to be seen. At least, we are on our way. 737,569 confirmed vaccinated is a lot more than zero. Its a start.


Artwork credit: https://mixkit.co/@casandrabanuelos/

How Philippines contributes to the future of global sustainable energy.

There was an article today in the Economist about how lithium battery costs have fallen by 98% in three decades. Its good. It made me do some research and write this post.
Batteries play an important role in the global shift towards clean sustainable energy and reducing carbon emissions. Lithium-ion (Li-ion) rechargeable batteries are the most common. They are used in our phones, laptops and predominantly in EV’s since Tesla started the movement in 2008. Li-ion batteries are also used in Energy Storage Power Stations to harvest large amount of electricity generated by renewable energy sources such as Wind and Solar.
Increasing demand for Li-ion batteries has led to developments and production of more powerful, more efficient and less costly batteries.

An important component of almost all Li-ion batteries produced today is Cobalt.

Cobalt Ore

Cobalt is a scarce lustrous mineral. It plays a huge role in stabilizing Li-ion batteries and increasing their energy density. It is rare, expensive and found only in a few places on earth. Philippines happens to be one of the few.

DR Congo by far is the largest Cobalt producing country, accounting for over 60%. The Philippines is the fifth largest with 3.8% of global production. What is even more significant is having an essential energy commodity at home as the world looks to a future of clean renewable energy, moving away from dependence on coal and oil.

Photo Credit : Wikipedia Commons
Data Source: BP Statistical Review of World Energy 2020

Download PDF document
01. World Cobalt Mines Production (Thousand Tonnes) 2015 – 2019
02. Share of World Total Cobalt Production 2019

Market Capitalization vs Total Assets: San Miguel/ Tesla examples

This article is continued from our previous post: Largest companies in Philippines by market capitalization.


If you look up San Miguel Corporation (SMC) on Wikipedia, you’ll see that SMC Total Assets is stated as P1.8 trillion in 2020. Converted to USD, P1.8t/50 = $36 billion. Note from the previous chart that SMC estimated market cap value is $5.7B.
Does this mean SMC stock is undervalued by over $30B? Short answer: No.
Lets have a look at it.
Total Assets is all items of values and resources a company owns. Property, buildings, factories, equipment, inventory, cash, patent rights and so on are the assets. The assets generate income for the company. SMC values its total assets at $36B.
As SMC is listed on the stock exchange, the “Market” (financial institutions, investors, accountants, bankers, the general public) has looked at it, its total assets, future expected incomes, its liabilities, operation costs.. and having considered everything, right now at this moment will pay $5.7B to buy the company. And they do. Thousands of company shares are bought and sold daily on the stock exchange at the estimated market value of $5.7B and not $36B.

Tesla
Quick google search shows Tesla has a market capitalization of around $580 billion, and total assets of $52 billion. Does it mean the company is overvalued by over $520B? There is no short answer here. Tesla’s different.
The point here is the Market, having considered everything, although Tesla’s assets (everything of value the company owns) is just $52B, the market is willing to pay $580B to buy the company. And they do. Millions TSLA shares are traded daily on the stock market at price value of over $580B and not 52.

TLDR
Total assets is all the stuff a company owns.
Market capitalization is how much the market will pay to buy the company.

Photo credit: NICE GUYS

PSE index – Ranking Largest Companies in Philippines by Market Capitalization.

The Data
Source: pse.com.ph (data saved on 21 March 2021)

Brief description
Market Capitalization is the market value of a company.
It is calculated by multiplying the stock price by the total number of outstanding shares.

The Market Cap value for all companies listed on the Philippines Stock Exchange(PSE) is available on the official website. We’ll focus on the top 30 companies that make up the PSE index.

Calculations
– All figures have been converted from Pesos to USD. ($20 billion is an easier view compared to P1 trillion).
Conversion rate is set at $1 = P50.
The actual rate of USD/PHP is around $1 = 48.50(1.5 pesos or $0.02 difference). It’s fine. Using P50/$1 undervalues all estimated figures by approximately 3%. It gives a fair leeway to account for the fluctuations in stock prices and rounding up of the numbers to one decimal place in billions.

For the most recent updated stock prices and market cap values in pesos, visit pse.com.ph. It is updated every 15 mins.

The charts:

Summary
We all know SM, BDO, BPI are some of the largest companies in the country. It’s interesting to look at the actual figures in a chart and compare the top 30.

You can download pdf copies of the charts here:
01. Top 30 largest companies in Philippines by market capitalization. Table
02. Top 30 largest companies in Philippines by market capitalization. Bar chart

Global warming by country – tree map or world map

Hi, welcome to another blog post.
For this post on global warming, we’ll be using the same data on two different charts: the world map chart and the tree map. And we’ll see how it looks.

Brief description.
You already know what global warming is.
Earth’s temperature getting hotter (since at least 1980 when we all started paying attention). Ice caps melting, oceans rising, forests cleared, rivers and lakes drying, Australia fires . .

Causes
Global warming is caused by humans activity that leads to emission of Green House Gases(GHG). These gases trap heat in the atmosphere causing earth’s temperature to warm. 80% of GHG is Carbon dioxide (CO2). CO2 is released from heat and smoke of factories, cars and electric power plants, all burning gas or coal. Methane released from waste of industrial livestock farming (beef, chicken..) and waste (our thrash) in landfills makes up 10% of GHG. The remaining 10% is made up of Nitrous oxide and other harmful gasses.

Data Source
climatewatchdata.org.
Measurement Unit : Million metric tons of carbon dioxide. (MtCO2e)
All GHG emissions are expressed in CO2 (or carbon) equivalent.

The figures shows the amount of GHG emitted in a country during a particular year.
I have used the most recent of the historical data accessible, 2018.

Summary

From the world map chart, we easily see countries with less GHG emissions in green. Most of Northern Europe and Africa. They are of lower income/smaller economies and have less factories and cars producing GHG.
The tree map on the other hand better shows countries with the highest GHG emissions in red. Larger economies with more factories and higher GDP.

To show the actual emissions figures on all 193 countries, I embedded the vizzes directly from Tableau public. All we have to do is to hover over an area on the map and the information is highlighted.

On Climate change.. watch Home by Yann Arthus-Bertrand . .

How much is the Philippines?

Hi, for our first data visuals post, we’ll be doing a brief overview of the economy.
For context, we’ll compare it with similar South East Asia nations and some of the worlds largest economies.

The data
source : data.worldbank.org
The indicators we’ll be using are:

  • GDP
  • GDP growth rate
  • Total Population
  • Population growth rate
  • Per capita income

    Brief descriptions
  • GDP: The Annual GDP (Gross Domestic Product) is the total money value of all goods and services produced within a country in a year. It’s what the economy is worth.
  • Total Population: is the total number of everyone living in the country.
  • Per capita Income: is GDP divided by the total population. Basically, it shows the overall annual average income of everyone living in the country.

In 2020, the Philippines economy was estimated to be worth approximately $367 billion, down from $377 billion in 2019.
From 2005 to 2015, the economy grew by around $100 billion every 5 years.

As the global pandemic hits hard in 2020, the economy fell by -9.5%. In the last 30 years since 1990, Philippines has had a negative growth rate in only 3 years. 2020, 1998 and 1991.

Total population has grown from 62 million people in 1990 to 110 million people in 2020. But, there’s been a gradual fall in the growth rate. In 1990, the population grew at a rate of 2.5% . In 2020, by 1.4%. Note how the population grows by 8 million people every 5 years.

In 2020, per capita income was $3600. Converting it to PHP with USD set at 1USD = 50PHP. 3600 x 50 = 180,000 PHP.
The estimated overall average income of everyone in Philippines in 2020, was 180,000PHP.
Divided by 12 months equals 15,000PHP per month.

Of the 5 countries compared in the chart, Malaysia’s GDP was the lowest. But in terms of per capita income, it’s the highest.

The U.S is the world’s largest economy with GDP of over $20 trillion.

China the world’s second largest economy, has a lower average per capita income than Malaysia, Japan, Canada.

Summary.

The charts were created using data from world bank. We can see 15 to 30 years trend on how much Philippines economy is worth, the total number of people and how much is earned on average per year. And we compared the economy with similar south east Asia countries and some of the world’s largest economies.

Download the PDF doc

01. Philippines GDP
02. Annual growth rate of GDP
03. Total population and population growth rate.
04. Per capita income since 1990
05. Comparing Philippines GDP with similar South East Asia countries.
06. Comparing Philippines GDP with similar SE Asia countries and world.
07. Philippines Per capita income with similar South East Asia countries.
08. Philippines Per capita income with similar SE Asia countries and world.

Hello World!

Hi, my name is Fola. welcome to our first blog post.

The World

“hello world!” is what a lot of us first wrote in html code when creating our first webpage.

About us:
This website is created to share data vizzes, analysis and insights. I hope you’ll find some information here useful or at least interesting.

About me: I’ve only recently gotten into creating data visuals and analytics. I spent a few months learning Python, SQL and doing some basic data analysis in Jupyterlab using libraries like Matplotlib and Seaborn before stumbling on Tableau.
At the time writing this, its March 19, 2021. I’d say its been two months since I started using Tableau. I’ve always liked graphs, infographics and read posts from Statista and CB insights for years. Since we’ve all been spending more time alone during this pandemic, we get into new stuff. Mine is creating visuals from data.. and Star Wars.. (had passed on it until the Mandalorian).

Contact:
For enquiries, ideas and suggestions on topics we should explore or just general data stuff, feel free to send an email at “[email protected]”.

Subscribe to our mailing list so we can send newsletters to your mailbox in the future. I like reading newsletters. There’s some really good ones like “While You Were Working” and “Quartz” . but creating one.. weekly/monthly.. should be fun.

At the bottom of each blog post, there’s a link for you to download a pdf copy of the viz.

Thank you. Happy reading.

Fola