There is a treasure trove of insights out there just waiting to be discovered. These nuggets of knowledge are scattered across Open Data portals and buried in files. Once they have been dug up, they can be processed, polished and displayed.

Learn data science skills and techniques to understand how insights are discovered
While the approach to each project will vary case by case, the data science process will typically involve most (if not all) of the following stages:

COLLECT
Raw data can come in a variety of tabular, layered and geographical formats. It can be collected from a wide range of sources including Open Data portals, webpages and live streams.
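As a minimal sketch of collection, the snippet below parses one tabular and one geographical format using only the Python standard library. The in-memory strings are made-up stand-ins for files that would normally be downloaded from a portal (for example with `urllib.request.urlopen`); the station names, values and coordinates are purely illustrative.

```python
import csv
import io
import json

# Made-up CSV text standing in for a downloaded tabular file.
csv_text = "station,rainfall_mm\nA,640\nB,910\n"

# Made-up GeoJSON text standing in for a downloaded geographical file.
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-1.5, 53.8]},
            "properties": {"station": "A"},
        }
    ],
})

# Tabular data: each row becomes a dict keyed by the header line.
table_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Geographical data: GeoJSON is ordinary JSON, so json.loads is enough to read it.
features = json.loads(geojson_text)["features"]
```

Note that `csv.DictReader` returns every value as a string, so numeric columns still need converting in a later stage.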
CLEAN
Like the real world, real data can be messy. For example, it can contain gaps, typos and duplicates. Cleaning aims to produce a dataset that is consistent, complete and representative.
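The three problems named above can be illustrated with a small sketch. The records, the `CORRECTIONS` lookup and the `clean` helper are all hypothetical, and dropping rows with gaps is only one possible strategy (filling them in from another source is another).

```python
# Made-up records showing a duplicate, a gap and a typo.
raw_rows = [
    {"city": "Leeds", "visitors": "100"},
    {"city": "leeds ", "visitors": "100"},  # duplicate with inconsistent formatting
    {"city": "York", "visitors": ""},       # gap: missing value
    {"city": "Yrok", "visitors": "200"},    # typo in the city name
]

# Assumed lookup of known misspellings; in practice this needs domain knowledge.
CORRECTIONS = {"yrok": "york"}


def clean(rows):
    """Normalise names, fix known typos, drop rows with gaps, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        city = row["city"].strip().lower()
        city = CORRECTIONS.get(city, city)
        if not row["visitors"]:
            continue  # skip rows with a missing value
        record = (city.title(), int(row["visitors"]))
        if record in seen:
            continue  # skip exact duplicates of a row already kept
        seen.add(record)
        cleaned.append({"city": record[0], "visitors": record[1]})
    return cleaned


cleaned_rows = clean(raw_rows)
```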
EXPLORE
All datasets are different. By gaining familiarity with each one, relevant variables can be identified. Relationships and trends can be flagged for further investigation or as candidates for modelling.
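One simple way to get familiar with a dataset is to compute summary statistics for a variable and check how strongly two variables move together. The sketch below uses made-up numbers and a hand-written `pearson` helper (an illustrative name) to compute the Pearson correlation coefficient.

```python
import statistics

# Made-up observations for two variables.
temperature = [12.0, 15.0, 18.0, 21.0, 24.0]
sales = [20.0, 26.0, 31.0, 37.0, 41.0]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Summary statistics for a single variable.
summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),
}

# A value close to 1 suggests a strong positive linear relationship worth investigating.
r = pearson(temperature, sales)
```

A high correlation only flags a relationship for further investigation; it does not by itself show that one variable causes the other.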
PREPARE
The data must be in the right format for the next stages. Achieving this may involve selecting and combining relevant variables, filtering values and ordering them. Datasets can also be joined for a broader analysis.
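The operations mentioned above (filtering, selecting, joining and ordering) can be sketched with plain Python structures. Both datasets and their field names are made up for illustration; a dataframe library would usually do the same job on larger data.

```python
# Made-up datasets that share a "station_id" key.
readings = [
    {"station_id": "A", "year": 2020, "rainfall_mm": 640, "notes": "ok"},
    {"station_id": "B", "year": 2020, "rainfall_mm": 910, "notes": "ok"},
    {"station_id": "C", "year": 2019, "rainfall_mm": 720, "notes": "late"},
    {"station_id": "D", "year": 2020, "rainfall_mm": 580, "notes": "ok"},
]
regions = {"A": "North", "B": "West", "D": "South"}

# Filter: keep only the year of interest.
recent = [r for r in readings if r["year"] == 2020]

# Select and join: keep the relevant variables and add the region from the second dataset.
prepared = [
    {
        "station_id": r["station_id"],
        "region": regions[r["station_id"]],
        "rainfall_mm": r["rainfall_mm"],
    }
    for r in recent
    if r["station_id"] in regions
]

# Order: highest rainfall first.
prepared.sort(key=lambda row: row["rainfall_mm"], reverse=True)
```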
COMPUTE
Algorithms can process the refined data, quantifying trends and patterns to generate insights or make predictions. Cloud computing offers the scalability to process very large volumes of data.
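As one small example of quantifying a trend and making a prediction, the sketch below fits a straight line by ordinary least squares and extrapolates it one step ahead. The data and the `fit_line` helper are made up for illustration; real projects would choose an algorithm suited to the data.

```python
# Made-up yearly values with a steady upward trend.
years = [2018, 2019, 2020, 2021, 2022]
values = [10.0, 12.0, 14.0, 16.0, 18.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Quantify the trend (change per year), then predict the next year.
slope, intercept = fit_line(years, values)
forecast_2023 = slope * 2023 + intercept
```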