Find and Know Your Data
by Jack Dougherty, last updated March 1, 2017
Searching for Open Data
Increasing numbers of governmental agencies and non-profit organizations are publicly sharing open data on the web. When starting a new data visualization project, ask yourself these questions:
- Do I have the most relevant data for my project?
- Is it the most current data, in the most user-friendly format?
- Is data available at the individual level, or aggregated into larger groups?
- Which organizations might have collected data for my topic?
- Which open data repositories might have published this data?
What features do open repositories offer?
- View and export: At minimum, most open data repositories allow users to view their data and export it into common spreadsheet formats. Some also provide geographical boundaries for polygon maps.
- Built-in visualization tools: Some repositories offer built-in tools for users to create interactive charts or maps on the platform site. Some also provide code snippets for users to embed these built-in visualizations into their own websites.
- Static and live data: Most repositories offer static datasets for a specific time period, but some also provide “live” data that is continuously updated.
- Application Programming Interfaces (APIs): Some repositories provide endpoints with code instructions that allow users to pull data directly from the platform into an external site or online visualization, which is ideal for continuously updated data.
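To make the API feature more concrete, here is a minimal Python sketch of pulling rows from a repository endpoint. It assumes the repository serves a CSV endpoint that accepts a row-limit query parameter. The domain `data.example.gov`, the dataset ID `abcd-1234`, the `/resource/` path, and the `$limit` parameter name are all placeholders for illustration, so check the API documentation of the platform you are actually using.

```python
import csv
import io
import urllib.parse
import urllib.request


def build_endpoint_url(domain, dataset_id, limit=100):
    """Build a CSV endpoint URL.

    The "/resource/" path and "$limit" parameter are assumptions for this
    sketch; real platforms document their own URL patterns.
    """
    query = urllib.parse.urlencode({"$limit": limit})
    return f"https://{domain}/resource/{dataset_id}.csv?{query}"


def parse_csv_text(text):
    """Turn CSV text into a list of dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_rows(url, timeout=30):
    """Download the endpoint and parse it. Requires a network connection."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return parse_csv_text(text)
```

Calling `fetch_rows(build_endpoint_url("data.example.gov", "abcd-1234"))` each time a script or page runs would return whatever the repository currently publishes, which is why an API suits continuously updated data better than a one-time export.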
Know Your Data
Before starting to create charts or maps, get to know your data.
- Where did it come from?
- Who compiled the data, and for what purpose?
- What do the data labels really mean?
- Ask yourself: Am I working with the most recent version, in the best available format?
TO DO: add resource, the Quartz guide to bad data: https://github.com/Quartz/bad-data-guide
Open Data Inception, a portal of 1600+ open data sites: http://opendatainception.io/
- Know your data: go out into the field to directly observe how the original data is measured and collected
Closely examine your data files to understand their meaning, origins, and limitations.
TO DO: expand on this theme with examples of bad and misleading data
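One way to start examining a data file is to profile it before charting anything. The Python sketch below counts rows, lists the column labels, counts blank cells per column, and tallies the distinct values in one column. The sample rows are made up for illustration, and the function names `profile` and `distinct_values` are hypothetical helpers, not part of any library.

```python
import csv
import io
from collections import Counter

# Made-up sample data with two common problems:
# a blank cell and an inconsistently capitalized town name.
SAMPLE_CSV = """town,year,enrollment
Hartford,2015,21000
hartford,2015,
West Hartford,2014,9800
"""


def profile(csv_text):
    """Summarize row count, column labels, and blank cells per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    blanks = {
        col: sum(1 for row in rows if not (row[col] or "").strip())
        for col in columns
    }
    return {"row_count": len(rows), "columns": columns, "blanks": blanks}


def distinct_values(csv_text, column):
    """Count how often each exact value appears in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


summary = profile(SAMPLE_CSV)
towns = distinct_values(SAMPLE_CSV, "town")
print(summary)
print(towns)
```

In this made-up sample, the profile reveals one blank `enrollment` cell, and the tally shows `Hartford` and `hartford` counted as separate values, the kind of limitation worth knowing about before you visualize the data.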
- Always ask: Am I using the best available data?
- Compare the HFS list to the City of Hartford’s current list of food establishments: https://data.hartford.gov/browse
- go to the Public Health category
- click on the “dataset” version (updated 10 Feb 2016), which shows the same data as the “map” version in a different view
- click on the light blue “export” button, export the data in any format you wish, and compare it with the HFS list (see screenshot)
- decide which list is best for your organization’s goal
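The comparison in the steps above can also be sketched in Python once both lists are saved as CSV files. This is only an illustration under stated assumptions: the sample rows are made up, the column labels `name` and `Name` are guesses at how each file might label establishments, and matching on trimmed, lowercased names is a simple stand-in for whatever matching rule fits your data.

```python
import csv
import io

# Made-up sample rows standing in for the two exported files.
HFS_CSV = """name,address
Main Street Deli,10 Main St
Park Cafe,22 Park Ave
"""
CITY_CSV = """Name,Address
MAIN STREET DELI ,10 Main St
River Grill,5 River Rd
"""


def normalized_names(csv_text, column):
    """Return the set of names in one column, trimmed and lowercased."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip().lower() for row in reader if row[column].strip()}


hfs = normalized_names(HFS_CSV, "name")
city = normalized_names(CITY_CSV, "Name")

# Set operations show where the two lists agree and where they differ.
print("On both lists:", sorted(hfs & city))
print("Only on HFS list:", sorted(hfs - city))
print("Only on city list:", sorted(city - hfs))
```

Names that appear on only one list are the ones to investigate: they may reflect a newer update, a different definition of what counts as a food establishment, or simply a spelling difference that the normalization did not catch.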