A Comprehensive Health Data Collection

Easy access to nearly every health dataset for the US and California
Civic Knowledge collects public datasets, cleans and links them, and loads them into databases and file repositories so users can begin analysis immediately. The data can be used directly by websites, in popular statistical software or in Excel, eliminating the most expensive parts of most data projects: finding and preparing data for use.

In early 2015, we will release a database that contains nearly every significant dataset related to health for California and the US.

Features

Broad Coverage

550 measures in 146 topics from 90 government, nonprofit and university sources, with ten sources from the State of California. See this spreadsheet for a list of the datasets and their sources.

Full-text search

Easily search through every table, column, column description, data dictionary and related document to identify required datasets.

Flexible access

Download data in the most popular, useful formats (CSV, SAS or SPSS files), or connect directly to a SQL or NoSQL database. There’s an access method to fit any workflow.

Tailored pricing

Several pricing and access options are tailored for grant-funded research, long-term research, or indicator websites. The search system can be used for free to find public source datasets, and many licensed datasets are available for free.

Fully Documented

A comprehensive documentation system links to original source websites, data dictionaries, and complete schemas for all tables and columns. Every document associated with every dataset is available in one spot on the web.

Thoroughly Linked

Every dataset has an associated crosswalk that links it to everything it can be joined with, including census geographies, local area boundaries and associated datasets. Where possible, auxiliary datasets provide pre-computed population densities, rates and ratios.
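
As a rough illustration of what a crosswalk makes possible, the Python sketch below joins a small measure table to a crosswalk table and computes a rate per 1,000 residents. The file contents, column names (zip, county_fips, population, asthma_visits) and the link_and_rate helper are hypothetical and invented for this example; the actual crosswalk schema may differ.

```python
import csv
import io

# Hypothetical sample data; real column names and codes may differ.
MEASURE_CSV = """zip,asthma_visits
95814,120
95816,80
95818,50
"""

CROSSWALK_CSV = """zip,county_fips,population
95814,06067,10000
95816,06067,16000
95818,06067,20000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def link_and_rate(measure_rows, crosswalk_rows, key="zip"):
    """Join measure rows to crosswalk rows on `key` and add a rate per 1,000."""
    crosswalk = {row[key]: row for row in crosswalk_rows}
    linked = []
    for row in measure_rows:
        match = crosswalk.get(row[key])
        if match is None:
            continue  # skip rows with no crosswalk entry
        visits = int(row["asthma_visits"])
        population = int(match["population"])
        linked.append({
            "zip": row[key],
            "county_fips": match["county_fips"],
            "visits": visits,
            "population": population,
            "rate_per_1000": round(1000 * visits / population, 2),
        })
    return linked


linked = link_and_rate(read_rows(MEASURE_CSV), read_rows(CROSSWALK_CSV))
for record in linked:
    print(record["zip"], record["county_fips"], record["rate_per_1000"])
```

Codes are kept as strings so that leading zeros in identifiers such as county FIPS codes survive the join.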

Users can choose from several access methods to suit their needs:

  • Files: Download tabular data as files in CSV, SAS, SPSS and Stata formats. Download geographic data as Shapefiles, KML or GeoJSON.
  • SQL: Direct access to a SQL database.
  • NoSQL: Direct access to a MongoDB NoSQL database.
  • Web: Fetch results of a query as JSON with a web request.
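
To give a feel for the SQL and Web access methods listed above, here is a minimal Python sketch that runs a query and serializes the result as JSON. An in-memory SQLite database stands in for the hosted database, and the table name (measures), its columns and the query_as_json helper are assumptions made for this example, not the real schema or API.

```python
import json
import sqlite3

# In-memory SQLite stands in for the hosted SQL database (an assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be converted to dicts by column name

# Hypothetical table and sample values, for illustration only.
conn.execute("CREATE TABLE measures (county TEXT, measure TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measures VALUES (?, ?, ?)",
    [
        ("Sacramento", "asthma_rate", 5.4),
        ("San Diego", "asthma_rate", 4.1),
        ("Sacramento", "obesity_rate", 24.0),
    ],
)


def query_as_json(connection, measure):
    """Return all rows for one measure as a JSON array of objects."""
    rows = connection.execute(
        "SELECT county, value FROM measures WHERE measure = ? ORDER BY county",
        (measure,),
    ).fetchall()
    return json.dumps([dict(row) for row in rows])


result = query_as_json(conn, "asthma_rate")
print(result)
```

A web client would receive a similar JSON payload from a request instead of building it locally; the parameterized query keeps user-supplied values out of the SQL text.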

The database system includes a full-text search engine that allows users to quickly locate datasets based on metadata, data dictionaries, documentation and associated web pages. Users can add datasets to collections, export the collections to databases, files or Web APIs, and share collections with others. The system holds the original data along with common computed values: population rates, ratios, aggregations to larger areas, estimated counts for smaller areas, and links to the census and other associated datasets.

Civic Knowledge expects to begin releasing beta versions of the system for early access use and testing in February 2015. If you are interested in using the database, contact Eric Busboom at eric@civicknowledge.com for more information or to be added to the beta list.