Historical API

Sometimes it is preferable to retrieve all history or a daily update feed instead of directly querying the dataset. This is often the case for clients who use the API strictly to download a daily feed.

Dataset History List Endpoint

To retrieve a list of available history for a dataset:

GET /datasets/<DATASET_ID>/history/ HTTP/1.1
Authorization: token 01234567890123456789
X-API-Version: 20151130
Accept: application/json

curl -L "https://data.thinknum.com/datasets/<DATASET_ID>/history/" \
     -H 'Accept: application/json' \
     -H 'Authorization: token 01234567890123456789' \
     -H 'X-API-Version: 20151130'
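The same request can also be built from Python. The following is a minimal sketch using only the standard library, not an official client: `build_history_list_request` and `fetch_json` are hypothetical helper names, and the host, headers, and placeholder token are copied from the cURL example above.

```python
import json
import urllib.request

BASE_URL = "https://data.thinknum.com"  # host taken from the cURL example


def build_history_list_request(dataset_id, token):
    """Build (but do not send) the history list request for one dataset."""
    url = f"{BASE_URL}/datasets/{dataset_id}/history/"
    headers = {
        "Accept": "application/json",
        "Authorization": f"token {token}",
        "X-API-Version": "20151130",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `fetch_json(build_history_list_request("<DATASET_ID>", "<TOKEN>"))` would then return the parsed response as a dictionary.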

For example, for the "Traction" dataset you would get a response similar to the following, indicating that 4 days of daily history and 2 months of monthly history are available for download:

{
  "total": 6,
  "id": "traction",
  "history": [
    "2019-01-04",
    "2019-01-03",
    "2019-01-02",
    "2019-01-01",
    "2018-12",
    "2018-11"
  ]
}

At the beginning of every month, the last month's daily history is combined into a single monthly file.
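Because the history list mixes daily and monthly entries, a client usually needs to tell them apart before deciding what to download. The sketch below assumes the two formats shown in the sample response ("YYYY-MM-DD" for days, "YYYY-MM" for months); `classify_history` is a hypothetical helper name.

```python
from datetime import datetime


def classify_history(entries):
    """Split history entries into (daily, monthly) lists, preserving order."""
    daily, monthly = [], []
    for entry in entries:
        try:
            datetime.strptime(entry, "%Y-%m-%d")
            daily.append(entry)
        except ValueError:
            # Not a daily entry; raises ValueError if it is not monthly either.
            datetime.strptime(entry, "%Y-%m")
            monthly.append(entry)
    return daily, monthly


history = ["2019-01-04", "2019-01-03", "2019-01-02", "2019-01-01", "2018-12", "2018-11"]
daily, monthly = classify_history(history)
```

With the sample response above, `daily` holds the four January dates and `monthly` holds "2018-12" and "2018-11".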

Dataset History Download Endpoint

Once you have identified the dataset and historical day/month you're interested in, you can view the metadata for the historical file:

GET /datasets/<DATASET_ID>/history/<HISTORY_DATE> HTTP/1.1
Authorization: token 01234567890123456789
X-API-Version: 20151130
Accept: application/json

curl -L "https://data.thinknum.com/datasets/<DATASET_ID>/history/<HISTORY_DATE>" \
     -H 'Accept: application/json' \
     -H 'Authorization: token 01234567890123456789' \
     -H 'X-API-Version: 20151130'

For example, the metadata for the "2018-12" historical file for the "Traction" dataset would have a response similar to the following, indicating that there are 3,464,374 rows in the historical file:

{
  "date_updated": "2018-12",
  "status": 200,
  "total": 3464374,
  "id": "traction"
}

To download the historical data as CSV, change the "Accept" header to "text/csv":

GET /datasets/<DATASET_ID>/history/<HISTORY_DATE> HTTP/1.1
Authorization: token 01234567890123456789
X-API-Version: 20151130
Accept: text/csv

curl -L "https://data.thinknum.com/datasets/<DATASET_ID>/history/<HISTORY_DATE>" \
     -H 'Accept: text/csv' \
     -H 'Authorization: token 01234567890123456789' \
     -H 'X-API-Version: 20151130' \
     -o '<HISTORY_DATE>.csv'

Gzip compression is also available to speed up the download and reduce bandwidth usage. Enable it with the standard HTTP "Accept-Encoding" header:

GET /datasets/<DATASET_ID>/history/<HISTORY_DATE> HTTP/1.1
Authorization: token 01234567890123456789
X-API-Version: 20151130
Accept: text/csv
Accept-Encoding: gzip

curl -L "https://data.thinknum.com/datasets/<DATASET_ID>/history/<HISTORY_DATE>" \
     -H 'Accept: text/csv' \
     -H 'Accept-Encoding: gzip' \
     -H 'Authorization: token 01234567890123456789' \
     -H 'X-API-Version: 20151130' \
     -o '<HISTORY_DATE>.csv' \
     --compressed

Warning: If you use cURL to download historical files, you must use version 7.58.0 or later because of the bug described in CVE-2018-1000007.
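cURL's `--compressed` flag decompresses the response transparently, but Python's `urllib.request` does not, so a Python client that sends "Accept-Encoding: gzip" has to decompress the body itself. The following is a minimal sketch under two assumptions: the server reports compression through the standard "Content-Encoding" response header, and the CSV is UTF-8 encoded. `decode_csv_body` is a hypothetical helper name.

```python
import gzip


def decode_csv_body(body, content_encoding=None):
    """Return CSV text from raw response bytes, gunzipping when needed."""
    if content_encoding and content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")  # assumption: the file is UTF-8 encoded
```

With `urllib.request`, the arguments would typically be `response.read()` and `response.headers.get("Content-Encoding")`.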

History CSV Data Format

The data for historical updates is provided in standard CSV format with a header row.

Each record contains a unique identifier column, allowing you to sync your datastore with any additions or updates in the Thinknum dataset.
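As an illustration of syncing by identifier, the sketch below upserts each CSV record into an in-memory dictionary and counts additions and updates. The column name "id" is an assumption made for the example, since the actual identifier column is not named here; `apply_history` is a hypothetical helper, and a real datastore would replace the dictionary.

```python
import csv
import io


def apply_history(store, csv_text, key_column="id"):
    """Upsert each CSV record into `store`, a dict keyed by the identifier.

    Returns a tuple (added, updated). `key_column` is an assumed column name.
    """
    added = updated = 0
    reader = csv.DictReader(
        io.StringIO(csv_text),
        delimiter=',',
        quotechar='"',
        escapechar='\\',
    )
    for record in reader:
        key = record[key_column]
        if key in store:
            updated += 1
        else:
            added += 1
        store[key] = record  # the newest version of a record wins
    return added, updated
```

Applying history files in chronological order keeps the latest version of each record in the store.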

How to Parse History CSV Using Python

Once you have downloaded a history CSV file, you can parse it with Python.

To parse the file with the "csv" module from the Python standard library:

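import csv

# Replace the path with the location of your downloaded history file.
# newline='' lets the csv module handle line endings inside quoted fields.
with open('/mnt/.../job-listings.csv', 'r', newline='') as handle:
    reader = csv.reader(
        handle,
        delimiter=',',
        quotechar='"',
        escapechar='\\',
    )
    headers = next(reader)  # first row holds the column names
    rows = list(reader)     # remaining rows are the data records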

To parse the file with pandas, first install the "pandas" library, which is not part of the Python standard library:

import pandas as pd
df = pd.read_csv(
    '/mnt/.../job-listings.csv',
    delimiter=',',
    quotechar='"',
    escapechar='\\'
)