Bulk Data
Ingesting bulk data for a data identifier to the local filesystem
This notebook walks through pulling data for a given dataset ID.
Prerequisites
carbonarc SDK and other dependencies installed in a local Python environment:
python3.10 -m venv .venv
source .venv/bin/activate
pip install git+https://github.com/Carbon-Arc/carbonarc
pip install python-dotenv
Setup
- Create a .env file in the same directory as this notebook, or export the API_AUTH_TOKEN environment variable.
- If using .env, add the following line to it:
API_AUTH_TOKEN=<api auth token from https://platform.carbonarc.co/profile>
1 Import required dependencies
import os
from datetime import datetime
from dotenv import load_dotenv
from carbonarc import CarbonArcClient
# Load environment variables
load_dotenv()
API_AUTH_TOKEN = os.getenv("API_AUTH_TOKEN")
# Create API client
client = CarbonArcClient(API_AUTH_TOKEN)
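If load_dotenv() does not find your .env file, API_AUTH_TOKEN will be None and every API call will fail to authenticate. A minimal sanity check (not part of the SDK) before using the client:

# Fail fast if the token was not loaded from .env or the environment
if not API_AUTH_TOKEN:
    raise RuntimeError("API_AUTH_TOKEN is not set; see the Setup section above.")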
2 List available datasets
datasets = client.data.get_datasets()
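To find the dataset ID used in the next steps, you can inspect the response; the exact structure may vary by SDK version, so this is just a quick look:

# Print the catalog of available datasets and pick out the dataset ID you need
print(datasets)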
3 Get information for a given dataset
dataset = client.data.get_dataset_information(dataset_id="CA0000")  # Insert your dataset ID, e.g. CA0028 (Card - US Detailed Panel)
4 Fetch data manifest for incremental ingestion
You typically only want to fetch data created since the last ingestion:
last_ingest_time = datetime.now().strftime('%Y-%m-%dT%H:%M:%S')  # placeholder: use the timestamp of your previous ingestion in a real pipeline
print(last_ingest_time)
manifest = client.data.get_data_manifest(
    dataset_id="CA0000",  # Insert your dataset ID, e.g. CA0028 (Card - US Detailed Panel)
    created_since=last_ingest_time
)
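datetime.now() above is just a placeholder. In a real incremental pipeline you would persist the timestamp of each successful run and read it back on the next one. A minimal sketch of that bookkeeping, assuming a local state file (the file name and helper functions are illustrative, not part of the SDK):

import json
from pathlib import Path

STATE_FILE = Path("ingest_state.json")  # hypothetical local state file

def load_last_ingest_time(default: str) -> str:
    # Return the timestamp recorded by the previous run, or a default for the first run
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())["last_ingest_time"]
    return default

def save_last_ingest_time(timestamp: str) -> None:
    # Record this run's timestamp so the next run only fetches newer files
    STATE_FILE.write_text(json.dumps({"last_ingest_time": timestamp}))

# Fetch everything created since the last run, then record the current time
last_ingest_time = load_last_ingest_time(default="1970-01-01T00:00:00")
manifest = client.data.get_data_manifest(
    dataset_id="CA0000",
    created_since=last_ingest_time
)
save_last_ingest_time(datetime.now().strftime('%Y-%m-%dT%H:%M:%S'))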
Manifest file structure
Each entry in the manifest's datasources list generally looks like this:
{
    "url": "https://storage.carbonarc.co/path/to/file.parquet",
    "format": "parquet",
    "records": 1000,
    "size_bytes": 123456789,
    "modification_time": "2025-04-15T23:04:44",
    "price": 123.45
}
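Because each entry carries price, size_bytes, and records, you can summarize the manifest before committing to a purchase. A small sketch, assuming manifest["datasources"] is the list of entries shown above:

# Summarize volume and cost of the files in the manifest before buying
datasources = manifest["datasources"]
total_records = sum(d["records"] for d in datasources)
total_bytes = sum(d["size_bytes"] for d in datasources)
total_price = sum(d["price"] for d in datasources)
print(f"{len(datasources)} files, {total_records} records, "
      f"{total_bytes / 1e9:.2f} GB, estimated price {total_price:.2f}")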
5 Purchase the data files in the manifest
file_urls = [x["url"] for x in manifest["datasources"]]
order = client.data.buy_data(
    dataset_id="CA0000",  # Insert your dataset ID, e.g. CA0028 (Card - US Detailed Panel)
    file_urls=file_urls
)
6 Download the purchased files
client.data.download_file(
    file_id=order["files"][0],  # download the first purchased file
    directory="./"
)
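The call above downloads only the first purchased file. Assuming order["files"] lists all purchased file IDs, you can loop over it to pull every file into a target directory:

# Download every purchased file into a local directory
for file_id in order["files"]:
    client.data.download_file(
        file_id=file_id,
        directory="./bulk_data"  # hypothetical target directory
    )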