Bulk Data - Example Credit Card EU
Step-by-Step Guide: Accessing Bulk Data via Carbon Arc SDK
This tutorial walks through authenticating, retrieving, purchasing, and downloading bulk data using the Carbon Arc Python SDK.
Prerequisites
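Ensure you have:
- Python 3.10+
- The carbonarc and python-dotenv packages (python-dotenv provides the load_dotenv import used below), installed with:
pip install --upgrade carbonarc python-dotenv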
1. Authenticate and Initialize the Client
Create a .env file containing your API token:
API_AUTH_TOKEN=<your API token from https://platform.carbonarc.co/profile>
Then, initialize the client:
import os
from datetime import datetime
from dotenv import load_dotenv
from carbonarc import CarbonArcClient
load_dotenv()
API_AUTH_TOKEN = os.getenv("API_AUTH_TOKEN")
ca = CarbonArcClient(API_AUTH_TOKEN)
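If the .env file is missing or the variable is misspelled, os.getenv returns None and the failure only surfaces later, when the first API call is made. A minimal sketch of a fail-fast check follows; the helper name require_env is hypothetical and not part of the Carbon Arc SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or blank."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell."
        )
    return value


# Usage, after load_dotenv() has run (hypothetical helper, not an SDK call):
# API_AUTH_TOKEN = require_env("API_AUTH_TOKEN")
# ca = CarbonArcClient(API_AUTH_TOKEN)
```

Raising early keeps a configuration mistake from being confused with an authentication error from the API.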
2. List Available Datasets
Use the client to fetch all datasets:
datasets = ca.data.get_datasets()
datasets
This returns a list of dataset metadata including IDs and descriptions.
3. Retrieve Dataset Metadata
Select a dataset and fetch details:
dataset = ca.data.get_dataset_information(dataset_id="CA0042")
dataset
CA0042 is the dataset ID for Credit Card EU data.
The response confirms the dataset's schema, pricing, and availability.
4. Fetch the Manifest (Created or Updated Files)
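Record the current time as an ingest timestamp, then fetch the manifest of created or updated files. The call below passes only the dataset ID, so the timestamp is printed for reference and is not applied as a filter: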
from datetime import datetime
last_ingest_time = datetime.now().strftime('%Y-%m-%dT%H:%M:%S')
print(last_ingest_time)
manifest = ca.data.get_data_manifest(dataset_id="CA0042")
manifest
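An ingest timestamp is only useful on the next run if it is stored somewhere between runs. A minimal sketch of one way to do that, assuming a hypothetical local checkpoint file; the file name and both helper names are illustrative and not part of the Carbon Arc SDK. It uses the same timestamp format as this step.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Same format string used for last_ingest_time in this step.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def save_checkpoint(path: Path, when: datetime) -> None:
    """Write the ingest time to a small text file."""
    path.write_text(when.strftime(TIMESTAMP_FORMAT), encoding="utf-8")


def load_checkpoint(path: Path) -> Optional[datetime]:
    """Read the previous ingest time, or return None on the first run."""
    if not path.exists():
        return None
    text = path.read_text(encoding="utf-8").strip()
    return datetime.strptime(text, TIMESTAMP_FORMAT)


# Usage (hypothetical file name):
# checkpoint = Path("last_ingest.txt")
# previous = load_checkpoint(checkpoint)   # None on the first run
# ... fetch the manifest and process files ...
# save_checkpoint(checkpoint, datetime.now())
```

Saving the checkpoint only after the files have been processed means a failed run is retried from the older timestamp rather than skipped.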
5. Extract File URLs
From the manifest response, extract the list of downloadable URLs:
file_urls = [x['url'] for x in manifest['datasources']]
file_urls
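The list comprehension above raises a KeyError if the manifest has no datasources key or if an entry lacks a url. A slightly more defensive sketch is shown below, assuming only the manifest shape used in this step (a datasources list of dicts with a url field). The helper name and the sample manifest are made up for illustration.

```python
from typing import Any, Dict, List


def extract_file_urls(manifest: Dict[str, Any]) -> List[str]:
    """Collect unique, non-empty URLs from manifest['datasources'], keeping order."""
    urls: List[str] = []
    seen = set()
    for source in manifest.get("datasources", []):
        url = source.get("url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up manifest for illustration only; real responses may contain more fields.
sample_manifest = {
    "datasources": [
        {"url": "https://example.com/files/part-0001.parquet"},
        {"url": "https://example.com/files/part-0002.parquet"},
        {"url": "https://example.com/files/part-0001.parquet"},  # duplicate entry
        {"name": "entry-without-url"},  # skipped: no url field
    ]
}
print(extract_file_urls(sample_manifest))
```

An empty result is worth checking before purchasing, since it usually means there is nothing new to buy.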
6. Purchase Files
Pass the file URLs and dataset ID into the buy_data method:
order = ca.data.buy_data(dataset_id="CA0042", file_urls=file_urls)
print(order.keys())
The returned order contains the authorized file URLs under the file_urls key, which the download step uses.
7. Download the File
Download the first authorized file to the current directory:
ca.data.download_file(file_id=order['file_urls'][0], directory="./")
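The call above downloads only the first file. To fetch every file in the order, the same call can be wrapped in a loop. The sketch below takes the download step as a callable so that one failed file does not stop the batch; download_all is a hypothetical helper, not an SDK method, and the commented usage assumes the download_file signature shown in this step.

```python
from typing import Callable, Dict, List


def download_all(
    urls: List[str],
    download: Callable[[str], object],
) -> Dict[str, List[str]]:
    """Call `download` for each URL and report which ones succeeded or failed."""
    results: Dict[str, List[str]] = {"ok": [], "failed": []}
    for url in urls:
        try:
            download(url)
            results["ok"].append(url)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            print(f"Download failed for {url}: {exc}")
            results["failed"].append(url)
    return results


# Usage with the client and order from this guide:
# summary = download_all(
#     order["file_urls"],
#     lambda url: ca.data.download_file(file_id=url, directory="./"),
# )
# print(len(summary["ok"]), "downloaded,", len(summary["failed"]), "failed")
```

The failed list can be passed back into download_all for a retry.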
8. Confirm File Download
(Optional) Check recently saved files in your local directory.
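One way to run this check from Python is to list files in the download directory by modification time. This is a standard-library sketch; the one-hour window and the helper name recent_files are arbitrary choices for illustration.

```python
import time
from pathlib import Path
from typing import List


def recent_files(directory: str = "./", within_seconds: float = 3600) -> List[Path]:
    """Return files in `directory` modified within the window, newest first."""
    cutoff = time.time() - within_seconds
    files = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime >= cutoff
    ]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


# Print name and size for files saved to the current directory in the last hour.
for path in recent_files("./"):
    print(path.name, path.stat().st_size, "bytes")
```

A file size of zero bytes is a quick sign that a download did not complete.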