Bulk Data – Getting Started

This guide walks you through authenticating, retrieving, and downloading bulk data files from Carbon Arc using the Python SDK.


Prerequisites

Make sure you have:

  • Python 3.10+
  • The carbonarc SDK and python-dotenv installed

Set up your environment:

```bash
python3.10 -m venv .venv
source .venv/bin/activate
pip install git+https://github.com/Carbon-Arc/carbonarc
pip install python-dotenv
```

## Environment Setup

1. Create a `.env` file in your working directory **or** export `API_AUTH_TOKEN` in your environment.

```bash
# .env file
API_AUTH_TOKEN=<your API auth token from https://platform.carbonarc.co/profile>
```

Example Code

```python
# Import required dependencies
import os
from datetime import datetime
from dotenv import load_dotenv
from carbonarc import CarbonArcClient

# Load variables from the .env file, if one exists
load_dotenv()

# Read in environment variables
API_AUTH_TOKEN = os.getenv("API_AUTH_TOKEN")

# Create API client
client = CarbonArcClient(API_AUTH_TOKEN)
```
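
`os.getenv` returns `None` when the variable is not set, so a missing or misspelled entry in `.env` may only surface later, when the client is first used. A minimal sketch of a fail-fast check using only the standard library; `require_token` is a hypothetical helper for illustration and is not part of the carbonarc SDK:

```python
import os
from typing import Mapping, Optional


def require_token(env: Optional[Mapping[str, str]] = None,
                  name: str = "API_AUTH_TOKEN") -> str:
    """Return the token stored under `name`, or raise if it is missing or blank.

    `require_token` is a hypothetical helper, not a carbonarc function.
    """
    source = os.environ if env is None else env
    token = (source.get(name) or "").strip()
    if not token:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it in your shell."
        )
    return token
```

Called after `load_dotenv()`, the result can be passed to the client in place of the raw `os.getenv` value, for example `CarbonArcClient(require_token())`.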

List available datasets

```python
# List datasets
datasets = client.data.get_datasets()
```

Select a data identifier (example)

Retrieve information for a given dataset ID.

```python
# Get information for the given dataset
dataset = client.data.get_dataset_information(dataset_id="CA0000")
```

Retrieve files created since the last ingestion

```python
# Request files created since your last ingestion; this requires the last ingestion time.
# datetime.now() is only a placeholder here: substitute the time of your previous ingestion.
last_ingest_time = datetime.now().strftime('%Y-%m-%dT%H:%M:%S')
print(last_ingest_time)

manifest = client.data.get_data_manifest(dataset_id="CA0000", created_since=last_ingest_time)
```
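
The `created_since` argument is built from the last ingestion time, so that timestamp has to survive between runs. One possible approach, sketched below, is to keep it in a small text file in the same `%Y-%m-%dT%H:%M:%S` format used above. The helper names and the checkpoint file are assumptions for illustration and are not part of the carbonarc SDK:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Same format string as the strftime call in this guide
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def save_checkpoint(path: Path, when: datetime) -> str:
    """Write `when` to `path` as a formatted timestamp and return the string."""
    stamp = when.strftime(TIME_FORMAT)
    path.write_text(stamp, encoding="utf-8")
    return stamp


def load_checkpoint(path: Path) -> Optional[str]:
    """Return the stored timestamp, or None if no checkpoint exists yet."""
    if not path.exists():
        return None
    stamp = path.read_text(encoding="utf-8").strip()
    # Raises ValueError if the file does not hold a valid timestamp
    datetime.strptime(stamp, TIME_FORMAT)
    return stamp
```

A run would call `load_checkpoint` first, pass the result as `created_since`, and call `save_checkpoint` only after the files have been ingested successfully, so a failed run does not skip data.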

Manifest file structure

Each file entry in the manifest has the following structure:

```python
{
    'url': 'link',
    'format': 'parquet',
    'records': 1000,
    'size_bytes': 123456789,
    'modification_time': '2025-04-15T23:04:44',
    'price': 123.45,
}
```
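
Since each entry carries `records`, `size_bytes`, and `price`, it can be useful to total them before buying anything. The sketch below assumes the entries are plain dictionaries with the fields shown above; `summarize_manifest` is a hypothetical helper, and the sample values are made up for illustration:

```python
from typing import Any, Dict, Iterable


def summarize_manifest(entries: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Total the file count, records, bytes, and price across manifest entries.

    Missing fields are treated as zero. Hypothetical helper, not part of the SDK.
    """
    files = list(entries)
    return {
        "files": len(files),
        "records": sum(e.get("records", 0) for e in files),
        "size_bytes": sum(e.get("size_bytes", 0) for e in files),
        "price": round(sum(e.get("price", 0.0) for e in files), 2),
    }


# Illustrative entries only; real values come from the manifest
sample_entries = [
    {"url": "link-1", "format": "parquet", "records": 1000,
     "size_bytes": 2048, "price": 123.45},
    {"url": "link-2", "format": "parquet", "records": 500,
     "size_bytes": 1024, "price": 10.05},
]
print(summarize_manifest(sample_entries))
```

With a real manifest, the list of file entries would be passed in place of `sample_entries`.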

Buy the data

Select the files from the manifest and buy the data.

```python
# Collect the file URLs from the manifest
file_urls = [x['url'] for x in manifest['datasources']]

# Buy the manifest files
order = client.data.buy_data(dataset_id="CA0000", file_urls=file_urls)

# Download the first purchased file to the current directory
client.data.download_file(file_id=order['files'][0], directory="./")
```
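
The download call above fetches only the first entry in `order['files']`. When an order covers several files, a loop over the list is one way to fetch them all. This is a sketch under two assumptions: that `order['files']` is a list of file IDs, and that `download_file` accepts the `file_id` and `directory` keyword arguments used in this guide. `download_all` is a hypothetical helper, not part of the SDK:

```python
from typing import Any, Dict, List


def download_all(client: Any, order: Dict[str, Any], directory: str = "./") -> List[Any]:
    """Download every file in the order and return the IDs that were requested.

    Assumes order["files"] is a list of file IDs (hypothetical helper).
    """
    downloaded = []
    for file_id in order.get("files", []):
        # Same call as in the guide, repeated once per file
        client.data.download_file(file_id=file_id, directory=directory)
        downloaded.append(file_id)
    return downloaded
```

Calling `download_all(client, order)` with the objects created earlier would request each file in turn; an exception from any single download stops the loop, so the caller can decide whether to retry.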