# Bulk Data – Getting Started
This guide walks you through authenticating, retrieving, and downloading bulk data files from Carbon Arc using the Python SDK.
## Prerequisites
Make sure you have:
- Python 3.10+
- The `carbonarc` SDK and `python-dotenv` installed
Set up your environment:
```bash
python3.10 -m venv .venv
source .venv/bin/activate
pip install git+https://github.com/Carbon-Arc/carbonarc
pip install python-dotenv
```
## Environment Setup
1. Create a `.env` file in your working directory **or** export `API_AUTH_TOKEN` in your environment.
```bash
# .env file
API_AUTH_TOKEN=<your API auth token from https://platform.carbonarc.co/profile>
```
## Example Code
```python
# Import required dependencies
import os
from datetime import datetime

from dotenv import load_dotenv
from carbonarc import CarbonArcClient

# Load variables from the .env file into the environment
load_dotenv()

# Read in environment variables
API_AUTH_TOKEN = os.getenv("API_AUTH_TOKEN")

# Create API client
client = CarbonArcClient(API_AUTH_TOKEN)
```
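`os.getenv` returns `None` when the variable is missing, so an unset token would be passed to the client as `None`. A minimal sketch of failing early instead, assuming a hypothetical helper named `require_env` (it is not part of the `carbonarc` SDK):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell."
        )
    return value
```

With this helper, `API_AUTH_TOKEN = require_env("API_AUTH_TOKEN")` could stand in for the plain `os.getenv` call, placed after `load_dotenv()`.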
### List available datasets
```python
# List datasets
datasets = client.data.get_datasets()
```
### Select a data identifier (example)
Retrieve information for a given dataset ID.
```python
# Get information for the given dataset
dataset = client.data.get_dataset_information(dataset_id="CA0000")
```
### Retrieve files created since the last ingestion
```python
# Requesting files created since the last ingestion requires the last ingestion time.
# datetime.now() is a placeholder here; use the timestamp of your previous ingestion.
last_ingest_time = datetime.now().strftime('%Y-%m-%dT%H:%M:%S')
print(last_ingest_time)
manifest = client.data.get_data_manifest(dataset_id="CA0000", created_since=last_ingest_time)
```
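Because `created_since` needs the time of the previous ingestion, that timestamp has to survive between runs. One possible approach, sketched here with two hypothetical helpers (`load_last_ingest_time` and `save_last_ingest_time`, neither of which is part of the SDK), is to keep it in a small text file using the same timestamp format as the example:

```python
from datetime import datetime
from pathlib import Path

# Same timestamp format as the created_since example
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def load_last_ingest_time(path: Path, default: str) -> str:
    """Return the stored timestamp, or `default` if no run has been recorded."""
    if path.exists():
        stored = path.read_text().strip()
        # Validate the stored value; raises ValueError if the file is corrupted.
        datetime.strptime(stored, TIME_FORMAT)
        return stored
    return default


def save_last_ingest_time(path: Path, when: datetime) -> str:
    """Write the timestamp for the next run and return the formatted string."""
    formatted = when.strftime(TIME_FORMAT)
    path.write_text(formatted)
    return formatted
```

A run would load the stored value, pass it as `created_since`, and save a new timestamp once the files have been processed. Capturing the new timestamp before requesting the manifest avoids skipping files created while the run is in progress. The time zone the API expects is not stated in this guide, so the sketch keeps the naive local time used in the example.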
### Manifest file structure
Each file entry in the manifest has the following structure:
```python
{
    'url': 'link',
    'format': 'parquet',
    'records': 1000,
    'size_bytes': 123456789,
    'modification_time': '2025-04-15T23:04:44',
    'price': 123.45,
}
```
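Since each entry carries `records`, `size_bytes`, and `price`, it can be useful to total them before buying anything. A minimal sketch, assuming the entries form a list of dictionaries shaped like the structure above; `summarize_manifest` is a hypothetical helper and the sample values are made up for illustration:

```python
from typing import Any


def summarize_manifest(entries: list[dict[str, Any]]) -> dict[str, Any]:
    """Total the file count, records, size, and price across manifest entries."""
    return {
        "files": len(entries),
        "records": sum(e["records"] for e in entries),
        "size_bytes": sum(e["size_bytes"] for e in entries),
        "price": round(sum(e["price"] for e in entries), 2),
    }


# Hypothetical sample entries shaped like the manifest structure
sample = [
    {"url": "link-1", "format": "parquet", "records": 1000,
     "size_bytes": 2048, "modification_time": "2025-04-15T23:04:44", "price": 10.5},
    {"url": "link-2", "format": "parquet", "records": 500,
     "size_bytes": 1024, "modification_time": "2025-04-16T01:00:00", "price": 4.25},
]

print(summarize_manifest(sample))
# → {'files': 2, 'records': 1500, 'size_bytes': 3072, 'price': 14.75}
```

With a real manifest, the list to pass in would presumably be `manifest['datasources']`, the same list the file URLs are read from.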
### Buy the data
Select the file URLs from the manifest and buy the data.
```python
# Pick the file URLs from the manifest
file_urls = [x['url'] for x in manifest['datasources']]
# Buy the manifest files
order = client.data.buy_data(dataset_id="CA0000", file_urls=file_urls)
```
Download a purchased file:
```python
# Download the first purchased file to the current directory
client.data.download_file(file_id=order['files'][0], directory="./")
```
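The download example fetches only the first file in the order. To fetch every purchased file, the same call can be repeated in a loop. A sketch under the assumption that `order['files']` is a list of file IDs, as the `[0]` index suggests; `download_all` is a hypothetical helper, not an SDK function:

```python
from typing import Any


def download_all(client: Any, order: dict[str, Any], directory: str = "./") -> int:
    """Download every file listed in the order and return how many were requested."""
    file_ids = order.get("files", [])
    for file_id in file_ids:
        # Same call as the single-file example, repeated for each file ID.
        client.data.download_file(file_id=file_id, directory=directory)
    return len(file_ids)
```

Calling `download_all(client, order)` after `buy_data` would place all purchased files in the current directory. Error handling and retries are left out of this sketch.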