Introduction to Web Content
Carbon Arc’s Web Content offering grants users access to a set of open-source data feeds available for purchase and analysis through the Carbon Arc Hub. These feeds capture a broad range of publicly available online data, including content from company websites, blogs, product listings, and more.
What’s Included
Web Content feeds typically include structured, time-stamped data scraped from public web sources. Depending on the feed, this may include:
- Product descriptions and metadata
- SaaS ecosystem metadata
- Ecommerce product information and SKUs
- Real estate listings
- Automotive auction site metadata and listings
How to Access Web Content Feeds
- Browse Feeds in the Hub: Navigate to the Web Content section of the Carbon Arc Hub to explore the available feeds and associated metadata.
- Purchase a Feed: Each feed can be purchased using tokens directly in the Hub UI. Users can review historical coverage, pricing, and the file manifest before purchasing.
- Programmatic Access (Optional): Developers and data scientists can use the Carbon Arc Python SDK to query, download, or stream web content feeds programmatically.
Need Help?
For pricing details, purchase assistance, or help selecting the right feed, contact our team at sales@carbonarc.co.
Getting Started with Web Content
This guide walks you through installing the Carbon Arc Python SDK, authenticating, and pulling web content data using the HubAPIClient.
Step 1: Install the Carbon Arc SDK
pip install carbonarc
Requires Python 3.8+
Step 2: Authenticate with an API Token
You’ll need your Carbon Arc API token. If you don’t have one, you can find it in the User Portal under the Developers tab. The example below reads the token from an environment variable named TOKEN.
from carbonarc.hub import HubAPIClient
import os
host = "https://api.carbonarc.co"
TOKEN = os.getenv("TOKEN")
client = HubAPIClient(token=TOKEN, host=host)
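Because `os.getenv` returns `None` when the variable is unset, a missing token would otherwise only surface later as an authentication failure. The helper below is a minimal sketch of failing early instead; `load_token` is a hypothetical name and is not part of the Carbon Arc SDK.

```python
import os


def load_token(var_name="TOKEN", env=None):
    """Return the API token from the environment, or raise a clear error.

    `load_token` is a hypothetical helper, not an SDK function.
    `env` defaults to os.environ and can be replaced with a dict for testing.
    """
    env = os.environ if env is None else env
    token = (env.get(var_name) or "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set or is empty; "
            "copy your API token from the Developers tab of the User Portal."
        )
    return token
```

The result can then be passed to the client in place of the raw environment lookup, for example `HubAPIClient(token=load_token(), host=host)`.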
Step 3: List All Available Web Content Feeds
Print a list of your subscribed web content feeds:
feeds = client.get_subscribed_feeds()
print("Subscribed Feeds:")
for feed in feeds:
    print(f"- {feed['webcontent_name']} (id: {feed['webcontent_id']})")
Or check all available feed names and IDs:
feeds = client.get_webcontent_feeds()
# Print available feed names and IDs
for feed in feeds["feeds"]:
    print(f"{feed['id']}: {feed['name']}")
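When the list of feeds is long, it can help to search it by name rather than scan the printed output. The sketch below assumes the response has the shape used in this step (a dict with a "feeds" list whose entries carry "id" and "name" keys); `find_feeds` is a hypothetical helper, not an SDK function.

```python
def find_feeds(feeds_response, name_fragment):
    """Return (id, name) pairs whose name contains `name_fragment`.

    Matching is case-insensitive. Assumes `feeds_response` looks like
    {"feeds": [{"id": ..., "name": ...}, ...]}, as in the loop in this step.
    """
    needle = name_fragment.casefold()
    return [
        (feed["id"], feed["name"])
        for feed in feeds_response.get("feeds", [])
        if needle in feed["name"].casefold()
    ]
```

For example, `find_feeds(client.get_webcontent_feeds(), "zillow")` would return the matching IDs, which can then be passed as `webcontent_id` when requesting a manifest.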
Step 4: Pull a Web Content Manifest
Print the manifest that shows all available files for a given feed.
# 6 is the ID of the Zillow Web Content feed, used here for illustration
manifest = client.get_webcontent_manifest(webcontent_id=6)
print(manifest)
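Printing the whole manifest can be noisy for large feeds. The sketch below pulls out just the file names, assuming the manifest is a dict with a "files" list whose entries have a "file_name" key (the shape used in the download step of this guide). `manifest_file_names` is a hypothetical helper, and any other manifest fields are not assumed.

```python
def manifest_file_names(manifest, contains=None):
    """List file names from a manifest, optionally filtered by a substring.

    Assumes manifest == {"files": [{"file_name": ...}, ...]}.
    """
    names = [entry["file_name"] for entry in manifest.get("files", [])]
    if contains is not None:
        names = [name for name in names if contains in name]
    return names
```

The function reference also lists an optional `webcontent_date` argument on `get_webcontent_manifest` for server-side filtering by date; its expected format is not documented here, so the substring filter above is only a client-side convenience.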
Step 5: Download Files
# Download a specific file (replace "insert file" with a file name from the manifest)
client.download_webcontent_file("insert file", directory="./data")
# Or download all files in a feed, using the manifest from Step 4
for file in manifest["files"]:
    client.download_webcontent_file(file["file_name"], directory="./data")
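Re-running a full download fetches every file again. A minimal sketch of skipping files that are already on disk follows; it assumes that `download_webcontent_file` saves each file under its base name inside `directory`, which this guide does not confirm. `download_missing` is a hypothetical helper that accepts any client object exposing that method.

```python
from pathlib import Path


def download_missing(client, manifest, directory="./data"):
    """Download only the manifest files not already present in `directory`.

    Returns (downloaded, skipped) lists of file names.
    Assumption: the SDK writes each file as <directory>/<base file name>.
    """
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    downloaded, skipped = [], []
    for entry in manifest.get("files", []):
        name = entry["file_name"]
        if (target / Path(name).name).exists():
            skipped.append(name)
            continue
        client.download_webcontent_file(name, directory=str(target))
        downloaded.append(name)
    return downloaded, skipped
```

Calling `download_missing(client, manifest)` twice in a row should download everything the first time and skip everything the second, provided the file-naming assumption holds.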
Available Web Content Function Reference
The following functions are available in the latest version of the SDK.
Function | Description |
---|---|
get_webcontent_feeds() | Retrieve all available web content feeds (regardless of subscription status). |
get_subscribed_feeds() | Retrieve only the feeds you've subscribed to. |
get_webcontent_manifest(webcontent_id, webcontent_date=None) | Retrieve the list of files (manifest) for a given feed ID, optionally filtered by date. |
get_webcontent_dataframe(webcontent_name) | Return a tabular DataFrame version of a feed by name (if supported). |
get_webcontent_file(file_name) | Retrieve raw JSON contents of a file from a feed by filename. |
download_webcontent_file(file_name, directory="./") | Download a JSON file from a feed to a local folder. |
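Since `download_webcontent_file` is described as writing JSON files to a local folder, the downloaded files can be read back with the standard library. This is a sketch under two assumptions that are not confirmed above: each file is a single JSON document (not newline-delimited JSON), and files carry a `.json` extension. `load_downloaded_json` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_downloaded_json(directory="./data"):
    """Parse every *.json file in `directory`.

    Returns a dict mapping file name to parsed content.
    Assumes each file holds one complete JSON document.
    """
    loaded = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[path.name] = json.load(handle)
    return loaded
```

If a feed supports it, `get_webcontent_dataframe(webcontent_name)` may be a more direct route to tabular data than parsing files by hand.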
Contact us at support@carbonarc.co if you have any questions!