Introduction to Web Content

Carbon Arc’s Web Content offering grants users access to a set of open-source data feeds available for purchase and analysis through the Carbon Arc Hub. These feeds capture a broad range of publicly available online data, including content from company websites, blogs, product listings, and more.

What’s Included

Web Content feeds typically include structured, time-stamped data scraped from public web sources. Depending on the feed, this may include:

  • Product descriptions and metadata
  • SaaS ecosystem metadata
  • Ecommerce product information and SKUs
  • Real estate listings
  • Automotive auction site metadata and listings

How to Access Web Content Feeds

  1. Browse Feeds in the Hub:
    Navigate to the Web Content section of the Carbon Arc Hub to explore the available feeds and associated metadata.

  2. Purchase a Feed:
    Each feed can be purchased using tokens directly in the Hub UI. Users can review historical coverage, pricing, and the file manifest before purchasing.

  3. Programmatic Access (Optional):
    Developers and data scientists can use the Carbon Arc Python SDK to query, download, or stream web content feeds programmatically.

Need Help?

For pricing details, purchase assistance, or help selecting the right feed, contact our team at sales@carbonarc.co.

Getting Started with Web Content

This guide walks you through installing the Carbon Arc Python SDK, authenticating, and pulling web content data using the HubAPIClient.

Step 1: Install the Carbon Arc SDK

pip install carbonarc

Requires Python 3.8+

Step 2: Authenticate with an API Token

You’ll need your Carbon Arc API token. If you don’t have one, you can get it by navigating to the User Portal and clicking the Developers tab.

from carbonarc.hub import HubAPIClient
import os

host = "https://api.carbonarc.co"
TOKEN = os.getenv("TOKEN")
client = HubAPIClient(token=TOKEN, host=host)
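
Because os.getenv returns None when the variable is unset, it can help to fail early with a clear message before creating the client. The following is a minimal sketch; read_token is a hypothetical helper and not part of the Carbon Arc SDK, and the variable name TOKEN simply follows the snippet above.

```python
import os


def read_token(env_var: str = "TOKEN") -> str:
    """Return the API token from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of the Carbon Arc SDK.
    """
    token = os.getenv(env_var)
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var!r} is not set; "
            "export your Carbon Arc API token before creating the client."
        )
    return token
```

The returned value can then be passed as token= when constructing HubAPIClient.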

Step 3: List All Available Web Content Feeds

Print a list of your subscribed web content feeds:

feeds = client.get_subscribed_feeds()
print("Subscribed Feeds:")
for feed in feeds:
    print(f"- {feed['webcontent_name']} (id: {feed['webcontent_id']})")

Or check all available feed names and IDs:

feeds = client.get_webcontent_feeds()

# Print available feed names and IDs
for feed in feeds["feeds"]:
    print(f"{feed['id']}: {feed['name']}")
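
To look up a feed ID by name instead of reading through the printed list, a small lookup over the same response can be sketched as follows. It assumes only the shape used in the loop above (a "feeds" list whose entries carry "id" and "name"); find_feed_id and the sample data are hypothetical and not part of the SDK.

```python
from typing import Any, Dict, Optional


def find_feed_id(feeds_response: Dict[str, Any], name_fragment: str) -> Optional[Any]:
    """Return the id of the first feed whose name contains `name_fragment`.

    Matching is case-insensitive; returns None when nothing matches.
    """
    needle = name_fragment.lower()
    for feed in feeds_response.get("feeds", []):
        if needle in str(feed.get("name", "")).lower():
            return feed.get("id")
    return None


# Hypothetical sample shaped like the response iterated above.
sample = {
    "feeds": [
        {"id": 1, "name": "Example Feed A"},
        {"id": 2, "name": "Example Feed B"},
    ]
}
print(find_feed_id(sample, "feed b"))  # prints 2
```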


Step 4: Pull a Web Content Manifest

Print the manifest that shows all available files for a given feed.

manifest = client.get_webcontent_manifest(webcontent_id=6)  # 6 is the Web Content feed for Zillow, used for illustration
print(manifest)
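
Rather than printing the whole manifest, you may only want the file names. The sketch below assumes the manifest is a dict with a "files" list whose entries have a "file_name" key, which is the shape the download loop in Step 5 relies on; manifest_file_names and the sample manifest are hypothetical.

```python
from typing import Any, Dict, List


def manifest_file_names(manifest: Dict[str, Any]) -> List[str]:
    """Collect the file_name of every entry under the manifest's "files" key."""
    return [
        entry["file_name"]
        for entry in manifest.get("files", [])
        if "file_name" in entry
    ]


# Hypothetical manifest for illustration only.
sample_manifest = {
    "files": [
        {"file_name": "example_part_1.json"},
        {"file_name": "example_part_2.json"},
    ]
}
for name in manifest_file_names(sample_manifest):
    print(name)
```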


Step 5: Download Files

# Download a specific file (replace "insert file" with a file name from the manifest)
client.download_webcontent_file("insert file", directory="./data")

# Or download all files in a feed
for file in manifest["files"]:
    client.download_webcontent_file(file["file_name"], directory="./data")
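
When re-running a download, it can be useful to skip files that are already on disk. This is a minimal sketch under one stated assumption: that download_webcontent_file saves each file into the target directory under its own file name. If the SDK names files differently, the existence check will not match. download_missing is a hypothetical helper, not an SDK function.

```python
import os
from typing import Any, Dict, List


def download_missing(client: Any, manifest: Dict[str, Any], directory: str = "./data") -> List[str]:
    """Download only the manifest files not already present in `directory`.

    Assumes the SDK writes each file to `directory` under its own file name.
    Returns the list of file names that were requested in this run.
    """
    os.makedirs(directory, exist_ok=True)
    downloaded = []
    for entry in manifest.get("files", []):
        file_name = entry["file_name"]
        target = os.path.join(directory, os.path.basename(file_name))
        if os.path.exists(target):
            continue  # already downloaded in an earlier run
        client.download_webcontent_file(file_name, directory=directory)
        downloaded.append(file_name)
    return downloaded
```

Called as download_missing(client, manifest), it returns the names fetched in that run, so an empty list means everything was already present.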

Available Web Content Function Reference

Here are the functions available in the latest version.

Function | Description
get_webcontent_feeds() | Retrieve all available web content feeds (regardless of subscription status).
get_subscribed_feeds() | Retrieve only the feeds you've subscribed to.
get_webcontent_manifest(webcontent_id, webcontent_date=None) | Retrieve the list of files (manifest) for a given feed ID, optionally filtered by date.
get_webcontent_dataframe(webcontent_name) | Return a tabular DataFrame version of a feed by name (if supported).
get_webcontent_file(file_name) | Retrieve raw JSON contents of a file from a feed by filename.
download_webcontent_file(file_name, directory="./") | Download a JSON file from a feed to a local folder.
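
Since downloaded files are described as JSON, they can be read back with the Python standard library once they are on disk. The sketch below assumes the files carry a .json extension and live in a single directory; load_downloaded_json is a hypothetical helper, and the structure inside each file depends on the feed.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_downloaded_json(directory: str = "./data") -> Dict[str, Any]:
    """Parse every *.json file in `directory` and key the results by file name."""
    loaded = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open("r", encoding="utf-8") as handle:
            loaded[path.name] = json.load(handle)
    return loaded
```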

Contact us at support@carbonarc.co if you have any questions!