Arcfeed Docs
arcfeed.finance

BigQuery

Arcfeed provides direct access to historical trade data via Google BigQuery. Data is written to the dataservice dataset in real time — you query it with standard SQL using your own GCP project and pay only for your own BigQuery compute. No ETL pipelines, no data exports, no intermediate storage to manage.

BigQuery access is available at $349/month per GCP project (flat fee). Contact hello@arcfeed.finance to get started.

Table reference

| Field | Value |
| --- | --- |
| Dataset | `dataservice` |
| Table | `trade_v1` |
| Full reference | `your_project.dataservice.trade_v1` |
| Partitioning | DAY on `publish_time` |

Schema

| Column | Type | Description |
| --- | --- | --- |
| `price` | STRING | Trade price as a decimal string (e.g. `"42000.50"`) — full precision. Cast to BIGNUMERIC for arithmetic. |
| `quantity` | STRING | Trade quantity as a decimal string (e.g. `"0.00123"`) — full precision. Cast to BIGNUMERIC for arithmetic. |
| `timestamp` | BIGNUMERIC | Trade timestamp in Unix nanoseconds. Convert with `TIMESTAMP_MICROS(CAST(timestamp / 1000 AS INT64))`. |
| `subscription_name` | STRING | Pub/Sub subscription name. Internal — not needed for most queries. |
| `message_id` | STRING | Pub/Sub message ID. Stable across redeliveries — use for deduplication. |
| `publish_time` | TIMESTAMP | Partition column. When the message was published to Pub/Sub. Always filter by this column to avoid full table scans. |
| `attributes` | JSON | Pub/Sub message attributes. Contains `source` (exchange) and `pair` (trading pair). Extract with `JSON_VALUE(attributes, '$.source')`. |
**Note:** `price` and `quantity` are STRING columns. Cast to BIGNUMERIC before doing any arithmetic — not FLOAT64, which loses precision on financial values.
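A quick illustration of the precision point, in Python, with `float` standing in for FLOAT64 and `decimal.Decimal` for BIGNUMERIC (the sample values are made up, matching the schema examples):

```python
from decimal import Decimal

# FLOAT64-style: binary floats cannot represent 0.1 exactly, so sums drift.
float_total = sum([0.1] * 10)              # not exactly 1.0

# BIGNUMERIC-style: exact decimal arithmetic on the stored string values.
decimal_total = sum(Decimal("0.1") for _ in range(10))

# Notional value of one trade, computed exactly from the string columns.
notional = Decimal("42000.50") * Decimal("0.00123")
```

The float sum lands a hair below 1.0, while the `Decimal` sum is exactly 1 — the same gap BIGNUMERIC closes in BigQuery.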

SQL examples

Basic query with partition filter

Always include a publish_time filter. Without it, BigQuery scans every partition in the table.

```sql
-- Fetch recent BTC-USDT trades from Binance
-- Always filter by publish_time to use partition pruning
SELECT
  JSON_VALUE(attributes, '$.source')                    AS source,
  JSON_VALUE(attributes, '$.pair')                      AS pair,
  price,
  quantity,
  TIMESTAMP_MICROS(CAST(timestamp / 1000 AS INT64))     AS trade_time,
  publish_time
FROM `your_project.dataservice.trade_v1`
WHERE
  publish_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND JSON_VALUE(attributes, '$.source') = 'binance'
  AND JSON_VALUE(attributes, '$.pair')   = 'BTC-USDT'
ORDER BY publish_time DESC
LIMIT 1000;
```

1-minute OHLCV candles

Aggregate trades into OHLCV bars. timestamp is nanoseconds (BIGNUMERIC) — divide by 1000 before passing to TIMESTAMP_MICROS().

```sql
-- 1-minute OHLCV candles for BTC-USDT from Binance
SELECT
  TIMESTAMP_TRUNC(
    TIMESTAMP_MICROS(CAST(timestamp / 1000 AS INT64)),
    MINUTE
  )                                                     AS candle_open_time,
  ARRAY_AGG(CAST(price AS BIGNUMERIC)
            ORDER BY timestamp ASC LIMIT 1)[OFFSET(0)]  AS open,
  MAX(CAST(price AS BIGNUMERIC))                        AS high,
  MIN(CAST(price AS BIGNUMERIC))                        AS low,
  ARRAY_AGG(CAST(price AS BIGNUMERIC)
            ORDER BY timestamp DESC LIMIT 1)[OFFSET(0)] AS close,
  SUM(CAST(quantity AS BIGNUMERIC))                     AS volume,
  COUNT(*)                                              AS trade_count
FROM `your_project.dataservice.trade_v1`
WHERE
  publish_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND JSON_VALUE(attributes, '$.source') = 'binance'
  AND JSON_VALUE(attributes, '$.pair')   = 'BTC-USDT'
GROUP BY candle_open_time
ORDER BY candle_open_time;
```
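The minute bucketing can be mirrored client-side. A toy Python sketch with made-up trades, using `Decimal` in place of BIGNUMERIC and raw nanosecond arithmetic in place of TIMESTAMP_TRUNC:

```python
from decimal import Decimal

NS_PER_MIN = 60 * 1_000_000_000   # nanoseconds in one minute

# Hypothetical (timestamp_ns, price) rows, already filtered to one pair.
trades = [
    (1699999990_000000000, "42000.50"),
    (1700000010_000000000, "42050.00"),
    (1700000050_000000000, "41990.00"),
]

candles = {}
for ns, price in sorted(trades):
    bucket = ns - ns % NS_PER_MIN          # minute-open timestamp, in ns
    candles.setdefault(bucket, []).append(Decimal(price))

# open/high/low/close per bucket, taken from the time-ordered prices
ohlc = {b: (p[0], max(p), min(p), p[-1]) for b, p in candles.items()}
```

The first two trades fall in the same minute and collapse into one candle; the third opens the next.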

VWAP calculation

Volume-weighted average price across exchanges. Cast both price and quantity to BIGNUMERIC to preserve precision in the product.

```sql
-- VWAP for BTC-USDT across all sources for the last 24 hours
SELECT
  JSON_VALUE(attributes, '$.source')                              AS source,
  SUM(CAST(price AS BIGNUMERIC) * CAST(quantity AS BIGNUMERIC))
    / SUM(CAST(quantity AS BIGNUMERIC))                           AS vwap,
  SUM(CAST(quantity AS BIGNUMERIC))                               AS total_volume,
  COUNT(*)                                                        AS trade_count
FROM `your_project.dataservice.trade_v1`
WHERE
  publish_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND JSON_VALUE(attributes, '$.pair') = 'BTC-USDT'
GROUP BY source
ORDER BY total_volume DESC;
```
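The formula itself, sum(price × quantity) / sum(quantity), is easy to sanity-check off-line. A Python sketch with invented trades, using `Decimal` for exact arithmetic:

```python
from decimal import Decimal

# Invented (price, quantity) pairs for one source, as decimal strings.
trades = [("42000.50", "0.5"), ("42100.00", "0.25"), ("41900.00", "0.25")]

notional = sum(Decimal(p) * Decimal(q) for p, q in trades)   # sum of price*qty
volume   = sum(Decimal(q) for _, q in trades)                # sum of qty
vwap     = notional / volume
```

With these numbers the quantities sum to 1, so the VWAP is just the quantity-weighted mean of the three prices.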

Trade count by exchange and pair

Summarise activity across all exchanges and pairs for a given time window.

```sql
-- Trade count and volume by exchange and pair for the last hour
SELECT
  JSON_VALUE(attributes, '$.source')  AS source,
  JSON_VALUE(attributes, '$.pair')    AS pair,
  COUNT(*)                            AS trade_count,
  SUM(CAST(quantity AS BIGNUMERIC))   AS total_quantity
FROM `your_project.dataservice.trade_v1`
WHERE
  publish_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY source, pair
ORDER BY trade_count DESC
LIMIT 50;
```

Deduplication via message_id

Pub/Sub delivers messages at least once. Use message_id to deduplicate if your analysis requires exactly-once semantics.

```sql
-- Deduplicate using message_id
-- Pub/Sub guarantees at-least-once delivery; message_id is stable across redeliveries
SELECT
  message_id,
  JSON_VALUE(attributes, '$.source')                    AS source,
  JSON_VALUE(attributes, '$.pair')                      AS pair,
  price,
  quantity,
  TIMESTAMP_MICROS(CAST(timestamp / 1000 AS INT64))     AS trade_time
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY message_id ORDER BY publish_time) AS rn
  FROM `your_project.dataservice.trade_v1`
  WHERE
    publish_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    AND JSON_VALUE(attributes, '$.pair') = 'ETH-USDT'
)
WHERE rn = 1
ORDER BY trade_time DESC;
```
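The `ROW_NUMBER()` pattern keeps the earliest-published copy of each `message_id`. The same idea in Python, over a few hypothetical duplicated rows:

```python
# Hypothetical rows with one redelivered message (same message_id twice).
rows = [
    {"message_id": "m1", "publish_time": 2, "price": "42000.50"},
    {"message_id": "m1", "publish_time": 1, "price": "42000.50"},
    {"message_id": "m2", "publish_time": 3, "price": "42001.00"},
]

# Sort by publish_time, then keep the first row seen per message_id,
# which matches ROW_NUMBER() ... ORDER BY publish_time with rn = 1.
first = {}
for row in sorted(rows, key=lambda r: r["publish_time"]):
    first.setdefault(row["message_id"], row)

deduped = sorted(first.values(), key=lambda r: r["publish_time"])
```

The redelivered `m1` row is dropped, leaving one row per message.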

Cost tips

  • Always filter by publish_time. It is the partition column. A query without a publish_time filter scans the entire table.
  • Cast to BIGNUMERIC, not FLOAT64. Financial calculations require exact decimal arithmetic. FLOAT64 introduces rounding errors on values like 0.1.
  • timestamp is nanoseconds. Divide by 1000 before calling TIMESTAMP_MICROS() — the function expects microseconds, not nanoseconds.
  • JSON_VALUE() is safe in WHERE. Partition pruning is decided by the publish_time predicate alone, so an additional filter on attributes does not prevent it.
  • BigQuery's on-demand pricing includes 1 TB of processed query data free each month. Most analytical queries on a single day's partition cost cents. Use the BigQuery query validator to preview bytes scanned before running large queries.
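The nanosecond-to-microsecond point is easy to get wrong, so here is the conversion sketched in Python with an illustrative timestamp:

```python
from datetime import datetime, timedelta, timezone

ns = 1700000000123456789          # example trade timestamp in nanoseconds

# TIMESTAMP_MICROS() expects microseconds: drop the last three
# (sub-microsecond) digits. BigQuery's CAST(timestamp / 1000 AS INT64)
# rounds instead of truncating, a difference of at most 1 microsecond.
us = ns // 1000

# Build the UTC datetime from the microsecond count without going
# through floating point, so no precision is lost.
trade_time = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=us)
```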

Access setup

Access is granted as roles/bigquery.dataViewer on the dataservice dataset. This allows your GCP project to query the table directly — no data is copied to your project.

To request access, email hello@arcfeed.finance with your GCP project ID. Access is provisioned within one business day.

Once access is granted, replace your_project in the examples above with your GCP project ID. No additional configuration is required — query the table directly from BigQuery Studio, bq CLI, or any GCP data tool.