One way to access public blockchain data in Google BigQuery is to use the pandas-gbq library.

It makes it easy to run a BigQuery SQL query and download the results as a pandas DataFrame.

BigQuery authentication

To use this library, you need to authenticate with BigQuery. I found creating a 'Service account' to be the most convenient option. You can follow the steps here.
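
For reference, here is a minimal sketch of wiring a service account key into pandas-gbq; the key file path and project id below are placeholders:

from google.oauth2 import service_account
import pandas_gbq

# Load the JSON key downloaded when creating the service account
# (the path is a placeholder)
credentials = service_account.Credentials.from_service_account_file(
    'my-service-account-key.json'
)
pandas_gbq.context.credentials = credentials
pandas_gbq.context.project = 'my-project-id'  # placeholder project id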

Installation

Installation with conda is straightforward.

$ conda install pandas-gbq --channel conda-forge
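
The package is also available on PyPI, so pip works as well:

$ pip install pandas-gbq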

Basic usage

We can download a table by defining a SQL query and passing it to the read_gbq function.

import altair as alt
import pandas_gbq

As the Ethereum dataset is large, we constrain our query to a single hour.

sql = """
SELECT block_timestamp, value
FROM `bigquery-public-data.ethereum_blockchain.token_transfers`
WHERE
    EXTRACT(YEAR FROM block_timestamp) = 2019 AND
    EXTRACT(MONTH FROM block_timestamp) = 9 AND
    EXTRACT(DAY FROM block_timestamp) = 24 AND
    EXTRACT(HOUR FROM block_timestamp) = 12
"""
token = pandas_gbq.read_gbq(sql, project_id="pandas-gbq-test-290508")
Downloading: 100%|██████████| 15392/15392 [00:03<00:00, 4846.48rows/s]
token
block_timestamp value
0 2019-09-24 12:47:43+00:00 50058584428
1 2019-09-24 12:47:43+00:00 30000000
2 2019-09-24 12:47:43+00:00 14069000000
3 2019-09-24 12:47:43+00:00 170000000
4 2019-09-24 12:47:43+00:00 59478199
... ... ...
15387 2019-09-24 12:40:54+00:00 3577720000000000000000
15388 2019-09-24 12:40:54+00:00 2314760000000000000000
15389 2019-09-24 12:40:54+00:00 2399000000000000000000
15390 2019-09-24 12:40:54+00:00 5701000000000000000000
15391 2019-09-24 12:40:54+00:00 3608000000000000000000

15392 rows × 2 columns
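
As an aside, the same hour can be selected with a single timestamp range; if the table is partitioned on block_timestamp, a range predicate like this sketch can also reduce the amount of data scanned:

sql = """
SELECT block_timestamp, value
FROM `bigquery-public-data.ethereum_blockchain.token_transfers`
WHERE block_timestamp >= TIMESTAMP '2019-09-24 12:00:00'
  AND block_timestamp <  TIMESTAMP '2019-09-24 13:00:00'
"""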

Data transformation

After loading, the dataset requires a couple of transformations.

First, we sort it by timestamp.

token.dtypes
block_timestamp    datetime64[ns, UTC]
value                           object
dtype: object
token = token.sort_values('block_timestamp')

Then we convert the value column to float. It comes back as object because some token amounts are too large for a 64-bit integer.

token['value'] = token['value'].astype(float)
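
As a quick sanity check, we can compute in pandas the same per-minute averages the chart below will show; a minimal sketch:

token.set_index('block_timestamp')['value'].resample('1min').mean().head()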

Visualization

By default, Altair refuses to render datasets with more than 5,000 rows, so we disable that limit.

alt.data_transformers.disable_max_rows()
DataTransformerRegistry.enable('default')
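
Alternatively, Altair can write the data to a separate JSON file instead of embedding every row in the chart spec:

alt.data_transformers.enable('json')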

We plot the average transaction value by minute. As there are a few very high-value transactions, we use a log scale.

alt.Chart(token).mark_line().encode(
    alt.X('utchoursminutes(block_timestamp):T'),
    alt.Y('average(value):Q', scale=alt.Scale(type='log'))
).properties(width=800, title='Average token transaction values (log scale)')
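
If we assign the chart above to a variable, say chart, it can also be saved as a standalone HTML file (the filename is a placeholder):

chart.save('token_transfers.html')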