Accessing Ethereum token transaction data with ``pandas-gbq``
One way to access blockchain data stored in Google BigQuery is the pandas-gbq
library.
It makes it easy to run a BigQuery SQL query and download the results as a pandas DataFrame.
To use the library, you first need to authenticate with BigQuery. I found creating a 'Service account' to be the most convenient option. You can follow the steps here.
Installation with conda is straightforward.
$ conda install pandas-gbq --channel conda-forge
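Once the service-account key file is downloaded, the credentials can be registered with pandas-gbq up front via its context object. A minimal sketch, where the key path and project id are placeholders for your own values:
from google.oauth2 import service_account
import pandas_gbq

# Load the key file downloaded from the Google Cloud console
# (placeholder path).
credentials = service_account.Credentials.from_service_account_file(
    "path/to/service-account-key.json"
)
# Register credentials and project once; read_gbq picks them up.
pandas_gbq.context.credentials = credentials
pandas_gbq.context.project = "your-project-id"  # placeholder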
We can download a table by defining an SQL query and passing it to the read_gbq
function.
import altair as alt
import pandas_gbq
Because the Ethereum dataset is large, we constrain our query to a single hour.
sql = """
SELECT block_timestamp, value
FROM `bigquery-public-data.ethereum_blockchain.token_transfers`
WHERE
    EXTRACT(YEAR FROM block_timestamp) = 2019 AND
    EXTRACT(MONTH FROM block_timestamp) = 9 AND
    EXTRACT(DAY FROM block_timestamp) = 24 AND
    EXTRACT(HOUR FROM block_timestamp) = 12
"""
token = pandas_gbq.read_gbq(sql, project_id="pandas-gbq-test-290508")
token
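As an aside, the four EXTRACT clauses are easy to read, but a plain range filter on block_timestamp selects the same hour and, if the table is date-partitioned on block_timestamp, lets BigQuery prune partitions and scan less data. A sketch (sql_range is just an illustrative name):
sql_range = """
SELECT block_timestamp, value
FROM `bigquery-public-data.ethereum_blockchain.token_transfers`
WHERE
    block_timestamp >= TIMESTAMP '2019-09-24 12:00:00' AND
    block_timestamp < TIMESTAMP '2019-09-24 13:00:00'
"""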
The downloaded dataset needs a couple of transformations. A quick look at the dtypes shows what we are working with.
token.dtypes
First, we sort the rows by timestamp.
token = token.sort_values('block_timestamp')
Then we convert the value column to float. It arrives as a plain object (string) column, since raw token amounts, denominated in each token's smallest unit, can exceed the range of a 64-bit integer.
token['value'] = token['value'].astype(float)
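If any entry failed to parse, astype(float) would raise an error. A more forgiving alternative is pandas' to_numeric with errors='coerce', which turns unparseable entries into NaN instead:
import pandas as pd

# Coerce unparseable token values to NaN rather than raising.
token['value'] = pd.to_numeric(token['value'], errors='coerce')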
By default, Altair refuses to render datasets with more than 5,000 rows, so we disable that limit.
alt.data_transformers.disable_max_rows()
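Alternatively, Altair's json data transformer writes the data to a separate file and references it from the chart specification, which sidesteps the row limit without embedding everything inline:
# Serialize the data to a .json file in the working directory
# instead of embedding it in the chart spec.
alt.data_transformers.enable('json')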
We plot the average transaction value per minute. Since a handful of transfers have extremely high values, we use a log scale.
alt.Chart(token).mark_line().encode(
    alt.X('utchoursminutes(block_timestamp):T'),
    alt.Y('average(value):Q', scale=alt.Scale(type='log'))
).properties(width=800, title='Average token transaction values (log scale)')
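The same per-minute averages can be reproduced in pandas, which is handy for checking the numbers behind the chart. A quick sketch, relying on read_gbq having parsed block_timestamp as a datetime column (per_minute is a name introduced here for illustration):
# Mean transfer value per minute, mirroring the chart's
# utchoursminutes + average encoding.
per_minute = token.set_index('block_timestamp')['value'].resample('1min').mean()
per_minute.head()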