How do I use the TABLE_QUERY() function in BigQuery?
A couple of questions about the TABLE_QUERY function:
- The examples show using
table_id
in the query string, are there other fields available? - It seems difficult to debug. I'm getting "error evaluating subsidiary query" when I try to use it.
- How does
TABLE_QUERY()
work?
Solution 1:
The TABLE_QUERY()
function allows you to write a SQL WHERE
clause that is evaluated to find which tables to run the query over. For instance, you can run the following query to count the rows in all tables in the publicdata:samples
dataset that are older than 7 days:
SELECT count(*)
FROM TABLE_QUERY(publicdata:samples,
"MSEC_TO_TIMESTAMP(creation_time) < "
+ "DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')")
Or you can run this to query over all tables that have ‘git’ in the name (which are the github_timeline
and the github_nested
sample tables) and find the most common urls:
SELECT url, COUNT(*)
FROM TABLE_QUERY(publicdata:samples, "table_id CONTAINS 'git'")
GROUP EACH BY url
ORDER BY url DESC
LIMIT 100
Despite being very powerful, TABLE_QUERY()
can be difficult to use. The WHERE
clause must be specified as a string, which can be a little bit awkward. Moreover, it can be difficult to debug, since when there is a problem, you only get the error “Error evaluating subsidiary query”, which isn’t always helpful.
How it works:
TABLE_QUERY()
essentially executes two queries. When you run TABLE_QUERY(<dataset>, <table_query>)
, BigQuery executes SELECT table_id FROM <dataset>.__TABLES_SUMMARY__ WHERE <table_query>
to get the list of table IDs to run the query on, then it executes your actual query over those tables.
The __TABLES__
portion of that query may look unfamiliar. __TABLES_SUMMARY__
is a meta-table containing information about tables in a dataset. You can use this meta-table yourself. For example, the query SELECT * FROM publicdata:samples.__TABLES_SUMMARY__
will return metadata about the tables in the publicdata:samples
dataset.
Available Fields:
The fields of the __TABLES_SUMMARY__
meta-table (that are all available in the TABLE_QUERY
query) include:
-
table_id
: name of the table. -
creation_time
: time, in milliseconds since 1/1/1970 UTC, that the table was created. This is the same as thecreation_time
field on the table. -
type
: whether it is a view (2) or regular table (1).
The following fields are not available in TABLE_QUERY()
since they are members of __TABLES__
but not __TABLES_SUMMARY__
. They're kept here for historical interest and to partially document the __TABLES__
metatable:
-
last_modified_time
: time, in milliseconds since 1/1/1970 UTC, that the table was updated (either metadata or table contents). Note that if you use thetabledata.insertAll()
to stream records to your table, this might be a few minutes out of date. -
row_count
: number of rows in the table. -
size_bytes
: total size in bytes of the table.
How to debug
In order to debug your TABLE_QUERY()
queries, you can do the same thing that BigQuery does; that is, you can run the the metatable query yourself. For example:
SELECT * FROM publicdata:samples.__TABLES_SUMMARY__
WHERE MSEC_TO_TIMESTAMP(creation_time) <
DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')
lets you not only debug your query but also see what tables would be returned when you run the TABLE_QUERY
function. Once you have debugged the inner query, you can put it together with your full query over those tables.
Solution 2:
Alternative answer, for those moving forward to Standard SQL:
- BigQuery Standard SQL doesn't support TABLE_QUERY, but it supports * expansion for table names.
- When expanding table names *, you can use the meta-column _TABLE_SUFFIX to narrow the selection.
- Table expansion with * only works when all tables have compatible schemas.
For example, to get the average worldwide NOAA GSOD temperature between 2010 and 2014:
#standardSQL
SELECT AVG(temp) avg_temp, _TABLE_SUFFIX y
FROM `bigquery-public-data.noaa.gsod_20*` #every year that starts with "20"
WHERE _TABLE_SUFFIX BETWEEN "10" AND "14" #only years between 2010 and 2014
GROUP BY y
ORDER BY y