How to scrape a public tableau dashboard? [closed]
Edit
I've made a tableau scraper library to extract the data from Tableau worksheets
You can get the data from worksheets in a pandas dataframe directly. Also, the parametered values are supported.
The following example get the data from worksheet Simples Demanda Máxima Ano
, then switch to daily mode, shows the worksheet Simples Demanda Máxima Semana Dia
data and then set start date to 01/01/2017
:
from tableauscraper import TableauScraper as TS
url = "https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima"
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()
# show dataframe with yearly data
ws = wb.getWorksheet("Simples Demanda Máxima Ano")
print(ws.data)
# switch to daily
wb = wb.setParameter("Escala de Tempo DM Simp 4", "Dia")
# show dataframe with daily data
ws = wb.getWorksheet("Simples Demanda Máxima Semana Dia")
print(ws.data)
# switch to daily
wb = wb.setParameter(
"Início Primeiro Período DM Simp 4", "01/01/2017")
# show dataframe with daily data from 01/01/2017
ws = wb.getWorksheet("Simples Demanda Máxima Semana Dia")
print(ws.data)
Try this on repl.it
Original post
This answer is similar to this one but the initial URL page and tableau base URL differ. The process/algo remains the same essentially but I will details the steps :
The graphic is generated in JS from the result of an API :
POST https://tableau.ons.org.br/ROOT_PATH/bootstrapSession/sessions/SESSION_ID
The SESSION_ID
parameter is located (among other things) in tsConfigContainer
textarea in the URL used to build the iframe.
Starting from https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima?:embed=y&:showAppBanner=false&:showShareOptions=true&:display_count=no&:showVizHome=no :
- there is a textarea with id
tsConfigContainer
with a bunch of json values - extract the
session_id
and root path (vizql_root
) - make a POST on
https://tableau.ons.org.br/ROOT_PATH/bootstrapSession/sessions/SESSION_ID
with thesheetId
as form data - extract the json from the result (result is not json)
Code :
import requests
from bs4 import BeautifulSoup
import json
import re
url = "https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima"
r = requests.get(
url,
params= {
":embed":"y",
":showAppBanner":"false",
":showShareOptions":"true",
":display_count":"no",
"showVizHome": "no"
}
)
soup = BeautifulSoup(r.text, "html.parser")
tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)
dataUrl = f'https://tableau.ons.org.br{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'
r = requests.post(dataUrl, data= {
"sheet_id": tableauData["sheetId"],
})
dataReg = re.search('\d+;({.*})\d+;({.*})', r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))
print(data["secondaryInfo"]["presModelMap"]["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"]["dataSegments"]["0"]["dataColumns"])