Error importing data using requests - python
I am trying to access data from this webpage: https://qships.tmr.qld.gov.au/webx/# using the "Ship Movements" tab. I've tried calling the underlying service URL directly, but I keep getting errors.
Currently, I've tried:
import requests
import pandas as pd
import json
url = 'https://qships.tmr.qld.gov.au/webx/services/wxdata.svc/GetDataX'
payload = {
"token": None,
"reportCode": "MSQ-WEB-0001",
"dataSource": None,
"filterName": "Last 7 days",
"parameters": [{
"__type": "ParameterValueDTO:#WebX.Core.DTO",
"sName": "DOMAIN_ID",
"iValueType": 0,
"aoValues": [{"Value": -1}],
}],
"metaVersion": 0,
}
jsonData = requests.post(url, data = payload).json()
This returns the following error:
{'ExceptionDetail': {'HelpLink': None, 'InnerException': None, 'Message': "The incoming message has an unexpected message format 'Raw'. The expected message formats for the operation are 'Xml', 'Json'. This can be because a WebContentTypeMapper has not been configured on the binding. See the documentation of WebContentTypeMapper for more details.", 'StackTrace': ' at System.ServiceModel.Dispatcher.DemultiplexingDispatchMessageFormatter.DeserializeRequest(Message message, Object[] parameters)\r\n at System.ServiceModel.Dispatcher.UriTemplateDispatchFormatter.DeserializeRequest(Message message, Object[] parameters)\r\n at System.ServiceModel.Dispatcher.DispatchOperationRuntime.DeserializeInputs(MessageRpc& rpc)\r\n at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc)\r\n at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(MessageRpc& rpc)\r\n at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage11(MessageRpc& rpc)\r\n at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)', 'Type': 'System.InvalidOperationException'}, 'ExceptionType': 'System.InvalidOperationException', 'Message': "The incoming message has an unexpected message format 'Raw'. The expected message formats for the operation are 'Xml', 'Json'. This can be because a WebContentTypeMapper has not been configured on the binding. 
See the documentation of WebContentTypeMapper for more details.", 'StackTrace': ' at System.ServiceModel.Dispatcher.DemultiplexingDispatchMessageFormatter.DeserializeRequest(Message message, Object[] parameters)\r\n at System.ServiceModel.Dispatcher.UriTemplateDispatchFormatter.DeserializeRequest(Message message, Object[] parameters)\r\n at System.ServiceModel.Dispatcher.DispatchOperationRuntime.DeserializeInputs(MessageRpc& rpc)\r\n at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc)\r\n at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(MessageRpc& rpc)\r\n at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage11(MessageRpc& rpc)\r\n at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)'}
When I pass a dict through the json argument instead, I get a different error:
jsonData = requests.post(url, json={'key':'value'}).json()
{'ExceptionDetail': None, 'ExceptionType': None, 'Message': 'WebX.Security.EAuthError: Request does not belong to an authenticated session.', 'StackTrace': None}
Can the data be accessed through a request or will I have to scrape it?
Following the requests that the linked web UI makes, this script seems to be able to access some data.
The JSON payload for the data query differs from the one in the original post (it uses the filter "Next 7 days" rather than "Last 7 days"), so adapt it as needed.
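On the first error in the question ("unexpected message format 'Raw'"): this is most likely because data=payload sends a form-encoded body, while the service expects JSON. The sketch below uses only the standard library to illustrate the difference between the two encodings. The helper names encode_as_form and encode_as_json are hypothetical, and the form encoding is only an approximation of what requests does internally.

```python
import json
from urllib.parse import urlencode


def encode_as_form(payload):
    """Approximate the body sent by requests.post(url, data=payload)."""
    # A dict passed as data= is sent as a URL-encoded form, not as JSON.
    body = urlencode(payload, doseq=True)
    return {"Content-Type": "application/x-www-form-urlencoded"}, body


def encode_as_json(payload):
    """Approximate the body sent by requests.post(url, json=payload)."""
    # json= serializes the dict and labels the body as JSON.
    body = json.dumps(payload)
    return {"Content-Type": "application/json"}, body


# Small illustrative payload (a subset of the query in the question).
demo_payload = {"reportCode": "MSQ-WEB-0001", "metaVersion": 0}
form_headers, form_body = encode_as_form(demo_payload)
json_headers, json_body = encode_as_json(demo_payload)
```

Switching to json= fixes the encoding problem, which is why the second attempt in the question gets past deserialization and fails on authentication instead.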
The key point, in any case, is that you need to use a requests.Session() and make a request to the web UI first to acquire a session cookie.
from pprint import pprint
import requests
get_session_url = "https://qships.tmr.qld.gov.au/webx/"
get_data_url = "https://qships.tmr.qld.gov.au/webx/services/wxdata.svc/GetDataX"
get_data_query = {
    "token": None,
    "reportCode": "MSQ-WEB-0001",
    "dataSource": None,
    "filterName": "Next 7 days",
    "parameters": [
        {
            "__type": "ParameterValueDTO:#WebX.Core.DTO",
            "sName": "DOMAIN_ID",
            "iValueType": 0,
            "aoValues": [{"Value": -1}],
        }
    ],
    "metaVersion": 0,
}
# A Session stores cookies from earlier responses and sends them on later requests.
sess = requests.Session()
# Load the web UI first so the server can issue a session cookie.
sess.get(get_session_url).raise_for_status()
# json= sends the dict as a JSON body with a JSON Content-Type header.
json_data = sess.post(
    get_data_url,
    json=get_data_query,
).json()
pprint(json_data)
This prints, for example:
{'d': {'BuildVersion': '7.0.0.12590',
       'ReportCode': 'MSQ-WEB-0001',
       'Tables': [{'AsOfDate': '16:33 on Jan 17',
                   'BuildVersion': '7.0.0.12590',
                   'Data': [[132058,
                             334359,
                             'EXT',
                             'STOLT MOMIJI',
                             'TANKER',
                             121.52,
                             ...
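Going by the output above, the rows sit under the 'd' key, then 'Tables', then 'Data'. A minimal sketch for pulling them out follows. The helper extract_rows is a hypothetical name, and the sample dict only reproduces the values visible in the truncated output, so real rows will have more fields.

```python
def extract_rows(response, table_index=0):
    """Return the data rows of one table from a GetDataX-style response dict."""
    tables = response.get("d", {}).get("Tables", [])
    if table_index >= len(tables):
        # No such table in the response: return an empty result.
        return []
    return tables[table_index].get("Data", [])


# Partial sample built only from the values shown in the printed output.
sample = {
    "d": {
        "Tables": [
            {"Data": [[132058, 334359, "EXT", "STOLT MOMIJI", "TANKER", 121.52]]}
        ]
    }
}
rows = extract_rows(sample)
```

Since the question imports pandas, the resulting list of lists can be passed to pd.DataFrame(rows). The column names are not visible in the output excerpt, so they would need to be taken from the rest of the response or assigned by hand.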