How to retrieve HTML content from WebView (as a string)

How do I retrieve all HTML content currently displayed in a WebView?

I found WebView.loadData(), but I couldn't find its counterpart (something like WebView.getData()).

Please note that I want to retrieve that data for web pages that I have no control over (i.e. I cannot add a JavaScript function to those pages so that it would call a JavaScript interface in the WebView).


Solution 1:

You can do this by registering a JavaScript interface and having the app itself inject the call after each page loads:

final Context myApp = this;

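/* An instance of this class will be registered as a JavaScript interface */
class MyJavaScriptInterface
{
    /* When targeting API 17 or higher, only methods annotated with
       @JavascriptInterface (android.webkit.JavascriptInterface) are visible to JavaScript. */
    @JavascriptInterface
    @SuppressWarnings("unused")
    public void processHTML(String html)
    {
        // Called on a WebView background thread, not the UI thread.
        // Process the HTML as needed by the app.
    }
}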

final WebView browser = (WebView)findViewById(R.id.browser);
/* JavaScript must be enabled if you want it to work, obviously */
browser.getSettings().setJavaScriptEnabled(true);

/* Register a new JavaScript interface called HTMLOUT */
browser.addJavascriptInterface(new MyJavaScriptInterface(), "HTMLOUT");

/* WebViewClient must be set BEFORE calling loadUrl! */
browser.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url)
    {
        /* This call injects JavaScript into the page which just finished loading. */
        browser.loadUrl("javascript:window.HTMLOUT.processHTML('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
    }
});

/* load a web page */
browser.loadUrl("http://lexandera.com/files/jsexamples/gethtml.html");

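processHTML() receives the whole HTML content of the page. Because it reads the DOM that the WebView has already loaded, it does not make a second request for the page, so it is also more efficient than downloading the page again. It also works for pages you don't control: the JavaScript is injected by the app after each page finishes loading, not added to the page's source. Note that you get the current DOM (after the page's own scripts have run), which can differ from the raw HTML the server sent.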
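The Android documentation for addJavascriptInterface() notes that JavaScript calls into the Java object on a private background thread of the WebView, so processHTML() must not touch views directly and any state it shares must be thread-safe. Below is a minimal sketch of one way to hand the HTML over, assuming Java 8's CompletableFuture is available (Android API 24+). HtmlCapture and awaitHtml() are hypothetical names, not part of any Android API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: a JavaScript-interface object that passes the HTML
// from the WebView's bridge thread to a thread that is waiting for it.
final class HtmlCapture {
    private final CompletableFuture<String> result = new CompletableFuture<>();

    // On Android, annotate this with @android.webkit.JavascriptInterface
    // (required when targeting API 17+) and register the object with
    // addJavascriptInterface(capture, "HTMLOUT") as in the answer above.
    public void processHTML(String html) {
        result.complete(html); // only the first call has an effect
    }

    // Blocks until the HTML arrives; returns null on timeout or interruption.
    // Never call this on the UI thread: onPageFinished() runs there, so
    // blocking it would stop the injected call from ever happening.
    public String awaitHtml(long timeoutSeconds) {
        try {
            return result.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException | TimeoutException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        HtmlCapture capture = new HtmlCapture();
        // Simulates the WebView calling processHTML from its own background thread.
        new Thread(() -> capture.processHTML("<html><body>hi</body></html>")).start();
        System.out.println(capture.awaitHtml(5));
    }
}
```

A CompletableFuture fits here because the page delivers its HTML once per load; for repeated loads you would create a new HtmlCapture (or reset it) per page.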

Thanks.

Solution 2:

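Unfortunately, there is no easy way to do this.

See "How do I get the web page contents from a WebView?"

You could simply make an HTTP request to the same URL as your WebView and read the response. Note that this downloads the page a second time, and the result may differ from what the WebView shows (for example, if the page depends on cookies or builds its content with JavaScript).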

Solution 3:

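On API level 19 (Android 4.4) and higher, WebView.evaluateJavascript() runs a script in the current page and passes its result to a callback, so no JavaScript interface is needed. It must be called on the UI thread, once the page has loaded (for example from onPageFinished()):

webView.evaluateJavascript("(function(){return window.document.body.outerHTML})();",
      new ValueCallback<String>() {
          @Override
          public void onReceiveValue(String html) {
              // html is a JSON-encoded value: a quoted string with escaped
              // characters, or "null" if the script returned nothing.
              // Use document.documentElement.outerHTML instead of
              // document.body.outerHTML to include the <head> as well.
          }
      });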
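The value handed to onReceiveValue() is JSON, not raw HTML: a string result arrives wrapped in double quotes, with quotes, backslashes, newlines and some other characters escaped (for example, < often arrives as \u003C). On Android, new org.json.JSONTokener(value).nextValue() should decode a string result. The following is a plain-Java sketch of the same decoding, so it can be tested off-device; JsResultDecoder is a hypothetical name, and it only handles string results (and the literal null).

```java
// Hypothetical helper: decodes the JSON string literal that
// WebView.evaluateJavascript() passes to its ValueCallback.
final class JsResultDecoder {
    private JsResultDecoder() {}

    // Returns null for the JSON literal null; otherwise strips the
    // surrounding quotes and resolves the JSON escape sequences.
    static String decode(String json) {
        if (json == null || json.equals("null")) {
            return null;
        }
        if (json.length() < 2 || json.charAt(0) != '"' || json.charAt(json.length() - 1) != '"') {
            throw new IllegalArgumentException("not a JSON string literal: " + json);
        }
        StringBuilder out = new StringBuilder(json.length());
        for (int i = 1; i < json.length() - 1; i++) {
            char c = json.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char e = json.charAt(++i); // character after the backslash
            switch (e) {
                case '"': out.append('"'); break;
                case '\\': out.append('\\'); break;
                case '/': out.append('/'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u': // four hex digits follow
                    out.append((char) Integer.parseInt(json.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("bad escape: \\" + e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // What a callback might receive for "<b>hi</b>" followed by a newline.
        System.out.println(decode("\"\\u003Cb>hi\\u003C/b>\\n\""));
    }
}
```

Inside onReceiveValue() you would call JsResultDecoder.decode(html) before using the HTML.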

Solution 4:

Add this to your code:

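private String getUrlSource(String site) throws IOException {
    //GNU Public, from ZunoZap Web Browser
    // Needs: java.io.BufferedReader, java.io.IOException, java.io.InputStreamReader,
    //        java.net.URL, java.net.URLConnection
    // Blocking network I/O: on Android, call this off the main thread, otherwise
    // it throws NetworkOnMainThreadException. Assumes the page is UTF-8.
    URL url = new URL(site);
    URLConnection urlc = url.openConnection();
    StringBuilder a = new StringBuilder();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader in = new BufferedReader(new InputStreamReader(
            urlc.getInputStream(), "UTF-8"))) {
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            // readLine() strips the line terminator, so add it back
            a.append(inputLine).append('\n');
        }
    }
    return a.toString();
}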

Then, for example, to get Google's page source (from a background thread):

String html = getUrlSource("http://google.com");
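getUrlSource() assumes UTF-8 and reads line by line. A minimal sketch of an alternative, under the assumption that the server's Content-Type header names the correct charset: a hypothetical PageSource helper that takes the charset from that header (falling back to UTF-8) and decodes the whole body at once, which keeps the original line breaks exactly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for reading a page body with the charset the server declares.
final class PageSource {
    private PageSource() {}

    // Picks the charset from a header such as "text/html; charset=ISO-8859-1";
    // falls back to UTF-8 if there is none or the name is unknown.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Reads the whole stream into memory and decodes it once, so line breaks
    // and multi-byte characters are preserved.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1"));
        System.out.println(readAll(new java.io.ByteArrayInputStream(
                "line 1\nline 2".getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8));
    }
}
```

With a URLConnection you would call it as PageSource.readAll(urlc.getInputStream(), PageSource.charsetOf(urlc.getContentType())), still off the main thread. Pages that declare their charset only in a <meta> tag would still be decoded as UTF-8 by this sketch.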

Solution 5:

You can intercept the HTTP requests made by the WebView and modify the HTML to include whatever JavaScript functions you need to communicate with the page. Requests are intercepted with the WebViewClient.shouldInterceptRequest() method.

With this mechanism you load the page yourself, so you have its HTML; you can modify it before passing it on to the WebView, and even cache it locally if you want.
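A minimal sketch of the modification step, assuming you have already fetched the page yourself (for example with getUrlSource() from the previous answer) and reusing the HTMLOUT interface name from the first answer. HtmlInjector and injectBeforeBodyEnd() are hypothetical names; the code is plain Java so it can be tested without a device.

```java
// Hypothetical helper: adds a script tag to fetched HTML before it is handed to the WebView.
final class HtmlInjector {
    private HtmlInjector() {}

    // Inserts scriptTag just before the last closing body tag (case-insensitive),
    // or appends it if the page has no closing body tag.
    static String injectBeforeBodyEnd(String html, String scriptTag) {
        int idx = lastIndexOfIgnoreCase(html, "</body>");
        if (idx < 0) {
            return html + scriptTag;
        }
        return html.substring(0, idx) + scriptTag + html.substring(idx);
    }

    // regionMatches avoids the index shifts that toLowerCase() can cause
    // for characters whose lower-case form has a different length.
    private static int lastIndexOfIgnoreCase(String s, String needle) {
        for (int i = s.length() - needle.length(); i >= 0; i--) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumes a JavaScript interface registered as "HTMLOUT", as in the first answer.
        String script = "<script>window.HTMLOUT.processHTML(document.documentElement.outerHTML);</script>";
        System.out.println(injectBeforeBodyEnd("<html><body><p>hi</p></body></html>", script));
    }
}
```

On the Android side, shouldInterceptRequest(WebView, WebResourceRequest) (API 21+; earlier versions have an overload taking the URL as a String) is called on a background thread rather than the UI thread. For the request where request.isForMainFrame() is true you would return new WebResourceResponse("text/html", "UTF-8", new ByteArrayInputStream(modified.getBytes(StandardCharsets.UTF_8))), and return null for everything else so the WebView loads other resources normally. WebResourceRequest does not expose a request body, so this approach fits plain GET page loads best.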