Get the final generated HTML source using C# or VB.NET
Using VB.NET or C#, how do I get the generated HTML source?
To get the HTML source of a page I can use the code below, but it won't return the generated source: it won't contain any of the HTML that was added dynamically by JavaScript in the browser. How do I get the final generated HTML source?
Thanks
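// Requires: using System.IO; using System.Net;
WebRequest req = WebRequest.Create("http://www.asp.net");
string html;
using (WebResponse res = req.GetResponse())
using (StreamReader sr = new StreamReader(res.GetResponseStream()))
{
    html = sr.ReadToEnd(); // raw markup as sent by the server; no scripts are run
}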
If I try the code below, it returns the document without the content injected by JavaScript:
Public Class Form1
    Dim WB As WebBrowser = Nothing

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        WB = New WebBrowser()
        Me.Controls.Add(WB)
        AddHandler WB.DocumentCompleted, AddressOf WebBrowser1_DocumentCompleted
        WB.Navigate("mysite/Default.aspx")
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        'Dim htmlcode As String = WebBrowser1.Document.Body.OuterHtml()
        Dim s As String = WB.DocumentText
    End Sub
End Class
HTML returned
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
        <div id="center_text_panel">
            //test text this text should be here
        </div>
    </form>
</body>
</html>
<script type="text/javascript">
    document.getElementById("center_text_panel").innerText = "test text";
</script>
Solution 1:
You can use WebKit.NET.
See the project's official tutorials.
It not only fetches the source but also runs the page's JavaScript while the page loads.
webKitBrowser1.Navigate(MyURL)
Then handle the DocumentCompleted event and read the document text there:
Dim documentContent As String = webKitBrowser1.DocumentText
Edit - This might be the better open source WebKit option: http://code.google.com/p/open-webkit-sharp/
Solution 2:
Just put a WebBrowser control on your form and use the following code:
webBrowser1.Navigate("YourLink");
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Alternatively, read individual elements or use webBrowser1.DocumentText
    string htmlcode = webBrowser1.Document.Body.InnerHtml;
}
Edited:
To also get the HTML that is generated dynamically by JavaScript, you have two options:
- Run the following code after the webBrowser1_DocumentCompleted event:
StringBuilder htmlcode = new StringBuilder();
foreach (HtmlElement item in webBrowser1.Document.All)
{
    htmlcode.Append(item.InnerHtml);
}
- Write a JavaScript function that returns document.documentElement.innerHTML, and call it with the InvokeScript method to get the result:
var htmlcode = webBrowser1.Document.InvokeScript("javascriptcode");
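The second option can also be tried without adding a function to the page, by asking the script engine to evaluate an expression directly. The following is only a minimal sketch, assuming a Windows Forms project; the class name RenderedHtmlForm and the field name browser are made up for illustration, and the URL is the one from the question. Calling InvokeScript with "eval" is a commonly used trick rather than a documented guarantee, so the sketch falls back to reading OuterHtml from the live DOM when it returns nothing.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form used only for this illustration.
public class RenderedHtmlForm : Form
{
    private readonly WebBrowser browser = new WebBrowser { Dock = DockStyle.Fill };

    public RenderedHtmlForm()
    {
        Controls.Add(browser);
        browser.DocumentCompleted += OnDocumentCompleted;
        browser.Navigate("http://www.asp.net");
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted can fire once per frame; only handle the top-level page.
        if (e.Url != browser.Url) return;

        // Ask the page's script engine for the current DOM as markup.
        // Assumption: the page allows eval; the result is null if the call fails.
        object result = browser.Document.InvokeScript(
            "eval", new object[] { "document.documentElement.outerHTML" });
        string html = result as string;

        // Fallback: serialize the live <html> element through the managed DOM wrapper.
        if (string.IsNullOrEmpty(html))
        {
            HtmlElementCollection roots = browser.Document.GetElementsByTagName("html");
            if (roots.Count > 0) html = roots[0].OuterHtml;
        }

        Console.WriteLine(html);
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new RenderedHtmlForm());
    }
}
```

Unlike DocumentText, which holds the markup as it was downloaded, both calls above read the document as it exists after scripts have modified it. Scripts that run on a timer or after an AJAX call may finish later than DocumentCompleted, so in that case the read would need to be delayed.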