Webbrowser Control Limitations

Let me tell you some disadvantages...

[Most of the problems mentioned here have been answered or addressed to some degree in my previous answers on StackOverflow. If you are curious, feel free to browse my WebBrowser-control-related answers.]

Detecting when a page is really done loading is very difficult to do reliably. In fact, you have to employ a series of hacks to do it, and some of the methods and ideas aren't talked about online at all. In the years I've spent fighting with this control, though, I have figured some things out and developed a code base that makes it work. If you need help with this, I can provide more detail.
Let me tell you this straight up: the default rendering engine of the WebBrowser control is fixed, to ensure compatibility across all platforms.

Basically, if your installed browser is IE 7 - IE 9, then the rendering engine used is IE 7.0 only (by default).

If, however, your installed IE version is IE 6 or below, then the rendering engine used is IE 4.0 (not kidding), unless of course you set it otherwise.

There is a misconception that the WebBrowser control uses whatever IE version is currently installed, but this is not true; the default is fixed to reduce backward compatibility issues. You can see (as proof) that this really is your problem by going to www.whatsmyuseragent.com in your normal browser, and then going to that website again in your WebBrowser control: you will see that it says MSIE 7.0 :).

You can set it to use the currently installed version of Internet Explorer, either with a META tag in the page or by editing the registry on the machine where the WebBrowser control will run (editing under either Current_User or Local_Machine will work).

So, for compatibility reasons it will render pages in IE7 Standards mode by default. To prevent this from happening, follow the link I've provided below which will discuss both the META Tag method and the Registry Editing method to solve this problem (for both 32 & 64 bit systems). The solution is contained as an Answer to someone else's question about a feature working incorrectly or unexpectedly. Reading the Question isn't necessary to correctly interpret/understand the Answer. Here is the link:

Script runs slower in the dotnet WebBrowser control (Ctrl + Click to Open in New Tab).
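For the registry method, here is a minimal sketch in Python. The key name FEATURE_BROWSER_EMULATION and the values in the table are not spelled out in the text above; they are the ones Microsoft documents for this feature control. The helper names and the 'MyApp.exe' example are made up for illustration. (The in-page alternative is a META tag of the form http-equiv="X-UA-Compatible" with content such as "IE=edge".)

```python
import sys

# Registry key (under HKEY_CURRENT_USER or HKEY_LOCAL_MACHINE) that holds
# one DWORD value per executable name.
FEATURE_KEY = (
    r"Software\Microsoft\Internet Explorer\Main"
    r"\FeatureControl\FEATURE_BROWSER_EMULATION"
)

# Documented emulation values, keyed by IE major version.
EMULATION_VALUES = {7: 7000, 8: 8000, 9: 9000, 10: 10000, 11: 11000}


def emulation_value(ie_major: int) -> int:
    """Return the FEATURE_BROWSER_EMULATION DWORD for an IE major version."""
    if ie_major not in EMULATION_VALUES:
        raise ValueError(f"no emulation value known for IE {ie_major}")
    return EMULATION_VALUES[ie_major]


def set_emulation(exe_name: str, ie_major: int) -> None:
    """Write the value for exe_name (e.g. 'MyApp.exe') under HKEY_CURRENT_USER."""
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # Windows-only standard library module

    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, FEATURE_KEY) as key:
        winreg.SetValueEx(
            key, exe_name, 0, winreg.REG_DWORD, emulation_value(ie_major)
        )
```

The value has to be in place before the hosting process creates its WebBrowser control, so an installer or an early startup routine is the usual place for a call like set_emulation.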
The eventing system is pretty hacky. You really need to know things that haven't been documented properly, and some things that haven't been documented at all. In fact, I have declared it one of MS's worst products, both in terms of its design and in terms of the lack of decent documentation made available for it. Their dry MSDN-style documentation is laughable.
Bad frames support. If you call document.frames.length, you will only get the frames directly under the top-level document, not all frames. You will need to write your own functions to get all nested frames (nested to any depth); I have done this, if you need help with it. Frame detection and referencing is very important and plays a vital role in detecting when the page has really finished loading, because checking .Busy and .ReadyState on the WebBrowser control is not enough. In fact, it's nowhere near enough.
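To make the nested-frames point concrete, here is a rough sketch in Python. It is not my actual code base: the Frame class is a hypothetical stand-in for a frame's document object, and the per-frame ready-state check is just one plausible ingredient of load detection, not the whole story.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Frame:
    """Hypothetical stand-in for a frame's document inside the control."""
    url: str
    ready_state: str = "complete"
    frames: List["Frame"] = field(default_factory=list)


def walk_frames(doc: Frame) -> Iterator[Frame]:
    """Yield every frame below doc, at any nesting depth (depth-first)."""
    for child in doc.frames:
        yield child
        yield from walk_frames(child)


def all_frames_complete(top: Frame) -> bool:
    """True only when the top document and every nested frame are 'complete'."""
    return top.ready_state == "complete" and all(
        f.ready_state == "complete" for f in walk_frames(top)
    )


# One frame nested inside another, plus a sibling frame.
page = Frame("top", frames=[
    Frame("a", frames=[Frame("a1", ready_state="loading")]),
    Frame("b"),
])
direct_count = len(page.frames)             # what frames.length reports
nested_count = len(list(walk_frames(page)))  # every frame, any depth
```

Here direct_count only sees the two frames under the top document, while the recursive walk also finds the frame nested inside "a", which is the one still loading.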
There is no built-in system to get rid of the JavaScript dialog boxes that come popping out of every page, including the new IE9 dialog box that pesters people with the "are you sure you want to leave this page" message. I have developed routines to get rid of them. One of the methods involves executing JavaScript, sent from the WebBrowser control to the HTML page, that neutralizes the alert, confirm and print dialog boxes (and the new IE 9 dialog box I mentioned earlier). These are the potential dialog boxes coming from JS alone. I basically run JavaScript that redefines the .alert function as an empty function that does nothing, and I do the exact same thing for all 4 of these dialog boxes that come from JavaScript. Of course, if you've counted more than 4, please do feel free to let me know.

There is also a second method, and it will prevent not only JavaScript dialog boxes but every single dialog box that could appear in the WebBrowser control. This method uses WinHooks and intercepts the dialog box before it is displayed. You can get as much information as you want from the dialog box (its contents as text, its title/caption as text, etc.) and decide whether to display it or cancel it, or even simulate a click on any part of the dialog box (i.e. any of its buttons) so the stack thinks the question or information dialog was properly responded to. This is an interesting method that I've read about but haven't tried yet, and I'm really looking forward to understanding the WinHook process once I have some free time. As usual, if you need help, feel free to look into some of my previous answers to various WebBrowser control questions, as I've answered many, and if that doesn't do, let me know.
Keep in mind that this depends heavily on knowing when the page is fully done loading, which is very difficult to do (but possible, using undocumented methods, in a 100% reliable way). So the first point, detecting when the page is really done loading, will come into relevance many times over.
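The JavaScript-injection method for silencing dialogs can be sketched like this. The Python part only builds the script text; how you hand it to the control (for example through the document's script-execution facility) is left out. Two things here are my assumptions for the sketch rather than anything stated above: confirm is made to answer "yes" by returning true, and the "leave this page" prompt is treated as coming from window.onbeforeunload.

```python
# Each entry maps a member of the page's window object to the JavaScript
# expression that replaces it.
DIALOG_OVERRIDES = {
    "alert": "function () {}",
    "confirm": "function () { return true; }",  # assumption: always answer OK
    "print": "function () {}",
    # Assumption: the "leave this page" prompt is raised via onbeforeunload.
    "onbeforeunload": "null",
}


def build_suppression_script(overrides=None) -> str:
    """Return JavaScript source that replaces each dialog entry point."""
    if overrides is None:
        overrides = DIALOG_OVERRIDES
    return " ".join(
        f"window.{name} = {body};" for name, body in overrides.items()
    )


script = build_suppression_script()
```

The script has to be injected again for every newly loaded document (and for every frame), since each document gets its own fresh window object.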
There is no reliable or easy way to control the caching information that is saved. Once again, you have to develop your own routines to do what you want with it, whether that is to filter, delete or try to prevent it, for all cache types, including history information, cookies, and the actual cache files stored on the local system. If you look into DeleteUrlCacheEntry, that will give you a lead on two ways of doing it on your own; also, I'm pretty sure I have some previous answers on StackOverflow which talk about how to do this. With DeleteUrlCacheEntry, you get to play with cache items beginning with the "Cookie: " tag, the "Visited: " tag, and items that are just plain website addresses (beginning with "http://" and "https://"; and yes, https is cached ;|, at least the location information is anyway).

Also note that the information available through DeleteUrlCacheEntry (and the accompanying FindFirstUrlCacheEntry/FindNextUrlCacheEntry, which are used to loop through the entire cache) does not include your actual Internet Explorer history items. The "Visited: " sites list is separate from your actual History list, the one you see when you click the * symbol on your Internet Explorer menu bar and go into the History section (from the Favorites section). I'm not sure why they have done it this way or what the exact, formal difference is, but it's on the list of things to find out (feel free to let us know in the comments). After all, the "Visited: " list is a list of sites that you have visited, and the IE History is pretty much a list of websites that you have visited as well. I don't think they make a distinction between sites that you've manually typed in and bits and pieces that are automatically retrieved by the HTML page or your browser (through iframes, auto-redirects, popups, etc.), so I'm finding it difficult to understand what the distinction is, and I will update this bit once I find out.
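When you loop over the cache, the first step is usually to sort entries by the prefixes described above. A minimal sketch of that sorting step in Python follows; the function name and the category labels are made up for illustration, and the enumeration itself (FindFirstUrlCacheEntry/FindNextUrlCacheEntry) is not shown.

```python
def classify_cache_entry(name: str) -> str:
    """Classify a URL cache entry by the prefix of its source name."""
    lowered = name.lower()
    if lowered.startswith("cookie:"):
        return "cookie"
    if lowered.startswith("visited:"):
        return "visited"
    if lowered.startswith(("http://", "https://")):
        return "content"
    return "other"
```

A cleanup routine could then decide per category whether to pass the entry name on to DeleteUrlCacheEntry or to leave it alone.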
Overriding the default user agent isn't built in properly. You can pass your own user agent into the navigate method, and the site you navigate to will get the user agent details you've set; however, this won't persist. Once the user follows a link on the navigated page, the WebBrowser control goes back to sending the actual (real) user agent that the WB control is using to render your site, unless of course you intercept the navigation, cancel it, and re-navigate using the .navigate method while sending your own user agent again. This can't account for things like images and LINK tag files, as you don't get BeforeNavigate events for these, so you can't intercept them and modify the headers sent for them. Instead, you need an external solution, importing some external functions from urlmon.dll. This can do it 100% and works flawlessly; however, it's another added dependency (though urlmon.dll comes included with all relevant Windows versions to date).
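Both approaches can be sketched roughly as follows. The text above does not name the urlmon.dll function; UrlMkSetSessionOption with the URLMON_OPTION_USERAGENT option is the one I am assuming here. The helper names are made up, and the session call only affects the process it runs in, so it has to be made from the process that hosts the control.

```python
import ctypes
import sys

URLMON_OPTION_USERAGENT = 0x10000001  # option id defined in urlmon.h


def user_agent_header(user_agent: str) -> str:
    """Header text for the per-call case (the headers argument of navigate)."""
    return f"User-Agent: {user_agent}\r\n"


def set_session_user_agent(user_agent: str) -> None:
    """Set the user agent for all requests made by this process (Windows only)."""
    if sys.platform != "win32":
        raise OSError("urlmon.dll is only available on Windows")
    buf = user_agent.encode("ascii")  # the API takes an ANSI string
    hr = ctypes.windll.urlmon.UrlMkSetSessionOption(
        URLMON_OPTION_USERAGENT, ctypes.c_char_p(buf), len(buf), 0
    )
    if hr != 0:  # S_OK is 0
        raise OSError(f"UrlMkSetSessionOption failed with HRESULT {hr:#x}")
```

The header helper covers a single navigation only, which is exactly the limitation described above; the session option is what keeps the user agent in place for images, stylesheets and followed links.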
There is no "redirect all my WB control activity to this particular frame" property or method, though you can (and will have to) develop one if you need it. The only frame support is the TargetFrameName argument of the .navigate method. You will need to get a reference to the frame and manually direct everything you do there, for each action that needs to happen there, since users can click things in any frame and you'd have no idea unless you check for it.
Cross-frame security for sites with frames pointing to external domains. As you may know, if you have a page on abc.com and it has an iframe whose source is on a domain named xzy.com (as most advertisers do when relaying content from their own servers), you will run into cross-frame domain security issues if you try to access that frame, regardless of what elevated privileges your app is running under. It's silly, and they won't even tell you about it: your reference to the frame's document will just not have any data in it, you will not be able to use it, and the WB control will not tell you why. All you will have access to is the source URL of the frame and that's it, nothing inside it. Solution? Well, there is a TypeLib you can register on your machine to override this. It is not built into the WB control, and not even built into your own programming environment either; in fact it's an external C routine that you use by referencing and registering the TypeLib (not sure if there is a new way to do it without this method in .NET nowadays). You will also need to write code around this TypeLib in your current programming environment, so it's not just a matter of calling a function, but of writing more code around the function you'll be using.
Turning JavaScript on/off, and turning navigation settings such as navigation sounds on/off. If you're writing a web extractor program, the navigation sounds will drive your users crazy. Turning these options on or off is not built into the WebBrowser control; you can change things globally using the registry if you need to, and then change them back once done. You will need to look up the registry values for each of these internet settings. There are ways to do this for your application instance only, by importing routines from InternetSecuritySettings I believe, but once again this is not built into the WB control, and is just another series of hacks to add to the list.
Of course, you will need to detect whether an internet connection exists and is available. The WB control doesn't even give you a glimmer of hope of doing this, even though it's a vital part of having it work. So, if you don't want the annoying popup windows of the MS dial-up connection (for those using dial-up) or the internet wizard (for those on other connections) to appear EVERY TIME YOUR WB CONTROL TRIES TO MAKE A CONNECTION or tries to navigate somewhere, then you will need to check for connections manually with a separate control. This has to be a control from outside MS that does not have the MS APIs at its core, since the MS internet APIs are the ones that trigger these popup boxes. So, you will need to get an external winsocks type control written from the ground up that is not using winsocks, learn how to use it, and use it to check whether the internet is connected every time before you perform an action with the WB control.
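The idea of a manual connection check can be sketched in Python with a plain TCP connect. Python's socket module stands in here for the external control; whether a bare socket connect avoids the dial-up prompt on a given machine is an assumption you would have to verify. The function name and the default timeout are made up for the sketch.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

You would call this with a host and port you expect to be reachable (for example the web server you are about to navigate to, on port 80) and only let the WB control navigate when it returns True.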
You will get lots of "Automation error" or "Unspecified error" messages, which don't even tell you what went wrong, when you are dealing with elements on a live HTML document/page. These usually occur when there is HTML written in an un-recommended way, even though it is a way that browsers can read and deal with on a regular basis. For example, if you have an anchor link with target=_top and you don't have quotes around the _top part, browsers understand this and behave as expected, but the WebBrowser control will throw its hands in the air and give up, throwing an "Unspecified error" without even telling you what it's being super picky about. So, you'll have to make sure the element is written like this: target="_top" in order for the WB control to behave. Going around making these changes to every live document can be tedious, and you'll need to write general routines to do this for every page, routines which run after the document is fully loaded (which you'll have to detect reliably). If I had to pick the hardest thing to do properly with the WB control, it would have to be detecting, reliably, when a page is done loading fully. On top of that, it is also the most important thing you will need to do with the WB control, as almost everything depends on accurate detection of this.
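A general routine for the target=_top case could look something like this. It is a regex-based sketch in Python that works on the page's HTML text rather than on live elements, so it is an illustration of the fix and not a replacement for a real HTML parser; the function name is made up.

```python
import re

# Matches target=value where the value is not wrapped in quotes.
_UNQUOTED_TARGET = re.compile(
    r"""(\btarget\s*=\s*)([^\s"'>]+)""", re.IGNORECASE
)


def quote_target_attributes(html: str) -> str:
    """Wrap unquoted target attribute values in double quotes."""
    return _UNQUOTED_TARGET.sub(r'\1"\2"', html)
```

Values that are already quoted are left untouched, so the routine can safely be run more than once over the same markup.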
It needs a separate history object. If you choose "no history" during navigation, or find a way of making history-less navigation work, you can be sure that going back or forward to these pages will not work (i.e. calling .GoBack or .GoForward to reach them). Once you delete a page from history, or specify that no history be kept for a particular navigation, going back there is impossible unless you re-navigate to that page. They should have kept an in-memory history list that remained available to go back to even when the page was removed from the global history (which is the only way it does no-history browsing). So, if you try to go back, you will (on top of everything) get a runtime error. Only in the recent .NET days did they provide a method called .CanGoBack to check whether you can go back or not; before then (if using pre-.NET) you had to write code around this or try to keep count of where you were (which isn't easy, but is still doable).
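The kind of in-memory history list I have in mind can be sketched in a few lines of Python. The class and method names are made up for illustration; going "back" here just returns the URL, and the caller is expected to re-navigate to it.

```python
class NavigationHistory:
    """In-memory back/forward list, independent of the global IE history."""

    def __init__(self) -> None:
        self._urls = []
        self._index = -1  # position of the current page, -1 when empty

    def visit(self, url: str) -> None:
        """Record a new navigation, discarding any forward entries."""
        del self._urls[self._index + 1:]
        self._urls.append(url)
        self._index += 1

    def can_go_back(self) -> bool:
        return self._index > 0

    def can_go_forward(self) -> bool:
        return self._index < len(self._urls) - 1

    def go_back(self) -> str:
        """Step back and return the URL to re-navigate to."""
        if not self.can_go_back():
            raise IndexError("no earlier page in history")
        self._index -= 1
        return self._urls[self._index]

    def go_forward(self) -> str:
        """Step forward and return the URL to re-navigate to."""
        if not self.can_go_forward():
            raise IndexError("no later page in history")
        self._index += 1
        return self._urls[self._index]
```

Checking can_go_back before calling go_back plays the same role as .CanGoBack, and it keeps working for pages that were never written to the global history.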

I could keep going (I think), but I will leave it at that for now. Apart from those things, it is a pretty cool control, and it does open the door to a whole new world of apps and ideas you can make happen. As I've noted in a few of these points, these are all problems that I have solved (and there are still more, which I have solved when a solution was needed), so if you have any questions or need any help, let me know, as I would be happy to at least try to help you out.

When I was trying to figure this stuff out, there was no one around to help me, as no one really knew much about this control, so I had to figure things out bit by bit, one by one. Since then it has gained in popularity, though, and there are more people using it (especially since the .NET version has provided incremental improvements). So, I would be glad to help out anyone who is in the situation I was in before, as I remember it was a scary and lonely place, and MS did nothing documentation-wise. It is just something they developed for in-house use and let others use, while providing only a list of input/output arguments/parameters and return values for all properties, methods and events, and that was it: no meaning or context or real code examples associated with it, and certainly no documentation on solving the array of problems that came with it.

OK, that does it for now. I would be interested in people's opinions of this control and their use of it, so feel free to leave a comment. Take care. Erx.

There are no artificial limitations on the WebBrowser control.

However, it uses IE's rendering engine (whatever version is installed on the end-user's computer), so it uses a fair bit of memory.

What are you trying to do?

If you're trying to write a web browser, I recommend that you use a better rendering engine, such as WebKit or Gecko.
