Category : C# | Author : Chtiwi Malek | First posted : 11/20/2012 | Updated : 7/26/2014
Tags : .net, c#, url, html, parse, streaming, source, code, url

Reading HTML page content from a URL using C#.NET

In many cases you want to programmatically read or parse the content of a URL and retrieve its HTML source code. Here's a simple function I wrote to do that:

// Get the HTML source of a page with C#
public static string Get_HTML(string Url)
{
    string Page_Source_Code;
    try
    {
        System.Net.WebRequest req = System.Net.WebRequest.Create(Url);
        // The using blocks close the response, the stream and the reader,
        // even if reading fails halfway through.
        using (System.Net.WebResponse Result = req.GetResponse())
        using (System.IO.Stream RStream = Result.GetResponseStream())
        using (System.IO.StreamReader sr = new System.IO.StreamReader(RStream))
        {
            Page_Source_Code = sr.ReadToEnd();
        }
    }
    catch (System.Exception)
    {
        // Error while reading the URL: the URL doesn't exist, connection problem...
        Page_Source_Code = "";
    }
    return Page_Source_Code;
}

The function above reads the server's response and attempts to detect the encoding automatically. StreamReader assumes UTF-8 by default and switches to little-endian or big-endian Unicode when the response starts with the matching byte order mark, so those three encodings are read correctly.
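
To see that detection in isolation, here is a small sketch that skips the network entirely and feeds byte arrays to the same StreamReader constructor through a MemoryStream. The class and method names (BomDemo, ReadWithDefaultReader, WithBom) are made up for this illustration and are not part of the function above.

```csharp
using System.IO;
using System.Text;

public static class BomDemo
{
    // Reads the bytes the same way Get_HTML reads the response stream:
    // a StreamReader with no explicit encoding (UTF-8 default, BOM detection on).
    public static string ReadWithDefaultReader(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    // Encodes the text and puts the encoding's byte order mark in front of it,
    // which is how a server would send it if it included a BOM.
    public static byte[] WithBom(Encoding encoding, string text)
    {
        byte[] preamble = encoding.GetPreamble();
        byte[] body = encoding.GetBytes(text);
        byte[] all = new byte[preamble.Length + body.Length];
        preamble.CopyTo(all, 0);
        body.CopyTo(all, preamble.Length);
        return all;
    }
}
```

Calling BomDemo.ReadWithDefaultReader(BomDemo.WithBom(Encoding.Unicode, "héllo")) gives back the original text, and the same holds for Encoding.BigEndianUnicode and Encoding.UTF8. Bytes in a single-byte encoding such as Windows-1252 carry no byte order mark, so the reader treats them as UTF-8 and accented characters come out damaged.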

If the web server uses another encoding, you can specify it: pass the encoding as a second argument when you create the StreamReader:

System.IO.StreamReader sr = new System.IO.StreamReader(RStream, System.Text.Encoding.GetEncoding("Windows-1252"));

If you don't know which encoding is used, you'll have to read it from the page's meta tag, for example:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

And if no encoding is specified, you can just go with UTF-8.
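
Reading the charset from the meta tag and falling back to UTF-8 could look roughly like the sketch below. It assumes you already have the raw bytes of the page (for example copied from the response stream into a byte array), and the names CharsetSniffer, DetectCharsetName and Decode are hypothetical helpers invented for this example. The regular expression is a simple approximation, not a full HTML parser.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CharsetSniffer
{
    // Finds charset=... inside a <meta> tag, covering both
    // <meta charset="..."> and <meta http-equiv="Content-Type" content="...; charset=...">.
    private static readonly Regex CharsetPattern = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-:.]+)",
        RegexOptions.IgnoreCase);

    // Returns the charset name declared in the markup, or "utf-8" when none is found.
    public static string DetectCharsetName(string html)
    {
        Match m = CharsetPattern.Match(html ?? "");
        return m.Success ? m.Groups[1].Value : "utf-8";
    }

    // Decodes raw page bytes with the declared charset, falling back to UTF-8.
    public static string Decode(byte[] rawBytes)
    {
        // First pass as ASCII over the start of the page, only to locate the meta tag.
        // The tag itself is plain ASCII in all common single-byte encodings and UTF-8.
        string probe = Encoding.ASCII.GetString(rawBytes, 0, Math.Min(rawBytes.Length, 4096));

        Encoding encoding;
        try
        {
            encoding = Encoding.GetEncoding(DetectCharsetName(probe));
        }
        catch (ArgumentException)
        {
            // The declared name is unknown or not supported on this platform.
            encoding = Encoding.UTF8;
        }
        return encoding.GetString(rawBytes);
    }
}
```

For the meta tag shown above, DetectCharsetName returns the declared name iso-8859-1, and for a page without any charset declaration it returns utf-8. The probe is limited to the first 4096 bytes on the assumption that the meta tag sits near the top of the page.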
