
This is the discussion forum for the HtmlEditor. See also the HtmlEditor home page, where you can download the control, and the Documentation Wiki, a collaborative project for documenting the control.

Suggested change to GetDocumentSource
Posted by: Duncan Bayne (---.comworth.co.nz)
Date: Sunday, 08-Oct-2006, 01:44:40

Hi All,

After modifying the HTML Editor source as per a suggestion here (to fix a bug where the Unicode BOM wasn't being properly detected), I found a bug in the GetDocumentSource function in utils.cs; if the document contained a BOM, GetDocumentSource would throw an exception.

I fixed this, but then came across a second bug: if the document contained a BOM, GetDocumentSource would remove the BOM's length in bytes from the start of the document, as intended, but would then do so again at the start of every buffer-sized block it read.
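
To make the intended behaviour concrete, here is a minimal sketch of block-wise decoding that drops the BOM bytes once only. It is not the HtmlEditor code: the class and method names (BomSkipSketch, DecodeSkippingBomOnce) and the blockSize parameter are made up for illustration, and it assumes the caller already knows the encoding and the BOM length. It also uses a stateful Decoder rather than Encoding.GetString, which is an addition of this sketch, so that a multi-byte character split across two blocks is still decoded correctly.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomSkipSketch
{
    // Hypothetical helper: decodes a stream block by block and drops
    // bomLength bytes from the very start of the stream, never again.
    // Assumes blockSize > 0 and that encoding matches the stream contents.
    public static string DecodeSkippingBomOnce(
        Stream input, Encoding encoding, int bomLength, int blockSize)
    {
        StringBuilder result = new StringBuilder();
        byte[] block = new byte[blockSize];
        // Sized so one block (plus any bytes the decoder carried over) always fits.
        char[] chars = new char[encoding.GetMaxCharCount(blockSize)];
        // The decoder remembers incomplete byte sequences between blocks.
        Decoder decoder = encoding.GetDecoder();

        int toSkip = bomLength; // BOM bytes still to be dropped
        int read;
        while ((read = input.Read(block, 0, block.Length)) > 0)
        {
            // Non-zero only until the BOM has been consumed.
            int offset = Math.Min(toSkip, read);
            toSkip -= offset;

            int charCount = decoder.GetChars(block, offset, read - offset, chars, 0);
            result.Append(chars, 0, charCount);
        }
        // An incomplete sequence at the very end of the stream is not flushed here.
        return result.ToString();
    }
}
```

Because the skip is tracked as a running count of bytes still to drop, it reaches zero after the first block (or the first few, if the blocks are smaller than the BOM) and every later block is decoded from offset 0.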

I have fixed this second bug as well, and have included the new GetDocumentSource method below.


public static String GetDocumentSource(ref mshtml.HTMLDocument doc, Encoding enc)
{
    if (doc == null) return null;

    bool IsUnicodeDetermined = false;

    Encoding theEncoding = enc;
    if (theEncoding == null)
    {
        theEncoding = Encoding.GetEncoding(0);
        //Windows default
    }

    if (theEncoding != Encoding.GetEncoding(0))
    {
        //Don't try to detect unicode if we were
        //passed an encoding other than the default
        IsUnicodeDetermined = true;
    }

    // use the routine from htmlwrapper
    MemoryStream memstream = new MemoryStream();
    ComStream cstream = new ComStream(memstream);

    IPersistStreamInit pStreamInit = (IPersistStreamInit)doc;
    pStreamInit.Save(cstream, false);

    StringBuilder Result = new StringBuilder();

    //goto start of stream
    memstream.Seek(0, SeekOrigin.Begin);

    int iSize = 2048;
    byte[] bytedata = new byte[2048];
    int iBOMLength = 0;
    bool skippedBOM = false;

    while (true)
    {
        iSize = memstream.Read(bytedata, 0, bytedata.Length);
        if (iSize > 0)
        {
            if (!IsUnicodeDetermined)
            {
                //look for byte order mark
                bool IsUTF16LE = false;
                bool IsUTF16BE = false;
                bool IsUTF8 = false;
                bool IsBOMPresent = false;

                if ((bytedata[0] == 0xFF) & (bytedata[1] == 0xFE))//UTF16LE
                {
                    IsUTF16LE = true;
                    IsBOMPresent = true;
                }

                if ((bytedata[0] == 0xFE) & (bytedata[1] == 0xFF))// UTF16BE
                {
                    IsUTF16BE = true;
                    IsBOMPresent = true;
                }

                if ((bytedata[0] == 0xEF) & (bytedata[1] == 0xBB) & (bytedata[2] == 0xBF)) //UTF8
                {
                    IsUTF8 = true;
                    IsBOMPresent = true;
                }

                //look for alternate zeroes

                if (!IsUTF16LE & !IsUTF16BE & !IsUTF8)
                {
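                    //only guess when at least 8 bytes were actually read; otherwise
                    //the untouched zeroes in the buffer would look like UTF-16
                    if ((iSize >= 8) && (bytedata[1] == 0) && (bytedata[3] == 0) && (bytedata[5] == 0) && (bytedata[7] == 0))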
                    {
                        IsUTF16LE = true; //best guess
                    }
                }

                if (IsUTF16LE)
                {
                    theEncoding = Encoding.Unicode;
                }
                else if (IsUTF16BE)
                {
                    theEncoding = Encoding.BigEndianUnicode;
                }
                else if (IsUTF8)
                {
                    theEncoding = Encoding.UTF8;
                }

                if (IsBOMPresent)
                {
                    //strip out the BOM
                    iBOMLength = theEncoding.GetPreamble().Length;
                }

                //don't repeat the test
                IsUnicodeDetermined = true;
            }

            // only skip the BOM once, at the start of the document, rather than at every block
            Result.Append(theEncoding.GetString(
                bytedata,
                skippedBOM ? 0 : iBOMLength,
                skippedBOM ? iSize : iSize - iBOMLength));

            skippedBOM = true;
        }
        else
        {
            break;
        }
    }
    memstream.Close();

    return Result.ToString();
}
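
For comparison, here is a minimal sketch of the same read done with StreamReader's built-in BOM detection. It is not the HtmlEditor code and not a drop-in replacement: the class and method names (DocumentSourceSketch, ReadAllText) are made up for illustration, and it assumes the document has already been saved into a stream, as the method above does with its MemoryStream. StreamReader recognises the UTF-8 and UTF-16 byte order marks, skips the BOM once, and decodes statefully across its internal buffers. It does not perform the "alternate zeroes" guess, so a BOM-less UTF-16 document would still need the heuristic above.

```csharp
using System.IO;
using System.Text;

public static class DocumentSourceSketch
{
    // Hypothetical helper: reads the whole stream as text, letting
    // StreamReader detect and skip a byte order mark if one is present.
    public static string ReadAllText(Stream stream, Encoding fallback)
    {
        if (stream == null) return null;

        if (fallback == null)
        {
            // Same default as the method above.
            fallback = Encoding.GetEncoding(0);
        }

        // Go to the start of the stream.
        stream.Seek(0, SeekOrigin.Begin);

        // Third argument: detectEncodingFromByteOrderMarks. When true, a BOM
        // at the start of the stream overrides the fallback encoding.
        // Disposing the reader also closes the underlying stream.
        using (StreamReader reader = new StreamReader(stream, fallback, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In GetDocumentSource this would be called as DocumentSourceSketch.ReadAllText(memstream, enc) after pStreamInit.Save has filled the stream.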

Re: Suggested change to GetDocumentSource
Posted by: Tim (---.gotadsl.co.uk)
Date: Monday, 09-Oct-2006, 17:21:32

Many thanks; I hope we're getting on top of this BOM issue!

I'll have a look and update the source.

Tim


