HTMLencoded2Text ( )

Function stats

Average user rating
4.5000
37
178
9999
Support
FileMaker 10.0 +
Date posted
08 January 2009
Last updated
13 September 2016
Version
Recursive function
Yes

Author Info
 Fabrice

74 functions

Average Rating 4.4


Function overview

Prototype

HTMLencoded2Text  ( _text )


Parameters

_text  


Description

Tags: Text, HTML, Encoding

Translates HTML-encoded text (named, decimal and hexadecimal character entities) into plain text.

Examples

Sample input

HTMLencoded2Text ( "Smith&amp;Wesson" )
or
HTMLencoded2Text ( "Smith&#38;Wesson" )


Sample output

Smith&Wesson
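The named-entity step can be pictured as a table of (entity, character) pairs applied one after another, which is how FileMaker's Substitute processes a list of pairs. The Python sketch below is only an illustration: NAMED_ENTITIES is a hypothetical five-entry subset of the real table, and the standard-library html.unescape is shown purely as a reference decoder; neither is part of the custom function.

```python
import html

# Hypothetical five-entry subset of the custom function's Substitute table.
NAMED_ENTITIES = [
    ("&eacute;", "é"),
    ("&quot;", '"'),
    ("&amp;", "&"),
    ("&lt;", "<"),
    ("&gt;", ">"),
]


def decode_named(text: str) -> str:
    """Apply each (entity, character) pair in order, each on the previous result."""
    for entity, char in NAMED_ENTITIES:
        text = text.replace(entity, char)
    return text


if __name__ == "__main__":
    print(decode_named("Smith&amp;Wesson"))   # Smith&Wesson
    print(html.unescape("Smith&amp;Wesson"))  # Smith&Wesson
    print(decode_named("&amp;lt;"))           # <
    print(html.unescape("&amp;lt;"))          # &lt;
```

Because the pairs run in sequence and "&amp;" is replaced before "&lt;", this sketch turns "&amp;lt;" into "<", while html.unescape returns "&lt;". The custom function also lists the &amp; pair before &lt;, so it likely behaves the same way on doubly encoded input.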

 

Function code

/* HTMLencoded2Text ( _text )

by Fabrice Nordmann
http://www.1-more-thing.com - Twitter: @1morethingtweet


v.2.0 - Sept 2016
    - supports hexadecimal entities (&#xAE; -> ®)
v.1.4.1 - Sep 2011
    - fixed bug causing Substitute to be skipped when using this function multiple times in the same script context (HOnza)
v.1.4 - Sep 2011
    - optimized to run much faster thanks to FM Bench Detective (http://fmbench.com/detective) (HOnza)
v.1.3 - Aug 2011
    - updated list of named character entities from http://alumnus.caltech.edu/~leif/namedchar.html (HOnza)
v.1.2 - Mar 2009
    - handles long unicodes (Clément Hoffmann)
v.1.1.1 - Jan 2009
    - added more HTML entities
v.1.1 - Jan 2009
    - added the HTML entities
v.1.0 - Jan 2009


Translates HTML encoded text into standard text

example :
HTMLencoded2Text ( "Smith&amp;Wesson" ) = "Smith&Wesson"


Requires FileMaker 10 or later

Recursive function
*/



Case ( $cf.mode = "Hex2Num" ;
    // For ease of implementation, the Hex2Num logic is embedded here instead of being a separate custom function.
    Case ( IsEmpty ( _text ) ; Let ([ _result = $cf.hex.result; $cf.hex.result = "" ; $cf.hex.depth = "" ; $cf.mode = "" ]; _result ) ;

    Let ([
        _alpha = "0123456789ABCDEF" ;
        _hex = Case ( $cf.hex.depth ; _text ; Filter ( Upper ( _text ) ; _alpha )) ;
        $$debug.hex = List ( $$debug.hex ; _hex ) ;
        $cf.hex.result = $cf.hex.result + ( Position ( _alpha ; Right ( _hex ; 1 ) ; 0 ; 1 ) - 1 )* 16^(0+$cf.hex.depth) ;
        $cf.hex.depth = $cf.hex.depth + 1
    ];
        HTMLencoded2Text ( Left ( _hex ; Length ( _hex ) - 1 )) )
) ;




Let ( [ _text = Case( $HTMLencoded2Text_depth; _text; Substitute ( _text ;
    ["&nbsp;"; " "] ;
    ["&iexcl;"; "¡"] ;
    ["&cent;"; "¢"] ;
    ["&pound;"; "£"] ;
    ["&curren;"; "¤"] ;
    ["&yen;"; "¥"] ;
    ["&brvbar;"; "¦"] ;
    ["&sect;"; "§"] ;
    ["&uml;"; "¨"] ;
    ["&copy;"; "©"] ;
    ["&ordf;"; "ª"] ;
    ["&laquo;"; "«"] ;
    ["&not;"; "¬"] ;
    ["&shy;"; Char ( 173 )] ;
    ["&reg;"; "®"] ;
    ["&macr;"; "¯"] ;
    ["&deg;"; "°"] ;
    ["&plusmn;"; "±"] ;
    ["&sup2;"; "²"] ;
    ["&sup3;"; "³"] ;
    ["&acute;"; "´"] ;
    ["&micro;"; "µ"] ;
    ["&para;"; "\¶"] ;
    ["&middot;"; "·"] ;
    ["&cedil;"; "¸"] ;
    ["&sup1;"; "¹"] ;
    ["&ordm;"; "º"] ;
    ["&raquo;"; "»"] ;
    ["&frac14;"; "¼"] ;
    ["&frac12;"; "½"] ;
    ["&frac34;"; "¾"] ;
    ["&iquest;"; "¿"] ;
    ["&Agrave;"; "À"] ;
    ["&Aacute;"; "Á"] ;
    ["&Acirc;"; "Â"] ;
    ["&Atilde;"; "Ã"] ;
    ["&Auml;"; "Ä"] ;
    ["&Aring;"; "Å"] ;
    ["&AElig;"; "Æ"] ;
    ["&Ccedil;"; "Ç"] ;
    ["&Egrave;"; "È"] ;
    ["&Eacute;"; "É"] ;
    ["&Ecirc;"; "Ê"] ;
    ["&Euml;"; "Ë"] ;
    ["&Igrave;"; "Ì"] ;
    ["&Iacute;"; "Í"] ;
    ["&Icirc;"; "Î"] ;
    ["&Iuml;"; "Ï"] ;
    ["&ETH;"; "Ð"] ;
    ["&Ntilde;"; "Ñ"] ;
    ["&Ograve;"; "Ò"] ;
    ["&Oacute;"; "Ó"] ;
    ["&Ocirc;"; "Ô"] ;
    ["&Otilde;"; "Õ"] ;
    ["&Ouml;"; "Ö"] ;
    ["&times;"; "×"] ;
    ["&Oslash;"; "Ø"] ;
    ["&Ugrave;"; "Ù"] ;
    ["&Uacute;"; "Ú"] ;
    ["&Ucirc;"; "Û"] ;
    ["&Uuml;"; "Ü"] ;
    ["&Yacute;"; "Ý"] ;
    ["&THORN;"; "Þ"] ;
    ["&szlig;"; "ß"] ;
    ["&agrave;"; "à"] ;
    ["&aacute;"; "á"] ;
    ["&acirc;"; "â"] ;
    ["&atilde;"; "ã"] ;
    ["&auml;"; "ä"] ;
    ["&aring;"; "å"] ;
    ["&aelig;"; "æ"] ;
    ["&ccedil;"; "ç"] ;
    ["&egrave;"; "è"] ;
    ["&eacute;"; "é"] ;
    ["&ecirc;"; "ê"] ;
    ["&euml;"; "ë"] ;
    ["&igrave;"; "ì"] ;
    ["&iacute;"; "í"] ;
    ["&icirc;"; "î"] ;
    ["&iuml;"; "ï"] ;
    ["&eth;"; "ð"] ;
    ["&ntilde;"; "ñ"] ;
    ["&ograve;"; "ò"] ;
    ["&oacute;"; "ó"] ;
    ["&ocirc;"; "ô"] ;
    ["&otilde;"; "õ"] ;
    ["&ouml;"; "ö"] ;
    ["&divide;"; "÷"] ;
    ["&oslash;"; "ø"] ;
    ["&ugrave;"; "ù"] ;
    ["&uacute;"; "ú"] ;
    ["&ucirc;"; "û"] ;
    ["&uuml;"; "ü"] ;
    ["&yacute;"; "ý"] ;
    ["&thorn;"; "þ"] ;
    ["&yuml;"; "ÿ"] ;
    ["&fnof;"; "ƒ"] ;
    ["&Alpha;"; "Α"] ;
    ["&Beta;"; "Β"] ;
    ["&Gamma;"; "Γ"] ;
    ["&Delta;"; "Δ"] ;
    ["&Epsilon;"; "Ε"] ;
    ["&Zeta;"; "Ζ"] ;
    ["&Eta;"; "Η"] ;
    ["&Theta;"; "Θ"] ;
    ["&Iota;"; "Ι"] ;
    ["&Kappa;"; "Κ"] ;
    ["&Lambda;"; "Λ"] ;
    ["&Mu;"; "Μ"] ;
    ["&Nu;"; "Ν"] ;
    ["&Xi;"; "Ξ"] ;
    ["&Omicron;"; "Ο"] ;
    ["&Pi;"; "Π"] ;
    ["&Rho;"; "Ρ"] ;
    ["&Sigma;"; "Σ"] ;
    ["&Tau;"; "Τ"] ;
    ["&Upsilon;"; "Υ"] ;
    ["&Phi;"; "Φ"] ;
    ["&Chi;"; "Χ"] ;
    ["&Psi;"; "Ψ"] ;
    ["&Omega;"; "Ω"] ;
    ["&alpha;"; "α"] ;
    ["&beta;"; "β"] ;
    ["&gamma;"; "γ"] ;
    ["&delta;"; "δ"] ;
    ["&epsilon;"; "ε"] ;
    ["&zeta;"; "ζ"] ;
    ["&eta;"; "η"] ;
    ["&theta;"; "θ"] ;
    ["&iota;"; "ι"] ;
    ["&kappa;"; "κ"] ;
    ["&lambda;"; "λ"] ;
    ["&mu;"; "μ"] ;
    ["&nu;"; "ν"] ;
    ["&xi;"; "ξ"] ;
    ["&omicron;"; "ο"] ;
    ["&pi;"; "π"] ;
    ["&rho;"; "ρ"] ;
    ["&sigmaf;"; "ς"] ;
    ["&sigma;"; "σ"] ;
    ["&tau;"; "τ"] ;
    ["&upsilon;"; "υ"] ;
    ["&phi;"; "φ"] ;
    ["&chi;"; "χ"] ;
    ["&psi;"; "ψ"] ;
    ["&omega;"; "ω"] ;
    ["&thetasym;"; "ϑ"] ;
    ["&upsih;"; "ϒ"] ;
    ["&piv;"; "ϖ"] ;
    ["&bull;"; "•"] ;
    ["&hellip;"; "…"] ;
    ["&prime;"; "′"] ;
    ["&Prime;"; "″"] ;
    ["&oline;"; "‾"] ;
    ["&frasl;"; "⁄"] ;
    ["&weierp;"; "℘"] ;
    ["&image;"; "ℑ"] ;
    ["&real;"; "ℜ"] ;
    ["&trade;"; "™"] ;
    ["&alefsym;"; "ℵ"] ;
    ["&larr;"; "←"] ;
    ["&uarr;"; "↑"] ;
    ["&rarr;"; "→"] ;
    ["&darr;"; "↓"] ;
    ["&harr;"; "↔"] ;
    ["&crarr;"; "↵"] ;
    ["&lArr;"; "⇐"] ;
    ["&uArr;"; "⇑"] ;
    ["&rArr;"; "⇒"] ;
    ["&dArr;"; "⇓"] ;
    ["&hArr;"; "⇔"] ;
    ["&forall;"; "∀"] ;
    ["&part;"; "∂"] ;
    ["&exist;"; "∃"] ;
    ["&empty;"; "∅"] ;
    ["&nabla;"; "∇"] ;
    ["&isin;"; "∈"] ;
    ["&notin;"; "∉"] ;
    ["&ni;"; "∋"] ;
    ["&prod;"; "∏"] ;
    ["&sum;"; "∑"] ;
    ["&minus;"; "−"] ;
    ["&lowast;"; "∗"] ;
    ["&radic;"; "√"] ;
    ["&prop;"; "∝"] ;
    ["&infin;"; "∞"] ;
    ["&ang;"; "∠"] ;
    ["&and;"; "∧"] ;
    ["&or;"; "∨"] ;
    ["&cap;"; "∩"] ;
    ["&cup;"; "∪"] ;
    ["&int;"; "∫"] ;
    ["&there4;"; "∴"] ;
    ["&sim;"; "∼"] ;
    ["&cong;"; "≅"] ;
    ["&asymp;"; "≈"] ;
    ["&ne;"; "≠"] ;
    ["&equiv;"; "≡"] ;
    ["&le;"; "≤"] ;
    ["&ge;"; "≥"] ;
    ["&sub;"; "⊂"] ;
    ["&sup;"; "⊃"] ;
    ["&sube;"; "⊆"] ;
    ["&supe;"; "⊇"] ;
    ["&oplus;"; "⊕"] ;
    ["&otimes;"; "⊗"] ;
    ["&perp;"; "⊥"] ;
    ["&sdot;"; "⋅"] ;
    ["&lceil;"; "⌈"] ;
    ["&rceil;"; "⌉"] ;
    ["&lfloor;"; "⌊"] ;
    ["&rfloor;"; "⌋"] ;
    ["&lang;"; "⟨"] ;
    ["&rang;"; "⟩"] ;
    ["&loz;"; "◊"] ;
    ["&spades;"; "♠"] ;
    ["&clubs;"; "♣"] ;
    ["&hearts;"; "♥"] ;
    ["&diams;"; "♦"] ;
    ["&quot;"; "\""] ;
    ["&amp;"; "&"] ;
    ["&lt;"; "<"] ;
    ["&gt;"; ">"] ;
    ["&OElig;"; "Œ"] ;
    ["&oelig;"; "œ"] ;
    ["&Scaron;"; "Š"] ;
    ["&scaron;"; "š"] ;
    ["&Yuml;"; "Ÿ"] ;
    ["&circ;"; "ˆ"] ;
    ["&tilde;"; "˜"] ;
    ["&ensp;"; " "] ;
    ["&emsp;"; " "] ;
    ["&thinsp;"; " "] ;
    ["&zwnj;"; Char ( 8204 )] ;
    ["&zwj;"; Char ( 8205 )] ;
    ["&lrm;"; Char ( 8206 )] ;
    ["&rlm;"; Char ( 8207 )] ;
    ["&ndash;"; "–"] ;
    ["&mdash;"; "—"] ;
    ["&lsquo;"; "‘"] ;
    ["&rsquo;"; "’"] ;
    ["&sbquo;"; "‚"] ;
    ["&ldquo;"; "\“"] ;
    ["&rdquo;"; "\”"] ;
    ["&bdquo;"; "\„"] ;
    ["&dagger;"; "†"] ;
    ["&Dagger;"; "‡"] ;
    ["&permil;"; "‰"] ;
    ["&lsaquo;"; "‹"] ;
    ["&rsaquo;"; "›"] ;
    ["&euro;"; "€"]
)) ;
$HTMLencoded2Text_depth = $HTMLencoded2Text_depth + 1 ;
_finalresult = Case ( not Position ( _text ; "&#"; 1; 1 ) ; _text ;
    Let ([
        _pos = Position ( _text ; "&#" ; 1 ; 1 ) ;
        _pos2 = Position ( _text ; ";" ; _pos ; 1 ) - _pos ;
        _word = Middle ( _text ; _pos ; _pos2 + 1 ) ;
        _isCode = Length ( _word ) >= 4 and Length ( _word ) <= 8 and Substitute ( _word ; [ 0 ; "" ] ; [ 1 ; "" ] ; [ 2 ; "" ] ; [ 3 ; "" ] ; [ 4 ; "" ] ; [ 5 ; "" ] ; [ 6 ; "" ] ; [ 7 ; "" ] ; [ 8 ; "" ] ; [ 9 ; "" ]) = "&#;" ;
        _isHex = not _isCode and Left ( _word ; 3) = "&#x" and Right ( _word ; 1 ) = ";" ;
//        If you're a purist or need more iterations, copy the Hex2Num part of this function (at the top) as a new function, uncomment this line and comment out the next one.
//        _result = Left ( _text ; _pos - 1 ) & Case ( _isCode ; Char ( GetAsNumber ( _word )) ; _isHex ; Char ( Hex2Num ( Middle ( _word ; 3 ; Length ( _word ) - 3 ))) ; "&" ) ;
        _result = Left ( _text ; _pos - 1 ) & Case ( _isCode ; Char ( GetAsNumber ( _word )) ; _isHex ; Let ( $cf.mode = "Hex2Num" ; Char ( GetAsNumber ( HTMLencoded2Text ( Middle ( _word ; 3 ; Length ( _word ) - 3 ))))) ; "&" ) ;
        $debug = List ( $debug ; _result )
        ];
        _result & HTMLencoded2Text ( Right ( _text ; Length ( _text ) - ( _pos + Case ( _isCode ; Length ( _word ) - 1 ; _isHex ; Length ( _word ) - 1 ))))
        ));
    $HTMLencoded2Text_depth = $HTMLencoded2Text_depth - 1
];
    _finalresult
)
)

// ===================================
/*

    This function is published on FileMaker Custom Functions.
    To check for updates and provide feedback and bug reports,
    please visit http://www.fmfunctions.com/fid/178

    Prototype: HTMLencoded2Text( _text )
    Function Author: Fabrice (http://www.fmfunctions.com/mid/37)
    Last updated: 13 September 2016
    Version: 4.0

*/
// ===================================
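The numeric-entity part of the function code (the recursion on "&#" with its _isCode and _isHex tests) can be read as a left-to-right scan. The Python sketch below is an illustrative, iterative model of that scan, not a port: it uses int(..., 16) where the FileMaker code switches into its embedded Hex2Num mode, and rejecting empty hex parts and code points above 0x10FFFF are this sketch's own safeguards (chr() raises ValueError beyond that range).

```python
HEX_DIGITS = "0123456789ABCDEF"


def parse_numeric_entity(word: str):
    """Return the code point for '&#NNN;' or '&#xHH;', or None if word is neither."""
    if not (word.startswith("&#") and word.endswith(";")):
        return None
    body = word[2:-1]
    # Decimal, as in _isCode: "&#", 1 to 5 ASCII digits, ";" (total length 4 to 8).
    if 4 <= len(word) <= 8 and all(c in "0123456789" for c in body):
        return int(body)
    # Hexadecimal, as in _isHex: "&#x" prefix, compared case-insensitively.
    if body[:1] in ("x", "X"):
        digits = "".join(c for c in body[1:].upper() if c in HEX_DIGITS)
        if digits and int(digits, 16) <= 0x10FFFF:
            return int(digits, 16)
    return None


def decode_numeric(text: str) -> str:
    """Scan for "&#" left to right, decoding valid entities and keeping everything else."""
    out = []
    i = 0
    while True:
        pos = text.find("&#", i)
        if pos == -1:
            out.append(text[i:])
            return "".join(out)
        out.append(text[i:pos])
        semi = text.find(";", pos)
        word = text[pos:semi + 1] if semi != -1 else ""
        code = parse_numeric_entity(word)
        if code is None:
            out.append("&")  # not an entity: keep "&" and resume at the "#"
            i = pos + 1
        else:
            out.append(chr(code))
            i = semi + 1


if __name__ == "__main__":
    print(decode_numeric("Smith&#38;Wesson"))  # Smith&Wesson
    print(decode_numeric("&#xAE; &#xae;"))     # ® ®
```

Like the recursion, a "&#" that does not start a valid entity is emitted as "&" and scanning resumes right after it, so malformed input such as "&#123456;" passes through unchanged.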

 

Comments

Agnès
04 May 2009



Hello Fabrice,

During an import, I came across these, which were not in your list:
; [ "&le;" ; "≤" ]
; [ "&ge;" ; "≥" ]
; [ "&para;" ; Quote ( "#|^|#¶#|^|#" ) ]
; [ "\"#|^|#" ; "" ]; [ "#|^|#\"" ; "" ]
; [ "&ldquo;" ; "\"" ]
; [ "&rdquo;" ; "\"" ]

just in case.
Thanks a lot for this one!

Agnès
(Edited by Agnès on 04/05/09 )
  General comment
HOnza
29 August 2011



Hey, useful but still not complete. So I created a sample file that generates an updated version of this function using a database of html entities ;-)
Feel free to download it here: http://24usw.com/hent
(Edited by HOnza on 29/08/11 )
  General comment
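HOnza's generator file is not reproduced here. As a hypothetical illustration of the same idea, the Python sketch below renders FileMaker Substitute pairs from an entity table. It uses the standard library's html.entities.name2codepoint (the HTML 4 named entities) in place of his database, and fm_string applies the \" and \¶ escapes that the function listing uses for &quot; and &para;.

```python
from html.entities import name2codepoint


def fm_string(s: str) -> str:
    """Quote s as a FileMaker string literal, escaping backslash, double quote and pilcrow."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("¶", "\\¶")
    return '"' + escaped + '"'


def substitute_pairs(table: dict) -> str:
    """Render one ["&name;"; "char"] pair per entity, ordered by code point."""
    pairs = sorted(table.items(), key=lambda item: (item[1], item[0]))
    return " ;\n".join(f'["&{name};"; {fm_string(chr(cp))}]' for name, cp in pairs)


if __name__ == "__main__":
    # Output is meant to sit between "Substitute ( _text ;" and the closing ")".
    print(substitute_pairs(name2codepoint))
```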
Fabrice
29 August 2011



Thanks! That's why I like this site. Updated the function according to your modifications.
  General comment
HOnza
14 September 2011



I have just optimized the custom function to run about 4 times faster when processing 33 kilobytes of text and millions of times faster when processing 1.4 MB of text.

Re-download my sample file from http://24usw.com/hent to get the updated custom function.

I am also going to post a video of the optimization as soon as I get some time to cut it…
  General comment
HOnza
14 September 2011



One more update and correction.
I made a mistake in the optimization that caused the function not to work correctly when used multiple times in the same context (the $HTMLencoded2Text_deep variable was persisting across multiple calls). If you have already downloaded the optimized version, please re-download the file with this bug fixed.

Correction: it's about 8 times faster on the 33KB text, and maybe not millions but only several hundred times faster on the 1.4MB text ;-)
(Edited by HOnza on 14/09/11 )
  General comment
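HOnza's description suggests why a flag that is set but never cleared breaks later calls in the same script context, while a depth counter that is incremented on entry and decremented on exit does not. The Python sketch below is a toy model, not the FileMaker code: a dict stands in for the script-scoped $variables, expensive_pass is a hypothetical stand-in for the large Substitute(), and splitting on "," stands in for the recursion on "&#".

```python
def expensive_pass(text: str) -> str:
    """Hypothetical stand-in for the large Substitute() over all named entities."""
    return text.replace("&amp;", "&")


def decode_flag(ctx: dict, text: str) -> str:
    """Flawed pattern: a flag says 'already substituted' but is never cleared."""
    if not ctx.get("deep"):
        text = expensive_pass(text)
    ctx["deep"] = True  # persists into the next top-level call in the same context
    head, sep, tail = text.partition(",")  # "," stands in for the next "&#"
    return head + sep + (decode_flag(ctx, tail) if sep else "")


def decode_depth(ctx: dict, text: str) -> str:
    """Counter pattern: run the pass only at depth 0, restore the depth on exit."""
    if not ctx.get("depth"):
        text = expensive_pass(text)
    ctx["depth"] = ctx.get("depth", 0) + 1
    head, sep, tail = text.partition(",")
    result = head + sep + (decode_depth(ctx, tail) if sep else "")
    ctx["depth"] -= 1
    return result


if __name__ == "__main__":
    flag_ctx, depth_ctx = {}, {}  # each dict plays the role of script-scoped $variables
    print(decode_flag(flag_ctx, "a&amp;b"))       # a&b
    print(decode_flag(flag_ctx, "c&amp;d"))       # c&amp;d  (pass skipped)
    print(decode_depth(depth_ctx, "a&amp;b,c"))  # a&b,c
    print(decode_depth(depth_ctx, "c&amp;d"))    # c&d
```

The counter returns to 0 after each top-level call, so the next call in the same context runs the expensive pass again, while recursive calls still skip it.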
HOnza
27 January 2012



Further optimized version I demoed at Pause[x]London 2011 is now available at http://24usw.com/529b
  General comment
JohnD
07 July 2014



The input text for this function is commonly referred to as "URL Encoded", rather than "HTML encoded". This function will not decode HTML.
  General comment
Fabrice
07 July 2014



It decodes HTML entities. URL encoding is quite different.
http://www.w3schools.com/tags/ref_urlencode.asp
  General comment
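To illustrate the difference Fabrice points out, here is a small Python example using only the standard library (not FileMaker): html.unescape decodes character entities, urllib.parse.unquote decodes percent-encoding, and neither touches the other format.

```python
import html
from urllib.parse import quote, unquote

html_encoded = "Smith&amp;Wesson"    # HTML character entity reference
url_encoded = quote("Smith&Wesson")   # percent-encoding

print(url_encoded)                  # Smith%26Wesson
print(html.unescape(html_encoded))  # Smith&Wesson
print(unquote(url_encoded))         # Smith&Wesson
print(unquote(html_encoded))        # Smith&amp;Wesson  (unchanged)
print(html.unescape(url_encoded))   # Smith%26Wesson    (unchanged)
```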
Douglas Alder
10 September 2016



A few more to add to the list:

["&#xA0;"; " "];
["&#x2019;"; "'"];
["&#x201C;"; "\""];
["&#x201D;"; "\""];
["&#x2013;"; "-"];
["&#x2014;"; "—"];
["&#x2022;"; "•"];
["&#x2018;"; "'"];
["&#xAE;"; "®"];
["&#xA9;"; "©"];
["&#xE9;"; "é"];
["&#x2122;"; "™"];
["&#xE7;"; "ç"];
["&#xE0;"; "à"];
["&#xE8;"; "è"];
["&#x2026;"; "…"]
  General comment
Fabrice
13 September 2016



Thanks, you made me waste an evening for something I didn't need :D
Here you are. Now all hexadecimal entities are supported.
  General comment
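The embedded Hex2Num mode in the function code converts the hex part right to left, adding each digit's value times 16 raised to the current depth. A Python sketch of that accumulation (hex2num is a hypothetical name for this illustration), checked against the built-in int(..., 16) for the code points in Douglas Alder's list:

```python
HEX_ALPHABET = "0123456789ABCDEF"


def hex2num(text: str) -> int:
    """Keep hex digits, then add each digit's value times 16**depth, rightmost first."""
    digits = "".join(c for c in text.upper() if c in HEX_ALPHABET)
    result = 0
    for depth, digit in enumerate(reversed(digits)):
        result += HEX_ALPHABET.index(digit) * 16 ** depth
    return result


if __name__ == "__main__":
    # Hex parts of the entities listed in Douglas Alder's comment.
    for h in ["A0", "2019", "201C", "201D", "2013", "2014", "2022", "2018",
              "AE", "A9", "E9", "2122", "E7", "E0", "E8", "2026"]:
        assert hex2num(h) == int(h, 16)
    print(hex2num("xAE"), chr(hex2num("xAE")))  # 174 ®
```

Because generic decoding returns the actual code point, &#x2019; becomes U+2019 (’) and &#x201C; becomes U+201C (“), rather than the plain ASCII ' and " proposed in the list above.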

 

 

 

 

 
