FilterASCII ( )

Function stats

Support: FileMaker 10.0+
Date posted: 19 March 2011
Last updated: 19 March 2011
Version: 2.0
Recursive function: No

Author Info
 Fabrice

74 functions

Average Rating 4.4


Function overview

Prototype

FilterASCII ( _text ; _charTable )


Parameters

_text  The text to filter

_charTable  The character table to keep: "ASCII" / "Extended ASCII" / "UTF-8"


Description

Tags: Text, Filter

Filters text against the ASCII, Extended ASCII, or UTF-8 table, keeping only the characters found in the chosen table.
Useful for getting rid of unexpected characters pasted from some crappy word processors like MS Word.

Examples

Sample input

some text with a bogus character


Sample output

some clean text

 

Function code

/*
FilterASCII ( _text ; _charTable )
by Fabrice Nordmann, 1-more-thing

http://www.1-more-thing.com

Filters a text with ASCII, Extended ASCII, or UTF-8 table.
Useful to get rid of unexpected characters copied from some crappy word processors like MS Word

_charTable accepted values : "ASCII" / "Extended ASCII" / "UTF-8"


Requires CustomList http://www.fmfunctions.com/fname/customlist

v.2    March 2011
    parameter name change
    handles UTF-8
v.1    March 2011

*/


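Let ([
    // Highest code point kept: ASCII ends at 127, the 8-bit extended table at 255
    _code_end = Case (
        _charTable = "UTF-8" ; 9830 ;
        _charTable = "Extended ASCII" ; 255 ;
        _charTable = "ASCII" ; 127
    ) ;
    // Return-delimited list of every allowed character, Char ( 1 ) through Char ( _code_end )
    _filter = CustomList ( 1 ; _code_end ; "char ( [n] )" )
];
Filter ( _text ; _filter )
)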

// ===================================
/*

    This function is published on FileMaker Custom Functions
    to check for updates and provide feedback and bug reports
    please visit http://www.fmfunctions.com/fid/299

    Prototype: FilterASCII( _text; _charTable )
    Function Author: Fabrice (http://www.fmfunctions.com/mid/37)
    Last updated: 19 March 2011
    Version: 2.0

*/
// ===================================
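
For readers who want to follow the logic outside FileMaker, here is a minimal Python model of what the calculation does: build the set of characters whose code points run from 1 up to a limit, then keep only the characters of the text that belong to that set. This is a sketch, not FileMaker code. The names `CODE_LIMITS` and `filter_by_table` are hypothetical, the limits 127 and 255 are assumed to be the last code points of 7-bit ASCII and of an 8-bit extended table, and 9830 is the cutoff the function applies for "UTF-8".

```python
# Hypothetical Python model of FilterASCII; FileMaker itself is not involved.
# Limits are assumptions: 127 and 255 are the last code points of 7-bit ASCII
# and of an 8-bit table; 9830 is the cutoff the function applies for "UTF-8".
CODE_LIMITS = {"ASCII": 127, "Extended ASCII": 255, "UTF-8": 9830}


def filter_by_table(text: str, char_table: str) -> str:
    """Keep only the characters whose code point lies in 1..limit."""
    limit = CODE_LIMITS[char_table]  # raises KeyError for an unknown table name
    # Same role as the CustomList call: every allowed character, listed once.
    allowed = {chr(n) for n in range(1, limit + 1)}
    # Same role as Filter ( _text ; _filter ): drop anything not in the set.
    return "".join(ch for ch in text if ch in allowed)


if __name__ == "__main__":
    sample = "caf\u00e9 \u0153uvre \u2713"  # code points: e-acute 233, oe ligature 339, check mark 10003
    for table in CODE_LIMITS:
        print(table, "->", repr(filter_by_table(sample, table)))
```

With these limits, œ (code point 339) survives only the "UTF-8" setting, since it lies above 255.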

 

Comments

comment
19 March 2011



Hi Fabrice,

Seems like a lot of processing goes into producing a string that could easily be hard-coded.
(Edited by comment on 19/03/11 )
  General comment
Fabrice
19 March 2011



Hi Michael,

Since the function is not recursive (thanks to the native Filter function), running CustomList for 256 iterations costs almost nothing.
But actually I should name the parameter differently so you could use a wider range of characters.
By hard coding, you always forget a character or two. As an example, this function http://www.briandunning.com/cf/937 omits punctuation as well as œ and Œ, which are used in French (because they are not in the extended ASCII table, so the current function also misses them, but the fix is easy).
  General comment
Fabrice
19 March 2011



I have now changed the second parameter so that it accepts ASCII, Extended ASCII and UTF-8.
Indeed, with UTF-8, it is... slow as hell. :)
I actually needed this function because some characters pasted from who-knows-where were crashing FileMaker and the XSLT processor.
(Edited by Fabrice on 19/03/11 )
  General comment
comment
19 March 2011



Well, the idea that CustomList() is somehow NOT recursive is rather naive, if you'll pardon my saying so. It still needs to process every code in the given range, one by one, and that takes time.

The point here is that the filterText parameter does not change from call to call (other than selecting this set or another), therefore it can be calculated ONCE, then hard-coded into the CF - thus eliminating the repeated generation of the same string at each call.

I don't quite see the point of UTF-8 here: what characters would be excluded?
(Edited by comment on 19/03/11 )
  General comment
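
The difference between rebuilding the filter on every call and computing it once can be sketched in Python. This only models the idea and is not FileMaker code: `build_filter`, `filter_text`, and the default limit of 255 are hypothetical, and `functools.lru_cache` stands in for hard-coding the string or keeping it somewhere persistent.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def build_filter(limit: int) -> frozenset:
    """Build the allowed-character set for code points 1..limit (cached per limit)."""
    return frozenset(chr(n) for n in range(1, limit + 1))


def filter_text(text: str, limit: int = 255) -> str:
    """Keep only allowed characters; the set is generated on the first call only."""
    allowed = build_filter(limit)
    return "".join(ch for ch in text if ch in allowed)


if __name__ == "__main__":
    for line in ["first \u2713 call", "second \u2713 call", "third call"]:
        print(repr(filter_text(line)))
    # One miss (the first call) and two hits (the reuses) are expected here.
    print(build_filter.cache_info())
```

The filtering step still runs on every call; only the generation of the allowed set is paid once per limit.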
Fabrice
19 March 2011



By not recursive, I meant that the text is not processed character by character, as you of course know.
I would agree with your suggestion, though, if the font FileMaker uses in the calculation window rendered all characters.
The characters that would be excluded are... the ones I've been hunting today, which I can copy or store in a FileMaker field, but not use in a calculation or send to XSL without it complaining.
I can send you a text file containing this character if you want.
  General comment
comment
19 March 2011



You could use something like:
"abc...XYZ" & Char ( n ) & Char ( m ) & ...
to define the filterText.

Incidentally, you could do the same to specify the problematic character/s I asked about.
  General comment
comment
28 March 2011



Well, you haven't replied, so I have posted my proposed version here:

http://www.briandunning.com/cf/1291
(Edited by comment on 28/03/11 )
  General comment
Fabrice
28 March 2011



Hi,

Sorry, I had not fully understood your proposal. Your version is of course much better, although there is no UTF-8.
I now wonder whether a simple function that generates the filter string once (ASCII, Extended, UTF-8) would be better. Its result could be stored in a global field, which would then be used in a simple Filter ( text ; globalfield ) call.
Anyway, your calculation is, as always, so elegant. Makes me a little jealous ;)
  General comment
comment
29 March 2011



Well, the entire ASCII+extended set is only 256 characters - so storing the allowed subset as a string in the function itself is reasonable.

As for UTF-8, I still don't get your point: UTF-8 is not a charset. It is an encoding type, and it can encode the ENTIRE Unicode charset of 1,114,112 characters. IOW, there are no characters to exclude.

Earlier you mentioned XML. XML does have a "legal XML" subset - but again, you are looking at a set of 1,112,033 legal characters. I am not sure how well Filter() would perform with such a large set, even if it was pre-generated into a global.

OTOH, there are only 2,079 illegal XML characters - so ostensibly you could hard-code 2,079 substitutions to avoid recursing on the text character-by-character. However, I suspect this would only be needed when a script is running anyway - so I would have the script do this, rather than a CF.

"Makes me a little jealous "
LOL, that is one of the goals here... :-)
  General comment
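
The figures quoted above can be checked against the Char production of the XML 1.0 specification, which allows #x9, #xA, #xD, and the ranges #x20-#xD7FF, #xE000-#xFFFD, and #x10000-#x10FFFF. The Python sketch below counts those ranges and removes anything outside them. The function and constant names are hypothetical, and it models the idea rather than anything FileMaker provides.

```python
# Legal character ranges from the XML 1.0 "Char" production (inclusive bounds).
XML_LEGAL_RANGES = [
    (0x9, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]

TOTAL_CODE_POINTS = 0x110000  # U+0000 through U+10FFFF
LEGAL_COUNT = sum(high - low + 1 for low, high in XML_LEGAL_RANGES)
ILLEGAL_COUNT = TOTAL_CODE_POINTS - LEGAL_COUNT


def is_legal_xml_char(ch: str) -> bool:
    """True if the character's code point falls inside a legal XML 1.0 range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in XML_LEGAL_RANGES)


def strip_illegal_xml_chars(text: str) -> str:
    """Remove every character that XML 1.0 does not allow."""
    return "".join(ch for ch in text if is_legal_xml_char(ch))


if __name__ == "__main__":
    print(TOTAL_CODE_POINTS, LEGAL_COUNT, ILLEGAL_COUNT)
    print(repr(strip_illegal_xml_chars("ok\x0b text\x00 here\t!")))
```

The counts come out to 1,114,112, 1,112,033, and 2,079, matching the numbers in the comment. Testing membership by range also avoids building a filter string of over a million characters.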