Pathnames
Non-ASCII characters
The Issue
Non-ASCII characters in the pathnames (folder names and file names) of music library tracks can cause problems due to incompatibilities with:
- Macintosh
- Portable music players, e.g. Rockbox. These characters (perhaps only Unicode ones) can prevent playback of the file and of subsequent files.
- FTP clients and servers (where Unicode characters often appear as pairs of garbage characters)
- Some utility programs e.g. FLAC encoder, Beyond Compare 2
- File-sharing programs e.g. SoulSeek v157 (also fails to preserve upper-case, converting to lower-case)
- Librarian programs e.g.
- MediaMonkey 3 builds before 1105
- See http://www.mediamonkey.com/forum/viewtopic.php?t=15384&start=15 (4290: Fixed Unicode characters in Custom fields cause the field to not be stored in ID3 tags)
Some of these cases may be confined to characters outside the Extended ASCII set.
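To see which case a given pathname falls into, a small check can help. The following is a minimal Python sketch, assuming that "Extended ASCII" here means the Windows-1252 code page; classify_path is a hypothetical helper name, not part of any of the programs listed above.

```python
def classify_path(path: str) -> str:
    """Classify a pathname by the widest character set it needs.

    Returns "ascii", "extended" or "unicode".
    Assumption: "extended" means encodable in Windows-1252 (cp1252).
    """
    if path.isascii():
        return "ascii"
    try:
        path.encode("cp1252")
    except UnicodeEncodeError:
        # At least one character lies outside Windows-1252.
        return "unicode"
    return "extended"


if __name__ == "__main__":
    for name in ["La viruta.mp3", "La puñalada.mp3", "T\u014dky\u014d.mp3"]:
        print(name, "->", classify_path(name))
```

Pathnames reported as "unicode" are the ones most likely to hit the problems above, if those problems are indeed confined to characters outside Extended ASCII.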
Workarounds
Replace non-ASCII characters with ASCII stand-ins, e.g.:
- accented -> non-accented
- left and right quotes -> upright quote
- character with no ASCII lookalike -> underscore
Such replacements can be performed by Mp3tag:
- Action Group containing a Replace action for each known character
- Ditto
- $ansi() (which reduces to Extended ASCII) followed by $regexp() to replace all remaining non-ASCII characters with a lookalike or a substitute, e.g. underscore. $ansi() uses ? as a substitute, so for Windows-compatible pathnames (in which ? is disallowed), handle this either in the $regexp() or together with all other pathname-invalid characters by using $validate().
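Outside Mp3tag, the same kind of replacement can be sketched in Python. This is only an illustration of the three rules above (accented -> non-accented, curly quotes -> upright quotes, anything else -> underscore); the names STAND_INS and to_ascii are hypothetical, and the stand-in table is deliberately small.

```python
import unicodedata

# Explicit stand-ins for characters that do not decompose to an ASCII letter.
# Assumption: only the four curly quotes are listed; extend as needed.
STAND_INS = {
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
}


def to_ascii(name: str, fallback: str = "_") -> str:
    """Replace non-ASCII characters in name with ASCII stand-ins."""
    for src, dst in STAND_INS.items():
        name = name.replace(src, dst)
    out = []
    # NFKD splits an accented letter into base letter + combining marks.
    for ch in unicodedata.normalize("NFKD", name):
        if unicodedata.combining(ch):
            continue  # drop the accent itself
        # Keep ASCII as-is; anything without an ASCII form becomes fallback.
        out.append(ch if ch.isascii() else fallback)
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii("Un jardín de ilusión"))
    print(to_ascii("D\u2019Arienzo"))
```

Note that the upright double quote is itself disallowed in Windows pathnames, so pathname-invalid characters still need separate handling after this step.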
Further restrictions
One might also want to avoid the need for URL encoding in URLs, or to avoid letter case carrying meaning.
- Restrictions beyond ASCII could be:
- prefer usage of only A-Za-z0-9_
- or only a-z_
- . for file extensions; otherwise undecided, probably avoid
- ' not decided yet; if removed, how to write D'Arienzo?
- , avoid, at least in work titles
- []() maybe reserve for special meaning
- space always replaced with _
work_name_az
There can be a lot of discussion about whether a title should include characters such as , ! ? . or special characters like the Spanish inverted question mark (¿). There can also be discussion about correct case (upper/lower). The a-z title is a form that can be derived from many differing opinions about the correct original spelling. For file names and database references it can be helpful to have a-z work titles.
- remove all diacritics
- make all characters lower case
- replace space, comma and dot with _
- reduce repeating _ to single _
- examples
- https://tango.info/work.la_maleva - disambiguation page
- https://tango.info/work.la_viruta - work page - unique match
- Un jardín de ilusión -> https://tango.info/work.un_jardin_de_ilusion
- La puñalada -> https://tango.info/work.la_punalada
- Qué falta que me hacés! -> https://tango.info/work.que_falta_que_me_haces
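The steps above can be sketched in Python as follows. The function name work_name_az is borrowed from the heading; two details are assumptions not stated in the steps: any remaining character outside a-z and _ (such as ! or ¿, but also digits) is dropped, and leading or trailing underscores are stripped. This is not necessarily how tango.info derives its identifiers.

```python
import re
import unicodedata


def work_name_az(title: str) -> str:
    """Derive an a-z work title from a free-form title."""
    # 1. Remove all diacritics (decompose, then drop combining marks).
    decomposed = unicodedata.normalize("NFKD", title)
    s = "".join(c for c in decomposed if not unicodedata.combining(c))
    # 2. Make all characters lower case.
    s = s.lower()
    # 3. Replace space, comma and dot with underscore.
    s = re.sub(r"[ ,.]", "_", s)
    # Assumption: drop everything else that is not a-z or underscore.
    s = re.sub(r"[^a-z_]", "", s)
    # 4. Reduce repeated underscores to a single one.
    s = re.sub(r"_+", "_", s)
    # Assumption: no leading or trailing underscore.
    return s.strip("_")


if __name__ == "__main__":
    for title in ["Un jardín de ilusión", "La puñalada", "Qué falta que me hacés!"]:
        print(title, "->", work_name_az(title))
```

With these assumptions the three example titles above yield un_jardin_de_ilusion, la_punalada and que_falta_que_me_haces. To keep digits, the character class in the dropping step could be widened to a-z0-9_.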
External references
- Most common non-ASCII 8-bit characters: http://www.microsoft.com/GLOBALDEV/Reference/sbcs/1252.mspx