Pathnames
Latest revision as of 2013-06-10T00:23:18
Non-ASCII characters
The Issue
Non-ASCII characters in the pathnames (folder names and file names) of music library tracks can cause problems due to incompatibilities with:
- Macintosh
- Portable music players, e.g. Rockbox. These characters (perhaps only Unicode ones) can prevent playback of the file and of subsequent files.
- FTP clients and servers (where Unicode characters often appear as garbage character pairs)
- Some utility programs e.g. FLAC encoder, Beyond Compare 2
- File-sharing programs e.g. SoulSeek v157 (also fails to preserve upper-case, converting to lower-case)
- Librarian programs e.g.
- MediaMonkey 3 builds before 1105
- http://www.mediamonkey.com/forum/viewtopic.php?t=15384&start=15 4290 Fixed Unicode characters in Custom fields cause the field to not be stored in ID3 tags
Some of these cases may be confined to characters outside the Extended ASCII set.
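The "garbage character pairs" mentioned for FTP are typically what appears when a name is transmitted as UTF-8 but displayed using a single-byte code page. The following is a minimal Python sketch of that effect, assuming the receiving side interprets the bytes as Windows-1252; the function name misread_utf8 is hypothetical, and the code page actually involved depends on the client and server.

```python
def misread_utf8(name: str, wrong_codepage: str = "cp1252") -> str:
    """Return the name as it looks when its UTF-8 bytes are decoded
    with a single-byte code page (assumed here: Windows-1252)."""
    # errors="replace" avoids an exception for the few byte values
    # that Windows-1252 leaves undefined.
    return name.encode("utf-8").decode(wrong_codepage, errors="replace")


print(misread_utf8("hacés"))     # hacÃ©s
print(misread_utf8("puñalada"))  # puÃ±alada
```

Each accented letter occupies two bytes in UTF-8, so it shows up as a pair of unrelated characters.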
Workarounds
Replace non-ASCII characters with ASCII stand-ins, e.g.:
1. accented -> non-accented
2. left and right quotes -> upright quote
3. character with no ASCII lookalike -> underscore
Such replacements can be performed by Mp3tag, using respectively:
1. an Action Group containing a Replace action for each known character
2. ditto
3. $ansi() (which reduces to Extended ASCII) followed by $regexp() to replace all remaining non-ASCII characters with a lookalike or substitute, e.g. underscore. $ansi() uses ? as its substitute, so for Windows-compatible pathnames (in which ? is disallowed), handle this either in the $regexp() or, together with all other pathname-invalid characters, by using $validate().
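Outside Mp3tag, the same three replacements can be sketched in Python. This is only an illustration under stated assumptions: the LOOKALIKES table and the function name to_ascii are hypothetical, the table covers just the curly quotes, and pathname-invalid characters (such as ? or " on Windows) are not handled here.

```python
import unicodedata

# Hypothetical lookalike table; extend it for the characters found in your library.
LOOKALIKES = {
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
}


def to_ascii(text: str, substitute: str = "_") -> str:
    """Replace non-ASCII characters with ASCII stand-ins."""
    out = []
    # NFKD splits an accented letter into its base letter plus combining marks.
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        if ch in LOOKALIKES:
            out.append(LOOKALIKES[ch])  # quotes -> upright quote
        elif ch.isascii():
            out.append(ch)
        else:
            out.append(substitute)  # no ASCII lookalike -> underscore
    return "".join(out)


print(to_ascii("Qué falta que me hacés!"))  # Que falta que me haces!
print(to_ascii("D\u2019Arienzo"))           # D'Arienzo
```

Letters that have no Unicode decomposition, such as ø, fall through to the substitute unless they are added to the lookalike table.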
Further restrictions
One might also want to avoid URL encoding in URLs, or avoid giving case a meaning.
- Restrictions beyond ASCII could be:
- prefer using only A-Za-z0-9_
- or only a-z_
- . for file extensions; otherwise undecided, probably avoid
- ' not decided yet; if removed, how to write D'Arienzo?
- , avoid, at least in work titles
- []() maybe reserve for special meaning
- space always replaced with _
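To illustrate why a restricted character set avoids URL encoding, here is a small Python sketch. It uses the standard urllib.parse.quote to show what percent-encoding does to an unrestricted title, and a hypothetical helper is_restricted_name that checks a name against the A-Za-z0-9_ set suggested above.

```python
import re
from urllib.parse import quote

# The preferred set from the list above: A-Za-z0-9_
ALLOWED = re.compile(r"[A-Za-z0-9_]+")


def is_restricted_name(name: str) -> bool:
    """True if the name consists only of A-Za-z0-9_ (hypothetical helper)."""
    return ALLOWED.fullmatch(name) is not None


# An unrestricted title needs percent-encoding in a URL ...
print(quote("La puñalada"))               # La%20pu%C3%B1alada
# ... while a restricted name passes through unchanged.
print(quote("la_punalada"))               # la_punalada
print(is_restricted_name("la_punalada"))  # True
print(is_restricted_name("La puñalada"))  # False
```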
work_name_az
There can be a lot of discussion about whether a title should include ",!?." or special characters such as the Spanish inverted question mark. There can also be discussion about correct case (upper/lower). The a-z title is a title that can be derived from many differing opinions about what the correct original is. For file names and references in databases it can be helpful to have a-z work titles.
- remove all diacritics
- make all characters lower case
- replace space, comma and dot with _
- reduce repeated _ to a single _
- examples
- https://tango.info/work.la_maleva - disambiguation page
- https://tango.info/work.la_viruta - work page - unique match
- Un jardín de ilusión -> https://tango.info/work.un_jardin_de_ilusion
- La puñalada -> https://tango.info/work.la_punalada
- Qué falta que me hacés! -> https://tango.info/work.que_falta_que_me_haces
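The derivation steps above can be sketched in Python as follows. This is a sketch under assumptions rather than the rule actually used by tango.info: the function name work_name_az is taken from the heading, and two steps are assumptions inferred from the examples, namely that any remaining character outside a-z and _ (such as "!") is dropped, and that leading or trailing _ is trimmed.

```python
import re
import unicodedata


def work_name_az(title: str) -> str:
    """Derive an a-z work name from a title (sketch; see assumptions below)."""
    # Remove all diacritics: decompose, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Make all characters lower case.
    lowered = stripped.lower()
    # Replace space, comma and dot with _.
    replaced = re.sub(r"[ ,.]", "_", lowered)
    # Assumption: drop everything else outside a-z and _ (e.g. "!" or digits).
    cleaned = re.sub(r"[^a-z_]", "", replaced)
    # Reduce repeated _ to a single _.
    collapsed = re.sub(r"_+", "_", cleaned)
    # Assumption: trim leading and trailing _.
    return collapsed.strip("_")


for title in ("Un jardín de ilusión", "La puñalada", "Qué falta que me hacés!"):
    print(title, "->", work_name_az(title))
```

For the three titles listed above this yields un_jardin_de_ilusion, la_punalada and que_falta_que_me_haces, matching the work names in the example URLs.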
External references
- Most common non-ASCII 8-bit characters: http://www.microsoft.com/GLOBALDEV/Reference/sbcs/1252.mspx