Pathnames: Difference between revisions
Revision as of 2008-02-21T19:48:01
Non-ASCII characters
The Issue
Non-ASCII characters in the pathnames (folder names and file names) of music library tracks can cause problems due to incompatibilities with:
- Macintosh
- Portable music players e.g. Rockbox. These characters (perhaps only Unicode ones) can prevent playback of the file and of subsequent files.
- FTP clients and servers (where Unicode characters often appear as pairs of garbage characters)
- Some utility programs e.g. FLAC encoder, Beyond Compare 2
- File-sharing programs
- Librarian programs e.g.
- http://www.mediamonkey.com/forum/viewtopic.php?t=15384&start=15 4290 Fixed Unicode characters in Custom fields cause the field to not be stored in ID3 tags
Some of these problems may be confined to characters outside Extended ASCII.
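To see which pathnames in a library are affected, the folder tree can be scanned for names containing non-ASCII characters. The following is only a minimal sketch: the function names are made up for illustration, and "Extended ASCII" is assumed here to mean Windows code page 1252 (the table linked under External references), which may not be what every affected program uses.

```python
import os

def non_ascii_chars(name):
    """Return the non-ASCII characters in name, in order, without repeats."""
    seen = []
    for ch in name:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return seen

def classify(name):
    """Return 'ascii', 'cp1252' or 'other' for the given name.

    Assumption: 'Extended ASCII' is taken to mean Windows code page 1252.
    """
    try:
        name.encode("ascii")
        return "ascii"
    except UnicodeEncodeError:
        pass
    try:
        name.encode("cp1252")
        return "cp1252"
    except UnicodeEncodeError:
        return "other"

def find_non_ascii_paths(root):
    """Yield (path, offending characters, class) for each entry under root."""
    for folder, subfolders, files in os.walk(root):
        for name in subfolders + files:
            chars = non_ascii_chars(name)
            if chars:
                yield os.path.join(folder, name), chars, classify(name)
```

Calling `find_non_ascii_paths` on the top folder of the music library lists every folder and file name that would need attention, together with the characters responsible.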
Workaround
Replace non-ASCII characters with e.g. the nearest ASCII equivalent:
- accented char -> non-accented equivalent
- left and right quotes -> upright quote
Replacement can be performed by Mp3tag using an Action Group containing a Replace action for each character.
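Outside Mp3tag, the same replacement can be scripted. The sketch below is an illustration, not the Mp3tag method: it assumes Python's standard `unicodedata` module is acceptable for stripping accents, and the `REPLACEMENTS` table and the name `to_ascii` are hypothetical. Characters that have neither a table entry nor a Unicode decomposition (for example ø) are left unchanged and would need their own entries.

```python
import unicodedata

# Characters with no Unicode decomposition need an explicit replacement.
# This table is illustrative, not exhaustive.
REPLACEMENTS = {
    "\u2018": "'",  # left single quote  -> upright quote
    "\u2019": "'",  # right single quote -> upright quote
    "\u201c": '"',  # left double quote  -> upright double quote
    "\u201d": '"',  # right double quote -> upright double quote
}

def to_ascii(text):
    """Replace non-ASCII characters with a near ASCII equivalent where one is known."""
    out = []
    for ch in text:
        if ch in REPLACEMENTS:
            out.append(REPLACEMENTS[ch])
            continue
        # NFKD splits an accented letter into its base letter plus combining marks;
        # dropping the combining marks leaves the non-accented equivalent.
        for part in unicodedata.normalize("NFKD", ch):
            if not unicodedata.combining(part):
                out.append(part)
    return "".join(out)
```

Note that an upright double quote is itself not allowed in Windows file names, so a rule set intended for file names may prefer to map double quotes to something else.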
Further restrictions
One might also want to avoid URL-encoding in URLs, or to avoid names in which letter case carries meaning.
- Restrictions beyond ASCII could be:
- prefer using only A-Za-z0-9_
- or only a-z_
- . for file extensions only; otherwise undecided, probably avoid
- ' not decided yet; if removed, how should D'Arienzo be written?
- , avoid, at least in work titles
- []() maybe reserve for special meaning
- space always replaced with _
work_name_az
There can be a lot of discussion about whether a title should include punctuation such as ",!?." or special characters like the Spanish inverted question mark. There can also be discussion about correct case (upper/lower). The a-z title is a form that can be derived from many differing opinions about the correct original. For file names and references in databases it can be helpful to have a-z work titles.
- remove all diacritics
- make all characters lower case
- replace space, comma and dot with _
- reduce repeating _ to single _
- examples
- http://eng.tango.info/work:la_maleva - disambiguation page
- http://eng.tango.info/work:la_viruta - work page - unique match
- Un jardín de ilusión -> http://eng.tango.info/work:un_jardin_de_ilusion
- La puñalada -> http://eng.tango.info/work:la_punalada
- Qué falta que me hacés! -> http://eng.tango.info/work:que_falta_que_me_haces
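The four steps above can be sketched in Python as follows. This is a minimal illustration, not the code used by tango.info. Two points are assumptions that the rules do not state: any remaining character outside a-z, 0-9 and _ is dropped (the "!" in the last example suggests this), and leading or trailing underscores are trimmed. The function name `work_name_az` is taken from the heading.

```python
import re
import unicodedata

def work_name_az(title):
    """Derive an a-z work title from an original title."""
    # Step 1: remove all diacritics (decompose, then drop combining marks).
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Step 2: make all characters lower case.
    lowered = stripped.lower()
    # Step 3: replace space, comma and dot with an underscore.
    replaced = re.sub(r"[ ,.]", "_", lowered)
    # Assumption: drop every other character outside a-z, 0-9 and _.
    cleaned = re.sub(r"[^a-z0-9_]", "", replaced)
    # Step 4: reduce repeated underscores to a single one.
    collapsed = re.sub(r"_+", "_", cleaned)
    # Assumption: trim underscores left over at either end.
    return collapsed.strip("_")

for original in ["Un jardín de ilusión", "La puñalada", "Qué falta que me hacés!"]:
    print(original, "->", work_name_az(original))
```

Under these assumptions the three titles from the examples map to the names used in the URLs above.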
External references
- Most common non-ASCII 8-bit characters: http://www.microsoft.com/GLOBALDEV/Reference/sbcs/1252.mspx