Pathnames: Difference between revisions

From tango.info wiki
m (moved pathnames to Pathnames)
 
(8 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[Category:computer]]
[[Category:Computer]]
==Non-ASCII characters==
==Non-ASCII characters==
===The Issue===
===The Issue===
Line 7: Line 7:
* FTP clients and servers (where Unicode characters often appear as a garbage character pairs)
* FTP clients and servers (where Unicode characters often appear as a garbage character pairs)
* Some utility programs e.g. FLAC encoder, Beyond Compare 2
* Some utility programs e.g. FLAC encoder, Beyond Compare 2
* File-sharing programs
* File-sharing programs e.g. SoulSeek v157 (also fails to preserve upper-case, converting to lower-case)
* Librarian programs e.g.
* Librarian programs e.g.
** http://www.mediamonkey.com/forum/viewtopic.php?t=15384&start=15 4290 Fixed Unicode characters in Custom fields cause the field to not be stored in ID3 tags
** MediaMonkey 3 builds before 1105
*** http://www.mediamonkey.com/forum/viewtopic.php?t=15384&start=15 4290 Fixed Unicode characters in Custom fields cause the field to not be stored in ID3 tags


Some of theses may be confined to non-[[Extended ASCII]] characters.
Some of these cases may be confined to characters outside the [[Extended ASCII]] set.


===Workaround===
===Workarounds===
Replace non-ASCII characters with e.g. the nearest ASCII equivalent:
Replace non-ASCII characters with e.g. ASCII stand-ins:
* accented char -> non-accented equivalent
# accented -> non-accented
* left and right quotes -> upright quote
# left and right quotes -> upright quote
Replacement can be performed by [[Mp3tag]] using an Action Groups containing a Replace action for each character.
# character with no ASCII lookalike -> underscore
 
Such replacements can be performed by [[Mp3tag]]:
# Action Group containing a Replace action for each known character
# Ditto
# $ansi() (which reduces to Extended ASCII) followed by $regexp() to replace all remaining non-ASCII chars with a lookalike or substitute, e.g. underscore. $ansi() uses ? as a substitute, so for Windows-compatible pathnames (in which ? is disallowed), handle this either in the $regexp() or together with all other pathname-invalid characters by using $validate() .


====further restrictions====
====further restrictions====
Line 40: Line 46:


*examples
*examples
**http://eng.tango.info/work:la_maleva - disambiguation page
**https://tango.info/work.la_maleva - disambiguation page
**http://eng.tango.info/work:la_viruta - work page - unique match
**https://tango.info/work.la_viruta - work page - unique match
**Un jardín de ilusión -> http://eng.tango.info/work:un_jardin_de_ilusion
**Un jardín de ilusión -> https://tango.info/work.un_jardin_de_ilusion
**La puñalada -> http://eng.tango.info/work:la_punalada
**La puñalada -> https://tango.info/work.la_punalada
**Qué falta que me hacés! -> http://eng.tango.info/work:que_falta_que_me_haces
**Qué falta que me hacés! -> https://tango.info/work.que_falta_que_me_haces


===External references===
===External references===
* Most common non-ASCII 8-bit characters: http://www.microsoft.com/GLOBALDEV/Reference/sbcs/1252.mspx
* Most common non-ASCII 8-bit characters: http://www.microsoft.com/GLOBALDEV/Reference/sbcs/1252.mspx
==See also==
* [[TINT based filing]]

Latest revision as of 2013-06-09T23:23:18

Non-ASCII characters

The Issue

Non-ASCII characters in the pathnames (folder names and file names) of music library tracks can cause problems due to incompatibilities with:

  • Macintosh
  • Portable music players, e.g. Rockbox. These characters (perhaps only the Unicode ones) can prevent playback of the file and of subsequent files.
  • FTP clients and servers (where Unicode characters often appear as garbage character pairs)
  • Some utility programs, e.g. FLAC encoder, Beyond Compare 2
  • File-sharing programs, e.g. SoulSeek v157 (which also fails to preserve upper case, converting to lower case)
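  • Librarian programs, e.g.
    • MediaMonkey 3 builds before 1105
      • http://www.mediamonkey.com/forum/viewtopic.php?t=15384&start=15 4290 Fixed Unicode characters in Custom fields cause the field to not be stored in ID3 tags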

Some of these cases may be confined to characters outside the Extended ASCII set.

Workarounds

Replace non-ASCII characters with ASCII stand-ins, e.g.:

  1. accented character -> non-accented equivalent
  2. left and right quotes -> upright quote
  3. character with no ASCII lookalike -> underscore

Such replacements can be performed by Mp3tag; the numbers below match the replacement types above:

  1. Action Group containing a Replace action for each known accented character
  2. Likewise, an Action Group containing a Replace action for each known quote character
  3. $ansi() (which reduces to Extended ASCII) followed by $regexp() to replace all remaining non-ASCII characters with a lookalike or substitute, e.g. underscore. $ansi() uses ? as a substitute, so for Windows-compatible pathnames (in which ? is disallowed), handle this either in the $regexp() or together with all other pathname-invalid characters by using $validate().
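
Outside Mp3tag, the same three replacement types can be sketched in a few lines of Python. This is only an illustration, not the Mp3tag behaviour: the function name `to_ascii`, the `QUOTE_MAP` table and the choice of Unicode decomposition to strip accents are all assumptions made for this example.

```python
import unicodedata

# Assumed mapping of left/right quotes to upright quotes; extend as needed.
QUOTE_MAP = {
    "\u2018": "'", "\u2019": "'",   # left/right single quotes
    "\u201c": '"', "\u201d": '"',   # left/right double quotes
}

def to_ascii(name: str, substitute: str = "_") -> str:
    """Replace non-ASCII characters with ASCII stand-ins (hypothetical helper)."""
    # Compose first so that a base letter plus a separate combining mark
    # is treated as one accented character.
    name = unicodedata.normalize("NFC", name)
    out = []
    for ch in name:
        if ch in QUOTE_MAP:
            # 2. left and right quotes -> upright quote
            out.append(QUOTE_MAP[ch])
            continue
        # 1. accented -> non-accented: decompose, then drop combining marks
        decomposed = unicodedata.normalize("NFKD", ch)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        if stripped and stripped.isascii():
            out.append(stripped)
        else:
            # 3. character with no ASCII lookalike -> substitute (underscore)
            out.append(substitute)
    return "".join(out)

print(to_ascii("La puñalada"))
```

Note that the upright double quote is itself disallowed in Windows pathnames, so a further pass over pathname-invalid characters (the role $validate() plays in Mp3tag) would still be needed there.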

further restrictions

One might also want to avoid URL-encoding in URLs, or to avoid case being significant.

  • Restrictions beyond plain ASCII could be:
    • prefer using only A-Za-z0-9_
      • or only a-z_
    • . for file extensions; otherwise undecided, probably avoid
    • ' not decided yet; if removed, how should D'Arienzo be written?
    • , avoid, at least in work titles
    • []() maybe reserve for a special meaning
    • space always replaced with _

work_name_az

There can be a lot of discussion about whether a title should include characters such as ",!?." or special characters like the Spanish inverted question mark. There can also be discussion about the correct case (upper/lower). The a-z title is a form that can be derived from many differing opinions about what the correct original is. For filenames and for references in databases it could be helpful to have a-z work titles.

  • remove all diacritics
  • make all characters lower case
  • replace space, comma and dot with _
  • reduce repeated _ to a single _
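
The four steps can be sketched in Python as follows. The function name `work_name_az` is borrowed from the heading, but the implementation is an assumption, not the site's actual code. In particular, the steps do not say what happens to other punctuation such as "!" or "'"; this sketch assumes such characters are simply dropped, which matches the example "Qué falta que me hacés!" -> que_falta_que_me_haces.

```python
import re
import unicodedata

def work_name_az(title: str) -> str:
    """Derive an a-z title from a work title (illustrative sketch)."""
    # Remove all diacritics: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Make all characters lower case.
    text = text.lower()
    # Replace space, comma and dot with _.
    text = re.sub(r"[ ,.]", "_", text)
    # Assumption (not stated in the steps): drop anything else outside a-z and _.
    text = re.sub(r"[^a-z_]", "", text)
    # Reduce repeated _ to a single _.
    return re.sub(r"_+", "_", text)

print(work_name_az("Qué falta que me hacés!"))
```

With the dropping assumption, D'Arienzo becomes darienzo, which is one possible answer to the open apostrophe question above, not a settled one.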

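External references

  • Most common non-ASCII 8-bit characters: http://www.microsoft.com/GLOBALDEV/Reference/sbcs/1252.mspx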

See also

  • TINT based filing