To be more precise, the combining classes for the combining diacritics U+0314, U+0301, U+0345 in this example are 230, 230, 240, respectively. These classes tend to indicate the position of the diacritic (above, below, ...) and it is assumed that diacritics in different positions can be ordered arbitrarily, while the order of diacritics in the same position is significant. Thus, U+03B1 U+0314 U+0301 U+0345 and U+03B1 U+0314 U+0345 U+0301 and U+03B1 U+0345 U+0314 U+0301 are equivalent, but U+03B1 U+0314 U+0301 U+0345 and U+03B1 U+0301 U+0314 U+0345 are not. The latter is equivalent to U+1FB4 U+0314 (ᾴ̔).
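To see this concretely, here is a small sketch using Python's standard unicodedata module (not part of the original discussion); it reads off the combining classes and checks which reorderings normalize to the same thing:

```python
import unicodedata

# Combining classes of the three marks discussed above
for mark in ('\u0314', '\u0301', '\u0345'):
    print(f"U+{ord(mark):04X}: class {unicodedata.combining(mark)}")
# U+0314: class 230, U+0301: class 230, U+0345: class 240

a = '\u03B1\u0314\u0301\u0345'   # alpha + rough breathing + acute + ypogegrammeni
b = '\u03B1\u0345\u0314\u0301'   # ypogegrammeni (class 240) moved, still equivalent
c = '\u03B1\u0301\u0314\u0345'   # acute and rough breathing (both class 230) swapped

nfd = lambda s: unicodedata.normalize('NFD', s)
print(nfd(a) == nfd(b))   # True: marks of different classes may be reordered
print(nfd(a) == nfd(c))   # False: order within class 230 is significant
```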
The question is not reasonable, and points to a misunderstanding of Unicode. This misunderstanding has spawned a number of myths and led to debates such as the above. This answer says that there is no semantic difference between the two representations, that they encode the same symbols. That is true, and 666 and DCLXVI encode the same number, but I surely hope that no software will silently change one into the other. Changing data is always a very bad idea. It leads to data loss, even when you think both forms are equivalent.
In fact, Unicode declares that there is an equivalence relationship between decomposed and composed sequences, and conformant software should not treat canonically equivalent sequences, whether composed or decomposed or something in between, as different.
Unicode further states that software is free to change the character stream from one representation to another, i.e., to decompose, compose and/or re-order, whenever it wants. The only requirement is that the resultant string is canonically equivalent to the original.
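To make "canonically equivalent" concrete, here is a minimal sketch (assuming Python's unicodedata; the helper name is made up): two strings are canonically equivalent exactly when their normalized forms coincide, which is what conformant software relies on when it silently recodes your data.

```python
import unicodedata

def canonically_equivalent(s: str, t: str) -> bool:
    # Canonical equivalence: the NFD forms (equivalently, the NFC forms) match.
    return unicodedata.normalize('NFD', s) == unicodedata.normalize('NFD', t)

composed = '\u00D6'      # Ö as a single precomposed code point
decomposed = 'O\u0308'   # O followed by COMBINING DIAERESIS
print(composed == decomposed)                        # False: different code points
print(canonically_equivalent(composed, decomposed))  # True: same encoded text
```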
So the original question is not reasonable: A team might think they can choose to use NFD (decomposed) for their data, but software just might change the data — and it doesn't even have to say it is doing this, because (by definition) this does not change the meaning of the encoded data in any way.
It is inappropriate to speak of standardizing on one particular representation such as NFD or NFC except in the context of a specific text process or data interchange format.
In the same way that searching or spellchecking may be simpler if the data is normalized first, it may be that keyboard design, or font design, or other user interface elements may be easier to implement if, for that specific process, a particular normal form can be assumed. But this does not imply that the data must always be maintained in that form; it may be transparently transformed to other equivalent representations for other purposes.
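As an illustration of that point (a sketch in Python; the function is hypothetical, not from any of the cited sources): a search routine can normalize both sides for the comparison only, so the stored data never has to be rewritten.

```python
import unicodedata

def contains(haystack: str, needle: str) -> bool:
    # Normalize transparently for this one process; the caller's data is untouched.
    nfc = lambda s: unicodedata.normalize('NFC', s)
    return nfc(needle) in nfc(haystack)

stored = 'Stra\u00DFe in Du\u0308sseldorf'   # kept exactly as received (decomposed ü)
print(contains(stored, 'Düsseldorf'))        # True, although the forms differ
```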
The Wikipedia page Unicode normalisation says today:
In one specific instance, the combination of OS X errors handling composed characters, and the samba file- and printer-sharing software (which replaces decomposed letters with composed ones when copying file names), has led to confusing and data-destroying interoperability problems. Applications may avoid such errors by preserving input code points, and only normalizing them to the application's preferred normal form for internal use.
The TEI-P5-Guidelines say
The Unicode Consortium provides four standard normalization forms, of which the Normalization Form C (NFC) seems to be most appropriate for text encoding projects.
The above-cited SIL page recommends
Output data that may become input to unknown processes in NFC.
If you have an option, archive in XML/NFC.
The Linux Unicode FAQ says
NFC is the preferred form for Linux and WWW.
The WWW Character Model says
NFC has the advantage that almost all legacy data (if transcoded trivially, one-to-one, to a Unicode encoding) as well as data created by current software is already in this form; NFC also has a slight compactness advantage and a better match to user expectations with respect to the character vs. grapheme issue. This document therefore chooses NFC as the base for Web-related early normalization.
The WWW argument here seems to be that there is a lot of legacy ISO-8859-1 text, and conversion to Unicode is easiest if these ISO-8859-1 values remain single units and are not decomposed. Note that WWW only hopes, but does not require:
In 2004/2005, the Internationalization Working Group decided that early uniform normalization was dead and that requiring normalization of content (such that applications could assume that content was already normalized) was no longer a reasonable position for Charmod. ... HTML5 does not require NFC. (May 2011)
There is another argument in favor of NFC: it is easier to create and use fonts with precomposed characters. Shaping code is needed for many non-Western applications, but precomposed characters suffice in the West. If nobody has normalized this page, you will probably see differences in the rendering of the Greek accented alpha above.
This is somewhat similar to the relation between Chinese/Japanese characters and Latin letters. Whether the code for a word is a single indivisible unit or a sequence that codes information about the elements does not matter as long as one only copies. But as soon as one uses the information in some way (what is the radical? how many strokes? how black is the character? what syllables are there? how can this word be hyphenated?) the monolithic code requires big tables that may not even be available, and the structured code is much easier to use.
For web use where fuzzy search is important, I find that working with NFD is an order of magnitude faster than working with NFC, and it avoids the need for lookup tables.
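The usual mechanism behind that observation, sketched here with Python's unicodedata (this is not code from the answer itself): once text is decomposed, accent-insensitive matching reduces to dropping combining marks, with no per-character lookup table.

```python
import unicodedata

def strip_marks(s: str) -> str:
    # Decompose to NFD, then drop every combining mark (combining class != 0).
    return ''.join(ch for ch in unicodedata.normalize('NFD', s)
                   if not unicodedata.combining(ch))

print(strip_marks('\u1FB4\u0314'))      # 'α' (all three marks removed)
print(strip_marks('Crème brûlée'))      # 'Creme brulee'
```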
NFC is preferred, because the W3C recommends the use of NFC normalized text on the Web.
NFC has the highest probability of working in existing programs. For example, still in 2012, KDE's terminal emulator (konsole) drops the accents of Latin characters when it receives them in decomposed form:
```
$ /usr/bin/printf '\u00D6\n'
Ö
$ /usr/bin/printf 'O\u0308\n'
O
```

For processing of European languages, NFC allows for simpler software. Without the recommendation for NFC, the adoption of Unicode would have been slower.
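One application-side workaround, sketched in Python (the helper name write_nfc is hypothetical): keep stored data exactly as it is and normalize to NFC only at the output boundary, so terminals that mishandle decomposed sequences still show the accents.

```python
import sys
import unicodedata

def write_nfc(text: str) -> None:
    # Normalize only at the output boundary; the data itself is never rewritten.
    sys.stdout.write(unicodedata.normalize('NFC', text))

write_nfc('O\u0308\n')   # the terminal receives the precomposed U+00D6 (Ö)
```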
```
- Ires(XtNcombiningChars, XtCCombiningChars, screen.max_combining, 2),
+ Ires(XtNcombiningChars, XtCCombiningChars, screen.max_combining, 3),
```

since I have encountered cases with 3 accents, but not yet with 4.
Of course filenames occur in files, in URLs, in Makefiles, and in HTML web pages. Changing data, e.g. in text files, to NFC would cause interoperability problems. Always leave data as it is.