"Wget escapes the character ‘/’ and the control characters in the ranges 0–31 and 128–159. This is the default on Unix-like operating systems."
I just downloaded 50000 files where Firefox shows the correct filenames, but wget produces gibberish. For example, the filename 诘棋总动员 is saved as ??%98??%8B?%80??%8A??%91%98 where the former is Unicode U+8bd8 U+68cb U+603b U+52a8 U+5458, that is, UTF-8 e8 af 98 e6 a3 8b e6 80 bb e5 8a a8 e5 91 98, but the saved filename consists of the hex bytes e8 af 25 39 38 e6 a3 25 38 42 e6 25 38 30 bb e5 25 38 41 a8 e5 25 39 31 25 39 38. We see that the hex values 80, 8a, 8b, 91, 98 have become 25 38 30, 25 38 41, 25 38 42, 25 39 31, 25 39 38. Ach.
The question marks appear because the resulting byte sequences are not valid UTF-8, and the resulting filenames cannot be used on this system.
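The byte arithmetic above can be reproduced with a small sketch. It is a model of the behaviour described in the quoted manual text, not wget's actual code, and the function name `escape_like_wget` is made up for illustration.

```python
def escape_like_wget(name: bytes) -> bytes:
    """Percent-escape '/' and the bytes 0-31 and 128-159.

    Hypothetical helper that models the quoted description only.
    """
    out = bytearray()
    for b in name:
        if b == 0x2F or b <= 0x1F or 0x80 <= b <= 0x9F:
            out += b"%%%02X" % b   # e.g. 0x98 becomes the three bytes "%98"
        else:
            out.append(b)
    return bytes(out)


original = "诘棋总动员".encode("utf-8")
escaped = escape_like_wget(original)

print(original.hex(" "))  # e8 af 98 e6 a3 8b e6 80 bb e5 8a a8 e5 91 98
print(escaped.hex(" "))   # e8 af 25 39 38 e6 a3 25 38 42 e6 25 38 30 bb ...

# Lead bytes such as e8 are no longer followed by enough continuation bytes,
# so a strict UTF-8 decode of the escaped name fails.
try:
    escaped.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False
print(valid)  # False
```

Only the bytes 80, 8a, 8b, 91 and 98 are touched. The continuation bytes af, a3, bb and a8 lie above 0x9F and pass through unchanged, which is why each character ends up half escaped.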
    wget -r -np -nc URL
will produce junk if URL uses UTF-8 filenames and the local system is UTF-8 as well (as is rather common nowadays), and things are better with
    wget -r -np -nc --restrict-file-names=nocontrol URL
(Reminds me of the old days, where ftp would by default destroy the files copied, and one had to say BINARY to get them undamaged.)
This --restrict-file-names=nocontrol is a misnomer: it is not so bad when real control characters are escaped, since they may well cause problems. But on a UTF-8 system the values 128-159, that is 0x80-0x9F, are not control characters but parts of ordinary symbols, and escaping them is a bad idea.
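To see that these bytes really are parts of ordinary symbols, one can undo the escaping for the range 0x80-0x9F only and check that the name is valid UTF-8 again. This is a minimal sketch: the helper name `unescape_high_bytes` is hypothetical, and the sketch assumes the original name did not itself contain a literal sequence such as "%98", which cannot be told apart from an escaped byte.

```python
import re

# Matches %80 .. %9F: a first hex digit of 8 or 9, then any hex digit.
_ESCAPED_HIGH = re.compile(rb"%([89][0-9A-Fa-f])")


def unescape_high_bytes(name: bytes) -> bytes:
    """Turn %80..%9F back into the raw bytes 0x80..0x9F (hypothetical helper).

    Escapes for '/' and for the control characters 0-31 are left alone.
    """
    return _ESCAPED_HIGH.sub(lambda m: bytes([int(m.group(1), 16)]), name)


damaged = bytes.fromhex(
    "e8 af 25 39 38 e6 a3 25 38 42 e6 25 38 30 bb "
    "e5 25 38 41 a8 e5 25 39 31 25 39 38"
)
restored = unescape_high_bytes(damaged)
print(restored.decode("utf-8"))  # 诘棋总动员
```

The restored bytes decode cleanly, so nothing in the range 0x80-0x9F was acting as a control character here.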