Search results
TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is not UTF-8; it is a Unicode string in Python 3.x. TestText2 = TestText.encode('utf8') # this is a UTF-8-encoded byte string. To send UTF-8 to stdout regardless of the console's encoding, use its buffer interface, which accepts bytes: import sys.
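The snippet above ends at import sys. A minimal sketch of the approach it describes, assuming Python 3, where sys.stdout.buffer is the binary stream underneath sys.stdout. The helper name write_utf8 is hypothetical and not part of the original answer:

```python
import sys

def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write the bytes to a binary stream.

    When no stream is given, the bytes go to sys.stdout.buffer, which
    bypasses whatever encoding the console is configured with.
    """
    if stream is None:
        stream = sys.stdout.buffer
    data = text.encode("utf-8")
    stream.write(data)
    stream.flush()
    return data
```

Calling write_utf8("Test - āĀēĒčČ..šŠūŪžŽ\n") sends the encoded bytes straight to the terminal. Whether they display correctly still depends on the terminal interpreting them as UTF-8.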
On Linux, when running under sudo, you can try passing the -E argument to export the user's environment variables to the sudo process: export PYTHONIOENCODING=utf8. sudo -E python yourprogram.py. If you try this and it does not work, you will need to enter a sudo shell: sudo /bin/bash. PYTHONIOENCODING=utf8 yourprogram.
A nice alternative to @mark's answer is to set the environment variable PYTHONIOENCODING=UTF-8; cf. Writing unicode strings via sys.stdout in Python. (Make sure to set it before starting Python, not inside the script.)
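One way to see why the variable has to be set before the interpreter starts is to launch a child Python process with it in the environment. This is only a sketch: run_with_io_encoding is a hypothetical helper, and it assumes Python 3 with the standard subprocess module:

```python
import os
import subprocess
import sys

def run_with_io_encoding(code, encoding="utf-8"):
    """Run a Python snippet in a child process with PYTHONIOENCODING set.

    The variable is read at interpreter startup, so it is placed in the
    child's environment rather than assigned inside the running script.
    Returns the raw bytes the child wrote to stdout.
    """
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Because stdout is a pipe here, the child's output encoding would otherwise come from the locale. With the variable set, the child encodes its output as requested.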
To include Unicode characters in your Python source code, you can use Unicode escape characters in the form \u0123 in your string. In Python 2.x, you also need to prefix the string literal with 'u'. Here's an example running in the Python 2.x interactive console: >>> print u'\u0420\u043e\u0441\u0441\u0438\u044f'. Россия.
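For comparison, a short Python 3 sketch of the same escapes. In Python 3 every str literal is already Unicode, so the u prefix is optional. The variable names are illustrative only:

```python
# \uXXXX escapes take exactly four hex digits per code point.
country = "\u0420\u043e\u0441\u0441\u0438\u044f"

# The same text written directly (the default source encoding is UTF-8).
country_literal = "Россия"

# \N{...} looks a character up by its Unicode name.
spade = "\N{BLACK SPADE SUIT}"

print(country, spade)
```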
# Python 2.x >>> print 'Capit\\xc3\\xa1n\n'.decode('string_escape') Capitán The result is a str that is encoded in UTF-8 where the accented character is represented by the two bytes that were written \\xc3\\xa1 in the original string. To get a unicode result, decode again with UTF-8.
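The string_escape codec does not exist in Python 3. A rough equivalent, assuming the input is ASCII text whose only escapes are \xNN byte escapes, is to pass through unicode_escape and Latin-1 to recover the raw bytes and then decode those as UTF-8. The function name unescape_utf8 is hypothetical:

```python
def unescape_utf8(escaped):
    """Turn text containing literal \\xNN escapes into a decoded str.

    Assumes the escapes spell out UTF-8 bytes. Escapes above \\xff
    (such as \\uXXXX beyond Latin-1) would make the Latin-1 step fail.
    """
    # unicode_escape maps each \xNN to the code point U+00NN ...
    as_code_points = escaped.encode("ascii").decode("unicode_escape")
    # ... and Latin-1 maps U+00NN back to the single byte 0xNN.
    raw_bytes = as_code_points.encode("latin-1")
    return raw_bytes.decode("utf-8")

print(unescape_utf8("Capit\\xc3\\xa1n"))
```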
It creates a Unicode string in Python 3 (good) but a bytestring in Python 2 (bad). Either add from __future__ import unicode_literals at the top or use the u'' prefix. Don't use non-ASCII characters in bytes literals. To get UTF-8 bytes, you could call utf8bytes = unicode_text.encode('utf-8') later if necessary. – jfs.
What I'm trying to do is print utf-8 card symbols (♠,♥,♦,♣) from a python module to a windows console. UTF-8 is a byte encoding of Unicode characters. ♠♥♦♣ are Unicode characters which can be reproduced in a variety of encodings and UTF-8 is one of those encodings—as a UTF, UTF-8 can reproduce any Unicode character. But there ...
I have been trying for hours to solve this UTF-8 issue in Python 2.7.6. I have a list of strings with UTF-8 characters, like this: findings=['Quimica Geral e Tecnol\xf3gica I', 'Quimica Geral e Tecnol\xf3gica II', '\xc1lgebra Linear'] I am trying to print the strings: for finding in findings: print finding The output is:
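One observation about the data in this question: \xf3 and \xc1 are single bytes, which matches Latin-1 (ISO-8859-1) rather than UTF-8, where ó and Á each take two bytes. Assuming that is what the list holds, a Python 3 sketch of decoding it might look like this. The helper decode_findings is hypothetical:

```python
# The same items as byte strings, which is what Python 2 str literals hold.
findings = [
    b"Quimica Geral e Tecnol\xf3gica I",
    b"Quimica Geral e Tecnol\xf3gica II",
    b"\xc1lgebra Linear",
]

def decode_findings(items, encoding="latin-1"):
    """Decode each byte string using the encoding the bytes were written in."""
    return [item.decode(encoding) for item in items]

decoded = decode_findings(findings)
for finding in decoded:
    print(finding)
```

Decoding the same bytes as UTF-8 raises UnicodeDecodeError, which is one way to check which encoding the data actually uses.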
Try this one; the function ignores any bytes that are not valid in the given character set (such as UTF-8) and returns a clean string. It is tested on Python 3.6 and above. def bin2str(text, encoding = 'utf-8'): """Converts a binary to Unicode string by removing all non Unicode char. text: binary string to work on.
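The body of bin2str is cut off above, so the following is not a reconstruction of it. It is a separate minimal sketch of the same idea using the standard errors argument of bytes.decode, with a hypothetical name decode_lossy:

```python
def decode_lossy(data, encoding="utf-8"):
    """Decode bytes to str, silently dropping bytes that are invalid
    in the given encoding."""
    return data.decode(encoding, errors="ignore")

print(decode_lossy(b"abc\xffdef"))
```

Dropping bytes loses information without any warning. Passing errors="replace" instead marks each bad sequence with U+FFFD, which can make corrupted input easier to spot.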
Try writing the Unicode string for the byte order mark (i.e. Unicode U+FEFF) directly, so that the file just encodes that as UTF-8: import codecs. file = codecs.open("lol", "w", "utf-8") file.write(u'\ufeff') file.close() (That seems to give the right answer - a file with bytes EF BB BF.)
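In Python 3 the built-in open accepts an encoding directly, and the utf-8-sig codec writes the byte order mark automatically. A small sketch that writes a file and checks its first three bytes, assuming a temporary directory is acceptable for the demonstration. The helper write_with_bom is hypothetical:

```python
import codecs
import os
import tempfile

def write_with_bom(path, text):
    """Write text as UTF-8, preceded by the UTF-8 byte order mark."""
    with open(path, "w", encoding="utf-8-sig") as handle:
        handle.write(text)

path = os.path.join(tempfile.mkdtemp(), "lol")
write_with_bom(path, "hello")

# Read the file back in binary mode to inspect the leading bytes.
with open(path, "rb") as handle:
    first_bytes = handle.read(3)

print(first_bytes == codecs.BOM_UTF8)
```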