11 Aug 2016

Computer Science: Unicode


A very good day to the readers of my blog. Continuing my latest blog theme, Computer Science, today I will be writing about Unicode.

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. Developed in conjunction with the Universal Coded Character Set (UCS) standard and published as The Unicode Standard, the latest version of Unicode contains a list of more than 128,000 characters covering 135 modern and historic scripts, as well as multiple symbol sets.

The standard consists of a set of code charts for visual reference, an encoding method and a set of standard character encodings, a set of reference data files, and a number of related items, such as character properties and rules for normalization, decomposition, collation, rendering, and bidirectional display order. According to Wikipedia, as of June 2016, the most recent version is Unicode 9.0. The standard is maintained by the Unicode Consortium (the main image of this blog post is the logo of the Unicode Consortium).
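To make "rules for normalization" a little more concrete, here is a minimal sketch of my own using Python's built-in unicodedata module. It is only an illustration, not part of the standard itself. The helper name same_text is hypothetical. It shows that the letter "é" can be stored either as one code point or as "e" plus a combining accent, and that normalization makes the two forms compare equal.

```python
import unicodedata

def same_text(a, b):
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

print(composed == decomposed)           # raw comparison of code points
print(same_text(composed, decomposed))  # comparison after normalization
```

The first comparison looks at the raw code points, so it treats the two strings as different, while the second one compares the normalized forms.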

Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8, UTF-16, and the now-obsolete UCS-2. UCS-2 uses a 16-bit code unit for each character but cannot encode every character in the current Unicode standard. UTF-16 extends UCS-2, using one 16-bit unit for the characters that were representable in UCS-2 and two 16-bit units to handle each of the additional characters.
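As a rough illustration of the one-unit versus two-unit behaviour, here is a small Python sketch of my own (not taken from the standard) that lists the 16-bit code units UTF-16 produces for a character. The helper name utf16_units is hypothetical.

```python
def utf16_units(ch):
    """Return the 16-bit code units that UTF-16 uses for the given text."""
    data = ch.encode("utf-16-be")  # big-endian, without a byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# "A" (U+0041) fits in a single 16-bit unit, so UCS-2 could represent it too.
print([hex(u) for u in utf16_units("A")])

# U+1F600 lies above U+FFFF, so UTF-16 needs two units (a surrogate pair).
print([hex(u) for u in utf16_units("\U0001F600")])
```

A character such as U+1F600 has no representation at all in UCS-2, which is why UTF-16 was needed.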

Unicode is developed in conjunction with the International Organization for Standardization and shares the character list with ISO/IEC 10646: the Universal Character Set. Unicode and ISO/IEC 10646 function equally as character encodings, but The Unicode Standard contains much more information for implementers. The Unicode Standard specifies a multitude of character properties, including those needed for supporting bidirectional text.
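For a feel of what these character properties look like, here is a minimal sketch of my own that reads a few of them through Python's unicodedata module. The function name describe is hypothetical, and the values come from whichever Unicode version your Python build ships with.

```python
import unicodedata

def describe(ch):
    """Return (name, general category, bidirectional class) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )

# Latin capital A, Hebrew letter alef, Arabic-Indic digit one
for ch in ["A", "\u05d0", "\u0661"]:
    print(describe(ch))
```

The bidirectional class is the property that tells software whether a character belongs to left-to-right or right-to-left text.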

The Consortium first published The Unicode Standard in 1991 and continues to develop standards based on that original work. The latest version of the standard, Unicode 9.0, was released in June 2016 and is available from the consortium's website. Unicode 5.0 was the last major version to be published in book form; since Unicode 6.0, the full text of the standard has no longer been published as a book.

Unicode covers almost all scripts in current use. A total of 135 scripts are included in the latest version of Unicode, although there are still scripts that are not yet encoded, particularly those mainly used in historical, liturgical, and academic contexts. Further characters also continue to be added to the already encoded scripts, along with symbols, in particular for mathematics and music.

I would like to write more, but time does not allow me to. The comment section is always open for criticism, so do share your opinion on my latest post. Thank you for reading and enjoy your day!

1 comment:

  1. Hmm, looks like I might get an A+ in Computer Science, hahaha, God willing, amen