Testing of Multi-Language Applications

Monday, 17 October 2011 15:58
This article describes the theoretical fundamentals of modern encodings, how code pages interact inside the operating system, and the specifics of testing and of creating test data.
Written by: Oleg Chuta, Junior Tester of Network Testing Team

Contents
1. Introduction
2. History of Development from ASCII to Unicode
3. Data Encoding in Local OS and WEB
4. Most Common Problems in Testing
5. Program Tools for Creation of Test Data

Introduction

When the first computers and the first disk management systems (the ancestors of modern operating systems) were created, nobody thought about multilingual applications. The main tasks of that period were as follows:
The emergence of personal computers and portable devices such as laptops, smartphones, and tablets, together with growing data transfer speeds and users' need for instant access to information, has changed the requirements for software systems. A modern user wants to read mail and news quickly, in his native language, and perhaps even in his dialect. The main requirements for software are as follows:
Software vendors and PC manufacturers worldwide support these trends. But the attempt to satisfy ever-growing user needs also introduces mistakes. Mismatches between Windows code pages, legacy ASCII text, and Unicode surrogate pairs create a wide field for errors: symbols mapped to the wrong characters and the “mojibake” familiar from the Internet. As a result, we often see unreadable text in software interfaces or on web pages. Such errors are caught in time through testing, and this article examines how to test an application for character-encoding errors.

History of Development from ASCII to Unicode

Information encoding, national encodings

An encoding, or code table, is a table of correspondence between the graphic representation of a symbol and its number, i.e. a binary code combination.

Table 1. Examples of ASCII table code combinations
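To make the notion of a code table concrete, the Python sketch below lists the code combinations (decimal, hexadecimal, binary) for a few ASCII characters and shows how mojibake arises when bytes written in one code page are read in another. The helper names `code_combinations` and `mojibake` are hypothetical, chosen only for illustration; the code relies on nothing but Python's built-in `ascii`, `cp1251`, and `koi8_r` codecs.

```python
# Illustrative sketch: an encoding maps characters to byte values, and
# decoding bytes with a different code page than the one used to encode them
# produces mojibake.


def code_combinations(text: str, encoding: str = "ascii") -> list[tuple[str, int, str, str]]:
    """Return (character, decimal, hex, 8-bit binary) for every encoded byte."""
    rows = []
    for ch in text:
        for byte in ch.encode(encoding):
            rows.append((ch, byte, f"{byte:02X}", f"{byte:08b}"))
    return rows


def mojibake(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page and decode the same bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    for row in code_combinations("Az1!"):
        print(row)  # e.g. ('A', 65, '41', '01000001')
    # Cyrillic text saved as Windows-1251 but opened as KOI8-R
    print(mojibake("Текст", "cp1251", "koi8_r"))
```

The last line prints unreadable Cyrillic, the same kind of garbage a tester sees when the encoding an application assumes does not match the bytes it actually receives.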
The size of the code combination depends on how many symbols we want to encode. The first encodings used up to 7 bits. The range 0000 0000 – 0111 1111 (hexadecimal 00 – 7F) holds 128 symbols, which is enough for the uppercase and lowercase letters of the English alphabet, digits, punctuation marks, and control (non-printing) characters.

ASCII was originally designed as a 7-bit encoding with 128 symbols, and the 8th bit could be used as a parity bit. A parity bit is a check bit that records whether the number of ones in a binary combination is even or odd. It was widely used in serial data transfer as a marker of distortion in the transmitted combination, which made it possible to detect single-bit errors.

Example: to obtain the parity bit of the source combination 0101010, combine all of its bits with the logical operation Exclusive OR (XOR):

0 ⊕ 1 ⊕ 0 ⊕ 1 ⊕ 0 ⊕ 1 ⊕ 0 = 1

A result of «1» means the combination contains an odd number of ones, and «0» means an even number. Appending the parity bit to the source combination gives 0101 0101.

After the transfer, the receiver checks the parity bit. Suppose a single bit was corrupted in transit:

0101 0101 -> 1101 0101

The receiver detects this by comparing the received parity bit with the one it recomputes from the data bits:

Received parity bit = 1
Expected parity bit = 1 ⊕ 1 ⊕ 0 ⊕ 1 ⊕ 0 ⊕ 1 ⊕ 0 = 0

The two values differ, so the combination is known to be distorted. Later, more sophisticated algorithms were adopted that compute a checksum instead of a single parity bit.

As the standard spread, the need to encode national symbols arose. This led to 8-bit versions of the ASCII table in which all 256 positions are filled: the high-order codes 1000 0000 – 1111 1111 (hexadecimal 80 – FF) are assigned to a national alphabet and additional signs. Separate tables were created for each script, and computer manufacturers, in turn, added support for the new code tables.
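The parity check walked through in the example can be sketched in a few lines of Python. This is a minimal illustration assuming even parity (the appended bit makes the total number of ones even), which is what the 0101010 example uses; the function names are hypothetical, and bits are passed as strings of '0' and '1' for readability.

```python
from functools import reduce
from operator import xor


def parity_bit(bits: str) -> int:
    """XOR all bits together: 1 for an odd number of ones, 0 for an even number."""
    return reduce(xor, (int(b) for b in bits), 0)


def add_parity(data_bits: str) -> str:
    """Append the parity bit to a 7-bit combination, producing an 8-bit word."""
    return data_bits + str(parity_bit(data_bits))


def check_parity(word: str) -> bool:
    """Recompute parity from the data bits and compare it with the received parity bit."""
    data_bits, received = word[:-1], int(word[-1])
    return parity_bit(data_bits) == received


if __name__ == "__main__":
    sent = add_parity("0101010")
    print(sent)                      # 01010101
    print(check_parity(sent))        # True: no distortion
    print(check_parity("11010101"))  # False: the one-bit error is detected
    print(check_parity("11110101"))  # True: a two-bit error goes unnoticed
```

The last call shows why parity only guarantees detection of single-bit errors: flipping two bits leaves the number of ones even, so the distortion slips through. This limitation is one reason checksums replaced the single parity bit.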
Structurally, many code tables resemble a book: each code page encodes a particular language or group of languages. The historical name “code page” was first given to the interchangeable tables of national symbols occupying the high-order codes of the ASCII table. Today, the terms “code page” and “encoding” are used interchangeably.

The KOI-8 encoding (from the Russian for “code for information exchange”) was developed by Russian developers and includes both English and Russian letters. KOI-8 is fully compatible with ASCII in the lower part of the code range, 00 – 7F. In the high-order part, Russian letters are not arranged alphabetically; instead, each is placed at the position corresponding to its phonetically similar English letter. That is why, when the high-order bit is lost, «Русский Текст» turns into «rUSSKIJ tEKST»: each letter becomes its English counterpart with the case inverted. KOI-8-R (the original Russian variant) became an international standard, and variants were added for Ukrainian (KOI-8-U), Kazakh, Armenian, and other languages.

Over decades of development, many other encodings, such as EBCDIC, GB2312, SJIS, and RADIX-50, as well as a series of Windows code pages, were created and introduced. Most of them were shaped by the requirements of specific hardware or of the data transfer environment.
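The loss of the high-order bit in KOI-8 can be reproduced with Python's built-in `koi8_r` codec: encode the text, clear bit 7 of every byte, and read the result as ASCII. The function name `strip_high_bit` is hypothetical, and the sketch only simulates a 7-bit channel rather than modeling any particular hardware.

```python
def strip_high_bit(text: str) -> str:
    """Simulate a 7-bit channel: encode as KOI8-R and clear the high-order bit of each byte."""
    raw = text.encode("koi8_r")
    seven_bit = bytes(b & 0x7F for b in raw)  # 0x7F keeps only the lower 7 bits
    return seven_bit.decode("ascii")


if __name__ == "__main__":
    print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

Running the same experiment on Windows-1251 (`cp1251`) bytes does not yield readable Latin text, because that code page arranges Cyrillic letters alphabetically rather than by their Latin counterparts. This property made KOI-8 text degrade gracefully on 7-bit links.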






