UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set, but unlike them it has the special property of being backwards-compatible with ASCII. For this reason, it is steadily becoming the dominant character encoding for files, e-mail, web pages, and software that manipulates textual information.
The UTF-8 encoding is variable-width, with each character represented by 1 to 4 bytes. Each byte has between 0 and 4 leading consecutive '1' bits followed by a '0' bit, which together indicate its type. The bit patterns are defined as follows:
| Length | Bit pattern |
|---|---|
| 1 | 0xxxxxxx |
| 2 | 110yyyyx 10xxxxxx |
| 3 | 1110yyyy 10yxxxxx 10xxxxxx |
| 4 | 11110yyy 10yyxxxx 10xxxxxx 10xxxxxx |
where each x and y may be 0 or 1, and at least one of the y bits must be 1 (this rules out overlong encodings).
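To make the bit patterns concrete, here is a minimal C++ sketch (not part of the problem statement) that checks whether a fully specified byte sequence encodes exactly one character under the table above. The helper name `validChar` and the bit masks are choices of this sketch, not anything given by the statement.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Hypothetical helper: true if s encodes exactly one UTF-8 character
// under the patterns in the table above.
bool validChar(const vector<uint8_t>& s) {
    size_t len = s.size();
    if (len < 1 || len > 4) return false;
    // Masks of the y positions in each byte, per character length.
    static const vector<vector<uint8_t>> yMask = {
        {0x00},                     // 0xxxxxxx
        {0x1E, 0x00},               // 110yyyyx 10xxxxxx
        {0x0F, 0x20, 0x00},         // 1110yyyy 10yxxxxx 10xxxxxx
        {0x07, 0x30, 0x00, 0x00},   // 11110yyy 10yyxxxx 10xxxxxx 10xxxxxx
    };
    // Fixed prefix of the lead byte: 0 / 110 / 1110 / 11110.
    static const uint8_t leadMask[] = {0x80, 0xE0, 0xF0, 0xF8};
    static const uint8_t leadVal[]  = {0x00, 0xC0, 0xE0, 0xF0};
    if ((s[0] & leadMask[len - 1]) != leadVal[len - 1]) return false;
    for (size_t i = 1; i < len; ++i)
        if ((s[i] & 0xC0) != 0x80) return false;    // continuation bytes start with 10
    int yBits = 0;
    for (size_t i = 0; i < len; ++i)
        yBits |= s[i] & yMask[len - 1][i];
    return len == 1 || yBits != 0;                  // at least one y bit must be 1
}

int main() {
    cout << validChar({0xE0, 0xA0, 0xBF}) << "\n";  // 1: a y bit is set
    cout << validChar({0xE0, 0x80, 0xBF}) << "\n";  // 0: all y bits are 0
}
```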
Here's the problem:
Given a sequence of bytes with some bits missing, count how many valid UTF-8 strings can be constructed from the input sequence by filling in the missing bits.
The input consists of multiple test cases.
For each test case, the first line contains an integer N (1 <= N <= 1000).
It is followed by N lines, each of the form bbbbbbbb, where each b is 0, 1, or x, and x denotes a missing bit.
For each test case, print the number of possible combinations modulo 1,000,000.
Sample input:
1
xxxxxxxx
3
11100000
10x00000
10111111
3
11100000
10000000
10111111
4
xxxxxxxx
xxxxxxxx
xxxxxxxx
xxxxxxxx

Sample output:
128
1
0
778240
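One way to approach this (a sketch, assuming the whole byte sequence must decompose into complete characters, as the samples suggest): let dp[i] be the number of ways to fill the missing bits so that the first i bytes form a valid UTF-8 string; then dp[i] sums dp[i-len] times the number of fills of the last len bytes that match the length-len template, for len = 1..4. The names `TMPL` and `waysFor` below are this sketch's own, and it is not the official solution.

```cpp
#include <bits/stdc++.h>
using namespace std;

const long long MOD = 1000000;

// Bit templates from the table in the statement, one string per byte.
const vector<vector<string>> TMPL = {
    {"0xxxxxxx"},
    {"110yyyyx", "10xxxxxx"},
    {"1110yyyy", "10yxxxxx", "10xxxxxx"},
    {"11110yyy", "10yyxxxx", "10xxxxxx", "10xxxxxx"},
};

// Number of ways bytes[pos .. pos+len-1] can be filled to match the
// length-`len` character template (0 if impossible).
long long waysFor(const vector<string>& bytes, int pos, int len) {
    const vector<string>& t = TMPL[len - 1];
    long long xFactor = 1;      // free choices among the x positions
    long long yTotal = 1;       // all assignments to the y positions
    long long yAllZero = 1;     // assignments with every y bit = 0
    bool hasY = false;
    for (int i = 0; i < len; ++i) {
        for (int j = 0; j < 8; ++j) {
            char want = t[i][j], got = bytes[pos + i][j];
            if (want == '0' || want == '1') {
                if (got != 'x' && got != want) return 0;  // fixed bit mismatch
            } else if (want == 'x') {
                if (got == 'x') xFactor = xFactor * 2 % MOD;
            } else {            // want == 'y': free bit, but at least one y must be 1
                hasY = true;
                if (got == 'x') yTotal *= 2;
                if (got == '1') yAllZero = 0;             // this y is forced to 1
            }
        }
    }
    long long yFactor = hasY ? yTotal - yAllZero : 1;
    return xFactor * yFactor % MOD;
}

int main() {
    int n;
    while (cin >> n) {          // termination is not specified; assume EOF
        vector<string> bytes(n);
        for (auto& b : bytes) cin >> b;
        vector<long long> dp(n + 1, 0);
        dp[0] = 1;              // the empty prefix has exactly one filling
        for (int i = 1; i <= n; ++i)
            for (int len = 1; len <= 4 && len <= i; ++len)
                dp[i] = (dp[i] + dp[i - len] * waysFor(bytes, i - len, len)) % MOD;
        cout << dp[n] << "\n";
    }
    return 0;
}
```

On the sample data above this sketch prints 128, 1, 0, and 778240.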
Migrated from old NTUJ.