Google Suggest is a typical feature of the Google search box. It guesses what you're typing and offers suggestions in real time(Fig 1). Sometimes, we may mistype the keyword. For example, when we want to find the information of "mcgrady"(a famouse NBA star), we mistyped it to "macgrady"(Fig 2). The clever Google can still give you the right suggestions. It is amazing, isn't it?
![]() |
![]() |
Fig 1:Google Suggest | Fig 2:Fuzzy Google Suggest |
In this problem, your job is to implement a similar system. You will be given a data set including a large amount of English words. Your system should provide the search service. For each input query word, find the prefix-fuzzy-match words from the data set and ouput the number of those words. In the following, I will explain you the prefix fuzzy match. When we say the two words are fuzzy-match(without "prefix"), we mean that they look similar and their Edit Distance meets a given threshold(If you have no idea of the Edit Distance, please refer to the Hints below). When we say the word A is prefix-fuzzy-match(with "prefix") to the word B, we mean that there exists a prefix p of word B , and the Edit Distance between p and B meets a given threshold. For Example, the data set contains four English words: "content"、"common"、"onganize"、"case". The input query contains a word "con" and a Edit Distance threshold 1. "con" is prefix-fuzzy-match to the words "content"、"common"、"onganize", but "case" doesn't meet the condition, because the minimun Edit Distance between all of its prefixes and "con" is 2 which is larger than the given threshold 1.
The input file consists of two parts. The first part is a data set with n (1 <= n <= 300,000) English words. The second part is the m (1 <= m <= 300) input queries. Each query has a word and a Edit Distance threshold edth( 0<= edth <= 2), separated by a space. The words in the data set and the queries contain at most 10 small letters(a-z).
For each input query, output the number of words from data set that the query word is prefix-fuzzy-match to.
4 content common onganize case 7 c 0 con 0 con 2 con 1 com 1 comm 2 cog 1
3 1 4 3 2 2 2
Migrated from old NTUJ.
2009Harbin
No. | Testdata Range | Score |
---|